Re: Schema-Aware PCollections revisited

2018-03-04 Thread Jean-Baptiste Onofré
Hi Reuven,

I am reviving this discussion, as I think it would be a great addition.

We had some discussion on the fly, but I think that now, as a basis for
discussion, it would be great to have a feature branch where we can start a
sketch/implementation and discuss.

@Reuven, did you start a PoC with what you proposed:
- SchemaCoder
- SchemaRegistry
- @FieldAccess on DoFn
- Select.fields PTransform
?

If not, I volunteer to start the branch and begin sketching.

Thoughts ?

Regards
JB
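
For readers following the thread, the four pieces listed above could fit
together roughly as in the minimal sketch below. It is plain Java with no Beam
dependency. Every name in it (Schema, Row, SchemaRegistry, selectFields) is
hypothetical and chosen only for illustration; it is not an actual Beam API and
not necessarily the design that was proposed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only: none of these types are Beam APIs. */
public class SchemaSketch {

  /** An ordered list of field names describing a record type. */
  public static final class Schema {
    public final List<String> fieldNames;

    public Schema(String... fieldNames) {
      this.fieldNames = Arrays.asList(fieldNames);
    }

    public int indexOf(String name) {
      int i = fieldNames.indexOf(name);
      if (i < 0) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
      return i;
    }
  }

  /** A generic record whose values are positionally aligned with a schema. */
  public static final class Row {
    public final Schema schema;
    public final List<Object> values;

    public Row(Schema schema, Object... values) {
      if (values.length != schema.fieldNames.size()) {
        throw new IllegalArgumentException("Value count does not match schema");
      }
      this.schema = schema;
      this.values = Arrays.asList(values);
    }

    public Object get(String field) {
      return values.get(schema.indexOf(field));
    }
  }

  /** Maps user types to schemas, playing the role of a schema registry. */
  public static final class SchemaRegistry {
    private final Map<Class<?>, Schema> schemas = new HashMap<>();

    public void register(Class<?> type, Schema schema) {
      schemas.put(type, schema);
    }

    public Schema getSchema(Class<?> type) {
      Schema schema = schemas.get(type);
      if (schema == null) {
        throw new IllegalStateException("No schema registered for " + type);
      }
      return schema;
    }
  }

  /** Projects each row onto the named fields, in the spirit of a field select. */
  public static List<Row> selectFields(List<Row> rows, String... fields) {
    Schema projected = new Schema(fields);
    List<Row> out = new ArrayList<>();
    for (Row row : rows) {
      Object[] selected = new Object[fields.length];
      for (int i = 0; i < fields.length; i++) {
        selected[i] = row.get(fields[i]);
      }
      out.add(new Row(projected, selected));
    }
    return out;
  }

  public static void main(String[] args) {
    Schema userSchema = new Schema("name", "age", "city");
    List<Row> rows = new ArrayList<>();
    rows.add(new Row(userSchema, "alice", 30, "Paris"));
    for (Row row : selectFields(rows, "name", "city")) {
      System.out.println(row.values);
    }
  }
}
```

The point of the sketch is that once a schema travels with the data, a
transform can address fields by name without knowing the user's concrete type.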

On 02/04/2018 08:23 PM, Reuven Lax wrote:
> Cool, let's chat about this on slack for a bit (which I realized I've been
> signed out of for some time).
> 
> Reuven
> 
> On Sun, Feb 4, 2018 at 9:21 AM, Jean-Baptiste Onofré wrote:
> 
> Sorry guys, I was off today. Happy to be part of the party too ;)
> 
> Regards
> JB
> 
> On 02/04/2018 06:19 PM, Reuven Lax wrote:
> > Romain, since you're interested maybe the two of us should put together a
> > proposal for how to set these things (hints, schema) on PCollections? I
> > don't think it'll be hard - the previous list thread on hints already
> > agreed on a general approach, and we would just need to flesh it out.
> >
> > BTW in the past when I looked, JSON schemas seemed to have some odd
> > limitations inherited from JavaScript (e.g. no distinction between integer
> > and floating-point types). Is that still true?
> >
> > Reuven
> >
> > On Sun, Feb 4, 2018 at 9:12 AM, Romain Manni-Bucau wrote:
> >
> >
> >
> >     2018-02-04 17:53 GMT+01:00 Reuven Lax:
> >
> >
> >
> >         On Sun, Feb 4, 2018 at 8:42 AM, Romain Manni-Bucau wrote:
> >
> >
> >             2018-02-04 17:37 GMT+01:00 Reuven Lax:
> >
> >                 I'm not sure where proto comes from here. Proto is one
> >                 example of a type that has a schema, but only one example.
> >
> >                 1. In the initial prototype I want to avoid modifying the
> >                 PCollection API. So I think it's best to create a special
> >                 SchemaCoder, and pass the schema into this coder. Later we
> >                 might add targeted APIs for this instead of going through a
> >                 coder.
> >                 1.a I don't see what hints have to do with this?
> >
> >
> >             Hints are a way to replace the new API and unify the way to
> >             pass metadata in Beam instead of adding a new custom way each
> >             time.
> >
> >
> >         I don't think schema is a hint. But I hear what you're saying - a
> >         hint is a type of PCollection metadata, as is a schema, and we
> >         should have a unified API for setting such metadata.
> >
> >
> >     :) Ismael pointed out to me earlier this week that "hint" had an old
> >     meaning in Beam. My usage is purely the one found in most EE specs
> >     (your "metadata" in the previous answer). But I guess we are aligned on
> >     the meaning now; I just wanted to be sure.
> >
> >                 2. BeamSQL already has a generic record type which fits
> >                 this use case very well (though we might modify it).
> >                 However as mentioned in the doc, the user is never forced
> >                 to use this generic record type.
> >
> >
> >             Well, yes and no. A type already exists, but 1. it is very
> >             strictly limited (flat/columns only, which is very little of
> >             what big data SQL can do) and 2. it must be aligned with the
> >             convergence of generic data the schema will bring (really read
> >             "aligned" as "dropped in favor of" - deprecation being a smooth
> >             way to do it).
> >
> >
> >         As I said, the existing class needs to be modified and extended,
> >         and not just for this schema use case. It was meant to represent
> >         Calcite SQL rows, but doesn't quite even do that yet (Calcite
> >         supports nested rows). However I think it's the right basis to
> >         start from.
> >
> >
> >     Agree on the state. Current impl issues I hit (in addition to the
> >     nested support which would

Board report - March '18

2018-03-04 Thread Jean-Baptiste Onofré
Hi guys,

In order to help Davor, I started the template and draft for Board Report (March
'18):

https://docs.google.com/document/d/16VZSlG24wfkFfG2Jdou0B_AG5Z4I-sD4dw3nO1F3Lj8/edit?usp=sharing

I will add more content. Feel free to do the same.

Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com