Re: [Go SDK] User Defined Coders

2019-01-16 Thread Robert Burke
I've updated the design doc with a section on schemas. Interestingly, the lack of generics in Go ends up being very handy: there is no incompatibility when converting between a concrete type and its Schema equivalent.

Re: [Go SDK] User Defined Coders

2019-01-09 Thread Robert Bradshaw
On Tue, Jan 8, 2019 at 9:15 PM Reuven Lax wrote:
> I wonder if we could do this _only_ over the FnApi. The FnApi already does batching I believe. What if we made schemas a fundamental part of our protos, and had no SchemaCoder.
One advantage of SchemaCoders is that they allow nesting

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Burke
Schemas allow the runner to know the structure of the data it's manipulating, so if a value is schema encoded, then the runner can manipulate it, including selection and aggregation. In essence, it allows Beam to handle the "common and boring but useful" parts of pipelines agnostic of an SDK
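As a rough illustration of the point above, here is a minimal sketch of field selection that relies only on a schema and never on the SDK's concrete type. FieldNames, Record, and Select are hypothetical names invented for this example; they are not part of Beam or the Go SDK.

```go
package main

import "fmt"

// FieldNames is a deliberately tiny stand-in for a schema: just ordered field names.
// (Hypothetical type for illustration; not a Beam API.)
type FieldNames []string

// Record holds one value per schema field, in schema order.
type Record []any

// Select projects every record onto the named fields, using only the schema.
func Select(s FieldNames, recs []Record, names ...string) ([]Record, error) {
	// Resolve each requested name to its position in the schema.
	idx := make([]int, 0, len(names))
	for _, n := range names {
		pos := -1
		for i, f := range s {
			if f == n {
				pos = i
				break
			}
		}
		if pos < 0 {
			return nil, fmt.Errorf("unknown field %q", n)
		}
		idx = append(idx, pos)
	}
	out := make([]Record, 0, len(recs))
	for _, r := range recs {
		if len(r) != len(s) {
			return nil, fmt.Errorf("record has %d values, schema has %d fields", len(r), len(s))
		}
		p := make(Record, len(idx))
		for j, i := range idx {
			p[j] = r[i]
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	s := FieldNames{"user", "country", "clicks"}
	recs := []Record{{"ann", "CA", 3}, {"bob", "US", 5}}
	out, err := Select(s, recs, "country", "clicks")
	fmt.Println(out, err) // [[CA 3] [US 5]] <nil>
}
```

With an opaque coder, the same projection would first require decoding each element into its SDK-specific type.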

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Reuven Lax
I wonder if we could do this _only_ over the FnApi. The FnApi already does batching I believe. What if we made schemas a fundamental part of our protos, and had no SchemaCoder. The FnApi could then batch up a bunch of rows and encode using Arrow before sending over the wire to the harness. Of

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Kenneth Knowles
On Tue, Jan 8, 2019 at 7:44 AM Robert Bradshaw wrote:
> On Tue, Jan 8, 2019 at 4:32 PM Reuven Lax wrote:
> > Also while columnar can be a large perf win, I suspect that we currently have lower-hanging fruit to optimize when it comes to performance.
> It's probably a bigger win for

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Tue, Jan 8, 2019 at 4:32 PM Reuven Lax wrote:
> I agree with this, but I think it's a significant rethinking of Beam that I didn't want to couple to schemas. In addition to rethinking the API, it might also require rethinking all of our runners.
We're already marshaling (including

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Reuven Lax
I agree with this, but I think it's a significant rethinking of Beam that I didn't want to couple to schemas. In addition to rethinking the API, it might also require rethinking all of our runners. Also while columnar can be a large perf win, I suspect that we currently have lower-hanging fruit

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Fri, Jan 4, 2019 at 12:54 AM Reuven Lax wrote:
> I looked at Apache Arrow as a potential serialization format for Row coders. At the time it didn't seem a perfect fit - Beam's programming model is record-at-a-time, and Arrow is optimized for large batches of records (while Beam has

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Fri, Jan 4, 2019 at 7:05 PM Kenneth Knowles wrote:
> On Thu, Jan 3, 2019 at 4:33 PM Reuven Lax wrote:
> > If a user wants custom encoding for a primitive type, they can create a byte-array field and wrap that field with a Coder
I don't think the primary use of coders is a custom

Re: [Go SDK] User Defined Coders

2019-01-07 Thread Andrew Pilloud
+1 on this. I think we are somewhat lacking on a written design for Schemas in Java. This would be really useful in driving adoption and expanding to other languages.
Andrew
On Mon, Jan 7, 2019 at 3:43 PM Robert Burke wrote:
> Might I see the design doc (not code) for how they're supposed to

Re: [Go SDK] User Defined Coders

2019-01-07 Thread Robert Burke
Might I see the design doc (not code) for how they're supposed to look and work in Java first? I'd rather not write a document based on a speculative understanding of Schemas built on the litany of assumptions I'm making about them.
On Mon, Jan 7, 2019, 2:35 PM Reuven Lax wrote:
> I suggest

Re: [Go SDK] User Defined Coders

2019-01-07 Thread Reuven Lax
I suggest that we write out a design of what schemas in Go would look like and how they would interact with coders. We'll then be in a much better position to decide what the right short-term path forward is. Even if we decide it makes more sense to build up the coder support first, I think this

Re: [Go SDK] User Defined Coders

2019-01-07 Thread Robert Burke
Kenn has pointed out to me that Coders are not likely to vanish any time soon, in particular over the FnAPI, so having a coder registry does remain useful, as described by an early adopter in another thread.
On Fri, Jan 4, 2019, 10:51 AM Robert Burke wrote:
> I think you're right

Re: [Go SDK] User Defined Coders

2019-01-04 Thread Robert Burke
I think you're right, Kenn. Reuven alluded to the difficulty in inferring what to use between AtomicType and the rest, in particular Struct. Go has additional concerns around pointer vs. non-pointer types, which aren't a concern that either Python or Java has, but which have implications on pipeline

Re: [Go SDK] User Defined Coders

2019-01-04 Thread Reuven Lax
Maybe a good first step would be to write a doc explaining how this would work in the Go SDK and share it with the dev list. It's possible we will decide to just implement Coders first; however, that way this will be done with everyone fully understanding the design tradeoffs.
Reuven
On Fri, Jan 4,

Re: [Go SDK] User Defined Coders

2019-01-04 Thread Kenneth Knowles
On Thu, Jan 3, 2019 at 4:33 PM Reuven Lax wrote:
> If a user wants custom encoding for a primitive type, they can create a byte-array field and wrap that field with a Coder
This is the crux of the issue, right? Roughly, today, we've got: Schema ::= [ (fieldname, Type) ]
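One way to read the grammar fragment "Schema ::= [ (fieldname, Type) ]" is as the Go types below. This is only a sketch under assumptions: the Kind values, the Sub pointer for nested rows, and the Lookup helper are invented for illustration, and the rest of the grammar is cut off in this snippet, so nothing here reflects Beam's actual type set. The Bytes field shows the quoted suggestion of carrying a custom-encoded value in a byte-array field.

```go
package main

import "fmt"

// Kind enumerates a few illustrative field kinds (hypothetical, not Beam's set).
type Kind int

const (
	Int64 Kind = iota
	String
	Bytes
	Row // a nested schema
)

// Type is a field type; Sub is set only when Kind == Row.
type Type struct {
	Kind Kind
	Sub  *Schema
}

// Field is one (fieldname, Type) pair.
type Field struct {
	Name string
	Type Type
}

// Schema ::= [ (fieldname, Type) ]
type Schema struct {
	Fields []Field
}

// Lookup returns the type of the named field, if present.
func (s Schema) Lookup(name string) (Type, bool) {
	for _, f := range s.Fields {
		if f.Name == name {
			return f.Type, true
		}
	}
	return Type{}, false
}

func main() {
	addr := &Schema{Fields: []Field{{Name: "city", Type: Type{Kind: String}}}}
	user := Schema{Fields: []Field{
		{Name: "id", Type: Type{Kind: Int64}},
		{Name: "payload", Type: Type{Kind: Bytes}}, // custom-encoded value carried as bytes
		{Name: "address", Type: Type{Kind: Row, Sub: addr}},
	}}
	ft, ok := user.Lookup("address")
	fmt.Println(ok, ft.Kind == Row, len(ft.Sub.Fields)) // true true 1
}
```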

Re: [Go SDK] User Defined Coders

2019-01-04 Thread Robert Burke
That's an interesting idea. I must confess I don't rightly know the difference between a schema and a coder, but here's what I've got with a bit of searching through memory and the mailing list. Please let me know if I'm off track. As near as I can tell, a schema, as far as Beam takes it

Re: [Go SDK] User Defined Coders

2019-01-03 Thread Reuven Lax
On Fri, Jan 4, 2019 at 1:19 AM Robert Burke wrote:
> Very interesting Reuven!
>
> That would be a huge readability improvement, but it would also be a significant investment over my time budget to implement them on the Go side correctly. I would certainly want to read your documentation

Re: [Go SDK] User Defined Coders

2019-01-03 Thread Robert Burke
Very interesting Reuven! That would be a huge readability improvement, but it would also be a significant investment over my time budget to implement them on the Go side correctly. I would certainly want to read your documentation before going ahead. Will the Portability FnAPI have dedicated

Re: [Go SDK] User Defined Coders

2019-01-03 Thread Reuven Lax
I looked at Apache Arrow as a potential serialization format for Row coders. At the time it didn't seem a perfect fit - Beam's programming model is record-at-a-time, and Arrow is optimized for large batches of records (while Beam has a concept of "bundles" they are completely non-deterministic,

Re: [Go SDK] User Defined Coders

2019-01-03 Thread Gleb Kanterov
Reuven, it sounds great. I see there is a similar thing to Row coders happening in Apache Arrow, and there is a similarity between Apache Arrow Flight and data exchange service in

Re: [Go SDK] User Defined Coders

2019-01-03 Thread Reuven Lax
The biggest advantage is actually readability and usability. A secondary advantage is that it means that Go will be able to interact seamlessly with BeamSQL, which would be a big win for Go. A schema is basically a way of saying that a record has a specific set of (possibly nested, possibly

Re: [Go SDK] User Defined Coders

2019-01-03 Thread Reuven Lax
I'll make a different suggestion. There's been some chatter that schemas are a better tool than coders, and that in Beam 3.0 we should make schemas the basic semantics instead of coders. Schemas provide everything a coder provides, but also allow for far more readable code. We can't make such a

[Go SDK] User Defined Coders

2019-01-03 Thread Robert Burke
One area that the Go SDK currently lacks is the ability for users to specify their own coders for types. I've written a proposal document, and while I'm confident about the core, there are certainly some edge
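For readers unfamiliar with the idea, the following is a minimal sketch of what letting users specify their own coders for types could look like: a registry keyed by Go type that holds an encode/decode pair. Coder, RegisterCoder, LookupCoder, and UserID are all hypothetical names made up for this example; the snippet does not show the actual proposal's API, and this uses only the standard library.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"reflect"
)

// Coder is a hypothetical pair of user-supplied functions for one type.
type Coder struct {
	Encode func(v any) ([]byte, error)
	Decode func(b []byte) (any, error)
}

// registry maps a Go type to its user-defined coder (illustrative only).
var registry = map[reflect.Type]Coder{}

// RegisterCoder associates a coder with the type of the sample value.
func RegisterCoder(sample any, c Coder) {
	registry[reflect.TypeOf(sample)] = c
}

// LookupCoder returns the coder registered for v's type, if any.
func LookupCoder(v any) (Coder, bool) {
	c, ok := registry[reflect.TypeOf(v)]
	return c, ok
}

// UserID is an example user type with a custom fixed-width encoding.
type UserID uint32

func init() {
	RegisterCoder(UserID(0), Coder{
		Encode: func(v any) ([]byte, error) {
			id, ok := v.(UserID)
			if !ok {
				return nil, fmt.Errorf("want UserID, got %T", v)
			}
			b := make([]byte, 4)
			binary.BigEndian.PutUint32(b, uint32(id))
			return b, nil
		},
		Decode: func(b []byte) (any, error) {
			if len(b) != 4 {
				return nil, fmt.Errorf("want 4 bytes, got %d", len(b))
			}
			return UserID(binary.BigEndian.Uint32(b)), nil
		},
	})
}

func main() {
	c, ok := LookupCoder(UserID(42))
	if !ok {
		panic("no coder registered for UserID")
	}
	b, _ := c.Encode(UserID(42))
	v, _ := c.Decode(b)
	fmt.Println(b, v) // [0 0 0 42] 42
}
```

A real design would also need to settle the edge cases the message alludes to, which this sketch does not attempt.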