Re: Key encodings for state requests

2019-11-12 Thread jincheng sun
Thanks for sharing your thoughts, which helped me understand the design of the FnAPI more deeply; it makes more sense to me now. Many thanks, Robert! Best, Jincheng Robert Bradshaw wrote on Tue, Nov 12, 2019 at 2:10 AM: > On Fri, Nov 8, 2019 at 10:04 PM jincheng sun > wrote: > > > > > Let us first

Re: Key encodings for state requests

2019-11-11 Thread Robert Bradshaw
On Fri, Nov 8, 2019 at 10:04 PM jincheng sun wrote: > > > Let us first define what are "standard coders". Usually it should be the > > coders defined in the Proto. However, personally I think the coders defined > > in the Java ModelCoders [1] seems more appropriate. The reason is that for > >

Re: Key encodings for state requests

2019-11-08 Thread Robert Burke
The SDKs need to know each of the coders defined in the proto. Go and Python can't use the Java coders. Making a standard definition for the coder, adding it to the proto enum, and implementing that coder in each SDK is what makes the coders standard. In other words, the Java model coders are the

Re: Key encodings for state requests

2019-11-08 Thread jincheng sun
> Let us first define what are "standard coders". Usually it should be the coders defined in the Proto. However, personally I think the coders defined in the Java ModelCoders [1] seems more appropriate. The reason is that for a coder which has already appeared in Proto and still not added to the

Re: Key encodings for state requests

2019-11-08 Thread jincheng sun
Hi Robert Bradshaw, Thanks a lot for the explanation. Very interesting topic! Let us first define what "standard coders" are. Usually they would be the coders defined in the Proto. However, personally I think the coders defined in the Java ModelCoders [1] seem more appropriate. The reason is

Re: Key encodings for state requests

2019-11-08 Thread Robert Burke
And by "I wasn't clear" I meant "I misread the options". On Fri, Nov 8, 2019, 4:14 PM Robert Burke wrote: > Reading back, I wasn't clear: the Go SDK does Option (1), putting the LP > explicitly during encoding [1] for the runner proto, and explicitly expects > LPs to contain a custom coder URN

Re: Key encodings for state requests

2019-11-08 Thread Robert Burke
Reading back, I wasn't clear: the Go SDK does Option (1), putting the LP explicitly during encoding [1] for the runner proto, and explicitly expects LPs to contain a custom coder URN on decode for execution [2]. (Modulo an old bug in Dataflow where the URN was empty) [1]

Re: Key encodings for state requests

2019-11-08 Thread Robert Bradshaw
On Fri, Nov 8, 2019 at 2:09 AM jincheng sun wrote: > > Hi, > > Sorry for my late reply. It seems the conclusion has been reached. I just > want to share my personal thoughts. > > Generally, both option 1 and 3 make sense to me. > > >> The key concept here is not "standard coder" but "coder that

Re: Key encodings for state requests

2019-11-08 Thread Maximilian Michels
Thank you for your comments. Here is the updated PR according to option (1): https://github.com/apache/beam/pull/9997 -Max On 08.11.19 11:08, jincheng sun wrote: Hi, Sorry for my late reply. It seems the conclusion has been reached. I just want to share my personal thoughts. Generally,

Re: Key encodings for state requests

2019-11-08 Thread jincheng sun
Hi, Sorry for my late reply. It seems the conclusion has been reached. I just want to share my personal thoughts. Generally, both option 1 and 3 make sense to me. >> The key concept here is not "standard coder" but "coder that the >> runner does not understand." This knowledge is only in the

Re: Key encodings for state requests

2019-11-07 Thread Maximilian Michels
While the Go SDK doesn't yet support a State API, Option 3) is what the Go SDK does for all non-standard coders (aka custom coders) anyway. For wire transfer, the Java Runner also adds a LengthPrefixCoder for the coder and its subcomponents. The problem is that this is an implicit assumption

Re: Key encodings for state requests

2019-11-07 Thread Robert Bradshaw
On Thu, Nov 7, 2019 at 6:26 AM Maximilian Michels wrote: > > Thanks for the feedback thus far. Some more comments: > > > Instead, the runner knows ahead of time that it > > will need to instantiate this coder, and should update the bundle > > processor to specify KvCoder, > > VarIntCoder> as the

Re: Key encodings for state requests

2019-11-07 Thread Robert Burke
While the Go SDK doesn't yet support a State API, Option 3) is what the Go SDK does for all non-standard coders (aka custom coders) anyway. While this means that for certain custom encodings of user types there may be the overhead of length prefixing it, it's not likely to be the most significant

Re: Key encodings for state requests

2019-11-06 Thread Robert Bradshaw
On Wed, Nov 6, 2019 at 2:55 AM Maximilian Michels wrote: > > Let me try to clarify: > > > The Coder used for State/Timers in a StatefulDoFn is pulled out of the > > input PCollection. If a Runner needs to partition by this coder, it > > should ensure the coder of this PCollection matches with the

Re: Key encodings for state requests

2019-11-06 Thread Maximilian Michels
Let me try to clarify: The Coder used for State/Timers in a StatefulDoFn is pulled out of the input PCollection. If a Runner needs to partition by this coder, it should ensure the coder of this PCollection matches with the Coder used to create the serialized bytes that are used for partitioning

Re: Key encodings for state requests

2019-11-05 Thread Kenneth Knowles
Specifically, "We have no way of telling from the Runner side, if a length prefix has been used or not." seems false. The runner has all the information since length prefix is a model coder. Didn't we agree that all coders should be self-delimiting in runner/SDK interactions, requiring
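
For readers following the length-prefix point above, here is a minimal sketch of why a length-prefixed encoding is self-delimiting: a reader can split a byte stream into elements without understanding the payload format. It assumes a varint (base-128, little-endian) length followed by the raw payload. The function names are hypothetical, chosen only for illustration, and this is not taken from any Beam SDK's LengthPrefixCoder.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)  # last byte, continuation bit clear
            return bytes(out)


def decode_varint(stream) -> int:
    """Read one varint from a binary stream."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def length_prefix(payload: bytes) -> bytes:
    """Wrap opaque bytes (e.g. a custom-coded key) with their length."""
    return encode_varint(len(payload)) + payload


def read_length_prefixed(stream) -> bytes:
    """Read one element without knowing how the payload was encoded."""
    size = decode_varint(stream)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside a payload")
    return data


# Hypothetical usage: two opaque keys concatenated on the wire.
wire = io.BytesIO(length_prefix(b"user-key-1") + length_prefix(b"k2"))
first = read_length_prefixed(wire)
second = read_length_prefixed(wire)
```

In this sketch the reader recovers `first` and `second` as separate byte strings even though it never interprets them, which is the property the thread relies on when a runner has to handle keys produced by a coder it does not understand.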

Re: Key encodings for state requests

2019-11-05 Thread Luke Cwik
+1 to what Robert said. On Tue, Nov 5, 2019 at 2:36 PM Robert Bradshaw wrote: > The Coder used for State/Timers in a StatefulDoFn is pulled out of the > input PCollection. If a Runner needs to partition by this coder, it > should ensure the coder of this PCollection matches with the Coder >

Re: Key encodings for state requests

2019-11-05 Thread Robert Bradshaw
The Coder used for State/Timers in a StatefulDoFn is pulled out of the input PCollection. If a Runner needs to partition by this coder, it should ensure the coder of this PCollection matches with the Coder used to create the serialized bytes that are used for partitioning (whether or not this is

Key encodings for state requests

2019-11-05 Thread Maximilian Michels
Hi, I wanted to get your opinion on something that I have been struggling with. It is about the coders for state requests in portable pipelines. In contrast to "classic" Beam, the Runner is not guaranteed to know which coder is used by the SDK. If the SDK happens to use a standard coder