Ok, that makes a lot of sense. Thanks Kenneth!
On Mon, Feb 12, 2018 at 5:41 PM Kenneth Knowles <k...@google.com> wrote:
> Hi Carlos,
> You are quite correct that choosing the keys is important for work to be
> evenly distributed. The reason you need to have a KvCoder is that state is
> partitioned per key (to give natural & automatic parallelism) and window
> (to allow reclaiming expired state so you can process unbounded data with
> bounded storage, and also more parallelism). To a Beam runner, most data in
> the pipeline is "just bytes" that it cannot interpret. KvCoder is a special
> case where a runner knows the binary layout of encoded data so it can pull
> out the keys in order to shuffle data of the same key to the same place, so
> that is why it has to be a KvCoder.
> On Mon, Feb 12, 2018 at 5:52 AM, Carlos Alonso <car...@mrcalonso.com>
>> I was refactoring my solution a bit and tried to make my stateful
>> transform to work on simple case classes and I got this exception:
>> https://pastebin.com/x4xADmvL . I'd like to understand the rationale
>> behind this as I think carefully choosing the keys would be very important
>> in order for the work to be properly distributed.