Thanks Max!
I'll try to explain Spark's stateful operators and how/why I used them with
UnboundedSource.
Spark has two stateful operators: *updateStateByKey* and *mapWithState*.
Since updateStateByKey is bound to output the (updated) state itself - the
CheckpointMark in our case - we're left
Just to add a comment from the Flink side and its
UnboundedSourceWrapper. We experienced the only way to guarantee
deterministic splitting of the source, was to generate the splits upon
creation of the source and then checkpoint the assignment during
runtime. When restoring from a checkpoint, the
Thanks for the update Frances.
I will ping my infra contact to move forward quickly.
Regards
JB
On 10/10/2016 07:27 PM, Frances Perry wrote:
Related to #3-5: Also, as we discussed earlier [1], there will be an
additional level of tracking in jira for deeper proposal-style
conversations to
That's a good question Robert, and I did.
First of all, an UnboundedSource is split into splits that implement a sort
of "BoundedReadFromUnboundedSource", with Restrictions on time and
(optional) number of records - this seems to fit nicely into the *SDF*
language.
Taking a look at the diagram
Hi Amit,
The transform, the way it's implemented, actually does several things at
the same time and that's why it's tricky to document it.
Redistribute.arbitrarily():
- Introduces a fusion barrier (in runners that have it), making sure that
the runner can fully parallelize processing the output
On Mon, Oct 10, 2016 at 9:21 PM Robert Bradshaw
wrote:
> On Sat, Oct 8, 2016 at 7:31 AM, Amit Sela wrote:
>
> > Hi Eugene,
>
> >
>
> > This is very interesting.
>
> > Let me see if I get this right, the "Redistribute" transformation
> assigns
Hi Amit,
thanks for the explanation.
For 4, you are right, it's slightly different from DataXchange (related
to the elements in the PCollection). I think storing the "starting
point" for a reader makes sense.
Regards
JB
On 10/10/2016 10:33 AM, Amit Sela wrote:
Inline, thanks JB!
On Mon,
Inline, thanks JB!
On Mon, Oct 10, 2016 at 9:01 AM Jean-Baptiste Onofré
wrote:
> Hi Amit,
>
>
>
> For 1., the runner is responsible of the checkpoint storage (associated
>
> with the source). It's the way for the runner to retry and know the
>
> failed bundles.
>
True, this