Most likely this is late data: https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data . Try configuring a trigger with late-data behavior that is appropriate for your particular use case.
On Thu, Dec 7, 2017 at 3:03 PM Nishu <nishuta...@gmail.com> wrote:
> Hi,
>
> I am running a streaming pipeline with the Flink runner. The *operator
> sequence* is: read the JSON data, parse the JSON string into an object,
> and group the objects by a common key. I noticed that the GroupByKey
> operator throws away some data in between, so I don't get all the keys
> as output.
>
> In the screenshot below, 1001 records are read from the Kafka topic,
> each with a unique ID. After grouping, only 857 unique IDs are returned.
> Ideally, the GroupByKey operator should return 1001 records.
>
> [image: Inline image 3]
>
> Any idea what the issue could be? Thanks in advance!
>
> --
> Thanks & Regards,
> Nishu Tayal