Hi Eugene,

In my use case, I use the GlobalWindow (https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions) and specify a trigger. In a global window, the entire data set is accumulated and re-emitted each time the trigger fires, so we can avoid the late-data issue.
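Roughly, the windowing I mean looks like this (a minimal sketch in the Beam Java SDK; the key/value types and the 10-second processing-time delay are just placeholders, not my actual pipeline):

import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// keyed: the parsed records, keyed by their unique ID (placeholder types).
PCollection<KV<String, String>> keyed = ...;

PCollection<KV<String, Iterable<String>>> grouped = keyed
    .apply(Window.<KV<String, String>>into(new GlobalWindows())
        // Fire repeatedly on processing time; the 10s delay is arbitrary.
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(10))))
        // Accumulate panes so each firing re-emits all data seen so far,
        // instead of only the new elements since the last firing.
        .accumulatingFiredPanes()
        // A global window never closes, so nothing is dropped as late.
        .withAllowedLateness(Duration.ZERO))
    .apply(GroupByKey.create());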
I found a JIRA issue (https://issues.apache.org/jira/browse/BEAM-3225) for the same issue in Beam. Today I am going to try to write a similar implementation in Flink.

Thanks,
Nishu

On Fri, Dec 8, 2017 at 12:08 AM, Eugene Kirpichov <kirpic...@google.com> wrote:
> Most likely this is late data:
> https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data
> Try configuring a trigger with a late-data behavior that is more
> appropriate for your particular use case.
>
> On Thu, Dec 7, 2017 at 3:03 PM Nishu <nishuta...@gmail.com> wrote:
>> Hi,
>>
>> I am running a streaming pipeline with the Flink runner.
>> The *operator sequence* is: read the JSON data, parse the JSON string
>> into an object, and group the objects by a common key. I noticed that
>> the GroupByKey operator throws away some data in between, so I don't
>> get all the keys as output.
>>
>> In the screenshot below, 1001 records are read from the Kafka topic,
>> and each record has a unique ID. After grouping, only 857 unique IDs
>> are returned. Ideally, the GroupByKey operator should return all 1001
>> records.
>>
>> [image: Inline image 3]
>>
>> Any idea what the issue could be? Thanks in advance!
>>
>> --
>> Thanks & Regards,
>> Nishu Tayal

--
Thanks & Regards,
Nishu Tayal
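P.S. For anyone finding this thread later: Eugene's suggestion of configuring late-data behavior on an event-time window would look something like this (a sketch only; the one-minute window and two-day lateness bound are placeholders to pick per use case):

import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

keyed.apply(Window.<KV<String, String>>into(
        FixedWindows.of(Duration.standardMinutes(1)))
    // Fire once when the watermark passes the end of the window,
    // then fire again for every element that arrives late...
    .triggering(AfterWatermark.pastEndOfWindow()
        .withLateFirings(AfterPane.elementCountAtLeast(1)))
    // ...for up to two days after the window ends; records later
    // than this bound are still dropped.
    .withAllowedLateness(Duration.standardDays(2))
    .accumulatingFiredPanes());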