I don't think the capability matrix is updated, the Spark runner uses LateDataUtils to handle late elements - https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L300 On Fri, Sep 7, 2018 at 6:43 PM Raghu Angadi <[email protected]> wrote:
> I see. Hopefully someone with more familiarity with Spark runner will > chime in. > > On Fri, Sep 7, 2018 at 1:41 PM Vishwas Bm <[email protected]> wrote: > >> Hi, >> >> In our use case the watermark is the processing time. >> >> As per beam capability matrix ( >> https://beam.apache.org/documentation/runners/capability-matrix/) >> lateness is not supported by spark runner. But as per the output in our >> use case we are able to see late data getting emitted. >> >> So we wanted to know whether spark runner supports allowed lateness or >> not. >> >> >> Regards, >> Vishwas >> >> >> On Fri, Sep 7, 2018, 10:09 PM Raghu Angadi <[email protected]> wrote: >> >>> Lateness depends on watermark. How did you configure your KafkaIO >>> reader? Did you set custom timestamp function? By default watermark in >>> KafkaIO is set to same as processing time, in which case, your watermark >>> could be close to 13-38-37 (processing time). Note that this is in general >>> true across all the runners, though I am not aware of any subtle >>> differences in Spark runner. >>> >>> On Fri, Sep 7, 2018 at 7:03 AM rahul patwari <[email protected]> >>> wrote: >>> >>>> Hi, >>>> >>>> We are running a Beam program on Spark. We are using 2.5.0 Beam and >>>> SparkRunner versions. We are seeing Late data in the output emitted by >>>> Spark. As per the capability Matrix, Lateness is not supported in Spark. Is >>>> it supported now? or Are we missing something? >>>> >>>> Steps: >>>> Read from Kafka, Apply a Fixed Window of 1 Min with Lateness as 2 Min >>>> with Late firings when an element is found with Accumulating Fired Panes, >>>> GroupByKey, ParDo to display the result. >>>> >>>> Below is the output of the ParDo in which we are printing the >>>> GroupByKey result: >>>> Pane Timing : LATE >>>> Processing Time : 2018-09-07----13-38-37-7290----+0000 >>>> Element Time : 2018-09-07----13-36-59-9990----+0000 >>>> Window Start Time : 2018-09-07----13-36-00-0000----+0000 >>>> Window End Time : 2018-09-07----13-37-00-0000----+0000 >>>> Pane Index : 1 >>>> Pane NonSpeculativeIndex : 1 >>>> >>>> Regards, >>>> Rahul >>>> >>>
