On Wed, Feb 7, 2018 at 10:33 PM, Pawel Bartoszek <[email protected] > wrote:
> Hi Raghu, > Can you provide more details about increasing allowed lateness? Even if I > do that I still need to compare event time of record with processing > time(system current time) in my ParDo? > I see. PaneInfo() associated with each element has 'Timing' enum, so we can tell if the element is late, but it does not tell how late. How about this : We can have a periodic timer firing every minute and store the scheduled time of the timer in state as the watermark time. We could compare element time to this stored time for good approximation (may require parallel stage with global window, dropping any events that 'clearly within limits' based on current time). There are probably other ways to do this with timers within existing stage. > Pawel > > On 8 February 2018 at 05:40, Raghu Angadi <[email protected]> wrote: > >> The watermark is not directly available, you essentially infer from fired >> triggers (e.g. fixed windows). I would consider some of these options : >> - Adhoc debugging : if the pipeline is close to realtime, you can just >> guess if a element will be dropped based on its timestamp and current time >> in the first stage (before first aggregation) >> - Increase allowed lateness (say to 3 days) and drop the elements >> yourself you notice are later than 1 day. >> - Place the elements into another window with larger allowed lateness >> and log very late elements in another parallel aggregation (see >> TriggerExample.java in Beam repo). >> >> On Wed, Feb 7, 2018 at 2:55 PM, Carlos Alonso <[email protected]> >> wrote: >> >>> Hi everyone!! >>> >>> I have a streaming job running with fixed windows of one hour and >>> allowed lateness of two days and the number of dropped due to lateness >>> elements is slowly, but continuously growing and I'd like to understand >>> which elements are those. >>> >>> I'd like to get the watermark from inside the job to compare it against >>> each element and write log messages with the ones that will be potentially >>> discarded.... Does that approach make any sense? I which case... How can I >>> get the watermark from inside the job? Any other ideas? >>> >>> Thanks in advance!! >>> >> >> >
