You should be able to do this with 24-hour windows and a 30-minute
early processing time trigger. Something like

Window
    .into(FixedWindows.of(24, TimeUnit.HOURS))
    .triggering(AfterWatermark.pastEndOfWindow().withEarlyFirings(
        AfterProcessingTime.pastFirstElementInPane(). alignedTo(30,
TimeUnit.MINUTES)))
    . accumulatingFiredPanes()

Alternatively, you could window into 30 minute fixed windows, write
the snapshot, then also window into 24 hours to write the final
snapshot.

Both of these will buffer all the updates for the day. If that's too
many, you could consider using
https://beam.apache.org/blog/2017/02/13/stateful-processing.html
(either with timers, or windowing.
On Fri, Nov 9, 2018 at 8:14 AM Zdenko Hrcek <[email protected]> wrote:
>
> Greetings,
>
> I'm trying to do following:
>
> I'm streaming messages from PubSub with encoded json which contain DB fields 
> like "primary key" as well as "operation type", which can have values: 
> insert, update, delete to reflect db operations.
>
> Goal is to produce one file per day (with 30 minute updates) which will 
> contain final state of rows/messages.
> 30 minute file update/flush should reflect date state at that time (taking 
> into considerations operations).
> Lets say in first 30 minutes there is insert operation for some row which 
> will be in first flush of output file, but in next 30 minutes comes a message 
> with delete operation then in second file flush it shouldn't be in output 
> file etc.
> For the next day, new file should be created.
>
> I'm just wondering if something like this can be done in Beam (Java).
>
> I had in mind doing fixed 24h window (which I guess is not best practice) 
> which would trigger after 30 minutes and having group by key operation (by 
> primary key) and then handle operations, but it looks like triggering can't 
> be done that way, or maybe my whole approach/understanding is wrong...
>
> Anyway I would be grateful for any advice how this can be done (conceptually) 
> or if there is some code sample of similar pipeline.
>
> Best regards,
>
> Zdenko
> --
> _______________________
> http://www.the-swamp.info

Reply via email to