One solution would be to stabilize the data read from the Redshift DB. To this end, sending your side input through a Reshuffle transform [1] should work for some runners. Robin is working on a more portable solution for supporting stable input [2].
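
For comparison, the per-group filtering described in the original question (group elements, run one query per group, flatten again) could be sketched in plain Java roughly as follows. This is not Beam code and not a tested pipeline: `GroupedFilter`, `filterByGroup`, `keyFn` and `lookup` are hypothetical names made up for illustration, and `lookup` merely stands in for whatever Redshift query decides which elements of a group to keep.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper, for illustration only (not part of Beam).
final class GroupedFilter {

    /**
     * Groups elements by key, asks `lookup` once per group which elements
     * to keep, and flattens the survivors back into a single list.
     *
     * `lookup` is an assumed stand-in for one Redshift query per group;
     * it receives the group key and the group's elements and returns
     * the subset that should pass the filter.
     */
    static <K, T> List<T> filterByGroup(
            List<T> elements,
            Function<T, K> keyFn,
            BiFunction<K, List<T>, Set<T>> lookup) {
        // Step 1: group elements by key (LinkedHashMap keeps first-seen key order).
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T element : elements) {
            groups.computeIfAbsent(keyFn.apply(element), k -> new ArrayList<>())
                  .add(element);
        }

        // Steps 2 and 3: one lookup per group, then flatten what survives.
        List<T> kept = new ArrayList<>();
        for (Map.Entry<K, List<T>> group : groups.entrySet()) {
            Set<T> allowed = lookup.apply(group.getKey(), group.getValue());
            for (T element : group.getValue()) {
                if (allowed.contains(element)) {
                    kept.add(element);
                }
            }
        }
        return kept;
    }
}
```

Note that the result is ordered group by group rather than in the original input order. The point of the shape is that the lookup runs once per group instead of once per element.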

Thanks,
Cham

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L64
[2] https://lists.apache.org/thread.html/f8093ad5512a7fce668550e1f9cf0921c5d1e7ff6656c7a6c9950165@%3Cdev.beam.apache.org%3E

On Mon, Jul 30, 2018 at 6:16 AM Jean-Baptiste Onofré <[email protected]> wrote:

> Hi Jose,
>
> so basically, you create two PCollections with the same keys and then
> you join/filter/flatten?
>
> Regards
> JB
>
> On 30/07/2018 15:09, Jose Bermeo wrote:
> > Hi, question guys.
> >
> > I have to filter an unbounded collection based on data from a Redshift
> > DB. I cannot use a side input as Redshift data could change. One way to
> > do it would be to group common elements, make a query to filter each
> > group, and finally flatten the pipe again. Do you know if this is the best
> > way to do it? And what would be the way to run the query against
> > Redshift?
> >
> > Thanks.
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
