Right, some BoundedSources might not even take a snapshot and will simply read whatever data is available when the readers are executed. This might lead to data loss if splitting occurred before the data was updated. I'd say it's not safe to read from a bounded data source that can change dynamically. So you might want to add a Reshuffle transform before the data is processed further.
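
As an alternative that sidesteps the snapshot problem, the per-group lookup from the original question could be done at processing time, so every check sees the table as it is at that moment. Below is a rough sketch of that idea only, not Beam code and not a Redshift client: the standard-library sqlite3 module stands in for the Redshift connection, and the table name `lookup`, its columns `id` and `valid`, the `(foreing_key_id, payload)` element shape, and the function names are all assumptions made up for illustration. In a pipeline, something like `filter_batch` would presumably be called from a DoFn once per group of elements.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical element shape: (foreing_key_id, payload).
Element = Tuple[int, str]


def filter_batch(conn: sqlite3.Connection, batch: Iterable[Element]) -> List[Element]:
    """Keep only the elements whose key currently maps to a valid row."""
    batch = list(batch)
    if not batch:
        return []
    # Deduplicate keys so each one is sent to the database only once.
    keys = sorted({key for key, _ in batch})
    placeholders = ",".join("?" for _ in keys)
    # One query per batch, issued at processing time, so the result
    # reflects the table as it is now rather than an earlier snapshot.
    rows = conn.execute(
        f"SELECT id FROM lookup WHERE valid = 1 AND id IN ({placeholders})",
        keys,
    ).fetchall()
    valid = {row[0] for row in rows}
    return [element for element in batch if element[0] in valid]


def demo() -> Tuple[List[Element], List[Element]]:
    """Filter the same batch before and after the table changes."""
    conn = sqlite3.connect(":memory:")  # stand-in for a Redshift connection
    conn.execute("CREATE TABLE lookup (id INTEGER PRIMARY KEY, valid INTEGER)")
    conn.executemany("INSERT INTO lookup VALUES (?, ?)", [(1, 1), (2, 0)])
    batch = [(1, "a"), (2, "b"), (3, "c")]
    before = filter_batch(conn, batch)
    # Simulate the table changing while the pipeline keeps running.
    conn.execute("UPDATE lookup SET valid = 1 WHERE id = 2")
    conn.execute("INSERT INTO lookup VALUES (3, 1)")
    after = filter_batch(conn, batch)
    conn.close()
    return before, after
```

Calling `demo()` filters the same three elements twice; the second call picks up the rows that were updated and inserted in between, which is the behaviour a one-time bounded read would miss. The trade-off is one database round trip per batch, so the batch size would need tuning against the load Redshift can take.
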

Thanks,
Cham

On Fri, Aug 3, 2018 at 10:26 AM Robin Qiu <[email protected]> wrote:

> Hello Jose,
>
> As far as I know, Beam currently doesn't support reading from a
> Redshift table that can be updated. For a bounded source, Beam will read
> it once and keep a snapshot of it at that time.
>
> Best,
> Robin
>
> On Fri, Aug 3, 2018 at 4:43 AM Jose Bermeo <[email protected]> wrote:
>
>> Hi Chamikara, I've been looking at the stable reads from Apache Beam,
>> and I'm not sure that is what I want. Even though I make the Redshift
>> input stable, it is a bounded PCollection. Eventually new items are
>> added to the Redshift table and those changes are not reflected in the
>> PCollection.
>>
>> On Mon, 30 Jul 2018 at 12:03, Jose Bermeo <[email protected]> wrote:
>>
>>> Hi JB.
>>>
>>> I'm not sure. I could create two PCollections; the question is how do
>>> I make the PCollection from Redshift reflect the changes in the table?
>>> To rephrase my initial question: each element in my PCollection has a
>>> foreing_key_id, and I need to check whether the row associated with
>>> that foreing_key_id in Redshift is valid. The issue is that my
>>> PCollection is unbounded (new elements with different foreing_key_id
>>> values can show up) and the Redshift table is also changing.
>>>
>>> Regards
>>>
>>> On Mon, 30 Jul 2018 at 08:16, Jean-Baptiste Onofré <[email protected]>
>>> wrote:
>>>
>>>> Hi Jose,
>>>>
>>>> so basically, you create two PCollections with the same keys and then
>>>> you join/filter/flatten ?
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 30/07/2018 15:09, Jose Bermeo wrote:
>>>> > Hi, question guys.
>>>> >
>>>> > I have to filter an unbounded collection based on data from a
>>>> > Redshift DB. I cannot use a side input as the Redshift data could
>>>> > change. One way to do it would be to group common elements, make a
>>>> > query to filter each group, and finally flatten the pipe again. Do
>>>> > you know if this is the best way to do it? And what would be the
>>>> > way to run the query against Redshift?
>>>> >
>>>> > Thanks.
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> [email protected]
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
