Ah, so in my approach you would potentially end up with a "report" (final step) which happens before the writes complete and is based on the input contents rather than what was actually written.
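A minimal plain-Java sketch of that race. It uses no Beam at all: `InputBasedReport`, `write` and `report` are hypothetical names, two `CompletableFuture` tasks stand in for two unordered pipeline branches, and blank rows are an assumed example of rows a sink skips. The report branch only sees the input, so it counts 5 rows while only 3 are actually written, and nothing orders it after the write.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of two unordered branches fed by the same input.
class InputBasedReport {
    // Stand-in for a Write that returns nothing (like PDone). As an assumed
    // example of a partial write, it silently skips blank rows.
    static void write(List<String> rows, Queue<String> sink) {
        for (String row : rows) {
            if (!row.isBlank()) {
                sink.add(row);
            }
        }
    }

    // The report branch only sees the input, never the outcome of the write.
    static String report(List<String> rows) {
        return "rows=" + rows.size();
    }

    static String run(List<String> rows, Queue<String> sink) {
        // Nothing orders these two tasks: the report may complete first.
        CompletableFuture<Void> writing = CompletableFuture.runAsync(() -> write(rows, sink));
        CompletableFuture<String> reporting = CompletableFuture.supplyAsync(() -> report(rows));
        String result = reporting.join();
        writing.join(); // waited on here only so the demo can compare afterwards
        return result;
    }

    public static void main(String[] args) {
        Queue<String> sink = new ConcurrentLinkedQueue<>();
        String summary = run(List.of("a", "b", "", "c", " "), sink);
        System.out.println(summary + ", actually written=" + sink.size());
        // prints "rows=5, actually written=3"
    }
}
```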
Thanks for that. Getting some sort of contextual output from Write results sounds like a good idea, so that we can make dependent chains, even if it's for something as simple as pinging a monitoring tool like deadmanssnitch to ensure a regular job is completing.

On 12 June 2017 at 04:42, Eugene Kirpichov wrote:

> This would write the data, and in parallel, also apply your combiner to
> the same data and apply the other thing. The Combine transform does not have
> any "sequencing" effects - it is a basic aggregation transform; it's under
> the hood of Count, Sum, Mean, and other aggregation transforms; it combines
> a collection of values into a single value.
>
> The only sequencing mechanism in pipelines is data dependency (i.e. when
> an output of one transform is an input of another). Since Write has no
> outputs, it is currently impossible to sequence it against anything.
>
> It'd probably make sense to modify the Write transform to return some
> PValue, rather than PDone.
>
> On Sun, Jun 11, 2017 at 8:36 PM Gwilym Evans wrote:
>
>> Would the following work though? I could be misunderstanding the
>> situation:
>>
>> transform = p.apply(some transform)
>> transform.apply(write)
>> transform.apply(combine).apply(something on combined result)
>> p.run()
>>
>> Cheers,
>> Gwilym
>>
>>
>> On 12 June 2017 at 02:36, Lukasz Cwik wrote:
>>
>>> Unfortunately you can't Combine Writes since they return PDone (a
>>> terminal node) during pipeline construction.
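The data-dependency point can be sketched the same way. This is a plain-Java model with hypothetical names throughout, not Beam API. Here `write` returns a `WriteSummary`, which plays the role of the PValue that Write could return instead of PDone. The report is derived from that summary with `thenApply`, so it cannot start before the write finishes, and it describes what was written rather than what was read.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical model: the write step produces an output that the report consumes.
class DependentReport {
    // Illustrative write result; plays the role of a non-PDone output.
    static final class WriteSummary {
        final int written;
        final int ignored;

        WriteSummary(int written, int ignored) {
            this.written = written;
            this.ignored = ignored;
        }
    }

    // Stand-in for a Write that returns a result; blank rows are assumed
    // to be skipped by the sink.
    static WriteSummary write(List<String> rows) {
        int written = 0;
        for (String row : rows) {
            if (!row.isBlank()) {
                written++; // a real sink would persist the row here
            }
        }
        return new WriteSummary(written, rows.size() - written);
    }

    static CompletableFuture<String> pipeline(List<String> rows) {
        // Data dependency: thenApply runs only once write(...) has returned.
        return CompletableFuture.supplyAsync(() -> write(rows))
                .thenApply(s -> "written=" + s.written + ", ignored=" + s.ignored);
    }

    public static void main(String[] args) {
        System.out.println(pipeline(List.of("a", "b", "", "c", " ")).join());
        // prints "written=3, ignored=2"
    }
}
```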
>>>
>>> On Sun, Jun 11, 2017 at 3:23 PM, Gwilym Evans wrote:
>>>
>>>> I'm not 100% sure as I haven't tried it, but Combining comes to mind
>>>> as a possible way of doing this, assuming your data is finite:
>>>>
>>>> https://beam.apache.org/documentation/programming-guide/#transforms-combine
>>>>
>>>> You could take the PCollection result of step 2 and simultaneously apply
>>>> the Write and the Combine, using the singular result of the Combine to
>>>> trigger the remaining steps.
>>>>
>>>> Hope that helps, I'm still learning.
>>>>
>>>> -Gwilym
>>>>
>>>>
>>>> On 11 June 2017 at 16:50, Morand, Sebastien wrote:
>>>>
>>>>> No, I can't: the pipeline is created within a cron job, which is limited
>>>>> to 10 minutes.
>>>>>
>>>>> *Sébastien MORAND*
>>>>> Team Lead Solution Architect
>>>>> Technology & Operations / Digital Factory
>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>> Office 0144C (West)
>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>> www.veolia.com
>>>>>
>>>>> On 11 June 2017 at 18:21, Eugene Kirpichov wrote:
>>>>>
>>>>>> Hmm, can you simply do this in your main program after the pipeline
>>>>>> finishes?
>>>>>>
>>>>>> p.run().waitUntilFinish();
>>>>>> ... Send report ...
>>>>>>
>>>>>> On Sun, Jun 11, 2017, 1:50 AM Morand, Sebastien wrote:
>>>>>>
>>>>>>> Yes, this use case can be handled by using parallel operations.
>>>>>>>
>>>>>>> I have a second one: I would like to send a report at the end of the
>>>>>>> pipeline, when the last line has been written into BigQuery, with the
>>>>>>> number of lines processed, the number of lines ignored (from another
>>>>>>> part of the pipeline, using a graph as you described), the number of
>>>>>>> files at the beginning, and so on.
>>>>>>>
>>>>>>> This report could be:
>>>>>>>
>>>>>>> 1. Publish a Pub/Sub message
>>>>>>> 2. Send an email
>>>>>>> 3. Call a URL with parameters
>>>>>>>
>>>>>>> Is this possible?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> On 11 June 2017 at 04:14, Eugene Kirpichov wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>> It sounds like you want to write data to BigQuery and then load the
>>>>>>>> same data back from BigQuery? Why? I'm particularly confused by your
>>>>>>>> comment "nothing left in the PCollection" - writing a collection to
>>>>>>>> BigQuery doesn't remove data from the collection, a PCollection is
>>>>>>>> just a logical description of a dataset, not a mutable container.
>>>>>>>> Transforms are like mathematical functions - they don't change their
>>>>>>>> inputs, they only compute their outputs.
>>>>>>>>
>>>>>>>> Perhaps you're assuming that Beam pipelines can only be a strict
>>>>>>>> linear sequence of transforms?
>>>>>>>> That is not the case - pipelines are an arbitrary graph: you can use
>>>>>>>> a collection multiple times, i.e. apply multiple transforms to it.
>>>>>>>> E.g. you can both write the collection to BigQuery (step 3) and apply
>>>>>>>> some other transform to the same collection (step 5).
>>>>>>>>
>>>>>>>> Assuming you use Java:
>>>>>>>> PCollection<Foo> foos = p.apply(TextIO.read().from(...)).apply(...some transform...);
>>>>>>>> foos.apply(BigQueryIO.write().to(...));
>>>>>>>> PCollection<Bar> bars = foos.apply(...some other transform...);
>>>>>>>> bars.apply(BigQueryIO.write().to(...));
>>>>>>>>
>>>>>>>> Let me know if this helps.
>>>>>>>>
>>>>>>>> On Sat, Jun 10, 2017 at 3:42 PM Morand, Sebastien wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Is there any way to add a step after a Write? Write returns a PDone,
>>>>>>>>> so I can't do anything after it, but I would actually like to do
>>>>>>>>> something.
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>>
>>>>>>>>> 1. Load data from GCS
>>>>>>>>> 2. Some transform
>>>>>>>>> 3. Write data into BigQuery
>>>>>>>>> => Nothing left in the PCollection, but when 3 is over =>
>>>>>>>>> 4. Load data from BigQuery
>>>>>>>>> 5. Some other transform
>>>>>>>>> 6. Write data into BigQuery
>>>>>>>>>
>>>>>>>>> Any way to do that?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------
>>>>>>>>> This e-mail transmission (message and any attached files) may
>>>>>>>>> contain information that is proprietary, privileged and/or
>>>>>>>>> confidential to Veolia Environnement and/or its affiliates and is
>>>>>>>>> intended exclusively for the person(s) to whom it is addressed. If
>>>>>>>>> you are not the intended recipient, please notify the sender by
>>>>>>>>> return e-mail and delete all copies of this e-mail, including all
>>>>>>>>> attachments. Unless expressly authorized, any use, disclosure,
>>>>>>>>> publication, retransmission or dissemination of this e-mail and/or
>>>>>>>>> of its attachments is strictly prohibited.
>>>>>>>>> ------------------------------------------------------------
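Eugene's first suggestion in this thread was to block on `p.run().waitUntilFinish()` and then send the report from the main program. That can be modeled in plain Java as shown below. The counters are a hypothetical stand-in for job-side counters read back after the run; in Beam, metrics counters would presumably fill that role. `CompletableFuture.join()` plays the part of `waitUntilFinish()`. All class and method names are illustrative. The launching process stays alive for the whole run, which is exactly what a cron job limited to 10 minutes cannot afford.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of "run, wait until finished, then report" in the driver.
class ReportAfterRun {
    // Illustrative counters updated while the job runs.
    static final class Counters {
        final AtomicLong processed = new AtomicLong();
        final AtomicLong ignored = new AtomicLong();
    }

    // Stand-in for the pipeline body: blank lines are counted as ignored.
    static void job(List<String> lines, Counters counters) {
        for (String line : lines) {
            if (line.isBlank()) {
                counters.ignored.incrementAndGet();
            } else {
                counters.processed.incrementAndGet(); // a real job would write here
            }
        }
    }

    static String runAndReport(List<String> lines) {
        Counters counters = new Counters();
        // Start the job and block until it is done, like p.run().waitUntilFinish().
        CompletableFuture.runAsync(() -> job(lines, counters)).join();
        // Only now is the report built, from the finished job's counters.
        return "processed=" + counters.processed.get() + ", ignored=" + counters.ignored.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndReport(List.of("a", "", "b", "c")));
        // prints "processed=3, ignored=1"
    }
}
```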
