Re: Action in the pipeline after Write

2017-06-11 Gwilym Evans
Ah, so with my approach you could end up with a "report" (the final step) that happens before the writes complete and is based on the input contents rather than on what was actually written. Thanks for that. Getting some sort of contextual output from Write results sounds like a good idea, so

Re: Action in the pipeline after Write

2017-06-11 Eugene Kirpichov
This would write the data and, in parallel, apply your combiner to the same data and then apply the other thing. The Combine transform has no "sequencing" effect: it is a basic aggregation transform, used under the hood by Count, Sum, Mean and other aggregation transforms; it combines a c
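To make the point about Combine having no sequencing effect concrete, here is a small plain-Java analogy. It is not Beam code. One input list feeds two branches: a hypothetical "write" branch that drops empty rows, and an aggregation branch that counts rows. Both branches run concurrently from the same input, so the count reflects the input rather than what was written. The names ForkedBranches, writeBranch and combineBranch are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;

// Plain-Java analogy (not Beam API): one input collection feeds two
// independent branches, a "write" and an aggregation.
final class ForkedBranches {

    // Hypothetical sink that only accepts rows passing a check, standing in
    // for rows a real write might drop or reject. Returns rows written.
    static int writeBranch(List<String> rows, Predicate<String> accepted,
                           ConcurrentLinkedQueue<String> sink) {
        int written = 0;
        for (String row : rows) {
            if (accepted.test(row)) {
                sink.add(row);
                written++;
            }
        }
        return written;
    }

    // Aggregation branch: counts the input it was given. It has no view of
    // what the write branch actually persisted.
    static long combineBranch(List<String> rows) {
        return rows.stream().count();
    }

    // Starts both branches from the same input; neither branch waits for
    // the other. The caller collects both results at the end.
    static long[] run(List<String> rows, Predicate<String> accepted) {
        ConcurrentLinkedQueue<String> sink = new ConcurrentLinkedQueue<>();
        CompletableFuture<Integer> write =
            CompletableFuture.supplyAsync(() -> writeBranch(rows, accepted, sink));
        CompletableFuture<Long> combine =
            CompletableFuture.supplyAsync(() -> combineBranch(rows));
        return new long[] {write.join(), combine.join()};
    }

    public static void main(String[] args) {
        long[] result = run(List.of("a", "", "b", "c"), row -> !row.isEmpty());
        // The count covers 4 input rows, while only 3 were written.
        System.out.println("written=" + result[0] + " counted=" + result[1]);
        // prints "written=3 counted=4"
    }
}
```

This mirrors the concern raised earlier in the thread: a report built from a parallel aggregation describes the input, not the write result.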

Re: Action in the pipeline after Write

2017-06-11 Gwilym Evans
Would the following work, though? I could be misunderstanding the situation:

    transform = p.apply(some transform)
    transform.apply(write)
    transform.apply(combine).apply(something on combined result)
    p.run()

Cheers, Gwilym

Re: Action in the pipeline after Write

2017-06-11 Lukasz Cwik
Unfortunately, you can't apply a Combine after a Write, since Write returns PDone (a terminal node) at pipeline construction time.

Re: Action in the pipeline after Write

2017-06-11 Gwilym Evans
I'm not 100% sure, as I haven't tried it, but Combine comes to mind as a possible way of doing this, assuming your data is finite: https://beam.apache.org/documentation/programming-guide/#transforms-combine You could take the PCollection resulting from step 2 and simultaneously apply the Write and the Com

Re: Action in the pipeline after Write

2017-06-11 Morand, Sebastien
No, I can't: the pipeline is created within a cron job, which is limited to 10 minutes.
Sébastien MORAND, Team Lead Solution Architect, Technology & Operations / Digital Factory, Veolia - Group Information Systems & Technology (IS&T)

Re: Action in the pipeline after Write

2017-06-11 Eugene Kirpichov
Hmm, can you simply do this in your main program after the pipeline finishes?

    p.run().waitUntilFinish();
    ... send report ...
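Below is a minimal plain-Java sketch of the ordering in Eugene's suggestion. It assumes a hypothetical job function in place of the real pipeline. CompletableFuture.supplyAsync stands in for p.run(), and join() stands in for waitUntilFinish(), so the report is only built after the job has completed. This is an analogy, not the Beam API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Plain-Java analogy of "p.run().waitUntilFinish(); ... send report ...".
final class RunThenReport {

    // Hypothetical job standing in for the pipeline: "writes" every row
    // and returns how many rows it wrote.
    static int job(List<String> rows) {
        return rows.size();
    }

    static String runAndReport(List<String> rows) {
        // Like p.run(): starts the job asynchronously and returns a handle.
        CompletableFuture<Integer> handle =
            CompletableFuture.supplyAsync(() -> job(rows));
        // Like waitUntilFinish(): blocks until the job has completed.
        int written = handle.join();
        // Only reached after the job is done, so the report reflects it.
        return "Report: " + written + " rows written";
    }

    public static void main(String[] args) {
        System.out.println(runAndReport(List.of("a", "b", "c")));
        // prints "Report: 3 rows written"
    }
}
```

The launching process blocks for the whole run, so this pattern only fits where that process is allowed to wait that long. Sébastien's cron job is limited to 10 minutes.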

Re: Action in the pipeline after Write

2017-06-11 Morand, Sebastien
Yes, this use case can be handled with parallel operations. I have a second one: I would like to send a report at the end of the pipeline, once the last line has been written to BigQuery: number of lines processed, number of lines ignored (from another part of the pipeline, using a graph as you described
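The two numbers in this report can be accumulated while elements are processed and read once processing is done. The plain-Java sketch below assumes a hypothetical rule that blank lines count as "ignored". Thread-safe AtomicLong counters are incremented from a parallel stream. In Beam, the counterpart would be Metrics counters, whose values can be queried from the pipeline result. The class below is only an analogy.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java analogy of per-element counters; in Beam these would be
// Metrics counters. The "blank line = ignored" rule is hypothetical.
final class LineCounters {
    private final AtomicLong processed = new AtomicLong();
    private final AtomicLong ignored = new AtomicLong();

    // Processes lines in parallel; AtomicLong keeps the counts correct
    // even when several threads increment them at the same time.
    void process(List<String> lines) {
        lines.parallelStream().forEach(line -> {
            if (line.trim().isEmpty()) {
                ignored.incrementAndGet();
            } else {
                processed.incrementAndGet();
            }
        });
    }

    // Builds the report text; meant to be called after process() returns.
    String report() {
        return "processed=" + processed.get() + " ignored=" + ignored.get();
    }

    public static void main(String[] args) {
        LineCounters counters = new LineCounters();
        counters.process(List.of("row 1", " ", "row 2", ""));
        System.out.println(counters.report()); // prints "processed=2 ignored=2"
    }
}
```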

Re: Action in the pipeline after Write

2017-06-10 Eugene Kirpichov
Hi! It sounds like you want to write data to BigQuery and then load the same data back from BigQuery? Why? I'm particularly confused by your comment "nothing left in the PCollection": writing a collection to BigQuery doesn't remove data from the collection; a PCollection is just a logical descript

Action in the pipeline after Write

2017-06-10 Morand, Sebastien
Hi,
Is there any way to add a step after a Write? Write returns a PDone, so I can't do anything after it, but I would actually like to do something. Example:
1. Load data from GCS
2. Some transform
3. Write data into BigQuery => nothing left in the PCollection, but when 3 is over =>