Ah, so in my approach you would potentially end up with a "report" (final
step) which happens before the writes complete and is based on the input
contents rather than what was actually written.
Thanks for that.
Getting some sort of contextual output from Write results sounds like a
good idea, so
This would write the data, and in parallel, also apply your combiner to the
same data and apply the other thing. Combine transform does not have any
"sequencing" effects - it is a basic aggregation transform; it's under the
hood of Count, Sum, Mean, and other aggregation transforms; it combines a
c
Would the following work though? I could be misunderstanding the situation:
transform = p.apply(some transform)
transform.apply(write)
transform.apply(combine).apply(something on combined result)
p.run()
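
For reference, here is a minimal Java sketch of that branching shape. It is an illustration under assumptions, not something taken from the thread: the class name BranchingSketch, the run helper, the local output directory, and the use of TextIO as a stand-in for the BigQuery sink are all hypothetical, and it assumes a recent Beam 2.x release with the direct runner on the classpath. Both branches read the same PCollection, but nothing sequences them, so the combined result may be produced before the write has finished.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BranchingSketch {

  // Builds and runs a two-branch pipeline. outDir is a hypothetical local
  // directory; TextIO stands in for the BigQuery write (assumption).
  static PipelineResult.State run(List<String> rows, String outDir) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Stand-in for "load data" plus "some transform".
    PCollection<String> transformed = p.apply(Create.of(rows));

    // Branch 1: the write. It returns PDone, so nothing can be chained after it.
    transformed.apply("WriteData", TextIO.write().to(outDir + "/data"));

    // Branch 2: combine the same elements, then act on the combined result.
    // This branch does not wait for branch 1.
    transformed
        .apply(Count.<String>globally())
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((Long n) -> "rows seen: " + n))
        .apply("WriteReport",
            TextIO.write().to(outDir + "/report").withSuffix(".txt").withoutSharding());

    return p.run().waitUntilFinish();
  }

  public static void main(String[] args) throws IOException {
    Path outDir = Files.createTempDirectory("branching-sketch");
    System.out.println(run(Arrays.asList("a", "b", "c"), outDir.toString()));
    System.out.println(Files.readAllLines(outDir.resolve("report.txt")));
  }
}
```

The count here reflects the elements that entered the write, not what the sink actually stored.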
Cheers,
Gwilym
On 12 June 2017 at 02:36, Lukasz Cwik wrote:
Unfortunately you can't Combine Writes since they return PDone (a terminal
node) during pipeline construction.
On Sun, Jun 11, 2017 at 3:23 PM, Gwilym Evans
wrote:
I'm not 100% sure as I haven't tried it, but, Combining comes to mind as a
possible way of doing this, assuming your data is finite
https://beam.apache.org/documentation/programming-guide/#transforms-combine
You could take the PCollection result of 2 and simultaneously apply the
Write and the Com
No, I can't: the pipeline is created within a cron job, which is limited to
10 minutes.
*Sébastien MORAND*
Hmm, can you simply do this in your main program after the pipeline finishes?
p.run().waitUntilFinish();
... Send report ...
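
A slightly fuller sketch of this suggestion, in case it helps. Everything named here is hypothetical (the class ReportAfterRunSketch, the CountRowsFn DoFn, the metric namespace "report" and counter "rowsProcessed"), and it assumes a recent Beam 2.x release where MetricQueryResults exposes getCounters() and MetricResult exposes getAttempted(), with the direct runner on the classpath. The idea is to count rows with a Beam metric inside the pipeline, block until the run finishes, and only then read the counter and build the report in the main program.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class ReportAfterRunSketch {

  // Pass-through DoFn that increments a counter for every row it sees.
  // Namespace and counter name are hypothetical.
  static class CountRowsFn extends DoFn<String, String> {
    private final Counter rows = Metrics.counter("report", "rowsProcessed");

    @ProcessElement
    public void processElement(ProcessContext c) {
      rows.inc();
      c.output(c.element());
    }
  }

  // Runs the pipeline, waits for it to finish, then reads the counter back.
  static long runAndCount(List<String> input) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply(Create.of(input)).apply(ParDo.of(new CountRowsFn()));
    // A real pipeline would apply its BigQuery write here (omitted in this sketch).

    PipelineResult result = p.run();
    result.waitUntilFinish(); // blocks until the whole pipeline, writes included, is done

    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named("report", "rowsProcessed"))
            .build());

    long total = 0;
    for (MetricResult<Long> counter : metrics.getCounters()) {
      total += counter.getAttempted();
    }
    return total;
  }

  public static void main(String[] args) {
    long rows = runAndCount(Arrays.asList("a", "b", "c"));
    // Stand-in for "... Send report ...".
    System.out.println("Report: rows processed = " + rows);
  }
}
```

A second counter in another DoFn could track ignored lines the same way. This only works when the launching program can stay alive until the pipeline finishes.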
On Sun, Jun 11, 2017, 1:50 AM Morand, Sebastien
wrote:
Yes, this use case can be handled with parallel operations.
I have a second one: I would like to send a report at the end of the
pipeline, once the last line has been written to BigQuery: number of lines
processed, number of lines ignored (from another part of the pipeline using
graph as you described
Hi!
It sounds like you want to write data to BigQuery and then load the same
data back from BigQuery? Why? I'm particularly confused by your comment
"nothing left in the PCollection" - writing a collection to BigQuery
doesn't remove data from the collection; a PCollection is just a logical
descript
Hi,
Is there any way to add a step after a Write? Write returns a PDone, so I
can't chain anything after it, but I would actually like to do something.
Example:
1. Load data from GCS
2. Some transform
3. Write data into BigQuery
=> Nothing left in the PCollection, but when 3 is over =>