Re: multiple PCollections

2017-09-14 Thread Chaim Turkel
I am using the BigQueryIO sink, so the example does not apply. The example deals with bad data on read; my problems occur on write. Several kinds of error can occur when writing to BigQuery, and if a write fails there is no way to catch the error, so the whole pipeline fails. chaim On Thu, Sep 14,

Re: [DISCUSSION] using NexMark for Beam

2017-09-14 Thread Reuven Lax
It's being worked on. Turns out there are some modifications still needed to the NexMark queries. Reuven On Thu, Sep 14, 2017 at 9:33 PM, Pei HE wrote: > Could any Googlers help to run NexMark on Dataflow streaming and share the > numbers with the community? > -- > Pei > > On

Re: [DISCUSSION] using NexMark for Beam

2017-09-14 Thread Pei HE
Could any Googlers help to run NexMark on Dataflow streaming and share the numbers with the community? -- Pei On Fri, Aug 25, 2017 at 11:28 PM, Lukasz Cwik wrote: > Etienne, cut some JIRAs for improvements like ValidatesRunner for the > Nexmark suite that you think are

Re: Understanding BigQueryIO.Read performance and options

2017-09-14 Thread Steve Niemitz
Cool, thanks for the shove in the right direction! I'll probably have some more time to spend on this early next week, hopefully I'll have a PR to submit after that. :) On Thu, Sep 14, 2017 at 4:37 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > On Thu, Sep 14, 2017 at 1:10 PM

Re: Understanding BigQueryIO.Read performance and options

2017-09-14 Thread Eugene Kirpichov
Oh, I see what you mean. Yeah, I agree that having BigQueryIO use TableRow as the native format was a suboptimal decision in retrospect, and I agree that it would be reasonable to provide the ability to go through Avro GenericRecord instead. I'm just not sure how to provide it in an API-compatible way

Re: TestStream and stateful processing

2017-09-14 Thread Kenneth Knowles
TL;DR: GroupingState should be just coding work to automate, while merging ValueState probably needs a design doc. You can automatically merge GroupingState, though it is merely implied - if all the user can do is add(datum) then the definition of merged state is necessarily equivalent to calling

Re: PBegin, PDone

2017-09-14 Thread Eugene Kirpichov
For doing something before starting the pipeline, can you do it in the main program? The only disadvantage I can see is that it wouldn't be amenable to using templates (ValueProvider's) - is that the blocker? For doing something after a transform finishes processing a window of a PCollection - we

Re: Unable to find registrar for s3n when restoring flink job from savepoint

2017-09-14 Thread Aljoscha Krettek
I responded on the issue. > On 12. Sep 2017, at 22:25, Lukasz Cwik wrote: > > Filed https://issues.apache.org/jira/browse/BEAM-2948 > > On Tue, Sep 12, 2017 at 2:10 AM, Pawel Bartoszek > wrote: > >> Hi, >> >> I am running a flink v1.2.1

multiple PCollections

2017-09-14 Thread Chaim Turkel
Hi, In one pipeline I have multiple PCollections. If one of them hits an error, the whole pipeline is canceled. Is there a way to catch the error and log it, and let all the other PCollections continue? chaim
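
The pattern usually suggested for this kind of question is to catch exceptions per element and route the failures to a separate "dead letter" output instead of letting them propagate. Below is a minimal sketch of that idea in plain Python, without any Beam dependency; the name process_with_dead_letter and its signature are hypothetical, chosen only for illustration. It covers errors raised by per-element user code only. As the reply in this thread notes, failures inside a sink such as BigQueryIO are a different matter and are not addressed by this sketch.

```python
import logging
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_with_dead_letter(
    items: Iterable[T], fn: Callable[[T], R]
) -> Tuple[List[R], List[Tuple[T, str]]]:
    """Apply fn to every item; collect failures instead of raising.

    Hypothetical helper (not a Beam API): returns the successful results
    and a list of (original item, error description) pairs.
    """
    good: List[R] = []
    failed: List[Tuple[T, str]] = []
    for item in items:
        try:
            good.append(fn(item))
        except Exception as exc:  # deliberately broad: one bad element must not stop the rest
            logging.warning("element %r failed: %r", item, exc)
            failed.append((item, repr(exc)))
    return good, failed


if __name__ == "__main__":
    results, errors = process_with_dead_letter(["1", "x", "3"], int)
    print(results)
    print([item for item, _ in errors])
```

In a real pipeline the two lists would correspond to two separate outputs of the same processing step, so the failed elements can be logged or stored while the rest continue.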

Re: PBegin, PDone

2017-09-14 Thread Chaim Turkel
My use case is that I have generic code that transfers, for example, tables from MongoDB to BigQuery. I iterate over all tables in MongoDB and create a PCollection for each. But there are things that need to be checked before running, so that it runs only if validated. I tried the visitor but there is no way to

Re: PBegin, PDone

2017-09-14 Thread Jean-Baptiste Onofré
Hi, I don't think it makes sense on a transform (as it expects a PCollection). However, why not introduce a specific hook for that? I think you can work around it using a Pipeline Visitor, but it would be at the runner level. Regards JB On 09/14/2017 08:21 AM, Chaim Turkel wrote: Hi, I have a

PBegin, PDone

2017-09-14 Thread Chaim Turkel
Hi, I have a few scenarios where I would like to run code before the PBegin and after the PDone. This is usually for monitoring purposes. It would be nice to be able to transform from PBegin to PBegin, and PDone to PDone, so that code can be run before and after and not in the driver