Re: PBegin, PDone

2017-09-13 Thread Jean-Baptiste Onofré
Hi, I don't think it makes sense on a transform (as it expects a PCollection). However, why not introducing a specific hook for that. I think you can workaround using a Pipeline Visitor, but it would be runner level. Regards JB On 09/14/2017 08:21 AM, Chaim Turkel wrote: Hi, I have a fe

PBegin, PDone

2017-09-13 Thread Chaim Turkel
Hi, I have a few scenarios where I would like to have code that is before the PBegin and after the PDone. This is usually for monitoring purposes. It would be nice to be able to transform from PBegin to PBegin, and PDone to PDone, so that code can be run before and after and not in the driver pro

Re: TestStream and stateful processing

2017-09-13 Thread Thomas Weise
Since how the state is mutated depends on the DoFn implementation, wouldn't this require the DoFn author to supply the merge logic? Something like @Merge merge(@StateId("myCount") ValueState merged, @StateId("myCount") ValueState[] src) Without the support in Beam, what workaround could be suita

Re: New contributor

2017-09-13 Thread Griselda Cuevas
Welcome! On Sep 13, 2017 4:43 PM, "Lukasz Cwik" wrote: > Welcome. > > On Wed, Sep 13, 2017 at 4:30 PM, Reuven Lax > wrote: > > > Welcome Daniel! > > > > Reuven > > > > On Wed, Sep 13, 2017 at 4:27 PM, Jesse Anderson < > je...@bigdatainstitute.io > > > > > wrote: > > > > > Welcome! > > > > > > O

Re: New contributor

2017-09-13 Thread Lukasz Cwik
Welcome. On Wed, Sep 13, 2017 at 4:30 PM, Reuven Lax wrote: > Welcome Daniel! > > Reuven > > On Wed, Sep 13, 2017 at 4:27 PM, Jesse Anderson > > wrote: > > > Welcome! > > > > On Wed, Sep 13, 2017 at 2:24 PM Daniel Oliveira > > wrote: > > > > > Hi everyone, > > > > > > My name's Daniel Oliveira

Re: New contributor

2017-09-13 Thread Reuven Lax
Welcome Daniel! Reuven On Wed, Sep 13, 2017 at 4:27 PM, Jesse Anderson wrote: > Welcome! > > On Wed, Sep 13, 2017 at 2:24 PM Daniel Oliveira > wrote: > > > Hi everyone, > > > > My name's Daniel Oliveira. I work at Google and I'd like to start > > contributing to this project so I wanted to int

Re: New contributor

2017-09-13 Thread Jesse Anderson
Welcome! On Wed, Sep 13, 2017 at 2:24 PM Daniel Oliveira wrote: > Hi everyone, > > My name's Daniel Oliveira. I work at Google and I'd like to start > contributing to this project so I wanted to introduce myself. > > I've already read through the contribution guide and I'm excited to start > mak

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Kenneth Knowles
ValueProvider is global, PCollectionView is per-window, state is per-step/key/window, etc. So my unhappiness increases as we move through that list, adding more and more constraints on correct use, none of which are reflected in the API. Your description of "its context is an execution of the pipe

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
Thanks! I think most of the issues you point out [validation, scheduling, prefetching] are in the area of wiring. I reiterate that they can be solved - both of the methods below will give the runner an answer to the low-level question "which DoFn will need which side inputs": 1) Providing withSid

Re: Report to the Board, September 2017 edition

2017-09-13 Thread Davor Bonaci
Thanks everyone for a super quick turnaround! (The report is now submitted.) On Wed, Sep 13, 2017 at 10:45 AM, Davor Bonaci wrote: > We are expected to submit a project report to the ASF Board of Directors > ahead of its next meeting. The report is due on Wednesday, 9/13. > > If interested, plea

New contributor

2017-09-13 Thread Daniel Oliveira
Hi everyone, My name's Daniel Oliveira. I work at Google and I'd like to start contributing to this project so I wanted to introduce myself. I've already read through the contribution guide and I'm excited to start making contributions soon! Thank you, Daniel Oliveira

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
On Wed, Sep 13, 2017 at 1:56 PM, Eugene Kirpichov wrote: > On Wed, Sep 13, 2017 at 1:44 PM Robert Bradshaw > wrote: > >> On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov >> wrote: >> > Hi Robert, >> > >> > Given the anticipated usage of this proposal in Java, I'm not sure the >> > Python approa

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Kenneth Knowles
I made some comments on https://issues.apache.org/jira/browse/BEAM-2950 which was filed to do a similar thing for State. Luke correctly pointed out that many of the points apply here as well. I said most of the same above, but I thought I'd pull them out again from that ticket and rephrase to apply

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
On Wed, Sep 13, 2017 at 1:44 PM Robert Bradshaw wrote: > On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov > wrote: > > Hi Robert, > > > > Given the anticipated usage of this proposal in Java, I'm not sure the > > Python approach you quoted is the right one. > > Perhaps not, but does that mean i

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov wrote: > Hi Robert, > > Given the anticipated usage of this proposal in Java, I'm not sure the > Python approach you quoted is the right one. Perhaps not, but does that mean it would be a Java-ism only or would we implement it in Python despite it

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
Hi Robert, Given the anticipated usage of this proposal in Java, I'm not sure the Python approach you quoted is the right one. The main reason: I see how it works with Map/FlatMap, but what about cases like FileIO.write(), parameterized by several lambdas (element -> destination, destination -> f

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Chaim Turkel
just found, that with bigquery you cannot stream to partitions older than 30 days (so i can't use it anyway to load old data) :( On Wed, Sep 13, 2017 at 7:08 PM, Lukasz Cwik wrote: > Support was added to expose how users want to load their data with > https://github.com/apache/beam/commit/075d4d4

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Chaim Turkel
.apply("insert data table - " + table.getTableName(), BigQueryIO.writeTableRows() .to(TableRefPartition.perDay(options.getBQProject(), options.getDatasetId(), table.getBqTableName())) .withSchemaFromView(tableSchemas) .withCreateDisposition(BigQueryIO.Write.

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
+1 to reducing the amount of boilerplate for dealing with side inputs. I prefer the "NewDoFn" style of side inputs for consistency. The primary drawback seems to be lambda's incompatibility with annotations. This is solved in Python by letting all the first annotated argument of the process method

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Reuven Lax
Can you show us some of the code you are using? How are you loading into separate partitions? Reuven On Wed, Sep 13, 2017 at 10:13 AM, Chaim Turkel wrote: > I am loading into separate partitions of the same table. > I want to see it streaming will be faster. > > Is there a repository where i c

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Reuven Lax
On Wed, Sep 13, 2017 at 10:05 AM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hi, > > I agree with these concerns to an extent, however I think the advantage of > transparently letting any user code access side inputs, especially > including lambdas, is so great that we should find a

Report to the Board, September 2017 edition

2017-09-13 Thread Davor Bonaci
We are expected to submit a project report to the ASF Board of Directors ahead of its next meeting. The report is due on Wednesday, 9/13. If interested, please take a look at the draft [1], and comment or contribute content, as appropriate. I'll submit the report sometime in the next 24 hours. Th

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Chaim Turkel
just went over the changes for the streaming method. That looks great. How about adding the option to continue the apply after success with statistics or something like in the failure On Wed, Sep 13, 2017 at 7:19 PM, Reuven Lax wrote: > Ah, so you are loading each window into a separate BigQuery

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Lukasz Cwik
Initially I was skeptical of the change but after seeing how many "context" objects were being created to solve issues of passing around references really showed that our API approach was problematic. Using this pattern allows us to get rid of things like CombineWithContext and the *new* SideInputA

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Chaim Turkel
I am loading into separate partitions of the same table. I want to see it streaming will be faster. Is there a repository where i can use the snapshot version? On Wed, Sep 13, 2017 at 7:19 PM, Reuven Lax wrote: > Ah, so you are loading each window into a separate BigQuery table? That > might b

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Eugene Kirpichov
Hi, I agree with these concerns to an extent, however I think the advantage of transparently letting any user code access side inputs, especially including lambdas, is so great that we should find a way to address these concerns within the constraints of the pattern I'm proposing. See more below.

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Ben Chambers
One possible issue with this is that updating a thread local is likely to be much more expensive than passing an additional argument. Also, not all code called from within the DoFn will necessarily be in the same thread (eg., sometimes we create a pool of threads for doing work). It may be *more* c

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Reuven Lax
Ah, so you are loading each window into a separate BigQuery table? That might be the reason things are slow. Remembert a batch job doesn't return until everything finishes, and if you are loading that many tables it's entirely possible that BigQuery will throttle you, causing the slowdown. A coupl

Re: BigQueryIO withSchemaFromView

2017-09-13 Thread Lukasz Cwik
Support was added to expose how users want to load their data with https://github.com/apache/beam/commit/075d4d45a9cd398f3b4023b6efd495cc58eb9bdd It is planned to be released in 2.2.0 On Tue, Sep 12, 2017 at 11:48 PM, Chaim Turkel wrote: > from what i found it I have the windowing with bigquery

Re: Migration From 1.9.x to 2.1.0

2017-09-13 Thread Eugene Kirpichov
The full set of changes is described in https://cloud.google.com/dataflow/release-notes/release-notes-java-2 On Wed, Sep 13, 2017 at 8:53 AM Thomas Groh wrote: > for (1) and (4), the DoFn methods have been moved to be reflection based. > Instead of using `@Override` in your DoFns, you should ann

Re: Migration From 1.9.x to 2.1.0

2017-09-13 Thread Thomas Groh
for (1) and (4), the DoFn methods have been moved to be reflection based. Instead of using `@Override` in your DoFns, you should annotate those methods with `@StartBundle`, `@ProcessElement`, and `@FinishBundle` instead. For (2), Aggregators have been removed. Our suggested replacement is the use

Migration From 1.9.x to 2.1.0

2017-09-13 Thread Arunkumar Santhanagopalan
Hi, We are trying to migrate from Dataflow 1.9.x to Dataflow 2.1.0 I need help with the following changes 1. class Join extends DoFn { @Override public void startBundle(Context c) throws Exception { super.startBundle(c); createParser(); } Method "startBundle" does not override method s

Build failed in Jenkins: beam_Release_NightlySnapshot #533

2017-09-13 Thread Apache Jenkins Server
See Changes: [chamikara] Add first few BigtableWriteException to suppressed list when rethrowing [mingmxu] [BEAM-2804] support TIMESTAMP in sort [chamikara] Fix GqlQueryTranslateFn to pass localhost