Re: I want to allow a user-specified QuerySplitter for DatastoreIO

2018-05-04 Thread Chamikara Jayalath
Hi Frank, On Thu, May 3, 2018 at 1:07 PM Lukasz Cwik wrote: > I also like the idea of doing the splitting when the pipeline is running > and not during pipeline construction. This works a lot better with things > like templates. > > Do you know what Maven package contains

Re: Pubsub to Beam SQL

2018-05-04 Thread Raghu Angadi
On Thu, May 3, 2018 at 12:47 PM Anton Kedin wrote: > I think it makes sense for the case when timestamp is provided in the > payload (including pubsub message attributes). We can mark the field as an > event timestamp. But if the timestamp is internally defined by the source >

Re: Graal instead of docker?

2018-05-04 Thread Lukasz Cwik
I did take a look at Graal a while back when thinking about how execution environments could be defined; my concerns were related to it not supporting all of the features of a language. For example, it's typical for Python to load and call native libraries, and Graal can only execute C/C++ code that

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-04 Thread Jean-Baptiste Onofré
Hi, I have a couple of PRs I would like to include. I would also like to take the weekend for new builds and tests. If it works for everyone, I propose to start the release process Tuesday. Thoughts? Regards JB On May 4, 2018 at 17:49, Scott Wegner wrote: >Hi JB,

Re: Graal instead of docker?

2018-05-04 Thread Romain Manni-Bucau
On May 4, 2018 17:55, "Lukasz Cwik" wrote: I did take a look at Graal a while back when thinking about how execution environments could be defined; my concerns were related to it not supporting all of the features of a language. For example, it's typical for Python to load and

Complex Types Support for Beam SQL DDL

2018-05-04 Thread Anton Kedin
Hi, I am working on adding support for non-primitive types in Beam SQL DDL. *Goal* Allow users to define tables with Rows, Arrays, Maps as field types in DDL. This enables defining schemas for complex sources, e.g. describing JSON sources or other sources which support complex field types (BQ,

Reading all elements of a PCollection after running beam go pipeline

2018-05-04 Thread 8 Gianfortoni
Hi dev team, I would like to be able to read the entire results of a PCollection serially after running beam. In other frameworks this is fairly straightforward, but I don't understand how one might do this with the Beam Go SDK. I guess I can read in a file that I write, but I want to be able to

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-04 Thread Scott Wegner
Hi JB, any idea when you will begin the release? Boyuan has a couple Python PRs [1] [2] that are ready to merge, but we'd like to wait until after the release branch is cut in case there is some performance regression. [1] https://github.com/apache/beam/pull/4741 [2]

Re: Graal instead of docker?

2018-05-04 Thread Henning Rohde
Romain, Docker, unlike selinux, solves a great number of tangible problems for us with IMO a relatively small tax. It does not have to be the only way. Some of the concerns you bring up along with possibilities were also discussed here: https://s.apache.org/beam-fn-api-container-contract. I

Re: Reading all elements of a PCollection after running beam go pipeline

2018-05-04 Thread Henning Rohde
Great! On Fri, May 4, 2018 at 4:37 PM 8 Gianfortoni <8...@tokentransit.com> wrote: > Thanks for the workaround! That should work for me. > > On Fri, May 4, 2018, 1:51 PM Henning Rohde wrote: > >> Hey there, >> >> Until side input is fully supported, you can use GBK with a

[Proposal] Finalizing Fn API : Defining and adding SDK Metrics

2018-05-04 Thread Alex Amato
Thank you everyone for your support with the Fn API : Defining and adding SDK Metrics proposal :). I only got minor feedback in the last iteration I requested. I am finalizing this now and will begin working on PRs in the coming week. Of course

Re: Reading all elements of a PCollection after running beam go pipeline

2018-05-04 Thread 8 Gianfortoni
Thanks for the workaround! That should work for me. On Fri, May 4, 2018, 1:51 PM Henning Rohde wrote: > Hey there, > > Until side input is fully supported, you can use GBK with a fixed key to > get all elements in a single bundle (assuming global windowing). That is > how

Re: Pubsub to Beam SQL

2018-05-04 Thread Andrew Pilloud
I don't think we should jump to adding an extension, but TBLPROPERTIES is already a DDL extension and it isn't user-friendly. We should strive for a world where no one needs to use it. SQL needs the timestamp to be exposed as a column; we can't hide it without changing the definition of GROUP BY. I

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-04 Thread Andrew Pilloud
Spanner is also broken, and post commits are failing. I've added the issue as a blocker. https://issues.apache.org/jira/browse/BEAM-4229 Andrew On Fri, May 4, 2018 at 1:24 PM Charles Chen wrote: > I have added https://issues.apache.org/jira/browse/BEAM-4236 as a blocker. > >

Re: Pubsub to Beam SQL

2018-05-04 Thread Anton Kedin
There are a few aspects of the event timestamp definition in SQL that we are talking about here: - configuring the source. E.g. for PubsubIO you can choose whether to extract event timestamp from the message attributes or the message publish time: - this is source-specific and cannot

Re: Graal instead of docker?

2018-05-04 Thread Henning Rohde
I disagree with the characterization of docker and the implications made towards portability. Graal looks like a neat project (and I never thought I would live to see the phrase "Practical Partial Evaluation" ..), but it doesn't address the needs of portability. In addition to Luke's examples, Go

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-04 Thread Ahmet Altay
Hi JB, We found an issue related to using side inputs in streaming mode with the Python SDK. Charles is currently trying to find the root cause. Would you be able to give him some additional time to investigate the issue? Charles, do you have a JIRA issue on the blocker list? Thank you everyone

Re: [SQL] Reconciling Beam SQL Environments with Calcite Schema

2018-05-04 Thread Andrew Pilloud
Reviews are wrapping up; this will probably merge Monday if I don't hear from anyone else. One more TableProvider API change after review feedback: getTables now returns Map instead of Set. Andrew On Thu, May 3, 2018 at 10:41 AM Andrew Pilloud wrote: > Ok,

Re: Graal instead of docker?

2018-05-04 Thread Romain Manni-Bucau
On May 4, 2018 21:31, "Henning Rohde" wrote: I disagree with the characterization of docker and the implications made towards portability. Graal looks like a neat project (and I never thought I would live to see the phrase "Practical Partial Evaluation" ..), but it doesn't

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-04 Thread Charles Chen
I have added https://issues.apache.org/jira/browse/BEAM-4236 as a blocker. On Fri, May 4, 2018 at 1:19 PM Ahmet Altay wrote: > Hi JB, > > We found an issue related to using side inputs in streaming mode using > Python SDK. Charles is currently trying to find the root cause.

Re: Reading all elements of a PCollection after running beam go pipeline

2018-05-04 Thread Henning Rohde
Hey there, Until side input is fully supported, you can use GBK with a fixed key to get all elements in a single bundle (assuming global windowing). That is how textio.Write works internally to produce a single file currently:

Re: ValidatesRunner test cleanup

2018-05-04 Thread Etienne Chauchot
Scott, thanks for that! I only quickly looked at the ValidatesRunner tests that I wrote (you modified none) and the ones that impact my ongoing work (metrics). I think some tests in MetricsTest still need to be ValidatesRunner tests. See my comment in the review. Etienne > Note: if you don't

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #28

2018-05-04 Thread Apache Jenkins Server
See Changes: [echauchot] [BEAM-4138] Support runners that do not support committed metrics in [apilloud] [SQL] Add BeamEnumerableConverter [mairbek] Templatize host name in SpannerIO