Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Ankur Goenka
Congratulations and thank you for making Beam awesome! From: Chamikara Jayalath Date: Thu, May 2, 2019, 4:03 PM To: dev Congratulations! > > On Thu, May 2, 2019 at 10:28 AM Udi Meiri wrote: > >> Congrats everyone! >> >> On Thu, May 2, 2019 at 9:55 AM Ahmet Altay wrote: >> >>>

Better naming for runner specific options

2019-05-02 Thread Reza Rokni
Hi, Was reading this SO question: https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has And noticed that in https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions The option is

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-02 Thread Kenneth Knowles
Agree with all of your points about the drawbacks of ValueState. It is definitely a pro/con weighing sort of situation. Considering the number of users who are new to the orthogonality of event time and processing time, ValueState could certainly lead to confusion about why things are not in any

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Kenneth Knowles
Meta: All of Beam SQL is still "experimental" isn't it? There's very little chance that the structure of Beam SQL pipelines will be stable enough for e.g. pipeline update. So that is not worth worrying about at this stage. And this doesn't seem to affect APIs / compile time compatibility. As to

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Ahmet Altay
On Thu, May 2, 2019 at 2:18 PM Rui Wang wrote: > Brian's first proposal is challenging also partially because in BeamSQL > there is no good practice to deal with complex SQL plans. Ideally we need > enough rules and SQL plan node in Beam to construct easy-to-transform plans > for different

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Chamikara Jayalath
Congratulations! On Thu, May 2, 2019 at 10:28 AM Udi Meiri wrote: > Congrats everyone! > > On Thu, May 2, 2019 at 9:55 AM Ahmet Altay wrote: > >> Congratulations! >> >> On Thu, May 2, 2019 at 9:54 AM Yifan Zou wrote: >> >>> Congratulations! Well deserved! >>> >>> On Thu, May 2, 2019 at 9:37

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Rui Wang
Brian's first proposal is challenging also partially because in BeamSQL there is no good practice to deal with complex SQL plans. Ideally we need enough rules and SQL plan node in Beam to construct easy-to-transform plans for different cases. I had a similar situation before when I needed to

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Brian Hulette
Ahmet - I think it would only require observing each key's partition of the input independently, and the size of the state would only be proportional to the number of distinct elements, not the entire input. Note the pipeline would be a GBK with a key based on the GROUP BY, followed by a

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Lukasz Cwik
Can you also go into more detail why you think 1) is more challenging to implement? On Thu, May 2, 2019 at 11:58 AM Ahmet Altay wrote: > From my limited understanding, would not the stateful combinefn option > require observing the whole input before being able combine and the risk of > blowing

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Ahmet Altay
From my limited understanding, would not the stateful combinefn option require observing the whole input before being able combine and the risk of blowing memory is actually very high except for trivial inputs? On Thu, May 2, 2019 at 11:50 AM Brian Hulette wrote: > Hi everyone, > Currently

[DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Brian Hulette
Hi everyone, Currently BeamSQL does not support DISTINCT aggregations. These are queries like: > SELECT k, SUM(DISTINCT v) FROM t GROUP BY k > SELECT k, k2, COUNT(DISTINCT k2) FROM t GROUP BY k, k2 These are represented in Calcite's logical plan with a distinct flag on aggregation calls, but we
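To make the semantics of the first query above concrete, here is a minimal plain-Python sketch of what `SELECT k, SUM(DISTINCT v) FROM t GROUP BY k` computes. It is not how BeamSQL or Calcite plan or execute the query; the function name `sum_distinct_by_key` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def sum_distinct_by_key(rows):
    """Emulate SELECT k, SUM(DISTINCT v) FROM t GROUP BY k on (k, v) pairs.

    Hypothetical helper for illustration only, not a BeamSQL API.
    """
    # One set per key, so duplicate values within a key are counted once.
    distinct_values = defaultdict(set)
    for k, v in rows:
        distinct_values[k].add(v)
    return {k: sum(values) for k, values in distinct_values.items()}


# Made-up sample table t with columns (k, v).
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 5), ("b", 5)]
result = sum_distinct_by_key(rows)
print(result)
```

In this sketch the per-key set grows with the number of distinct values for that key rather than with the number of input rows.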

Re: kafka client interoperability

2019-05-02 Thread Lukasz Cwik
+dev On Thu, May 2, 2019 at 10:34 AM Moorhead,Richard < richard.moorhe...@cerner.com> wrote: > In Beam 2.9.0, this check was made: > > >

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Maximilian Michels
I am not sure what you are referring to here. What do you mean Kryo is simply slower ... Beam Kryo or Flink Kryo or? Flink uses Kryo as a fallback serializer when its own type serialization system can't analyze the type. I'm just guessing here that this could be slower. On 02.05.19 16:51,

Re: Artifact staging in cross-language pipelines

2019-05-02 Thread Maximilian Michels
BTW what are the next steps here ? Heejong or Max, will one of you be able to come up with a detailed proposal around this ? Thank you for all the additional comments and ideas. I will try to capture them in a document and share it here. Of course we can continue the discussion in the

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Ahmet Altay
On Thu, May 2, 2019 at 8:56 AM Kenneth Knowles wrote: > Pulling out the relevant pypi bit w.r.t. RCs: > > >> - Release candidates, nightly or snapshots need to be clearly tagged as >> pre-release on https://pypi.org/project/apache/#history >> - The latest version should not point to an artefact

Re: cancel job

2019-05-02 Thread Chaim Turkel
thanks for the reply, i am using airflow python code to run a java runner, so i do not have the actual pipeline handler, is there a way to get it? chaim On Thu, May 2, 2019 at 7:58 PM Lukasz Cwik wrote: > > +u...@beam.apache.org > > On Thu, May 2, 2019 at 9:51 AM Lukasz Cwik wrote: >> >> ...

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Ahmet Altay
On Thu, May 2, 2019 at 9:29 AM Robert Bradshaw wrote: > On Thu, May 2, 2019 at 6:03 PM Michael Luckey wrote: > > > > Yes, I understood this. But I'm personally more paranoid about releasing. > > > > So formally vote (and corresponding testing) was done on rc. If we > rebuild and resign,

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Udi Meiri
Congrats everyone! On Thu, May 2, 2019 at 9:55 AM Ahmet Altay wrote: > Congratulations! > > On Thu, May 2, 2019 at 9:54 AM Yifan Zou wrote: > >> Congratulations! Well deserved! >> >> On Thu, May 2, 2019 at 9:37 AM Rui Wang wrote: >> >>> Congratulations! >>> >>> >>> -Rui >>> >>> On Thu, May 2,

Re: cancel job

2019-05-02 Thread Lukasz Cwik
+u...@beam.apache.org On Thu, May 2, 2019 at 9:51 AM Lukasz Cwik wrote: > ... build pipeline ... > pipeline_result = p.run() > if job_taking_too_long: > pipeline_result.cancel() > > Python: >

Re: cancel job

2019-05-02 Thread Lukasz Cwik
... build pipeline ...
pipeline_result = p.run()
if job_taking_too_long:
    pipeline_result.cancel()

Python: https://github.com/apache/beam/blob/95d0ac5e5cb59fd0c6a8a4861a38a7087a6c46b5/sdks/python/apache_beam/runners/runner.py#L372 Java:
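The snippet above leaves `job_taking_too_long` undefined. One way it could be fleshed out is a polling loop with a deadline, sketched below. This is only a sketch under assumptions: `cancel_if_too_long`, `FakePipelineResult`, and the set of terminal state names are invented here so the example runs without Beam installed; the only things taken from the message are that the object returned by `p.run()` can be cancelled with `cancel()`, plus the assumption that it exposes its current status as `state`.

```python
import time

# Assumed names of terminal job states for this sketch.
TERMINAL_STATES = {"DONE", "FAILED", "CANCELLED"}


def cancel_if_too_long(result, timeout_secs, poll_secs=30.0):
    """Poll result.state and call result.cancel() once the deadline passes.

    `result` is anything with a `state` attribute and a `cancel()` method
    (hypothetical helper, not part of the Beam SDK).
    """
    deadline = time.monotonic() + timeout_secs
    while result.state not in TERMINAL_STATES:
        if time.monotonic() >= deadline:
            result.cancel()
            break
        time.sleep(poll_secs)
    return result.state


class FakePipelineResult:
    """Hypothetical stand-in for the object returned by p.run()."""

    def __init__(self, reads_until_done):
        self._reads_left = reads_until_done
        self._cancelled = False

    @property
    def state(self):
        # Reports RUNNING for a fixed number of reads, then DONE.
        if self._cancelled:
            return "CANCELLED"
        if self._reads_left <= 0:
            return "DONE"
        self._reads_left -= 1
        return "RUNNING"

    def cancel(self):
        self._cancelled = True


quick_state = cancel_if_too_long(FakePipelineResult(2), timeout_secs=60, poll_secs=0)
stuck_state = cancel_if_too_long(FakePipelineResult(10**9), timeout_secs=0, poll_secs=0)
print(quick_state, stuck_state)
```

For the 40-minute batch job from the original question, the call would look like `cancel_if_too_long(pipeline_result, timeout_secs=40 * 60)`, with the threshold chosen by the caller.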

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Ahmet Altay
Congratulations! On Thu, May 2, 2019 at 9:54 AM Yifan Zou wrote: > Congratulations! Well deserved! > > On Thu, May 2, 2019 at 9:37 AM Rui Wang wrote: > >> Congratulations! >> >> >> -Rui >> >> On Thu, May 2, 2019 at 8:23 AM Michael Luckey >> wrote: >> >>> Congrats! Well deserved! >>> >>> On

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Yifan Zou
Congratulations! Well deserved! On Thu, May 2, 2019 at 9:37 AM Rui Wang wrote: > Congratulations! > > > -Rui > > On Thu, May 2, 2019 at 8:23 AM Michael Luckey wrote: > >> Congrats! Well deserved! >> >> On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko >> wrote: >> >>> Congrats! >>> >>> On 2 May

Re: Artifact staging in cross-language pipelines

2019-05-02 Thread Lukasz Cwik
On Thu, May 2, 2019 at 7:20 AM Robert Bradshaw wrote: > On Sat, Apr 27, 2019 at 1:14 AM Lukasz Cwik wrote: > > > > We should stick with URN + payload + artifact metadata[1] where the only > mandatory one that all SDKs and expansion services understand is the > "bytes" artifact type. This allows

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Rui Wang
Congratulations! -Rui On Thu, May 2, 2019 at 8:23 AM Michael Luckey wrote: > Congrats! Well deserved! > > On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko > wrote: > >> Congrats! >> >> On 2 May 2019, at 10:06, Gleb Kanterov wrote: >> >> Congratulations! Well deserved! >> >> On Thu, May 2,

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Robert Bradshaw
On Thu, May 2, 2019 at 6:03 PM Michael Luckey wrote: > > Yes, I understood this. But I'm personally more paranoid about releasing. > > So formally vote (and corresponding testing) was done on rc. If we rebuild > and resign, wouldn't that mean we also need to revote? Yeah, that's the sticking

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Michael Luckey
Yes, I understood this. But I'm personally more paranoid about releasing. So formally vote (and corresponding testing) was done on rc. If we rebuild and resign, wouldn't that mean we also need to revote? If I understand correctly, there will be some changed version string in distributed sources

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-05-02 Thread Kenneth Knowles
The issue has been discussed for a full month, with no objections. I'd call that lazy consensus. And since you have found a way to be backwards compatible, it doesn't even have to impact docs or scripts. This is great. Kenn On Thu, May 2, 2019 at 8:43 AM Michael Luckey wrote: > Hi, > > after

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles
Pulling out the relevant pypi bit w.r.t. RCs: > - Release candidates, nightly or snapshots need to be clearly tagged as > pre-release on https://pypi.org/project/apache/#history > - The latest version should not point to an artefact containing unapproved > code e.g. to a release candidate or

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles
Ah, and here's one on general@incubator specifically about RCs: https://lists.apache.org/thread.html/c4afcf0807d71f844d912a7e5fe6b481f0779bdcf88ccf9abe50a160@%3Cgeneral.incubator.apache.org%3E Kenn On Thu, May 2, 2019 at 8:49 AM Kenneth Knowles wrote: > I'd suggest looking for experience

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles
I'd suggest looking for experience beyond Beam and Airflow. I don't see links to some relevant threads. Here's one from legal-discuss@ about binary channels and how they relate to source releases:

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-05-02 Thread Michael Luckey
Hi, after implementing the required changes to switch from the current flat Gradle project structure to the hierarchical represented by the folder hierarchy I propose to merge the changes [1] after cut of next release branch (which is scheduled around May, 8th.) Does anyone have any concerns or

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Robert Bradshaw
On Thu, May 2, 2019 at 5:24 PM Michael Luckey wrote: > > Thanks Ahmet for calling out to the airflow folks. I believe, I am able to > follow their argument. So from my point of view I do not have an issue with > apache policy. But honestly still trying to wrap my head around Robert's > concern

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Michael Luckey
Thanks Ahmet for calling out to the airflow folks. I believe, I am able to follow their argument. So from my point of view I do not have an issue with apache policy. But honestly still trying to wrap my head around Robert's concern with rebuilding/resigning. Currently, our actual release is only a

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Michael Luckey
Congrats! Well deserved! On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko wrote: > Congrats! > > On 2 May 2019, at 10:06, Gleb Kanterov wrote: > > Congratulations! Well deserved! > > On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía wrote: > >> Congrats everyone ! >> >> On Thu, May 2, 2019 at 9:14

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Jozef Vilcek
On Thu, May 2, 2019 at 3:41 PM Maximilian Michels wrote: > Thanks for the JIRA issues Jozef! > > > So the feature in Flink is operator chaining and Flink per default > initiate copy of input elements. In case of Beam coders copy seems to be > more noticeable than native Flink. > > Copying between

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-02 Thread Maximilian Michels
Thanks for all the work Kenn. The new JIRA workflow is much better than the old label-based. -Max On 01.05.19 21:27, Kenneth Knowles wrote: Yes, new issues should have that status. And a correction: it is "Triage Needed" On Wed, May 1, 2019, 11:39 Pablo Estrada >

Re: Artifact staging in cross-language pipelines

2019-05-02 Thread Robert Bradshaw
On Sat, Apr 27, 2019 at 1:14 AM Lukasz Cwik wrote: > > We should stick with URN + payload + artifact metadata[1] where the only > mandatory one that all SDKs and expansion services understand is the "bytes" > artifact type. This allows us to add optional URNs for file://, http://, > Maven,

Re: Fwd: Your application for Season of Docs 2019 was unsuccessful

2019-05-02 Thread Maximilian Michels
Aw, too bad. Next time. I hope we can extend the docs for portability before next year :) On 02.05.19 00:30, Pablo Estrada wrote: Hello all, as you may already know, unfortunately our application for Season of Docs was not successful. That's too bad : ) - but it's good that we were able to

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Maximilian Michels
Thanks for the JIRA issues Jozef! So the feature in Flink is operator chaining and Flink per default initiate copy of input elements. In case of Beam coders copy seems to be more noticeable than native Flink. Copying between chained operators can be turned off in the FlinkPipelineOptions

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Alexey Romanenko
Congrats! > On 2 May 2019, at 10:06, Gleb Kanterov wrote: > > Congratulations! Well deserved! > > On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía > wrote: > Congrats everyone ! > > On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw > wrote: >

Re: Custom shardingFn for FileIO

2019-05-02 Thread Reuven Lax
Great, let me know when to take another look at the PR! Reuven On Wed, May 1, 2019 at 6:47 AM Jozef Vilcek wrote: > That coder is added extra as a re-map stage from "original" key to new > ShardAwareKey ... But pipeline might get broken I guess. > Very fair point. I am having a second thought

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Robert Bradshaw
Thanks for filing those. As for how not doing a copy is "safe," it's not really. Beam simply asserts that you MUST NOT mutate your inputs (and direct runners, which are used during testing, do perform extra copies and checks to catch violations of this requirement). On Thu, May 2, 2019 at 1:02
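As a small illustration of the "MUST NOT mutate your inputs" rule mentioned above, the plain-Python sketch below builds its output from a copy of the element instead of changing the element in place. It does not use the Beam API; `add_total_safely` and the sample element are hypothetical, and a deep copy is only one of several ways to avoid mutation.

```python
import copy


def add_total_safely(element):
    """Return a new dict with a 'total' field; the input dict is left untouched.

    Hypothetical per-element function, standing in for the body of a transform.
    """
    output = copy.deepcopy(element)  # work on a copy, never on the input
    output["total"] = sum(output["values"])
    return output


# Made-up input element.
original = {"values": [1, 2, 3]}
snapshot = copy.deepcopy(original)
enriched = add_total_safely(original)
print(original, enriched)
```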

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Jozef Vilcek
I have created https://issues.apache.org/jira/browse/BEAM-7204 https://issues.apache.org/jira/browse/BEAM-7206 to track these topics further On Wed, May 1, 2019 at 1:24 PM Jozef Vilcek wrote: > > > On Tue, Apr 30, 2019 at 5:42 PM Kenneth Knowles wrote: > >> >> >> On Tue, Apr 30, 2019, 07:05

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-02 Thread Robert Bradshaw
On Wed, May 1, 2019 at 8:09 PM Kenneth Knowles wrote: > > On Wed, May 1, 2019 at 8:51 AM Reuven Lax wrote: >> >> ValueState is not necessarily racy if you're doing a read-modify-write. >> It's only racy if you're doing something like writing last element seen. > > Race conditions are not
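The distinction quoted above can be illustrated with a plain-Python sketch that treats a single variable as the "value state" and feeds it the same elements in every possible arrival order. This is only one reading of "racy" (the result depends on arrival order) and it does not use Beam's state API; `count_elements` and `last_seen` are hypothetical names.

```python
from itertools import permutations


def count_elements(elements):
    """Read-modify-write: each update is derived from the current state."""
    state = 0
    for _ in elements:
        state = state + 1
    return state


def last_seen(elements):
    """Blind overwrite: the state keeps whichever element arrived last."""
    state = None
    for element in elements:
        state = element
    return state


elements = ["x", "y", "z"]
# Collect the final state over every possible arrival order.
counts = {count_elements(order) for order in permutations(elements)}
lasts = {last_seen(order) for order in permutations(elements)}
print(counts, lasts)
```

The count is the same for every ordering, while the "last element seen" takes a different value depending on which element happened to arrive last.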

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Gleb Kanterov
Congratulations! Well deserved! On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía wrote: > Congrats everyone ! > > On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw > wrote: > >> Congratulations, and thanks for all the great contributions each one of >> you has made to Beam! >> >> On Thu, May 2, 2019

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Ismaël Mejía
Congrats everyone ! On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw wrote: > Congratulations, and thanks for all the great contributions each one of you > has made to Beam! > > On Thu, May 2, 2019 at 5:51 AM Ruoyun Huang wrote: > >> Congratulations everyone! Well deserved! >> >> On Wed, May 1,

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Robert Bradshaw
Congratulations, and thanks for all the great contributions each one of you has made to Beam! On Thu, May 2, 2019 at 5:51 AM Ruoyun Huang wrote: > Congratulations everyone! Well deserved! > > On Wed, May 1, 2019 at 8:38 PM Kenneth Knowles wrote: > >> Congrats! All well deserved! >> >> Kenn >>

cancel job

2019-05-02 Thread Chaim Turkel
Hi, I have a batch job that should run for about 40 minutes. There are times that it can run for hours, and I don't know why. I need the option to cancel the job if it runs for more than x minutes. I can do this from the GUI or the gcloud CLI. Is there an API code that I can do this