date:20190502

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Ankur Goenka

Congratulations and thank you for making Beam awesome! From: Chamikara Jayalath Date: Thu, May 2, 2019, 4:03 PM To: dev Congratulations! > > On Thu, May 2, 2019 at 10:28 AM Udi Meiri wrote: > >> Congrats everyone! >> >> On Thu, May 2, 2019 at 9:55 AM Ahmet Altay wrote: >> >>>

Better naming for runner specific options

2019-05-02 Thread Reza Rokni

Hi, Was reading this SO question: https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has And noticed that in https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions The option is

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-02 Thread Kenneth Knowles

Agree with all of your points about the drawbacks of ValueState. It is definitely a pro/con weighing sort of situation. Considering the number of users who are new to the orthogonality of event time and processing time, ValueState could certainly lead to confusion about why things are not in any

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Kenneth Knowles

Meta: All of Beam SQL is still "experimental" isn't it? There's very little chance that the structure of Beam SQL pipelines will be stable enough for e.g. pipeline update. So that is not worth worrying about at this stage. And this doesn't seem to affect APIs / compile time compatibility. As to

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Ahmet Altay

On Thu, May 2, 2019 at 2:18 PM Rui Wang wrote: > Brian's first proposal is also challenging partly because in BeamSQL > there is no good practice for dealing with complex SQL plans. Ideally we need > enough rules and SQL plan nodes in Beam to construct easy-to-transform plans > for different

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Chamikara Jayalath

Congratulations! On Thu, May 2, 2019 at 10:28 AM Udi Meiri wrote: > Congrats everyone! > > On Thu, May 2, 2019 at 9:55 AM Ahmet Altay wrote: > >> Congratulations! >> >> On Thu, May 2, 2019 at 9:54 AM Yifan Zou wrote: >> >>> Congratulations! Well deserved! >>> >>> On Thu, May 2, 2019 at 9:37

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Rui Wang

Brian's first proposal is also challenging partly because in BeamSQL there is no good practice for dealing with complex SQL plans. Ideally we need enough rules and SQL plan nodes in Beam to construct easy-to-transform plans for different cases. I had a similar situation before when I needed to

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Brian Hulette

Ahmet - I think it would only require observing each key's partition of the input independently, and the size of the state would only be proportional to the number of distinct elements, not the entire input. Note the pipeline would be a GBK with a key based on the GROUP BY, followed by a
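The state-size point above can be illustrated with a minimal sketch in plain Python, not Beam code. It assumes the input is a list of (k, v) pairs and mimics `SUM(DISTINCT v) ... GROUP BY k` by keeping, per key, only the set of distinct values seen so far. The function name `sum_distinct_per_key` is hypothetical and does not come from the thread.

```python
from collections import defaultdict


def sum_distinct_per_key(rows):
    """Toy version of SELECT k, SUM(DISTINCT v) FROM t GROUP BY k.

    rows is an iterable of (k, v) pairs. The per-key state is a set, so its
    size grows with the number of distinct values, not with the input size.
    """
    seen = defaultdict(set)  # key -> distinct values observed for that key
    for k, v in rows:
        seen[k].add(v)  # duplicates do not grow the state
    return {k: sum(values) for k, values in seen.items()}


rows = [("a", 1), ("a", 1), ("a", 2), ("b", 5), ("b", 5)]
print(sum_distinct_per_key(rows))
```

Each key's partition is processed independently here, which is the property the message relies on; how this would map onto a GBK plus a stateful step in Beam is left open, as in the thread.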

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Lukasz Cwik

Can you also go into more detail on why you think 1) is more challenging to implement? On Thu, May 2, 2019 at 11:58 AM Ahmet Altay wrote: > From my limited understanding, would the stateful CombineFn option not > require observing the whole input before being able to combine, so that the risk of > blowing

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Ahmet Altay

From my limited understanding, would the stateful CombineFn option not require observing the whole input before being able to combine, so that the risk of blowing memory is actually very high except for trivial inputs? On Thu, May 2, 2019 at 11:50 AM Brian Hulette wrote: > Hi everyone, > Currently

[DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Brian Hulette

Hi everyone, Currently BeamSQL does not support DISTINCT aggregations. These are queries like: > SELECT k, SUM(DISTINCT v) FROM t GROUP BY k > SELECT k, k2, COUNT(DISTINCT k2) FROM t GROUP BY k, k2 These are represented in Calcite's logical plan with a distinct flag on aggregation calls, but we

Re: kafka client interoperability

2019-05-02 Thread Lukasz Cwik

+dev On Thu, May 2, 2019 at 10:34 AM Moorhead,Richard < richard.moorhe...@cerner.com> wrote: > In Beam 2.9.0, this check was made: > > >

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Maximilian Michels

I am not sure what you are referring to here. What do you mean Kryo is simply slower ... Beam Kryo or Flink Kryo or? Flink uses Kryo as a fallback serializer when its own type serialization system can't analyze the type. I'm just guessing here that this could be slower. On 02.05.19 16:51,

Re: Artifact staging in cross-language pipelines

2019-05-02 Thread Maximilian Michels

BTW what are the next steps here ? Heejong or Max, will one of you be able to come up with a detailed proposal around this ? Thank you for all the additional comments and ideas. I will try to capture them in a document and share it here. Of course we can continue the discussion in the

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Ahmet Altay

On Thu, May 2, 2019 at 8:56 AM Kenneth Knowles wrote: > Pulling out the relevant pypi bit w.r.t. RCs: > > >> - Release candidates, nightly or snapshots need to be clearly tagged as >> pre-release on https://pypi.org/project/apache/#history >> - The latest version should not point to an artefact

Re: cancel job

2019-05-02 Thread Chaim Turkel

Thanks for the reply. I am using Airflow Python code to run a Java runner, so I do not have the actual pipeline handle. Is there a way to get it? chaim On Thu, May 2, 2019 at 7:58 PM Lukasz Cwik wrote: > > +u...@beam.apache.org > > On Thu, May 2, 2019 at 9:51 AM Lukasz Cwik wrote: >> >> ...

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Ahmet Altay

On Thu, May 2, 2019 at 9:29 AM Robert Bradshaw wrote: > On Thu, May 2, 2019 at 6:03 PM Michael Luckey wrote: > > > > Yes, I understood this. But I'm personally more paranoid about releasing. > > > > So formally the vote (and corresponding testing) was done on the RC. If we > rebuild and re-sign,

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Udi Meiri

Congrats everyone! On Thu, May 2, 2019 at 9:55 AM Ahmet Altay wrote: > Congratulations! > > On Thu, May 2, 2019 at 9:54 AM Yifan Zou wrote: > >> Congratulations! Well deserved! >> >> On Thu, May 2, 2019 at 9:37 AM Rui Wang wrote: >> >>> Congratulations! >>> >>> >>> -Rui >>> >>> On Thu, May 2,

Re: cancel job

2019-05-02 Thread Lukasz Cwik

+u...@beam.apache.org On Thu, May 2, 2019 at 9:51 AM Lukasz Cwik wrote: > ... build pipeline ... > pipeline_result = p.run() > if job_taking_too_long: > pipeline_result.cancel() > > Python: >

Re: cancel job

2019-05-02 Thread Lukasz Cwik

... build pipeline ... pipeline_result = p.run() if job_taking_too_long: pipeline_result.cancel() Python: https://github.com/apache/beam/blob/95d0ac5e5cb59fd0c6a8a4861a38a7087a6c46b5/sdks/python/apache_beam/runners/runner.py#L372 Java:
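The cancel-on-timeout pattern above can be sketched as follows. This is a minimal sketch under stated assumptions: the result object exposes `wait_until_finish(duration=...)` with the duration in milliseconds and `cancel()`, as the Python SDK's `PipelineResult` does, and `wait_until_finish` returns the job state when the wait ends. The helper `cancel_if_too_slow`, the `TERMINAL_STATES` set and the `FakeResult` stand-in are hypothetical names for illustration; runner behavior on timeout may differ.

```python
# Assumed terminal state names, mirroring the string values used by the
# Python SDK's PipelineState; adjust for the runner actually in use.
TERMINAL_STATES = {"DONE", "FAILED", "CANCELLED", "UPDATED", "DRAINED"}


def cancel_if_too_slow(pipeline_result, timeout_ms):
    """Wait up to timeout_ms, then cancel the job if it is still running.

    Returns True if a cancel was requested, False if the job had finished.
    """
    state = pipeline_result.wait_until_finish(duration=timeout_ms)
    if state in TERMINAL_STATES:
        return False
    pipeline_result.cancel()
    return True


class FakeResult:
    """Stand-in for a runner's pipeline result, for illustration only."""

    def __init__(self, state_after_wait):
        self.state = state_after_wait
        self.cancelled = False

    def wait_until_finish(self, duration=None):
        return self.state  # pretend the wait has already elapsed

    def cancel(self):
        self.cancelled = True
        self.state = "CANCELLED"
        return self.state


slow_job = FakeResult("RUNNING")
print(cancel_if_too_slow(slow_job, timeout_ms=40 * 60 * 1000), slow_job.state)
```

With a real pipeline, `pipeline_result = p.run()` would take the place of `FakeResult`. This only helps when the launching process holds the result object.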

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Ahmet Altay

Congratulations! On Thu, May 2, 2019 at 9:54 AM Yifan Zou wrote: > Congratulations! Well deserved! > > On Thu, May 2, 2019 at 9:37 AM Rui Wang wrote: > >> Congratulations! >> >> >> -Rui >> >> On Thu, May 2, 2019 at 8:23 AM Michael Luckey >> wrote: >> >>> Congrats! Well deserved! >>> >>> On

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Yifan Zou

Congratulations! Well deserved! On Thu, May 2, 2019 at 9:37 AM Rui Wang wrote: > Congratulations! > > > -Rui > > On Thu, May 2, 2019 at 8:23 AM Michael Luckey wrote: > >> Congrats! Well deserved! >> >> On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko >> wrote: >> >>> Congrats! >>> >>> On 2 May

Re: Artifact staging in cross-language pipelines

2019-05-02 Thread Lukasz Cwik

On Thu, May 2, 2019 at 7:20 AM Robert Bradshaw wrote: > On Sat, Apr 27, 2019 at 1:14 AM Lukasz Cwik wrote: > > > > We should stick with URN + payload + artifact metadata[1] where the only > mandatory one that all SDKs and expansion services understand is the > "bytes" artifact type. This allows

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Rui Wang

Congratulations! -Rui On Thu, May 2, 2019 at 8:23 AM Michael Luckey wrote: > Congrats! Well deserved! > > On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko > wrote: > >> Congrats! >> >> On 2 May 2019, at 10:06, Gleb Kanterov wrote: >> >> Congratulations! Well deserved! >> >> On Thu, May 2,

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Robert Bradshaw

On Thu, May 2, 2019 at 6:03 PM Michael Luckey wrote: > > Yes, I understood this. But I'm personally more paranoid about releasing. > > So formally the vote (and corresponding testing) was done on the RC. If we rebuild > and re-sign, wouldn't that mean we also need to revote? Yeah, that's the sticking

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Michael Luckey

Yes, I understood this. But I'm personally more paranoid about releasing. So formally the vote (and corresponding testing) was done on the RC. If we rebuild and re-sign, wouldn't that mean we also need to revote? If I understand correctly, there will be some changed version string in distributed sources

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-05-02 Thread Kenneth Knowles

The issue has been discussed for a full month, with no objections. I'd call that lazy consensus. And since you have found a way to be backwards compatible, it doesn't even have to impact docs or scripts. This is great. Kenn On Thu, May 2, 2019 at 8:43 AM Michael Luckey wrote: > Hi, > > after

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles

Pulling out the relevant pypi bit w.r.t. RCs: > - Release candidates, nightly or snapshots need to be clearly tagged as > pre-release on https://pypi.org/project/apache/#history > - The latest version should not point to an artefact containing unapproved > code e.g. to a release candidate or

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles

Ah, and here's one on general@incubator specifically about RCs: https://lists.apache.org/thread.html/c4afcf0807d71f844d912a7e5fe6b481f0779bdcf88ccf9abe50a160@%3Cgeneral.incubator.apache.org%3E Kenn On Thu, May 2, 2019 at 8:49 AM Kenneth Knowles wrote: > I'd suggest looking for experience

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles

I'd suggest looking for experience beyond Beam and Airflow. I don't see links to some relevant threads. Here's one from legal-discuss@ about binary channels and how they relate to source releases:

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-05-02 Thread Michael Luckey

Hi, after implementing the required changes to switch from the current flat Gradle project structure to the hierarchical one represented by the folder hierarchy, I propose to merge the changes [1] after the next release branch is cut (which is scheduled around May 8th). Does anyone have any concerns or

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Robert Bradshaw

On Thu, May 2, 2019 at 5:24 PM Michael Luckey wrote: > > Thanks Ahmet for calling out to the Airflow folks. I believe I am able to > follow their argument. So from my point of view I do not have an issue with > Apache policy. But honestly I am still trying to wrap my head around Robert's > concern

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Michael Luckey

Thanks Ahmet for calling out to the Airflow folks. I believe I am able to follow their argument. So from my point of view I do not have an issue with Apache policy. But honestly I am still trying to wrap my head around Robert's concern with rebuilding/re-signing. Currently, our actual release is only a

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Michael Luckey

Congrats! Well deserved! On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko wrote: > Congrats! > > On 2 May 2019, at 10:06, Gleb Kanterov wrote: > > Congratulations! Well deserved! > > On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía wrote: > >> Congrats everyone ! >> >> On Thu, May 2, 2019 at 9:14

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Jozef Vilcek

On Thu, May 2, 2019 at 3:41 PM Maximilian Michels wrote: > Thanks for the JIRA issues Jozef! > > > So the feature in Flink is operator chaining, and by default Flink > copies input elements. With Beam coders, the copy seems to be > more noticeable than in native Flink. > > Copying between

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-02 Thread Maximilian Michels

Thanks for all the work Kenn. The new JIRA workflow is much better than the old label-based one. -Max On 01.05.19 21:27, Kenneth Knowles wrote: Yes, new issues should have that status. And a correction: it is "Triage Needed" On Wed, May 1, 2019, 11:39 Pablo Estrada >

Re: Artifact staging in cross-language pipelines

2019-05-02 Thread Robert Bradshaw

On Sat, Apr 27, 2019 at 1:14 AM Lukasz Cwik wrote: > > We should stick with URN + payload + artifact metadata[1] where the only > mandatory one that all SDKs and expansion services understand is the "bytes" > artifact type. This allows us to add optional URNs for file://, http://, > Maven,

Re: Fwd: Your application for Season of Docs 2019 was unsuccessful

2019-05-02 Thread Maximilian Michels

Aw, too bad. Next time. I hope we can extend the docs for portability before next year :) On 02.05.19 00:30, Pablo Estrada wrote: Hello all, as you may already know, unfortunately our application for Season of Docs was not successful. That's too bad : ) - but it's good that we were able to

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Maximilian Michels

Thanks for the JIRA issues Jozef! So the feature in Flink is operator chaining, and by default Flink copies input elements. With Beam coders, the copy seems to be more noticeable than in native Flink. Copying between chained operators can be turned off in the FlinkPipelineOptions

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Alexey Romanenko

Congrats! > On 2 May 2019, at 10:06, Gleb Kanterov wrote: > > Congratulations! Well deserved! > > On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía > wrote: > Congrats everyone ! > > On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw > wrote: >

Re: Custom shardingFn for FileIO

2019-05-02 Thread Reuven Lax

Great, let me know when to take another look at the PR! Reuven On Wed, May 1, 2019 at 6:47 AM Jozef Vilcek wrote: > That coder is added extra as a re-map stage from "original" key to new > ShardAwareKey ... But pipeline might get broken I guess. > Very fair point. I am having a second thought

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Robert Bradshaw

Thanks for filing those. As for how not doing a copy is "safe," it's not really. Beam simply asserts that you MUST NOT mutate your inputs (and direct runners, which are used during testing, do perform extra copies and checks to catch violations of this requirement). On Thu, May 2, 2019 at 1:02
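The idea of catching input mutation during testing can be illustrated with a toy check in plain Python. This is not how any Beam runner implements its enforcement; it only shows the general technique of snapshotting an element before the user function runs and comparing afterwards. The helper `call_checking_immutability` and the two example functions are hypothetical.

```python
import copy


def call_checking_immutability(fn, element):
    """Call fn(element) and raise if fn changed its input in place."""
    snapshot = copy.deepcopy(element)  # the extra copy a test-time check pays for
    output = fn(element)
    if element != snapshot:
        raise ValueError("user function mutated its input element")
    return output


def append_safely(values):
    return values + [1]  # builds a new list, input untouched


def append_in_place(values):
    values.append(1)  # mutates the input, which violates the requirement
    return values


print(call_checking_immutability(append_safely, [0]))
```

A production runner that skips the copy simply trusts that functions behave like `append_safely`; a function like `append_in_place` would go undetected there.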

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-02 Thread Jozef Vilcek

I have created https://issues.apache.org/jira/browse/BEAM-7204 and https://issues.apache.org/jira/browse/BEAM-7206 to track these topics further On Wed, May 1, 2019 at 1:24 PM Jozef Vilcek wrote: > > > On Tue, Apr 30, 2019 at 5:42 PM Kenneth Knowles wrote: > >> >> >> On Tue, Apr 30, 2019, 07:05

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-02 Thread Robert Bradshaw

On Wed, May 1, 2019 at 8:09 PM Kenneth Knowles wrote: > > On Wed, May 1, 2019 at 8:51 AM Reuven Lax wrote: >> >> ValueState is not necessarily racy if you're doing a read-modify-write. >> It's only racy if you're doing something like writing last element seen. > > Race conditions are not
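One reading of the distinction quoted above, sketched in plain Python rather than with Beam's state API: a read-modify-write such as a counter gives the same final state for every arrival order, while blindly writing the last element seen does not. The function names are hypothetical, and the single-value `state` variable only stands in for a ValueState cell.

```python
from itertools import permutations


def run_counter(elements):
    """Read-modify-write: the final state is independent of arrival order."""
    state = 0
    for _ in elements:
        state = state + 1  # read the current value, then write the update
    return state


def run_last_seen(elements):
    """Blind write: the final state depends on which element arrives last."""
    state = None
    for element in elements:
        state = element
    return state


elements = ["x", "y", "z"]
counter_results = {run_counter(order) for order in permutations(elements)}
last_seen_results = {run_last_seen(order) for order in permutations(elements)}
print(counter_results, sorted(last_seen_results))
```

The counter yields a single possible outcome across all orderings, whereas the last-seen variant yields one outcome per possible final element.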

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Gleb Kanterov

Congratulations! Well deserved! On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía wrote: > Congrats everyone! > > On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw > wrote: > >> Congratulations, and thanks for all the great contributions each one of >> you has made to Beam! >> >> On Thu, May 2, 2019

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Ismaël Mejía

Congrats everyone! On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw wrote: > Congratulations, and thanks for all the great contributions each one of you > has made to Beam! > > On Thu, May 2, 2019 at 5:51 AM Ruoyun Huang wrote: > >> Congratulations everyone! Well deserved! >> >> On Wed, May 1,

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Robert Bradshaw

Congratulations, and thanks for all the great contributions each one of you has made to Beam! On Thu, May 2, 2019 at 5:51 AM Ruoyun Huang wrote: > Congratulations everyone! Well deserved! > > On Wed, May 1, 2019 at 8:38 PM Kenneth Knowles wrote: > >> Congrats! All well deserved! >> >> Kenn >>

cancel job

2019-05-02 Thread Chaim Turkel

Hi, I have a batch job that should run for about 40 minutes. There are times when it runs for hours, and I don't know why. I need the option to cancel the job if it runs for more than x minutes. I can do this from the GUI or the gcloud CLI. Is there an API with which I can do this

48 matches
