Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Raghu Angadi
This is terrific news! Thanks Charles. On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen wrote: > Local execution of Beam pipelines on the Python DirectRunner currently > suffers from performance issues, which makes it hard for pipeline authors > to iterate, especially on medium to

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Eugene Kirpichov
Do we have a release validation spreadsheet for this one? On Thu, Feb 8, 2018 at 9:30 AM Ahmet Altay wrote: > +1 > > I verified python quick start, mobile gaming examples, streaming on Direct > and Dataflow runners. Thank you JB! > > On Thu, Feb 8, 2018 at 2:27 AM, Romain

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Ismaël Mejía
Sounds impressive, and with the extra portability stuff, great! Worth the switch just for the user experience improvement. On Thu, Feb 8, 2018 at 5:52 PM, Robert Bradshaw wrote: > This is going to be a great improvement for our users! I'll take a > look at the pull request.

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Eugene Kirpichov
Sounds awesome, congratulations and thanks for making this happen! On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi wrote: > This is terrific news! Thanks Charles. > > On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen wrote: > >> Local execution of Beam pipelines on

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread María García Herrero
Amazing improvement, Charles. Thanks for the effort! On Thu, Feb 8, 2018 at 10:14 AM Eugene Kirpichov wrote: > Sounds awesome, congratulations and thanks for making this happen! > > On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi wrote: > >> This is

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Valentyn Tymofieiev
Yes (thanks Kenn!): https://s.apache.org/beam-2.3.0-release-validation On Thu, Feb 8, 2018 at 10:14 AM, Eugene Kirpichov wrote: > Do we have a release validation spreadsheet for this one? > > On Thu, Feb 8, 2018 at 9:30 AM Ahmet Altay wrote: > >> +1 >>

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Robert Bradshaw
This is going to be a great improvement for our users! I'll take a look at the pull request. On Wed, Feb 7, 2018 at 7:03 PM, Kenneth Knowles wrote: > Nice! > > On Wed, Feb 7, 2018 at 6:45 PM, Charles Chen wrote: >> >> The existing DirectRunner will be needed

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Henning Rohde
Awesome! Well done, Charles. On Thu, Feb 8, 2018 at 9:10 AM, Ismaël Mejía wrote: > Sounds impressive, and with the extra portability stuff, great! > Worth the switch just for the user experience improvement. > > On Thu, Feb 8, 2018 at 5:52 PM, Robert Bradshaw

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Romain Manni-Bucau
Very interesting! Sounds like a sane way forward for Beam's future, and I'm very happy it is consistent with the current Java experience: no need to interlace runners at the end; it makes design, code and user experience way better than trying to put everything in the direct runner :). On Feb 8, 2018

Re: dependencies.txt in META-INF?

2018-02-08 Thread Lukasz Cwik
It is unfortunate that setting Class-Path is so broken. On Wed, Feb 7, 2018 at 10:55 PM, Romain Manni-Bucau wrote: > Not really: > 1. I need to have the gav > 2. Please never set Class-Path of the manifest. It leads to broken runtime > in most environments :(. > > > Le 8

Re: dependencies.txt in META-INF?

2018-02-08 Thread Romain Manni-Bucau
It was abused too much by libs and is not supported everywhere :( On Feb 8, 2018 at 22:39, "Lukasz Cwik" wrote: > It is unfortunate that setting Class-Path is so broken. > > On Wed, Feb 7, 2018 at 10:55 PM, Romain Manni-Bucau > wrote: > >> Not really: >> 1.

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Lukasz Cwik
I validated several of the quickstarts and updated the spreadsheet and currently am voting -1 for this release due to Spark runner failing. Filed https://issues.apache.org/jira/browse/BEAM-3668 with the full details. On Thu, Feb 8, 2018 at 10:32 AM, Valentyn Tymofieiev

Re: [jira] [Commented] (BEAM-2591) Python shim for submitting to FlinkRunner

2018-02-08 Thread Robert Bradshaw
FYI, either way, this is probably just adding another case to https://github.com/apache/beam/blob/c2a088339831c00ce021bd0d3197c3e84b739ede/sdks/python/apache_beam/runners/portability/universal_local_runner.py#L79 On Thu, Feb 8, 2018 at 3:54 PM, Ben Sidhom (JIRA) wrote: > > [

Re: Accessible and Stateful Sink

2018-02-08 Thread Lukasz Cwik
Based on this description, it seems like druid sinks have to be fault tolerant. I was hoping that they didn't need to be and as soon as they wrote some information to druid then you would be able to crash and druid only used the sinks as an optimization for unindexed data. In your case it seems

Portable Flink Runner plan

2018-02-08 Thread Ben Sidhom
Hey all, We're working on getting the portability framework plumbed through the Flink runner. The first iteration will likely only support batch and will be limited in its deployment flexibility, but hopefully it shouldn't be too painful to expand

Re: Accessible and Stateful Sink

2018-02-08 Thread Charles Allen
This is very insightful, thank you. I'm going to share it around with a few other key folks to see how viable a query-able indexing process would be using idiomatic Beam stuff. Cheers, Charles Allen On Thu, Feb 8, 2018 at 6:13 PM Lukasz Cwik wrote: > Actually missed one step.

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Romain Manni-Bucau
+1 (non-binding), thanks JB for the effort! Romain Manni-Bucau

Jenkins build is back to normal : beam_Release_NightlySnapshot #679

2018-02-08 Thread Apache Jenkins Server
See

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Ismaël Mejía
+1 (binding) Validated SHAs + tag vs source.zip file. Ran mvn clean install -Prelease OK. Validated that the 3 regressions reported for RC1 were fixed. Ran Nexmark on Direct/Flink runner in local mode, no regressions now. Installed the Python version in a virtualenv and ran wordcount successfully. On

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Chamikara Jayalath
On Thu, Feb 8, 2018 at 10:18 PM Jean-Baptiste Onofré wrote: > It means an RC3 then. > > Basically, we have two options: > > 1. I cancel RC2, to include PR 4645 and cut an RC3. It can be done super > fast > (today). > +1 for option 1 since IMO we should not release with

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Jean-Baptiste Onofré
Is it specific to this release? I think it was like this before, no? Regards JB On 02/09/2018 12:48 AM, Kenneth Knowles wrote: > Since root cause is https://issues.apache.org/jira/browse/BEAM-3519 I marked > it > a blocker so we can discuss fixes or workarounds there. > > On Thu, Feb 8, 2018

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Jean-Baptiste Onofré
It means an RC3 then. Basically, we have two options: 1. I cancel RC2, to include PR 4645 and cut an RC3. It can be done super fast (today). 2. We continue the RC2 vote and we add a note about shading (as I did for the TextIO issue with the Flink runner). I'm more in favor of 1 as the fix is already

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Romain Manni-Bucau
Since it only breaks examples, I'm not sure it is worth yet another reroll (which already means a two-week delay on the plan). Users will be affected the same anyway - and in an expected way until Beam handles classloaders per transform. A note on the side is probably fine. Romain Manni-Bucau

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Kenneth Knowles
Since root cause is https://issues.apache.org/jira/browse/BEAM-3519 I marked it a blocker so we can discuss fixes or workarounds there. On Thu, Feb 8, 2018 at 1:24 PM, Lukasz Cwik wrote: > I validated several of the quickstarts and updated the spreadsheet and > currently am

Re: [jira] [Commented] (BEAM-2591) Python shim for submitting to FlinkRunner

2018-02-08 Thread Robert Bradshaw
From the Python point of view, I think it makes sense to configure the "shim" runner with either (1) a command to run to launch and tear down the service or (2) an already existing endpoint. We could make (1) more convenient by providing some ready-made commands in some namespace that use the

Re: Accessible and Stateful Sink

2018-02-08 Thread Lukasz Cwik
Actually missed one step. For each bundle: 1) If not owner of the key: a) Read in all prior state stuff into memory and cache it. b) De-register any prior druid sink that owned that key. c) Register itself as the owner of that key 3) Garbage collect anything in state/memory that is

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Ahmet Altay
+1 I verified python quick start, mobile gaming examples, streaming on Direct and Dataflow runners. Thank you JB! On Thu, Feb 8, 2018 at 2:27 AM, Romain Manni-Bucau wrote: > +1 (non-binding), thanks JB for the effort! > > > Romain Manni-Bucau > @rmannibucau

Re: [VOTE] Release 2.3.0, release candidate #2

2018-02-08 Thread Romain Manni-Bucau
IMHO it is not a blocker but an incompatibility between Spark and some IO stack. A trivial workaround is to shade the IO before importing it into the project. An alternative is to wrap the IO in custom classloaders. I didn't check for this one, but it is a common Beam issue to have conflicts between runners/io

Build failed in Jenkins: beam_PostRelease_NightlySnapshot #21

2018-02-08 Thread Apache Jenkins Server
See Changes: [mairbek] Update cloud spanner library to 0.29.0 [mairbek] Fix test [mairbek] More google-cloud-platform whitelisting [mairbek] pom updates to make maven happy [mairbek] Update