Re: [review?] WordCount in Kotlin

2019-04-04 Thread Jean-Baptiste Onofré
Thanks for the update Pablo. I will try to take a look during the weekend. Regards JB On 04/04/2019 23:16, Pablo Estrada wrote: > Hello all, > a community member has been very kind to contribute a Kotlin > translation of the WordCount pipeline[1]. The documentation, tests, and > gradle

Re: Projects Can Apply Individually for Google Season of Docs

2019-04-04 Thread Aizhamal Nurmamat kyzy
Hello everyone, As the ASF announced that each project can apply for Season of Docs individually, I would like to volunteer to be one of the administrators for the program. Is this fine for everyone in the community? If so, I will start working on the application on behalf of Beam this week, and I

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Heejong Lee
Robert, does nested/unnested context work properly for Java? I can see that the Context is fixed to NESTED[1] and the encode method with the Context parameter is marked as deprecated[2]. [1]:

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Robert Bradshaw
I don't know why there are two separate copies of standard_coders.yaml--originally there was just one (though it did live in the Python directory). I'm guessing a copy was made rather than just pointing both to the new location, but that completely defeats the point. I can't seem to access JIRA

Re: Unexpected TestStream behavior when testing stateful DoFn

2019-04-04 Thread Pablo Estrada
I saw similar issues. I'll try to debug this tomorrow. It'll take some time to study the code, so we'll see. Assigning the issue to me. On Fri, Mar 29, 2019 at 6:43 AM Steve Niemitz wrote: > This reminds me of a bug I had filed for the direct runner a few weeks > ago, except I was running into

[review?] WordCount in Kotlin

2019-04-04 Thread Pablo Estrada
Hello all, a community member has been very kind to contribute a Kotlin translation of the WordCount pipeline[1]. The documentation, tests, and gradle structure for it are very good, so I am happy to merge, but since this code will become our first Kotlin "documentation"/entrypoint, I wanted to be

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Robert Burke
My 2 cents is that the "Textual description" should be part of the documentation of the URNs on the Proto messages, since that's the common place. I've added a short description for the varints for example, and we already have lengthier format & protocol descriptions there for iterables and

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Kenneth Knowles
On Thu, Apr 4, 2019 at 1:49 PM Robert Burke wrote: > We should probably move the "java" version of the yaml file [1] to a > common location rather than deep in the java hierarchy, or copying it for > Go and Python, but that can be a separate task. It's probably non-trivial > since it looks like

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Kenneth Knowles
On Thu, Apr 4, 2019 at 1:48 PM Kenneth Knowles wrote: > I have to actually say that a collection of test cases is not a definition > of a format. It is one of the pieces, and the other one is a textual > description in a prominent, discoverable place. > A reference implementation can also serve

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Robert Burke
We should probably move the "java" version of the yaml file [1] to a common location rather than deep in the java hierarchy, or copying it for Go and Python, but that can be a separate task. It's probably non-trivial since it looks like it's part of a java resources structure. Luke, the Go SDK

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Kenneth Knowles
I have to actually say that a collection of test cases is not a definition of a format. It is one of the pieces, and the other one is a textual description in a prominent, discoverable place. Kenn On Thu, Apr 4, 2019 at 1:28 PM Lukasz Cwik wrote: > > > On Thu, Apr 4, 2019 at 1:15 PM Chamikara

Re: Beam contribution

2019-04-04 Thread Lukasz Cwik
I looked at the failures you were experiencing and the error message doesn't provide enough information to figure out why it is failing. On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai wrote: > Oh, I just missed it then :) > Thank you Lukasz for connecting us. > > Yeah, the two TimerReceiverTest

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Lukasz Cwik
On Thu, Apr 4, 2019 at 1:15 PM Chamikara Jayalath wrote: > > > On Thu, Apr 4, 2019 at 12:15 PM Lukasz Cwik wrote: > >> standard_coders.yaml[1] is where we are currently defining these formats. >> Unfortunately the Python SDK has its own copy[2]. >> > > Ah great. Thanks for the pointer. Any idea

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Chamikara Jayalath
On Thu, Apr 4, 2019 at 12:15 PM Lukasz Cwik wrote: > standard_coders.yaml[1] is where we are currently defining these formats. > Unfortunately the Python SDK has its own copy[2]. > Ah great. Thanks for the pointer. Any idea why there's a separate copy for Python? I didn't see a significant

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Lukasz Cwik
standard_coders.yaml[1] is where we are currently defining these formats. Unfortunately the Python SDK has its own copy[2]. Here is an example PR[3] that adds the "beam:coder:double:v1" as tests to the Java and Python SDKs to ensure interoperability. Robert Burke, does the Go SDK have a test

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Heejong Lee
On Thu, Apr 4, 2019 at 11:50 AM Chamikara Jayalath wrote: > > > On Thu, Apr 4, 2019 at 11:29 AM Robert Bradshaw > wrote: > >> A URN defines the encoding. >> >> There are (unfortunately) *two* encodings defined for a Coder (defined >> by a URN), the nested and the unnested one. IIRC, in both

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Chamikara Jayalath
On Thu, Apr 4, 2019 at 11:29 AM Robert Bradshaw wrote: > A URN defines the encoding. > > There are (unfortunately) *two* encodings defined for a Coder (defined > by a URN), the nested and the unnested one. IIRC, in both Java and > Python, the nested one prefixes with a var-int length, and the >

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Robert Bradshaw
A URN defines the encoding. There are (unfortunately) *two* encodings defined for a Coder (defined by a URN), the nested and the unnested one. IIRC, in both Java and Python, the nested one prefixes with a var-int length, and the unnested one does not. We should define the spec clearly and have
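To make the nested/unnested distinction in the message above concrete, here is a minimal sketch in plain Python. It does not use the Beam SDK, and all function names are hypothetical. It assumes the var-int length prefix is a base-128 varint (low 7 bits first, as in Protocol Buffers) and that the length counts UTF-8 bytes; neither detail is confirmed by this thread.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low 7 bits first (assumed format)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low)
            return bytes(out)


def encode_unnested(s: str) -> bytes:
    """Unnested context: just the UTF-8 bytes, with no length prefix."""
    return s.encode("utf-8")


def encode_nested(s: str) -> bytes:
    """Nested context: varint byte length, then the UTF-8 bytes."""
    payload = s.encode("utf-8")
    return encode_varint(len(payload)) + payload


def decode_nested(data: bytes) -> Tuple[str, bytes]:
    """Read one length-prefixed string; return it and the remaining bytes."""
    length = 0
    shift = 0
    pos = 0
    while True:
        b = data[pos]
        pos += 1
        length |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    end = pos + length
    return data[pos:end].decode("utf-8"), data[end:]
```

The prefix is what lets a decoder find where one string ends when several values share a buffer; without it, the unnested form is only usable when the string is the last (or only) thing in the stream.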

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Pablo Estrada
Could this be a backwards-incompatible change that would break pipelines from upgrading? If they have data in-flight in between operators, and we change the coder, they would break? I know very little about coders, but since nobody has mentioned it, I wanted to make sure we have it in mind. -P.

Re: test_split_crazy_sdf broken in python presubmit. 'DataInputOperation' object has no attribute 'index'

2019-04-04 Thread Pablo Estrada
FWIW I have seen this being flaky in a couple different PRs, and passing on the second PreCommit run. I don't know if it's due to the unhealthy worker machines, or if it's actually flaky. Best -P. On Thu, Apr 4, 2019 at 10:03 AM Lukasz Cwik wrote: > I think it's a Jenkins executor issue because

Re: ParDo Execution Time stat is always 0

2019-04-04 Thread Mikhail Gryzykhin
Hi everyone, Quick summary on Python and Dataflow Runner: Python SDK already reports: - MSec - User metrics (int64 and distribution) - PCollection Element Count - Work on MeanByteCount for PCollection is ongoing here. Dataflow Runner: - all metrics

Re: ParDo Execution Time stat is always 0

2019-04-04 Thread Pablo Estrada
Hello guys! Alex, Mikhail and Ryan are working on support for metrics in the portability framework. The support on the SDK is pretty advanced AFAIK*, and the next step is to get the metrics back into the runner. Lukasz and I are working on a project that depends on this too, so I'm adding

Re: Changes in Beam Jenkins Agents

2019-04-04 Thread Yifan Zou
Great! Thank you, Lukasz! On Thu, Apr 4, 2019 at 3:10 AM Łukasz Gajowy wrote: > I verified load tests and IO performance tests jobs. Looking good. Thanks > for doing this! > > Łukasz > > > > czw., 4 kwi 2019 o 03:35 Yifan Zou napisał(a): > >> Hi, >> >> Our Jenkins are in a bad condition. 8

Re: test_split_crazy_sdf broken in python presubmit. 'DataInputOperation' object has no attribute 'index'

2019-04-04 Thread Lukasz Cwik
I think it's a Jenkins executor issue because https://github.com/apache/beam/pull/8217 passed its test just now. On Thu, Apr 4, 2019 at 10:02 AM Lukasz Cwik wrote: > I have tried running this test at head locally and have not gotten this > failure because I also had a different failure related

Re: test_split_crazy_sdf broken in python presubmit. 'DataInputOperation' object has no attribute 'index'

2019-04-04 Thread Lukasz Cwik
I have tried running this test at head locally and have not gotten this failure because I also had a different failure related to the .with_complete method not being available. I'm not yet sure whether this is a Jenkins executor issue or an actual code issue. On Thu, Apr 4, 2019 at 9:17 AM Alex

Re: Hazelcast Jet Runner - validation tests

2019-04-04 Thread Lukasz Cwik
The issue with unbounded tests that rely on triggers/late data/early firings/processing time is that these are several sources of non-determinism. The sources make non-deterministic decisions around when to produce data, checkpoint, and resume and runners make non-deterministic decisions around

Re: [VOTE] Release 2.12.0, release candidate #1

2019-04-04 Thread Andrew Pilloud
Sorry for the confusion, I checked JIRA but not pull requests. Please consider the vote for RC1 canceled. I'm going to keep running through all the tests before I cut RC2, so expect that to happen Monday unless there are new issues or pull requests. Andrew On Thu, Apr 4, 2019 at 2:47 AM Ismaël

test_split_crazy_sdf broken in python presubmit. 'DataInputOperation' object has no attribute 'index'

2019-04-04 Thread Alex Amato
https://jira.apache.org/jira/browse/BEAM-7006 https://builds.apache.org/job/beam_PreCommit_Python_Phrase/331/testReport/junit/apache_beam.runners.portability.fn_api_runner_test/FnApiRunnerSplitTest/test_split_crazy_sdf_2/ Traceback (most recent call last): File

Re: kafka 0.9 support

2019-04-04 Thread Alexey Romanenko
+1 for that too. > On 4 Apr 2019, at 01:53, Raghu Angadi wrote: > > I mean, +1 for removing support for old Kafka versions after next LTS > > What the cutoff for an 'old' version should be can be discussed then. My > choice would be 0.11. > Raghu. > > On Wed, Apr 3, 2019 at 4:36 PM Raghu

Re: Debugging :beam-sdks-java-io-hadoop-input-format:test

2019-04-04 Thread Alexey Romanenko
Mikhail, Thank you for chasing this issue. I added my comments on this Jira. > On 2 Apr 2019, at 19:53, Mikhail Gryzykhin wrote: > > Hi everyone, > > I created BEAM-6974. This > test and beam-sdks-java-io-cassandra tests fail often in our

Re: [PROPOSAL] commit granularity in master

2019-04-04 Thread Etienne Chauchot
Brian, it is good that you automated commit quality checks, thanks. But I don't agree with reducing the commit history of a PR to only one commit; I was just referring to meaningless commits such as fixup, checkstyle, spotless ... I prefer not to squash everything and only squash meaningless

Hazelcast Jet Runner - validation tests

2019-04-04 Thread Jozsef Bartok
Hi. My name is Jozsef, I've been working on Runners based on Hazelcast Jet. Plural because we have both an "old-style" and a "portable" Runner in development (https://github.com/hazelcast/hazelcast-jet-beam-runner). While our portable one isn't even functional yet, the "old-style" type of Runner

Re: Changes in Beam Jenkins Agents

2019-04-04 Thread Łukasz Gajowy
I verified load tests and IO performance tests jobs. Looking good. Thanks for doing this! Łukasz czw., 4 kwi 2019 o 03:35 Yifan Zou napisał(a): > Hi, > > Our Jenkins are in a bad condition. 8 agents are down at this time, and > they are not going to be restored because of some bad errors

Re: [VOTE] Release 2.12.0, release candidate #1

2019-04-04 Thread Ismaël Mejía
> I suggest keeping the bug open until the cherry-pick is complete. That makes > tracking the burndown easier and is more accurate treatment of Fix Version. Yes, sorry for that mistake, I just systematically resolve issues when merged so I did as well for those issues instead of waiting until