Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Reuven Lax
Could this be related to any of the portability changes? On Mon, Feb 5, 2018 at 7:51 AM, Jean-Baptiste Onofré wrote: > Created: > > https://issues.apache.org/jira/browse/BEAM-3617 > > Regards > JB > > On 02/05/2018 04:42 PM, Kenneth Knowles wrote: > > What is the Jira for

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Reuven Lax
On Mon, Feb 5, 2018 at 9:06 PM, Kenneth Knowles wrote: > Joining late, but very interested. Commented on the doc. Since there's a > forked discussion between doc and thread, I want to say this on the thread: > > 1. I have used JSON schema in production for describing the

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Romain Manni-Bucau
I would add a use case: a single serialization mechanism across a pipeline. JSON allows handling generic records (JsonObject) as well as POJO serialization, and both are compatible. Compared to Avro's built-in mechanism, it is not intrusive in the models, which is a key feature of an API. It also

Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Jean-Baptiste Onofré
Hi, quick update about RC2: BEAM-3617 is the only Jira pending for the release. I'm doing a git bisect to identify the commit that caused the performance degradation. Depending on the result, if it's an easy fix then we will try to do it for RC2; otherwise I will start RC2 as it is now. I will keep you

Re: [DISCUSS] What to do about widespread KinesisIO breakage

2018-02-05 Thread Jean-Baptiste Onofré
Yes, I don't consider this a blocker, and I think it has been failing for a while ;) Regards JB On 02/06/2018 06:22 AM, Kenneth Knowles wrote: > Thanks JB (and Alexey)! If you are confident the failures are not release > blockers, then that's great. > > Kenn > > On Mon, Feb 5, 2018 at 9:17 PM,

Re: [DISCUSS] What to do about widespread KinesisIO breakage

2018-02-05 Thread Kenneth Knowles
Thanks JB (and Alexey)! If you are confident the failures are not release blockers, then that's great. Kenn On Mon, Feb 5, 2018 at 9:17 PM, Jean-Baptiste Onofré wrote: > Hi Kenn, > > I have Alexey in my team who started to work on the write part of > KinesisIO. I > will ask

Re: [DISCUSS] What to do about widespread KinesisIO breakage

2018-02-05 Thread Jean-Baptiste Onofré
Hi Kenn, I have Alexey in my team who started to work on the write part of KinesisIO. I will ask him to work on those issues. I think it's OK to keep KinesisIO in the distribution and work on it in the mean time. Regards JB On 02/06/2018 04:20 AM, Kenneth Knowles wrote: > The flaking of

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Kenneth Knowles
Joining late, but very interested. Commented on the doc. Since there's a forked discussion between doc and thread, I want to say this on the thread: 1. I have used JSON schema in production for describing the structure of analytics events and it is OK but not great. If you are sure your data is

Re: [DISCUSS] What to do about widespread KinesisIO breakage

2018-02-05 Thread Kenneth Knowles
I believe these are all unit tests running locally. The failures generally look more like incorrect results than service problems. On Mon, Feb 5, 2018 at 7:36 PM, Reuven Lax wrote: > Do these tests run locally, or are they contacting an actual Kinesis > service? > > On Mon,

Re: [DISCUSS] What to do about widespread KinesisIO breakage

2018-02-05 Thread Reuven Lax
Do these tests run locally, or are they contacting an actual Kinesis service? On Mon, Feb 5, 2018 at 7:20 PM, Kenneth Knowles wrote: > The flaking of KinesisIO on both Maven and Gradle executions has become > very bad. Multiple methods are flaky, and we've collected these >

[DISCUSS] What to do about widespread KinesisIO breakage

2018-02-05 Thread Kenneth Knowles
The flaking of KinesisIO on both Maven and Gradle executions has become very bad. Multiple methods are flaky, and we've collected these Critical-severity tickets: (looks like https://issues.apache.org/jira/browse/BEAM-3228 is fixed?) https://issues.apache.org/jira/browse/BEAM-3317

Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-02-05 Thread Raghu Angadi
Hi Sushil, That is expected behavior. If you don't have any saved checkpoint, the pipeline would start from scratch. It does not have any connection to the previous run. On Thu, Feb 1, 2018 at 1:29 AM, Sushil Ks wrote: > Hi, >Apologies for delay in my reply, > > @Raghu

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Romain Manni-Bucau
None: JSON-P (the spec, so no specific implementation is required) as the record API, and a custom light wrapper for the schema, like https://github.com/Talend/component-runtime/blob/master/component-form/component-form-model/src/main/java/org/talend/sdk/component/form/model/jsonschema/JsonSchema.java (note this code

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Reuven Lax
Which JSON library are you thinking of? At least in Java, there's always been a problem of no good standard JSON library. On Mon, Feb 5, 2018 at 12:03 PM, Romain Manni-Bucau wrote: > > > On Feb 5, 2018 19:54, "Reuven Lax" wrote: > > multiplying by

Re: coder evolutions?

2018-02-05 Thread Romain Manni-Bucau
Does it mean we would change the implicit resolution? Do you see it being backward compatible? If so, it sounds like a good solution. On Feb 5, 2018 20:36, "Kenneth Knowles" wrote: > TL;DR: creating _new_ coders is not a problem. If you have a new idea for an > encoding, you can build

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Romain Manni-Bucau
On Feb 5, 2018 19:54, "Reuven Lax" wrote: multiplying by 1.0 doesn't really solve the right problems. The number type used by JavaScript (and by extension, the standard for JSON) only has 53 bits of precision. I've seen many, many bugs caused because of this - the input

Re: coder evolutions?

2018-02-05 Thread Kenneth Knowles
TL;DR: creating _new_ coders is not a problem. If you have a new idea for an encoding, you can build it alongside and users can use it. We also need data migration, and this is probably the easy way to be ready for that. We made a pretty big mistake in our naming of ListCoder, SetCoder, and

Re: coder evolutions?

2018-02-05 Thread Eugene Kirpichov
From a brief reading of this discussion: if I understand correctly, we want something to help deal with libraries that assume that they own the stream (e.g. some common xml or json parsers), when using them in a context where they don't (inside a Coder). Setting aside the questions of "why would

Re: coder evolutions?

2018-02-05 Thread Robert Bradshaw
Just to clarify, the issue is that for some types (byte array being the simplest) one needs to know the length of the data in order to decode it from the stream. In particular, the claim is that many libraries out there that do encoding/decoding assume they can gather this information from the end
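[Editor's note] The length problem described above can be illustrated with a minimal sketch. It assumes a fixed 4-byte big-endian length prefix, which is an illustrative choice and not necessarily the encoding Beam uses; the helper names `write_framed`, `read_framed`, and `roundtrip` are hypothetical. Without the prefix, a reader cannot tell where one byte array ends and the next begins when several are concatenated on one stream.

```python
import io
import struct

# Hypothetical helpers: frame each payload with a 4-byte big-endian length
# so a reader can recover element boundaries from a shared stream.

def write_framed(stream, payload: bytes) -> None:
    """Write the payload size, then the payload itself."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_framed(stream) -> bytes:
    """Read one length-prefixed payload and leave the stream at the next one."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("stream ended before a length prefix")
    (size,) = struct.unpack(">I", header)
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a payload")
    return payload

def roundtrip(payloads):
    """Encode all payloads into one buffer, then decode them back in order."""
    buf = io.BytesIO()
    for p in payloads:
        write_framed(buf, p)
    buf.seek(0)
    return [read_framed(buf) for _ in payloads]
```

A decoder that simply reads to the end of the stream would return every remaining element as one blob, which is the failure mode for libraries that assume they own the stream.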

Re: Schema-Aware PCollections revisited

2018-02-05 Thread Reuven Lax
Multiplying by 1.0 doesn't really solve the right problems. The number type used by JavaScript (and by extension, the standard for JSON) only has 53 bits of precision. I've seen many, many bugs caused because of this - the input data may easily contain numbers too large for 53 bits. In addition,
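[Editor's note] The 53-bit limit mentioned above can be checked with a short sketch. It assumes integers pass through an IEEE 754 double, as they do in JavaScript; Python's `float` is such a double, and the helper name `survives_double` is hypothetical.

```python
# 2**53 is the point past which a double can no longer represent every integer.
LIMIT = 2 ** 53

def survives_double(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a double."""
    return int(float(n)) == n

# LIMIT itself is representable; LIMIT + 1 is rounded to a neighbouring value.
ok_at_limit = survives_double(LIMIT)
ok_past_limit = survives_double(LIMIT + 1)
```

Here `ok_at_limit` is true and `ok_past_limit` is false, so a 64-bit identifier larger than 2**53 can silently change value when it is carried as a JSON number and parsed by a double-based reader.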

Re: coder evolutions?

2018-02-05 Thread Raghu Angadi
Could you describe the 2nd issue in a bit more detail, maybe with a short example? LengthAwareCoder in the PR adds another buffer copy (BufferedElementCountingOutputStream already has an extra buffer copy). On Mon, Feb 5, 2018 at 10:34 AM, Romain Manni-Bucau wrote: > Would this

Re: coder evolutions?

2018-02-05 Thread Romain Manni-Bucau
Would this work for everyone (I can update the PR if so): if the coder is not built in, prefix with the byte size; else keep the current behavior? On Feb 5, 2018 19:21, "Romain Manni-Bucau" wrote: > Answered inlined but I want to highlight beam is a portable API on top of >

Re: coder evolutions?

2018-02-05 Thread Romain Manni-Bucau
Answered inline, but I want to highlight that Beam is a portable API on top of well-known vendor APIs which have friendly shortcuts. So the background here is to make Beam at least user friendly. I'm fine if the outcome of the discussion is that the coder concept is wrong or something like that, but I'm not fine

Re: coder evolutions?

2018-02-05 Thread Robert Bradshaw
On Sun, Feb 4, 2018 at 6:44 AM, Romain Manni-Bucau wrote: > Hi guys, > > I submitted a PR on coders to enhance 1. the user experience 2. the > determinism and handling of coders. > > 1. the user experience is linked to what i sent some days ago: close > handling of the

Re: coder evolutions?

2018-02-05 Thread Lukasz Cwik
I do agree that being able to upgrade the encoding for coders between pipelines is important and thanks for creating BEAM-3616. Marking/reset for a coder can only be supported by either the root coder or every leaf coder in a coder tree unless you wrap each layer with a byte copying stream. If

Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Jean-Baptiste Onofré
Created: https://issues.apache.org/jira/browse/BEAM-3617 Regards JB On 02/05/2018 04:42 PM, Kenneth Knowles wrote: > What is the Jira for direct runner perf? > > On Mon, Feb 5, 2018 at 4:35 AM, Jean-Baptiste Onofré > wrote: > > Thanks ! > >

Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Jean-Baptiste Onofré
Hi Kenn, my bad, I didn't create one yet (I was busy on the TextIO with Flink runner issue, now identified \o/ ;)). Let me create it right now. Thanks ! Regards JB On 02/05/2018 04:42 PM, Kenneth Knowles wrote: > What is the Jira for direct runner perf? > > On Mon, Feb 5, 2018 at 4:35 AM,

Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Kenneth Knowles
What is the Jira for direct runner perf? On Mon, Feb 5, 2018 at 4:35 AM, Jean-Baptiste Onofré wrote: > Thanks ! > > I cherry-pick on release-2.3.0 branch. > > I'm on the direct runner perf test in the mean time. > > Thanks again ! > > Regards > JB > > On 02/05/2018 12:06 PM,

Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Jean-Baptiste Onofré
Thanks ! I cherry-pick on release-2.3.0 branch. I'm on the direct runner perf test in the mean time. Thanks again ! Regards JB On 02/05/2018 12:06 PM, Aljoscha Krettek wrote: > I merged fixes for: >  - https://issues.apache.org/jira/browse/BEAM-3186 >  - 

Re: [CANCEL][VOTE] Release 2.3.0, release candidate #1

2018-02-05 Thread Aljoscha Krettek
I merged fixes for: - https://issues.apache.org/jira/browse/BEAM-3186 - https://issues.apache.org/jira/browse/BEAM-3589 @JB I didn't yet merge them on the 2.3.0 branch, though, but I can or you

Build failed in Jenkins: beam_PostRelease_NightlySnapshot #17

2018-02-05 Thread Apache Jenkins Server
See Changes: [klk] google-java-format [klk] Fix empty window assignments in Nexmark [klk] Fix empty window assignment in FlattenEvaluatorFactoryTest [klk] Switch DataflowRunner to its own

Re: coder evolutions?

2018-02-05 Thread Romain Manni-Bucau
Thanks, created https://issues.apache.org/jira/browse/BEAM-3616 Romain Manni-Bucau