Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-06 Thread Jan Lukavský
Hi Kenn, that should not be the case. Care was taken to fail a streaming pipeline that needs this ability when the runner doesn't support it [1]. It is true, however, that a batch pipeline will not fail, because there is no generic (runner agnostic) way of supporting this transform in batch

Re: Time precision in Python

2020-02-06 Thread Kenneth Knowles
What is an out of order window? On Thu, Feb 6, 2020 at 3:09 PM Sam Rohde wrote: > Gotcha, I was just surprised by the precision loss. Thanks! > > On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw > wrote: > >> Yes, the inconsistency of timestamp granularity is something that >> hasn't yet been

Re: Transitive dependency from external repository

2020-02-06 Thread Luke Cwik
It could do that as well. On Thu, Feb 6, 2020 at 11:25 AM Kenneth Knowles wrote: > That XML-generating code should be able to traverse project.repositories > and add them on a per-module basis, no? > > On Thu, Feb 6, 2020 at 9:47 AM Luke Cwik wrote: > >> We generate the pom using Gradle

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-06 Thread Kenneth Knowles
There is a major problem with this merge: the runners that do not support it do not reject pipelines that need this feature. They will silently produce the wrong answer, causing data loss. Kenn On Thu, Feb 6, 2020 at 3:24 AM Jan Lukavský wrote: > Hi, > > the PR was merged to master and a few

Re: Time precision in Python

2020-02-06 Thread Sam Rohde
Gotcha, I was just surprised by the precision loss. Thanks! On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw wrote: > Yes, the inconsistency of timestamp granularity is something that > hasn't yet been resolved (see previous messages on this list). As long > as we round consistently, it won't

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Robert Bradshaw
Thanks! On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía wrote: > Thanks Kamil and Michał for taking care of this. > Excellent job! > > On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski < > kamil.wasilew...@polidea.com> wrote: > >> Thanks to everyone involved in the discussion. >> >> I've taken a look

Re: Time precision in Python

2020-02-06 Thread Robert Bradshaw
Yes, the inconsistency of timestamp granularity is something that hasn't yet been resolved (see previous messages on this list). As long as we round consistently, it won't result in out-of-order windows, but it may result in timestamp truncation and (for sub-millisecond small windows) even window
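
A minimal sketch of the point about consistent rounding, not Beam's actual encoding: it assumes integer microsecond timestamps that are reduced to milliseconds by floor division. The helper name micros_to_millis and the sample values are hypothetical.

```python
def micros_to_millis(micros: int) -> int:
    """Reduce a microsecond timestamp to milliseconds, always rounding down."""
    # Floor division rounds toward negative infinity for every input,
    # so the same rule applies to timestamps before and after the epoch.
    return micros // 1000


# Hypothetical timestamps in microseconds, already in ascending order.
timestamps_micros = [1_000_100, 1_000_900, 1_001_000]

# Apply the same rounding rule to every timestamp.
timestamps_millis = [micros_to_millis(t) for t in timestamps_micros]

# The result is still in non-decreasing order, but precision is lost:
# the first two distinct inputs now share one millisecond value.
order_preserved = timestamps_millis == sorted(timestamps_millis)
distinct_before = len(set(timestamps_micros))
distinct_after = len(set(timestamps_millis))
```

Because floor division is monotonic, ordering survives the conversion, while distinct sub-millisecond timestamps can collapse into the same value, which is the truncation mentioned above.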

Time precision in Python

2020-02-06 Thread Sam Rohde
Hi All, I saw that in the Python SDK we encode WindowedValues and Timestamps

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Ismaël Mejía
Thanks Kamil and Michał for taking care of this. Excellent job! On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski < kamil.wasilew...@polidea.com> wrote: > Thanks to everyone involved in the discussion. > > I've taken a look at the first 50 recently updated Pull Requests. Only a few > of them were

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-06 Thread Ismaël Mejía
Hehe yes sorry I should have waited a bit. Thanks for taking care of it Brian. On Thu, Feb 6, 2020 at 10:12 PM Brian Hulette wrote: > Seems to be green now. Just had to wait for another run after the schema > update. > > On Thu, Feb 6, 2020 at 9:47 AM Tomo Suzuki wrote: > >> Ismaël, >> >> The

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-06 Thread Brian Hulette
Seems to be green now. Just had to wait for another run after the schema update. On Thu, Feb 6, 2020 at 9:47 AM Tomo Suzuki wrote: > Ismaël, > > The latest postcommit failure > https://builds.apache.org/job/beam_PostCommit_SQL/3951/ was 4:15:45 PM > Brian's successful case >

Re: Transitive dependency from external repository

2020-02-06 Thread Kenneth Knowles
That XML-generating code should be able to traverse project.repositories and add them on a per-module basis, no? On Thu, Feb 6, 2020 at 9:47 AM Luke Cwik wrote: > We generate the pom using Gradle here[1]. > > The issue is that it applies to all beam modules and what you are asking > for isn't

Re: Deterministic field ordering in derived schemas

2020-02-06 Thread Reuven Lax
On Thu, Feb 6, 2020 at 12:57 AM Gleb Kanterov wrote: > Field ordering matters, for instance, for a batch pipeline writing to a > non-partitioned BigQuery table. Each partition is a new table with its own > schema. Each day a new table would have non-deterministic field ordering. > It's arguable if

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-06 Thread Tomo Suzuki
Ismaël, The latest postcommit failure https://builds.apache.org/job/beam_PostCommit_SQL/3951/ was 4:15:45 PM Brian's successful case https://builds.apache.org/job/beam_PostCommit_SQL_PR/243/ started 4:29:57 PM. I hope the next SQL postcommit should succeed. On Thu, Feb 6, 2020 at 11:57 AM Ismaël

Re: Transitive dependency from external repository

2020-02-06 Thread Luke Cwik
We generate the pom using Gradle here[1]. The issue is that it applies to all beam modules and what you are asking for isn't currently plumbed through. You could try adding an option to the JavaNatureConfiguration[2] and then specify the additional repository in your module. 1:

Re: Transitive dependency from external repository

2020-02-06 Thread Jean-Baptiste Onofre
Like this: repositories { jcenter() maven { url "https://plugins.gradle.org/m2/" } maven { url "https://repo.spring.io/plugins-release/" content { includeGroup "io.spring.gradle" } } maven { url "foo" } } > On 6 Feb 2020, at 18:37, Jean-Baptiste Onofre wrote: > > Great,

Re: Transitive dependency from external repository

2020-02-06 Thread Jean-Baptiste Onofre
Great, thanks! Back to your question, I guess we can add the repository in buildSrc/build.gradle (repositories property). Regards JB > On 6 Feb 2020, at 18:33, Alexey Romanenko wrote: > > Yes, it's Apache License 2.0 > >

Re: Transitive dependency from external repository

2020-02-06 Thread Alexey Romanenko
Yes, it's Apache License 2.0 https://packages.confluent.io/maven/io/confluent/kafka-avro-serializer/5.4.0/kafka-avro-serializer-5.4.0.pom > On 6 Feb 2020, at 18:12, Jean-Baptiste

Re: Transitive dependency from external repository

2020-02-06 Thread Jean-Baptiste Onofre
Hi, Just a side note: did you check the license of the dependency (just to be sure it’s not a Cat X dependency)? Regards JB > On 6 Feb 2020, at 18:06, Alexey Romanenko wrote: > > Hi, > > To add support of Confluent Registry Schema in KafkaIO we added a new > dependency on

Transitive dependency from external repository

2020-02-06 Thread Alexey Romanenko
Hi, To add support of Confluent Registry Schema in KafkaIO we added a new dependency on “io.confluent:kafka-avro-serializer”. The artifacts of this dependency exist in an external repository [1]. So, it should not be a problem to add this repository into the list of available repositories of Beam

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-06 Thread Ismaël Mejía
This one is still broken, maybe there are two different data sources, one for the '_PR' version and the normal one, can you please confirm? https://builds.apache.org/job/beam_PostCommit_SQL/ On Thu, Feb 6, 2020 at 5:44 PM Brian Hulette wrote: > Sorry for the delay. I had some issues updating

Re: Deterministic field ordering in derived schemas

2020-02-06 Thread Luke Cwik
Out of curiosity, in what cases would Schema.fields[index] not represent the encoding_position? On Thu, Feb 6, 2020 at 12:57 AM Gleb Kanterov wrote: > Field ordering matters, for instance, for a batch pipeline writing to a > non-partitioned BigQuery table. Each partition is a new table with its own >

Re: SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-06 Thread Brian Hulette
Sorry for the delay. I had some issues updating the schema; I ended up having to drop it and re-create it for some reason. Looks like SQL PostCommit is green on https://github.com/apache/beam/pull/10765 now. > setting up from scratch is a good idea. +1, I filed

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Etienne Chauchot
Hi, please see my comments inline On 06/02/2020 16:24, Alexey Romanenko wrote: Please, see my comments inline. On 6 Feb 2020, at 10:50, Etienne Chauchot > wrote: 1. regarding version support: ES v2 has not been maintained by Elastic since 2018/02

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Alexey Romanenko
Please, see my comments inline. > On 6 Feb 2020, at 10:50, Etienne Chauchot wrote: 1. regarding version support: ES v2 has not been maintained by Elastic since 2018/02 so we plan to remove it from the IO. In the past we already retired versions (like spark 1.6 for instance). >>

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Kamil Wasilewski
Thanks to everyone involved in the discussion. I've taken a look at the first 50 recently updated Pull Requests. Only a few of them were affected. I hope it won't be too hard to fix them. In any case, here you can find instructions on how to run the formatter:

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Michał Walenia
Hi, the PR is merged, all checks were green :) Enjoy prettier Python! On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía wrote: > Agree no need for vote for this because the consensus is clear and the sole > impact I can think of are pending PRs that will be broken. In the Java case > what we did was

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-06 Thread Jan Lukavský
Hi, the PR was merged to master and a few follow-up issues were created, mainly [1] and [2]. I didn't find any reference to SortedMapState in JIRA; is there a tracking issue for it that I can link to? I also added a link to the design document here [3]. [1]

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Jean-Baptiste Onofre
Hi, Let’s sync together about this IO. Regarding mock and IOs, and Etienne’s comment, there are two things: 1. Of course, it’s always preferable to use a concrete backend, but several times it’s not possible. That is where a mock is required. 2. The mock can be smart enough to cover core IO behavior

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Ismaël Mejía
Agree no need for vote for this because the consensus is clear and the sole impact I can think of are pending PRs that will be broken. In the Java case what we did was to just notice every PR that was affected by the change. And clearly document how to validate and autoformat the code. So the

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Etienne Chauchot
Hi, Thanks all for your comments, my comments are inline On 06/02/2020 09:47, Ludovic Boutros wrote: Hi all, First, thank you all for your answers and especially, Etienne for your time, advice and kindness :) @Jean-Baptiste, any help on this module is welcome of course. @Chamikara

Re: Deterministic field ordering in derived schemas

2020-02-06 Thread Gleb Kanterov
Field ordering matters, for instance, for a batch pipeline writing to a non-partitioned BigQuery table. Each partition is a new table with its own schema. Each day a new table would have non-deterministic field ordering. It's arguable if it's a good practice to define a table schema using a Java class, even

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Ludovic Boutros
Hi all, First, thank you all for your answers and especially, Etienne for your time, advice and kindness :) @Jean-Baptiste, any help on this module is welcome of course. @Chamikara Jayalath, my answers are inline. Have a good day! Ludovic On Wed, 5 Feb 2020 at 20:15, Chamikara Jayalath