Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-07-09 Thread Jean-Baptiste Onofré
The PR has been merged, it looks good to me. Thanks! Regards JB On 10/07/2018 01:18, Lukasz Cwik wrote: > Actually, it hasn't been reviewed yet. Here is the > PR: https://github.com/apache/beam/pull/5571 > > On Mon, Jul 9, 2018 at 4:16 PM Lukasz Cwik > wrote: > >

Re: [PROPOSAL] Prepare Beam 2.6.0 release

2018-07-09 Thread Jean-Baptiste Onofré
+1 I planned to send the proposal as well ;) Regards JB On 09/07/2018 23:16, Pablo Estrada wrote: > Hello everyone! > > As per the previously agreed-upon schedule for Beam releases, the > process for the 2.6.0 Beam release should start on July 17th. > > I volunteer to perform this release.  >

CODEOWNERS for apache/beam repo

2018-07-09 Thread Udi Meiri
Hi everyone, I'm proposing to add auto-reviewer-assignment using GitHub's CODEOWNERS mechanism. Initial version is here: https://github.com/apache/beam/pull/5909/files I need help from the community in determining owners for each component. Feel

[PROPOSAL] Prepare Beam 2.6.0 release

2018-07-09 Thread Pablo Estrada
Hello everyone! As per the previously agreed-upon schedule for Beam releases, the process for the 2.6.0 Beam release should start on July 17th. I volunteer to perform this release. Here is the schedule that I have in mind: - We start triaging JIRA issues this week. - I will cut a release

Re: Building the Java SDK container with Jib?

2018-07-09 Thread Andrew Pilloud
This sounds really cool! I spent a minute looking at our current container code. We have a golang wrapper that bootstraps our Java SDK harness, which isn't compatible with Jib's current feature set (you can't add custom files or override the ENTRYPOINT). It might be quite a bit of work to move.

Building the Java SDK container with Jib?

2018-07-09 Thread Eugene Kirpichov
Hi, Apparently a new tool has come out that lets you build Java containers cheaply, without even having Docker installed: https://cloudplatform.googleblog.com/2018/07/introducing-jib-build-java-docker-images-better.html Anyone interested in giving it a shot, to have faster turnaround when

Re: Beam Dependency Ownership

2018-07-09 Thread Yifan Zou
If you haven't already, please take a look at the Beam SDK Dependency Ownership and sign up for any dependencies that you are familiar with. In case anyone missed it, there is a second tab for

Re: Invite to comment on the @RequiresStableInput design doc

2018-07-09 Thread Lukasz Cwik
I'm also thinking that it would be best to apply to the whole transform. So side inputs, main inputs, timers and any future input constructs. On Sat, Jul 7, 2018 at 2:00 PM Reuven Lax wrote: > I think the entire transform. There might be some use case for having only > some inputs stable, but

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Lukasz Cwik
Instead of reverting/working around specific checks/tests that the DirectRunner is doing, have you considered using one of the other runners like Flink or Spark with a local execution cluster. You won't hit the validation/verification bottlenecks that DirectRunner specifically imposes. On Mon,

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Jean-Baptiste Onofré
Thanks for the update, Eugene. @Vojta: would you mind creating a Jira? I will tackle a fix for that. Regards JB On 09/07/2018 17:33, Eugene Kirpichov wrote: > Hi - > > If I remember correctly, the reason for this change was to ensure that > the state is encodable at all. Prior to the change,

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Eugene Kirpichov
Hi - If I remember correctly, the reason for this change was to ensure that the state is encodable at all. Prior to the change, there had been situations where the coder specified on a state cell is buggy, absent or set incorrectly (due to some issue in coder inference), but direct runner did not
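A minimal sketch of the kind of check described above, assuming it amounts to passing each state write through the cell's coder. This is not Beam's implementation: `PickleCoder` and `CheckedStateCell` are hypothetical names introduced only for illustration. It shows why such a check catches a missing or broken coder early, and why it adds an encode/decode cost to every write.

```python
import pickle


class PickleCoder:
    """Hypothetical coder: turns a value into bytes and back."""

    def encode(self, value) -> bytes:
        return pickle.dumps(value)

    def decode(self, data: bytes):
        return pickle.loads(data)


class CheckedStateCell:
    """Hypothetical state cell that round-trips every write through its coder."""

    def __init__(self, coder):
        # An absent coder is rejected up front instead of being silently ignored.
        if coder is None:
            raise ValueError("state cell has no coder")
        self._coder = coder
        self._value = None
        self.round_trips = 0

    def write(self, value):
        # Encoding and then decoding fails fast if the coder cannot handle
        # the value, at the price of one full copy per write.
        self._value = self._coder.decode(self._coder.encode(value))
        self.round_trips += 1

    def read(self):
        return self._value


cell = CheckedStateCell(PickleCoder())
original = {"key": [1, 2, 3]}
cell.write(original)
stored = cell.read()
```

Here `stored` is equal to `original` but is a separate copy, which is the extra work a runner skips if it keeps the object by reference.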

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Jean-Baptiste Onofré
Hi Vojta, I fully agree, that's why it makes sense to wait for Eugene's feedback. I remember we had some performance regression on the direct runner identified thanks to Nexmark, but it has been addressed by reverting a change. Good catch anyway! Regards JB On 09/07/2018 17:20, Vojtech Janota

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Vojtech Janota
Hi Reuven, I'm not really complaining about DirectRunner. In fact it seems to me as if what previously was considered as part of the "expensive extra checks" done by the DirectRunner is now done within the beam-runners-core-java library. Considering that all objects involved are immutable (in our

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Reuven Lax
Hi Vojta, One problem is that the DirectRunner is designed for testing, not for performance. The DirectRunner currently does many purposely-inefficient things, the point of which is to better expose potential bugs in tests. For example, the DirectRunner will randomly shuffle the order of

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Jean-Baptiste Onofré
Hi, Do you use specific/complex coders in your pipeline? I'm sure Eugene will propose some insights about this change: AFAIR, the purpose is to have a cleaner use of coders and identify identity copy. Regards JB On 09/07/2018 16:22, Vojtech Janota wrote: > Hi, > > We are using Apache Beam in

Performance issue in Beam 2.4 onwards

2018-07-09 Thread Vojtech Janota
Hi, We are using Apache Beam in our project for some time now. Since our datasets are of modest size, we have so far used DirectRunner as the computation easily fits onto a single machine. Recently we upgraded Beam from 2.2 to 2.4 and found out that performance of our pipelines drastically

Beam Dependency Check Report (2018-07-09)

2018-07-09 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:

Dependency Name | Current Version | Latest Version | Release Date Of The Current Used Version | Release Date Of The Latest Release
dill | 0.2.6 | 0.2.8.2 | 2017-02-01 | 2018-06-25

Re: Ability to read from UTF-16 or UTF-32 encoded files?

2018-07-09 Thread Etienne Chauchot
Hi, Just a small clarification. TextIO actually already supports a custom multi-byte delimiter in place of newlines. See TextIO#withDelimiter(byte[] delimiter). Etienne On Saturday, 07 July 2018 at 16:15 -0700, Robert Bradshaw wrote: > Currently TextIO scans for newlines to find line (record)
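To illustrate why a multi-byte delimiter helps with UTF-16 input, here is a small standalone sketch in Python. It does not use Beam or `TextIO`; `split_records` is a hypothetical helper, and the alignment check is an assumption about what a careful reader of UTF-16 data would want, not something the thread states. In UTF-16-LE a newline is the two bytes `0A 00`, and the same byte pair can also appear straddling two adjacent characters, so the sketch only accepts matches that start on a code-unit boundary.

```python
def split_records(data: bytes, delimiter: bytes, unit: int = 2) -> list:
    """Split data on delimiter, accepting only matches aligned to `unit` bytes."""
    records = []
    start = 0  # start of the current record
    pos = 0    # where to resume searching
    while True:
        hit = data.find(delimiter, pos)
        if hit == -1:
            break
        if hit % unit == 0:
            # Aligned match: close the current record and skip the delimiter.
            records.append(data[start:hit])
            start = pos = hit + len(delimiter)
        else:
            # Misaligned match spans two characters; keep searching.
            pos = hit + 1
    if start < len(data):
        records.append(data[start:])
    return records


delim = "\n".encode("utf-16-le")  # b"\n\x00"
raw = "alpha\nbeta\ngamma".encode("utf-16-le")
lines = [r.decode("utf-16-le") for r in split_records(raw, delim)]

# U+0A41 followed by U+0100 encodes to 41 0A 00 01, which contains the
# delimiter bytes at an odd offset and must not be treated as a line break.
tricky = "\u0a41\u0100".encode("utf-16-le")
tricky_records = split_records(tricky, delim)
```

A plain byte-level split on `0A 00` would cut the second input in the middle of a character, which is the kind of pitfall to keep in mind when passing an encoded newline as a raw delimiter.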