Re: Growing Beam -- A call for ideas? What is missing? What would be good to see?

2018-10-26 Thread Alejandro
Hello, although this is not exactly what you intended, I am also looking to contribute to Beam, but from a code perspective. I've been discussing with some Beam members, such as Austin and Lukasz (CCed), how to integrate https://github.com/elbaulp/DPASF into Beam. It seems the best place for these algorithms

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Jean-Baptiste Onofré
+1 (binding) Quickly tested with beam-samples. Regards JB On 26/10/2018 17:05, Tim Robertson wrote: > A colleague and I tested on 2.7.0 and 2.8.0RC1: > > 1. Quickstart on Spark/YARN/HDFS (CDH 5.12.0) (commented in spreadsheet) > 2. Our Avro to Avro pipelines on Spark/YARN/HDFS (note we backport

Re: Is it possible to be added to the apache beam testing group?

2018-10-26 Thread Alan Myrvold
I've added you to the apache-beam-jenkins account. We try to limit the users with access and expect that the test results are debuggable. Once you find the issue, if there are any testability improvements to make, please let me know. On Fri, Oct 26, 2018 at 1:59 PM Alex Amato wrote: > I was tryi

[PROPOSAL] Bundle splitting (https://s.apache.org/beam-checkpoint-and-split-bundles)

2018-10-26 Thread Lukasz Cwik
I build off of the work performed by Eugene et al. within Breaking the fusion barrier[2] and propose[1] a way to support splitting of bundles (primarily for SplittableDoFn) within the portability layer. This also builds off of a lot of past work[3, 4, 5, 6, 7] related to splitting. Note tha

Re: Growing Beam -- A call for ideas? What is missing? What would be good to see?

2018-10-26 Thread Rose Nguyen
I've heard of many people referring to the Medium posts related to Beam for step-by-step tutorials. https://medium.com/tag/apache-beam/latest On Thu, Oct 25, 2018 at 9:25 PM Austin Bennett wrote: > Hi Beam Devs and Users, > > Trying to get a sense from the community on the sorts of things we th

Re: Does anyone have a strong intelliJ setup?

2018-10-26 Thread Andrea Foegler
That worked! Thank you!! On Fri, Oct 26, 2018 at 2:05 PM Micah Wylde wrote: > I saw similar problems with imports from RunnerApi not showing up. The > issue is that by default IntelliJ will not analyze files that are larger > than 2MB, and RunnerApi is 2.6MB. I was able to fix this by setting >

Re: Does anyone have a strong intelliJ setup?

2018-10-26 Thread Micah Wylde
I saw similar problems with imports from RunnerApi not showing up. The issue is that by default IntelliJ will not analyze files that are larger than 2MB, and RunnerApi is 2.6MB. I was able to fix this by setting idea.max.intellisense.filesize=3000 in idea.properties (accessible via help -> edit c
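
For anyone following along, the setting described in this message would look roughly like the fragment below in idea.properties. The value 3000 is the one given in the message; as far as I know the property is expressed in kilobytes, so 3000 is about 3 MB, which covers the 2.6MB RunnerApi file. Restarting IntelliJ after the change is an assumption on my part and is not stated in the thread.

```properties
# idea.properties (custom properties file, as mentioned in the message above)
# Raise the file-size limit for code analysis so RunnerApi (about 2.6MB) is indexed.
# Assumption: the value is in kilobytes, so 3000 is roughly 3 MB.
idea.max.intellisense.filesize=3000
```

If the imports still show as missing after this change, the wiki page linked elsewhere in this thread is probably the better place to look.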

Is it possible to be added to the apache beam testing group?

2018-10-26 Thread Alex Amato
I was trying to rerun a test which failed precommit. I wanted to run this test locally (:beam-runners-google-cloud-dataflow-java-examples-streaming:preCommit) Luke mentioned that it would be easier to debug if I were added to the testing group. So that I

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Ahmet Altay
I pushed binaries to the repositories. I started a blog post draft, please feel free to make any changes directly, or comment on it [1]. I plan to publish the blog post along with an email to user@ on Monday 10/29. Ahmet [1] https://github.com/apache/beam/pull/6852 On Fri, Oct 26, 2018 at 10:16 AM,

Re: Does anyone have a strong intelliJ setup?

2018-10-26 Thread Lukasz Cwik
Yes. That is an issue that many people are running into. IntelliJ's editor does not use the build output from Gradle when providing code completion/editing support and uses a simple mechanism that indexes Java source files that is incompatible with our current shading strategy. Also, IntelliJ can'

Re: Does anyone have a strong intelliJ setup?

2018-10-26 Thread Andrea Foegler
So everything seems to build fine. But the analyzer shows missing imports for libraries that are actually part of Beam (in particular: import org.apache.beam.model.pipeline.v1.RunnerApi). I would have assumed that this option: "Delegate IDE build/run actions to gradle" would mean that the backgro

Re: Please ignore the 'Java FnApi PreCommit' and 'Java FnApi PostCommit' failures

2018-10-26 Thread Scott Wegner
Thanks for sending this heads-up. I just saw this fail on my PR [1], and looking at the Jenkins history [2] it seems like it is still mostly unstable. Can we disable this test from triggering on PRs while it is under development? If this is using PrecommitBuilder [3] I don't see a built-in way now

Re: Does anyone have a strong intelliJ setup?

2018-10-26 Thread Scott Wegner
Glad it helped! I forgot to mention here that Kenn and I did some hacking on the IntelliJ docs and the wiki now has much more information: https://cwiki.apache.org/confluence/display/BEAM/Using+IntelliJ+IDE If you're still having issues, take another look. And if you have a new issue or tip that'

Re: Does anyone have a strong intelliJ setup?

2018-10-26 Thread Andrea Foegler
Thank you for the updated docs! I just set up IntelliJ from scratch and got the build smoothly to the point where it's happy to use up all the resources on my laptop :) It was great to be able to just follow the steps provided to get up and going! On Fri, Oct 19, 2018 at 2:34 AM Maximilian Mic

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Ahmet Altay
+1 (binding) Thank you all for running validations and voting. I'm pleased to announce that the 2.8.0 RC1 is approved for release with 5 +1 votes (4 binding) and no -1 votes. I will start pushing the bits around. On Fri, Oct 26, 2018 at 9:20 AM, Maximilian Michels wrote: > +1 (binding) > > On

Re: Java Precommit duration

2018-10-26 Thread Kenneth Knowles
On Thu, Oct 25, 2018 at 10:47 PM Ruoyun Huang wrote: > I was trying to reproduce the issue and understand the situation. By > saying restoring parallel build, does that refer to "org.gradle.parallel" > in gradle.properties? > It is less tricky than that. Here's the exact change: https://github.

Re: [PROPOSAL] Additional design for the Beam Python State and Timers API

2018-10-26 Thread Kenneth Knowles
It all sounds very useful but I have basic concerns about item 1. The doc doesn't really seem to go into the design concerns that I have in mind. - map / flatMap are universal functions with definitions that we don't own and shouldn't violate - corollary: map / flatMap have per element paralleli

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Maximilian Michels
+1 (binding) On 26.10.18 17:45, Kenneth Knowles wrote: Nice. Thanks. +1 On Fri, Oct 26, 2018 at 8:44 AM Robert Bradshaw wrote: Thanks Tim! This was my only hesitation, and sounds like we're in the clear here. +1 (binding) On Fri, Oct 26, 2018

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-26 Thread Maximilian Michels
I would prefer we don't introduce a (quirky) way of passing unknown options that forces users to type JSON into the command line (or similar acrobatics) Same here, the JSON approach seems technically nice but too bulky for users. To someone wanting to run a pipeline, all options are equally im

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Kenneth Knowles
Nice. Thanks. +1 On Fri, Oct 26, 2018 at 8:44 AM Robert Bradshaw wrote: > Thanks Tim! > > This was my only hesitation, and sounds like we're in the clear here. > > +1 (binding) > On Fri, Oct 26, 2018 at 5:05 PM Tim Robertson > wrote: > > > > A colleague and I tested on 2.7.0 and 2.8.0RC1: > >

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Robert Bradshaw
Thanks Tim! This was my only hesitation, and sounds like we're in the clear here. +1 (binding) On Fri, Oct 26, 2018 at 5:05 PM Tim Robertson wrote: > > A colleague and I tested on 2.7.0 and 2.8.0RC1: > > 1. Quickstart on Spark/YARN/HDFS (CDH 5.12.0) (commented in spreadsheet) > 2. Our Avro to Av

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Tim Robertson
A colleague and I tested on 2.7.0 and 2.8.0RC1: 1. Quickstart on Spark/YARN/HDFS (CDH 5.12.0) (commented in spreadsheet) 2. Our Avro to Avro pipelines on Spark/YARN/HDFS (note we backport the un-merged BEAM-5036 fix in our code) 3. Our Avro to Elasticsearch pipelines on Spark/YARN/HDFS Everything

Re: Roadmap section on IO related features

2018-10-26 Thread Chamikara Jayalath
+1 for using the term connectors. JB, thanks for agreeing to add content to this section. - Cham On Thu, Oct 25, 2018 at 8:48 PM Ahmet Altay wrote: > I like this idea and also the first option. I agree with making it clear > that things about IO are language specific. > > And +1 to calling it

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Robert Bradshaw
We can't use Reshuffle for this, as there may be other reasons the user wants to actually force a reshuffle, but I was suggesting a transform like reshuffle that can avoid the actual reshuffle if the data is already well distributed, and also provides some kind of unique key (though perhaps just ch

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Maximilian Michels
Actually, I don't think setting the number of shards by the Runner will solve the problem. The shuffling logic still remains. And, as observed by Jozef, it doesn't necessarily lead to balanced shards. The sharding logic of the Beam IO is handy but it shouldn't be strictly necessary when the da

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Jozef Vilcek
Thanks for the JIRA. If I understand it correctly ... so runner-determined sharding will avoid the extra shuffle? Will it just write worker-local available data to its shard? Something similar to coalesce in Spark? On Fri, Oct 26, 2018 at 11:26 AM Maximilian Michels wrote: > Oh ok, thanks for the p

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Maximilian Michels
Oh ok, thanks for the pointer. Coming from Flink, the default is that the sharding is determined by the runtime distribution. Indeed, we will have to add an override to the Flink Runner, similar to this one: https://github.com/apache/beam/commit/cbb922c8a72680c5b8b4299197b515abf650bfdf#diff-a7