Re: Java Precommit duration

2018-10-25 Thread Ruoyun Huang
I was trying to reproduce the issue and understand the situation. By "restoring parallel build", do you mean "org.gradle.parallel" in gradle.properties? For me, regardless of whether this Gradle parallel property is on or off, running javaPreCommit always fails on target "

Beam Community Metrics

2018-10-25 Thread Scott Wegner
I want to summarize some of the great work done this summer by Mikhail, Udi, and Huygaa to visualize and track some project/community health metrics for Beam. Specifically, they've helped to build dashboards for: * Test suite health (pre-commit speed, post-commit reliability) * Pull Request health

Growing Beam -- A call for ideas? What is missing? What would be good to see?

2018-10-25 Thread Austin Bennett
Hi Beam Devs and Users, Trying to get a sense from the community on the sorts of things we think would be useful to build the community (I am thinking not from an angle of specific code/implementation/functionality, but from a user/usability -- I want to dive in and make real contributions with

Re: Roadmap section on IO related features

2018-10-25 Thread Jean-Baptiste Onofré
Agree, I think connector is a more meaningful name for users. IO is more the Beam "internal" wording. I will update this section as I have new connectors ( :) ) on the fly. Regards JB On 26/10/2018 04:49, Kenneth Knowles wrote: > My $0.02 > > "IO" has an established meaning in Beam dev argot but

Re: Java Precommit duration

2018-10-25 Thread Kenneth Knowles
At this point I think the gains would be much less from further splitting. I am looking into parallel build restoration. Is it true that there were primarily the two failures? https://issues.apache.org/jira/browse/BEAM-5207 :beam-runners-apex:compileTestJava

Re: Roadmap section on IO related features

2018-10-25 Thread Kenneth Knowles
My $0.02 "IO" has an established meaning in Beam dev argot but I think on the web page I would use the word "connector" or something more universal. On Thu, Oct 25, 2018 at 7:39 PM Chamikara Jayalath wrote: > > (1) Add a top level IO roadmap. > I like this, but it is important on the roadmap

Roadmap section on IO related features

2018-10-25 Thread Chamikara Jayalath
(Forwarding to dev list from the PR on roadmap.) Given that we now have a main roadmap web page with some content, we were wondering if there should be a section on IO-related efforts. We already have the following web page on current and upcoming IO transforms.

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-25 Thread Ahmet Altay
On Tue, Oct 23, 2018 at 3:03 PM, Kenneth Knowles wrote: > Yes, user@ cannot reach new users, really. Twitter might, if we have > enough of adjacent followers to get it in front of the right people. On the > other hand, I find testimonials from experience convincing in this case. > I agree I am

Re: Java Precommit duration

2018-10-25 Thread Scott Wegner
Splitting into separate jobs that can be parallelized seems like a win for as long as Gradle task parallelization is disabled. Thanks for driving this improvement. > I'm in favor of (simple!) build breaks going in before precommits finish, on the promise that the offending test(s) passed locally.

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Rui Wang
Fantastic! -Rui On Thu, Oct 25, 2018 at 2:15 PM Mark Liu wrote: > That's awesome! Thanks Alan. > > Mark > > On Thu, Oct 25, 2018 at 2:07 PM Kenneth Knowles wrote: > >> It doesn't do the work for me, because I do have write access, and >> seemingly does not have a toggle. I'm jealous of your

Re: Data Preprocessing in Beam

2018-10-25 Thread Alex
Great! Right now there is a lot in that code I do not understand; I hope I can read up on it in the next few days. Should I reimplement my algorithms in Scala? Or could I create a wrapper that interfaces with the sketching extension? Cheers. On Oct 24, 2018 15:00, Maximilian Michels wrote: > >

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Kenneth Knowles
It doesn't do the work for me, because I do have write access, and seemingly does not have a toggle. I'm jealous of your superior workflow :-) Kenn On Thu, Oct 25, 2018 at 1:10 PM Jeff Klukas wrote: > On Thu, Oct 25, 2018 at 3:17 PM Kenneth Knowles wrote: > >> What I haven't figured out is

Re: Unbalanced FileIO writes on Flink

2018-10-25 Thread Reuven Lax
FYI the Dataflow runner automatically sets the default number of shards (I believe to be 2 * num_workers). Probably we should do something similar for the Flink runner. This needs to be done by the runner, as # of workers is a runner concept; the SDK itself has no concept of workers. On Thu, Oct
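As a rough illustration of the default described above, a runner could fall back to a worker-based shard count when the user has not set one. This is only a sketch: `DefaultSharding`, `resolveNumShards`, and the convention that 0 means "not set" are hypothetical names and assumptions, not Beam or Flink runner code, and the factor of 2 is taken from the "I believe" remark in the message.

```java
/** Hypothetical helper, not part of Beam or the Flink runner. */
final class DefaultSharding {
    /** Multiplier taken from the message above ("I believe to be 2 * num_workers"). */
    static final int SHARDS_PER_WORKER = 2;

    /**
     * Returns the shard count to use for a write.
     * Assumption of this sketch: userNumShards == 0 means "not set by the user".
     */
    static int resolveNumShards(int userNumShards, int numWorkers) {
        if (userNumShards < 0) {
            throw new IllegalArgumentException("userNumShards must be >= 0");
        }
        if (numWorkers < 1) {
            throw new IllegalArgumentException("numWorkers must be >= 1");
        }
        // An explicit user choice wins; otherwise derive the count from the worker count.
        return userNumShards > 0 ? userNumShards : SHARDS_PER_WORKER * numWorkers;
    }

    public static void main(String[] args) {
        System.out.println(resolveNumShards(0, 4)); // prints 8
        System.out.println(resolveNumShards(3, 4)); // prints 3
    }
}
```

Because the worker count is a runner concept, a helper like this would have to live in the runner rather than in the SDK, which matches the point made in the message.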

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Jeff Klukas
On Thu, Oct 25, 2018 at 3:17 PM Kenneth Knowles wrote: > What I haven't figured out is how to get GitHub to create the branch for > the PR on your fork. > GitHub does that work for you. I don't have commit access to apache/beam, so when I hit the link, it gives me a banner at the top of the

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-25 Thread Thomas Weise
Reminder that this is something we ideally address before the next release... Considering the discussion so far, my preference is that we get away from unknown options and discover valid options from the runner (by expanding the job service). Once the SDK is aware of all valid options, it is

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Kenneth Knowles
What I haven't figured out is how to get GitHub to create the branch for the PR on your fork. Kenn On Thu, Oct 25, 2018 at 5:41 AM Jeff Klukas wrote: > Max - The website source was indeed merged into the main beam repository a > few weeks ago, separate from this change. > > The edit button is

Re: KafkaIO - Deadletter output

2018-10-25 Thread Raghu Angadi
On Thu, Oct 25, 2018 at 10:47 AM Chamikara Jayalath wrote: > > > On Thu, Oct 25, 2018 at 10:41 AM Raghu Angadi wrote: > >> >> On Thu, Oct 25, 2018 at 10:28 AM Chamikara Jayalath >> wrote: >> >>> Not sure if I understand why this would require Kafka to behave as two >>> independent sources. >>>

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-25 Thread Lukasz Cwik
That's a good point, Thomas; I hadn't considered the lib/ case. I am also recommending what Thomas is suggesting. On Thu, Oct 25, 2018 at 10:52 AM Maximilian Michels wrote: > On 25.10.18 19:23, Lukasz Cwik wrote: > > > > > > On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels >

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-25 Thread Maximilian Michels
On 25.10.18 19:23, Lukasz Cwik wrote: On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels wrote: Question: How would a user end up with the same shaded dependency twice? The shaded dependencies are transitive dependencies of Beam and thus, this

Re: KafkaIO - Deadletter output

2018-10-25 Thread Chamikara Jayalath
On Thu, Oct 25, 2018 at 10:02 AM Raghu Angadi wrote: > On Wed, Oct 24, 2018 at 11:54 PM Reuven Lax wrote: > [...] > >> KafkaIO has a few in-built policies for watermark and timestamp that >>> cover most use cases (including server time, which has a benefit of >>> providing perfect watermark).

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-25 Thread Lukasz Cwik
On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels wrote: > Question: How would a user end up with the same shaded dependency twice? > The shaded dependencies are transitive dependencies of Beam and thus, > this shouldn't happen. Is this a safe-guard when running different > versions of Beam in

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-25 Thread Tim
I can do some tests on Spark / YARN tomorrow (CEST timezone). Sorry I’ve just been too busy to assist. Tim > On 25 Oct 2018, at 18:59, Kenneth Knowles wrote: > > I tried to do a more thorough job on this. > > - I could not reproduce the slowdown in Query 9. I believe the variance was >

Re: [DISCUSS] Beam public roadmap

2018-10-25 Thread Thomas Weise
Hi Kenn, This looks great, thanks! As follow-up, we can probably also move the following to the Wiki: https://beam.apache.org/contribute/design-documents/ Thomas On Wed, Oct 24, 2018 at 7:45 PM Kenneth Knowles wrote: > OK. I have taken everyone's feedback into account. Preview at >

Re: KafkaIO - Deadletter output

2018-10-25 Thread Raghu Angadi
On Wed, Oct 24, 2018 at 11:54 PM Reuven Lax wrote: [...] > KafkaIO has a few in-built policies for watermark and timestamp that cover >> most use cases (including server time, which has a benefit of providing >> perfect watermark). It also gives fairly complete control on these to users >> if

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-25 Thread Kenneth Knowles
I tried to do a more thorough job on this. - I could not reproduce the slowdown in Query 9. I believe the variance was simply high given the parameters and environment - I saw the same slowdown in Query 8 when running as part of the suite, but it vanished when I ran repeatedly on its own, so

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-25 Thread Maximilian Michels
Question: How would a user end up with the same shaded dependency twice? The shaded dependencies are transitive dependencies of Beam and thus, this shouldn't happen. Is this a safe-guard when running different versions of Beam in the same JVM? Other than the search/replace issue, the full

Re: [DISCUSS] Separate Jenkins notifications to a new mailing list

2018-10-25 Thread Rui Wang
Thanks Kenneth for handling this! -Rui On Thu, Oct 25, 2018 at 8:47 AM Kenneth Knowles wrote: > OK, this is done. All emails go to builds@. > > Kenn > > On Mon, Oct 22, 2018 at 7:12 AM Kenneth Knowles wrote: > >> OK, bui...@beam.apache.org now exists. >> >> I've opened

Re: [DISCUSS] Separate Jenkins notifications to a new mailing list

2018-10-25 Thread Kenneth Knowles
OK, this is done. All emails go to builds@. Kenn On Mon, Oct 22, 2018 at 7:12 AM Kenneth Knowles wrote: > OK, bui...@beam.apache.org now exists. > > I've opened https://github.com/apache/beam/pull/6775 to migrate but first > make the discussion more specific. There are three types of

Jenkins build is back to normal : beam_SeedJob #2866

2018-10-25 Thread Apache Jenkins Server
See

Build failed in Jenkins: beam_SeedJob #2865

2018-10-25 Thread Apache Jenkins Server
See -- GitHub pull request #6808 of commit d2eff4579a4282bf9168bb7259610cd3af20ec16, no merge conflicts. Setting status of d2eff4579a4282bf9168bb7259610cd3af20ec16 to PENDING with url

Build failed in Jenkins: beam_SeedJob #2864

2018-10-25 Thread Apache Jenkins Server
See -- GitHub pull request #6808 of commit cabc23595747d456f4e6d6b67bdce935d3d085e2, no merge conflicts. Setting status of cabc23595747d456f4e6d6b67bdce935d3d085e2 to PENDING with url

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Jeff Klukas
Max - The website source was indeed merged into the main beam repository a few weeks ago, separate from this change. The edit button is a great idea! On Thu, Oct 25, 2018 at 7:37 AM Maximilian Michels wrote: > Cool! > > I guess the underlying change is that the website can now be edited >

Re: Java Precommit duration

2018-10-25 Thread Kenneth Knowles
The split did seemingly trim about 30 minutes off the Java precommit. Of course the difference between 50 and 80 minutes won't qualitatively change much. I don't see any other obvious and easy wins. I still like the split for the separation of signals. Kenn On Tue, Oct 23, 2018 at 2:47 PM Robert

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Maximilian Michels
Cool! I guess the underlying change is that the website can now be edited through the main repository and we don't have to go through "beam-site"? -Max On 25.10.18 12:20, Alexey Romanenko wrote: This is really cool feature! With a tab “Preview changes” it makes documentation updating much

Re: Unbalanced FileIO writes on Flink

2018-10-25 Thread Jozef Vilcek
If I do not specify shards for unbounded collection, I get Caused by: java.lang.IllegalArgumentException: When applying WriteFiles to an unbounded PCollection, must specify number of output shards explicitly at

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Alexey Romanenko
This is a really cool feature! With the “Preview changes” tab, it makes updating documentation much easier. Thanks a lot to Alan and Scott! > On 25 Oct 2018, at 09:48, Robert Bradshaw wrote: > > Very cool! Thanks! > On Thu, Oct 25, 2018 at 9:38 AM Connell O'Callaghan > wrote: >> >>

Re: Unbalanced FileIO writes on Flink

2018-10-25 Thread Maximilian Michels
I agree it would be nice to keep the current distribution of elements instead of doing a shuffle based on an artificial shard key. Have you tried `withWindowedWrites()`? Also, why do you say you need to specify the number of shards in streaming mode? -Max On 25.10.18 10:12, Jozef Vilcek

Re: Unbalanced FileIO writes on Flink

2018-10-25 Thread Jozef Vilcek
Hm, yes, this makes sense now, but what can be done for my case? I do not want to end up with too many files on disk. I think what I am looking for is a way to instruct the IO not to do another random sharding and reshuffle, but just to assume the number of shards equals the number of workers and the shard ID is a

Re: KafkaIO - Deadletter output

2018-10-25 Thread Jozef Vilcek
What I ended up doing, when I could not for whatever reason rely on Kafka timestamps but needed to parse them from the message, is: * have a custom Kafka deserializer which never throws but returns a message which is either a success with the parsed data structure plus timestamp, or a failure with the original Kafka
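The "never throws" idea above can be sketched in plain Java without any Kafka dependency. Everything here is an assumption made for illustration: the names `ParseResult` and `NonThrowingParser`, and the wire format `<ISO-8601 instant>|<payload>`, do not come from the thread. In a real pipeline the same logic would presumably sit inside a Kafka `Deserializer`, with failures routed to a dead-letter output downstream.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;

/** Hypothetical result type: a parsed payload with its timestamp, or the raw bytes with an error. */
final class ParseResult {
    final String payload;    // set on success, null on failure
    final Instant timestamp; // set on success, null on failure
    final byte[] rawBytes;   // set on failure, null on success
    final String error;      // set on failure, null on success

    private ParseResult(String payload, Instant timestamp, byte[] rawBytes, String error) {
        this.payload = payload;
        this.timestamp = timestamp;
        this.rawBytes = rawBytes;
        this.error = error;
    }

    static ParseResult success(String payload, Instant timestamp) {
        return new ParseResult(payload, timestamp, null, null);
    }

    static ParseResult failure(byte[] rawBytes, String error) {
        return new ParseResult(null, null, rawBytes, error);
    }

    boolean isSuccess() {
        return error == null;
    }
}

/** Hypothetical parser that reports every problem as a failure value instead of throwing. */
final class NonThrowingParser {
    /** Assumed format, for illustration only: "<ISO-8601 instant>|<payload>". */
    static ParseResult parse(byte[] raw) {
        if (raw == null) {
            return ParseResult.failure(new byte[0], "null message");
        }
        try {
            String text = new String(raw, StandardCharsets.UTF_8);
            int sep = text.indexOf('|');
            if (sep < 0) {
                return ParseResult.failure(raw, "missing '|' separator");
            }
            Instant timestamp = Instant.parse(text.substring(0, sep));
            return ParseResult.success(text.substring(sep + 1), timestamp);
        } catch (RuntimeException e) {
            // Includes DateTimeParseException from Instant.parse; keep the original bytes.
            return ParseResult.failure(raw, "unparseable message: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        byte[] good = "2018-10-25T10:00:00Z|hello".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "not-a-date|x".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(good).isSuccess()); // prints true
        System.out.println(parse(bad).isSuccess());  // prints false
    }
}
```

Keeping the original bytes on the failure side means a later step can write them to a dead-letter destination unchanged, while the success side carries the timestamp parsed from the message.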