Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-05-31 Thread Matthias Baetens
Hey Eugene, hi all! Happy to say your talk is now on the Beam YouTube channel and can be watched here. It'd be great to see more of these on the channel so we can start sharing them at meetups, conferences and other places and see this grow, so don't hesitate to

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-31 Thread Jean-Baptiste Onofré
Hi, Regarding RabbitMqIO, Eugene provided new feedback last night that I would like to implement. However, it's not a release blocker, so I will move forward with the 2.5.0 release without RabbitMqIO (I will include it in 2.6.0). Regarding ParquetIO, I tested HDFS successfully as well (I had an issue

Apache Beam Summit in Europe

2018-05-31 Thread Matthias Baetens
Hi Beam Community, We are planning to have an Apache Beam Summit in Europe, pretty similar to what we hosted earlier in California on March 15th this year. If you'd be interested in attending, helping with organization or speaking, please answer this Google form

Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-05-31 Thread Ismaël Mejía
Great! Matthias, I think it makes more sense to have these guidelines on the Beam site than in a Google doc. Can you please submit a PR for this? On Thu, May 31, 2018 at 8:03 AM Matthias Baetens wrote: > > Hey Eugene, hi all! > > Happy to say your talk is now on the Beam YouTube channel and can

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-31 Thread Kenneth Knowles
Yea - the branch should be cut before trying to finish the burndown. Kenn On Thu, May 31, 2018 at 2:09 AM Robert Bradshaw wrote: > I think it makes sense to cut the release and get the ball rolling, and > iff the ParquetIO/S3 issue turns out to be simple, we cherry-pick, > otherwise we add a

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-31 Thread Jean-Baptiste Onofré
Agreed, I'm going to start the release process tonight (my time). Regards JB On May 31, 2018, at 14:29, Kenneth Knowles wrote: >Yea - the branch should be cut before trying to finish the burndown. > >Kenn > >On Thu, May 31, 2018 at 2:09 AM Robert Bradshaw >wrote: > >> I think it makes sense

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-31 Thread Robert Bradshaw
I think it makes sense to cut the release and get the ball rolling, and iff the ParquetIO/S3 issue turns out to be simple, we cherry-pick, otherwise we add a note. On Thu, May 31, 2018 at 1:56 AM Jean-Baptiste Onofré wrote: > Hi, > > Regarding RabbitMqIO, Eugene provided new feedback last night

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Henning Rohde
+1 On Thu, May 31, 2018 at 8:55 AM Thomas Weise wrote: > +1 to the goal of increasing review bandwidth > > In addition to the proposed reviewer requirement change, perhaps there are > other ways to contribute towards that goal as well? > > The discussion so far has focused on how more work can

Re: parquet/beam

2018-05-31 Thread Lukasz Cwik
It really needs someone to take a deep dive and look into whether Arrow is a good fit now, considering all the use cases that Apache Beam has. I took a look about a year ago when designing the Fn Data API and concluded at that point in time it wasn't great for several reasons, but mainly due to the

Re: The full list of proposals / prototype documents

2018-05-31 Thread Eugene Kirpichov
Thank you! On Thu, May 31, 2018 at 8:30 AM Alexey Romanenko wrote: > Thank you, everybody, for the links you provided. I collected all of them (please > correct me if I missed something), categorized them and created a dedicated page > for the Beam website. > > Here is a PR for that (please review): >

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Andrew Pilloud
If someone is trusted enough to review a committer's code, shouldn't they also be trusted enough to review another contributor's code? As a non-committer I would get much quicker reviews if I could have other non-committers do the review, then get a committer who trusts us to merge. Andrew On Thu,

Re: parquet/beam

2018-05-31 Thread Kenneth Knowles
For the latter, can we have the Fn API data plane transmit sub-bundle groupings to benefit from the memory layout? On input the runner controls, on output the SDK controls (spilling)? Just random thoughts. Kenn On Thu, May 31, 2018 at 8:21 AM Lukasz Cwik wrote: > Tyler and I had reached out to

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Pablo Estrada
In that case, does it make sense to say: - A code review by a committer is enough to merge. - Committers can have their PRs reviewed by non-committers that are familiar with the code - Non-committers may have their code reviewed by non-committers, but should have a committer do a lightweight

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Thomas Weise
+1 to the goal of increasing review bandwidth In addition to the proposed reviewer requirement change, perhaps there are other ways to contribute towards that goal as well? The discussion so far has focused on how more work can get done with the same pool of committers or how committers can get

Re: parquet/beam

2018-05-31 Thread Lukasz Cwik
Kenn, it can be done, but it requires explicit flow control communication between Runner -> SDK and SDK -> Runner to be developed to support sub-bundle groupings. Transports and in-memory layouts are related, but improving our coders to use in-memory layouts would give us most of the benefit. For

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Jean-Baptiste Onofré
In that case, the contributor should become a committer pretty quickly. I would prefer to keep at least a final validation from a committer to guarantee the consistency of the project, and anyway, only the committer role can merge a PR. However, I fully agree that the most important thing is the Beam community. I

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Robert Bradshaw
+1, this is what I was going to propose. Code review serves two related, but distinct purposes. The first is just getting a second set of eyes on the code to improve quality (call this the LGTM). This can be done by anyone. The second is vetting whether this contribution, in its current form,

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Jean-Baptiste Onofré
Agreed, it sounds good to me. That's basically what I proposed to the Euphoria DSL team ;) Regards JB On 31/05/2018 20:35, Pablo Estrada wrote: > In that case, does it make sense to say: > > - A code review by a committer is enough to merge. > - Committers can have their PRs reviewed by

Re: SQL shaded jars don't work. How to test?

2018-05-31 Thread Andrew Pilloud
There is now a testShadowJar option on applyJavaNature, which allows you to run your tests against the shaded jar. Everyone should turn it on for their module, along with failOnWarning and enableSpotless, for maximum testing. On Thu, May 24, 2018 at 1:24 PM Andrew Pilloud wrote: > I've made the

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Chamikara Jayalath
+1 for the idea of reducing load on committers by involving contributors to perform detailed reviews. I think this has already happened in practice in some cases. I agree with Thomas Weise that the proper long-term solution will be growing the committer base by helping existing regular

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Eugene Kirpichov
Agreed with all said above - as I understand it, we have consensus on the following: Whether you're a committer or not: - Find somebody who's familiar with the code and ask them to review. Use your best judgment in whose review would give you good confidence that your code is actually good. (it's

Build failed in Jenkins: beam_SeedJob #1845

2018-05-31 Thread Apache Jenkins Server
See -- GitHub pull request #5406 of commit 0634019867b5e33304383f39451c5c94d3752572, no merge conflicts. Setting status of 0634019867b5e33304383f39451c5c94d3752572 to PENDING with url

Jenkins build is back to normal : beam_SeedJob #1846

2018-05-31 Thread Apache Jenkins Server
See

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Tim
Congratulations! Tim > On 1 Jun 2018, at 07:05, Andrew Psaltis wrote: > > Congrats! > >> On Fri, Jun 1, 2018 at 12:26 AM, Thomas Weise wrote: >> Congrats! >> >> >>> On Thu, May 31, 2018 at 9:25 PM, Alan Myrvold wrote: >>> Congrats Gris+Pablo+Jason. Well deserved. >>> On Thu, May 31,

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Pablo Estrada
Thanks to the PMC! Very humbled and excited to keep taking part in this great community. :) -P. On Thu, May 31, 2018, 10:10 PM Tim wrote: > Congratulations! > > > Tim > > On 1 Jun 2018, at 07:05, Andrew Psaltis wrote: > > Congrats! > > On Fri, Jun 1, 2018 at 12:26 AM, Thomas Weise wrote: > >>

Re: [VOTE] Go SDK

2018-05-31 Thread Davor Bonaci
The IP clearance document has been filed into Foundation records, and is currently under review. No further action necessary, unless we hear back. On Fri, May 25, 2018 at 10:31 AM, Henning Rohde wrote: > Thanks a lot, Davor! Much appreciated. > > Thanks, > Henning > > On Fri, May 25, 2018 at

[ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Davor Bonaci
Please join me and the rest of the Beam PMC in welcoming the following contributors as our newest committers. They have significantly contributed to the project in different ways, and we look forward to many more contributions in the future. * Griselda Cuevas * Pablo Estrada * Jason Kuster

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Chamikara Jayalath
Congrats to all three!! On Thu, May 31, 2018 at 7:09 PM Davor Bonaci wrote: > Please join me and the rest of Beam PMC in welcoming the following > contributors as our newest committers. They have significantly contributed > to the project in different ways, and we look forward to many more >

Re: Closing (automatically?) inactive pull requests

2018-05-31 Thread Kenneth Knowles
Update: you brought the information needed, and it is now enabled. Thanks for the follow-through! Since you dug into probot's details, I took the liberty of assigning BEAM-4423 to you, in case throwing together the needed configs is fresh in your mind and you are in the mood to continue. (if not,

Re: Design Proposal: Beam-Site Automation Reliability

2018-05-31 Thread Thomas Weise
Very nice, enthusiastic +1 On Thu, May 31, 2018 at 3:24 PM, Scott Wegner wrote: > Thanks to everyone who reviewed the doc. I put together a plan based on > the initial feedback to improve website automation reliability. At a > glance, I am proposing to: > > * Migrate website source code to the

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Alan Myrvold
Congrats Gris+Pablo+Jason. Well deserved. On Thu, May 31, 2018 at 9:15 PM Jason Kuster wrote: > Thank you to Davor and the PMC; I'm excited to be able to help Beam in > this new capacity. Bring on the PRs. :D > > On Thu, May 31, 2018 at 8:55 PM Xin Wang wrote: > >> Congrats! >> >> - Xin Wang

Managing outdated dependencies

2018-05-31 Thread Chamikara Jayalath
Hi All, We recently ran into many issues due to Beam dependencies being significantly out of date. For example see [1], [2], and [3]. Yifan Zou recently introduced a proposal [4] that would allow us to identify outdated dependencies. But to really make sure that this helps the Beam project and

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Rui Wang
Congrats! -Rui On Thu, May 31, 2018 at 8:23 PM Jean-Baptiste Onofré wrote: > Congrats ! > > Regards > JB > > On 01/06/2018 04:08, Davor Bonaci wrote: > > Please join me and the rest of Beam PMC in welcoming the following > > contributors as our newest committers. They have significantly > >

Re: Closing (automatically?) inactive pull requests

2018-05-31 Thread Alan Myrvold
Thanks. I can look into adding the stale.yaml file for old pull requests. On Thu, May 31, 2018 at 8:07 PM Kenneth Knowles wrote: > Update: you brought the information needed, and it is now enabled. Thanks > for the follow-through! > > Since you dug into probot's details, I took the liberty of

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Andrew Psaltis
Congrats! On Fri, Jun 1, 2018 at 12:26 AM, Thomas Weise wrote: > Congrats! > > > On Thu, May 31, 2018 at 9:25 PM, Alan Myrvold wrote: > >> Congrats Gris+Pablo+Jason. Well deserved. >> >> On Thu, May 31, 2018 at 9:15 PM Jason Kuster >> wrote: >> >>> Thank you to Davor and the PMC; I'm excited

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Jean-Baptiste Onofré
Congrats ! Regards JB On 01/06/2018 04:08, Davor Bonaci wrote: > Please join me and the rest of Beam PMC in welcoming the following > contributors as our newest committers. They have significantly > contributed to the project in different ways, and we look forward to > many more contributions in

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Kenneth Knowles
Huzzah! On Thu, May 31, 2018 at 7:27 PM Ahmet Altay wrote: > Congratulations to all of you! > > On Thu, May 31, 2018 at 7:26 PM, Chamikara Jayalath > wrote: > >> Congrats to all three!! >> >> On Thu, May 31, 2018 at 7:09 PM Davor Bonaci wrote: >> >>> Please join me and the rest of Beam PMC in

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Ahmet Altay
Congratulations to all of you! On Thu, May 31, 2018 at 7:26 PM, Chamikara Jayalath wrote: > Congrats to all three!! > > On Thu, May 31, 2018 at 7:09 PM Davor Bonaci wrote: > >> Please join me and the rest of Beam PMC in welcoming the following >> contributors as our newest committers. They

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Anton Kedin
Congrats! On Thu, May 31, 2018 at 7:29 PM Kenneth Knowles wrote: > Huzzah! > > On Thu, May 31, 2018 at 7:27 PM Ahmet Altay wrote: > >> Congratulations to all of you! >> >> On Thu, May 31, 2018 at 7:26 PM, Chamikara Jayalath > > wrote: >> >>> Congrats to all three!! >>> >>> On Thu, May 31, 2018

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Kenneth Knowles
There seems to be enough consensus, and this is a policy matter that should have an official vote. On Thu, May 31, 2018 at 12:01 PM Robert Bradshaw wrote: > +1, this is what I was going to propose. > > Code review serves two related, but distinct purposes. The first is just > getting a second set

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Eugene Kirpichov
On Thu, May 31, 2018 at 2:56 PM Ismaël Mejía wrote: > If I understood correctly what is proposed is: > > - Committers to be able to have their PRs reviewed by non-committers > and be able to self-merge. > - For non-committers nothing changes. > I think it is being proposed that a non-committer

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Kenneth Knowles
On Thu, May 31, 2018, 15:08 Eugene Kirpichov wrote: > > > On Thu, May 31, 2018 at 2:56 PM Ismaël Mejía wrote: > >> If I understood correctly what is proposed is: >> >> - Committers to be able to have their PRs reviewed by non-committers >> and be able to self-merge. >> - For non-committers

Re: Design Proposal: Beam-Site Automation Reliability

2018-05-31 Thread Scott Wegner
Thanks to everyone who reviewed the doc. I put together a plan based on the initial feedback to improve website automation reliability. At a glance, I am proposing to: * Migrate website source code to the main apache/beam repository * Discontinue checking-in generated HTML during the PR workflow

Re: Closing (automatically?) inactive pull requests

2018-05-31 Thread Alan Myrvold
INFRA-16589 was closed with a request to clarify that the probot-stale app would not have permissions to merge automatically. From my reading of the permissions documentation, it would not. I added a comment to INFRA-16589. On Tue, May 29, 2018 at 10:05 AM Lukasz Cwik wrote: > I opened up INFRA-16589 >

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Ismaël Mejía
If I understood correctly, what is proposed is: - Committers to be able to have their PRs reviewed by non-committers and be able to self-merge. - For non-committers nothing changes. This enables a committer (wearing a contributor hat) to merge their own changes without committer approval, so we

Re: parquet/beam

2018-05-31 Thread Ismaël Mejía
If I understand correctly, Arrow provides a common multi-language in-memory data representation, so basically it is a columnar data format that you can use to transfer data between libraries in Python (pandas, NumPy, etc.), Java and other languages. This avoids the round-trip to disk to do so. So we

Re: parquet/beam

2018-05-31 Thread Reuven Lax
I've looked at Arrow, and there's some trickiness. Beam has a record model and Arrow works best with large batches of records. We could do per-record encoding, but that might be inefficient in Arrow. On Thu, May 31, 2018, 5:50 PM Ismaël Mejía wrote: > If I understand correctly Arrow allows a
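To make the batch-versus-record tension concrete, here is a minimal plain-Python sketch. It does not use Arrow at all; the helper names `to_columnar_batches` and `_transpose` and the batch size are hypothetical. It buffers individual Beam-style records and emits column-oriented batches, the layout Arrow is designed around, so any per-batch encoding cost would be paid once per batch instead of once per record.

```python
from typing import Any, Dict, Iterable, Iterator, List


def to_columnar_batches(
    records: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Group row-oriented records into column-oriented batches.

    Hypothetical helper: each emitted batch maps a field name to the list of
    that field's values, mimicking (not using) Arrow's columnar layout.
    Assumes every record has the same set of fields.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield _transpose(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield _transpose(batch)


def _transpose(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot rows into columns, using the first row's field order.
    return {field: [row[field] for row in rows] for field in rows[0]}
```

For example, three records with batch_size=2 yield one full batch of two and a final batch of one. A larger batch_size amortizes overhead better but holds more records in memory before anything is emitted, which is the trade-off a per-record model has to negotiate.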

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Ismaël Mejía
+1 This will help reviews go faster. And for IO reviews it makes extra sense, because a common need is to ping external people who are not committers but are experts in the respective data stores. Of course this puts more trust in the committers, but it makes sense. On Thu, May 31, 2018 at 3:46 PM

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Etienne Chauchot
On Thursday, May 31, 2018 at 06:17 -0700, Robert Burke wrote: +1 I also thought this was the norm. My read of the committer/contributor guide was that a committer couldn't unilaterally merge their own code (approval/LGTM needs to come from someone familiar with the component), rather than every

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-31 Thread Jean-Baptiste Onofré
Yes, I confirm: I already checked that last week. Thanks for the double-check! Regards JB On May 31, 2018, at 15:13, Etienne Chauchot wrote: >Hi, >I did some tests on the maven artifacts produced by the gradle build: >I published maven artifacts to local maven repo using : > ./gradlew

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Jean-Baptiste Onofré
That's not fully correct. A committer can commit directly; it all depends on the approach used in the project: commit-then-review or review-then-commit. In Beam we decided to do review-then-commit. So a committer, a PMC member or a contributor should create a PR. Other Apache projects allow you to directly

Re: [PROPOSITION] schedule some sanity tests on a daily basis

2018-05-31 Thread Etienne Chauchot
Hi all, please note that this subject has moved forward: The first PR (https://github.com/apache/beam/pull/5464) writes perf results to BQ. The second one (https://github.com/apache/beam/pull/4976) runs the PostCommits and configures the exports to BQ. The first PR needs to be merged before the second one, and once

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Jean-Baptiste Onofré
By the way, +1. Two reviews is overkill. The review period is already pretty long, so it would be better to increase it more ;) Regards JB On May 31, 2018, at 15:34, "Jean-Baptiste Onofré" wrote: >That's not fully correct. A committer can directly commit, all depends >of the approach used

Re: GroupByKey with sorted values within key

2018-05-31 Thread Etienne Chauchot
I also totally agree with what Luke said, including the sort 100 use case. @Kenn, regarding the above, I find the name misleading too: for people who do not have a strong big data background, it could sound like the PCollection is sorted as a whole, whereas it is only sorted locally within
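The naming concern can be illustrated with a small in-memory sketch in plain Python. `group_by_key_sorted` is a hypothetical helper, not a Beam transform: it shows only the intended semantics, where values are ordered within each key and no ordering across keys is implied.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def group_by_key_sorted(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Group (key, value) pairs and sort the values *within* each key.

    Hypothetical, in-memory illustration of the semantics only: ordering is
    guaranteed inside each key's value list, but nothing is said about the
    order of keys relative to one another, so the collection as a whole is
    not "sorted". Assumes the values are mutually comparable.
    """
    grouped: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values) for key, values in grouped.items()}
```

Here `group_by_key_sorted([("user2", 30), ("user1", 20), ("user2", 10), ("user1", 5)])` gives `user1 -> [5, 20]` and `user2 -> [10, 30]`; which key comes first is unspecified, which is exactly why a name suggesting a globally sorted PCollection could mislead.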

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-31 Thread Etienne Chauchot
Hi, I did some tests on the Maven artifacts produced by the Gradle build: I published Maven artifacts to the local Maven repo using: ./gradlew publishToMavenLocal -PisRelease --no-parallel -x test then used the Beam samples project (Maven-based) and ran mvn dependency:tree => transitive dependencies

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Robert Burke
+1 I also thought this was the norm. My read of the committer/contributor guide was that a committer couldn't unilaterally merge their own code (approval/LGTM needs to come from someone familiar with the component), rather than every review needs two committers. I don't recall a requirement

Re: Reducing Committer Load for Code Reviews

2018-05-31 Thread Kenneth Knowles
@JB: Yea, just talking about Beam practices, not the ASF rules which allow a project to choose. @Robert & Udi: This is explicitly _not_ the norm. It hasn't really changed since the beginning of the project. Here's the relevant section:

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

2018-05-31 Thread Etienne Chauchot
@Łukasz: true, ElasticsearchIOIT and CassandraIOIT were the first IO integration tests. @Kenn: I think the JIRAs do not exist; I will create a main JIRA and one subtask per missing job. On Wednesday, May 30, 2018 at 19:45 -0700, Kenneth Knowles wrote: > This all seems extremely useful. Is there some

Re: parquet/beam

2018-05-31 Thread Lukasz Cwik
Tyler and I had reached out to Arrow folks[1] asking how we could support KV<K, Iterable<V>> when the iterable of values is beyond memory size limits. There is an open JIRA about adding support for large byte[], strings and list types in ARROW-750[2]. Robert had pointed out that we could do the same
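As a generic illustration of consuming an iterable of values too large for memory, the sketch below pages through values lazily, holding only one bounded chunk at a time. It is not Arrow's or Beam's API: `paged_values` and the `fetch_page(token) -> (values, next_token)` contract are assumptions made for the example.

```python
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

V = TypeVar("V")

# Hypothetical contract: a page fetcher takes a continuation token (None for
# the first page) and returns one bounded chunk of values plus the token for
# the next chunk, or None when there are no more chunks.
PageFetcher = Callable[[Optional[str]], Tuple[List[V], Optional[str]]]


def paged_values(fetch_page: PageFetcher[V]) -> Iterator[V]:
    """Yield values one page at a time so only a single page is held in memory."""
    token: Optional[str] = None
    while True:
        page, token = fetch_page(token)
        yield from page
        if token is None:
            return
```

A caller just iterates, e.g. `for v in paged_values(fetch_page): ...`; memory use is bounded by the page size chosen by whatever implements `fetch_page`.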

Re: The full list of proposals / prototype documents

2018-05-31 Thread Alexey Romanenko
Thank you, everybody, for the links you provided. I collected all of them (please correct me if I missed something), categorized them and created a dedicated page for the Beam website. Here is a PR for that (please review): https://github.com/apache/beam-site/pull/456