Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

2018-05-30 Thread Kenneth Knowles
This all seems extremely useful. Is there some action to be taken other than advertising these related JIRAs? Kenn On Wed, May 30, 2018 at 5:45 AM Łukasz Gajowy wrote: > +1 to generalizing IT. I think the tests you mentioned were developed > earlier than the general idea of how the IOIT should

Jenkins build is back to normal : beam_SeedJob #1828

2018-05-30 Thread Apache Jenkins Server
See

Re: Java compiler OOMs on Jenkins/Gradle

2018-05-30 Thread Lukasz Cwik
Try running without a daemon (use the flag --no-daemon) to see whether the issue is that the Gradle daemon you have been using is overloaded. On Wed, May 30, 2018 at 5:11 PM Ankur Goenka wrote: > I am facing OOM while locally building the project using Gradle. Here is > the scan

Build failed in Jenkins: beam_SeedJob #1827

2018-05-30 Thread Apache Jenkins Server
See -- GitHub pull request #5406 of commit 05939d2358b3ca778636ffdf58396fead4b38e49, no merge conflicts. Setting status of 05939d2358b3ca778636ffdf58396fead4b38e49 to PENDING with url

Re: Java compiler OOMs on Jenkins/Gradle

2018-05-30 Thread Ankur Goenka
I am facing OOM while locally building the project using Gradle. Here is the scan https://scans.gradle.com/s/t3n42rw5666us The issue is happening from :rat task. Is this issue related? On Tue, May 1, 2018 at 4:40 PM Scott Wegner wrote: > Sorry about the instability. We need to get the Gradle

Re: parquet/beam

2018-05-30 Thread Chamikara Jayalath
On Wed, May 30, 2018 at 4:43 PM Lukasz Cwik wrote: > For Python Parquet support, hopefully we can have cross language pipelines > solve this so we only need to implement it once. If it is really popular, > having it implemented more than once may be worthwhile. > I'd say Parquet format is

Re: parquet/beam

2018-05-30 Thread Lukasz Cwik
For Python Parquet support, hopefully we can have cross language pipelines solve this so we only need to implement it once. If it is really popular, having it implemented more than once may be worthwhile. Would the point of Arrow be to treat it as an IO connector similar to ParquetIO or JdbcIO (I

Jenkins build is back to stable : beam_SeedJob #1822

2018-05-30 Thread Apache Jenkins Server
See

Jenkins build became unstable: beam_SeedJob #1821

2018-05-30 Thread Apache Jenkins Server
See

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-30 Thread Łukasz Gajowy
Regarding ParquetIO on S3: I am investigating the issue. It seems that it never worked on s3 (I didn't expect that). Currently, I'm trying to understand why it behaves differently than on other filesystems (HDFS, local). Any help appreciated. Regarding ParquetIO on HDFS: I was able to run it on

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-30 Thread Robert Bradshaw
On Wed, May 30, 2018 at 12:59 PM Ahmet Altay wrote: > Thank you JB. > > For clarification, are you referring to the following items: > - RabbitMqIO - https://github.com/apache/beam/pull/1729 > - ParquetIO on HDFS/S3 - https://issues.apache.org/jira/browse/BEAM-4421 > > If the above mapping is

Re: Reducing Committer Load for Code Reviews

2018-05-30 Thread Udi Meiri
I thought this was the norm already? I have been the sole reviewer on a few PRs by committers and I'm only a contributor. +1 On Wed, May 30, 2018 at 2:13 PM Kenneth Knowles wrote: > ++1 > > This is good reasoning. If you trust someone with the committer > responsibilities [1] you should trust

Re: Reducing Committer Load for Code Reviews

2018-05-30 Thread Kenneth Knowles
++1 This is good reasoning. If you trust someone with the committer responsibilities [1] you should trust them to find an appropriate reviewer. Also: - adds a new way for non-committers and committers to bond - makes committers seem less like gatekeepers because it goes both ways - might

parquet/beam

2018-05-30 Thread Austin Bennett
I can see great use cases with S3/Parquet, so that's a great addition (which JB is addressing, for Java)! For the use cases I find myself around, Python Parquet support would be even more ideal, so perhaps for this next release: Would it make sense to be exploring:

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-30 Thread Ahmet Altay
Thank you JB. For clarification, are you referring to the following items: - RabbitMqIO - https://github.com/apache/beam/pull/1729 - ParquetIO on HDFS/S3 - https://issues.apache.org/jira/browse/BEAM-4421 If the above mapping is correct, could we separate addition of new feature from addressing

Reducing Committer Load for Code Reviews

2018-05-30 Thread Thomas Groh
Hey all; I've been thinking recently about our current process for committing code. I'd like to propose that we change it to require that at least one committer be present for each code review, but remove the need to have a second committer review the code

Re: Hello Beam!

2018-05-30 Thread Lukasz Cwik
You'll want to take a look at JdbcIO, there is an example of how to use it in the Javadoc: https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/jdbc/JdbcIO.html On Wed, May 30, 2018 at 10:52 AM arun kumar wrote: > Thank you for response. > > Can you please share me ,

Re: Hello Beam!

2018-05-30 Thread arun kumar
Thank you for the response. Can you please share how we can connect to a Postgres database? Please share any examples you have. Thanks Arun On Wed, May 30, 2018, 8:09 PM Lukasz Cwik wrote: > Arun, it would be best to check out one of the quickstarts (Java, Python, > Go)

Jenkins build is back to normal : beam_SeedJob #1816

2018-05-30 Thread Apache Jenkins Server
See

Build failed in Jenkins: beam_SeedJob #1815

2018-05-30 Thread Apache Jenkins Server
See -- GitHub pull request #5406 of commit c5af2747e7720891bfb421ea71db3233f8676edf, no merge conflicts. Setting status of c5af2747e7720891bfb421ea71db3233f8676edf to PENDING with url

Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-30 Thread Kenneth Knowles
Awesome! In the meantime I've tried out Gradle + Checker and unfortunately compilation hung. It could be due to any subset of Gradle, Checker, Errorprone. I would not expect a performance problem, since Checker is "pluggable type systems" and type checking is a very fast sort of analysis. Also I

Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-30 Thread Pablo Estrada
Thank you guys : D On Wed, May 30, 2018 at 9:20 AM Scott Wegner wrote: > Sorry to revive an old thread, but I wanted to give a shout-out and say > thank you to Ismaël and Tim who have been quickly chipping away at the > ErrorProne backlog. We started with 47 ErrorProne JIRAs [1], and in two >

Re: GroupByKey with sorted values within key

2018-05-30 Thread Lukasz Cwik
To add some more context, not all Runners have to be Java based and also not all shuffle implementations want to execute untrusted Java code which is why the contract is around using a secondary key with lexicographically ordered bytes. I don't remember the original concerns as to why SortValues

Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-30 Thread Scott Wegner
Sorry to revive an old thread, but I wanted to give a shout-out and say thank you to Ismaël and Tim who have been quickly chipping away at the ErrorProne backlog. We started with 47 ErrorProne JIRAs [1], and in two weeks we're down to just 17 [2]. Thanks! [1]

Re: [SQL] Unsupported features

2018-05-30 Thread Kenneth Knowles
This is extremely useful. Thanks for putting so much information together! Kenn On Wed, May 30, 2018 at 8:19 AM Kai Jiang wrote: > Hi all, > > Based on pull/5481, I manually > did a coverage test with TPC-DS queries (65%) and TPC-H queries (100%) and

Re: GroupByKey with sorted values within key

2018-05-30 Thread Kenneth Knowles
The sorting by bytes is a deliberate limitation of this particular approach. It basically assumes you are using bytes-based shuffle under the hood, so invoking a language-specific comparator would be something new. I know +Ben had some ideas about this. Kenn On Wed, May 30, 2018 at 8:53 AM David
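
To illustrate the bytes-based ordering discussed in this thread, here is a minimal sketch in plain Python (not Beam code). It assumes a shuffle that compares secondary keys only as raw bytes; the helper name encode_sortable_int64 is hypothetical and used only for illustration. It shows why the encoding of the secondary key has to be order-preserving for the byte order to match the intended order.

```python
import struct


def encode_sortable_int64(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order.

    Hypothetical helper for illustration only, not part of any Beam API.
    """
    # Shift the signed range [-2**63, 2**63 - 1] onto the unsigned range
    # [0, 2**64 - 1], then write it big-endian so the most significant
    # byte is compared first.
    return struct.pack(">Q", value + (1 << 63))


values = [3, -1, 256, 0, -300]

# Decimal strings compared as bytes do not follow numeric order.
naive = sorted(values, key=lambda v: str(v).encode("utf-8"))

# The order-preserving encoding sorts the same way the integers do.
by_bytes = sorted(values, key=encode_sortable_int64)

print(naive)
print(by_bytes)
```

A comparator written in a specific language would avoid the need for such an encoding, but, as the message notes, invoking one inside a bytes-based shuffle would be something new.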

[SQL] Unsupported features

2018-05-30 Thread Kai Jiang
Hi all, Based on pull/5481, I manually did a coverage test with TPC-DS queries (65%) and TPC-H queries (100%) and want to see what features Beam SQL is currently not supporting. The test ran on DirectRunner. I want to share the result. TPC-DS

Re: GroupByKey with sorted values within key

2018-05-30 Thread Kenneth Knowles
I can see a few usability issues here. Totally agree w/ Luke, just noting: - The naming is slightly misleading because SortValues is actually already GBK+SortValues. - It also makes things look less supported when they are in the extensions/ folder. I'd say we should have a better place to put
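
As a rough picture of the "group by key, then sort values within each key" semantics this thread is about, here is a small plain-Python sketch. It is not the Beam SortValues API; the function name group_and_sort and the record shape (primary key, (secondary key bytes, value)) are assumptions made for illustration.

```python
from collections import defaultdict


def group_and_sort(records):
    """Group (primary_key, (secondary_key_bytes, value)) records by primary key,
    then sort each group by the raw bytes of its secondary key.

    Hypothetical illustration of the semantics only, not a Beam transform.
    """
    grouped = defaultdict(list)
    for primary_key, pair in records:
        grouped[primary_key].append(pair)
    # Python compares bytes objects lexicographically, byte by byte.
    return {key: sorted(pairs, key=lambda pair: pair[0])
            for key, pairs in grouped.items()}


records = [
    ("a", (b"2", "x")),
    ("b", (b"1", "y")),
    ("a", (b"1", "z")),
]
result = group_and_sort(records)
print(result)
```

In a real runner the grouping and sorting would happen in the shuffle rather than in memory, which is why the thread discusses letting each runner substitute its own implementation.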

Re: GroupByKey with sorted values within key

2018-05-30 Thread Lukasz Cwik
Each runner can choose to override the SortValues PTransform with their own internal offering. For example Spark overrides global combine[1] during pipeline translation. If Spark detected the SortValues PTransform during translation, it could override the offering with something that used

Re: Hello Beam!

2018-05-30 Thread Lukasz Cwik
Arun, it would be best to check out one of the quickstarts (Java, Python, Go) (https://beam.apache.org/get-started/beam-overview/) and when you have questions ask them on u...@beam.apache.org On Wed, May 30, 2018 at 5:32 AM arun kumar wrote: > Hi All, > > Thank you for adding me to the group and I

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

2018-05-30 Thread Łukasz Gajowy
+1 to generalizing IT. I think the tests you mentioned were developed earlier than the general idea of what the IOIT should look like emerged. AFAIK the same goes for the tests in io/google-cloud-platform module. I recently created some issues that address that [1], [2], [3]. If there's anyone

Re: Hello Beam!

2018-05-30 Thread arun kumar
Hi All, Thank you for adding me to the group. I am interested in Apache Beam with the Google Cloud runner and need to get started with it. Please help me find where to start, and share any simple code that fits my requirement if anyone has some. Thanks Arunkumar On Wed, May 30, 2018,

Performance testing documentation - suggestions request

2018-05-30 Thread Łukasz Gajowy
Hi, the Performance Testing Framework has been an ongoing effort for some time now. As I noticed (and received signals from the community) it is getting more popular (this is good) but besides changing commands from mvn to gradle ones, it was not updated for a very long time (this is not good at all!).

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

2018-05-30 Thread Etienne Chauchot
Hi Łukasz, Thanks for the details. I was thinking more about generalizing IT test integration. For example some IOs like Cassandra and Elasticsearch have IT but no groovy scripts. Also I agree with your list. And thanks for the details about backend services automatic provisioning, I did not know

Re: The full list of proposals / prototype documents

2018-05-30 Thread Łukasz Gajowy
Hi, I just wanted to add those two (sorry for being kinda late with this): https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit?usp=sharing https://docs.google.com/document/d/1Cb7XVmqe__nA_WCrriAifL-3WCzbZzV4Am5W_SkQLeA/edit?usp=sharing Thanks, Łukasz 2018-05-29

Re: Survey: what is everyone working on that you want to share?

2018-05-30 Thread Łukasz Gajowy
Let's add Performance Testing section too: https://github.com/apache/beam-site/pull/455 Thanks, Łukasz 2018-05-29 23:59 GMT+02:00 Carlos Alonso : > My two cents: > > https://github.com/apache/beam/pull/5341 > https://issues.apache.org/jira/browse/BEAM-4257 > > On Tue, May 29, 2018 at 7:50 PM

GroupByKey with sorted values within key

2018-05-30 Thread marek-simunek
Hi, I have a question. I am trying to do a translation in dsl-euphoria for “GroupByKey with sorted values within key” to Beam. I am aware of the Java SDK extension SortValues, but it doesn’t have sufficient abstraction for runners. I noticed that in DataflowRunner there is translation of batch

Re: [PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

2018-05-30 Thread Łukasz Gajowy
Hi Etienne, it is already possible, provided that an appropriate Jenkins job is defined (see examples here: [1],[2]). Either the reviewer or the author can run the seed job to load job definitions (by typing "Run seed job" in comment) and then run the test they are interested in (by

Re: Hello Beam!

2018-05-30 Thread Łukasz Gajowy
Welcome! :) 2018-05-30 7:25 GMT+02:00 Jean-Baptiste Onofré : > Welcome ! > > Looking forward to work and discuss with you ;) > > Regards > JB > > On 29/05/2018 23:49, Rui Wang wrote: > > Hi there, > > > > I am Rui (pronounced as same as "Ray")! > > > > I recently joined Google Cloud. Beam is a

[PROPOSAL] CI improvement: be able to run the IT of the IOs from github pull request

2018-05-30 Thread Etienne Chauchot
Hi guys, As part of the CI improvement work, I would suggest enabling running the integration tests of the IOs from the GitHub PR. Indeed, when doing a review, either the reviewer or the author needs to run the IT. The problem is that the results are private. It would be good to be able to run IT