Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-05 Thread Jean-Baptiste Onofré
No problem, just set the fix version in Jira to 2.1.0 and I will wait until all Jira issues with this version are fixed before cutting RC1. Thanks! Regards JB On 07/06/2017 05:48 AM, Kenneth Knowles wrote: +1 to these and IMO we should treat all of the remaining 10 items on the burndown. I thin

Re: writing to s3 in beam

2017-07-05 Thread Ted Yu
Please take a look at BEAM-2500 (and related JIRAs). Cheers On Wed, Jul 5, 2017 at 8:00 PM, Jyotirmoy Sundi wrote: > Hi Folks, > > I am trying to write to s3 from beam. > > These are configs I am passing > > --hdfsConfiguration='[{"fs.default.name": "s3://xxx-output", > "fs.s3.awsAccessKey

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-05 Thread Kenneth Knowles
+1 to these and IMO we should treat all of the remaining 10 items on the burndown. I think all but one or two are in late-stage PR right now. It should be easy to merge them before July 10. On Wed, Jul 5, 2017 at 4:28 PM, Raghu Angadi wrote: > I would like to request merging two Kafka related PR

writing to s3 in beam

2017-07-05 Thread Jyotirmoy Sundi
Hi Folks, I am trying to write to s3 from beam. These are configs I am passing --hdfsConfiguration='[{"fs.default.name": "s3://xxx-output", "fs.s3.awsAccessKeyId" :"xxx", "fs.s3.awsSecretAccessKey":"yyy"}]' --input="/home/hadoop/data" --output="s3://xx-output/beam-output/" *Any idea how ca
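
The --hdfsConfiguration value in this message is a JSON list holding one object, wrapped in single quotes for the shell, so a quoting slip is easy to make. A minimal sketch in Python, assuming the flag name and keys exactly as quoted above (the helper names hdfs_configuration_arg and shell_quoted are hypothetical), that builds the argument programmatically. It only checks that the value is well-formed JSON and survives shell quoting; it says nothing about whether these keys are sufficient for writing to S3.

```python
import json
import shlex


def hdfs_configuration_arg(conf):
    """Render the flag value as a JSON list holding one config object."""
    return "--hdfsConfiguration=" + json.dumps([conf])


def shell_quoted(args):
    """Join arguments into one command-line string that is safe to paste into a shell."""
    return " ".join(shlex.quote(a) for a in args)


# Placeholder values copied from the message; real credentials would go here.
conf = {
    "fs.default.name": "s3://xxx-output",
    "fs.s3.awsAccessKeyId": "xxx",
    "fs.s3.awsSecretAccessKey": "yyy",
}

arg = hdfs_configuration_arg(conf)
command_line = shell_quoted([arg])
```

Splitting command_line back with shlex.split returns the original argument unchanged, which is a quick way to confirm the quoting before passing it to a pipeline launcher.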

Re: Making it easier to run IO ITs

2017-07-05 Thread Kenneth Knowles
I have some extra nerdy maven-fu (pom-fu?) to suggest: use -D instead of -P for a little more flexibility. You can't have one profile activate another, but you _can_ activate two profiles with the same property. [1] This doesn't work: mvn -P profile1 profile1 true

Re: Making it easier to run IO ITs

2017-07-05 Thread Stephen Sisk
I also wrote up this dev doc that goes into more depth on how this will all work, as well as what it will be like to create a new IO IT. https://docs.google.com/document/d/1fISxgeq4Cbr-YRJQDgpnHxfTiQiHv8zQgb47dSvvJ78/edit?usp=sharing S On Wed, Jul 5, 2017 at 3:11 PM Stephen Sisk wrote: > hey

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-05 Thread Raghu Angadi
I would like to request merging two Kafka-related PRs: #3461, #3492. Especially the second one, as it improves user experience in case of server misconfiguration that prevents connections between workers and the

Re: Failure in Apex runner

2017-07-05 Thread Reuven Lax
I wonder if the watermark is accidentally advancing too early, causing Apex to shut down the pipeline before the final finalize DoFn executes? On Wed, Jul 5, 2017 at 1:45 PM, Thomas Weise wrote: > I don't think this is a problem with the test and if anything this problem > to me shows the test i

Re: BeamSQL status and merge to master

2017-07-05 Thread Jesse Anderson
So excited to start using this! On Wed, Jul 5, 2017, 3:34 PM Mingmin Xu wrote: > Thanks for everybody's effort, we're very close to finishing the existing tasks. > Here's a status update on the SQL DSL; feel free to try it and share any > comments: > > *1. what's done* > DSL feature is done, with bas

Re: BeamSQL status and merge to master

2017-07-05 Thread Mingmin Xu
Thanks for everybody's effort, we're very close to finishing the existing tasks. Here's a status update on the SQL DSL; feel free to try it and share any comments: *1. what's done* DSL feature is done, with basic filter/project/aggregation/union/join, built-in functions/UDF/UDAF(pending on #3491) *2.

Making it easier to run IO ITs

2017-07-05 Thread Stephen Sisk
hey all, I wanted to share an early draft of what it'll be like to invoke mvn for the IO integration tests in the future when we have the integration with kubernetes going. I'm really excited about these changes - working on the IO ITs, I have to run them frequently, and the command lines to run

Re: Failure in Apex runner

2017-07-05 Thread Kenneth Knowles
There is no asynchronous behavior in this test. It is basically a "batch" test, here: https://github.com/apache/beam/blob/master/runners/apex/src/test/java/org/apache/beam/runners/apex/examples/WordCountTest.java#L117 The pipeline is: p.apply("ReadLines", TextIO.read().from(options.getInputFi

[Proposal] Submitting pipelines to Runners in another language

2017-07-05 Thread Sourabh Bajaj
Hi, I wanted to share a proposal for submitting pipelines from SDK X (Python/Go) to runners written in another language Y (Java) (Flink / Spark / Apex) using the Runner API. Please find the doc here . As alway

Re: Build Failure in * release-2.0.0

2017-07-05 Thread Jyotirmoy Sundi
Thanks Ted On Wed, Jul 5, 2017 at 1:42 PM Ted Yu wrote: > bq. Caused by: java.net.SocketException: Too many open files > > Please adjust ulimit. > > FYI > > On Wed, Jul 5, 2017 at 1:33 PM, Jyotirmoy Sundi > wrote: > > > Hi Folks , > > > > Any idea why the build is failing in release-2.0.0 , i d

Re: Failure in Apex runner

2017-07-05 Thread Thomas Weise
I don't think this is a problem with the test and if anything this problem to me shows the test is useful in catching similar issues during unit test runs. Is there any form of asynchronous/trigger based processing in this pipeline that could cause this? The Apex runner will shut down the pipeline

Re: Build Failure in * release-2.0.0

2017-07-05 Thread Ted Yu
bq. Caused by: java.net.SocketException: Too many open files Please adjust ulimit. FYI On Wed, Jul 5, 2017 at 1:33 PM, Jyotirmoy Sundi wrote: > Hi Folks , > > Any idea why the build is failing in release-2.0.0 , i did "mvn clean > package" > > > *Trace* > > [INFO] Running org.apache.beam.sdk.i
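
"Too many open files" means the process hit its limit on open file descriptors, which is what the ulimit suggestion refers to (in a shell, `ulimit -n` shows the current value). A minimal sketch in Python, assuming a Unix-like system, that reads the same limit through the standard resource module and raises the soft limit up to the hard limit. The function names and the 4096 target are hypothetical, and the change only affects the current process and any children it launches.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward target without exceeding the hard limit."""
    soft, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot go above the hard limit.
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return open_file_limits()


before = open_file_limits()
after = raise_soft_limit(4096)  # 4096 is an arbitrary illustrative target
```

Some platforms reject values above a system-wide maximum even when the hard limit is reported as unlimited, so a real script would likely wrap the setrlimit call in error handling.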

Build Failure in * release-2.0.0

2017-07-05 Thread Jyotirmoy Sundi
Hi Folks, any idea why the build is failing in release-2.0.0? I ran "mvn clean package". *Trace* [INFO] Running org.apache.beam.sdk.io.hbase.HBaseResultCoderTest [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.461 s - in org.apache.beam.sdk.io.hbase.HBaseResultCoderTe

Re: is there any similar page to build custom io for java

2017-07-05 Thread Jyotirmoy Sundi
Thanks Stephen. On Wed, Jul 5, 2017 at 9:42 AM, Stephen Sisk wrote: > hi! > > I'd suggest checking out the Pipeline I/O page [0] for general IO guidance. > The Authoring I/O Transforms is not language specific, but combined with > the PTransform style guide [1], it's the best current resource fo

Re: Failure in Apex runner

2017-07-05 Thread Kenneth Knowles
Upon further investigation, this test always writes to ./target/wordcountresult-0-of-2 and ./target/wordcountresult-1-of-2. So after a successful test run, any further run without a `clean` will spuriously succeed. I was running via IntelliJ so did not do the ritual `mvn clean` wor
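
The failure mode described here is that shard files left by an earlier run satisfy the test's output check even when the current run writes nothing. One common guard is to delete matching shards before the pipeline starts. A minimal sketch in Python, not the actual test code: the helper remove_stale_outputs and the demo function are hypothetical, and only the wordcountresult-N-of-M naming comes from the message.

```python
import glob
import os
import tempfile


def remove_stale_outputs(output_prefix):
    """Delete leftover shard files such as <prefix>-0-of-2 and return their paths."""
    removed = []
    pattern = glob.escape(output_prefix) + "-*-of-*"
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed


def demo():
    """Simulate a previous run's leftovers, clean them, and report the counts."""
    with tempfile.TemporaryDirectory() as target:
        prefix = os.path.join(target, "wordcountresult")
        for shard in ("0-of-2", "1-of-2"):
            with open(f"{prefix}-{shard}", "w") as f:
                f.write("stale\n")
        removed = remove_stale_outputs(prefix)
        remaining = glob.glob(glob.escape(prefix) + "-*-of-*")
        return len(removed), len(remaining)
```

With the cleanup in place, a run that produces no output leaves no shards behind, so the output check fails as it should instead of passing on stale files.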

Re: Failure in Apex runner

2017-07-05 Thread Reuven Lax
I've done a bit more debugging with logging. It appears that the finalize ParDo is never being invoked in this Apex test (or at least the LOG.info in that ParDo never runs). This ParDo is run on a constant element (code snippet below), so it should always run. PCollection singletonCollection = p.a

Re: Failure in Apex runner

2017-07-05 Thread Kenneth Knowles
Data-dependent file destinations is a pretty great feature. We also have another change to make to this @Experimental feature, and it would be nice to get them both into 2.1.0 if we can unblock this quickly. I just tried this too, and failed to reproduce it. But Jenkins and Reuven both have a reli

Re: [PROPOSAL] External Join with KV Stores

2017-07-05 Thread Robert Bradshaw
I'm generally in favor of viewing these as seekable reads rather than an entirely new concept. Not sure how it would fit into the SDFs architecture. On Wed, Jul 5, 2017 at 10:27 AM, Lukasz Cwik wrote: > Yes, I was thinking the same thing about side inputs. Our current IOs don't > support "seeking

Re: Failure in Apex runner

2017-07-05 Thread Reuven Lax
Hi Thomas, This only happens with https://github.com/apache/beam/pull/3356. Reuven On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise wrote: > Hi Reuven, > > I'm not able to reproduce the issue locally. I was hoping to see which > thread is attempting to emit the results. In Apex, only the operator

Re: [PROPOSAL] External Join with KV Stores

2017-07-05 Thread Lukasz Cwik
Yes, I was thinking the same thing about side inputs. Our current IOs don't support "seeking" and we could make HBaseIO/JdbcIO/... become seekable by key+window which would allow a Runner to optimize the Read + SideInput into any kind of deferred lookup when its accessed as a side input instead of

Re: is there any similar page to build custom io for java

2017-07-05 Thread Stephen Sisk
hi! I'd suggest checking out the Pipeline I/O page [0] for general IO guidance. The Authoring I/O Transforms is not language specific, but combined with the PTransform style guide [1], it's the best current resource for java and should give you what you need. We're definitely going to be adding mo

Re: Jenkins Executor Issue

2017-07-05 Thread Jason Kuster
This has been resolved, although Infra has not yet determined a root cause for the issue. If anyone sees this recurring please reply to this thread or to the JIRA. On Fri, Jun 30, 2017 at 12:34 PM, Jean-Baptiste Onofré wrote: > Thanks Jason. > > Regards > JB > > > On 06/30/2017 08:19 PM, Jason K

is there any similar page to build custom io for java

2017-07-05 Thread Jyotirmoy Sundi
Hi, I found this page for Python https://beam.apache.org/documentation/sdks/python-custom-io/ and was wondering whether something similar exists for Java. -- Best Regards, Jyotirmoy Sundi

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-05 Thread Jean-Baptiste Onofré
FYI, the release branch has been created. I plan to do the RC1 tomorrow, so you have time to cherry-pick if wanted ;) Regards JB On 07/05/2017 07:52 AM, Jean-Baptiste Onofré wrote: Hi, I'm building with the last changes and I will cut the release branch just after. I keep you posted. Regard