Re: writing to s3 in beam

2017-07-05 Thread Ted Yu
Please take a look at BEAM-2500 (and related JIRAs).

Cheers

On Wed, Jul 5, 2017 at 8:00 PM, Jyotirmoy Sundi  wrote:

> Hi Folks,
>
>  I am trying to write to s3 from beam.
>
> These are the configs I am passing:
>
> --hdfsConfiguration='[{"fs.default.name": "s3://xxx-output",
> "fs.s3.awsAccessKeyId" :"xxx", "fs.s3.awsSecretAccessKey":"yyy"}]'
> --input="/home/hadoop/data" --output="s3://xx-output/beam-output/"
>
> *Any idea how I can write to s3? I am using beam release-2.0.0*
>
> *Trace*
>
> 17/07/06 02:55:46 WARN TaskSetManager: Lost task 7.0 in stage 2.0 (TID 31, ip-10-130-237-28.vpc.internal): org.apache.beam.sdk.util.UserCodeException: java.lang.NullPointerException
> at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
> at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles$auxiliary$TXDiaduA.invokeProcessElement(Unknown Source)
> at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:197)
> at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:155)
> at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:64)
> at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:165)
> at org.apache.beam.runners.spark.repackaged.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
> at org.apache.beam.runners.spark.repackaged.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
> at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at java.io.File.<init>(File.java:277)
> at org.apache.hadoop.fs.s3.S3OutputStream.newBackupFile(S3OutputStream.java:92)
> at org.apache.hadoop.fs.s3.S3OutputStream.<init>(S3OutputStream.java:84)
> at org.apache.hadoop.fs.s3.S3FileSystem.create(S3FileSystem.java:252)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:915)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:896)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:793)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:782)
> at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:103)
> at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:67)
> at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:207)
> at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:194)
> at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:876)
> at org.apache.beam.sdk.io.FileBasedSink$Writer.openUnwindowed(FileBasedSink.java:842)
> at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement(WriteFiles.java:362)
>
>
>
> --
> Best Regards,
> Jyotirmoy Sundi
>


writing to s3 in beam

2017-07-05 Thread Jyotirmoy Sundi
Hi Folks,

 I am trying to write to s3 from beam.

These are the configs I am passing:

--hdfsConfiguration='[{"fs.default.name": "s3://xxx-output",
"fs.s3.awsAccessKeyId" :"xxx", "fs.s3.awsSecretAccessKey":"yyy"}]'
--input="/home/hadoop/data" --output="s3://xx-output/beam-output/"

*Any idea how I can write to s3? I am using beam release-2.0.0*

*Trace*

17/07/06 02:55:46 WARN TaskSetManager: Lost task 7.0 in stage 2.0 (TID 31, ip-10-130-237-28.vpc.internal): org.apache.beam.sdk.util.UserCodeException: java.lang.NullPointerException
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles$auxiliary$TXDiaduA.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:197)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:155)
at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:64)
at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:165)
at org.apache.beam.runners.spark.repackaged.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at org.apache.beam.runners.spark.repackaged.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at java.io.File.<init>(File.java:277)
at org.apache.hadoop.fs.s3.S3OutputStream.newBackupFile(S3OutputStream.java:92)
at org.apache.hadoop.fs.s3.S3OutputStream.<init>(S3OutputStream.java:84)
at org.apache.hadoop.fs.s3.S3FileSystem.create(S3FileSystem.java:252)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:915)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:896)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:793)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:782)
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:103)
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:67)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:207)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:194)
at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:876)
at org.apache.beam.sdk.io.FileBasedSink$Writer.openUnwindowed(FileBasedSink.java:842)
at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement(WriteFiles.java:362)
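A plausible explanation for the NullPointerException at the bottom of this trace (an educated guess, not confirmed in this thread) is that Hadoop's S3OutputStream.newBackupFile builds its local backup file from the fs.s3.buffer.dir configuration key, and new File(null, ...) throws when that key is absent from the --hdfsConfiguration above. The plain-Java sketch below simulates just that lookup; the key names mirror the flags in this message:

```java
import java.util.HashMap;
import java.util.Map;

public class S3BufferDirCheck {
    public static void main(String[] args) {
        // Simulated Hadoop configuration, mirroring the --hdfsConfiguration flag.
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.default.name", "s3://xxx-output");
        conf.put("fs.s3.awsAccessKeyId", "xxx");
        conf.put("fs.s3.awsSecretAccessKey", "yyy");

        // S3OutputStream.newBackupFile effectively does
        // new File(conf.get("fs.s3.buffer.dir"), ...), which throws a
        // NullPointerException when the key is missing (assumption based
        // on the File.<init> frame in the trace above).
        String bufferDir = conf.get("fs.s3.buffer.dir");
        if (bufferDir == null) {
            System.out.println("fs.s3.buffer.dir is unset: this is where the NPE would occur");
        } else {
            System.out.println("buffer dir: " + bufferDir);
        }
    }
}
```

If this is indeed the cause, adding a "fs.s3.buffer.dir" entry pointing at a writable local path to the --hdfsConfiguration JSON may avoid the NPE (the exact value is an assumption); BEAM-2500, mentioned in the reply above, tracks proper S3 filesystem support.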



-- 
Best Regards,
Jyotirmoy Sundi


Re: Making it easier to run IO ITs

2017-07-05 Thread Stephen Sisk
I also wrote up this dev doc that goes into more depth on how this will all
work, as well as what it will be like to create a new IO IT.

https://docs.google.com/document/d/1fISxgeq4Cbr-YRJQDgpnHxfTiQiHv8zQgb47dSvvJ78/edit?usp=sharing


S

On Wed, Jul 5, 2017 at 3:11 PM Stephen Sisk  wrote:

> hey all,
>
> I wanted to share an early draft of what it'll be like to invoke mvn for
> the IO integration tests in the future when we have the integration with
> kubernetes going.
>
> I'm really excited about these changes - working on the IO ITs, I have to
> run them frequently, and the command lines to run them can be quite a bear.
> For example:
>
> mvn -e verify -Dit.test=org.apache.beam.sdk.io.jdbc.JdbcIOIT
> -DskipITs=false -pl sdks/java/io/jdbc -Pio-it -Pdataflow-runner
> -DintegrationTestPipelineOptions=["--project=[project]","--gcpTempLocation=gs://[bucket]/staging","--postgresUsername=postgres","--postgresPassword=uuinkks","--postgresDatabaseName=postgres","--postgresSsl=False","--postgresServerName=[1.2.3.4]","--runner=TestDataflowRunner","--defaultWorkerLogLevel=INFO"]
>
> Also, in order to run this, I first need to have created an instance of
> this datastore in kubernetes and then copied the parameter and inevitably I
> mis-copy something in there or something changes, so it doesn't work
> correctly and I have to go back in and edit it.
>
> So that's a pain.
>
> To invoke the IO ITs in the future, it'll be a command like this:
>   mvn verify -Pio-it-suite -pl sdks/java/io/jdbc
>   -DpkbLocation="path-to-pkb.py" \
>   -DintegrationTestPipelineOptions='["--tempRoot=my-temp-root"]'
> (or at least, that's what I'm proposing :)
>
> This will run the jdbc integration tests, spinning up the data store for
> that integration test in your kubernetes cluster.
>
> This is all enabled by a combination of adding new profiles in maven for
> each IO and changes to the beam benchmarks in pkb (perfkitbenchmarker) to
> control kubernetes. Jason has already done a lot of work to get pkb working
> to run our regular benchmarks, and I'm excited to continue that work for IO
> ITs. We use pkb to control kubernetes and capture our benchmark times. This
> means you'll need to install pkb if you'd like to use this nicer
> experience, however, devs will never have to use pkb if they don't want to,
> nor is making changes in pkb required when you want to add a new IO IT. You
> can always spin up the data store yourself, and invoke the integration test
> directly.
>
> Drafts of these changes can be seen at [0] and [1] - however, I don't
> expect most folks will care about these changes other than "how do I invoke
> this?", so let me know if you have comments about how this is invoked.
>
> S
>
> [0] pom changes hooking up the call to pkb -
> https://github.com/ssisk/beam/commit/eec7cb5b71330761e71850e8e6f65f34249641b0
> [1] pkb changes enabling kubernetes spin up-
> https://github.com/ssisk/PerfKitBenchmarker/commits/kubernetes_create
> (last 2 changes)
>


Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-05 Thread Raghu Angadi
I would like to request merging two Kafka related PRs : #3461
, #3492
. Especially the second one, as
it improves user experience in case of server misconfiguration that
prevents connections between workers and the Kafka cluster.

On Wed, Jul 5, 2017 at 8:10 AM, Jean-Baptiste Onofré 
wrote:

> FYI, the release branch has been created.
>
> I plan to do the RC1 tomorrow, so you have time to cherry-pick if wanted ;)
>
> Regards
> JB
>
>
> On 07/05/2017 07:52 AM, Jean-Baptiste Onofré wrote:
>
>> Hi,
>>
>> I'm building with the last changes and I will cut the release branch just
>> after.
>>
>> I keep you posted.
>>
>> Regards
>> JB
>>
>> On 07/03/2017 05:37 PM, Jean-Baptiste Onofré wrote:
>>
>>> Hi guys,
>>>
>>> The 2.1.0 release branch will be created in an hour or so.
>>>
>>> I updated Jira, please, take a look and review the one assigned to you
>>> where I left a comment.
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>> On 07/01/2017 07:06 AM, Jean-Baptiste Onofré wrote:
>>>
 It sounds good Kenn. Thanks.

 I will ask in the Jira.

 Thanks !
 Regards
 JB

 On 07/01/2017 06:58 AM, Kenneth Knowles wrote:

> SGTM
>
> There are still 23 open issues tagged with 2.1.0. Since this is not
> reduced
> from last time, I think it is fair to ask them to be cherry-picked to
> the
> release branch or deferred.
>
> To the assignees of these issues: can you please evaluate whether
> completion is imminent?
>
> I want to also note that many PMC members have Monday and Tuesday off,
> providing a strong incentive to take the whole week off. So I suggest
> July
> 10 as the earliest day for RC1.
>
> On Fri, Jun 30, 2017 at 8:53 PM, Jean-Baptiste Onofré  >
> wrote:
>
> Hi,
>>
>> The build is now back to normal, I will create the release branch
>> today.
>>
>> Regards
>> JB
>>
>>
>> On 06/29/2017 03:22 PM, Jean-Baptiste Onofré wrote:
>>
>> FYI,
>>>
>>> I opened https://github.com/apache/beam/pull/3471 to fix the
>>> SpannerIO
>>> test on my machine. I don't understand how the test can pass without
>>> defining the project ID (it should always fail on the precondition).
>>>
>>> I will create the release branch once this PR is merged.
>>>
>>> Regards
>>> JB
>>>
>>> On 06/29/2017 06:29 AM, Jean-Baptiste Onofré wrote:
>>>
>>> Hi Stephen,

 Thanks for the update.

 I have an issue on my machine with SpannerIOTest. I will create the
 release branch as soon as this is fixed. Then, we will be able to
 cherry-pick
 the fix we want.

 I keep you posted.

 Regards
 JB

 On 06/28/2017 09:37 PM, Stephen Sisk wrote:

 hi!
>
> I'm hopeful we can get the fix for BEAM-2533 into this release as
> well,
> there's a bigtable fix in the next version that'd be good to have.
> The
> bigtable client release should be in the next day or two.
>
> S
>
> On Mon, Jun 26, 2017 at 12:03 PM Jean-Baptiste Onofré <
> j...@nanthrax.net>
> wrote:
>
> Hi guys,
>
>>
>> just a quick update about the 2.1.0 release.
>>
>> I will complete the Jira triage tomorrow.
>>
>> I plan to create the release branch Wednesday.
>>
>> Thanks !
>> Regards
>> JB
>>
>> On 06/22/2017 04:23 AM, Jean-Baptiste Onofré wrote:
>>
>> Hi guys,
>>>
>>> As we released 2.0.0 (first stable release) last month during
>>> ApacheCon,
>>>
>>> and to
>>
>> maintain our release pace, I would like to release 2.1.0 next
>>> week.
>>>
>>> This release would include lot of bug fixes and some new
>>> features:
>>>
>>> https://issues.apache.org/jira/projects/BEAM/versions/12340528
>>>
>>> I volunteer to be the release manager for this one.
>>>
>>> Thoughts ?
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
>

>>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>

>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Failure in Apex runner

2017-07-05 Thread Reuven Lax
I wonder if the watermark is accidentally advancing too early, causing Apex
to shut down the pipeline before the final finalize DoFn executes?
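That hypothesis can be illustrated with a toy simulation (purely illustrative plain Java, not Apex or Beam code): if the final watermark token travels the same channel as the data and arrives before the finalize element, everything behind it is silently dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class WatermarkShutdownSketch {
    public static void main(String[] args) {
        // A channel carrying both data and control tokens, as described above.
        Queue<String> channel = new ArrayDeque<>();
        channel.add("data-1");
        channel.add("FINAL_WATERMARK"); // watermark advanced too early
        channel.add("finalize");        // the finalize step arrives after it

        List<String> processed = new ArrayList<>();
        for (String item : channel) {
            if (item.equals("FINAL_WATERMARK")) {
                break; // the runner shuts the pipeline down here
            }
            processed.add(item);
        }
        // "finalize" is never processed
        System.out.println(processed); // prints [data-1]
    }
}
```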

On Wed, Jul 5, 2017 at 1:45 PM, Thomas Weise  wrote:

> I don't think this is a problem with the test; if anything, it shows the
> test is useful in catching similar issues during unit test runs.
>
> Is there any form of asynchronous/trigger based processing in this pipeline
> that could cause this?
>
> The Apex runner will shutdown the pipeline after the final watermark, the
> shutdown signal traverses the pipeline just like a watermark, but it is not
> seen by user code.
>
> Thomas
>
> --
> sent from mobile
> On Jul 5, 2017 1:19 PM, "Kenneth Knowles"  wrote:
>
> > Upon further investigation, this test always writes to
> > ./target/wordcountresult-0-of-2 and
> > ./target/wordcountresult-1-of-2. So after a successful test run,
> > any further run without a `clean` will spuriously succeed. I was running
> > via IntelliJ so did not do the ritual `mvn clean` workaround. So
> > reproduction appears to be easy and we could fix the test (if we don't
> > remove it) to use a fresh temp dir.
> >
> > This seems to point to a bug in waitUntilFinish() and/or Apex if the
> > topology is shut down before this ParDo is run. This is a ParDo with
> > trivial bounded input but with side inputs. So I would guess the bug is
> > either in watermark tracking / readiness of the side input or just how
> > PushbackSideInputDoFnRunner is used.
> >
> > On Wed, Jul 5, 2017 at 12:23 PM, Reuven Lax 
> > wrote:
> >
> > > I've done a bit more debugging with logging. It appears that the
> finalize
> > > ParDo is never being invoked in this Apex test (or at least the
> LOG.info
> > in
> > > that ParDo never runs). This ParDo is run on a constant element (code
> > > snippet below), so it should always run.
> > >
> > > PCollection<Void> singletonCollection = p.apply(Create.of((Void) null));
> > > singletonCollection
> > >     .apply("Finalize", ParDo.of(new DoFn<Void, Integer>() {
> > >       @ProcessElement
> > >       public void processElement(ProcessContext c) throws Exception {
> > >         LOG.info("Finalizing write operation {}.", writeOperation);
> > >
> > >
> > > On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles
>  > >
> > > wrote:
> > >
> > > > Data-dependent file destinations is a pretty great feature. We also
> > have
> > > > another change to make to this @Experimental feature, and it would be
> > > nice
> > > > to get them both into 2.1.0 if we can unblock this quickly.
> > > >
> > > > I just tried this too, and failed to reproduce it. But Jenkins and
> > Reuven
> > > > both have a reliable repro.
> > > >
> > > > Questions:
> > > >
> > > >  - Any ideas about how these configurations differ?
> > > >  - Does this actually affect users?
> > > >  - Once we have another test that catches this issue, can we delete
> > this
> > > > test?
> > > >
> > > > Every other test passes, including the actual example WordCountIT.
> > Since
> > > > the PR doesn't change primitives, it also seems like it is an
> existing
> > > > issue. And the test seems redundant with our other testing but won't
> > get
> > > as
> > > > much maintenance attention. I don't want to stop catching whatever
> this
> > > > issue is, though.
> > > >
> > > > Kenn
> > > >
> > > > On Wed, Jul 5, 2017 at 10:31 AM, Reuven Lax  >
> > > > wrote:
> > > >
> > > > > Hi Thomas,
> > > > >
> > > > > This only happens with https://github.com/apache/beam/pull/3356.
> > > > >
> > > > > Reuven
> > > > >
> > > > > On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise 
> wrote:
> > > > >
> > > > > > Hi Reuven,
> > > > > >
> > > > > > I'm not able to reproduce the issue locally. I was hoping to see
> > > which
> > > > > > thread is attempting to emit the results. In Apex, only the
> > operator
> > > > > thread
> > > > > > can emit the results, any other thread that is launched by the
> > > operator
> > > > > > cannot. I'm not aware of ParDo managing separate threads though
> and
> > > > > assume
> > > > > > this must be a race. If you still have the log, can you send it
> to
> > > me?
> > > > > >
> > > > > > Thanks,
> > > > > > Thomas
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sat, Jul 1, 2017 at 5:51 AM, Reuven Lax
> >  > > >
> > > > > > wrote:
> > > > > >
> > > > > > > pr/3356 fails in the Apex WordCountTest. The failed test is
> here
> > > > > > >  > > > > > > MavenInstall/12829/org.apache.beam$beam-runners-apex/
> > > > > > > testReport/org.apache.beam.runners.apex.examples/
> WordCountTest/
> > > > > > > testWordCountExample/>
> > > > > > > :
> > > > > > >
> > > > > > > Upon debugging, it looks like this is likely a problem in the
> > Apex
> > > > > runner
> > > > > > > itself. A ParDo calls 

Re: BeamSQL status and merge to master

2017-07-05 Thread Jesse Anderson
So excited to start using this!

On Wed, Jul 5, 2017, 3:34 PM Mingmin Xu  wrote:

> Thanks for everybody's effort; we're very close to finishing the existing
> tasks. Here's a status update of the SQL DSL; feel free to have a try and share any
> comment:
>
> *1. what's done*
>   DSL feature is done, with basic filter/project/aggregation/union/join,
> built-in functions/UDF/UDAF(pending on #3491)
>
> *2. what's on-going*
>   more unit tests, and documentation of README/Beam web.
>
> *3. open questions*
>   BEAM-2441: we want to see
> any suggestion on the proper module name for SQL work. As mentioned in
> task, '*dsl/sql* is for the Java SDK and also prevents alternative language
> implementations, however there's another SQL client and not good to be
> included as Java SDK extension'.
>
> ---
> *How to run the example* beam/dsls/sql/example/BeamSqlExample.java
> <
> https://github.com/apache/beam/blob/DSL_SQL/dsls/sql/src/main/java/org/apache/beam/dsls/sql/example/BeamSqlExample.java
> >
> 1. run 'mvn install' to avoid the error in #3439
> 
> 2. run 'mvn -pl dsls/sql compile exec:java
> -Dexec.mainClass=org.apache.beam.dsls.sql.example.BeamSqlExample
> -Dexec.args="--runner=DirectRunner" -Pdirect-runner'
>
> FYI:
> 1. burn-down list in google doc
>
> https://docs.google.com/document/d/1EHZgSu4Jd75iplYpYT_K_JwSZxL2DWG8kv_EmQzNXFc/edit?usp=sharing
> 2. JIRA tasks with label 'dsl_sql_merge'
>
> https://issues.apache.org/jira/browse/BEAM-2555?jql=labels%20%3D%20dsl_sql_merge
>
>
> Mingmin
>
> On Tue, Jun 13, 2017 at 8:51 AM, Lukasz Cwik 
> wrote:
>
> > Nevermind, I merged it into #2 about usability.
> >
> > On Tue, Jun 13, 2017 at 8:50 AM, Lukasz Cwik  wrote:
> >
> > > I added a section about maven module structure/packaging (#6).
> > >
> > > On Tue, Jun 13, 2017 at 8:30 AM, Tyler Akidau
>  > >
> > > wrote:
> > >
> > >> Thanks Mingmin. I've copied your list into a doc[1] to make it easier
> to
> > >> collaborate on comments and edits.
> > >>
> > >> [1] https://s.apache.org/beam-dsl-sql-burndown
> > >>
> > >> -Tyler
> > >>
> > >>
> > >> On Mon, Jun 12, 2017 at 10:09 PM Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > >> wrote:
> > >>
> > >> > Hi Mingmin
> > >> >
> > >> > Sorry, the meeting was in the middle of the night for me and I
> wasn't
> > >> able
> > >> > to
> > >> > make it.
> > >> >
> > >> > The timing and checklist look good to me.
> > >> >
> > >> > We plan to do a Beam release end of June, so, merging in July means
> we
> > >> can
> > >> > include it in the next release.
> > >> >
> > >> > Thanks !
> > >> > Regards
> > >> > JB
> > >> >
> > >> > On 06/13/2017 03:06 AM, Mingmin Xu wrote:
> > >> > > Hi all,
> > >> > >
> > >> > > Thanks for joining the meeting. As discussed, we're planning to merge
> > >> DSL_SQL
> > >> > > branch back to master, targeted in the middle of July. A tag
> > >> > > 'dsl_sql_merge'[1] is created to track all todo tasks.
> > >> > >
> > >> > > *What's added in Beam SQL?*
> > >> > > BeamSQL provides the capability to execute SQL queries with Beam
> > Java
> > >> > SDK,
> > >> > > either by translating SQL to a PTransform, or run with a
> standalone
> > >> CLI
> > >> > > client.
> > >> > >
> > >> > > *Checklist for merge:*
> > >> > > 1. functionality
> > >> > >1.1. SQL grammar:
> > >> > >  1.1.1. basic query with SELECT/FILTER/PROJECT;
> > >> > >  1.1.2. AGGREGATION with global window;
> > >> > >  1.1.3. AGGREGATION with FIX_TIME/SLIDING_TIME/SESSION window;
> > >> > >  1.1.4. JOIN
> > >> > >1.2. UDF/UDAF support;
> > >> > >1.3. support predefined String/Math/Date functions, see[2];
> > >> > >
> > >> > > 2. DSL interface to convert SQL as PTransform;
> > >> > >
> > >> > > 3. junit test;
> > >> > >
> > >> > > 4. Java document;
> > >> > >
> > >> > > 5. Document of SQL feature in website;
> > >> > >
> > >> > > Any comments/suggestions are very welcomed.
> > >> > >
> > >> > > Note:
> > >> > > [1].
> > >> > >
> > >> > https://issues.apache.org/jira/browse/BEAM-2436?jql=labels%
> > >> 20%3D%20dsl_sql_merge
> > >> > >
> > >> > > [2]. https://calcite.apache.org/docs/reference.html
> > >> > >
> > >> >
> > >> > --
> > >> > Jean-Baptiste Onofré
> > >> > jbono...@apache.org
> > >> > http://blog.nanthrax.net
> > >> > Talend - http://www.talend.com
> > >> >
> > >>
> > >
> > >
> >
>
>
>
> --
> 
> Mingmin
>
-- 
Thanks,

Jesse


Re: BeamSQL status and merge to master

2017-07-05 Thread Mingmin Xu
Thanks for everybody's effort; we're very close to finishing the existing tasks.
Here's a status update of the SQL DSL; feel free to have a try and share any
comment:

*1. what's done*
  DSL feature is done, with basic filter/project/aggregation/union/join,
built-in functions/UDF/UDAF(pending on #3491)

*2. what's on-going*
  more unit tests, and documentation of README/Beam web.

*3. open questions*
  BEAM-2441: we want to see
any suggestion on the proper module name for SQL work. As mentioned in
task, '*dsl/sql* is for the Java SDK and also prevents alternative language
implementations, however there's another SQL client and not good to be
included as Java SDK extension'.

---
*How to run the example* beam/dsls/sql/example/BeamSqlExample.java

1. run 'mvn install' to avoid the error in #3439

2. run 'mvn -pl dsls/sql compile exec:java
-Dexec.mainClass=org.apache.beam.dsls.sql.example.BeamSqlExample
-Dexec.args="--runner=DirectRunner" -Pdirect-runner'
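For readers new to the DSL, here is a rough plain-Java analogy (explicitly not the BeamSql API; all names below are illustrative) of what "translating SQL to a PTransform" means: a SELECT/FILTER/PROJECT query compiles down to a filter-and-map over rows, which the DSL expresses as a PTransform over a PCollection of rows.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SqlAsTransformAnalogy {
    public static void main(String[] args) {
        // Analogy for: SELECT name FROM scores WHERE score > 70
        List<String[]> rows = List.of(
            new String[] {"alice", "85"},
            new String[] {"bob", "60"},
            new String[] {"carol", "91"});
        List<String> names = rows.stream()
            .filter(r -> Integer.parseInt(r[1]) > 70) // WHERE score > 70
            .map(r -> r[0])                           // SELECT name
            .collect(Collectors.toList());
        System.out.println(names); // prints [alice, carol]
    }
}
```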

FYI:
1. burn-down list in google doc
https://docs.google.com/document/d/1EHZgSu4Jd75iplYpYT_K_JwSZxL2DWG8kv_EmQzNXFc/edit?usp=sharing
2. JIRA tasks with label 'dsl_sql_merge'
https://issues.apache.org/jira/browse/BEAM-2555?jql=labels%20%3D%20dsl_sql_merge


Mingmin

On Tue, Jun 13, 2017 at 8:51 AM, Lukasz Cwik 
wrote:

> Nevermind, I merged it into #2 about usability.
>
> On Tue, Jun 13, 2017 at 8:50 AM, Lukasz Cwik  wrote:
>
> > I added a section about maven module structure/packaging (#6).
> >
> > On Tue, Jun 13, 2017 at 8:30 AM, Tyler Akidau  >
> > wrote:
> >
> >> Thanks Mingmin. I've copied your list into a doc[1] to make it easier to
> >> collaborate on comments and edits.
> >>
> >> [1] https://s.apache.org/beam-dsl-sql-burndown
> >>
> >> -Tyler
> >>
> >>
> >> On Mon, Jun 12, 2017 at 10:09 PM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> > Hi Mingmin
> >> >
> >> > Sorry, the meeting was in the middle of the night for me and I wasn't
> >> able
> >> > to
> >> > make it.
> >> >
> >> > The timing and checklist look good to me.
> >> >
> >> > We plan to do a Beam release end of June, so, merging in July means we
> >> can
> >> > include it in the next release.
> >> >
> >> > Thanks !
> >> > Regards
> >> > JB
> >> >
> >> > On 06/13/2017 03:06 AM, Mingmin Xu wrote:
> >> > > Hi all,
> >> > >
> >> > > Thanks for joining the meeting. As discussed, we're planning to merge
> >> DSL_SQL
> >> > > branch back to master, targeted in the middle of July. A tag
> >> > > 'dsl_sql_merge'[1] is created to track all todo tasks.
> >> > >
> >> > > *What's added in Beam SQL?*
> >> > > BeamSQL provides the capability to execute SQL queries with Beam
> Java
> >> > SDK,
> >> > > either by translating SQL to a PTransform, or run with a standalone
> >> CLI
> >> > > client.
> >> > >
> >> > > *Checklist for merge:*
> >> > > 1. functionality
> >> > >1.1. SQL grammar:
> >> > >  1.1.1. basic query with SELECT/FILTER/PROJECT;
> >> > >  1.1.2. AGGREGATION with global window;
> >> > >  1.1.3. AGGREGATION with FIX_TIME/SLIDING_TIME/SESSION window;
> >> > >  1.1.4. JOIN
> >> > >1.2. UDF/UDAF support;
> >> > >1.3. support predefined String/Math/Date functions, see[2];
> >> > >
> >> > > 2. DSL interface to convert SQL as PTransform;
> >> > >
> >> > > 3. junit test;
> >> > >
> >> > > 4. Java document;
> >> > >
> >> > > 5. Document of SQL feature in website;
> >> > >
> >> > > Any comments/suggestions are very welcomed.
> >> > >
> >> > > Note:
> >> > > [1].
> >> > >
> >> > https://issues.apache.org/jira/browse/BEAM-2436?jql=labels%
> >> 20%3D%20dsl_sql_merge
> >> > >
> >> > > [2]. https://calcite.apache.org/docs/reference.html
> >> > >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbono...@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >
> >
>



-- 

Mingmin


Making it easier to run IO ITs

2017-07-05 Thread Stephen Sisk
hey all,

I wanted to share an early draft of what it'll be like to invoke mvn for
the IO integration tests in the future when we have the integration with
kubernetes going.

I'm really excited about these changes - working on the IO ITs, I have to
run them frequently, and the command lines to run them can be quite a bear.
For example:

mvn -e verify -Dit.test=org.apache.beam.sdk.io.jdbc.JdbcIOIT
-DskipITs=false -pl sdks/java/io/jdbc -Pio-it -Pdataflow-runner
-DintegrationTestPipelineOptions=["--project=[project]","--gcpTempLocation=gs://[bucket]/staging","--postgresUsername=postgres","--postgresPassword=uuinkks","--postgresDatabaseName=postgres","--postgresSsl=False","--postgresServerName=[1.2.3.4]","--runner=TestDataflowRunner","--defaultWorkerLogLevel=INFO"]

Also, in order to run this, I first need to have created an instance of
this datastore in kubernetes and then copied the parameter and inevitably I
mis-copy something in there or something changes, so it doesn't work
correctly and I have to go back in and edit it.

So that's a pain.

To invoke the IO ITs in the future, it'll be a command like this:
  mvn verify -Pio-it-suite -pl sdks/java/io/jdbc
  -DpkbLocation="path-to-pkb.py" \
  -DintegrationTestPipelineOptions='["--tempRoot=my-temp-root"]'
(or at least, that's what I'm proposing :)

This will run the jdbc integration tests, spinning up the data store for
that integration test in your kubernetes cluster.

This is all enabled by a combination of adding new profiles in maven for
each IO and changes to the beam benchmarks in pkb (perfkitbenchmarker) to
control kubernetes. Jason has already done a lot of work to get pkb working
to run our regular benchmarks, and I'm excited to continue that work for IO
ITs. We use pkb to control kubernetes and capture our benchmark times. This
means you'll need to install pkb if you'd like to use this nicer
experience, however, devs will never have to use pkb if they don't want to,
nor is making changes in pkb required when you want to add a new IO IT. You
can always spin up the data store yourself, and invoke the integration test
directly.

Drafts of these changes can be seen at [0] and [1] - however, I don't
expect most folks will care about these changes other than "how do I invoke
this?", so let me know if you have comments about how this is invoked.

S

[0] pom changes hooking up the call to pkb -
https://github.com/ssisk/beam/commit/eec7cb5b71330761e71850e8e6f65f34249641b0
[1] pkb changes enabling kubernetes spin up-
https://github.com/ssisk/PerfKitBenchmarker/commits/kubernetes_create (last
2 changes)


Re: Failure in Apex runner

2017-07-05 Thread Kenneth Knowles
There is no asynchronous behavior in this test. It is basically a "batch"
test, here:
https://github.com/apache/beam/blob/master/runners/apex/src/test/java/org/apache/beam/runners/apex/examples/WordCountTest.java#L117

The pipeline is:

p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
  .apply(ParDo.of(new ExtractWordsFn()))
  .apply(Count.perElement())
  .apply(ParDo.of(new FormatAsStringFn()))
  .apply("WriteCounts", TextIO.write().to(options.getOutput()))
  ;

It runs this on a hardcoded input file and verifies the two expected output
files have hardcoded hashes. The files are never renamed from their
temporary destinations to their final destinations, since that transform
(the finalizing sub-transform of TextIO) is never run.
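One way to make such a test independent of stale output from earlier runs (the fix suggested later in this thread) is to write under a per-run temporary directory. A minimal plain-JDK sketch, with no Beam or JUnit dependency — the class and variable names are illustrative only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Give each test run its own output directory so stale files from a
// previous run can never make a hardcoded-hash check spuriously pass.
public class FreshOutputDirExample {
    public static void main(String[] args) throws IOException {
        // Creates a unique, empty directory for this run only.
        Path outputDir = Files.createTempDirectory("wordcountresult-");
        // Pass something like this prefix to TextIO.write().to(...)
        // instead of a hardcoded ./target path.
        String outputPrefix = outputDir.resolve("wordcountresult").toString();
        System.out.println(Files.isDirectory(outputDir));
    }
}
```

In a JUnit 4 test, the same effect is available via the `TemporaryFolder` rule, which also cleans up after the test.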

On Wed, Jul 5, 2017 at 1:45 PM, Thomas Weise  wrote:

> I don't think this is a problem with the test and if anything this problem
> to me shows the test is useful in catching similar issues during unit test
> runs.
>
> Is there any form of asynchronous/trigger based processing in this pipeline
> that could cause this?
>
> The Apex runner will shutdown the pipeline after the final watermark, the
> shutdown signal traverses the pipeline just like a watermark, but it is not
> seen by user code.
>
> Thomas
>
> --
> sent from mobile
> On Jul 5, 2017 1:19 PM, "Kenneth Knowles"  wrote:
>
> > Upon further investigation, this test always writes to
> > ./target/wordcountresult-0-of-2 and
> > ./target/wordcountresult-1-of-2. So after a successful test run,
> > any further run without a `clean` will spuriously succeed. I was running
> > via IntelliJ so did not do the ritual `mvn clean` workaround. So
> > reproduction appears to be easy and we could fix the test (if we don't
> > remove it) to use a fresh temp dir.
> >
> > This seems to point to a bug in waitUntilFinish() and/or Apex if the
> > topology is shut down before this ParDo is run. This is a ParDo with
> > trivial bounded input but with side inputs. So I would guess the bug is
> > either in watermark tracking / readiness of the side input or just how
> > PushbackSideInputDoFnRunner is used.
> >
> > On Wed, Jul 5, 2017 at 12:23 PM, Reuven Lax 
> > wrote:
> >
> > > I've done a bit more debugging with logging. It appears that the
> finalize
> > > ParDo is never being invoked in this Apex test (or at least the
> LOG.info
> > in
> > > that ParDo never runs). This ParDo is run on a constant element (code
> > > snippet below), so it should always run.
> > >
> > > PCollection singletonCollection = p.apply(Create.of((Void)
> null));
> > > singletonCollection
> > > .apply("Finalize", ParDo.of(new DoFn() {
> > >   @ProcessElement
> > >   public void processElement(ProcessContext c) throws Exception {
> > > LOG.info("Finalizing write operation {}.", writeOperation);
> > >
> > >
> > > On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles
>  > >
> > > wrote:
> > >
> > > > Data-dependent file destinations is a pretty great feature. We also
> > have
> > > > another change to make to this @Experimental feature, and it would be
> > > nice
> > > > to get them both into 2.1.0 if we can unblock this quickly.
> > > >
> > > > I just tried this too, and failed to reproduce it. But Jenkins and
> > Reuven
> > > > both have a reliable repro.
> > > >
> > > > Questions:
> > > >
> > > >  - Any ideas about how these configurations differ?
> > > >  - Does this actually affect users?
> > > >  - Once we have another test that catches this issue, can we delete
> > this
> > > > test?
> > > >
> > > > Every other test passes, including the actual example WordCountIT.
> > Since
> > > > the PR doesn't change primitives, it also seems like it is an
> existing
> > > > issue. And the test seems redundant with our other testing but won't
> > get
> > > as
> > > > much maintenance attention. I don't want to stop catching whatever
> this
> > > > issue is, though.
> > > >
> > > > Kenn
> > > >
> > > > On Wed, Jul 5, 2017 at 10:31 AM, Reuven Lax  >
> > > > wrote:
> > > >
> > > > > Hi Thomas,
> > > > >
> > > > > This only happens with https://github.com/apache/beam/pull/3356.
> > > > >
> > > > > Reuven
> > > > >
> > > > > On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise 
> wrote:
> > > > >
> > > > > > Hi Reuven,
> > > > > >
> > > > > > I'm not able to reproduce the issue locally. I was hoping to see
> > > which
> > > > > > thread is attempting to emit the results. In Apex, only the
> > operator
> > > > > thread
> > > > > > can emit the results, any other thread that is launched by the
> > > operator
> > > > > > cannot. I'm not aware of ParDo managing separate threads though
> and
> > > > > assume
> > > > > > this must be a race. If you still have the log, can you send it
> to
> > > me?
> > > > > >
> > > > > > Thanks,
> > > > > > Thomas
> > > > > >
> > > > > >
> > 

[Proposal] Submitting pipelines to Runners in another language

2017-07-05 Thread Sourabh Bajaj
Hi,

I wanted to share a proposal for submitting pipelines from SDK X
(Python/Go) to runners written in another language Y (Java) (Flink / Spark
/ Apex) using the Runner API. Please find the doc here

.

As always comments and feedback are welcome.

Thanks
Sourabh


Re: Build Failure in * release-2.0.0

2017-07-05 Thread Jyotirmoy Sundi
Thanks Ted

On Wed, Jul 5, 2017 at 1:42 PM Ted Yu  wrote:

> bq. Caused by: java.net.SocketException: Too many open files
>
> Please adjust ulimit.
>
> FYI
>
> On Wed, Jul 5, 2017 at 1:33 PM, Jyotirmoy Sundi 
> wrote:
>
> > Hi Folks ,
> >
> > Any idea why the build is failing in release-2.0.0 , i did "mvn clean
> > package"
> >
> >
> > *Trace*
> >
> > [INFO] Running org.apache.beam.sdk.io.hbase.HBaseResultCoderTest
> >
> > [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> > 0.461 s - in org.apache.beam.sdk.io.hbase.HBaseResultCoderTest
> >
> > [INFO] Running org.apache.beam.sdk.io.hbase.HBaseIOTest
> >
> > [ERROR] Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> > 4.504 s <<< FAILURE! - in org.apache.beam.sdk.io.hbase.HBaseIOTest
> >
> > [ERROR] testReadingWithKeyRange(org.apache.beam.sdk.io.hbase.HBaseIOTest)
> > Time
> > elapsed: 4.504 s  <<< ERROR!
> >
> > java.lang.RuntimeException:
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=1, exceptions:
> >
> > Wed Jul 05 13:31:23 PDT 2017,
> > RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
> > java.net.SocketException: Too many open files
> >
> >
> > at
> > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.
> > waitUntilFinish(DirectRunner.java:330)
> >
> > at
> > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.
> > waitUntilFinish(DirectRunner.java:292)
> >
> > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
> >
> > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
> >
> > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
> >
> > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> >
> > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:340)
> >
> > at
> > org.apache.beam.sdk.io.hbase.HBaseIOTest.runReadTestLength(
> > HBaseIOTest.java:418)
> >
> > at
> > org.apache.beam.sdk.io.hbase.HBaseIOTest.testReadingWithKeyRange(
> > HBaseIOTest.java:253)
> >
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> > 62)
> >
> > at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > DelegatingMethodAccessorImpl.java:43)
> >
> > at java.lang.reflect.Method.invoke(Method.java:498)
> >
> > at
> > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
> > FrameworkMethod.java:50)
> >
> > at
> > org.junit.internal.runners.model.ReflectiveCallable.run(
> > ReflectiveCallable.java:12)
> >
> > at
> > org.junit.runners.model.FrameworkMethod.invokeExplosively(
> > FrameworkMethod.java:47)
> >
> > at
> > org.junit.internal.runners.statements.InvokeMethod.
> > evaluate(InvokeMethod.java:17)
> >
> > at
> >
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
> >
> > at
> > org.junit.rules.ExpectedException$ExpectedExceptionStatement.
> > evaluate(ExpectedException.java:239)
> >
> > at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> >
> > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> >
> > at
> > org.junit.runners.BlockJUnit4ClassRunner.runChild(
> > BlockJUnit4ClassRunner.java:78)
> >
> > at
> > org.junit.runners.BlockJUnit4ClassRunner.runChild(
> > BlockJUnit4ClassRunner.java:57)
> >
> > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> >
> > at
> >
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
> >
> > at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(
> > ThreadPoolExecutor.java:1142)
> >
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > ThreadPoolExecutor.java:617)
> >
> > at java.lang.Thread.run(Thread.java:745)
> >
> > Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Failed
> > after attempts=1, exceptions:
> >
> > Wed Jul 05 13:31:23 PDT 2017,
> > RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
> > java.net.SocketException: Too many open files
> >
> >
> > at
> > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> > RpcRetryingCaller.java:157)
> >
> > at
> > org.apache.hadoop.hbase.client.ResultBoundedCompletionService
> > $QueueingFuture.run(ResultBoundedCompletionService.java:65)
> >
> > ... 3 more
> >
> > Caused by: java.net.SocketException: Too many open files
> >
> > at sun.nio.ch.Net.socket0(Native Method)
> >
> > at sun.nio.ch.Net.socket(Net.java:411)
> >
> > at sun.nio.ch.Net.socket(Net.java:404)
> >
> > at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
> >
> > at
> > sun.nio.ch.SelectorProviderImpl.openSocketChannel(
> > SelectorProviderImpl.java:60)
> >
> > at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
> >
> > at
> > 

Re: Failure in Apex runner

2017-07-05 Thread Thomas Weise
I don't think this is a problem with the test; if anything, this problem
shows the test is useful in catching similar issues during unit test
runs.

Is there any form of asynchronous/trigger based processing in this pipeline
that could cause this?

The Apex runner will shut down the pipeline after the final watermark; the
shutdown signal traverses the pipeline just like a watermark, but it is not
seen by user code.

Thomas

--
sent from mobile
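Thomas's description — a shutdown signal that flows through the pipeline like a watermark but is consumed by the framework, never handed to user code — can be modeled with a toy operator chain. All names below are illustrative; this is not the Apex API:

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Toy model: a sentinel SHUTDOWN element propagates downstream like a
// watermark, stopping each operator, while user code only ever sees
// ordinary data elements.
final class ToyOperator {
    static final Object SHUTDOWN = new Object();
    private final Consumer<Object> userFn;
    private final ToyOperator downstream;
    private boolean running = true;

    ToyOperator(Consumer<Object> userFn, ToyOperator downstream) {
        this.userFn = userFn;
        this.downstream = downstream;
    }

    void process(Object element) {
        if (element == SHUTDOWN) {
            running = false;                         // framework-only signal
            if (downstream != null) downstream.process(element);
            return;                                  // user code never sees it
        }
        userFn.accept(element);
        if (downstream != null) downstream.process(element);
    }

    boolean isRunning() { return running; }
}

public class ShutdownDemo {
    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        ToyOperator sink = new ToyOperator(seen::append, null);
        ToyOperator source = new ToyOperator(e -> { }, sink);
        for (Object e : Arrays.asList("a", "b", ToyOperator.SHUTDOWN)) {
            source.process(e);
        }
        System.out.println(seen);             // prints "ab"
        System.out.println(sink.isRunning()); // prints "false"
    }
}
```

This also illustrates why a finalize ParDo can be skipped if the shutdown signal reaches it before its (late-arriving) input does.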
On Jul 5, 2017 1:19 PM, "Kenneth Knowles"  wrote:

> Upon further investigation, this test always writes to
> ./target/wordcountresult-0-of-2 and
> ./target/wordcountresult-1-of-2. So after a successful test run,
> any further run without a `clean` will spuriously succeed. I was running
> via IntelliJ so did not do the ritual `mvn clean` workaround. So
> reproduction appears to be easy and we could fix the test (if we don't
> remove it) to use a fresh temp dir.
>
> This seems to point to a bug in waitUntilFinish() and/or Apex if the
> topology is shut down before this ParDo is run. This is a ParDo with
> trivial bounded input but with side inputs. So I would guess the bug is
> either in watermark tracking / readiness of the side input or just how
> PushbackSideInputDoFnRunner is used.
>
> On Wed, Jul 5, 2017 at 12:23 PM, Reuven Lax 
> wrote:
>
> > I've done a bit more debugging with logging. It appears that the finalize
> > ParDo is never being invoked in this Apex test (or at least the LOG.info
> in
> > that ParDo never runs). This ParDo is run on a constant element (code
> > snippet below), so it should always run.
> >
> > PCollection singletonCollection = p.apply(Create.of((Void) null));
> > singletonCollection
> > .apply("Finalize", ParDo.of(new DoFn() {
> >   @ProcessElement
> >   public void processElement(ProcessContext c) throws Exception {
> > LOG.info("Finalizing write operation {}.", writeOperation);
> >
> >
> > On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles  >
> > wrote:
> >
> > > Data-dependent file destinations is a pretty great feature. We also
> have
> > > another change to make to this @Experimental feature, and it would be
> > nice
> > > to get them both into 2.1.0 if we can unblock this quickly.
> > >
> > > I just tried this too, and failed to reproduce it. But Jenkins and
> Reuven
> > > both have a reliable repro.
> > >
> > > Questions:
> > >
> > >  - Any ideas about how these configurations differ?
> > >  - Does this actually affect users?
> > >  - Once we have another test that catches this issue, can we delete
> this
> > > test?
> > >
> > > Every other test passes, including the actual example WordCountIT.
> Since
> > > the PR doesn't change primitives, it also seems like it is an existing
> > > issue. And the test seems redundant with our other testing but won't
> get
> > as
> > > much maintenance attention. I don't want to stop catching whatever this
> > > issue is, though.
> > >
> > > Kenn
> > >
> > > On Wed, Jul 5, 2017 at 10:31 AM, Reuven Lax 
> > > wrote:
> > >
> > > > Hi Thomas,
> > > >
> > > > This only happens with https://github.com/apache/beam/pull/3356.
> > > >
> > > > Reuven
> > > >
> > > > On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise  wrote:
> > > >
> > > > > Hi Reuven,
> > > > >
> > > > > I'm not able to reproduce the issue locally. I was hoping to see
> > which
> > > > > thread is attempting to emit the results. In Apex, only the
> operator
> > > > thread
> > > > > can emit the results, any other thread that is launched by the
> > operator
> > > > > cannot. I'm not aware of ParDo managing separate threads though and
> > > > assume
> > > > > this must be a race. If you still have the log, can you send it to
> > me?
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Jul 1, 2017 at 5:51 AM, Reuven Lax
>  > >
> > > > > wrote:
> > > > >
> > > > > > pr/3356 fails in the Apex WordCountTest. The failed test is here
> > > > > >  > > > > > MavenInstall/12829/org.apache.beam$beam-runners-apex/
> > > > > > testReport/org.apache.beam.runners.apex.examples/WordCountTest/
> > > > > > testWordCountExample/>
> > > > > > :
> > > > > >
> > > > > > Upon debugging, it looks like this is likely a problem in the
> Apex
> > > > runner
> > > > > > itself. A ParDo calls output(), and that triggers an exception
> > thrown
> > > > > from
> > > > > > inside the Apex runner. The Apex runner calls emit on a
> > > > DefaultOutputPort
> > > > > > (ApexParDoOperator.java:275), and that throws an exception inside
> > of
> > > > > > verifyOperatorThread().
> > > > > >
> > > > > > I'm going to ignore this failure for now as it seems unrelated to
> > my
> > > > PR,
> > > > > > but does someone want to take a look?
> > > > > >
> > > > > > Reuven
> > > > > >
> > > > >

Re: Build Failure in * release-2.0.0

2017-07-05 Thread Ted Yu
bq. Caused by: java.net.SocketException: Too many open files

Please adjust ulimit.

FYI

On Wed, Jul 5, 2017 at 1:33 PM, Jyotirmoy Sundi  wrote:

> Hi Folks ,
>
> Any idea why the build is failing in release-2.0.0 , i did "mvn clean
> package"
>
>
> *Trace*
>
> [INFO] Running org.apache.beam.sdk.io.hbase.HBaseResultCoderTest
>
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> 0.461 s - in org.apache.beam.sdk.io.hbase.HBaseResultCoderTest
>
> [INFO] Running org.apache.beam.sdk.io.hbase.HBaseIOTest
>
> [ERROR] Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 4.504 s <<< FAILURE! - in org.apache.beam.sdk.io.hbase.HBaseIOTest
>
> [ERROR] testReadingWithKeyRange(org.apache.beam.sdk.io.hbase.HBaseIOTest)
> Time
> elapsed: 4.504 s  <<< ERROR!
>
> java.lang.RuntimeException:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=1, exceptions:
>
> Wed Jul 05 13:31:23 PDT 2017,
> RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
> java.net.SocketException: Too many open files
>
>
> at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.
> waitUntilFinish(DirectRunner.java:330)
>
> at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.
> waitUntilFinish(DirectRunner.java:292)
>
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
>
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
>
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
>
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
>
> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:340)
>
> at
> org.apache.beam.sdk.io.hbase.HBaseIOTest.runReadTestLength(
> HBaseIOTest.java:418)
>
> at
> org.apache.beam.sdk.io.hbase.HBaseIOTest.testReadingWithKeyRange(
> HBaseIOTest.java:253)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
> FrameworkMethod.java:50)
>
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(
> ReflectiveCallable.java:12)
>
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(
> FrameworkMethod.java:47)
>
> at
> org.junit.internal.runners.statements.InvokeMethod.
> evaluate(InvokeMethod.java:17)
>
> at
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
>
> at
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.
> evaluate(ExpectedException.java:239)
>
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(
> BlockJUnit4ClassRunner.java:78)
>
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(
> BlockJUnit4ClassRunner.java:57)
>
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>
> at
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed
> after attempts=1, exceptions:
>
> Wed Jul 05 13:31:23 PDT 2017,
> RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
> java.net.SocketException: Too many open files
>
>
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> RpcRetryingCaller.java:157)
>
> at
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService
> $QueueingFuture.run(ResultBoundedCompletionService.java:65)
>
> ... 3 more
>
> Caused by: java.net.SocketException: Too many open files
>
> at sun.nio.ch.Net.socket0(Native Method)
>
> at sun.nio.ch.Net.socket(Net.java:411)
>
> at sun.nio.ch.Net.socket(Net.java:404)
>
> at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
>
> at
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(
> SelectorProviderImpl.java:60)
>
> at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>
> at
> org.apache.hadoop.net.StandardSocketFactory.createSocket(
> StandardSocketFactory.java:62)
>
> at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
> setupConnection(RpcClientImpl.java:410)
>
> at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
> setupIOstreams(RpcClientImpl.java:722)
>
> at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
> writeRequest(RpcClientImpl.java:906)
>
> at
> 

Build Failure in * release-2.0.0

2017-07-05 Thread Jyotirmoy Sundi
Hi Folks,

Any idea why the build is failing in release-2.0.0? I did "mvn clean
package"


*Trace*

[INFO] Running org.apache.beam.sdk.io.hbase.HBaseResultCoderTest

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.461 s - in org.apache.beam.sdk.io.hbase.HBaseResultCoderTest

[INFO] Running org.apache.beam.sdk.io.hbase.HBaseIOTest

[ERROR] Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
4.504 s <<< FAILURE! - in org.apache.beam.sdk.io.hbase.HBaseIOTest

[ERROR] testReadingWithKeyRange(org.apache.beam.sdk.io.hbase.HBaseIOTest)  Time
elapsed: 4.504 s  <<< ERROR!

java.lang.RuntimeException:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=1, exceptions:

Wed Jul 05 13:31:23 PDT 2017,
RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
java.net.SocketException: Too many open files


at
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:330)

at
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)

at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)

at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)

at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)

at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)

at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:340)

at
org.apache.beam.sdk.io.hbase.HBaseIOTest.runReadTestLength(HBaseIOTest.java:418)

at
org.apache.beam.sdk.io.hbase.HBaseIOTest.testReadingWithKeyRange(HBaseIOTest.java:253)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)

at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)

at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

at
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)

at
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)

at org.junit.rules.RunRules.evaluate(RunRules.java:20)

at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)

at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)

at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)

at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)

at
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=1, exceptions:

Wed Jul 05 13:31:23 PDT 2017,
RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
java.net.SocketException: Too many open files


at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)

at
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)

... 3 more

Caused by: java.net.SocketException: Too many open files

at sun.nio.ch.Net.socket0(Native Method)

at sun.nio.ch.Net.socket(Net.java:411)

at sun.nio.ch.Net.socket(Net.java:404)

at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)

at
sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)

at java.nio.channels.SocketChannel.open(SocketChannel.java:145)

at
org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupConnection(RpcClientImpl.java:410)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:722)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)

at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)

at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)

at

Re: is there any similar page to build custom io for java

2017-07-05 Thread Jyotirmoy Sundi
Thanks Stephen.

On Wed, Jul 5, 2017 at 9:42 AM, Stephen Sisk 
wrote:

> hi!
>
> I'd suggest checking out the Pipeline I/O page [0] for general IO guidance.
> The Authoring I/O Transforms is not language specific, but combined with
> the PTransform style guide [1], it's the best current resource for java and
> should give you what you need. We're definitely going to be adding more
> content to improve this.
>
> This email list is also a good source of info, we're always happy to help.
>
> You may also consider whether something like JdbcIO or HadoopInputFormatIO
> may meet your needs.
>
> S
> [0] https://beam.apache.org/documentation/io/io-toc/
> [1] https://beam.apache.org/contribute/ptransform-style-guide/
>
> On Wed, Jul 5, 2017 at 8:48 AM Jyotirmoy Sundi  wrote:
>
> > Hi ,
> > I found this for python
> > https://beam.apache.org/documentation/sdks/python-custom-io/ but was
> > wondering if alike exists for java.
> >
> > --
> > Best Regards,
> > Jyotirmoy Sundi
> >
>



-- 
Best Regards,
Jyotirmoy Sundi


Re: Failure in Apex runner

2017-07-05 Thread Reuven Lax
I've done a bit more debugging with logging. It appears that the finalize
ParDo is never being invoked in this Apex test (or at least the LOG.info in
that ParDo never runs). This ParDo is run on a constant element (code
snippet below), so it should always run.

PCollection<Void> singletonCollection = p.apply(Create.of((Void) null));
singletonCollection
.apply("Finalize", ParDo.of(new DoFn<Void, Integer>() {
  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
LOG.info("Finalizing write operation {}.", writeOperation);


On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles 
wrote:

> Data-dependent file destinations is a pretty great feature. We also have
> another change to make to this @Experimental feature, and it would be nice
> to get them both into 2.1.0 if we can unblock this quickly.
>
> I just tried this too, and failed to reproduce it. But Jenkins and Reuven
> both have a reliable repro.
>
> Questions:
>
>  - Any ideas about how these configurations differ?
>  - Does this actually affect users?
>  - Once we have another test that catches this issue, can we delete this
> test?
>
> Every other test passes, including the actual example WordCountIT. Since
> the PR doesn't change primitives, it also seems like it is an existing
> issue. And the test seems redundant with our other testing but won't get as
> much maintenance attention. I don't want to stop catching whatever this
> issue is, though.
>
> Kenn
>
> On Wed, Jul 5, 2017 at 10:31 AM, Reuven Lax 
> wrote:
>
> > Hi Thomas,
> >
> > This only happens with https://github.com/apache/beam/pull/3356.
> >
> > Reuven
> >
> > On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise  wrote:
> >
> > > Hi Reuven,
> > >
> > > I'm not able to reproduce the issue locally. I was hoping to see which
> > > thread is attempting to emit the results. In Apex, only the operator
> > thread
> > > can emit the results, any other thread that is launched by the operator
> > > cannot. I'm not aware of ParDo managing separate threads though and
> > assume
> > > this must be a race. If you still have the log, can you send it to me?
> > >
> > > Thanks,
> > > Thomas
> > >
> > >
> > >
> > > On Sat, Jul 1, 2017 at 5:51 AM, Reuven Lax 
> > > wrote:
> > >
> > > > pr/3356 fails in the Apex WordCountTest. The failed test is here
> > > >  > > > MavenInstall/12829/org.apache.beam$beam-runners-apex/
> > > > testReport/org.apache.beam.runners.apex.examples/WordCountTest/
> > > > testWordCountExample/>
> > > > :
> > > >
> > > > Upon debugging, it looks like this is likely a problem in the Apex
> > runner
> > > > itself. A ParDo calls output(), and that triggers an exception thrown
> > > from
> > > > inside the Apex runner. The Apex runner calls emit on a
> > DefaultOutputPort
> > > > (ApexParDoOperator.java:275), and that throws an exception inside of
> > > > verifyOperatorThread().
> > > >
> > > > I'm going to ignore this failure for now as it seems unrelated to my
> > PR,
> > > > but does someone want to take a look?
> > > >
> > > > Reuven
> > > >
> > >
> >
>


Re: Failure in Apex runner

2017-07-05 Thread Reuven Lax
Hi Thomas,

This only happens with https://github.com/apache/beam/pull/3356.

Reuven

On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise  wrote:

> Hi Reuven,
>
> I'm not able to reproduce the issue locally. I was hoping to see which
> thread is attempting to emit the results. In Apex, only the operator thread
> can emit the results, any other thread that is launched by the operator
> cannot. I'm not aware of ParDo managing separate threads though and assume
> this must be a race. If you still have the log, can you send it to me?
>
> Thanks,
> Thomas
>
>
>
> On Sat, Jul 1, 2017 at 5:51 AM, Reuven Lax 
> wrote:
>
> > pr/3356 fails in the Apex WordCountTest. The failed test is here
> >  > MavenInstall/12829/org.apache.beam$beam-runners-apex/
> > testReport/org.apache.beam.runners.apex.examples/WordCountTest/
> > testWordCountExample/>
> > :
> >
> > Upon debugging, it looks like this is likely a problem in the Apex runner
> > itself. A ParDo calls output(), and that triggers an exception thrown
> from
> > inside the Apex runner. The Apex runner calls emit on a DefaultOutputPort
> > (ApexParDoOperator.java:275), and that throws an exception inside of
> > verifyOperatorThread().
> >
> > I'm going to ignore this failure for now as it seems unrelated to my PR,
> > but does someone want to take a look?
> >
> > Reuven
> >
>


Re: [PROPOSAL] External Join with KV Stores

2017-07-05 Thread Lukasz Cwik
Yes, I was thinking the same thing about side inputs. Our current IOs don't
support "seeking"; we could make HBaseIO/JdbcIO/... seekable by key+window,
which would allow a Runner to optimize the Read + SideInput into any kind
of deferred lookup when it's accessed as a side input, instead of loading
it all into state. The Runner could interrogate the properties of the
"seekable" IO to see if it's compatible with what the user is doing before
performing the optimization. Granted, I believe it will be difficult to
express when something becomes available, how to handle updates to the
external store, etc.

What I like about modelling it as seekable IOs + Runner optimization is
that users don't need to change their pipeline to get benefits when they
upgrade to newer versions of Apache Beam.
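As a rough illustration of the idea — the interface and names below are purely hypothetical, not an existing Beam, HBaseIO, or JdbcIO API — a source that is "seekable by key+window" might expose a point-lookup hook that a runner could call on side-input access instead of materializing everything into state:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical shape of a "seekable" source: rather than reading the whole
// dataset, a runner defers to a per-key lookup when the collection is only
// ever accessed as a keyed side input.
interface SeekableByKey<K, V> {
    Optional<V> seek(K key, String window);
}

// In-memory stand-in; a real IO (e.g. HBaseIO) would issue a point read
// against the external store inside seek().
class InMemorySeekable<K, V> implements SeekableByKey<K, V> {
    private final Map<K, V> store = new HashMap<>();

    void put(K key, V value) { store.put(key, value); }

    @Override
    public Optional<V> seek(K key, String window) {
        // Window is ignored here; a real implementation would scope the
        // lookup to data visible in that window.
        return Optional.ofNullable(store.get(key));
    }
}

public class SeekableDemo {
    public static void main(String[] args) {
        InMemorySeekable<String, Integer> s = new InMemorySeekable<>();
        s.put("user-42", 7);
        System.out.println(s.seek("user-42", "global").orElse(-1)); // prints 7
    }
}
```

The interesting (and hard) parts Lukasz mentions — availability, consistency under updates to the external store — live outside this sketch, in whatever contract the runner and IO agree on.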

On Tue, Jul 4, 2017 at 9:48 AM, Aljoscha Krettek 
wrote:

> Hi,
>
> This is a very interesting proposal! I read you comment about side inputs
> and I tend to agree, though I think that side inputs don’t have to be
> strictly streams. It’s easily possible to imagine a Beam where a side input
> can be based on an external system and accessing side input simply goes
> through to the external system. In this world, it would be somewhat hard to
> reason about side input availability and making sure to only process main
> input when side-input is available. Though it’s not unsolvable, I think.
>
> What I like about your solution is that it is implementable as a DoFn,
> without any special support by the Runners. However, I think that in the
> Flink Runner it should be possible to execute this with the Async I/O
> operator and therefore get asynchronous accesses to the external system. I
> also think that this is not always better than batching, though.
>
> Best,
> Aljoscha
> > On 3. Jul 2017, at 04:36, JingsongLee  wrote:
> >
> > Hi all:
> > In some scenarios, the user needs to query some information from an
> > external kv store in the pipeline. I think we can have a good abstraction
> > that exposes as little of the underlying detail as possible to users.
> > Here is a doc describing this proposal; I would like to receive your
> > feedback.
> > https://docs.google.com/document/d/1B-XnUwXh64lbswRieckU0BxtygSV58hysqZbpZmk03A/edit?usp=sharing
> > Best, Jingsong Lee
> >
>
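
The "implementable as a DoFn, without any special support by the Runners" point can be sketched in plain Java without a Beam dependency. This is a hedged illustration of the batching pattern such a lookup DoFn could use (buffer keys, query the external store once per batch, flush the remainder at bundle end); the KvClient interface is hypothetical, and the process/finishBundle methods only stand in for @ProcessElement/@FinishBundle:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedLookupSketch {
    // Hypothetical client for the external KV store (HBase, Redis, ...).
    interface KvClient {
        Map<String, String> multiGet(Collection<String> keys);
    }

    static final class BatchingLookup {
        private final KvClient client;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();
        private final List<Map.Entry<String, String>> output = new ArrayList<>();

        BatchingLookup(KvClient client, int batchSize) {
            this.client = client;
            this.batchSize = batchSize;
        }

        // Analogous to @ProcessElement: buffer, and flush when the batch is full.
        void process(String key) {
            buffer.add(key);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Analogous to @FinishBundle: flush any remainder.
        List<Map.Entry<String, String>> finishBundle() {
            flush();
            return output;
        }

        private void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            // One round trip per batch instead of one per element.
            Map<String, String> results = client.multiGet(buffer);
            for (String k : buffer) {
                output.add(new AbstractMap.SimpleEntry<>(k, results.getOrDefault(k, "")));
            }
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        // Fake store standing in for the external system.
        Map<String, String> store = Map.of("a", "1", "b", "2", "c", "3");
        KvClient client = keys -> {
            Map<String, String> out = new HashMap<>();
            for (String k : keys) {
                if (store.containsKey(k)) {
                    out.put(k, store.get(k));
                }
            }
            return out;
        };
        BatchingLookup lookup = new BatchingLookup(client, 2);
        for (String k : List.of("a", "b", "c")) {
            lookup.process(k);
        }
        System.out.println(lookup.finishBundle());
    }
}
```

Batching amortizes round trips to the external store; the Flink Async I/O route mentioned above would instead overlap individual requests, which — as noted — is not always better.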
>


Re: Jenkins Executor Issue

2017-07-05 Thread Jason Kuster
This has been resolved, although Infra has not yet determined a root cause
for the issue. If anyone sees this recurring please reply to this thread or
to the JIRA.

On Fri, Jun 30, 2017 at 12:34 PM, Jean-Baptiste Onofré 
wrote:

> Thanks Jason.
>
> Regards
> JB
>
>
> On 06/30/2017 08:19 PM, Jason Kuster wrote:
>
>> Hi all,
>>
>> There appears to be an issue ongoing where our Jenkins jobs are not being
>> scheduled onto our executors. I've filed
>> https://issues.apache.org/jira/browse/INFRA-14476 to track this issue,
>> and
>> will try to communicate back here with relevant status updates.
>>
>> Best,
>>
>> Jason
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
---
Jason Kuster
Apache Beam / Google Cloud Dataflow


is there any similar page to build custom io for java

2017-07-05 Thread Jyotirmoy Sundi
Hi,
I found this for Python:
https://beam.apache.org/documentation/sdks/python-custom-io/ but was
wondering if something similar exists for Java.

-- 
Best Regards,
Jyotirmoy Sundi