Re: apache beam 2.16.0 ?

2019-09-18 Thread Rui Wang
Hi, You can search(or ask) in dev@ for the progress of 2.16.0. The answer is the release of 2.16.0 is ongoing and it will be released once blockers are solved. -Rui On Wed, Sep 18, 2019 at 9:34 PM Yu Watanabe wrote: > Hello. > > I would like to use 2.16.0 to diagnose container problem,

apache beam 2.16.0 ?

2019-09-18 Thread Yu Watanabe
Hello. I would like to use 2.16.0 to diagnose container problem, however, looks like the job-server is not available on maven yet. RuntimeError: Unable to fetch remote job server jar at

Re: Word-count example

2019-09-18 Thread Ankur Goenka
Hi Matthew, Beam 2.16.0 is not yet released hence you are getting the error. Can you try using 2.15.0 version? Thanks, Ankur On Wed, Sep 18, 2019 at 6:59 AM Matthew Patterson wrote: > Tried > > " > mvn archetype:generate \ > -DarchetypeGroupId=org.apache.beam \ >

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-18 Thread Ahmet Altay
I believe the flag was never relevant for PortableRunner. I might be wrong as well. The flag affects a few bits in the core code and that is why the solution cannot be by just setting the flag in Dataflow runner. It requires some amount of clean up. I agree that it would be good to clean this up,

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
I will attempt to do without sharding (though I believe we did do a run without shards and it incurred the extra shuffle costs). Pipeline is simple. The only shuffle that is explicitly defined is the shuffle after merging files together into a single PCollection (Flatten Transform). So it's a

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Chamikara Jayalath
Are you specifying the number of shards to write to: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L859 If so, this will incur an additional shuffle to re-distribute data written by all workers into the given number of shards before

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
batch on dataflowRunner. On Wed, Sep 18, 2019 at 4:05 PM Reuven Lax wrote: > Are you using streaming or batch? Also which runner are you using? > > On Wed, Sep 18, 2019 at 1:57 PM Shannon Duncan > wrote: > >> So I followed up on why TextIO shuffles and dug into the code some. It is >> using

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
So I followed up on why TextIO shuffles and dug into the code some. It is using the shards and getting all the values into a keyed group to write to a single file. However... I wonder if there is way to just take the records that are on a worker and write them out. Thus not needing a shard number

KinesisIO.write() returning NoSuchMethodError: com.google.common.util.concurrent.Futures.addCallback

2019-09-18 Thread Ankit Jhalaria
Hey beam devs, I am trying to use KinesisIO.write() with beam 2.13, running on flink and its failing while trying to do Futures.addCallback(f, new UserRecordResultFutureCallback()); Its currently pulling in beam-vendor-guava-20_0-0.1.jar I have tried updating bringing in a current version but

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-18 Thread Maximilian Michels
I disagree that this flag is obsolete. It is still serving a purpose for batch users using dataflow runner and that is decent chunk of beam python users. It is obsolete for the PortableRunner. If the Dataflow Runner needs this flag, couldn't we simply add it there? As far as I know Dataflow

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Ankur Goenka
Adding to the previous suggestions. You can also add "--retain_docker_container" to your pipeline option and later login to the machine to check the docker container log. Also, in my experience running on yarn, the yarn user some time do not have access to use docker. I would suggest checking if

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Kyle Weaver
> Per your suggest, I read the design sheet and it states that harness container is a mandatory settings for all TaskManger. That doc is out of date. As Benjamin said, it's not strictly required any more to use Docker. However, it is still recommended, as Docker makes managing dependencies a

Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
We have been using Beam for a bit now. However we just turned on the dataflow shuffle service and were very surprised that the shuffled data amounts were quadruple the amounts we expected. Turns out that the file writing TextIO is doing shuffles within itself. Is there a way to prevent shuffling

Re: Where is /opt/apache/beam/boot?

2019-09-18 Thread Benjamin Tan
You are awesome. Thanks! On 2019/09/18 15:08:08, Lukasz Cwik wrote: > It is embedded inside the docker container that corresponds to which SDK > your using. > > Python container boot src: > https://github.com/apache/beam/blob/master/sdks/python/container/boot.go > Java container boot src: >

Re: Python Portable Runner Issues

2019-09-18 Thread Chad Dombrova
Just note that while Dataflow does have robust python support it does not fully support the portability framework. It’s a bit of a blurry distinction, and honestly I’m not crystal clear on this as I get the impression that Dataflow may be a bit of a Portability hybrid. It does not use the job

Re: Where is /opt/apache/beam/boot?

2019-09-18 Thread Lukasz Cwik
It is embedded inside the docker container that corresponds to which SDK your using. Python container boot src: https://github.com/apache/beam/blob/master/sdks/python/container/boot.go Java container boot src: https://github.com/apache/beam/blob/master/sdks/java/container/boot.go Go container

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Yu Watanabe
Thank you for the reply. I see files "boot" under below directories. But these seems to be used for containers. (python) admin@ip-172-31-9-89:~/beam-release-2.15.0$ find ./ -name "boot" -exec ls -l {} \; lrwxrwxrwx 1 admin admin 23 Sep 16 23:43

Re: Python Portable Runner Issues

2019-09-18 Thread Holden Karau
Probably the most stable is running on Dataflow still. But I’m excited to see the progress towards a Spark runner, can’t wait to try TFT on it :) On Tue, Sep 17, 2019 at 4:37 PM Kyle Weaver wrote: > The Flink runner is definitely more stable, as it's been around for longer > and has more

Where is /opt/apache/beam/boot?

2019-09-18 Thread Benjamin Tan
I'm trying to use the process environment_config and I have no idea where is /opt/apache/beam/boot. Is there a way to generate this?

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Benjamin Tan
Try this as part of PipelineOptions: --environment_config={\"command\":\"/opt/apache/beam/boot\"} On 2019/09/18 10:40:42, Yu Watanabe wrote: > Hello. > > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job > to AWS EMR (5.26.0). > > However, I get below error when I

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Benjamin Tan
It’s command: /opt/Apache/beam/boot in json. U might be able to find some examples online. I’ll reply you when I get home to paste the actual command. Sent from my iPhone > On 18 Sep 2019, at 9:35 PM, Yu Watanabe wrote: > > Benjamin. > > Thank you for the reply. > Per your suggest, I read

Re: Word-count example

2019-09-18 Thread Matthew Patterson
Tried " mvn archetype:generate \ -DarchetypeGroupId=org.apache.beam \ -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ -DarchetypeVersion=2.16.0 \ -DgroupId=org.example \ -DartifactId=word-count-beam \ -Dversion="0.1" \

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Yu Watanabe
Benjamin. Thank you for the reply. Per your suggest, I read the design sheet and it states that harness container is a mandatory settings for all TaskManger. https://s.apache.org/portable-flink-runner-overview > The Flink cluster itself is deployed as normal. For example, it might be deployed

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Benjamin Tan
Seems like docker is not installed. Maybe run with PROCESS as the environment type ? Or install docker. Sent from my iPhone > On 18 Sep 2019, at 6:40 PM, Yu Watanabe wrote: > > Hello. > > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job to > AWS EMR (5.26.0). > >

Re: Beam AmqpIO

2019-09-18 Thread Jean-Baptiste Onofré
Hi, As the original author of AmqpIO, I can update, but it requires some internal changes to the IO (especially the way we are dealing with checkpoint). I will create a Jira and work on an update. Regards JB On 18/09/2019 12:50, Alexey Romanenko wrote: > It seems that we use pretty outdated

Re: Beam AmqpIO

2019-09-18 Thread Alexey Romanenko
It seems that we use pretty outdated version of proton-j, current version is 0.33.2. In the same time, Messenger API was deprecated and removed a while ago. So, updating to new version won’t be so easy. What can be as alternative for this? > On 18 Sep 2019, at 10:47, Jean-Baptiste Onofré

Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Yu Watanabe
Hello. I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job to AWS EMR (5.26.0). However, I get below error when I run the pipeline and fail. - Caused by: java.lang.Exception: The user defined 'open()' method caused

Re: Beam AmqpIO

2019-09-18 Thread Jean-Baptiste Onofré
Hi, It's provided by Apache QPid proton-j: org.apache.qpid:proton-j:0.13.1 Regards JB On 18/09/2019 10:34, kasper wrote: > Hello, > > AmqpIO seems to have a dependency on the > org.apache.qpid.proton.messenger package in usage but I do not seem to > find that package anywhere in a public nexus

Beam AmqpIO

2019-09-18 Thread kasper
Hello, AmqpIO seems to have a dependency on the org.apache.qpid.proton.messenger package in usage but I do not seem to find that package anywhere in a public nexus repository. The net effect seems to be a classdefnotfound concerning org.apache.qpid.proton.messenger.Messenger$Factory. Any