Hi,
You can search (or ask) on dev@ for the progress of 2.16.0.
The answer is that the 2.16.0 release is ongoing; it will be published once
the blockers are resolved.
-Rui
On Wed, Sep 18, 2019 at 9:34 PM Yu Watanabe wrote:
> Hello.
>
> I would like to use 2.16.0 to diagnose container problem,
Hello.
I would like to use 2.16.0 to diagnose a container problem; however, it looks
like the job server is not yet available on Maven.
RuntimeError: Unable to fetch remote job server jar at
Hi Matthew,
Beam 2.16.0 is not yet released, hence the error. Can you
try version 2.15.0 instead?
Thanks,
Ankur
On Wed, Sep 18, 2019 at 6:59 AM Matthew Patterson
wrote:
> Tried
>
> "
> mvn archetype:generate \
> -DarchetypeGroupId=org.apache.beam \
>
I believe the flag was never relevant for the PortableRunner, though I might
be wrong. The flag affects a few bits of the core code, which is why the fix
cannot be just setting the flag in the Dataflow runner; it requires some
amount of cleanup. I agree that it would be good to clean this up,
I will attempt to run without sharding (though I believe we already did a run
without shards and it still incurred the extra shuffle costs).
The pipeline is simple.
The only shuffle that is explicitly defined is the one after merging the
files into a single PCollection (Flatten transform).
So it's a
Are you specifying the number of shards to write to:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L859
If so, this will incur an additional shuffle to redistribute the data written
by all workers into the given number of shards before
Batch, on the DataflowRunner.
On Wed, Sep 18, 2019 at 4:05 PM Reuven Lax wrote:
> Are you using streaming or batch? Also which runner are you using?
>
> On Wed, Sep 18, 2019 at 1:57 PM Shannon Duncan
> wrote:
>
>> So I followed up on why TextIO shuffles and dug into the code some. It is
>> using
So I followed up on why TextIO shuffles and dug into the code a bit. It is
using the shards and gathering all the values into a keyed group in order to
write each group to a single file.
However... I wonder if there is a way to just take the records that are on a
worker and write them out, thus not needing a shard number
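The trade-off being discussed can be illustrated outside of Beam. This is a
minimal pure-Python sketch (not Beam code; all names here are illustrative):
a fixed shard count forces every record to be keyed by a shard id and
gathered with the other records for that shard, which is the redistribution
step, while per-worker writing just emits whatever each worker already holds.

```python
from collections import defaultdict

def assign_to_shards(records, num_shards):
    """Mimics what a fixed shard count forces: each record is keyed by a
    shard id, and all records for a shard are collected together (the
    'shuffle') so a single writer can produce one file per shard."""
    shards = defaultdict(list)
    for record in records:
        shards[hash(record) % num_shards].append(record)
    return dict(shards)

def write_per_worker(worker_bundles):
    """The alternative asked about: each worker writes the records it
    already holds, one output per bundle, so no redistribution is needed.
    The number of output files then depends on how work was distributed."""
    return {f"part-{i}": bundle for i, bundle in enumerate(worker_bundles)}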
Hey beam devs,
I am trying to use KinesisIO.write() with Beam 2.13, running on Flink, and it's
failing while trying to do
Futures.addCallback(f, new UserRecordResultFutureCallback());
It's currently pulling in beam-vendor-guava-20_0-0.1.jar.
I have tried bringing in a current version, but
I disagree that this flag is obsolete. It is still serving a purpose for batch
users on the Dataflow runner, and that is a decent chunk of Beam Python users.
It is obsolete for the PortableRunner. If the Dataflow runner needs this
flag, couldn't we simply add it there? As far as I know, Dataflow
Adding to the previous suggestions.
You can also add "--retain_docker_container" to your pipeline options and
later log in to the machine to check the docker container logs.
Also, in my experience running on YARN, the yarn user sometimes does not have
permission to use docker. I would suggest checking if
> Per your suggestion, I read the design sheet and it states that the harness
container is a mandatory setting for all TaskManagers.
That doc is out of date. As Benjamin said, Docker is no longer strictly
required. However, it is still recommended, as Docker makes
managing dependencies a
We have been using Beam for a while now. However, we just turned on the
Dataflow shuffle service and were very surprised that the shuffled data
volumes were quadruple what we expected.
It turns out that TextIO is doing shuffles within itself when writing files.
Is there a way to prevent shuffling
You are awesome. Thanks!
On 2019/09/18 15:08:08, Lukasz Cwik wrote:
> It is embedded inside the docker container that corresponds to the SDK
> you're using.
>
> Python container boot src:
> https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
> Java container boot src:
>
Just note that while Dataflow does have robust Python support, it does not
fully support the portability framework. It's a bit of a blurry
distinction, and honestly I'm not crystal clear on this, as I get the
impression that Dataflow may be a bit of a portability hybrid. It does not
use the job
It is embedded inside the docker container that corresponds to the SDK
you're using.
Python container boot src:
https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
Java container boot src:
https://github.com/apache/beam/blob/master/sdks/java/container/boot.go
Go container
Thank you for the reply.
I see "boot" files under the directories below,
but these seem to be intended for containers.
(python) admin@ip-172-31-9-89:~/beam-release-2.15.0$ find ./ -name "boot"
-exec ls -l {} \;
lrwxrwxrwx 1 admin admin 23 Sep 16 23:43
Probably the most stable is running on Dataflow still. But I’m excited to
see the progress towards a Spark runner, can’t wait to try TFT on it :)
On Tue, Sep 17, 2019 at 4:37 PM Kyle Weaver wrote:
> The Flink runner is definitely more stable, as it's been around for longer
> and has more
I'm trying to use the PROCESS environment_config and I have no idea where
/opt/apache/beam/boot is.
Is there a way to generate this?
Try this as part of PipelineOptions:
--environment_config={\"command\":\"/opt/apache/beam/boot\"}
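For reference, the PROCESS environment is driven entirely by flag strings,
where environment_config is a JSON object whose "command" points at the boot
executable. A small sketch of assembling those flags as plain strings (the
job_endpoint host and port here are illustrative, not from the thread):

```python
import json

# environment_config is a JSON payload; "command" is the path to the
# boot executable shipped inside the Beam release.
environment_config = json.dumps({"command": "/opt/apache/beam/boot"})

# Flags as they would appear on the command line. The job_endpoint
# value below is an illustrative placeholder.
flags = [
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=PROCESS",
    f"--environment_config={environment_config}",
]
```

Building the JSON with json.dumps avoids the shell-escaping mistakes that
hand-written quotes in --environment_config tend to cause.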
On 2019/09/18 10:40:42, Yu Watanabe wrote:
> Hello.
>
> I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job
> to AWS EMR (5.26.0).
>
> However, I get below error when I
It's command: /opt/apache/beam/boot in JSON. You might be able to find some
examples online. I'll reply when I get home to paste the actual command.
Sent from my iPhone
> On 18 Sep 2019, at 9:35 PM, Yu Watanabe wrote:
>
> Benjamin.
>
> Thank you for the reply.
> Per your suggest, I read
Tried
"
mvn archetype:generate \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=2.16.0 \
-DgroupId=org.example \
-DartifactId=word-count-beam \
-Dversion="0.1" \
Benjamin,
Thank you for the reply.
Per your suggestion, I read the design sheet, and it states that the harness
container is a mandatory setting for all TaskManagers.
https://s.apache.org/portable-flink-runner-overview
> The Flink cluster itself is deployed as normal. For example, it might be
deployed
It seems like docker is not installed. Maybe run with PROCESS as the
environment type? Or install docker.
Sent from my iPhone
> On 18 Sep 2019, at 6:40 PM, Yu Watanabe wrote:
>
> Hello.
>
> I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job to
> AWS EMR (5.26.0).
>
>
Hi,
As the original author of AmqpIO, I can update it, but that requires some
internal changes to the IO (especially the way we are dealing with
checkpointing).
I will create a Jira and work on an update.
Regards
JB
On 18/09/2019 12:50, Alexey Romanenko wrote:
> It seems that we use pretty outdated
It seems that we use a pretty outdated version of proton-j; the current
version is 0.33.2.
At the same time, the Messenger API was deprecated and then removed a while
ago, so updating to the new version won't be easy. What could be an
alternative for this?
> On 18 Sep 2019, at 10:47, Jean-Baptiste Onofré
Hello.
I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job
to AWS EMR (5.26.0).
However, I get below error when I run the pipeline and fail.
-
Caused by: java.lang.Exception: The user defined 'open()' method caused
Hi,
It's provided by Apache QPid proton-j: org.apache.qpid:proton-j:0.13.1
Regards
JB
On 18/09/2019 10:34, kasper wrote:
> Hello,
>
> AmqpIO seems to have a dependency on the
> org.apache.qpid.proton.messenger package in usage but I do not seem to
> find that package anywhere in a public nexus
Hello,
AmqpIO seems to have a dependency on the
org.apache.qpid.proton.messenger package at usage time, but I cannot
find that package anywhere in a public Nexus repository. The net effect
seems to be a NoClassDefFoundError concerning
org.apache.qpid.proton.messenger.Messenger$Factory. Any