Re: Beam's job crashes on cluster

2019-12-13 Thread Matthew K.
" To: dev Subject: Re: Beam's job crashes on cluster > I applied some modifications to the code to run Beam tasks on k8s cluster using spark-submit.   Interesting, how does that work?   On Fri, Dec 13, 2019 at 12:49 PM Matthew K. <softm...@gmx.com> wrote:   I'

Re: Beam's job crashes on cluster

2019-12-13 Thread Matthew K.
the job-server:runShadow command.   Without specifying the master URL, the default is to start an embedded Spark master within the same JVM as the job server, rather than using your standalone master.   On Fri, Dec 13, 2019 at 12:15 PM Matthew K. <softm...@gmx.com> wrote: Job ser
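A minimal sketch of pointing the job server at an existing standalone master instead, assuming the sparkMasterUrl Gradle property documented for the Spark job server's runShadow task (host and port are placeholders):

    ./gradlew :runners:spark:job-server:runShadow -PsparkMasterUrl=spark://<master-host>:7077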

Re: Beam's job crashes on cluster

2019-12-13 Thread Matthew K.
as lost.   Some additional info that might help us establish a chain of causation: - the arguments you used to start the job server? - the Spark cluster deployment setup?   On Fri, Dec 13, 2019 at 8:00 AM Matthew K. <softm...@gmx.com> wrote: Actually the reason for that error i

Re: Beam's job crashes on cluster

2019-12-13 Thread Matthew K.
lExecutor.java:624)         at java.lang.Thread.run(Thread.java:748)

Re: Beam's job crashes on cluster

2019-12-13 Thread Matthew K.
Sent: Thursday, December 12, 2019 at 5:30 PM From: "Kyle Weaver": Can you share the pipeline options you are using? Particularly environment_type and environment_config.   On Thu, Dec 12, 2019 at 2:58 PM Matthew K. <softm...
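For context, a hedged sketch of the kind of options being asked about, using the Python SDK's portable runner flags; the job endpoint and boot path are placeholder assumptions, not values taken from this thread:

    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder values; the job server address and boot path are assumptions.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",   # Spark job server endpoint
        "--environment_type=PROCESS",      # workers run on separate nodes
        '--environment_config={"command": "/opt/apache/beam/boot"}',
    ])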

Beam's job crashes on cluster

2019-12-12 Thread Matthew K.
Running Beam on a Spark cluster, it crashes and I get the following error (workers are on separate nodes; it works fine when workers are on the same node as the runner):   > Task :runners:spark:job-server:runShadow FAILED Exception in thread wait_until_finish_read: Traceback (most recent call last):

Pipeline parameters for running jobs in a cluster

2019-12-10 Thread Matthew K.
Hi,   To run a Beam job on a Spark cluster with some number of nodes:   1. Is it recommended to set pipeline parameters --num_workers, --max_num_workers, --autoscaling_algorithm, --worker_machine_type, etc., or will Beam (Spark) figure that out?   2. If it is recommended to set thos

Re: Command for Beam worker on Spark cluster

2019-11-07 Thread Matthew K.
you mean here.   On Wed, Nov 6, 2019 at 3:32 PM Matthew K. <softm...@gmx.com> wrote: Thanks, but I still need to pass parameters to the boot executable, such as the worker id, control endpoint, logging endpoint, etc.   Where can I extract these parameters from? (In apache_beam Python cod
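As far as I can tell from the PROCESS environment factory, those values do not need to be supplied by hand: the runner launches the configured command and appends them itself, so the invocation on the worker node looks roughly like this (all endpoints are placeholders):

    <command from environment_config> \
        --id=<worker id> \
        --logging_endpoint=<host:port> \
        --artifact_endpoint=<host:port> \
        --provision_endpoint=<host:port> \
        --control_endpoint=<host:port>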

Re: Command for Beam worker on Spark cluster

2019-11-06 Thread Matthew K.
dependencies on all of your worker machines.   The best example I know of is here: https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165   On Wed, Nov 6, 2019 at 2:24 PM Matthew K. <softm...@gmx.com> wrote:

Command for Beam worker on Spark cluster

2019-11-06 Thread Matthew K.
Hi all,   I am trying to run a *Python* Beam pipeline on a Spark cluster. Since workers are running on separate nodes, I am using "PROCESS" for "environment_type" in the pipeline options, but I couldn't find any documentation on what "command" I should pass to "environment_config" to run on the worker,
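A minimal sketch of what that could look like, assuming the SDK worker boot binary from the Beam Python container has been copied to every worker node (the /opt/apache/beam/boot path is an assumption, not something Beam prescribes):

    --environment_type=PROCESS
    --environment_config={"command": "/opt/apache/beam/boot"}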