Re: Beam Spark/Flink runner with DC/OS

2017-01-30 Thread Chaoran Yu
Thank you Davor for the reply!

Your understanding of my problem is exactly right. 

I thought about the issue you mentioned and then looked at the Beam source
code. It looks to me like IO is done via the IOChannelFactory class, which has
two subclasses, FileIOChannelFactory and GcsIOChannelFactory. I figured the
wrong class probably got registered. This link I found points out the same
registration problem:
http://markmail.org/message/mrv4cg4y6bjtdssy
So I tried registering GcsIOChannelFactory, but got the following error:

Scheme: [file] is already registered with class
org.apache.beam.sdk.util.FileIOChannelFactory

Now I’m not sure what to do.
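
One way to check whether this is a lost-registration problem, given that the
AutoService mechanism Davor mentions below works through META-INF/services
entries, is to list the service files inside the bundled jar that spark-submit
ships to the cluster. This is a hypothetical diagnostic, assuming the jar name
from the stack trace attached earlier in the thread:

jar tf target/word-count-beam-bundled-0.1.jar | grep META-INF/services

unzip -p can then print any of the listed service files; if none of them names
the GCS factory, that would explain why [file] is registered while gs:// never
resolves.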

Thanks for the help!

Chaoran



> On Jan 30, 2017, at 12:05 PM, Davor Bonaci wrote:
> 
> Sorry for the delay here.
> 
> Am I correct in summarizing that "gs://bucket/file" doesn't work on a Spark 
> cluster, but does with Spark runner locally? Beam file systems utilize 
> AutoService functionality and, generally speaking, all filesystems should be 
> available and automatically registered on all runners. This is probably just 
> a simple matter of staging the right classes on the cluster.
> 
> Pei, any additional thoughts here?

Re: Beam Spark/Flink runner with DC/OS

2017-01-30 Thread Davor Bonaci
Sorry for the delay here.

Am I correct in summarizing that "gs://bucket/file" doesn't work on a Spark
cluster, but does with Spark runner locally? Beam file systems utilize
AutoService functionality and, generally speaking, all filesystems should
be available and automatically registered on all runners. This is probably
just a simple matter of staging the right classes on the cluster.
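
If the bundled jar is built with the Maven Shade plugin, one common way such
registrations get lost is that the META-INF/services files generated by
AutoService are overwritten instead of merged during shading. A minimal sketch
of the shade configuration that merges them, offered as an illustration rather
than a verified fix for this particular build:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Merge META-INF/services entries from all dependencies so that
           ServiceLoader/AutoService registrations survive shading. -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>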

Pei, any additional thoughts here?

On Mon, Jan 23, 2017 at 1:58 PM, Chaoran Yu wrote:

> Sorry for the spam. But to clarify, I didn’t write the code. I’m using the
> code described here https://beam.apache.org/get-started/wordcount-example/.
> So the file already exists in GS.
>
> On Jan 23, 2017, at 4:55 PM, Chaoran Yu wrote:
>
> I didn’t upload the file. But since the identical Beam code, when running
> in Spark local mode, was able to fetch the file and process it, the file
> does exist.
> It’s just that somehow Spark standalone mode can’t find the file.
>
>
> On Jan 23, 2017, at 4:50 PM, Amit Sela wrote:
>
> I think "external" is the key here; your cluster is running all its
> components on your local machine, so you're good.
>
> As for GS, it's like Amazon's S3, or sort of a cloud-service HDFS offered
> by Google. You need to upload your file to GS. Have you?
>
> On Mon, Jan 23, 2017 at 11:47 PM Chaoran Yu wrote:
>
>> Well, my file is not in my local filesystem. It’s in GS.
>> This is the line of code that reads the input file:
>> p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
>>
>> And this page https://beam.apache.org/get-started/quickstart/ says the
>> following:
>> “you can’t access a local file if you are running the pipeline on an
>> external cluster.”
>> I’m indeed trying to run a pipeline on a standalone Spark cluster running
>> on my local machine. So local files are not an option.
>>
>>
>> On Jan 23, 2017, at 4:41 PM, Amit Sela wrote:
>>
>> Why not try file:// instead? It doesn't seem like you're using Google
>> Storage, right? I mean, the input file is on your local FS.
>>
>> On Mon, Jan 23, 2017 at 11:34 PM Chaoran Yu wrote:
>>
>> No I’m not using Dataproc.
>> I’m simply running on my local machine. I started a local Spark cluster
>> with sbin/start-master.sh and sbin/start-slave.sh. Then I submitted my Beam
>> job to that cluster.
>> The gs file is the kinglear.txt from Beam’s example code and it should be
>> public.
>>
>> My full stack trace is attached.
>>
>> Thanks,
>> Chaoran
>>
>>
>>
>> On Jan 23, 2017, at 4:23 PM, Amit Sela wrote:
>>
>> Maybe. Are you running on Dataproc? Are you using YARN/Mesos? Do the
>> machines hosting the executor processes have access to GS? Could you paste
>> the entire stack trace?
>>
>> On Mon, Jan 23, 2017 at 11:21 PM Chaoran Yu wrote:
>>
>> Thank you Amit for the reply,
>>
>> I just tried two more runners and below is a summary:
>>
>> DirectRunner: works.
>> FlinkRunner: works in local mode. I got an error, “Communication with
>> JobManager failed: lost connection to the JobManager”, when running in
>> cluster mode.
>> SparkRunner: works in local mode (mvn exec command) but fails in cluster
>> mode (spark-submit) with the error I pasted in the previous email.
>>
>> In SparkRunner’s case, can it be that the Spark executors can’t access the
>> gs file in Google Storage?
>>
>> Thank you,
>>
>>
>>
>> On Jan 23, 2017, at 3:28 PM, Amit Sela wrote:
>>
>> Is this working for you with other runners? Judging by the stack trace,
>> it seems like IOChannelUtils fails to find a handler, so it doesn't seem
>> to be a Spark-specific problem.
>>
>> On Mon, Jan 23, 2017 at 8:50 PM Chaoran Yu wrote:
>>
>> Thank you Amit and JB!
>>
>> This is not related to DC/OS itself, but I ran into a problem when
>> launching a Spark job on a cluster with spark-submit. My Spark job written
>> in Beam can’t read the specified gs file. I got the following error:
>>
>> Caused by: java.io.IOException: Unable to find handler for
>> gs://beam-samples/sample.txt
>> at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
>> at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528)
>> at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
>> at org.apache.beam.runners.spark.io.SourceRDD$Bounded$1.hasNext(SourceRDD.java:125)
>>
>> Then I thought about switching to reading from another source, but I saw
>> in Beam’s documentation that TextIO can only read from files in Google
>> Cloud Storage (prefixed with gs://) when running in cluster mode. How do
>> you guys do file IO in Beam when using the SparkRunner?
>>
>>
>> Thank you,
>> Chaoran
>>
>>
>> On Jan 22, 2017, at 4:32 AM, Amit Sela wrote:

Re: Beam Spark/Flink runner with DC/OS

2017-01-23 Thread Chaoran Yu
Sorry for the spam. But to clarify, I didn’t write the code. I’m using the code 
described here https://beam.apache.org/get-started/wordcount-example/.

So the file already exists in GS.

Re: Beam Spark/Flink runner with DC/OS

2017-01-23 Thread Chaoran Yu
Well, my file is not in my local filesystem. It’s in GS. 
This is the line of code that reads the input file: 
p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))

And this page https://beam.apache.org/get-started/quickstart/ says the
following:
“you can’t access a local file if you are running the pipeline on an external
cluster.”
I’m indeed trying to run a pipeline on a standalone Spark cluster running on my 
local machine. So local files are not an option.
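
For reference, a minimal sketch of how that one line sits in a complete
pipeline aimed at the Spark runner, using the 0.x-era Beam API from this
thread; SparkPipelineOptions, the master URL, and the output path are
illustrative assumptions rather than details taken from the thread:

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class GcsReadSketch {
  public static void main(String[] args) {
    // Aim the pipeline at the Spark runner; the master URL is a placeholder
    // for a standalone cluster started with sbin/start-master.sh.
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    options.setSparkMaster("spark://localhost:7077");

    Pipeline p = Pipeline.create(options);
    // The same TextIO read as above; the write just completes the pipeline.
    p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
     .apply(TextIO.Write.to("counts"));
    p.run();
  }
}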


Re: Beam Spark/Flink runner with DC/OS

2017-01-23 Thread Chaoran Yu
No I’m not using Dataproc.
I’m simply running on my local machine. I started a local Spark cluster with
sbin/start-master.sh and sbin/start-slave.sh. Then I submitted my Beam job to
that cluster.
The gs file is the kinglear.txt from Beam’s example code and it should be
public.

My full stack trace is attached.

Thanks,
Chaoran

log4j:WARN No appenders could be found for logger
(org.apache.beam.sdk.options.PipelineOptionsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more
info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/23 16:31:39 INFO SparkContext: Running Spark version 1.6.3
17/01/23 16:31:39 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
17/01/23 16:31:44 INFO SecurityManager: Changing view acls to: stephen
17/01/23 16:31:44 INFO SecurityManager: Changing modify acls to: stephen
17/01/23 16:31:44 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(stephen); users 
with modify permissions: Set(stephen)
17/01/23 16:31:44 INFO Utils: Successfully started service 'sparkDriver' on 
port 53235.
17/01/23 16:31:45 INFO Slf4jLogger: Slf4jLogger started
17/01/23 16:31:45 INFO Remoting: Starting remoting
17/01/23 16:31:45 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkDriverActorSystem@10.71.1.72:53236]
17/01/23 16:31:45 INFO Utils: Successfully started service 
'sparkDriverActorSystem' on port 53236.
17/01/23 16:31:45 INFO SparkEnv: Registering MapOutputTracker
17/01/23 16:31:45 INFO SparkEnv: Registering BlockManagerMaster
17/01/23 16:31:45 INFO DiskBlockManager: Created local directory at 
/private/var/folders/7k/mv0jws6j7ws8pnp350h1ndtwgn/T/blockmgr-358c683e-ae91-49b1-875e-0b037dea5d08
17/01/23 16:31:45 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/01/23 16:31:45 INFO SparkEnv: Registering OutputCommitCoordinator
17/01/23 16:31:45 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
17/01/23 16:31:45 INFO SparkUI: Started SparkUI at http://10.71.1.72:4040
17/01/23 16:31:45 INFO HttpFileServer: HTTP File server directory is 
/private/var/folders/7k/mv0jws6j7ws8pnp350h1ndtwgn/T/spark-70afbd21-f270-4670-88d6-2895e5f23ea5/httpd-04f079ac-4416-4e74-9dd3-5bf70f70ee8e
17/01/23 16:31:45 INFO HttpServer: Starting HTTP Server
17/01/23 16:31:45 INFO Utils: Successfully started service 'HTTP file server' 
on port 53237.
17/01/23 16:31:46 INFO SparkContext: Added JAR 
file:/Users/stephen/word-count-beam/target/word-count-beam-bundled-0.1.jar at 
http://10.71.1.72:53237/jars/word-count-beam-bundled-0.1.jar with timestamp 
1485207106193
17/01/23 16:31:46 INFO AppClient$ClientEndpoint: Connecting to master 
spark://Chaoran-MacBook-Pro.local:7077...
17/01/23 16:31:46 INFO SparkDeploySchedulerBackend: Connected to Spark cluster 
with app ID app-20170123163146-
17/01/23 16:31:46 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 53239.
17/01/23 16:31:46 INFO NettyBlockTransferService: Server created on 53239
17/01/23 16:31:46 INFO BlockManagerMaster: Trying to register BlockManager
17/01/23 16:31:46 INFO BlockManagerMasterEndpoint: Registering block manager 
10.71.1.72:53239 with 511.1 MB RAM, BlockManagerId(driver, 10.71.1.72, 53239)
17/01/23 16:31:46 INFO AppClient$ClientEndpoint: Executor added: 
app-20170123163146-/0 on worker-20170123162954-10.71.1.72-53220 
(10.71.1.72:53220) with 8 cores
17/01/23 16:31:46 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20170123163146-/0 on hostPort 10.71.1.72:53220 with 8 cores, 1024.0 MB 
RAM
17/01/23 16:31:46 INFO BlockManagerMaster: Registered BlockManager
17/01/23 16:31:46 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/01/23 16:31:46 INFO AppClient$ClientEndpoint: Executor updated: 
app-20170123163146-/0 is now RUNNING
17/01/23 16:31:46 INFO SparkRunner$Evaluator: Evaluating Read(CompressedSource)
17/01/23 16:31:46 INFO SparkRunner$Evaluator: Evaluating ParDo(ExtractWords)
17/01/23 16:31:47 INFO SparkRunner$Evaluator: Evaluating AnonymousParDo
17/01/23 16:31:47 INFO SparkRunner$Evaluator: Entering directly-translatable 
composite transform: 'WordCount.CountWords/Count.PerElement/Count.PerKey'
17/01/23 16:31:47 INFO SparkRunner$Evaluator: Evaluating Count.PerKey 
[Combine.PerKey]
17/01/23 16:31:47 INFO FileBasedSource: Matched 1 files for pattern 
gs://apache-beam-samples/shakespeare/kinglear.txt
17/01/23 16:31:47 INFO SparkRunner$Evaluator: Evaluating AnonymousParDo
17/01/23 16:31:47 INFO SparkRunner$Evaluator: Entering directly-translatable 
composite transform: 'WriteCounts/Write/Create.Values'
17/01/23 16:31:47 INFO SparkRunner$Evaluator: Evaluating Create.Values
17/01/23 
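
For completeness, a sketch of how such a standalone cluster is brought up,
using the master URL that appears in the log above; paths assume a Spark 1.6
distribution, where start-slave.sh takes the master URL as its argument:

sbin/start-master.sh
sbin/start-slave.sh spark://Chaoran-MacBook-Pro.local:7077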

Re: Beam Spark/Flink runner with DC/OS

2017-01-22 Thread Amit Sela
I'll join JB's comment on the Spark runner: submitting Beam pipelines using
the Spark runner can be done with Spark's spark-submit script; find out more
in the Spark runner documentation.
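
As a concrete sketch of such an invocation against a standalone master,
assuming the shaded jar name from the stack-trace log included above in this
archive; the master URL and the pipeline arguments after the jar are
placeholders:

spark-submit \
  --class org.apache.beam.examples.WordCount \
  --master spark://localhost:7077 \
  target/word-count-beam-bundled-0.1.jar \
  --runner=SparkRunner \
  --inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt \
  --output=counts

Everything after the jar path is handed to the pipeline's main method, so
--runner, --inputFile, and --output are the wordcount example's own options,
not spark-submit flags.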

Amit.

Re: Beam Spark/Flink runner with DC/OS

2017-01-21 Thread Jean-Baptiste Onofré

Hi,

Not directly DC/OS (I think Stephen did some tests on it), but I have a
platform running Spark and Flink with Beam on Mesos + Marathon.

It basically doesn't involve anything special, as running pipelines uses
spark-submit (as done in Spark "natively").


Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Beam Spark/Flink runner with DC/OS

2017-01-21 Thread Chaoran Yu
Hello all,

Has anyone had experience using Beam on DC/OS? I want to run Beam code
executed with the Spark runner on DC/OS. As a next step, I would like to run
the Flink runner as well. There doesn't seem to be any information about
running Beam on DC/OS that I can find on the web, so some pointers are
greatly appreciated.

Thank you,

Chaoran Yu