Sorry for the spam. But to clarify, I didn’t write the code. I’m using the code described here: https://beam.apache.org/get-started/wordcount-example/
So the file already exists in GS.
> On Jan 23, 2017, at 4:55 PM, Chaoran Yu <[email protected]> wrote:
>
> I didn’t upload the file. But since the identical Beam code, when running in
> Spark local mode, was able to fetch the file and process it, the file does
> exist.
> It’s just that somehow Spark standalone mode can’t find the file.
>
>
>> On Jan 23, 2017, at 4:50 PM, Amit Sela <[email protected]> wrote:
>>
>> I think "external" is the key here, your cluster is running all its
>> components on your local machine so you're good.
>>
>> As for GS, it's like Amazon's S3 or sort-of a cloud service HDFS offered by
>> Google. You need to upload your file to GS. Have you?
>>
>> On Mon, Jan 23, 2017 at 11:47 PM Chaoran Yu <[email protected]> wrote:
>> Well, my file is not in my local filesystem. It’s in GS.
>> This is the line of code that reads the input file:
>> p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
>>
>> And this page https://beam.apache.org/get-started/quickstart/
>> says the following:
>> "you can’t access a local file if you are running the pipeline on an
>> external cluster".
>> I’m indeed trying to run a pipeline on a standalone Spark cluster running on
>> my local machine. So local files are not an option.
>>
>>
>>> On Jan 23, 2017, at 4:41 PM, Amit Sela <[email protected]> wrote:
>>>
>>> Why not try file:// instead? It doesn't seem like you're using Google
>>> Storage, right? I mean the input file is on your local FS.
>>>
>>> On Mon, Jan 23, 2017 at 11:34 PM Chaoran Yu <[email protected]> wrote:
>>> No I’m not using Dataproc.
>>> I’m simply running on my local machine. I started a local Spark cluster
>>> with sbin/start-master.sh and sbin/start-slave.sh. Then I submitted my Beam
>>> job to that cluster.
>>> The gs file is the kinglear.txt from Beam’s example code and it should be
>>> public.
>>>
>>> My full stack trace is attached.
>>>
>>> Thanks,
>>> Chaoran
>>>
>>>
>>>> On Jan 23, 2017, at 4:23 PM, Amit Sela <[email protected]> wrote:
>>>>
>>>> Maybe. Are you running on Dataproc? Are you using YARN/Mesos? Do the
>>>> machines hosting the executor processes have access to GS? Could you
>>>> paste the entire stack trace?
>>>>
>>>> On Mon, Jan 23, 2017 at 11:21 PM Chaoran Yu <[email protected]> wrote:
>>>> Thank you Amit for the reply,
>>>>
>>>> I just tried two more runners and below is a summary:
>>>>
>>>> DirectRunner: works
>>>> FlinkRunner: works in local mode. I got an error “Communication with
>>>> JobManager failed: lost connection to the JobManager” when running in
>>>> cluster mode.
>>>> SparkRunner: works in local mode (mvn exec command) but fails in cluster
>>>> mode (spark-submit) with the error I pasted in the previous email.
>>>>
>>>> In SparkRunner’s case, can it be that the Spark executor can’t access the
>>>> gs file in Google Storage?
>>>>
>>>> Thank you,
>>>>
>>>>
>>>>> On Jan 23, 2017, at 3:28 PM, Amit Sela <[email protected]> wrote:
>>>>>
>>>>> Is this working for you with other runners? Judging by the stack trace,
>>>>> it seems like IOChannelUtils fails to find a handler so it doesn't seem
>>>>> like it is a Spark specific problem.
>>>>>
>>>>> On Mon, Jan 23, 2017 at 8:50 PM Chaoran Yu <[email protected]> wrote:
>>>>> Thank you Amit and JB!
>>>>>
>>>>> This is not related to DC/OS itself, but I ran into a problem when
>>>>> launching a Spark job on a cluster with spark-submit. My Spark job
>>>>> written in Beam can’t read the specified gs file.
>>>>> I got the following error:
>>>>>
>>>>> Caused by: java.io.IOException: Unable to find handler for gs://beam-samples/sample.txt
>>>>>   at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
>>>>>   at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528)
>>>>>   at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
>>>>>   at org.apache.beam.runners.spark.io.SourceRDD$Bounded$1.hasNext(SourceRDD.java:125)
>>>>>
>>>>> Then I thought about switching to reading from another source, but I saw
>>>>> in Beam’s documentation that TextIO can only read from files in Google
>>>>> Cloud Storage (prefixed with gs://) when running in cluster mode. How do
>>>>> you guys do file IO in Beam when using the SparkRunner?
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Chaoran
>>>>>
>>>>>
>>>>>> On Jan 22, 2017, at 4:32 AM, Amit Sela <[email protected]> wrote:
>>>>>>
>>>>>> I'll join JB's comment on the Spark runner saying that submitting Beam
>>>>>> pipelines using the Spark runner can be done using Spark's spark-submit
>>>>>> script, find out more in the Spark runner documentation
>>>>>> <https://beam.apache.org/documentation/runners/spark/>.
>>>>>>
>>>>>> Amit.
>>>>>>
>>>>>> On Sun, Jan 22, 2017 at 8:03 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Not directly DCOS (I think Stephen did some test on it), but I have a
>>>>>> platform running Spark and Flink with Beam on Mesos + Marathon.
>>>>>>
>>>>>> It basically doesn't have anything special as running pipelines uses
>>>>>> spark-submit (as in Spark "natively").
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 01/22/2017 12:56 AM, Chaoran Yu wrote:
>>>>>> > Hello all,
>>>>>> >
>>>>>> > Has anyone had experience using Beam on DC/OS?
>>>>>> > I want to run Beam code executed with Spark runner on DC/OS. As a next
>>>>>> > step, I would like to run the Flink runner as well. There doesn't seem
>>>>>> > to exist any information about running Beam on DC/OS I can find on the
>>>>>> > web. So some pointers are greatly appreciated.
>>>>>> >
>>>>>> > Thank you,
>>>>>> >
>>>>>> > Chaoran Yu
>>>>>> >
>>>>>>
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> [email protected]
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
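
A note on the "Unable to find handler for gs://..." error in this thread: the trace fails inside IOChannelUtils.getFactory, which suggests that a lookup by URI scheme came up empty in the JVM that performed the read, rather than that the file is missing. Below is a minimal Java sketch of that failure mode. It is an assumption, not Beam's actual code: SchemeHandlerSketch, register, schemeOf and getHandler are hypothetical names. The idea that the driver JVM had a gs handler registered while the executor JVM did not is only a guess consistent with "works in local mode, fails with spark-submit"; nothing in the thread confirms it.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only. This is NOT Beam's IOChannelUtils; it is a
// stand-in for a per-JVM "URI scheme -> handler" registry.
final class SchemeHandlerSketch {
    // Each JVM (driver, executor) would hold its own independent registry.
    private final Map<String, String> handlers = new HashMap<>();

    // Registers a handler name for a scheme such as "gs" or "file".
    void register(String scheme, String handlerName) {
        handlers.put(scheme, handlerName);
    }

    // Extracts the scheme from a spec; specs without "://" are treated as local files.
    static String schemeOf(String spec) {
        int idx = spec.indexOf("://");
        return idx < 0 ? "file" : spec.substring(0, idx);
    }

    // Looks up the handler for a spec and fails the same way the quoted trace does.
    String getHandler(String spec) throws IOException {
        String handler = handlers.get(schemeOf(spec));
        if (handler == null) {
            throw new IOException("Unable to find handler for " + spec);
        }
        return handler;
    }

    public static void main(String[] args) throws IOException {
        String input = "gs://apache-beam-samples/shakespeare/*";

        // Assumed driver-side state: both schemes registered, lookup succeeds.
        SchemeHandlerSketch driver = new SchemeHandlerSketch();
        driver.register("file", "local-file-handler");
        driver.register("gs", "gcs-handler");
        System.out.println("driver: " + driver.getHandler(input));

        // Assumed executor-side state: "gs" was never registered in this JVM.
        SchemeHandlerSketch executor = new SchemeHandlerSketch();
        executor.register("file", "local-file-handler");
        try {
            executor.getHandler(input);
        } catch (IOException e) {
            System.out.println("executor: " + e.getMessage());
        }
    }
}
```

If this guess is right, the thing to check is whether the executor processes load and run whatever registers the gs:// handler, not whether the bucket itself is readable.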
