Why not try file:// instead? It doesn't seem like you're using Google
Storage, right? I mean, the input file is on your local FS.
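For what it's worth, a bare path and a file:// URI name the same local file, so either spelling should point Beam at the local FS. A minimal sketch of that equivalence in plain Java (the LocalSpec class and the kinglear.txt path are made up for illustration; this is not Beam's API):

```java
// Sketch only: a bare path and a file:// URI resolve to the same local
// file. Class and path names are illustrative, not Beam API.
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalSpec {
    // Normalize either spelling of a local-file spec to a Path.
    public static Path fromSpec(String spec) {
        if (spec.startsWith("file:")) {
            return Paths.get(URI.create(spec));
        }
        return Paths.get(spec);
    }

    public static void main(String[] args) {
        System.out.println(fromSpec("file:///tmp/kinglear.txt")); // /tmp/kinglear.txt
        System.out.println(fromSpec("/tmp/kinglear.txt"));        // /tmp/kinglear.txt
    }
}
```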

On Mon, Jan 23, 2017 at 11:34 PM Chaoran Yu <[email protected]>
wrote:

> No I’m not using Dataproc.
> I’m simply running on my local machine. I started a local Spark cluster
> with sbin/start-master.sh and sbin/start-slave.sh. Then I submitted my Beam
> job to that cluster.
> The gs:// file is kinglear.txt from Beam’s example code, and it should be
> public.
>
> My full stack trace is attached.
>
> Thanks,
> Chaoran
>
>
>
> On Jan 23, 2017, at 4:23 PM, Amit Sela <[email protected]> wrote:
>
> Maybe. Are you running on Dataproc? Are you using YARN/Mesos? Do the
> machines hosting the executor processes have access to GS? Could you paste
> the entire stack trace?
>
> On Mon, Jan 23, 2017 at 11:21 PM Chaoran Yu <[email protected]>
> wrote:
>
> Thank you Amit for the reply,
>
> I just tried two more runners and below is a summary:
>
> DirectRunner: works
> FlinkRunner: works in local mode. I got a “Communication with
> JobManager failed: lost connection to the JobManager” error when running
> in cluster mode.
> SparkRunner: works in local mode (mvn exec command) but fails in cluster
> mode (spark-submit) with the error I pasted in the previous email.
>
> In SparkRunner’s case, can it be that the Spark executors can’t access the
> gs:// file in Google Cloud Storage?
>
> Thank you,
>
>
>
> On Jan 23, 2017, at 3:28 PM, Amit Sela <[email protected]> wrote:
>
> Is this working for you with other runners? Judging by the stack trace,
> it seems like IOChannelUtils fails to find a handler, so it doesn't seem
> to be a Spark-specific problem.
>
> On Mon, Jan 23, 2017 at 8:50 PM Chaoran Yu <[email protected]>
> wrote:
>
> Thank you Amit and JB!
>
> This is not related to DC/OS itself, but I ran into a problem when
> launching a Spark job on a cluster with spark-submit. My Spark job written
> in Beam can’t read the specified gs:// file. I got the following error:
>
> Caused by: java.io.IOException: Unable to find handler for gs://beam-samples/sample.txt
>     at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
>     at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528)
>     at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
>     at org.apache.beam.runners.spark.io.SourceRDD$Bounded$1.hasNext(SourceRDD.java:125)
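As a side note on that error: IOChannelUtils resolves the spec's scheme to a registered factory, and the IOException above means no factory was registered for gs:// on that JVM. A simplified, hypothetical analogue of that dispatch (names are illustrative, not Beam's actual code, and an unchecked exception stands in for Beam's checked IOException):

```java
// Hypothetical, simplified analogue of IOChannelUtils.getFactory():
// a scheme -> handler registry. Names are illustrative, not Beam's code.
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeRegistry {
    // Handlers known to this JVM; an executor without the GCS jars on its
    // classpath would effectively only have "file" registered.
    private static final Map<String, String> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("file", "LocalFileFactory");
    }

    public static String getFactory(String spec) {
        String scheme = URI.create(spec).getScheme();
        if (scheme == null) {
            scheme = "file"; // bare paths fall back to the local FS
        }
        String factory = FACTORIES.get(scheme);
        if (factory == null) {
            // Mirrors the "Unable to find handler for gs://..." failure.
            throw new IllegalStateException("Unable to find handler for " + spec);
        }
        return factory;
    }
}
```

If that picture holds, the likely difference between local and cluster mode is which jars, and hence which registered handlers, the executor JVMs actually see.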
>
> Then I thought about switching to reading from another source, but I saw
> in Beam’s documentation that TextIO can only read from files in Google
> Cloud Storage (prefixed with gs://) when running in cluster mode. How do
> you do file IO in Beam when using the SparkRunner?
>
>
> Thank you,
> Chaoran
>
>
> On Jan 22, 2017, at 4:32 AM, Amit Sela <[email protected]> wrote:
>
> I'll join JB's comment on the Spark runner: submitting Beam
> pipelines using the Spark runner can be done with Spark's spark-submit
> script. Find out more in the Spark runner documentation
> <https://beam.apache.org/documentation/runners/spark/>.
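A rough shape of such a submission, with a placeholder master URL, class name, and jar name that will all differ per project:

```shell
# Hypothetical invocation; master URL, main class, jar, and pipeline
# options below are placeholders, not values from this thread.
spark-submit \
  --class org.example.MyBeamPipeline \
  --master spark://localhost:7077 \
  target/my-pipeline-bundled.jar \
  --runner=SparkRunner \
  --inputFile=file:///tmp/kinglear.txt \
  --output=/tmp/counts
```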
>
> Amit.
>
> On Sun, Jan 22, 2017 at 8:03 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> Hi,
>
> Not directly DC/OS (I think Stephen did some tests on it), but I have a
> platform running Spark and Flink with Beam on Mesos + Marathon.
>
> It basically doesn't involve anything special, as running pipelines uses
> spark-submit (as in "native" Spark).
>
> Regards
> JB
>
> On 01/22/2017 12:56 AM, Chaoran Yu wrote:
> > Hello all,
> >
> > Has anyone had experience using Beam on DC/OS? I want to run Beam code
> > executed with the Spark runner on DC/OS. As a next step, I would like to
> > run the Flink runner as well. There doesn't seem to be any information
> > about running Beam on DC/OS that I can find on the web, so some pointers
> > are greatly appreciated.
> >
> > Thank you,
> >
> > Chaoran Yu
> >
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
