Re: Spark, Mesos, Docker and S3

2016-01-29 Thread Mao Geng
Sathish,

The constraint you described is Marathon's, not Mesos's :)

spark.mesos.constraints is applied to slave attributes such as
tachyon=true;us-east-1=false, as described in
https://issues.apache.org/jira/browse/SPARK-6707.
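
As a rough sketch of how I'd pass them (the attribute names and the Mesos
master host below are just placeholders, adjust to your cluster):

./bin/spark-shell --master mesos://<mesos-master>:5050 \
  --conf "spark.mesos.constraints=tachyon:true;us-east-1:false"

Offers are only accepted from slaves whose attributes satisfy every
attribute:value pair in that semicolon-separated list.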

Cheers,
-Mao

On Fri, Jan 29, 2016 at 2:51 PM, Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Hi
>
> Quick question. How to pass constraint [["hostname", "CLUSTER", "
> specific.node.com"]] to mesos?
>
> I was trying --conf spark.mesos.constraints=hostname:specific.node.com,
> but it didn't seem to work.
>
>
> Please help
>
>
> Thanks
>
> Sathish
>
> On Thu, Jan 28, 2016 at 6:52 PM Mao Geng <m...@sumologic.com> wrote:
>
>> To my limited knowledge, only a few options such as the network mode,
>> volumes, and port mappings can be passed through. See
>> https://github.com/apache/spark/pull/3074/files.
>>
>> https://issues.apache.org/jira/browse/SPARK-8734 is open for exposing
>> all docker options to spark.
>>
>> -Mao
>>
>> On Thu, Jan 28, 2016 at 1:55 PM, Sathish Kumaran Vairavelu <
>> vsathishkuma...@gmail.com> wrote:
>>
>>> Thank you, I figured it out. I set the executor memory to the minimum and
>>> it works.
>>>
>>> Another issue has come up: I have to pass the --add-host option when running
>>> containers on the slave nodes. Is there any option to pass docker run
>>> parameters from Spark?
>>> On Thu, Jan 28, 2016 at 12:26 PM Mao Geng <m...@sumologic.com> wrote:
>>>
>>>> Sathish,
>>>>
>>>> I guess the mesos resources are not enough to run your job. You might
>>>> want to check the mesos log to figure out why.
>>>>
>>>> I tried to run the docker image with "--conf spark.mesos.coarse=false"
>>>> and "true". Both are fine.
>>>>
>>>> Best,
>>>> Mao
>>>>
>>>> On Wed, Jan 27, 2016 at 5:00 PM, Sathish Kumaran Vairavelu <
>>>> vsathishkuma...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On the same Spark/Mesos/Docker setup, I am getting the warning "Initial
>>>>> Job has not accepted any resources; check your cluster UI to ensure that
>>>>> workers are registered and have sufficient resources". I am running in
>>>>> coarse-grained mode. Any pointers on how to fix this issue? Please help. I
>>>>> have updated both docker.properties and spark-defaults.conf with
>>>>> spark.mesos.executor.docker.image and other properties.
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Sathish
>>>>>
>>>>> On Wed, Jan 27, 2016 at 9:58 AM Sathish Kumaran Vairavelu <
>>>>> vsathishkuma...@gmail.com> wrote:
>>>>>
>>>>>> Thanks a lot for your info! I will try this today.
>>>>>> On Wed, Jan 27, 2016 at 9:29 AM Mao Geng <m...@sumologic.com> wrote:
>>>>>>
>>>>>>> Hi Sathish,
>>>>>>>
>>>>>>> The docker image is normal, no AWS profile included.
>>>>>>>
>>>>>>> When the driver container runs with --net=host, the driver host's
>>>>>>> AWS profile will take effect so that the driver can access the 
>>>>>>> protected s3
>>>>>>> files.
>>>>>>>
>>>>>>> Similarly,  Mesos slaves also run Spark executor docker container in
>>>>>>> --net=host mode, so that the AWS profile of Mesos slaves will take 
>>>>>>> effect.
>>>>>>>
>>>>>>> Hope it helps,
>>>>>>> Mao
>>>>>>>
>>>>>>> On Jan 26, 2016, at 9:15 PM, Sathish Kumaran Vairavelu <
>>>>>>> vsathishkuma...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Mao,
>>>>>>>
>>>>>>> I want to check on accessing the S3 from Spark docker in Mesos.  The
>>>>>>> EC2 instance that I am using has the AWS profile/IAM included.  Should 
>>>>>>> we
>>>>>>> build the docker image with any AWS profile settings or --net=host 
>>>>>>> docker
>>>>>>> option takes care of it?
>>>>>>>
>>>>>>> Please help
>>>>>>>

Re: Spark, Mesos, Docker and S3

2016-01-28 Thread Mao Geng
To my limited knowledge, only a few options such as the network mode,
volumes, and port mappings can be passed through. See
https://github.com/apache/spark/pull/3074/files.

https://issues.apache.org/jira/browse/SPARK-8734 is open for exposing all
docker options to spark.
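
For the options that are exposed in 1.6, a rough sketch of the syntax (the
image name, Mesos master, volume path, and ports below are made-up examples)
is:

./bin/spark-shell --master mesos://<mesos-master>:5050 \
  --conf spark.mesos.executor.docker.image=<image>:<tag> \
  --conf spark.mesos.executor.docker.volumes=/etc/hosts:/etc/hosts:ro \
  --conf spark.mesos.executor.docker.portmaps=8080:80:tcp

As far as I can tell there is no equivalent for --add-host until SPARK-8734
is resolved.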

-Mao

On Thu, Jan 28, 2016 at 1:55 PM, Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Thank you, I figured it out. I set the executor memory to the minimum and it
> works.
>
> Another issue has come up: I have to pass the --add-host option when running
> containers on the slave nodes. Is there any option to pass docker run
> parameters from Spark?
> On Thu, Jan 28, 2016 at 12:26 PM Mao Geng <m...@sumologic.com> wrote:
>
>> Sathish,
>>
>> I guess the mesos resources are not enough to run your job. You might
>> want to check the mesos log to figure out why.
>>
>> I tried to run the docker image with "--conf spark.mesos.coarse=false"
>> and "true". Both are fine.
>>
>> Best,
>> Mao
>>
>> On Wed, Jan 27, 2016 at 5:00 PM, Sathish Kumaran Vairavelu <
>> vsathishkuma...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> On the same Spark/Mesos/Docker setup, I am getting the warning "Initial Job
>>> has not accepted any resources; check your cluster UI to ensure that
>>> workers are registered and have sufficient resources". I am running in
>>> coarse-grained mode. Any pointers on how to fix this issue? Please help. I
>>> have updated both docker.properties and spark-defaults.conf with
>>> spark.mesos.executor.docker.image and other properties.
>>>
>>>
>>> Thanks
>>>
>>> Sathish
>>>
>>> On Wed, Jan 27, 2016 at 9:58 AM Sathish Kumaran Vairavelu <
>>> vsathishkuma...@gmail.com> wrote:
>>>
>>>> Thanks a lot for your info! I will try this today.
>>>> On Wed, Jan 27, 2016 at 9:29 AM Mao Geng <m...@sumologic.com> wrote:
>>>>
>>>>> Hi Sathish,
>>>>>
>>>>> The docker image is normal, no AWS profile included.
>>>>>
>>>>> When the driver container runs with --net=host, the driver host's AWS
>>>>> profile will take effect so that the driver can access the protected s3
>>>>> files.
>>>>>
>>>>> Similarly,  Mesos slaves also run Spark executor docker container in
>>>>> --net=host mode, so that the AWS profile of Mesos slaves will take effect.
>>>>>
>>>>> Hope it helps,
>>>>> Mao
>>>>>
>>>>> On Jan 26, 2016, at 9:15 PM, Sathish Kumaran Vairavelu <
>>>>> vsathishkuma...@gmail.com> wrote:
>>>>>
>>>>> Hi Mao,
>>>>>
>>>>> I want to check on accessing the S3 from Spark docker in Mesos.  The
>>>>> EC2 instance that I am using has the AWS profile/IAM included.  Should we
>>>>> build the docker image with any AWS profile settings or --net=host docker
>>>>> option takes care of it?
>>>>>
>>>>> Please help
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Sathish
>>>>>
>>>>> On Tue, Jan 26, 2016 at 9:04 PM Mao Geng <m...@sumologic.com> wrote:
>>>>>
>>>>>> Thank you very much, Jerry!
>>>>>>
>>>>>> I changed to "--jars
>>>>>> /opt/spark/lib/hadoop-aws-2.7.1.jar,/opt/spark/lib/aws-java-sdk-1.7.4.jar"
>>>>>> then it worked like a charm!
>>>>>>
>>>>>> From Mesos task logs below, I saw Mesos executor downloaded the jars
>>>>>> from the driver, which is a bit unnecessary (as the docker image already
>>>>>> has them), but that's ok - I am happy seeing Spark + Mesos + Docker + S3
>>>>>> worked together!
>>>>>>
>>>>>> Thanks,
>>>>>> Mao
>>>>>>
>>>>>> 16/01/27 02:54:45 INFO Executor: Using REPL class URI: 
>>>>>> http://172.16.3.98:33771
>>>>>> 16/01/27 02:55:12 INFO CoarseGrainedExecutorBackend: Got assigned task 0
>>>>>> 16/01/27 02:55:12 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>>>>>> 16/01/27 02:55:12 INFO Executor: Fetching 
>>>>>> http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar with timestamp 
>>>>>> 1453863280432
>>>>>>

Re: Spark, Mesos, Docker and S3

2016-01-28 Thread Mao Geng
Sathish,

I guess the mesos resources are not enough to run your job. You might want
to check the mesos log to figure out why.

I tried to run the docker image with "--conf spark.mesos.coarse=false" and
"true". Both are fine.

Best,
Mao

On Wed, Jan 27, 2016 at 5:00 PM, Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Hi,
>
> On the same Spark/Mesos/Docker setup, I am getting the warning "Initial Job
> has not accepted any resources; check your cluster UI to ensure that
> workers are registered and have sufficient resources". I am running in
> coarse-grained mode. Any pointers on how to fix this issue? Please help. I
> have updated both docker.properties and spark-defaults.conf with
> spark.mesos.executor.docker.image and other properties.
>
>
> Thanks
>
> Sathish
>
> On Wed, Jan 27, 2016 at 9:58 AM Sathish Kumaran Vairavelu <
> vsathishkuma...@gmail.com> wrote:
>
>> Thanks a lot for your info! I will try this today.
>> On Wed, Jan 27, 2016 at 9:29 AM Mao Geng <m...@sumologic.com> wrote:
>>
>>> Hi Sathish,
>>>
>>> The docker image is normal, no AWS profile included.
>>>
>>> When the driver container runs with --net=host, the driver host's AWS
>>> profile will take effect so that the driver can access the protected s3
>>> files.
>>>
>>> Similarly,  Mesos slaves also run Spark executor docker container in
>>> --net=host mode, so that the AWS profile of Mesos slaves will take effect.
>>>
>>> Hope it helps,
>>> Mao
>>>
>>> On Jan 26, 2016, at 9:15 PM, Sathish Kumaran Vairavelu <
>>> vsathishkuma...@gmail.com> wrote:
>>>
>>> Hi Mao,
>>>
>>> I want to check on accessing the S3 from Spark docker in Mesos.  The EC2
>>> instance that I am using has the AWS profile/IAM included.  Should we build
>>> the docker image with any AWS profile settings or --net=host docker option
>>> takes care of it?
>>>
>>> Please help
>>>
>>>
>>> Thanks
>>>
>>> Sathish
>>>
>>> On Tue, Jan 26, 2016 at 9:04 PM Mao Geng <m...@sumologic.com> wrote:
>>>
>>>> Thank you very much, Jerry!
>>>>
>>>> I changed to "--jars
>>>> /opt/spark/lib/hadoop-aws-2.7.1.jar,/opt/spark/lib/aws-java-sdk-1.7.4.jar"
>>>> then it worked like a charm!
>>>>
>>>> From Mesos task logs below, I saw Mesos executor downloaded the jars
>>>> from the driver, which is a bit unnecessary (as the docker image already
>>>> has them), but that's ok - I am happy seeing Spark + Mesos + Docker + S3
>>>> worked together!
>>>>
>>>> Thanks,
>>>> Mao
>>>>
>>>> 16/01/27 02:54:45 INFO Executor: Using REPL class URI: 
>>>> http://172.16.3.98:33771
>>>> 16/01/27 02:55:12 INFO CoarseGrainedExecutorBackend: Got assigned task 0
>>>> 16/01/27 02:55:12 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>>>> 16/01/27 02:55:12 INFO Executor: Fetching 
>>>> http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar with timestamp 
>>>> 1453863280432
>>>> 16/01/27 02:55:12 INFO Utils: Fetching 
>>>> http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar to 
>>>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/fetchFileTemp1518118694295619525.tmp
>>>> 16/01/27 02:55:12 INFO Utils: Copying 
>>>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/-19880839621453863280432_cache
>>>>  to /./hadoop-aws-2.7.1.jar
>>>> 16/01/27 02:55:12 INFO Executor: Adding file:/./hadoop-aws-2.7.1.jar to 
>>>> class loader
>>>> 16/01/27 02:55:12 INFO Executor: Fetching 
>>>> http://172.16.3.98:3850/jars/aws-java-sdk-1.7.4.jar with timestamp 
>>>> 1453863280472
>>>> 16/01/27 02:55:12 INFO Utils: Fetching 
>>>> http://172.16.3.98:3850/jars/aws-java-sdk-1.7.4.jar to 
>>>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/fetchFileTemp8868621397726761921.tmp
>>>> 16/01/27 02:55:12 INFO Utils: Copying 
>>>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/8167072821453863280472_cache
>>>>  to /./aws-java-sdk-1.7.4.jar
>>>> 16/01/27 02:55:12 INFO Executor: Adding file:/./aws-java-sdk-1.7.4.jar to 
>>>> class loader
>>>>
>>>> On Tue, Jan 26, 2016 at 5:40 PM, Jerry Lam <chiling...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Mao,
>>>>>

Re: Spark, Mesos, Docker and S3

2016-01-27 Thread Mao Geng
Hi Sathish,

The Docker image is normal; no AWS profile is included.

When the driver container runs with --net=host, the driver host's AWS profile
will take effect, so the driver can access the protected S3 files.

Similarly, the Mesos slaves also run the Spark executor Docker container in
--net=host mode, so the AWS profile of the Mesos slaves takes effect.
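
If you want to double-check that the instance profile is visible from inside
a --net=host container, one sanity check (the image name is a placeholder,
and this assumes curl is installed in the image as in the Dockerfile from
this thread) is to hit the EC2 instance metadata service from the container:

docker run --rm --net=host --entrypoint /usr/bin/curl <image>:<tag> \
  -s http://169.254.169.254/latest/meta-data/iam/security-credentials/

If it prints the IAM role name, the AWS SDK inside the container should be
able to pick up the same temporary credentials.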

Hope it helps,
Mao

> On Jan 26, 2016, at 9:15 PM, Sathish Kumaran Vairavelu 
> <vsathishkuma...@gmail.com> wrote:
> 
> Hi Mao, 
> 
> I want to check on accessing S3 from the Spark Docker image on Mesos. The EC2
> instance that I am using has the AWS profile/IAM included. Should we build
> the Docker image with any AWS profile settings, or does the --net=host docker
> option take care of it?
> 
> Please help
> 
> 
> Thanks
> 
> Sathish
> 
>> On Tue, Jan 26, 2016 at 9:04 PM Mao Geng <m...@sumologic.com> wrote:
>> Thank you very much, Jerry! 
>> 
>> I changed to "--jars 
>> /opt/spark/lib/hadoop-aws-2.7.1.jar,/opt/spark/lib/aws-java-sdk-1.7.4.jar" 
>> then it worked like a charm!
>> 
>> From Mesos task logs below, I saw Mesos executor downloaded the jars from 
>> the driver, which is a bit unnecessary (as the docker image already has 
>> them), but that's ok - I am happy seeing Spark + Mesos + Docker + S3 worked 
>> together!  
>> 
>> Thanks,
>> Mao
>> 16/01/27 02:54:45 INFO Executor: Using REPL class URI: 
>> http://172.16.3.98:33771
>> 16/01/27 02:55:12 INFO CoarseGrainedExecutorBackend: Got assigned task 0
>> 16/01/27 02:55:12 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>> 16/01/27 02:55:12 INFO Executor: Fetching 
>> http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar with timestamp 
>> 1453863280432
>> 16/01/27 02:55:12 INFO Utils: Fetching 
>> http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar to 
>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/fetchFileTemp1518118694295619525.tmp
>> 16/01/27 02:55:12 INFO Utils: Copying 
>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/-19880839621453863280432_cache
>>  to /./hadoop-aws-2.7.1.jar
>> 16/01/27 02:55:12 INFO Executor: Adding file:/./hadoop-aws-2.7.1.jar to 
>> class loader
>> 16/01/27 02:55:12 INFO Executor: Fetching 
>> http://172.16.3.98:3850/jars/aws-java-sdk-1.7.4.jar with timestamp 
>> 1453863280472
>> 16/01/27 02:55:12 INFO Utils: Fetching 
>> http://172.16.3.98:3850/jars/aws-java-sdk-1.7.4.jar to 
>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/fetchFileTemp8868621397726761921.tmp
>> 16/01/27 02:55:12 INFO Utils: Copying 
>> /tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/8167072821453863280472_cache 
>> to /./aws-java-sdk-1.7.4.jar
>> 16/01/27 02:55:12 INFO Executor: Adding file:/./aws-java-sdk-1.7.4.jar to 
>> class loader
>>> On Tue, Jan 26, 2016 at 5:40 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>> Hi Mao,
>>> 
>>> Can you try --jars to include those jars?
>>> 
>>> Best Regards,
>>> 
>>> Jerry
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 26 Jan, 2016, at 7:02 pm, Mao Geng <m...@sumologic.com> wrote:
>>>> 
>>>> Hi there, 
>>>> 
>>>> I am trying to run Spark on Mesos using a Docker image as executor, as 
>>>> mentioned 
>>>> http://spark.apache.org/docs/latest/running-on-mesos.html#mesos-docker-support.
>>>>  
>>>> 
>>>> I built a docker image using the following Dockerfile (which is based on 
>>>> https://github.com/apache/spark/blob/master/docker/spark-mesos/Dockerfile):
>>>> 
>>>> FROM mesosphere/mesos:0.25.0-0.2.70.ubuntu1404
>>>> 
>>>> # Update the base ubuntu image with dependencies needed for Spark
>>>> RUN apt-get update && \
>>>> apt-get install -y python libnss3 openjdk-7-jre-headless curl
>>>> 
>>>> RUN curl 
>>>> http://www.carfab.com/apachesoftware/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
>>>>  | tar -xzC /opt && \
>>>> ln -s /opt/spark-1.6.0-bin-hadoop2.6 /opt/spark
>>>> ENV SPARK_HOME /opt/spark
>>>> ENV MESOS_NATIVE_JAVA_LIBRARY /usr/local/lib/libmesos.so
>>>> 
>>>> Then I successfully ran spark-shell via this docker command:
>>>> docker run --rm -it --net=host <image>:<tag>
>>>> /opt/spark/bin/spark-shell --master mesos://<mesos-master>:5050 --conf
>>>> spark.mesos.executor.docker.image=<image>:<tag>
>>>> 
>>>> So far so good. Then I wanted to call sc.textFile to load a file from S3, 
>>>> but I was blocked 

Spark, Mesos, Docker and S3

2016-01-26 Thread Mao Geng
Any idea why the task failed with a "java.lang.ClassNotFoundException"?
Is there something wrong in the command line options I used to start
spark-shell, in the docker image, or in the "s3a://" url? Or is it
something related to the Docker executor of Mesos? I studied a bit
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
but didn't understand it well...

I'd appreciate it if anyone could shed some light on this.

Thanks,
Mao Geng


Re: Spark, Mesos, Docker and S3

2016-01-26 Thread Mao Geng
Thank you very much, Jerry!

I changed to "--jars
/opt/spark/lib/hadoop-aws-2.7.1.jar,/opt/spark/lib/aws-java-sdk-1.7.4.jar"
then it worked like a charm!

From the Mesos task logs below, I saw that the Mesos executor downloaded the jars
from the driver, which is a bit unnecessary (as the docker image already has
them), but that's OK - I am happy to see Spark + Mesos + Docker + S3 working
together!
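
Putting the pieces together, the working invocation should look roughly like
this (the image name and Mesos master host are placeholders):

docker run --rm -it --net=host <image>:<tag> \
  /opt/spark/bin/spark-shell --master mesos://<mesos-master>:5050 \
  --conf spark.mesos.executor.docker.image=<image>:<tag> \
  --jars /opt/spark/lib/hadoop-aws-2.7.1.jar,/opt/spark/lib/aws-java-sdk-1.7.4.jar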

Thanks,
Mao

16/01/27 02:54:45 INFO Executor: Using REPL class URI: http://172.16.3.98:33771
16/01/27 02:55:12 INFO CoarseGrainedExecutorBackend: Got assigned task 0
16/01/27 02:55:12 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/01/27 02:55:12 INFO Executor: Fetching
http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar with timestamp
1453863280432
16/01/27 02:55:12 INFO Utils: Fetching
http://172.16.3.98:3850/jars/hadoop-aws-2.7.1.jar to
/tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/fetchFileTemp1518118694295619525.tmp
16/01/27 02:55:12 INFO Utils: Copying
/tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/-19880839621453863280432_cache
to /./hadoop-aws-2.7.1.jar
16/01/27 02:55:12 INFO Executor: Adding file:/./hadoop-aws-2.7.1.jar
to class loader
16/01/27 02:55:12 INFO Executor: Fetching
http://172.16.3.98:3850/jars/aws-java-sdk-1.7.4.jar with timestamp
1453863280472
16/01/27 02:55:12 INFO Utils: Fetching
http://172.16.3.98:3850/jars/aws-java-sdk-1.7.4.jar to
/tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/fetchFileTemp8868621397726761921.tmp
16/01/27 02:55:12 INFO Utils: Copying
/tmp/spark-7b8e1681-8a62-4f1d-9e11-fdf8062b1b08/8167072821453863280472_cache
to /./aws-java-sdk-1.7.4.jar
16/01/27 02:55:12 INFO Executor: Adding file:/./aws-java-sdk-1.7.4.jar
to class loader

On Tue, Jan 26, 2016 at 5:40 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Mao,
>
> Can you try --jars to include those jars?
>
> Best Regards,
>
> Jerry
>
> Sent from my iPhone
>
> On 26 Jan, 2016, at 7:02 pm, Mao Geng <m...@sumologic.com> wrote:
>
> Hi there,
>
> I am trying to run Spark on Mesos using a Docker image as executor, as
> mentioned
> http://spark.apache.org/docs/latest/running-on-mesos.html#mesos-docker-support
> .
>
> I built a docker image using the following Dockerfile (which is based on
> https://github.com/apache/spark/blob/master/docker/spark-mesos/Dockerfile
> ):
>
> FROM mesosphere/mesos:0.25.0-0.2.70.ubuntu1404
>
> # Update the base ubuntu image with dependencies needed for Spark
> RUN apt-get update && \
> apt-get install -y python libnss3 openjdk-7-jre-headless curl
>
> RUN curl
> http://www.carfab.com/apachesoftware/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
> | tar -xzC /opt && \
> ln -s /opt/spark-1.6.0-bin-hadoop2.6 /opt/spark
> ENV SPARK_HOME /opt/spark
> ENV MESOS_NATIVE_JAVA_LIBRARY /usr/local/lib/libmesos.so
>
> Then I successfully ran spark-shell via this docker command:
> docker run --rm -it --net=host <image>:<tag>
> /opt/spark/bin/spark-shell --master mesos://<mesos-master>:5050 --conf
> spark.mesos.executor.docker.image=<image>:<tag>
>
> So far so good. Then I wanted to call sc.textFile to load a file from S3,
> but I was blocked by some issues which I couldn't figure out. I've read
> https://dzone.com/articles/uniting-spark-parquet-and-s3-as-an-alternative-to
> and
> http://blog.encomiabile.it/2015/10/29/apache-spark-amazon-s3-and-apache-mesos,
> learned that I need to add hadoop-aws-2.7.1 and aws-java-sdk-1.7.4 into the
> executor's and driver's classpaths in order to access S3 files.
>
> So, I added following lines into Dockerfile and build a new image.
> RUN curl
> https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
> -o /opt/spark/lib/aws-java-sdk-1.7.4.jar
> RUN curl
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar
> -o /opt/spark/lib/hadoop-aws-2.7.1.jar
>
> Then I started spark-shell again with the command below:
> docker run --rm -it --net=host <image>:<tag>
> /opt/spark/bin/spark-shell --master mesos://<mesos-master>:5050 --conf
> spark.mesos.executor.docker.image=<image>:<tag> --conf 
> spark.executor.extraClassPath=/opt/spark/lib/hadoop-aws-2.7.1.jar:/opt/spark/lib/aws-java-sdk-1.7.4.jar
>  --conf
> spark.driver.extraClassPath=/opt/spark/lib/hadoop-aws-2.7.1.jar:/opt/spark/lib/aws-java-sdk-1.7.4.jar
>
> But below command failed when I ran it in spark-shell:
> scala> sc.textFile("s3a://<bucket>/<file>").count()
> [Stage 0:>  (0 +
> 2) / 2]16/01/26 23:05:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0
> (TID 0, ip-172-16-14-203.us-west-2.compute.internal):
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)