Thanks Tim,
There's a little more to it in fact - if I use the
pre-built-with-hadoop-2.6 binaries, all is good (with correctly named
tarballs in HDFS). Using the pre-built-with-user-provided-hadoop
binaries (including setting SPARK_DIST_CLASSPATH in spark-env.sh), I get
the JNI exception.
Aha - I've found the minimal set of changes that fixes it. I can use
the user-provided-hadoop tarballs, but I _have_ to add spark-env.sh to
them (which I wasn't expecting - I don't recall seeing this anywhere in
the docs, so I'd assumed everything was set up by Spark/Mesos from the
client config).
FWIW, spark-env.sh:
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
#export MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs:///apps/spark/spark15.tgz
Leaving out SPARK_DIST_CLASSPATH leads to
org.apache.hadoop.fs.FSDataInputStream class errors (as you'd expect).
Leaving out MESOS_NATIVE_JAVA_LIBRARY seems to have no consequences at
the moment (it is set on the client).
Leaving out SPARK_EXECUTOR_URI stops the job from starting at all.
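For the record, "adding spark-env.sh to them" means unpacking the
distribution tarball, dropping spark-env.sh into conf/, and repacking
before re-uploading to HDFS. A rough sketch (paths and names are the
ones from this thread; the HDFS upload is commented out since it
depends on your cluster):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the unpacked distribution (normally: tar xzf spark-1.5.0-bin-os1.tgz)
mkdir -p spark-1.5.0-bin-os1/conf

# The spark-env.sh from above (heredoc is quoted so $(...) is NOT expanded here)
cat > spark-1.5.0-bin-os1/conf/spark-env.sh <<'EOF'
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
export SPARK_EXECUTOR_URI=hdfs:///apps/spark/spark15.tgz
EOF

# Repack with the env file inside
tar czf spark-1.5.0-bin-os1.tgz spark-1.5.0-bin-os1

# Then re-upload, e.g.:
#   hadoop fs -put -f spark-1.5.0-bin-os1.tgz /apps/spark/spark-1.5.0-bin-os1.tgz
tar tzf spark-1.5.0-bin-os1.tgz | grep conf/spark-env.sh
```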
spark-defaults.conf isn't required to be in the tarball; on the client
it's set to:

spark.master        mesos://zk://mesos-1.example.net:2181,mesos-2.example.net:2181,mesos-3.example.net:2181/mesos
spark.executor.uri  hdfs:///apps/spark/spark15.tgz
I guess this is the way forward for us right now; it's a bit
uncomfortable as I like to understand why :-)
On 09/09/2015 18:43, Tim Chen wrote:
Hi Adrian,
Spark expects a specific naming of the tgz and also of the folder
inside it, as this is what's generated by running make-distribution.sh
--tgz in the Spark source folder.
If you generate a Spark 1.4 tgz with that script, upload it to HDFS
under the same name, and fix the URI, then it should work.
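One way to sanity-check a tarball before uploading it - a small sketch
of the naming rule Tim describes (foo.tgz must unpack into a top-level
folder foo/):

```shell
# Report whether a tarball's basename matches the top-level directory inside it.
check_spark_tgz() {
  tgz="$1"
  base=$(basename "$tgz" .tgz)
  # First entry in the listing gives the top-level directory name
  top=$(tar tzf "$tgz" | head -n 1 | cut -d/ -f1)
  if [ "$base" = "$top" ]; then
    echo "ok: $tgz unpacks into $top/"
  else
    echo "mismatch: $tgz unpacks into $top/ - rename the file or the folder"
    return 1
  fi
}

# Demo with a throwaway archive:
work=$(mktemp -d)
cd "$work"
mkdir -p spark-1.5.0-bin-os1/bin
tar czf spark-1.5.0-bin-os1.tgz spark-1.5.0-bin-os1
check_spark_tgz spark-1.5.0-bin-os1.tgz
```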
Tim
On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett
<adr...@opensignal.com> wrote:
5mins later...
Trying 1.5 with a fairly plain build:
./make-distribution.sh --tgz --name os1 -Phadoop-2.6
and on my first attempt stderr showed:
I0909 15:16:49.392144 1619 fetcher.cpp:441] Fetched
'hdfs:///apps/spark/spark15.tgz' to
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S1/frameworks/20150826-133446-3217621258-5050-4064-211204/executors/20150826-133446-3217621258-5050-4064-S1/runs/43026ba8-6624-4817-912c-3d7573433102/spark15.tgz'
sh: 1: cd: can't cd to spark15.tgz
sh: 1: ./bin/spark-class: not found
Aha, let's rename the file in hdfs (and the two configs) from
spark15.tgz to spark-1.5.0-bin-os1.tgz...
Success!!!
The same trick with 1.4 doesn't work, but now that I have
something that does I can make progress.
Hopefully this helps someone else :-)
Adrian
On 09/09/2015 16:59, Adrian Bridgett wrote:
I'm trying to run Spark (1.4.1) on top of Mesos (0.23). I've
followed the instructions (uploaded the Spark tarball to HDFS, set
the executor URI in both places, etc.) and yet on the slaves it's
failing to launch even the SparkPi example, with a JNI error. It
does run with a local master. A day of debugging later and it's
time to ask for help!
bin/spark-submit --master mesos://10.1.201.191:5050 --class
org.apache.spark.examples.SparkPi /tmp/examples.jar
(I'm putting the jar outside HDFS - on both the client box and the
slave (I turned off the other slaves for debugging) - due to
http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html.
I should note that I had the same JNI errors when using the Mesos
cluster dispatcher.)
I'm using Oracle Java 8 (no other Java - not even OpenJDK - is installed).
As you can see, the slave is downloading the framework fine (you
can even see it extracted on the slave). Can anyone shed some
light on what's going on - e.g. how is it attempting to run the
executor?
I'm going to try a different JVM (and try a custom Spark
distribution), but I suspect the problem is much more basic.
Maybe it can't find the Hadoop native libs?
Any light would be much appreciated :) I've included the
slave's stderr below:
I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info:
{"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI
'hdfs:///apps/spark/spark.tgz'
I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly
into the sandbox directory
I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI
'hdfs:///apps/spark/spark.tgz'
I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource
with Hadoop client from 'hdfs:///apps/spark/spark.tgz' to
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with
command: tar -C
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
-xf
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
into
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of
extracting resource from URI with 'extract' flag, because it does
not seem to be an archive: hdfs:///apps/spark/spark.tgz
I0909 14:14:07.489791 32132 fetcher.cpp:441] Fetched
'hdfs:///apps/spark/spark.tgz' to
'/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
Error: A JNI error has occurred, please check your installation
and try again
Exception in thread "main" java.lang.NoClassDefFoundError:
org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at
sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal
<http://www.opensignal.com>
_____________________________________________________
Office: First Floor, Scriptor Court, 155-157 Farringdon Road,
Clerkenwell, London, EC1R 3AD
Phone #: +44 777-377-8251
Skype: abridgett |@adrianbridgett
<http://twitter.com/adrianbridgett>| LinkedIn link
<https://uk.linkedin.com/in/abridgett>
_____________________________________________________