Re: WARN LoadSnappy: Snappy native library not loaded
I ran into this recently. It turned out we had an old org-xerial-snappy.properties
file in one of our conf directories that contained the setting:

# Disables loading Snappy-Java native library bundled in the
# snappy-java-*.jar file forcing to load the Snappy-Java native
# library from the java.library.path.
# org.xerial.snappy.disable.bundled.libs=true

When I switched that to false, the problem went away. It may or may not be your
problem, of course, but it is worth a look.

HTH,
DR

On 11/17/2015 05:22 PM, Andy Davidson wrote:
> I started a Spark POC. I created an EC2 cluster on AWS using spark-ec2. I have
> 3 slaves. In general I am running into trouble even with small workloads. I am
> using IPython notebooks running on my Spark cluster. Everything is painfully
> slow. I am using the standalone cluster manager. I noticed that I am getting
> the following warnings on my driver console. Any idea what the problem might be?
>
> 15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for
> source because spark.app.id is not set.
> 15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded
>
> Here is an overview of my POC app. I have a file on HDFS containing about
> 5000 twitter status strings.
>
> tweetStrings = sc.textFile(dataURL)
>
> jTweets = tweetStrings.map(lambda x: json.loads(x)).take(10)
>
> This generated the following error: "error occurred while calling
> o78.partitions.: java.lang.OutOfMemoryError: GC overhead limit exceeded"
>
> Any idea what we need to do to improve a new Spark user's out-of-the-box
> experience?
>
> Kind regards,
>
> Andy
>
> export PYSPARK_PYTHON=python3.4
> export PYSPARK_DRIVER_PYTHON=python3.4
> export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"
>
> MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077
>
> numCores=2
> $SPARK_ROOT/bin/pyspark --master $MASTER_URL --total-executor-cores $numCores $*

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
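For anyone who wants to audit their conf directories for the setting DR
describes, here is a minimal sketch (mine, not from the thread) of a checker
for that one key. It assumes a simplified `key=value` properties format with
`#` comment lines, which covers the excerpt above; the real Java Properties
format has additional escaping rules this ignores.

```python
import os
import tempfile

def snappy_bundled_libs_disabled(path):
    """Return True only if the file actively (uncommented) sets
    org.xerial.snappy.disable.bundled.libs=true."""
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            key, _, value = line.partition("=")
            if key.strip() == "org.xerial.snappy.disable.bundled.libs":
                return value.strip().lower() == "true"
    return False

# Quick self-contained demonstration against a throwaway file that has both
# the commented-out line from the excerpt and an active line:
with tempfile.NamedTemporaryFile("w", suffix=".properties", delete=False) as f:
    f.write("# org.xerial.snappy.disable.bundled.libs=true\n"
            "org.xerial.snappy.disable.bundled.libs=true\n")
print(snappy_bundled_libs_disabled(f.name))  # True: only the active line counts
os.unlink(f.name)
```

Pointed at each org-xerial-snappy.properties on the classpath, this would flag
the stale file DR found.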
Re: WARN LoadSnappy: Snappy native library not loaded
On my master:

$ grep native /root/spark/conf/spark-env.sh
SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"

$ ls /root/ephemeral-hdfs/lib/native/
libhadoop.a       libhadoop.so        libhadooputils.a  libsnappy.so    libsnappy.so.1.1.3  Linux-i386-32
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libsnappy.so.1  Linux-amd64-64

From: Andrew Davidson <a...@santacruzintegration.com>
Date: Tuesday, November 17, 2015 at 2:29 PM
To: "user @spark" <user@spark.apache.org>
Subject: Re: WARN LoadSnappy: Snappy native library not loaded
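As a sanity check that the libsnappy.so in that directory is actually loadable
by the current process (and not, say, built for the wrong architecture; note
the separate Linux-i386-32 and Linux-amd64-64 subdirectories in the listing
above), one can try to dlopen it directly. This is an added sketch, not part
of the thread; the path in the comment is the one from the listing.

```python
import ctypes
import os

def native_lib_loads(directory, name="libsnappy.so"):
    """Return True if the named shared library exists in `directory`
    and can be dlopen()ed by the current process."""
    path = os.path.join(directory, name)
    if not os.path.exists(path):
        return False
    try:
        ctypes.CDLL(path)  # raises OSError on wrong arch or missing deps
        return True
    except OSError:
        return False

# On the master from the listing above, one would try something like:
# native_lib_loads("/root/ephemeral-hdfs/lib/native")
```

If this returns False for a file that exists, the library is the wrong build
for the host, which would explain the LoadSnappy warning even with the path set.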
Re: WARN LoadSnappy: Snappy native library not loaded
I forgot to mention: I am using spark-1.5.1-bin-hadoop2.6.

From: Andrew Davidson <a...@santacruzintegration.com>
Date: Tuesday, November 17, 2015 at 2:26 PM
To: "user @spark" <user@spark.apache.org>
Subject: Re: WARN LoadSnappy: Snappy native library not loaded
Re: WARN LoadSnappy: Snappy native library not loaded
FYI

After 17 min. only 26112/228155 tasks have succeeded.

This seems very slow.

Kind regards,

Andy

From: Andrew Davidson <a...@santacruzintegration.com>
Date: Tuesday, November 17, 2015 at 2:22 PM
To: "user @spark" <user@spark.apache.org>
Subject: WARN LoadSnappy: Snappy native library not loaded
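For scale, the progress figures quoted above work out as follows (this is just
arithmetic on the reported numbers, not a measurement):

```python
# Back-of-the-envelope ETA from the reported progress:
# 26112 of 228155 tasks finished after 17 minutes.
done, total, minutes = 26112, 228155, 17

rate = done / minutes        # 1536.0 tasks per minute
remaining = total - done     # 202043 tasks still to run
eta_min = remaining / rate   # ~131.5 more minutes

print(f"{rate:.0f} tasks/min, ~{eta_min / 60:.1f} hours remaining")
```

So at the observed rate the job would need roughly another two hours, which
for a workload of this size does suggest something is misconfigured rather
than merely slow hardware.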
WARN LoadSnappy: Snappy native library not loaded
I started a Spark POC. I created an EC2 cluster on AWS using spark-ec2. I have
3 slaves. In general I am running into trouble even with small workloads. I am
using IPython notebooks running on my Spark cluster. Everything is painfully
slow. I am using the standalone cluster manager. I noticed that I am getting
the following warnings on my driver console. Any idea what the problem might be?

15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for
source because spark.app.id is not set.
15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded

Here is an overview of my POC app. I have a file on HDFS containing about
5000 twitter status strings.

tweetStrings = sc.textFile(dataURL)

jTweets = tweetStrings.map(lambda x: json.loads(x)).take(10)

This generated the following error: "error occurred while calling
o78.partitions.: java.lang.OutOfMemoryError: GC overhead limit exceeded"

Any idea what we need to do to improve a new Spark user's out-of-the-box
experience?

Kind regards,

Andy

export PYSPARK_PYTHON=python3.4
export PYSPARK_DRIVER_PYTHON=python3.4
export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"

MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077

numCores=2
$SPARK_ROOT/bin/pyspark --master $MASTER_URL --total-executor-cores $numCores $*
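One small thing that often bites new users with this exact map-then-take
pattern: a single blank or malformed line makes json.loads raise and kill the
whole stage. A defensive per-line parser that skips bad records can be passed
to flatMap instead of map. This is an added sketch (parse_tweet is a made-up
name, not from the thread), shown here in plain Python so it can be tried
without a cluster:

```python
import json

def parse_tweet(line):
    """Parse one JSON line; return an empty list for blank or malformed
    input so that flatMap silently drops bad records."""
    line = line.strip()
    if not line:
        return []
    try:
        return [json.loads(line)]
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return []

# Plain-Python demonstration, no Spark required:
lines = ['{"text": "hello"}', '', 'not json', '{"text": "world"}']
tweets = [t for line in lines for t in parse_tweet(line)]
print([t["text"] for t in tweets])  # → ['hello', 'world']
```

In the job above this would read `tweetStrings.flatMap(parse_tweet).take(10)`,
so one corrupt tweet no longer costs the whole stage a retry.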