Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-19 Thread David Rosenstrauch
I ran into this recently. It turned out we had an old
org-xerial-snappy.properties file in one of our conf directories that
had the setting:


# Disables loading Snappy-Java native library bundled in the
# snappy-java-*.jar file forcing to load the Snappy-Java native
# library from the java.library.path.
#
org.xerial.snappy.disable.bundled.libs=true

When I switched that to false, the problem went away.
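For anyone hitting the same thing, here is a minimal sketch of the fix. The directory below is a temporary stand-in; point it at whichever conf directory actually holds your copy of the file:

```python
# Flip the snappy-java override so the bundled native library loads again.
# The temp directory is a stand-in; use your real conf directory instead.
import pathlib
import tempfile

conf_dir = pathlib.Path(tempfile.mkdtemp())
props = conf_dir / "org-xerial-snappy.properties"
props.write_text("org.xerial.snappy.disable.bundled.libs=true\n")

# Re-enable loading the native library bundled in the snappy-java-*.jar.
props.write_text(
    props.read_text().replace(
        "org.xerial.snappy.disable.bundled.libs=true",
        "org.xerial.snappy.disable.bundled.libs=false",
    )
)
print(props.read_text().strip())
```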

It may or may not be your problem, of course, but it's worth a look.

HTH,

DR

On 11/17/2015 05:22 PM, Andy Davidson wrote:

I started a Spark POC. I created an EC2 cluster on AWS using spark-ec2, with
3 slaves. In general I am running into trouble even with small workloads. I
am using IPython notebooks running on my Spark cluster, and everything is
painfully slow. I am using the standalone cluster manager. I noticed that I
am getting the following warnings on my driver console. Any idea what the
problem might be?



15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for
source because spark.app.id is not set.

15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded
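(One note on the first warning: it only affects metric source naming. The cluster manager normally assigns spark.app.id; as I understand it, you can silence the warning by setting an id and name yourself, e.g. in conf/spark-defaults.conf. The values below are made up:)

```
# conf/spark-defaults.conf (values are examples, not recommendations)
spark.app.id     tweet-poc
spark.app.name   tweet-poc
```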



Here is an overview of my POC app. I have a file on HDFS containing about
5,000 Twitter status strings.

import json

tweetStrings = sc.textFile(dataURL)
jTweets = tweetStrings.map(lambda x: json.loads(x)).take(10)

Generated the following error: "error occurred while calling o78.partitions.:
java.lang.OutOfMemoryError: GC overhead limit exceeded"
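(For what it's worth, take(10) only ships ten records to the driver, so an OOM there usually points at memory settings rather than at the map itself. The per-line parse can be sanity-checked locally without Spark; the sample lines below are made up:)

```python
# Local check of the per-line parse used in the map(); no Spark needed.
import json

lines = [
    '{"id": 1, "text": "hello"}',  # made-up stand-ins for tweet JSON lines
    '{"id": 2, "text": "world"}',
]
# Same shape as tweetStrings.map(lambda x: json.loads(x)).take(10)
jTweets = [json.loads(x) for x in lines[:10]]
print(jTweets[0]["text"])
# prints hello
```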

Any idea what we need to do to improve a new Spark user's out-of-the-box
experience?

Kind regards

Andy

export PYSPARK_PYTHON=python3.4

export PYSPARK_DRIVER_PYTHON=python3.4

export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"

MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077


numCores=2

$SPARK_ROOT/bin/pyspark --master $MASTER_URL --total-executor-cores
$numCores $*
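(Given how small the data is, "GC overhead limit exceeded" suggests the small default heaps on the spark-ec2 images. One thing to try is raising the driver and executor memory, e.g. in conf/spark-defaults.conf; the sizes below are example values:)

```
# conf/spark-defaults.conf (sizes are example values, not recommendations)
spark.driver.memory    4g
spark.executor.memory  4g
```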








-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson

On my master

grep native /root/spark/conf/spark-env.sh

SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"



$ ls /root/ephemeral-hdfs/lib/native/

libhadoop.a         libhadooppipes.a    libhadoop.so      libhadoop.so.1.0.0
libhadooputils.a    libhdfs.a           libsnappy.so      libsnappy.so.1
libsnappy.so.1.1.3  Linux-amd64-64      Linux-i386-32
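A quick way to confirm one of those .so files is actually loadable on the host (a sketch; the path in the final call is taken from the listing above and is an assumption about your layout):

```python
# Check whether a native shared library can be dlopen()'d on this host.
import ctypes

def can_load(path):
    """Return True if the shared library at `path` loads, False otherwise."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError:
        return False

# Path taken from the ls output above; adjust for your cluster.
print(can_load("/root/ephemeral-hdfs/lib/native/libsnappy.so"))
```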





Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson
I forgot to mention: I am using spark-1.5.1-bin-hadoop2.6.




Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson
FYI

After 17 min., only 26112/228155 have succeeded.

This seems very slow.

Kind regards

Andy




