Re: --packages Failed to load class for data source v1.4

2015-06-17 Thread Don Drake
I don't think this is the same issue, as it works just fine in pyspark
v1.3.1.

Are you aware of any workaround? I was hoping to start testing one of my
apps in Spark 1.4 and I use the CSV exports as a safety valve to easily
debug my data flow.
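
In the meantime I may try pointing the driver classpath at the jars that
--packages already fetched into my Ivy cache (untested; the paths are the
ones reported in the web UI):

$ pyspark --packages com.databricks:spark-csv_2.10:1.0.3 \
    --driver-class-path /Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar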

-Don


On Sun, Jun 14, 2015 at 7:18 PM, Burak Yavuz brk...@gmail.com wrote:

 Hi Don,
 This seems related to a known issue, where the classpath on the driver is
 missing the related classes. This is a bug in py4j, as py4j uses the system
 classloader rather than Spark's context classloader. However, this problem
 existed in 1.3.0 as well, so I'm curious whether it's the same issue.
 Thanks for opening the JIRA, I'll take a look.

 Best,
 Burak

Re: --packages Failed to load class for data source v1.4

2015-06-14 Thread Don Drake
I looked at this again: when I use the Scala spark-shell and load a CSV
using the same package, it works just fine, so this seems specific to
pyspark.

I've created the following JIRA:
https://issues.apache.org/jira/browse/SPARK-8365

-Don

On Sat, Jun 13, 2015 at 11:46 AM, Don Drake dondr...@gmail.com wrote:

 I downloaded the pre-compiled Spark 1.4.0 and attempted to run an existing
 Python Spark application against it and got the following error:

 py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.
 : java.lang.RuntimeException: Failed to load class for data source:
 com.databricks.spark.csv

 I pass the following on the command line to spark-submit:
 --packages com.databricks:spark-csv_2.10:1.0.3

 This worked fine on 1.3.1, but not in 1.4.

 I was able to replicate it with the following in pyspark:

 a = {'a': 1.0, 'b': 'asdf'}
 rdd = sc.parallelize([a])
 df = sqlContext.createDataFrame(rdd)
 df.save('/tmp/d.csv', 'com.databricks.spark.csv')


 Even using the new
 df.write.format('com.databricks.spark.csv').save('/tmp/d.csv') gives the
 same error.

 I see it was added in the web UI:
 file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar  Added By User
 file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar  Added By User
 http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar  Added By User
 http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar  Added By User
 Thoughts?

 -Don



 Gory details:

 $ pyspark --packages com.databricks:spark-csv_2.10:1.0.3
 Python 2.7.6 (default, Sep  9 2014, 15:04:36)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 Ivy Default Cache set to: /Users/drake/.ivy2/cache
 The jars for the packages stored in: /Users/drake/.ivy2/jars
 :: loading settings :: url =
 jar:file:/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
 com.databricks#spark-csv_2.10 added as a dependency
 :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
 confs: [default]
 found com.databricks#spark-csv_2.10;1.0.3 in central
 found org.apache.commons#commons-csv;1.1 in central
 :: resolution report :: resolve 590ms :: artifacts dl 17ms
 :: modules in use:
 com.databricks#spark-csv_2.10;1.0.3 from central in [default]
 org.apache.commons#commons-csv;1.1 from central in [default]
 -
 |  |modules||   artifacts   |
 |   conf   | number| search|dwnlded|evicted|| number|dwnlded|
 -
 |  default |   2   |   0   |   0   |   0   ||   2   |   0   |
 -
 :: retrieving :: org.apache.spark#spark-submit-parent
 confs: [default]
 0 artifacts copied, 2 already retrieved (0kB/15ms)
 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties
 15/06/13 11:06:08 INFO SparkContext: Running Spark version 1.4.0
 2015-06-13 11:06:08.921 java[19233:2145789] Unable to load realm info from
 SCDynamicStore
 15/06/13 11:06:09 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 15/06/13 11:06:09 WARN Utils: Your hostname, Dons-MacBook-Pro-2.local
 resolves to a loopback address: 127.0.0.1; using 10.0.0.222 instead (on
 interface en0)
 15/06/13 11:06:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
 another address
 15/06/13 11:06:09 INFO SecurityManager: Changing view acls to: drake
 15/06/13 11:06:09 INFO SecurityManager: Changing modify acls to: drake
 15/06/13 11:06:09 INFO SecurityManager: SecurityManager: authentication
 disabled; ui acls disabled; users with view permissions: Set(drake); users
 with modify permissions: Set(drake)
 15/06/13 11:06:10 INFO Slf4jLogger: Slf4jLogger started
 15/06/13 11:06:10 INFO Remoting: Starting remoting
 15/06/13 11:06:10 INFO Remoting: Remoting started; listening on addresses
 :[akka.tcp://sparkDriver@10.0.0.222:56870]
 15/06/13 11:06:10 INFO Utils: Successfully started service 'sparkDriver'
 on port 56870.
 15/06/13 11:06:10 INFO SparkEnv: Registering MapOutputTracker
 15/06/13 11:06:10 INFO SparkEnv: Registering BlockManagerMaster
 15/06/13 11:06:10 INFO DiskBlockManager: Created local directory at
 /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0hgn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/blockmgr-a1412b71-fe56-429c-a193-ce3fb95d2ffd
 15/06/13 11:06:10 INFO MemoryStore: MemoryStore started with capacity
 265.4 MB
 15/06/13 11:06:10 INFO HttpFileServer: HTTP File server directory is
 /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0hgn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/httpd-84d178da-7e60-4eed-8031-e6a0c465bd4c
 15/06/13 11:06:10 INFO HttpServer: Starting HTTP Server
 

Re: --packages Failed to load class for data source v1.4

2015-06-14 Thread Burak Yavuz
Hi Don,
This seems related to a known issue, where the classpath on the driver is
missing the related classes. This is a bug in py4j, as py4j uses the system
classloader rather than Spark's context classloader. However, this problem
existed in 1.3.0 as well, so I'm curious whether it's the same issue.
Thanks for opening the JIRA, I'll take a look.
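
If you want to double-check that diagnosis from the pyspark shell, a rough
(untested) probe through the py4j gateway could look like this; note that the
DefaultSource class name is my assumption about spark-csv's entry point:

jvm = sc._jvm  # py4j view of the driver JVM; java.lang is imported by default

# The thread context classloader, which should see the --packages jars:
ctx_loader = jvm.Thread.currentThread().getContextClassLoader()
jvm.Class.forName("com.databricks.spark.csv.DefaultSource", True, ctx_loader)

# Class.forName with no explicit loader falls back to the system classpath;
# if the diagnosis above is right, this lookup raises ClassNotFoundException:
jvm.Class.forName("com.databricks.spark.csv.DefaultSource")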

Best,
Burak