So I tried building the connector from https://github.com/datastax/spark-cassandra-connector, which seems to include the Java class referenced in the error message:
[root@devzero spark]# unzip -l spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar | grep CassandraJavaUtil
    14612  02-16-2015 23:25   com/datastax/spark/connector/japi/CassandraJavaUtil.class
[root@devzero spark]#

When I try running my spark test job, I still get the exact same error, even though both of my jars seem to have been processed by Spark:

...
15/02/17 00:00:45 INFO SparkUI: Started SparkUI at http://devzero:4040
15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:36929/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424131245595
15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar at http://10.212.55.42:36929/jars/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar with timestamp 1424131245623
15/02/17 00:00:45 INFO Utils: Copying /spark/test2.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/test2.py
15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:36929/files/test2.py with timestamp 1424131245624
15/02/17 00:00:45 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/pyspark_cassandra.py
15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:36929/files/pyspark_cassandra.py with timestamp 1424131245633
15/02/17 00:00:45 INFO Executor: Starting executor ID <driver> on host localhost
...
15/02/17 00:00:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Traceback (most recent call last):
  File "/spark/test2.py", line 5, in <module>
    sc = CassandraSparkContext(conf=conf)
  File "/spark/python/pyspark/context.py", line 105, in __init__
    conf, jsc)
  File "/spark/pyspark_cassandra.py", line 17, in _do_init
    self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
  File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.

Am I building the wrong connector jar, or using the wrong jar?
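In case it helps narrow this down, here is a small probe script I can submit the same way (a sketch only: it assumes the japi package path shown by the unzip listing above, and that the assembly jar is also passed via --driver-class-path so the driver JVM can actually see the class):

    from pyspark import SparkConf, SparkContext
    from py4j.java_gateway import java_import

    conf = SparkConf().setAppName("CassandraJavaUtil probe")
    sc = SparkContext(conf=conf)

    # Import the class into the py4j JVM view so the short name resolves.
    # Without some import like this, sc._jvm.CassandraJavaUtil cannot be
    # resolved, which appears to be what raises "Trying to call a package."
    java_import(sc._jvm, "com.datastax.spark.connector.japi.CassandraJavaUtil")
    print(sc._jvm.CassandraJavaUtil)

    # The fully qualified name should work too, with no explicit import:
    jcsc = sc._jvm.com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions(sc._jsc)
    print(jcsc)

If both forms still fail, then the class is presumably not on the driver classpath at all, and no amount of importing on the Python side will help.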
Thanks a lot,
Mohamed.

On Mon, Feb 16, 2015 at 5:46 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:

> Oh, I don't know. Thanks a lot Davies, gonna figure that out now....
>
> On Mon, Feb 16, 2015 at 5:31 PM, Davies Liu <dav...@databricks.com> wrote:
>
>> It also needs the Cassandra jar:
>> com.datastax.spark.connector.CassandraJavaUtil
>>
>> Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar ?
>>
>> On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
>>
>>> Yes, I am sure the system can't find the jar... but how do I fix that? My submit command includes the jar:
>>>
>>> /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py
>>>
>>> and the spark output seems to indicate it is handling it:
>>>
>>> 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>>>
>>> I don't really know what else I could try.... Any suggestions highly appreciated.
>>>
>>> Thanks,
>>> Mohamed.
>>>
>>> On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <dav...@databricks.com> wrote:
>>>
>>>> It seems that the jar for cassandra is not loaded; you should have them in the classpath.
>>>>
>>>> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Trying the example code from this package (https://github.com/Parsely/pyspark-cassandra), I always get this error...
>>>>>
>>>>> Can you see what I am doing wrong? From googling around, it seems to be that the jar is not found somehow... The spark log shows the JAR was processed, at least.
>>>>>
>>>>> Thank you so much.
>>>>>
>>>>> I am using spark-1.2.1-bin-hadoop2.4.tgz
>>>>>
>>>>> test2.py is simply:
>>>>>
>>>>> from pyspark.context import SparkConf
>>>>> from pyspark_cassandra import CassandraSparkContext, saveToCassandra
>>>>> conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
>>>>> conf.set("spark.cassandra.connection.host", "devzero")
>>>>> sc = CassandraSparkContext(conf=conf)
>>>>>
>>>>> [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
>>>>> ...
>>>>> 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
>>>>> 15/02/16 05:58:45 INFO Remoting: Starting remoting
>>>>> 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
>>>>> 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
>>>>> 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
>>>>> 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
>>>>> 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
>>>>> 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>>>>> 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
>>>>> 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
>>>>> 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
>>>>> 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>>>>> 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
>>>>> 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>>>>> 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
>>>>> 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
>>>>> 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
>>>>> 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
>>>>> 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
>>>>> 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
>>>>> 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
>>>>> 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
>>>>> 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
>>>>> 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
>>>>> 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
>>>>> 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
>>>>> 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>>>>> 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
>>>>> 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
>>>>> 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
>>>>> 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
>>>>> 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>>>>> 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>>>>> 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>>>>> Traceback (most recent call last):
>>>>>   File "/spark/test2.py", line 5, in <module>
>>>>>     sc = CassandraSparkContext(conf=conf)
>>>>>   File "/spark/python/pyspark/context.py", line 105, in __init__
>>>>>     conf, jsc)
>>>>>   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>>>>>     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>>>>>   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
>>>>> py4j.protocol.Py4JError: Trying to call a package.
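PS: the next thing I am going to try is passing both jars to --jars (a comma-separated list) and to --driver-class-path (a colon-separated classpath), roughly like this (same paths as above; I am not sure yet that this combination is required, but --jars alone clearly did not put the class where py4j looks for it):

    /spark/bin/spark-submit \
      --py-files /spark/pyspark_cassandra.py \
      --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar,/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar \
      --driver-class-path "/spark/pyspark-cassandra-0.1-SNAPSHOT.jar:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar" \
      /spark/test2.py

My understanding is that --jars ships the jars to the executors, while --driver-class-path is what the driver JVM (the one py4j talks to) actually searches, so both seem to be needed here.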