So I tried building the connector from:
https://github.com/datastax/spark-cassandra-connector

which seems to include the Java class referenced in the error message:

[root@devzero spark]# unzip -l spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar | grep CassandraJavaUtil
    14612  02-16-2015 23:25   com/datastax/spark/connector/japi/CassandraJavaUtil.class

[root@devzero spark]#


When I try running my Spark test job, I still get the exact same error, even though both my jars seem to have been processed by Spark.
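
For reference, here is the submit command for this run as a sketch, reconstructed from the log below (the two jar paths are the ones Spark reports adding; multiple jars go comma-separated to --jars and colon-separated to --driver-class-path):

/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py \
  --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar,/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar \
  --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar \
  /spark/test2.py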


...
15/02/17 00:00:45 INFO SparkUI: Started SparkUI at http://devzero:4040
15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:36929/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424131245595
15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar at http://10.212.55.42:36929/jars/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar with timestamp 1424131245623
15/02/17 00:00:45 INFO Utils: Copying /spark/test2.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/test2.py
15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:36929/files/test2.py with timestamp 1424131245624
15/02/17 00:00:45 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/pyspark_cassandra.py
15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:36929/files/pyspark_cassandra.py with timestamp 1424131245633
15/02/17 00:00:45 INFO Executor: Starting executor ID <driver> on host localhost
...
15/02/17 00:00:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Traceback (most recent call last):
  File "/spark/test2.py", line 5, in <module>
    sc = CassandraSparkContext(conf=conf)
  File "/spark/python/pyspark/context.py", line 105, in __init__
    conf, jsc)
  File "/spark/pyspark_cassandra.py", line 17, in _do_init
    self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
  File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.
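
If it helps with diagnosis: as far as I understand py4j, "Trying to call a package" is what you get when the driver JVM cannot find the class, so the attribute lookup falls through to a package reference instead of a class. A quick check from a pyspark shell started with the same --jars and --driver-class-path (hypothetical snippet; the class name is taken from the jar listing above):

# Ask the driver JVM directly whether the class is on its classpath;
# py4j raises an error wrapping ClassNotFoundException if the jar is
# not visible to the driver.
sc._jvm.java.lang.Class.forName(
    "com.datastax.spark.connector.japi.CassandraJavaUtil")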


Am I building the wrong connector jar, or using the wrong jar?
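
One thing I notice in the traceback: pyspark_cassandra calls self._jvm.CassandraJavaUtil with no package prefix, which (if I read py4j correctly) only resolves if the class was first registered with java_import. Since the class in my assembly jar lives under the japi package, a hypothetical workaround in pyspark_cassandra.py's _do_init would be to use the fully qualified name:

# Hypothetical change in _do_init, assuming the connector 1.2 layout
# shown by the jar listing above (com/datastax/spark/connector/japi/...):
self._jcsc = self._jvm.com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions(self._jsc)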

Thanks a lot,
Mohamed.



On Mon, Feb 16, 2015 at 5:46 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:

> Oh, I don't know. Thanks a lot Davies, gonna figure that out now....
>
> On Mon, Feb 16, 2015 at 5:31 PM, Davies Liu <dav...@databricks.com> wrote:
>
>> It also needs the Cassandra jar:
>> com.datastax.spark.connector.CassandraJavaUtil
>>
>> Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar?
>>
>>
>>
>> On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi
>> <mohamed.lrh...@georgetown.edu> wrote:
>> > Yes, I am sure the system can't find the jar... but how do I fix that?
>> > My submit command includes the jar:
>> >
>> > /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars
>> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path
>> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py
>> >
>> > and the Spark output seems to indicate it is handling it:
>> >
>> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>> >
>> >
>> > I don't really know what else I could try... any suggestions would be
>> > highly appreciated.
>> >
>> > Thanks,
>> > Mohamed.
>> >
>> >
>> > On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <dav...@databricks.com> wrote:
>> >>
>> >> It seems that the jar for Cassandra is not loaded; you should have it
>> >> in the classpath.
>> >>
>> >> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi
>> >> <mohamed.lrh...@georgetown.edu> wrote:
>> >> > Hello all,
>> >> >
>> >> > Trying the example code from this package
>> >> > (https://github.com/Parsely/pyspark-cassandra), I always get this
>> >> > error...
>> >> >
>> >> > Can you see what I am doing wrong? From googling around, it seems
>> >> > that the jar is not found somehow... The Spark log shows the JAR
>> >> > was processed, at least.
>> >> >
>> >> > Thank you so much.
>> >> >
>> >> > I am using spark-1.2.1-bin-hadoop2.4.tgz
>> >> >
>> >> > test2.py is simply:
>> >> >
>> >> > from pyspark.context import SparkConf
>> >> > from pyspark_cassandra import CassandraSparkContext, saveToCassandra
>> >> > conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
>> >> > conf.set("spark.cassandra.connection.host", "devzero")
>> >> > sc = CassandraSparkContext(conf=conf)
>> >> >
>> >> > [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c
>> >> > "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars
>> >> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path
>> >> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
>> >> > ...
>> >> > 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
>> >> > 15/02/16 05:58:45 INFO Remoting: Starting remoting
>> >> > 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
>> >> > 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
>> >> > 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
>> >> > 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>> >> > 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >> > 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
>> >> > 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>> >> > 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
>> >> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
>> >> > 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
>> >> > 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
>> >> > 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
>> >> > 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
>> >> > 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
>> >> > 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
>> >> > 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>> >> > 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
>> >> > 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
>> >> > 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
>> >> > 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>> >> > Traceback (most recent call last):
>> >> >   File "/spark/test2.py", line 5, in <module>
>> >> >     sc = CassandraSparkContext(conf=conf)
>> >> >   File "/spark/python/pyspark/context.py", line 105, in __init__
>> >> >     conf, jsc)
>> >> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>> >> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>> >> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
>> >> > py4j.protocol.Py4JError: Trying to call a package.
>> >> >
>> >> >
>> >
>> >
>>
>
>
