Hi Guys

I have the following script, which I want to run on Spark.

#!/usr/bin/env python3
from pyspark_cassandra import CassandraSparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
import os

os.environ['CLASSPATH'] = "/mnt/spark/lib"

conf = (SparkConf()
        .setAppName("test")
        .setMaster("spark://192.168.23.31:7077")
        .set("spark.cassandra.connection.host", "192.168.23.31"))
sc = CassandraSparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Read the Cassandra table into a DataFrame.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="lebara_diameter_codes",
               table="nl_lebara_diameter_codes")
      .load())

# Named error_codes/dates rather than list/list2 to avoid shadowing the built-in list().
error_codes = df.select("errorcode2001").where("errorcode2001 > 1200").collect()
dates = df.select("date").collect()
print([i for i in error_codes[0]])
print(type(error_codes[0]))

It throws the following error (which is logical, because the jar files are never loaded):
py4j.protocol.Py4JJavaError: An error occurred while calling o29.load.
: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org

Is there a way to load those jar files from Python, or to get them onto the classpath, before calling sqlContext.read.format("org.apache.spark.sql.cassandra")?
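
From what I have read, the usual way seems to be to let Spark itself fetch the connector jars rather than pointing CLASSPATH at them, either by setting PYSPARK_SUBMIT_ARGS before pyspark is imported, or by launching the script through spark-submit with --packages/--jars. A minimal sketch of what I mean (the connector coordinate here is a guess and would have to match the Spark/Scala build):

#!/usr/bin/env python3
import os

# Must be set before pyspark is imported; the trailing "pyspark-shell"
# token is required. The connector version below is an assumption.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 "
    "pyspark-shell"
)

from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark_cassandra import CassandraSparkContext

conf = (SparkConf()
        .setAppName("test")
        .setMaster("spark://192.168.23.31:7077")
        .set("spark.cassandra.connection.host", "192.168.23.31"))
sc = CassandraSparkContext(conf=conf)
sqlContext = SQLContext(sc)

# With the connector on the driver/executor classpath, the data source
# "org.apache.spark.sql.cassandra" should resolve.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="lebara_diameter_codes",
               table="nl_lebara_diameter_codes")
      .load())

The same effect should be achievable by starting the script with spark-submit --packages instead of plain python3. Is that the right direction, or is there a cleaner way to do it from Python?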

BR

Joaquin