Hi Andy,
I believe you need to set the host and port settings separately:
spark.cassandra.connection.host
spark.cassandra.connection.port
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters
Looking at the logs, it seems your port config is not being set, so the
connector is falling back to the default port (9042).
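In the notebook, that would look something like this (a sketch based on the snippet you posted, assuming the same `sqlContext`; the tunnel's local port 9043 goes in the port setting, not appended to the host):

```python
# host and port are separate settings; the host value must not include ":port"
sqlContext.setConf("spark.cassandra.connection.host", "127.0.0.1")
sqlContext.setConf("spark.cassandra.connection.port", "9043")
```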
Let me know if that helps.
Saurabh Bajaj
On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson wrote:
> Hi Ted
>
> I believe by default Cassandra listens on 9042.
>
> From: Ted Yu
> Date: Tuesday, March 8, 2016 at 6:11 PM
> To: Andrew Davidson
> Cc: "user @spark"
> Subject: Re: pyspark spark-cassandra-connector java.io.IOException:
> Failed to open native connection to Cassandra at {192.168.1.126}:9042
>
> Have you contacted the spark-cassandra-connector mailing list?
>
> I wonder where the port 9042 came from.
>
> Cheers
>
> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>>
>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python
>> notebook that reads a data frame from Cassandra.
>>
>> *I connect to Cassandra using an SSH tunnel listening on port 9043.* CQLSH
>> works through the tunnel; however, I cannot figure out how to configure my
>> notebook. I have tried various hacks. Any idea what I am doing wrong?
>>
>> : java.io.IOException: Failed to open native connection to Cassandra at
>> {192.168.1.126}:9042
>>
>>
>>
>>
>> Thanks in advance
>>
>> Andy
>>
>>
>>
>> $ # multiple packages belong in one --packages flag, comma-separated
>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0,\
>> datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>
>> $ export PYSPARK_PYTHON=python3
>> $ export PYSPARK_DRIVER_PYTHON=python3
>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>
>>
>>
>> In [15]:
>>
>> sqlContext.setConf("spark.cassandra.connection.host”,”127.0.0.1:9043")
>> df = sqlContext.read \
>>     .format("org.apache.spark.sql.cassandra") \
>>     .options(table="time_series", keyspace="notification") \
>>     .load()
>>
>> df.printSchema()
>> df.show()
>>
>> ---------------------------------------------------------------------------
>> Py4JJavaError                             Traceback (most recent call last)
>> <ipython-input> in <module>()
>>       1 sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")
>> ----> 2 df = sqlContext.read.format("org.apache.spark.sql.cassandra").options(table="kv", keyspace="notification").load()
>>       3
>>       4 df.printSchema()
>>       5 df.show()
>>
>> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
>>     137             return self._df(self._jreader.load(path))
>>     138         else:
>> --> 139             return self._df(self._jreader.load())
>>     140
>>     141     @since(1.4)
>>
>> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
>>     811         answer = self.gateway_client.send_command(command)
>>     812         return_value = get_return_value(
>> --> 813             answer, self.gateway_client, self.target_id, self.name)
>>     814
>>     815         for temp_arg in temp_args:
>>
>> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py in deco(*a, **kw)
>>      43     def deco(*a, **kw):
>>      44         try:
>> ---> 45             return f(*a, **kw)
>>      46         except py4j.protocol.Py4JJavaError as e:
>>      47             s = e.java_exception.toString()
>>
>> /Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>>     306                 raise Py4JJavaError(
>>     307                     "An error occurred while calling {0}{1}{2}.\n".
>> --> 308                     format(target_id, ".", name), value)
>>     309             else:
>>     310                 raise Py4JError(
>>
>> Py4JJavaError: An error occurred while calling o280.load.
>> : java.io.IOException: Failed to open native connection to Cassandr