Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-09 Thread Andy Davidson
Hi Ted and Saurabh

If I use --conf arguments with pyspark I am able to connect. Any idea how I
can set these values programmatically? (I work on a notebook server and
cannot easily reconfigure the server.)

This works

extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
--packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10"

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs \
--conf spark.cassandra.connection.host=localhost \
--conf spark.cassandra.connection.port=9043 $*
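
For a notebook kernel that starts Spark itself instead of being launched through bin/pyspark, the same flags could in principle be supplied through the PYSPARK_SUBMIT_ARGS environment variable, provided it is set before the first SparkContext is created. The following is only a sketch under that assumption; build_submit_args is a hypothetical helper name, not part of PySpark.

```python
import os
import shlex


def build_submit_args(packages, conf):
    """Return a PYSPARK_SUBMIT_ARGS string for the given packages and settings."""
    parts = []
    if packages:
        # --packages expects one comma-separated list of Maven coordinates
        parts += ["--packages", ",".join(packages)]
    for key, value in sorted(conf.items()):
        parts += ["--conf", "{}={}".format(key, value)]
    # PySpark expects the argument string to end with the "pyspark-shell" token
    parts.append("pyspark-shell")
    return " ".join(shlex.quote(p) for p in parts)


submit_args = build_submit_args(
    ["datastax:spark-cassandra-connector:1.6.0-M1-s_2.10"],
    {
        "spark.cassandra.connection.host": "localhost",
        "spark.cassandra.connection.port": "9043",
    },
)
# Has no effect on a JVM that is already running; set it before Spark starts.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
print(submit_args)
```

Setting the variable after the context already exists changes nothing, so this only helps when the notebook controls Spark start-up.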


df = sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="json_timeseries", keyspace="notification")\
.load()
df.printSchema()
df.show(truncate=False)


I have tried using sqlContext.setConf(), but it does not seem to have any
effect.

#sqlContext.setConf("spark.cassandra.connection.host","localhost")
#sqlContext.setConf("spark.cassandra.connection.port","9043")

#sqlContext.setConf("connection.host","localhost")
#sqlContext.setConf("connection.port","9043")

sqlContext.setConf("host","localhost")
sqlContext.setConf("port","9043")
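
One approach that may be worth trying from inside the notebook is to put the settings on a SparkConf and build a fresh context from it, since a context's configuration is fixed once the context has been created. The sketch below is only an illustration: cassandra_settings and new_sql_context are hypothetical helper names, and it assumes the connector package was already loaded when the notebook's JVM was launched.

```python
def cassandra_settings(host, port):
    """Connector settings as (key, value) pairs; Spark expects string values."""
    return [
        ("spark.cassandra.connection.host", host),
        ("spark.cassandra.connection.port", str(port)),
    ]


def new_sql_context(settings, old_sc=None):
    """Stop old_sc (if given) and return a fresh SQLContext carrying the settings."""
    # Imported lazily so cassandra_settings() works without Spark installed.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    if old_sc is not None:
        old_sc.stop()  # only one SparkContext can be active at a time
    conf = SparkConf().setAll(settings)
    return SQLContext(SparkContext(conf=conf))


settings = cassandra_settings("localhost", 9043)
print(settings)
# Hypothetical notebook usage:
#   sqlContext = new_sql_context(settings, old_sc=sc)
```

Whether replacing the context is acceptable depends on the notebook server; any DataFrames tied to the old context would have to be recreated.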

Thanks

Andy

From:  Saurabh Bajaj 
Date:  Tuesday, March 8, 2016 at 9:13 PM
To:  Andrew Davidson 
Cc:  Ted Yu , "user @spark" 
Subject:  Re: pyspark spark-cassandra-connector java.io.IOException: Failed
to open native connection to Cassandra at {192.168.1.126}:9042

> Hi Andy, 
> 
> I believe you need to set the host and port settings separately
> spark.cassandra.connection.host
> spark.cassandra.connection.port
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters
> 
> Looking at the logs, it seems your port config is not being set and it's
> falling back to default.
> Let me know if that helps.
> 
> Saurabh Bajaj
> 
> On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson 
> wrote:
>> Hi Ted
>> 
>> I believe by default Cassandra listens on 9042
>> 
>> From:  Ted Yu 
>> Date:  Tuesday, March 8, 2016 at 6:11 PM
>> To:  Andrew Davidson 
>> Cc:  "user @spark" 
>> Subject:  Re: pyspark spark-cassandra-connector java.io.IOException: Failed
>> to open native connection to Cassandra at {192.168.1.126}:9042
>> 
>>> Have you contacted the spark-cassandra-connector mailing list?
>>> 
>>> I wonder where the port 9042 came from.
>>> 
>>> Cheers
>>> 
>>> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson
>>>  wrote:
>>>> 
>>>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python
>>>> notebook that reads a data frame from Cassandra.
>>>> 
>>>> I connect to Cassandra using an ssh tunnel running on port 9043. CQLSH
>>>> works; however, I cannot figure out how to configure my notebook. I have
>>>> tried various hacks. Any idea what I am doing wrong?
>>>> 
>>>> : java.io.IOException: Failed to open native connection to Cassandra at
>>>> {192.168.1.126}:9042
>>>> 
>>>> 
>>>> 
>>>> Thanks in advance
>>>> 
>>>> Andy
>>>> 
>>>> 
>>>> 
>>>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>>>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>>> 
>>>> $ export PYSPARK_PYTHON=python3
>>>> $ export PYSPARK_DRIVER_PYTHON=python3
>>>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>>> 
>>>> 
>>>> 
>>>> In [15]:
>>>> sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
>>>> df = sqlContext.read\
>>>> .format("org.apache.spark.sql.cassandra")\
>>>> .options(table="time_series", keyspace="notification")\
>>>> .load()
>>>> 
>>>> df.printSchema()
>>>> df.show()

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Ted Yu
From cassandra.yaml:

native_transport_port: 9042

FYI
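
Since 9042 is only the server-side default, it can help to confirm from the notebook host which local port actually accepts TCP connections (9043 for the ssh tunnel described in this thread). A minimal sketch using only the Python standard library follows; port_is_open is a hypothetical helper name, and an open port shows only that something is listening, not that it is Cassandra.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or unresolvable host
        return False


# Tunnel port from this thread, then the Cassandra default.
for candidate in (9043, 9042):
    print(candidate, port_is_open("localhost", candidate))
```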

On Tue, Mar 8, 2016 at 9:13 PM, Saurabh Bajaj 
wrote:

> Hi Andy,
>
> I believe you need to set the host and port settings separately
> spark.cassandra.connection.host
> spark.cassandra.connection.port
>
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters
>
> Looking at the logs, it seems your port config is not being set and it's
> falling back to default.
> Let me know if that helps.
>
> Saurabh Bajaj
>
> On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> Hi Ted
>>
>> I believe by default Cassandra listens on 9042
>>
>> From: Ted Yu 
>> Date: Tuesday, March 8, 2016 at 6:11 PM
>> To: Andrew Davidson 
>> Cc: "user @spark" 
>> Subject: Re: pyspark spark-cassandra-connector java.io.IOException:
>> Failed to open native connection to Cassandra at {192.168.1.126}:9042
>>
>> Have you contacted the spark-cassandra-connector mailing list?
>>
>> I wonder where the port 9042 came from.
>>
>> Cheers
>>
>> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <
>> a...@santacruzintegration.com> wrote:
>>
>>>
>>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python
>>> notebook that reads a data frame from Cassandra.
>>>
>>> *I connect to Cassandra using an ssh tunnel running on port 9043.* CQLSH
>>> works; however, I cannot figure out how to configure my notebook. I have
>>> tried various hacks. Any idea what I am doing wrong?
>>>
>>> : java.io.IOException: Failed to open native connection to Cassandra at 
>>> {192.168.1.126}:9042
>>>
>>>
>>>
>>>
>>> Thanks in advance
>>>
>>> Andy
>>>
>>>
>>>
>>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>>
>>> $ export PYSPARK_PYTHON=python3
>>> $ export PYSPARK_DRIVER_PYTHON=python3
>>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>>
>>>
>>>
>>> In [15]:
>>> sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
>>> df = sqlContext.read\
>>> .format("org.apache.spark.sql.cassandra")\
>>> .options(table="time_series", keyspace="notification")\
>>> .load()
>>>
>>> df.printSchema()
>>> df.show()
>>>

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Saurabh Bajaj
Hi Andy,

I believe you need to set the host and port settings separately
spark.cassandra.connection.host
spark.cassandra.connection.port
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-connection-parameters

Looking at the logs, it seems your port config is not being set and it's
falling back to default.
Let me know if that helps.
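
As a rough illustration of setting the two values separately, a combined "host:port" string such as the one used in the notebook could be split into the two connector keys. This is only a sketch; split_host_port is a hypothetical helper name, and the fallback of 9042 is Cassandra's default native transport port.

```python
DEFAULT_PORT = 9042  # Cassandra's default native transport port


def split_host_port(address, default_port=DEFAULT_PORT):
    """Turn "host[:port]" into separate connector host and port settings."""
    host, sep, port = address.strip().rpartition(":")
    if not sep:
        # no colon present: the whole string is the host
        host, port = port, str(default_port)
    # bare IPv6 literals are not handled in this sketch
    return {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": str(int(port)),
    }


print(split_host_port("127.0.0.1:9043"))
```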

Saurabh Bajaj

On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson  wrote:

> Hi Ted
>
> I believe by default Cassandra listens on 9042
>
> From: Ted Yu 
> Date: Tuesday, March 8, 2016 at 6:11 PM
> To: Andrew Davidson 
> Cc: "user @spark" 
> Subject: Re: pyspark spark-cassandra-connector java.io.IOException:
> Failed to open native connection to Cassandra at {192.168.1.126}:9042
>
> Have you contacted the spark-cassandra-connector mailing list?
>
> I wonder where the port 9042 came from.
>
> Cheers
>
> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>>
>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python
>> notebook that reads a data frame from Cassandra.
>>
>> *I connect to Cassandra using an ssh tunnel running on port 9043.* CQLSH
>> works; however, I cannot figure out how to configure my notebook. I have
>> tried various hacks. Any idea what I am doing wrong?
>>
>> : java.io.IOException: Failed to open native connection to Cassandra at 
>> {192.168.1.126}:9042
>>
>>
>>
>>
>> Thanks in advance
>>
>> Andy
>>
>>
>>
>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>>
>> $ export PYSPARK_PYTHON=python3
>> $ export PYSPARK_DRIVER_PYTHON=python3
>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>>
>>
>>
>> In [15]:
>> sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
>> df = sqlContext.read\
>> .format("org.apache.spark.sql.cassandra")\
>> .options(table="time_series", keyspace="notification")\
>> .load()
>>
>> df.printSchema()
>> df.show()
>>

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Andy Davidson
Hi Ted

I believe by default Cassandra listens on 9042

From:  Ted Yu 
Date:  Tuesday, March 8, 2016 at 6:11 PM
To:  Andrew Davidson 
Cc:  "user @spark" 
Subject:  Re: pyspark spark-cassandra-connector java.io.IOException: Failed
to open native connection to Cassandra at {192.168.1.126}:9042

> Have you contacted the spark-cassandra-connector mailing list?
> 
> I wonder where the port 9042 came from.
> 
> Cheers
> 
> On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson 
> wrote:
>> 
>> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python notebook
>> that reads a data frame from Cassandra.
>> 
>> I connect to Cassandra using an ssh tunnel running on port 9043. CQLSH
>> works; however, I cannot figure out how to configure my notebook. I have
>> tried various hacks. Any idea what I am doing wrong?
>> 
>> : java.io.IOException: Failed to open native connection to Cassandra at
>> {192.168.1.126}:9042
>> 
>> 
>> 
>> Thanks in advance
>> 
>> Andy
>> 
>> 
>> 
>> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
>> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>> 
>> $ export PYSPARK_PYTHON=python3
>> $ export PYSPARK_DRIVER_PYTHON=python3
>> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>> 
>> 
>> 
>> In [15]:
>> sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
>> df = sqlContext.read\
>> .format("org.apache.spark.sql.cassandra")\
>> .options(table="time_series", keyspace="notification")\
>> .load()
>> 
>> df.printSchema()
>> df.show()

Re: pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Ted Yu
Have you contacted the spark-cassandra-connector mailing list?

I wonder where the port 9042 came from.

Cheers

On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson  wrote:

>
> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python
> notebook that reads a data frame from Cassandra.
>
> *I connect to Cassandra using an ssh tunnel running on port 9043.* CQLSH
> works; however, I cannot figure out how to configure my notebook. I have
> tried various hacks. Any idea what I am doing wrong?
>
> : java.io.IOException: Failed to open native connection to Cassandra at 
> {192.168.1.126}:9042
>
>
>
>
> Thanks in advance
>
> Andy
>
>
>
> $ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
> --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"
>
> $ export PYSPARK_PYTHON=python3
> $ export PYSPARK_DRIVER_PYTHON=python3
> $ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*
>
>
>
> In [15]:
> sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
> df = sqlContext.read\
> .format("org.apache.spark.sql.cassandra")\
> .options(table="time_series", keyspace="notification")\
> .load()
>
> df.printSchema()
> df.show()
>

pyspark spark-cassandra-connector java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042

2016-03-08 Thread Andy Davidson

I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a Python notebook
that reads a data frame from Cassandra.

I connect to Cassandra using an ssh tunnel running on port 9043. CQLSH works;
however, I cannot figure out how to configure my notebook. I have tried
various hacks. Any idea what I am doing wrong?

: java.io.IOException: Failed to open native connection to Cassandra at
{192.168.1.126}:9042
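
The address in the exception shows which endpoint the connector actually tried, so comparing it with the intended tunnel port (9043) is a quick sanity check. Below is a small sketch that pulls the host and port out of the message; attempted_endpoint is a hypothetical helper name, and the assumption that several hosts would appear comma-separated inside the braces is mine.

```python
import re

ERROR = ("java.io.IOException: Failed to open native connection to "
         "Cassandra at {192.168.1.126}:9042")


def attempted_endpoint(message):
    """Return (hosts, port) parsed from the connector error, or None."""
    match = re.search(r"Cassandra at \{([^}]*)\}:(\d+)", message)
    if match is None:
        return None
    # assumption: multiple hosts, if any, are comma-separated inside the braces
    hosts = [h.strip() for h in match.group(1).split(",")]
    return hosts, int(match.group(2))


hosts, port = attempted_endpoint(ERROR)
print(hosts, port)
if port != 9043:
    print("the connector did not use the tunnel port 9043")
```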



Thanks in advance

Andy



$ extraPkgs="--packages com.databricks:spark-csv_2.11:1.3.0 \
--packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.11"

$ export PYSPARK_PYTHON=python3
$ export PYSPARK_DRIVER_PYTHON=python3
$ IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $extraPkgs $*



In [15]:
sqlContext.setConf("spark.cassandra.connection.host","127.0.0.1:9043")
df = sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="time_series", keyspace="notification")\
.load()

df.printSchema()
df.show()
---
Py4JJavaError Traceback (most recent call last)
 in ()
      1 sqlContext.setConf("spark.cassandra.connection.host","localhost:9043")
----> 2 df = sqlContext.read.format("org.apache.spark.sql.cassandra").options(table="kv", keyspace="notification").load()
      3 
      4 df.printSchema()
      5 df.show()

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    137 return self._df(self._jreader.load(path))
    138 else:
--> 139 return self._df(self._jreader.load())
    140 
    141 @since(1.4)

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
    811 answer = self.gateway_client.send_command(command)
    812 return_value = get_return_value(
--> 813 answer, self.gateway_client, self.target_id, self.name)
    814 
    815 for temp_arg in temp_args:

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py in deco(*a, **kw)
     43 def deco(*a, **kw):
     44 try:
---> 45 return f(*a, **kw)
     46 except py4j.protocol.Py4JJavaError as e:
     47 s = e.java_exception.toString()

/Users/andrewdavidson/workSpace/spark/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306 raise Py4JJavaError(
    307 "An error occurred while calling {0}{1}{2}.\n".
--> 308 format(target_id, ".", name), value)
    309 else:
    310 raise Py4JError(

Py4JJavaError: An error occurred while calling o280.load.
: java.io.IOException: Failed to open native connection to Cassandra at {192.168.1.126}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:184)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:267)
    at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.jav