Awesome.

Thanks
Best Regards
On Tue, Jan 13, 2015 at 10:35 PM, Ankur Srivastava
<ankur.srivast...@gmail.com> wrote:

> I realized that I was running the cluster with
> spark.cassandra.output.concurrent.writes=2; changing it to 1 did the
> trick. The issue was that Spark was producing data at a much higher rate
> than our small Cassandra cluster could write, so setting the property to
> 1 fixed it for us.
>
> Thanks
> Ankur
>
> On Mon, Jan 12, 2015 at 9:04 AM, Ankur Srivastava
> <ankur.srivast...@gmail.com> wrote:
>
>> Hi Akhil,
>>
>> Thank you for the pointers. Below is how we are saving data to
>> Cassandra:
>>
>> javaFunctions(rddToSave)
>>     .writerBuilder(datapipelineKeyspace, datapipelineOutputTable,
>>         mapToRow(Sample.class))
>>     .saveToCassandra();
>>
>> The data we are saving at this stage is ~200 million rows.
>>
>> How do we control application threads in Spark so that it does not
>> exceed "rpc_max_threads"? We are running with the default value of this
>> property in cassandra.yaml. I have already set these two properties for
>> the Spark-Cassandra connector:
>>
>> spark.cassandra.output.batch.size.rows=1
>> spark.cassandra.output.concurrent.writes=1
>>
>> Thanks
>> - Ankur
>>
>> On Sun, Jan 11, 2015 at 10:16 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> I see, can you paste the piece of code? It is probably because you are
>>> exceeding the number of connections specified in the rpc_max_threads
>>> property. Make sure you close all the connections properly.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Jan 12, 2015 at 7:45 AM, Ankur Srivastava
>>> <ankur.srivast...@gmail.com> wrote:
>>>
>>>> Hi Akhil, thank you for your response.
>>>>
>>>> Actually we are first reading from Cassandra and then writing back
>>>> after doing some processing. All the reader stages succeed with no
>>>> errors, and many writer stages also succeed, but many fail as well.
>>>>
>>>> Thanks
>>>> Ankur
>>>>
>>>> On Sat, Jan 10, 2015 at 10:15 PM, Akhil Das
>>>> <ak...@sigmoidanalytics.com> wrote:
>>>>
>>>>> Just make sure you are not connecting to the old RPC port (9160);
>>>>> the new binary protocol port is 9042.
>>>>>
>>>>> What is your rpc_address listed in cassandra.yaml? Also make sure
>>>>> you have start_native_transport: true in the yaml file.
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Sat, Jan 10, 2015 at 8:44 AM, Ankur Srivastava
>>>>> <ankur.srivast...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are currently using Spark to join data in Cassandra and then
>>>>>> write the results back into Cassandra. While reads happen without
>>>>>> any error, during the writes we see many exceptions like the one
>>>>>> below.
>>>>>>
>>>>>> Our environment details are:
>>>>>>
>>>>>> - Spark v1.1.0
>>>>>> - spark-cassandra-connector-java_2.10 v1.1.0
>>>>>>
>>>>>> We are using the settings below for the writer:
>>>>>>
>>>>>> spark.cassandra.output.batch.size.rows=1
>>>>>> spark.cassandra.output.concurrent.writes=1
>>>>>>
>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>>>>>> host(s) tried for query failed (tried: [] - use getErrors() for
>>>>>> details)
>>>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Thanks
>>>>>> Ankur
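
For anyone hitting the same NoHostAvailableException, below is a minimal
sketch of the write path with the throttled settings from this thread,
using the spark-cassandra-connector-java 1.1 API shown above. The Sample
bean shape, the keyspace/table string values, and the connection host are
assumptions (the thread only shows the variable names, not their values):

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ThrottledCassandraWrite {

    // Minimal stand-in for the Sample bean from the thread; the real
    // column layout is not shown, so this shape is an assumption.
    public static class Sample implements Serializable {
        private String id;
        private String value;

        public Sample() {}

        public Sample(String id, String value) {
            this.id = id;
            this.value = value;
        }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("cassandra-writer")
                // Placeholder host; point this at your own cluster.
                .set("spark.cassandra.connection.host", "127.0.0.1")
                // The settings that resolved the issue in this thread:
                // one row per batch and one in-flight batch per task,
                // throttling writes to what a small cluster can absorb.
                .set("spark.cassandra.output.batch.size.rows", "1")
                .set("spark.cassandra.output.concurrent.writes", "1");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Stand-in for the ~200 million row RDD produced by the join step.
        JavaRDD<Sample> rddToSave =
                sc.parallelize(Arrays.asList(new Sample("k1", "v1")));

        // Placeholder keyspace/table values for the variables named in
        // the thread (datapipelineKeyspace, datapipelineOutputTable).
        String datapipelineKeyspace = "datapipeline";
        String datapipelineOutputTable = "samples";

        javaFunctions(rddToSave)
                .writerBuilder(datapipelineKeyspace, datapipelineOutputTable,
                        mapToRow(Sample.class))
                .saveToCassandra();

        sc.stop();
    }
}

Note that concurrent.writes=1 trades throughput for stability; on a
cluster sized for the load you can likely raise it again.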