> After several cycles, pycassa starts getting connection failures.

Do you have the error stack? Are they TimedOutExceptions, socket timeouts, or something else?

> Would things be any different if we used multiple nodes and scaled the data
> and worker count to match? I mean, is there something inherent to Cassandra's
> operating model that makes it want to always have multiple nodes?

It's not expected. If I had to guess, I would say the 100MB rows are causing some GC problems (check the Cassandra log) and you are getting timeouts from that.

As a workaround, what happens when you reduce the number of workers?

Consider smoothing out the row size by chunking large values into several rows; see the Astyanax client recipes for a design pattern.
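A rough, untested sketch of that chunking pattern with pycassa (loosely modeled on the Astyanax chunked-object recipe) could look like the following. The 'MyKeyspace' keyspace, 'Chunks' column family, 1 MB chunk size, and '<uuid>:<index>' key layout are placeholders, not anything prescribed by pycassa:

```python
import uuid

import pycassa

CHUNK_SIZE = 1024 * 1024  # split ~100 MB values into ~1 MB rows (placeholder size)

pool = pycassa.ConnectionPool('MyKeyspace', server_list=['localhost:9160'])
chunks_cf = pycassa.ColumnFamily(pool, 'Chunks')  # hypothetical CF with string keys

def write_chunked(blob):
    """Write one large value as many small rows keyed '<uuid>:<chunk index>'."""
    row_id = uuid.uuid4().hex
    batch = chunks_cf.batch(queue_size=50)  # group inserts into batch mutations
    for n, start in enumerate(range(0, len(blob), CHUNK_SIZE)):
        batch.insert('%s:%06d' % (row_id, n),
                     {'data': blob[start:start + CHUNK_SIZE]})
    batch.send()
    return row_id

def read_chunked(row_id):
    """Reassemble a value by reading its chunk rows in order until one is missing."""
    parts = []
    n = 0
    while True:
        try:
            parts.append(chunks_cf.get('%s:%06d' % (row_id, n))['data'])
        except pycassa.NotFoundException:
            break
        n += 1
    return ''.join(parts)
```

The point of the pattern is that each mutation and read stays small, so the server never has to materialise a 100MB row in one go, which is where the GC-pause guess above comes from.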
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/05/2013, at 1:09 PM, John R. Frank <j...@mit.edu> wrote:

> C* users,
>
> We have a process that loads a large batch of rows from Cassandra into many
> separate compute workers. The rows are one column wide and range in size from
> a couple KB to ~100 MB. After manipulating the data for a while, each compute
> worker writes the data back with *new* row keys computed by the workers
> (UUIDs).
>
> After the full batch is written back to new rows, a cleanup worker deletes
> the old rows.
>
> After several cycles, pycassa starts getting connection failures.
>
> Should we use a pycassa listener to catch these failures and just recreate
> the ConnectionPool and keep going as if the connection had not dropped? Or is
> there a better approach?
>
> These failures happen on just a simple single-node setup with a total data
> set less than half the size of the Java heap space, e.g. 2GB of data (times
> two for the two copies during cycling) versus an 8GB heap. We tried reducing
> memtable_flush_queue_size to 2 so that it would flush the deletes faster, and
> also tried multithreaded_compaction=true, but pycassa still gets connection
> failures.
>
> Is this expected behavior for shedding load? Or is this unexpected?
>
> Would things be any different if we used multiple nodes and scaled the data
> and worker count to match? I mean, is there something inherent to Cassandra's
> operating model that makes it want to always have multiple nodes?
>
> Thanks for pointers,
> John
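On the question quoted above about catching failures with a pycassa listener and recreating the ConnectionPool: a minimal, untested sketch, assuming pycassa's PoolListener hook and the pool-level AllServersUnavailable exception, is below. The keyspace, column family, timeout, and retry values are placeholders; the logged failure details should also answer the error-stack question (timeouts vs. socket errors):

```python
import logging

import pycassa
from pycassa.pool import ConnectionPool, PoolListener, AllServersUnavailable

class FailureLogger(PoolListener):
    """Log the details of every connection failure the pool sees."""
    def connection_failed(self, dic):
        # dic describes the failed connection; logging it captures the error type
        logging.warning('pycassa connection failed: %r', dic)

def make_pool():
    return ConnectionPool('MyKeyspace',               # placeholder keyspace
                          server_list=['localhost:9160'],
                          timeout=30,                  # more generous per-request timeout
                          max_retries=5,               # retry an op on another connection
                          listeners=[FailureLogger()])

pool = make_pool()
cf = pycassa.ColumnFamily(pool, 'MyCF')               # placeholder column family

def insert_with_recovery(key, columns):
    """Rebuild the pool once if every server is reported unavailable, then retry."""
    global pool, cf
    try:
        cf.insert(key, columns)
    except AllServersUnavailable:
        pool.dispose()
        pool = make_pool()
        cf = pycassa.ColumnFamily(pool, 'MyCF')
        cf.insert(key, columns)
```

If the failures do trace back to GC pauses on the node, retrying client-side only papers over the symptom, which is why reducing the row size is the more interesting fix.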