> After several cycles, pycassa starts getting connection failures.

Do you have the error stack? Are they TimedOutExceptions, socket timeouts, or something else?

> Would things be any different if we used multiple nodes and scaled the data
> and worker count to match? I mean, is there something inherent to Cassandra's
> operating model that makes it want to always have multiple nodes?

It's not expected. If I had to guess, I would say the 100MB rows are causing some GC problems (check the Cassandra log) and you are getting timeouts from that.

As a workaround, what happens when you reduce the number of workers?

Consider smoothing out the row size by chunking large values into several rows; see the Astyanax client recipes for a design pattern.
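A rough, untested sketch of that chunking pattern with pycassa (loosely modeled on the Astyanax chunked-object recipe) could look like the following. The 'MyKeyspace' keyspace, 'Chunks' column family, 1 MB chunk size, and '<uuid>:<index>' key layout are placeholders, not anything prescribed by pycassa:

```python
import uuid

import pycassa

CHUNK_SIZE = 1024 * 1024  # split ~100 MB values into ~1 MB rows (placeholder size)

pool = pycassa.ConnectionPool('MyKeyspace', server_list=['localhost:9160'])
chunks_cf = pycassa.ColumnFamily(pool, 'Chunks')  # hypothetical CF with string keys

def write_chunked(blob):
    """Write one large value as many small rows keyed '<uuid>:<chunk index>'."""
    row_id = uuid.uuid4().hex
    batch = chunks_cf.batch(queue_size=50)  # group inserts into batch mutations
    for n, start in enumerate(range(0, len(blob), CHUNK_SIZE)):
        batch.insert('%s:%06d' % (row_id, n),
                     {'data': blob[start:start + CHUNK_SIZE]})
    batch.send()
    return row_id

def read_chunked(row_id):
    """Reassemble a value by reading its chunk rows in order until one is missing."""
    parts = []
    n = 0
    while True:
        try:
            parts.append(chunks_cf.get('%s:%06d' % (row_id, n))['data'])
        except pycassa.NotFoundException:
            break
        n += 1
    return ''.join(parts)
```

The point of the pattern is that each mutation and read stays small, so the server never has to materialise a 100MB row in one go, which is where the GC-pause guess above comes from.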
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/05/2013, at 1:09 PM, John R. Frank <j...@mit.edu> wrote:

> C* users,
>
> We have a process that loads a large batch of rows from Cassandra into many
> separate compute workers. The rows are one column wide and range in size from
> a couple KB to ~100 MB. After manipulating the data for a while, each compute
> worker writes the data back with *new* row keys computed by the workers
> (UUIDs).
>
> After the full batch is written back to new rows, a cleanup worker deletes
> the old rows.
>
> After several cycles, pycassa starts getting connection failures.
>
> Should we use a pycassa listener to catch these failures and just recreate
> the ConnectionPool and keep going as if the connection had not dropped? Or is
> there a better approach?
>
> These failures happen on just a simple single-node setup with a total data
> set less than half the size of the Java heap space, e.g. 2GB of data (times
> two for the two copies during cycling) versus an 8GB heap. We tried reducing
> memtable_flush_queue_size to 2 so that it would flush the deletes faster, and
> also tried multithreaded_compaction=true, but pycassa still gets connection
> failures.
>
> Is this expected behavior for shedding load? Or is this unexpected?
>
> Would things be any different if we used multiple nodes and scaled the data
> and worker count to match? I mean, is there something inherent to Cassandra's
> operating model that makes it want to always have multiple nodes?
>
> Thanks for pointers,
> John
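On the question quoted above about catching failures with a pycassa listener and recreating the ConnectionPool: a minimal, untested sketch, assuming pycassa's PoolListener hook and the pool-level AllServersUnavailable exception, is below. The keyspace, column family, timeout, and retry values are placeholders; the logged failure details should also answer the error-stack question (timeouts vs. socket errors):

```python
import logging

import pycassa
from pycassa.pool import ConnectionPool, PoolListener, AllServersUnavailable

class FailureLogger(PoolListener):
    """Log the details of every connection failure the pool sees."""
    def connection_failed(self, dic):
        # dic describes the failed connection; logging it captures the error type
        logging.warning('pycassa connection failed: %r', dic)

def make_pool():
    return ConnectionPool('MyKeyspace',               # placeholder keyspace
                          server_list=['localhost:9160'],
                          timeout=30,                  # more generous per-request timeout
                          max_retries=5,               # retry an op on another connection
                          listeners=[FailureLogger()])

pool = make_pool()
cf = pycassa.ColumnFamily(pool, 'MyCF')               # placeholder column family

def insert_with_recovery(key, columns):
    """Rebuild the pool once if every server is reported unavailable, then retry."""
    global pool, cf
    try:
        cf.insert(key, columns)
    except AllServersUnavailable:
        pool.dispose()
        pool = make_pool()
        cf = pycassa.ColumnFamily(pool, 'MyCF')
        cf.insert(key, columns)
```

If the failures do trace back to GC pauses on the node, retrying client-side only papers over the symptom, which is why reducing the row size is the more interesting fix.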