Hello Pavel,

What is the size of the cluster (# of nodes)? And do you need to iterate over the full 1TB every time you do an update, or just parts of it?
IMO there is too little information here to make any kind of assessment of the problem you are having. I can suggest trying a 2.0.x (or 2.1.1) release to see if you get the same problem.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 11, 2015 at 11:22 AM, Pavel Velikhov <pavel.velik...@gmail.com> wrote:

> Hi,
>
> I’m using Cassandra to store NLP data. The dataset is not that huge
> (about 1TB), but I need to iterate over it quite frequently, updating the
> full dataset (each record, but not necessarily each column).
>
> I’ve run into two problems (I’m using the latest Cassandra):
>
> 1. I was trying to copy from one Cassandra cluster to another via the
> Python driver, but the driver confused the two instances.
> 2. While trying to update the full dataset with a simple transformation
> (again via the Python driver), both single-node and clustered Cassandra
> run out of memory no matter what settings I try, even when I put a lot of
> sleeps into the mix. However, simpler transformations (updating just one
> column, especially when there is a lot of processing overhead) work just
> fine.
>
> I’m really concerned about #2, since we’re moving all heavy processing to
> a Spark cluster and will expand it, and I would expect much heavier
> traffic to/from Cassandra. Any hints, war stories, etc. are very much
> appreciated!
>
> Thank you,
> Pavel Velikhov
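One thing worth checking for problem #2: a full-table update will exhaust client memory if the whole result set (or a large write backlog) is materialized at once, so the fix is usually to process one page of rows at a time. Below is a minimal, hypothetical sketch of that page-at-a-time pattern; `fetch_page` and `apply_update` are placeholder callables standing in for real driver calls, and the page size and throttle values are arbitrary:

```python
import time

def process_in_pages(fetch_page, apply_update, page_size=500, throttle_s=0.0):
    """Iterate a large table one page at a time, updating each row.

    fetch_page(paging_state, page_size) -> (rows, next_paging_state)
        next_paging_state is None when there are no more pages.
    apply_update(row) applies the transformation to a single row.

    Only one page of rows is held in memory at any moment.
    """
    paging_state = None
    pages = 0
    while True:
        rows, paging_state = fetch_page(paging_state, page_size)
        for row in rows:
            apply_update(row)
        pages += 1
        if paging_state is None:
            break
        if throttle_s:
            time.sleep(throttle_s)  # back off so the cluster can absorb writes
    return pages
```

With the DataStax Python driver, the analogous knob (if I remember correctly) is setting `fetch_size` on the statement, which makes the result set come back in pages instead of all at once; fixed sleeps alone won't help if the client is still buffering the full dataset.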