Hello Pavel,

What is the size of the cluster (# of nodes)? And do you need to iterate
over the full 1TB every time you do the update, or just parts of it?

IMO the information given is too limited to make any kind of assessment of
the problem you are having.

I can suggest trying a 2.0.x (or 2.1.1) release to see if you get the same
problem.
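
One other thing worth checking: a common cause of client-side memory growth with the Python driver is buffering an entire result set (or an unbounded backlog of in-flight writes) at once. The DataStax driver supports server-side paging via `fetch_size` on a statement, so the client only ever holds one page of rows. The chunking pattern itself can be sketched in plain Python (the names below are illustrative, not from your code):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Process a large row stream in bounded pieces instead of materializing
# everything in memory. With the DataStax Python driver, the equivalent is
# SimpleStatement(query, fetch_size=1000) and iterating over the ResultSet,
# which pages transparently.
rows = range(10_000)          # stand-in for a large result set
processed = 0
for batch in chunked(rows, 1000):
    processed += len(batch)   # apply the per-row transformation here
print(processed)
```

If you are already paging reads, the other place memory tends to pile up is unacknowledged async writes: throttling the number of outstanding `execute_async` futures usually keeps the heap flat.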

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 11, 2015 at 11:22 AM, Pavel Velikhov <pavel.velik...@gmail.com>
wrote:

> Hi,
>
>   I’m using Cassandra to store NLP data, the dataset is not that huge
> (about 1TB), but I need to iterate over it quite frequently, updating the
> full dataset (each record, but not necessarily each column).
>
>   I’ve run into two problems (I’m using the latest Cassandra):
>
> python driver; however, the driver confused the two instances
>   2. While trying to update the full dataset with a simple transformation
> (again via the Python driver), both single-node and clustered Cassandra run
> out of memory no matter what settings I try, even when I put a lot of
> sleeps into the mix. However, simpler transformations (updating just one
> column, especially when there is a lot of processing overhead) work just
> fine.
>
> I’m really concerned about #2, since we’re moving all heavy processing to
> a Spark cluster and will expand it, and I would expect much heavier traffic
> to/from Cassandra. Any hints, war stories, etc. very appreciated!
>
> Thank you,
> Pavel Velikhov
