> On Feb 12, 2015, at 12:37 AM, Robert Coli <[email protected]> wrote:
>
> On Wed, Feb 11, 2015 at 2:22 AM, Pavel Velikhov <[email protected]> wrote:
> 2. While trying to update the full dataset with a simple transformation
> (again via python driver), single node and clustered Cassandra run out of
> memory no matter what settings I try, even I put a lot of sleeps into the
> mix. However simpler transformations (updating just one column, specially
> when there is a lot of processing overhead) work just fine.
>
> What does a "simple transformation" mean here? Assuming a reasonable sized
> heap, OOM sounds like you're trying to update a large number of large
> partitions in a single operation.
>
> In general, in Cassandra, you're best off interacting with a single or small
> number of partitions in any given interaction.
>
> =Rob
>
Hi Robert!

By a simple transformation I mean changing just a single column value (I usually do it for the whole dataset). But when I was running out of memory, I was reading in 5 columns and updating 3. Some of them could be big, but I need to check and rerun this case. (I worked around this by dumping to files and then scanning the files and updating the database, but this stinks!)

I don’t quite understand the fundamentals of Cassandra: if I’m just doing one scan that fetches a reasonable number of columns, and I’m updating at the same time, what’s happening there? Why does it eat up so much memory and die?
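
For concreteness, the scan-and-update pattern described above could be sketched roughly as follows. This is an illustration only, not the actual script: the function name, the `transform` callback, and the table and column names in the wiring comments are all hypothetical. The sketch assumes a session object that behaves like the Python driver's `Session`, reads rows as the driver pages them in, and caps the number of writes in flight so that neither the full dataset nor an unbounded queue of pending requests sits in client memory.

```python
from collections import deque


def scan_and_update(session, select_stmt, update_stmt, transform, max_in_flight=64):
    """Stream rows from select_stmt and write transformed values back.

    `session` is assumed to behave like cassandra.cluster.Session:
    execute() returns an iterable of rows (the driver pages transparently),
    execute_async() returns a future with a blocking result() method.
    `transform` (hypothetical) maps one row to the update's bind parameters.
    """
    pending = deque()
    updated = 0
    for row in session.execute(select_stmt):
        params = transform(row)
        pending.append(session.execute_async(update_stmt, params))
        updated += 1
        # Back-pressure: wait for the oldest write before queuing more.
        if len(pending) >= max_in_flight:
            pending.popleft().result()
    # Drain whatever is still in flight.
    while pending:
        pending.popleft().result()
    return updated


# Hypothetical wiring with the DataStax Python driver (names are made up):
# from cassandra.cluster import Cluster
# from cassandra.query import SimpleStatement
# session = Cluster(["127.0.0.1"]).connect("my_keyspace")
# select = SimpleStatement("SELECT id, a, b FROM my_table", fetch_size=500)
# update = session.prepare("UPDATE my_table SET a = ?, b = ? WHERE id = ?")
# scan_and_update(session, select, update, lambda r: (r.a.upper(), r.b, r.id))
```

Whether bounding the read page size and the in-flight writes like this avoids the out-of-memory failure depends on where the memory is actually going (client or server, and how large the individual column values are), so this only pins down the access pattern being asked about.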
