Re: Cassandra consume large memory

2011-07-17 Thread JKnight JKnight
Thanks for your response.
Are you talking about virtual memory (the VIRT column shown by top)?
I am asking about the RES column. In my case, VIRT is 61.8G, RES is 3.2G and
SHR is 1.2G.

JMX shows memory usage:
Used: 600MB, Committed: 2.1G, Max: 2.1G

On Mon, Jul 18, 2011 at 11:59 AM, Jonathan Ellis  wrote:

> http://wiki.apache.org/cassandra/FAQ#mmap
>
> On Sun, Jul 17, 2011 at 11:54 PM, JKnight JKnight wrote:
> > Dear all,
> > I use JMX to monitor a Cassandra server.
> > Heap memory usage shows:
> > Used: 600MB, Committed: 2.1G, Max: 2.1G
> > But htop shows the Cassandra process consuming 3.1G.
> > Could you tell me why Cassandra occupies so much more memory than it is using?
> > Thanks a lot for your support.
> > --
> > Best regards,
> > JKnight
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Best regards,
JKnight


Re: Range query ordering with CQL JDBC

2011-07-17 Thread Matthieu Nahoum
Aaron, thanks for the reply.

I think what I'm encountering is exactly this problem!

I'll try the suggestions, or switch away from the random partitioner.
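If you do move away from the random partitioner, note that string keys are then compared as strings, not as numbers, so epochs of different lengths still sort out of numeric order ("999" comes after "1000"). A minimal sketch, assuming ASCII digit keys compared byte-wise and non-negative millisecond epochs that fit in 13 digits: zero-padding every key to a fixed width makes string order match numeric order. EpochKeys and WIDTH are illustrative names, not part of the JDBC driver or Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: zero-pad epoch keys so string (byte-wise) order matches numeric order.
// Assumes non-negative millisecond epochs and ASCII digit keys.
final class EpochKeys {
    // Assumed width: 13 digits holds millisecond epochs up to the year 2286.
    static final int WIDTH = 13;

    // Left-pads the epoch with zeros to WIDTH digits.
    static String toKey(long epochMillis) {
        if (epochMillis < 0) {
            throw new IllegalArgumentException("negative epoch: " + epochMillis);
        }
        return String.format("%0" + WIDTH + "d", epochMillis);
    }

    // Sorts keys as plain strings, the way unpadded string keys compare.
    static List<String> sortedAsStrings(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("999", "1000", "130920500");
        System.out.println(sortedAsStrings(raw)); // [1000, 130920500, 999]
        List<String> padded = new ArrayList<>();
        for (String k : raw) {
            padded.add(toKey(Long.parseLong(k)));
        }
        System.out.println(sortedAsStrings(padded)); // [0000000000999, 0000000001000, 0000130920500]
    }
}
```

The width is a design choice: it must cover the largest key you will ever store, since a longer key would again break the ordering.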

Cordially,

Matthieu Nahoum

On Sun, Jul 17, 2011 at 5:50 PM, aaron morton wrote:

> You are probably seeing this http://wiki.apache.org/cassandra/FAQ#range_rp
>
> Row keys are not ordered by their key, they are ordered by the token
> created by the partitioner.
>
> If you still think there is a problem, provide an example of the data you
> are seeing and what you expected to see.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16 Jul 2011, at 06:09, Matthieu Nahoum wrote:
>
> Hi Eric,
>
> I am using the default partitioner, which is the RandomPartitioner I guess.
> The key type is String. Are Strings ordered by lexicographic rules?
>
> Thanks
>
> On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans  wrote:
>
>> On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
>> > I am trying to range-query a column family on which the keys are
>> > epochs (similar to the output of System.currentTimeMillis() in Java).
>> > In CQL (Cassandra 0.8.1 with JDBC driver):
>> >
>> > SELECT * FROM columnFamily WHERE KEY > '130920500';
>> >
>> > I can't get a result that makes sense; it always returns the wrong
>> > timestamps. So I must be making an error somewhere in the way I input the
>> > query value. I tried it in plain text (like above), in hexadecimal, etc.
>> >
>> > What is the correct way of doing this? Is it possible that my key is
>> > too long?
>>
>> What partitioner are you using?  What is the key type?
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>>
>
>
>


Re: Default behavior of generate index_name for columns...

2011-07-17 Thread Boris Yen
Will this have any side effects when calling get_indexed_slices, or when a
user wants to drop an index by any means?

Boris

On Mon, Jul 18, 2011 at 1:13 PM, Jonathan Ellis  wrote:

> 0.8.0 didn't check for name conflicts correctly.  0.8.1 does, but it
> can't fix the ones 0.8.0 allowed, retroactively.
>
> On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen  wrote:
> > I have tested another case; I'm not sure if this is a bug.
> > I created a few column families on 0.8.0, each with a user_name column,
> > and enabled a secondary index on that column. Then I upgraded to 0.8.1.
> > When I ran "show keyspaces" in cassandra-cli, I saw the index name
> > "user_name_idx" appear for different column families. It seems the
> > validation rule for index_name in 0.8.1 was skipped completely.
> >
> > Is this a bug, or is it intentional?
> > Regards
> > Boris
> > On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen  wrote:
> >>
> >> Done. It is CASSANDRA-2903.
> >> On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis wrote:
> >>>
> >>> Please.
> >>>
> >>> On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen  wrote:
> >>> > Hi Jonathan,
> >>> > Do I need to open a ticket for this?
> >>> > Regards
> >>> > Boris
> >>> >
> >>> > On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis wrote:
> >>> >>
> >>> >> Sounds reasonable to me.
> >>> >>
> >>> >> On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen wrote:
> >>> >> > Hi,
> >>> >> > I have a few column families, each with a column called user_name. I
> >>> >> > tried to use a secondary index on the user_name column in each of these
> >>> >> > column families. However, when creating them, Cassandra keeps reporting
> >>> >> > a "Duplicate index name..." exception. I finally figured out that the
> >>> >> > default index name seems to be "column name"+"_idx", which makes my
> >>> >> > column families violate the "uniqueness of index name" rule.
> >>> >> > I was wondering if the default index_name generation rule could be
> >>> >> > something like "column name"+"cf name", so that index names would not
> >>> >> > collide with each other so easily when the user does not assign an
> >>> >> > "index_name" while creating a column family.
> >>> >> > Regards
> >>> >> > Boris
> >>> >> >
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Jonathan Ellis
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Jonathan Ellis
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
>


Re: Default behavior of generate index_name for columns...

2011-07-17 Thread Jonathan Ellis
0.8.0 didn't check for name conflicts correctly.  0.8.1 does, but it
can't fix the ones 0.8.0 allowed, retroactively.
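Because 0.8.0 let duplicates through, a quick client-side scan of the schema can flag which index names are claimed by more than one column family. A minimal sketch, assuming the default naming described in this thread (column name + "_idx"); the CF-qualified alternative is a hypothetical scheme modeled on Boris's suggestion, and IndexNames is an illustrative helper, not part of cassandra-cli or the Thrift API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: find index names claimed by more than one column family in a keyspace.
final class IndexNames {
    // Default naming as described in this thread: column name + "_idx".
    static String defaultName(String cf, String column) {
        return column + "_idx";
    }

    // Hypothetical CF-qualified alternative, along the lines Boris suggested.
    static String cfQualifiedName(String cf, String column) {
        return cf + "_" + column + "_idx";
    }

    // Each entry is {columnFamily, indexedColumn}; returns each duplicated name once.
    static List<String> conflicts(List<String[]> indexedColumns, boolean qualify) {
        Map<String, String> owner = new HashMap<>();
        List<String> duplicates = new ArrayList<>();
        for (String[] entry : indexedColumns) {
            String cf = entry[0];
            String column = entry[1];
            String name = qualify ? cfQualifiedName(cf, column) : defaultName(cf, column);
            String previousOwner = owner.putIfAbsent(name, cf);
            if (previousOwner != null && !duplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String[]> schema = new ArrayList<>();
        schema.add(new String[] {"Users", "user_name"});
        schema.add(new String[] {"Posts", "user_name"});
        System.out.println(conflicts(schema, false)); // [user_name_idx]
        System.out.println(conflicts(schema, true));  // []
    }
}
```

Even a CF-qualified scheme can collide when names contain underscores (CF "a_b" with column "c" and CF "a" with column "b_c" both yield "a_b_c_idx"), so a server-side uniqueness check is still needed.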

On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen  wrote:
> I have tested another case; I'm not sure if this is a bug.
> I created a few column families on 0.8.0, each with a user_name column,
> and enabled a secondary index on that column. Then I upgraded to 0.8.1.
> When I ran "show keyspaces" in cassandra-cli, I saw the index name
> "user_name_idx" appear for different column families. It seems the
> validation rule for index_name in 0.8.1 was skipped completely.
>
> Is this a bug, or is it intentional?
> Regards
> Boris
> On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen  wrote:
>>
>> Done. It is CASSANDRA-2903.
>> On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis  wrote:
>>>
>>> Please.
>>>
>>> On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen  wrote:
>>> > Hi Jonathan,
>>> > Do I need to open a ticket for this?
>>> > Regards
>>> > Boris
>>> >
>>> > On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis wrote:
>>> >>
>>> >> Sounds reasonable to me.
>>> >>
>>> >> On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen  wrote:
>>> >> > Hi,
>>> >> > I have a few column families, each with a column called user_name. I
>>> >> > tried to use a secondary index on the user_name column in each of these
>>> >> > column families. However, when creating them, Cassandra keeps reporting
>>> >> > a "Duplicate index name..." exception. I finally figured out that the
>>> >> > default index name seems to be "column name"+"_idx", which makes my
>>> >> > column families violate the "uniqueness of index name" rule.
>>> >> > I was wondering if the default index_name generation rule could be
>>> >> > something like "column name"+"cf name", so that index names would not
>>> >> > collide with each other so easily when the user does not assign an
>>> >> > "index_name" while creating a column family.
>>> >> > Regards
>>> >> > Boris
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jonathan Ellis
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>
>
>



-- 
Jonathan Ellis


Re: Cassandra consume large memory

2011-07-17 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#mmap
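For comparing the two numbers: the JMX figures in the question cover only the JVM heap, while top/htop's RES also counts non-heap JVM memory and any resident pages of files the process has memory-mapped, which is what the linked FAQ entry is about. A minimal sketch using the standard java.lang.management API to print heap and non-heap usage from inside a JVM; HeapReport is an illustrative name, and neither figure it prints includes mmap'd file pages.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Reports the heap figures JMX shows (used / committed / max) plus non-heap usage.
// Neither includes pages of memory-mapped files, which top/htop count in RES.
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Formats one MemoryUsage snapshot in whole megabytes.
    static String describe(String label, MemoryUsage usage) {
        // getMax() returns -1 when no maximum is defined.
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + "MB";
        return String.format("%s: used=%dMB committed=%dMB max=%s",
                label, usage.getUsed() / MB, usage.getCommitted() / MB, max);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

If heap committed plus non-heap is well below RES, the remainder is mostly thread stacks, native allocations, and resident mapped file pages.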

On Sun, Jul 17, 2011 at 11:54 PM, JKnight JKnight  wrote:
> Dear all,
> I use JMX to monitor a Cassandra server.
> Heap memory usage shows:
> Used: 600MB, Committed: 2.1G, Max: 2.1G
> But htop shows the Cassandra process consuming 3.1G.
> Could you tell me why Cassandra occupies so much more memory than it is using?
> Thanks a lot for your support.
> --
> Best regards,
> JKnight
>



-- 
Jonathan Ellis


Cassandra consume large memory

2011-07-17 Thread JKnight JKnight
Dear all,

I use JMX to monitor a Cassandra server.

Heap memory usage shows:
Used: 600MB, Committed: 2.1G, Max: 2.1G

But htop shows the Cassandra process consuming 3.1G.

Could you tell me why Cassandra occupies so much more memory than it is using?

Thanks a lot for your support.

-- 
Best regards,
JKnight


Re: Default behavior of generate index_name for columns...

2011-07-17 Thread Boris Yen
I have tested another case; I'm not sure if this is a bug.

I created a few column families on 0.8.0, each with a user_name column,
and enabled a secondary index on that column. Then I upgraded to 0.8.1.
When I ran "show keyspaces" in cassandra-cli, I saw the index name
"user_name_idx" appear for different column families. It seems the
validation rule for index_name in 0.8.1 was skipped completely.

Is this a bug, or is it intentional?

Regards
Boris

On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen  wrote:

> Done. It is CASSANDRA-2903.
>
> On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis  wrote:
>
>> Please.
>>
>> On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen  wrote:
>> > Hi Jonathan,
>> > Do I need to open a ticket for this?
>> > Regards
>> > Boris
>> >
>> > On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis wrote:
>> >>
>> >> Sounds reasonable to me.
>> >>
>> >> On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen  wrote:
>> >> > Hi,
>> >> > I have a few column families, each with a column called user_name. I
>> >> > tried to use a secondary index on the user_name column in each of these
>> >> > column families. However, when creating them, Cassandra keeps reporting
>> >> > a "Duplicate index name..." exception. I finally figured out that the
>> >> > default index name seems to be "column name"+"_idx", which makes my
>> >> > column families violate the "uniqueness of index name" rule.
>> >> > I was wondering if the default index_name generation rule could be
>> >> > something like "column name"+"cf name", so that index names would not
>> >> > collide with each other so easily when the user does not assign an
>> >> > "index_name" while creating a column family.
>> >> > Regards
>> >> > Boris
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>>
>
>


Re: Thrift Java Client - Get a column family from a Keyspace

2011-07-17 Thread aaron morton
> Currently the only way to do that would be to iterate through the list of column 
> families returned by the getCf_defs() method.

Yes. 
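A minimal sketch of that linear scan, written generically so it compiles without the Thrift jar. With the generated Thrift classes it would be called as `find(ksDef.getCf_defs(), CfDef::getName, "Users")`, assuming the generated `getName()` getter on CfDef; "Users" is a placeholder and ByName is a hypothetical helper, not part of the Thrift client.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Sketch: linear lookup of a definition by name, generic so it compiles without
// the Thrift jar. With the generated Thrift classes the call would look like
//   find(ksDef.getCf_defs(), CfDef::getName, "Users")   // "Users" is a placeholder
final class ByName {
    static <T> Optional<T> find(List<T> defs, Function<T, String> nameOf, String name) {
        for (T def : defs) {
            if (name.equals(nameOf.apply(def))) {
                return Optional.of(def);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Plain strings stand in for CfDef objects; each one is its own name.
        List<String> cfNames = List.of("Users", "Posts", "Comments");
        System.out.println(find(cfNames, s -> s, "Posts"));   // Optional[Posts]
        System.out.println(find(cfNames, s -> s, "Missing")); // Optional.empty
    }
}
```

If you look up many names against the same KsDef, building a Map from getCf_defs() once avoids repeating the scan.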

BTW, most people access Cassandra via a higher-level client; the Java peeps 
tend to use either Hector or Pelops. Aside from not having to code against 
Thrift, they also provide connection management and retry features that are dead 
handy.

Cheers

-
Aaron Morton

On 14 Jul 2011, at 23:59, Chandrasekhar M wrote:

> Hi
>  
> I have been playing around with Cassandra and its Java Thrift Client.
>  
> From my understanding, one could get/retrieve a keyspace (a KsDef object) using 
> the describe_keyspace(String name) method on the Cassandra.Client object.
>  
> Subsequently, one could get a list of all the ColumnFamily definitions in a 
> keyspace, using the getCf_defs() method on the KsDef Object.
>  
> Is there a way to get a single ColumnFamily if I know the name of the 
> column family (just a convenience function)?
>  
> Currently the only way to do that would be to iterate through the list of column 
> families returned by the getCf_defs() method.
>  
> Thanks in Advance
> Chandra



Re: Anyone using Facebook's flashcache?

2011-07-17 Thread AJ

On 7/17/2011 12:29 PM, Héctor Izquierdo Seliva wrote:

> I've been using flashcache for a while in production. It improves read
> performance; latency was roughly halved, though I don't
> remember the exact numbers.
>
> Problems: compactions will trash your cache, and so will memtable
> flushes. Right now there's no way to avoid that.
>
> If you want, I could dig up the numbers for a before/after comparison.



Hector, some before/after numbers would be great if you can find them.  
Thanks!


What happens when your cache gets trashed?  Do compactions and flushes 
go slower?


aj







Re: Data overhead discussion in Cassandra

2011-07-17 Thread aaron morton
What RF are you using?

On disk each column has 15 bytes of overhead, plus the column name and the 
column value. So for an 8-byte long name and an 8-byte double value there will be 
16 bytes of data and 15 bytes of overhead.

The index file also contains the row key, the MD5 token (for RP) and the 
row offset for the data file.

Cheers

-
Aaron Morton

On 15 Jul 2011, at 07:09, Sameer Farooqui wrote:

> We just set up a demo cluster with Cassandra 0.8.1 with 12 nodes and loaded 
> 1.5 TB of data into it. However, the actual space on disk being used by data 
> files in Cassandra is 3 TB. We're using a standard column family with a 
> million rows (key=string) and 35,040 columns per key. The column name is a 
> long and the column value is a double.
> 
> I was just hoping to understand more about why the data overhead is so large. 
> We're not using expiring columns. Even considering indexing and bloom 
> filters, it shouldn't have bloated up the data size to 2x the original 
> amount. Or should it have?
> 
> How can we better anticipate the actual data usage on disk in the future?
> 
> - Sameer



Re: Range query ordering with CQL JDBC

2011-07-17 Thread aaron morton
You are probably seeing this http://wiki.apache.org/cassandra/FAQ#range_rp

Row keys are not ordered by their key, they are ordered by the token created by 
the partitioner.
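To see why a range query over epoch-like keys comes back shuffled under RandomPartitioner, a minimal sketch that derives a RandomPartitioner-style token and orders keys by it. The token here is the MD5 of the key's bytes read as a non-negative BigInteger, which is my understanding of how RandomPartitioner hashes keys; treat the exact byte handling as an assumption. TokenOrder is a hypothetical helper, not a Cassandra class.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of RandomPartitioner-style ordering (an approximation, not Cassandra code):
// token = MD5(key bytes) read as a signed BigInteger, then made non-negative.
final class TokenOrder {
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Orders keys by token, which is the order a scan around the ring visits them.
    static List<String> inTokenOrder(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(Comparator.comparing(TokenOrder::token));
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("1309205000000", "1309205000001", "1309205000002", "1309205000003");
        for (String k : inTokenOrder(keys)) {
            System.out.println(k + " -> " + token(k));
        }
    }
}
```

Consecutive epochs generally land far apart on the ring, so a key range under this partitioner follows token order rather than time order.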

If you still think there is a problem, provide an example of the data you are 
seeing and what you expected to see. 

Cheers


-
Aaron Morton

On 16 Jul 2011, at 06:09, Matthieu Nahoum wrote:

> Hi Eric,
> 
> I am using the default partitioner, which is the RandomPartitioner I guess.
> The key type is String. Are Strings ordered by lexicographic rules?
> 
> Thanks 
> 
> On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans  wrote:
> On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
> > I am trying to range-query a column family on which the keys are
> > epochs (similar to the output of System.currentTimeMillis() in Java).
> > In CQL (Cassandra 0.8.1 with JDBC driver):
> >
> > SELECT * FROM columnFamily WHERE KEY > '130920500';
> >
> > I can't get a result that makes sense; it always returns the wrong
> > timestamps. So I must be making an error somewhere in the way I input the
> > query value. I tried it in plain text (like above), in hexadecimal, etc.
> >
> > What is the correct way of doing this? Is it possible that my key is
> > too long?
> 
> What partitioner are you using?  What is the key type?
> 
> --
> Eric Evans
> eev...@rackspace.com
> 
> 
> 
> 
> -- 
> ---
> Engineer at NAVTEQ
> Berkeley Systems Engineer '10
> ENAC Engineer '09
> 
> 151 N. Michigan Ave.
> Appt. 3716
> Chicago, IL, 60601
> USA
> Cell: +1 (510) 423-1835
> 
> http://www.linkedin.com/in/matthieunahoum
> 



Re: Cassandra OOM on repair.

2011-07-17 Thread Jonathan Ellis
Can't think of any.

On Sun, Jul 17, 2011 at 1:27 PM, Andrey Stepachev  wrote:
> Looks like the problem is in this code:
>     public IndexSummary(long expectedKeys)
>     {
>         long expectedEntries = expectedKeys / DatabaseDescriptor.getIndexInterval();
>         if (expectedEntries > Integer.MAX_VALUE)
>             // TODO: that's a _lot_ of keys, or a very low interval
>             throw new RuntimeException("Cannot use index_interval of " + DatabaseDescriptor.getIndexInterval()
>                                        + " with " + expectedKeys + " (expected) keys.");
>         indexPositions = new ArrayList((int)expectedEntries);
>     }
> I have too many keys and too small an index interval.
> To fix this, I can:
> 1) reduce the number of keys (rewrite the app and sacrifice balance)
> 2) increase index_interval (hurts other column families)
> A question:
> Are there any drawbacks to using a different indexInterval for each column
> family in a keyspace? (Assuming I write a patch.)
> 2011/7/15 Andrey Stepachev 
>>
>> Looks like the key indexes are eating all the memory:
>> http://paste.kde.org/97213/
>>
>> 2011/7/15 Andrey Stepachev 
>>>
>>> UPDATE:
>>> I found, that
>>> a) with min10G cassandra survive.
>>> b) I have ~1000 sstables
>>> c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow
>>> So, I have a question:
>>> a) if row is bigger then 64mb before compaction, why it compacted in
>>> memory
>>> b) if it smaller, what eats so much memory?
>>> 2011/7/15 Andrey Stepachev 

>>>> Hi all.
>>>> Cassandra constantly OOMs on repair or compaction. Increasing memory
>>>> doesn't help (6G).
>>>> I can give it more, but I don't think this is a normal situation.
>>>> The cluster has 4 nodes, RF=3.
>>>> Cassandra version 0.8.1.
>>>> The ring looks like this:
>>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>>                                                                                127605887595351923798765477786913079296
>>>> xxx.xxx.xxx.66  datacenter1 rack1       Up     Normal  176.96 GB       25.00%  0
>>>> xxx.xxx.xxx.69  datacenter1 rack1       Up     Normal  178.19 GB       25.00%  42535295865117307932921825928971026432
>>>> xxx.xxx.xxx.67  datacenter1 rack1       Up     Normal  178.26 GB       25.00%  85070591730234615865843651857942052864
>>>> xxx.xxx.xxx.68  datacenter1 rack1       Up     Normal  175.2 GB        25.00%  127605887595351923798765477786913079296
>>>> About the schema:
>>>> I have big rows (>100k, up to several million). But as far as I know, that is
>>>> normal for Cassandra.
>>>> Everything works relatively well until I start long-running
>>>> pre-production tests. I load data, and after a while (~4 hours) the cluster
>>>> begins timing out and then some nodes die with OOM.
>>>> My app retries sending, so after a short period all nodes go down.
>>>> Very nasty.
>>>> But now I can OOM nodes simply by calling nodetool repair.
>>>> In the logs (http://paste.kde.org/96811/) it is clear how the heap rocket-jumps
>>>> to the upper limit.
>>>> cfstats shows: http://paste.kde.org/96817/
>>>> config is: http://paste.kde.org/96823/
>>>> The question is: does anybody know what this means? Why does Cassandra try
>>>> to load something big into memory at once?
>>>> A.
>>
>
>



-- 
Jonathan Ellis


Re: Anyone using Facebook's flashcache?

2011-07-17 Thread Héctor Izquierdo Seliva
I've been using flashcache for a while in production. It improves read
performance; latency was roughly halved, though I don't
remember the exact numbers. 

Problems: compactions will trash your cache, and so will memtable
flushes. Right now there's no way to avoid that.

If you want, I could dig up the numbers for a before/after comparison. 




Re: Cassandra OOM on repair.

2011-07-17 Thread Andrey Stepachev
Looks like the problem is in this code:

public IndexSummary(long expectedKeys)
{
    long expectedEntries = expectedKeys / DatabaseDescriptor.getIndexInterval();
    if (expectedEntries > Integer.MAX_VALUE)
        // TODO: that's a _lot_ of keys, or a very low interval
        throw new RuntimeException("Cannot use index_interval of " + DatabaseDescriptor.getIndexInterval()
                                   + " with " + expectedKeys + " (expected) keys.");
    indexPositions = new ArrayList((int)expectedEntries);
}

I have too many keys and too small an index interval.

To fix this, I can:
1) reduce the number of keys (rewrite the app and sacrifice balance)
2) increase index_interval (hurts other column families)
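To gauge what option 2 buys, a minimal sketch of the same arithmetic the IndexSummary constructor above performs: one sampled entry per index_interval keys, and the entry count must fit in an int-sized ArrayList. IndexSummarySizing and the key count in main are hypothetical, chosen only to show the scaling.

```java
// Sketch of the sizing check in the IndexSummary constructor quoted above:
// one sampled entry per index_interval keys, and the count must fit in an int.
final class IndexSummarySizing {
    static long expectedEntries(long expectedKeys, int indexInterval) {
        if (indexInterval <= 0) {
            throw new IllegalArgumentException("index_interval must be positive");
        }
        return expectedKeys / indexInterval;
    }

    // Mirrors the guard that throws in the quoted constructor.
    static boolean fitsInArrayList(long expectedKeys, int indexInterval) {
        return expectedEntries(expectedKeys, indexInterval) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long keys = 10_000_000_000L; // hypothetical key count, not from this thread
        for (int interval : new int[] {128, 256, 512}) {
            System.out.println("index_interval=" + interval
                    + " -> summary entries=" + expectedEntries(keys, interval));
        }
    }
}
```

Doubling index_interval halves the number of sampled entries kept on the heap; the usual cost is that each lookup has to scan more of the on-disk index.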

A question:
Are there any drawbacks to using a different indexInterval for each column
family in a keyspace? (Assuming I write a patch.)

2011/7/15 Andrey Stepachev 

> Looks like the key indexes are eating all the memory:
>
> http://paste.kde.org/97213/
>
>
> 2011/7/15 Andrey Stepachev 
>
>> UPDATE:
>>
>> I found that:
>> a) with at least 10G of heap, Cassandra survives.
>> b) I have ~1000 sstables.
>> c) CompactionManager uses PrecompactedRow instead of LazilyCompactedRow.
>>
>> So, I have a question:
>> a) if a row is bigger than 64MB before compaction, why is it compacted in
>> memory?
>> b) if it is smaller, what eats so much memory?
>>
>> 2011/7/15 Andrey Stepachev 
>>
>>> Hi all.
>>>
>>> Cassandra constantly OOMs on repair or compaction. Increasing memory
>>> doesn't help (6G).
>>> I can give it more, but I don't think this is a normal situation.
>>> The cluster has 4 nodes, RF=3.
>>> Cassandra version 0.8.1.
>>>
>>> The ring looks like this:
>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>                                                                                127605887595351923798765477786913079296
>>> xxx.xxx.xxx.66  datacenter1 rack1       Up     Normal  176.96 GB       25.00%  0
>>> xxx.xxx.xxx.69  datacenter1 rack1       Up     Normal  178.19 GB       25.00%  42535295865117307932921825928971026432
>>> xxx.xxx.xxx.67  datacenter1 rack1       Up     Normal  178.26 GB       25.00%  85070591730234615865843651857942052864
>>> xxx.xxx.xxx.68  datacenter1 rack1       Up     Normal  175.2 GB        25.00%  127605887595351923798765477786913079296
>>>
>>> About the schema:
>>> I have big rows (>100k, up to several million). But as far as I know, that is
>>> normal for Cassandra.
>>> Everything works relatively well until I start long-running
>>> pre-production tests. I load data, and after a while (~4 hours) the cluster
>>> begins timing out and then some nodes die with OOM.
>>> My app retries sending, so after a short period all nodes go down.
>>> Very nasty.
>>>
>>> But now I can OOM nodes simply by calling nodetool repair.
>>> In the logs (http://paste.kde.org/96811/) it is clear how the heap rocket-jumps
>>> to the upper limit.
>>> cfstats shows: http://paste.kde.org/96817/
>>> config is: http://paste.kde.org/96823/
>>> The question is: does anybody know what this means? Why does Cassandra try
>>> to load something big into memory at once?
>>>
>>> A.
>>>
>>
>>
>