Re: Cassandra 3.2.1: Memory leak?

2016-03-14 Thread Mohamed Lrhazi
I am trying to capture this again... but from my first attempt, it does not
look like these numbers vary all that much between when the cluster reboots
and when the nodes start crashing:

[root@avesterra-prod-1 ~]# nodetool -u cassandra -pw '..'  tablestats|
grep "Bloom filter space used:"
Bloom filter space used: 2041877200
Bloom filter space used: 0
Bloom filter space used: 1936840
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 352
Bloom filter space used: 0
Bloom filter space used: 48
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 48
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 72
Bloom filter space used: 720
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 32
Bloom filter space used: 56
Bloom filter space used: 0
Bloom filter space used: 32
Bloom filter space used: 32
Bloom filter space used: 56
Bloom filter space used: 56
Bloom filter space used: 32
Bloom filter space used: 32
Bloom filter space used: 32
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
Bloom filter space used: 0
[root@avesterra-prod-1 ~]#
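
To get a single per-node number that can be tracked from restart until the
crash, the per-table values above can be summed — a rough sketch, assuming
the figures reported by tablestats are bytes:

# sum the per-table "Bloom filter space used" lines into one total for the node
nodetool -u cassandra -pw '..' tablestats | awk '
    /Bloom filter space used:/ { s += $5 }
    END { printf "total bloom filter space used: %.2f GiB\n", s/1073741824 }'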





On Mon, Mar 14, 2016 at 4:43 PM, Paulo Motta 
wrote:

> Sorry, the command is actually nodetool tablestats and you should watch
> the bloom filter size or similar metrics.
>
> 2016-03-14 17:35 GMT-03:00 Mohamed Lrhazi :
>
>> Hi Paulo,
>>
>> Which metric should I watch for this ?
>>
>> [root@avesterra-prod-1 ~]# rpm -qa| grep datastax
>> datastax-ddc-3.2.1-1.noarch
>> datastax-ddc-tools-3.2.1-1.noarch
>> [root@avesterra-prod-1 ~]# cassandra -v
>> 3.2.1
>> [root@avesterra-prod-1 ~]#
>>
>> [root@avesterra-prod-1 ~]# nodetool -u cassandra -pw ''  tpstats
>>
>>
>> Pool Name                        Active   Pending   Completed   Blocked   All time blocked
>> MutationStage                         0         0       13609         0                  0
>> ViewMutationStage                     0         0           0         0                  0
>> ReadStage                             0         0           0         0                  0
>> RequestResponseStage                  0         0           8         0                  0
>> ReadRepairStage                       0         0           0         0                  0
>> CounterMutationStage                  0         0           0         0                  0
>> MiscStage                             0         0           0         0                  0
>> CompactionExecutor                    1         1       17556         0                  0
>> MemtableReclaimMemory                 0         0          38         0                  0
>> PendingRangeCalculator                0         0           8         0                  0
>> GossipStage                           0         0      118094         0                  0
>> SecondaryIndexManagement              0         0           0         0                  0
>> HintsDispatcher                       0         0           0         0                  0
>> MigrationStage                        0         0           0         0                  0
>> MemtablePostFlush                     0         0          55         0                  0
>> PerDiskMemtableFlushWriter_0          0         0          38         0                  0
>> ValidationExecutor                    0         0           0         0                  0
>> Sampler                               0         0           0         0                  0
>> MemtableFlushWriter                   0         0          38         0                  0
>> InternalResponseStage                 0         0           0         0                  0
>> AntiEntropyStage                      0         0           0         0                  0
>> CacheCleanupExecutor                  0         0           0         0                  0
>> 

Re: Regarding cassandra-stress results

2016-03-14 Thread Rajath Subramanyam
I opened CASSANDRA-11352 to add this minor improvement to the
cassandra-stress tool so that the units are included in the output.

- Rajath


Rajath Subramanyam


On Mon, Mar 14, 2016 at 3:43 PM, John Wong  wrote:

> On Mon, Mar 14, 2016 at 6:13 PM, Robert Coli  wrote:
>
>> On Mon, Mar 14, 2016 at 11:38 AM, Rajath Subramanyam 
>> wrote:
>>
>>> When cassandra-stress tool dumps the output at the end of the
>>> benchmarking run, what is the unit of latency statistics ?
>>>
>>
>> This is becoming a FAQ. Perhaps the docs for the tool (and/or the tool
>> itself) should be modified to specify units.
>>
>> I have bcced docs AT datastax regarding the docs.
>>
>> =Rob
>>
>>
>
> Probably also worth adding to the actual output. I am not sure if there is
> a good reason not to include it in the output.
>
> John
>
>


Re: Regarding cassandra-stress results

2016-03-14 Thread John Wong
On Mon, Mar 14, 2016 at 6:13 PM, Robert Coli  wrote:

> On Mon, Mar 14, 2016 at 11:38 AM, Rajath Subramanyam 
> wrote:
>
>> When cassandra-stress tool dumps the output at the end of the
>> benchmarking run, what is the unit of latency statistics ?
>>
>
> This is becoming a FAQ. Perhaps the docs for the tool (and/or the tool
> itself) should be modified to specify units.
>
> I have bcced docs AT datastax regarding the docs.
>
> =Rob
>
>

Probably also worth adding to the actual output. I am not sure if there is a
good reason not to include it in the output.

John


Re: Cassandra Upgrade 3.0.x vs 3.x (Tick-Tock Release)

2016-03-14 Thread Robert Coli
On Mon, Mar 14, 2016 at 12:40 PM, Kathiresan S  wrote:

> We are planning a Cassandra upgrade in our production environment.
> Which version of Cassandra is stable and advisable to upgrade to at the
> moment?
>

https://www.eventbrite.com/engineering/what-version-of-cassandra-should-i-run/

(IOW, you should run either 2.1.MAX or 2.2.5)

Relatively soon, the answer will be "3.0.x", probably around the time the
3.0.x point release reaches 6 or higher (i.e., 3.0.6+).

After this series, the change in release cadence may change the above rule
of thumb.

=Rob


Re: Regarding cassandra-stress results

2016-03-14 Thread Robert Coli
On Mon, Mar 14, 2016 at 11:38 AM, Rajath Subramanyam 
wrote:

> When cassandra-stress tool dumps the output at the end of the benchmarking
> run, what is the unit of latency statistics ?
>

This is becoming a FAQ. Perhaps the docs for the tool (and/or the tool
itself) should be modified to specify units.

I have bcced docs AT datastax regarding the docs.

=Rob


Re: Regarding cassandra-stress results

2016-03-14 Thread Jaydeep Chovatia
ms
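
That is, the latency figures are in milliseconds. Until the tool prints the
unit itself, one rough way to annotate the summary lines of a captured run —
a sketch only, where 'stress.log' is just a placeholder for wherever the
cassandra-stress output was saved:

# append the unit to the latency summary lines quoted below
awk '/^latency/ { $0 = $0 " (ms)" } { print }' stress.log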

On Mon, Mar 14, 2016 at 11:38 AM, Rajath Subramanyam 
wrote:

> Hello Cassandra Community,
>
> When cassandra-stress tool dumps the output at the end of the benchmarking
> run, what is the unit of latency statistics ?
>
> latency mean  : 0.7 [READ:0.7, WRITE:0.7]
> latency median: 0.6 [READ:0.6, WRITE:0.6]
> latency 95th percentile   : 0.8 [READ:0.8, WRITE:0.8]
> latency 99th percentile   : 1.2 [READ:1.2, WRITE:1.2]
> latency 99.9th percentile : 8.8 [READ:8.9, WRITE:9.0]
> latency max   : 448.7 [READ:162.3, WRITE:448.7]
>
> Thanks in advance.
>
> - Rajath
> 
> Rajath Subramanyam
>
>


Re: Cassandra Upgrade 3.0.x vs 3.x (Tick-Tock Release)

2016-03-14 Thread Bryan Cheng
Hi Kathir,

The specific version will depend on your needs (e.g. libraries) and your
risk/stability profile. Personally, I generally go with the oldest branch
still under active maintenance (which would be 2.2.x, or 2.1.x if you only
need critical fixes), but there's lots of good stuff in 3.x if you're happy
being a little closer to the bleeding edge.

There was a bit of discussion elsewhere on this list, e.g. here:
https://www.mail-archive.com/user@cassandra.apache.org/msg45990.html;
searching may turn up some more recommendations.

--Bryan

On Mon, Mar 14, 2016 at 12:40 PM, Kathiresan S  wrote:

> Hi,
>
> We are planning a Cassandra upgrade in our production environment.
> Which version of Cassandra is stable and advisable to upgrade to at the
> moment?
>
> Looking at this JIRA (CASSANDRA-10822), it looks like, if we plan to
> upgrade to any recent version, it should be >= 3.0.2/3.2.
>
> Should it be 3.0.4 / 3.0.3 / 3.3 or 3.4? In general, is it a good
> practice to upgrade to a Tick-Tock release instead of a 3.0.x version?
> Please advise.
>
> Thanks,
> ​​Kathir
>


Re: Cassandra 3.2.1: Memory leak?

2016-03-14 Thread Paulo Motta
Sorry, the command is actually nodetool tablestats and you should watch the
bloom filter size or similar metrics.

2016-03-14 17:35 GMT-03:00 Mohamed Lrhazi :

> Hi Paulo,
>
> Which metric should I watch for this ?
>
> [root@avesterra-prod-1 ~]# rpm -qa| grep datastax
> datastax-ddc-3.2.1-1.noarch
> datastax-ddc-tools-3.2.1-1.noarch
> [root@avesterra-prod-1 ~]# cassandra -v
> 3.2.1
> [root@avesterra-prod-1 ~]#
>
> [root@avesterra-prod-1 ~]# nodetool -u cassandra -pw ''  tpstats
>
>
> Pool Name                        Active   Pending   Completed   Blocked   All time blocked
> MutationStage                         0         0       13609         0                  0
> ViewMutationStage                     0         0           0         0                  0
> ReadStage                             0         0           0         0                  0
> RequestResponseStage                  0         0           8         0                  0
> ReadRepairStage                       0         0           0         0                  0
> CounterMutationStage                  0         0           0         0                  0
> MiscStage                             0         0           0         0                  0
> CompactionExecutor                    1         1       17556         0                  0
> MemtableReclaimMemory                 0         0          38         0                  0
> PendingRangeCalculator                0         0           8         0                  0
> GossipStage                           0         0      118094         0                  0
> SecondaryIndexManagement              0         0           0         0                  0
> HintsDispatcher                       0         0           0         0                  0
> MigrationStage                        0         0           0         0                  0
> MemtablePostFlush                     0         0          55         0                  0
> PerDiskMemtableFlushWriter_0          0         0          38         0                  0
> ValidationExecutor                    0         0           0         0                  0
> Sampler                               0         0           0         0                  0
> MemtableFlushWriter                   0         0          38         0                  0
> InternalResponseStage                 0         0           0         0                  0
> AntiEntropyStage                      0         0           0         0                  0
> CacheCleanupExecutor                  0         0           0         0                  0
> Native-Transport-Requests             0         0           0         0                  0
>
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> HINT 0
> MUTATION 0
> COUNTER_MUTATION 0
> BATCH_STORE  0
> BATCH_REMOVE 0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> [root@avesterra-prod-1 ~]#
>
>
>
>
> Thanks a lot,
> Mohamed.
>
>
>
> On Mon, Mar 14, 2016 at 8:22 AM, Paulo Motta 
> wrote:
>
>> Can you check with nodetool tpstats if bloom filter mem space utilization
>> is very large/ramping up before the node gets killed? You could be hitting
>> CASSANDRA-11344.
>>
>> 2016-03-12 19:43 GMT-03:00 Mohamed Lrhazi 
>> :
>>
>>> In my case, all nodes seem to be constantly logging messages like these:
>>>
>>> DEBUG [GossipStage:1] 2016-03-12 17:41:19,123 FailureDetector.java:456 -
>>> Ignoring interval time of 2000928319 for /10.212.18.170
>>>
>>> What does that mean?
>>>
>>> Thanks a lot,
>>> Mohamed.
>>>
>>>
>>> On Sat, Mar 12, 2016 at 5:39 PM, Mohamed Lrhazi <
>>> mohamed.lrh...@georgetown.edu> wrote:
>>>
 Oh wow, similar behavior with a different version altogether!!

 On Sat, Mar 12, 2016 at 5:28 PM, ssiv...@gmail.com 
 wrote:

> Hi, I'll duplicate here my email with the same issue:
>
> "I have 7 nodes of C* v2.2.5 running on CentOS 7, using jemalloc for
> dynamic storage allocation. I use only one keyspace and one table with
> the Leveled compaction strategy. I loaded ~500 GB of data into the cluster
> with a replication factor of 3 and waited until compaction finished. But
> during compaction, each of the C* nodes allocates all of the available
> memory (~128 GB) and then its process just stops. Is this a known bug?"
>
>
> On 03/13/2016 12:56 AM, Mohamed Lrhazi wrote:
>
> Hello,
>
> We installed the DataStax Community edition on 8 nodes running RHEL 7. We
> inserted some 7 billion rows into a pretty simple table. The inserts seem
> to have completed without issues, but ever since, we find that the nodes
> reliably run out of RAM after a few hours, without any user 

Re: Cassandra 3.2.1: Memory leak?

2016-03-14 Thread Mohamed Lrhazi
Hi Paulo,

Which metric should I watch for this ?

[root@avesterra-prod-1 ~]# rpm -qa| grep datastax
datastax-ddc-3.2.1-1.noarch
datastax-ddc-tools-3.2.1-1.noarch
[root@avesterra-prod-1 ~]# cassandra -v
3.2.1
[root@avesterra-prod-1 ~]#

[root@avesterra-prod-1 ~]# nodetool -u cassandra -pw ''  tpstats


Pool Name                        Active   Pending   Completed   Blocked   All time blocked
MutationStage                         0         0       13609         0                  0
ViewMutationStage                     0         0           0         0                  0
ReadStage                             0         0           0         0                  0
RequestResponseStage                  0         0           8         0                  0
ReadRepairStage                       0         0           0         0                  0
CounterMutationStage                  0         0           0         0                  0
MiscStage                             0         0           0         0                  0
CompactionExecutor                    1         1       17556         0                  0
MemtableReclaimMemory                 0         0          38         0                  0
PendingRangeCalculator                0         0           8         0                  0
GossipStage                           0         0      118094         0                  0
SecondaryIndexManagement              0         0           0         0                  0
HintsDispatcher                       0         0           0         0                  0
MigrationStage                        0         0           0         0                  0
MemtablePostFlush                     0         0          55         0                  0
PerDiskMemtableFlushWriter_0          0         0          38         0                  0
ValidationExecutor                    0         0           0         0                  0
Sampler                               0         0           0         0                  0
MemtableFlushWriter                   0         0          38         0                  0
InternalResponseStage                 0         0           0         0                  0
AntiEntropyStage                      0         0           0         0                  0
CacheCleanupExecutor                  0         0           0         0                  0
Native-Transport-Requests             0         0           0         0                  0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
HINT 0
MUTATION 0
COUNTER_MUTATION 0
BATCH_STORE  0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0
[root@avesterra-prod-1 ~]#




Thanks a lot,
Mohamed.



On Mon, Mar 14, 2016 at 8:22 AM, Paulo Motta 
wrote:

> Can you check with nodetool tpstats if bloom filter mem space utilization
> is very large/ramping up before the node gets killed? You could be hitting
> CASSANDRA-11344.
>
> 2016-03-12 19:43 GMT-03:00 Mohamed Lrhazi :
>
>> In my case, all nodes seem to be constantly logging messages like these:
>>
>> DEBUG [GossipStage:1] 2016-03-12 17:41:19,123 FailureDetector.java:456 -
>> Ignoring interval time of 2000928319 for /10.212.18.170
>>
>> What does that mean?
>>
>> Thanks a lot,
>> Mohamed.
>>
>>
>> On Sat, Mar 12, 2016 at 5:39 PM, Mohamed Lrhazi <
>> mohamed.lrh...@georgetown.edu> wrote:
>>
>>> Oh wow, similar behavior with a different version altogether!!
>>>
>>> On Sat, Mar 12, 2016 at 5:28 PM, ssiv...@gmail.com 
>>> wrote:
>>>
 Hi, I'll duplicate here my email with the same issue:

 "I have 7 nodes of C* v2.2.5 running on CentOS 7, using jemalloc for
 dynamic storage allocation. I use only one keyspace and one table with
 the Leveled compaction strategy. I loaded ~500 GB of data into the cluster
 with a replication factor of 3 and waited until compaction finished. But
 during compaction, each of the C* nodes allocates all of the available
 memory (~128 GB) and then its process just stops. Is this a known bug?"


 On 03/13/2016 12:56 AM, Mohamed Lrhazi wrote:

 Hello,

 We installed the DataStax Community edition on 8 nodes running RHEL 7. We
 inserted some 7 billion rows into a pretty simple table. The inserts seem to
 have completed without issues, but ever since, we find that the nodes
 reliably run out of RAM after a few hours, without any user activity at all.
 No reads or writes are sent at all. What should we look for to try and
 identify the root cause?


 [root@avesterra-prod-1 ~]# cat /etc/redhat-release
 Red Hat Enterprise Linux Server release 7.2 (Maipo)
 [root@avesterra-prod-1 ~]# rpm -qa| grep datastax
 datastax-ddc-3.2.1-1.noarch
 datastax-ddc-tools-3.2.1-1.noarch
 [root@avesterra-prod-1 ~]#

 The 

Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-14 Thread Adam Plumb
OK, so good news: I'm running with the patched jar file in my cluster and
haven't seen any issues. The bloom filter off-heap memory usage is between
1.5 GB and 2 GB per node, which is much more in line with what I'm expecting!
(thumbs up)

On Mon, Mar 14, 2016 at 9:42 AM, Adam Plumb  wrote:

> Thanks for the link!  Luckily the cluster I'm running is not yet in
> production and running with dummy data so I will throw that jar on the
> nodes and I'll let you know how things shake out.
>
> On Sun, Mar 13, 2016 at 11:02 PM, Paulo Motta 
> wrote:
>
>> You could be hitting CASSANDRA-11344 (
>> https://issues.apache.org/jira/browse/CASSANDRA-11344).  If that's the
>> case, you may try to replace your cassandra jar on an affected node with a
>> version with this fix in place and force bloom filter regeneration to see
>> if it fixes your problem. You can build with "ant jar" from this branch:
>> https://github.com/pauloricardomg/cassandra/tree/3.4-11344
>>
>> You can force bloom filter regeneration by either removing your
>> *Filter.db files (make sure to back them up first for safety) or changing
>> the bloom_filter_fp_chance before restarting affected nodes with the fixed
>> jar.
>>
>> 2016-03-13 19:51 GMT-03:00 Adam Plumb :
>>
>>> So it's looking like the bloom filter off heap memory usage is ramping
>>> up and up until the OOM killer kills the java process.  I relaunched on
>>> instances with 60GB of memory and the same thing is happening.  A node will
>>> start using more and more RAM until the process is killed, then another
>>> node will start using more and more until it is also killed.
>>>
>>> Is this the expected behavior?  It doesn't seem ideal to me.  Is there
>>> anything obvious that I'm doing wrong?
>>>
>>> On Fri, Mar 11, 2016 at 11:31 AM, Adam Plumb  wrote:
>>>
 Here is the creation syntax for the entire schema.  The xyz table has
 about 2.1 billion keys and the def table has about 230 million keys.  Max
 row size is about 3KB, mean row size is 700B.

 CREATE KEYSPACE abc WITH replication = {'class':
> 'NetworkTopologyStrategy', 'us-east': 3};
> CREATE TABLE xyz (
>   id text,
>   secondary_id int,
>   data text,
>   PRIMARY KEY(id)
> )
>   WITH
>   compaction = { 'class': 'LeveledCompactionStrategy' }
>   and compression = {'class': 'LZ4Compressor'};
> CREATE INDEX secondary_id_index ON abc.xyz (secondary_id);
> CREATE TABLE def (
>   id text,
>   secondary_id int,
>   data text,
>   PRIMARY KEY(id)
> )
>   WITH
>   compaction = { 'class': 'LeveledCompactionStrategy' }
>   and compression = {'class': 'LZ4Compressor'};
> CREATE INDEX secondary_id_index_def ON abc.def (secondary_id);


 On Fri, Mar 11, 2016 at 11:24 AM, Jack Krupansky <
 jack.krupan...@gmail.com> wrote:

> What is your schema and data like - in particular, how wide are your
> partitions (number of rows and typical row size)?
>
> Maybe you just need (a lot) more heap for rows during the repair
> process.
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 11:19 AM, Adam Plumb  wrote:
>
>> These are brand new boxes only running Cassandra.  Yeah the kernel is
>> what is killing the JVM, and this does appear to be a memory leak in
>> Cassandra.  And Cassandra is the only thing running, aside from the basic
>> services needed for Amazon Linux to run.
>>
>> On Fri, Mar 11, 2016 at 11:17 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> 'Sacrifice child' in dmesg is your OS killing the process with the
>>> most RAM. That means you're actually running out of memory at the Linux
>>> level, outside of the JVM.
>>>
>>> Are you running anything other than Cassandra on this box?
>>>
>>> If so, does it have a memory leak?
>>>
>>> all the best,
>>>
>>> Sebastián
>>> On Mar 11, 2016 11:14 AM, "Adam Plumb"  wrote:
>>>
 I've got a new cluster of 18 nodes running Cassandra 3.4 that I
 just launched and loaded data into yesterday (roughly 2TB of total 
 storage)
 and am seeing runaway memory usage.  These nodes are EC2 c3.4xlarges 
 with
 30GB RAM and the heap size is set to 8G with a new heap size of 1.6G.

 Last night I finished loading up the data, then ran an incremental
 repair on one of the nodes just to ensure that everything was working
 (nodetool repair).  Over night all 18 nodes ran out of memory and were
 killed by the OOM killer.  I restarted them this morning and they all 
 came
 up fine, but just started churning through memory and got killed 
 again.  I
 restarted them again and they're doing the same thing.  I'm not 
 getting any

Cassandra Upgrade 3.0.x vs 3.x (Tick-Tock Release)

2016-03-14 Thread Kathiresan S
Hi,

We are planning a Cassandra upgrade in our production environment.
Which version of Cassandra is stable and advisable to upgrade to at the
moment?

Looking at this JIRA (CASSANDRA-10822), it looks like, if we plan to upgrade
to any recent version, it should be >= 3.0.2/3.2.

Should it be 3.0.4 / 3.0.3 / 3.3 or 3.4? In general, is it a good practice
to upgrade to a Tick-Tock release instead of a 3.0.x version? Please advise.

Thanks,
​​Kathir


Regarding cassandra-stress results

2016-03-14 Thread Rajath Subramanyam
Hello Cassandra Community,

When the cassandra-stress tool dumps its output at the end of a benchmarking
run, what is the unit of the latency statistics?

latency mean  : 0.7 [READ:0.7, WRITE:0.7]
latency median: 0.6 [READ:0.6, WRITE:0.6]
latency 95th percentile   : 0.8 [READ:0.8, WRITE:0.8]
latency 99th percentile   : 1.2 [READ:1.2, WRITE:1.2]
latency 99.9th percentile : 8.8 [READ:8.9, WRITE:9.0]
latency max   : 448.7 [READ:162.3, WRITE:448.7]

Thanks in advance.

- Rajath

Rajath Subramanyam


Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-14 Thread Adam Plumb
Thanks for the link!  Luckily the cluster I'm running is not yet in
production and running with dummy data so I will throw that jar on the
nodes and I'll let you know how things shake out.
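
For anyone following along, the regeneration step described below would look
roughly like this for my schema — a sketch only: the fp_chance value is just
an example, and the data path assumes a default package install.

# option 1: back up and then remove the current bloom filter components
# (with Cassandra stopped on the node), preserving the directory layout
mkdir -p /root/filterdb-backup
find /var/lib/cassandra/data/abc -name '*Filter.db' \
    -exec cp --parents {} /root/filterdb-backup/ \; -exec rm {} \;

# option 2: change bloom_filter_fp_chance instead, then restart the node
# with the fixed jar so the filters are rebuilt
cqlsh -e "ALTER TABLE abc.xyz WITH bloom_filter_fp_chance = 0.01;
          ALTER TABLE abc.def WITH bloom_filter_fp_chance = 0.01;"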

On Sun, Mar 13, 2016 at 11:02 PM, Paulo Motta 
wrote:

> You could be hitting CASSANDRA-11344 (
> https://issues.apache.org/jira/browse/CASSANDRA-11344).  If that's the
> case, you may try to replace your cassandra jar on an affected node with a
> version with this fix in place and force bloom filter regeneration to see
> if it fixes your problem. You can build with "ant jar" from this branch:
> https://github.com/pauloricardomg/cassandra/tree/3.4-11344
>
> You can force bloom filter regeneration by either removing your *Filter.db
> files (make sure to back them up first for safety) or changing the
> bloom_filter_fp_chance before restarting affected nodes with the fixed jar.
>
> 2016-03-13 19:51 GMT-03:00 Adam Plumb :
>
>> So it's looking like the bloom filter off heap memory usage is ramping up
>> and up until the OOM killer kills the java process.  I relaunched on
>> instances with 60GB of memory and the same thing is happening.  A node will
>> start using more and more RAM until the process is killed, then another
>> node will start using more and more until it is also killed.
>>
>> Is this the expected behavior?  It doesn't seem ideal to me.  Is there
>> anything obvious that I'm doing wrong?
>>
>> On Fri, Mar 11, 2016 at 11:31 AM, Adam Plumb  wrote:
>>
>>> Here is the creation syntax for the entire schema.  The xyz table has
>>> about 2.1 billion keys and the def table has about 230 million keys.  Max
>>> row size is about 3KB, mean row size is 700B.
>>>
>>> CREATE KEYSPACE abc WITH replication = {'class':
 'NetworkTopologyStrategy', 'us-east': 3};
 CREATE TABLE xyz (
   id text,
   secondary_id int,
   data text,
   PRIMARY KEY(id)
 )
   WITH
   compaction = { 'class': 'LeveledCompactionStrategy' }
   and compression = {'class': 'LZ4Compressor'};
 CREATE INDEX secondary_id_index ON abc.xyz (secondary_id);
 CREATE TABLE def (
   id text,
   secondary_id int,
   data text,
   PRIMARY KEY(id)
 )
   WITH
   compaction = { 'class': 'LeveledCompactionStrategy' }
   and compression = {'class': 'LZ4Compressor'};
 CREATE INDEX secondary_id_index_def ON abc.def (secondary_id);
>>>
>>>
>>> On Fri, Mar 11, 2016 at 11:24 AM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
 What is your schema and data like - in particular, how wide are your
 partitions (number of rows and typical row size)?

 Maybe you just need (a lot) more heap for rows during the repair
 process.

 -- Jack Krupansky

 On Fri, Mar 11, 2016 at 11:19 AM, Adam Plumb  wrote:

> These are brand new boxes only running Cassandra.  Yeah the kernel is
> what is killing the JVM, and this does appear to be a memory leak in
> Cassandra.  And Cassandra is the only thing running, aside from the basic
> services needed for Amazon Linux to run.
>
> On Fri, Mar 11, 2016 at 11:17 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> 'Sacrifice child' in dmesg is your OS killing the process with the most
>> RAM. That means you're actually running out of memory at the Linux level,
>> outside of the JVM.
>>
>> Are you running anything other than Cassandra on this box?
>>
>> If so, does it have a memory leak?
>>
>> all the best,
>>
>> Sebastián
>> On Mar 11, 2016 11:14 AM, "Adam Plumb"  wrote:
>>
>>> I've got a new cluster of 18 nodes running Cassandra 3.4 that I just
>>> launched and loaded data into yesterday (roughly 2TB of total storage) 
>>> and
>>> am seeing runaway memory usage.  These nodes are EC2 c3.4xlarges with 
>>> 30GB
>>> RAM and the heap size is set to 8G with a new heap size of 1.6G.
>>>
>>> Last night I finished loading up the data, then ran an incremental
>>> repair on one of the nodes just to ensure that everything was working
>>> (nodetool repair).  Over night all 18 nodes ran out of memory and were
>>> killed by the OOM killer.  I restarted them this morning and they all 
>>> came
>>> up fine, but just started churning through memory and got killed again. 
>>>  I
>>> restarted them again and they're doing the same thing.  I'm not getting 
>>> any
>>> errors in the system log, since the process is getting killed abruptly
>>> (which makes me think this is a native memory issue, not heap)
>>>
>>> Obviously this behavior isn't the best.  I'm willing to provide any
>>> data people need to help debug this, these nodes are still up and 
>>> running.
>>> I'm also in IRC if anyone wants to jump on there.
>>>
>>> Here is the output of ps aux:

Re: Cassandra 3.2.1: Memory leak?

2016-03-14 Thread Paulo Motta
Can you check with nodetool tpstats if bloom filter mem space utilization
is very large/ramping up before the node gets killed? You could be hitting
CASSANDRA-11344.
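
Something along these lines on an affected node would show whether it ramps
up over time — a rough sketch only (note the bloom filter lines actually come
from nodetool tablestats rather than tpstats, the values are assumed to be
bytes, the credentials are placeholders, and the 5-minute interval is
arbitrary):

while true; do
    date
    # resident set size of the Cassandra JVM (ps reports RSS in KiB)
    ps -o rss= -C java | awk '{ printf "  java RSS: %.1f GiB\n", $1/1048576 }'
    # total bloom filter off-heap memory across all tables
    nodetool -u cassandra -pw '..' tablestats | awk '
        /Bloom filter off heap memory used:/ { s += $7 }
        END { printf "  bloom filter off-heap: %.2f GiB\n", s/1073741824 }'
    sleep 300
done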

2016-03-12 19:43 GMT-03:00 Mohamed Lrhazi :

> In my case, all nodes seem to be constantly logging messages like these:
>
> DEBUG [GossipStage:1] 2016-03-12 17:41:19,123 FailureDetector.java:456 -
> Ignoring interval time of 2000928319 for /10.212.18.170
>
> What does that mean?
>
> Thanks a lot,
> Mohamed.
>
>
> On Sat, Mar 12, 2016 at 5:39 PM, Mohamed Lrhazi <
> mohamed.lrh...@georgetown.edu> wrote:
>
>> Oh wow, similar behavior with a different version altogether!!
>>
>> On Sat, Mar 12, 2016 at 5:28 PM, ssiv...@gmail.com 
>> wrote:
>>
>>> Hi, I'll duplicate here my email with the same issue:
>>>
>>> "I have 7 nodes of C* v2.2.5 running on CentOS 7, using jemalloc for
>>> dynamic storage allocation. I use only one keyspace and one table with
>>> the Leveled compaction strategy. I loaded ~500 GB of data into the cluster
>>> with a replication factor of 3 and waited until compaction finished. But
>>> during compaction, each of the C* nodes allocates all of the available
>>> memory (~128 GB) and then its process just stops. Is this a known bug?"
>>>
>>>
>>> On 03/13/2016 12:56 AM, Mohamed Lrhazi wrote:
>>>
>>> Hello,
>>>
>>> We installed the DataStax Community edition on 8 nodes running RHEL 7. We
>>> inserted some 7 billion rows into a pretty simple table. The inserts seem
>>> to have completed without issues, but ever since, we find that the nodes
>>> reliably run out of RAM after a few hours, without any user activity at
>>> all. No reads or writes are sent at all. What should we look for to try
>>> and identify the root cause?
>>>
>>>
>>> [root@avesterra-prod-1 ~]# cat /etc/redhat-release
>>> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>>> [root@avesterra-prod-1 ~]# rpm -qa| grep datastax
>>> datastax-ddc-3.2.1-1.noarch
>>> datastax-ddc-tools-3.2.1-1.noarch
>>> [root@avesterra-prod-1 ~]#
>>>
>>> The nodes had 8 GB of RAM, which we doubled twice, and we are now trying
>>> 40 GB... they still manage to consume it all and cause the oom_killer to kick in.
>>>
>>> Pretty much all the settings are the default ones the installation
>>> created.
>>>
>>> Thanks,
>>> Mohamed.
>>>
>>>
>>> --
>>> Thanks,
>>> Serj
>>>
>>>
>>
>


Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-14 Thread Carlos Alonso
Hi.

+1 to Jack's sentence: 'Generally, Cassandra is ideal for only two use
cases (access patterns really): 1) retrieval by a specific key, and 2)
retrieval of a relatively narrow slice of contiguous data, beginning with a
specific key.'

So I think you're modelling it properly (keeping fairly narrow rows). You can
then store the initial bucket for each sensor in another table and either
skip storing the end bucket (taking advantage of the fact that Cassandra is
very quick at finding empty partitions) and query up until today, or, given
that your bucketing is per week, only update the 'last partition' entry for a
sensor when you're really one week past the latest one saved. That will
generate one single tombstone per sensor, which doesn't sound scary to me.
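
A rough sketch of that lookup table — the keyspace and table names here are
only illustrative, and the key types are assumed to match the sensorReadings
queries quoted below:

cqlsh <<'CQL'
-- one row per sensor, holding the most recent weekly bucket that has data
CREATE TABLE sensor_ks."sensorLatestBucket" (
    "sensorUnitId"  int,
    "sensorId"      int,
    "latestBucket"  timestamp,
    PRIMARY KEY (("sensorUnitId", "sensorId"))
);

-- rewrite the row only when a new weekly bucket actually starts, so each
-- sensor's entry is overwritten at most once per week
INSERT INTO sensor_ks."sensorLatestBucket" ("sensorUnitId", "sensorId", "latestBucket")
VALUES (5123, 17, '2016-03-14');
CQL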

On the other hand, have you considered offloading the historical data to a
better-suited data warehouse?

Regards

Carlos Alonso | Software Engineer | @calonso 

On 12 March 2016 at 16:59, Jack Krupansky  wrote:

> Generally, secondary indexes are not recommended in Cassandra. Query
> tables and/or materialized views are the recommended alternative. But it
> all depends on the specific nature of the queries and the cardinality of
> the data.
>
> Generally, Cassandra is ideal for only two use cases (access patterns
> really): 1) retrieval by a specific key, and 2) retrieval of a relatively
> narrow slice of contiguous data, beginning with a specific key.
>
> Bulk retrieval is not a great access pattern for Cassandra. The emphasis
> is on being a database (that's why CQL is so similar to SQL) rather than a
> raw data store.
>
> Sure, technically you can do bulk retrieval, but essentially that requires
> modeling and accessing using relatively narrow slices.
>
> Closing the circle, Cassandra is always enhancing its capabilities and
> there is indeed that effort underway to support wider rows, but the
> emphasis of modeling still needs to be centered on point queries and narrow
> contiguous slices.
>
> Even with Spark and analytics that may indeed need to do a full scan of a
> large amount of data, the model needs to be that the big scan is done in
> small chunks.
>
>
> -- Jack Krupansky
>
> On Sat, Mar 12, 2016 at 10:23 AM, Jason Kania 
> wrote:
>
>> Our analytics currently pulls in all the data for a single sensor reading
>> as we use it in its entirety during signal processing. We may add secondary
>> indices to the table in the future to pull in broadly classified data, but
>> right now, our only goal is this bulk retrieval.
>>
>> --
>> *From:* Jack Krupansky 
>> *To:* user@cassandra.apache.org
>> *Sent:* Friday, March 11, 2016 7:25 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Thanks, that level of query detail gives us a better picture to focus on.
>> I think through this some more over the weekend.
>>
>> Also, these queries focus on raw, bulk retrieval of sensor data readings,
>> but do you have reading-based queries, such as range of an actual sensor
>> reading?
>>
>> -- Jack Krupansky
>>
>> On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania 
>> wrote:
>>
>> The 5000 readings mentioned would be against a single sensor on a single
>> sensor unit.
>>
>> The scope of the queries on this table is intended to be fairly simple.
>> Here are some example queries, without 'sharding', that we would perform on
>> this table:
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
>> ORDER BY time DESC LIMIT 5000
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
>> ORDER BY time LIMIT 5000
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
>> classification=?
>> ORDER BY time DESC LIMIT 5000
>>
>> where 'classification' is secondary index that we expect to add.
>>
>> In some cases, we have to revisit all values too so a complete table scan
>> is needed:
>>
>> SELECT "time","readings" FROM "sensorReadings"
>>
>> Getting the "next" and "previous" 5000 readings is also something we do,
>> but is manageable from our standpoint as we can look at the range-end
>> timestamps that are returned and use those in the subsequent queries.
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
>> ORDER BY time LIMIT 5000
>>
>> Splitting the bulk content out of the main table is something we
>> considered too but we didn't find any detail on whether that would solve
>> our timeout problem. If there is a reference for using this approach, it
>> would be of interest to us to avoid any assumptions on how we would
>> approach it.
>>
>> A question: Is the probability of a timeout directly linked to a longer
>> seek time in reading through a 

Multi DC setup for analytics

2016-03-14 Thread Anishek Agarwal
Hello,

We are using Cassandra 2.0.17 and have two logical DCs with different
keyspaces, but both have the same logical name, DC1.

We want to set up another Cassandra cluster for analytics which should get
data from both of the above DCs.

If we set up the new DC with the name DC2 and follow the steps at
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
will it work?

I would think we would first have to change the existing clusters to have two
different names, and then go about adding another DC that gets data from both
of them?

Also, as soon as we add the nodes, data starts moving... that will only be
the real-time changes made to the cluster, right? We still have to run a
rebuild to get the existing data for the token ranges owned by the nodes in
the new DC?
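
To make the question concrete, my understanding of the relevant steps from
that document is roughly the following — keyspace name and replication
factors are placeholders, and this leaves aside the DC-naming question above:

# 1. extend replication of each keyspace that the analytics DC should receive
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
          {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

# 2. on every node in the new DC, stream the existing data for its token ranges
nodetool rebuild -- DC1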

Thanks
Anishek