In place vnode conversion possible?

2014-12-16 Thread Jonas Borgström
Hi,

I know that adding a new vnode-enabled DC is the recommended method to
convert an existing cluster to vnodes, and that the cassandra-shuffle
utility has been removed.

That said, I've done some testing and it appears to be possible to
perform an in place conversion as long as all nodes contain all data (3
nodes and replication factor 3 for example) like this:

for each node:
- nodetool -h localhost disablegossip (Not sure if this is needed)

- cqlsh localhost
  UPDATE system.local SET tokens=$NEWTOKENS WHERE key='local';

- nodetool -h localhost disablethrift (Not sure if this is needed)
- nodetool -h localhost drain
- service cassandra restart

And the following python snippet was used to generate $NEWTOKENS for
each node (RandomPartitioner):

import random

tokens = sorted(random.randint(0, 2**127 - 1) for _ in range(256))
print str([str(t) for t in tokens]).replace('[', '{').replace(']', '}')
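
One way to sanity-check the new token set before restarting each node (for
example, from cqlsh):

SELECT tokens FROM system.local WHERE key='local';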


I've tested this in a test cluster and it seems to work just fine.

Has anyone else done anything similar?

Or is manually changing tokens a bad idea that will come back to bite me
down the line?

Test cluster configuration
--
Cassandra version: 1.2.19
Number of nodes: 3
Keyspace: NetworkTopologyStrategy: {DC1: 1, DC2: 1, DC3: 1}

/ Jonas





Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hello all,

I have read a lot about Cassandra, including about key-value pairs,
partition keys, clustering keys, etc. Does the "key" in a key-value pair
refer to the same thing as the partition key, or are they different?

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
id bigint,
word1 varchar,
word2 varchar,
year int,
category varchar,
frequency int,
PRIMARY KEY((year, category),frequency,word1,word2));


In this schema, I know (year, category) is the compound partition key and
frequency is the clustering key. What is the key here?


Thank You!

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Understanding what is key and partition key

2014-12-16 Thread Jack Krupansky
Correction: year and category form a “composite partition key”.

frequency, word1, and word2 are “clustering columns”.

The combination of a partition key with clustering columns is a “compound 
primary key”.

Every CQL row will have a partition key by definition, and may optionally have 
clustering columns.

“The key” should just be a synonym for “primary key”, although sometimes people 
are loosely speaking about “the partition” (which should be “the partition 
key”) rather than the CQL “row”.
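
For example, with the table above, a query must specify the full partition
key to address a partition, and clustering columns then narrow the results
within it (illustrative queries):

SELECT * FROM corpus.bigram_time_category_ordered_frequency
WHERE year = 2014 AND category = 'N';

SELECT * FROM corpus.bigram_time_category_ordered_frequency
WHERE year = 2014 AND category = 'N' AND frequency = 1;

The first reads one whole partition; the second additionally restricts on
the first clustering column.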

-- Jack Krupansky



Re: Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hi Jack,

So what will be the keys and values of the following CF instance?

year | category | frequency | word1| word2   | id
--+--+---+--+-+---
 2014 |N | 1 |සියළුම | යුද්ධ |   664
 2014 |N | 1 |එච් |   කාණ්ඩය | 12526
 2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
 2014 |N | 1 |  බී|   කාණ්ඩය | 12505

Thank You!


-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Understanding what is key and partition key

2014-12-16 Thread Jens Rantil
For the first row, the key is: (2014, N, 1, සියළුම, යුද්ධ) and the value-part 
is (664).




Cheers,

Jens


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hi Jens,

Thank You!




-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Defining DataSet.json for cassandra-unit testing

2014-12-16 Thread Chamila Wijayarathna
Hello all,

I am trying to test my application using cassandra-unit, with the following
schema and data given below.

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
id bigint,
word1 varchar,
word2 varchar,
year int,
category varchar,
frequency int,
PRIMARY KEY((year, category),frequency,word1,word2));

year | category | frequency | word1| word2   | id
--+--+---+--+-+---
 2014 |N | 1 |සියළුම | යුද්ධ |   664
 2014 |N | 1 |එච් |   කාණ්ඩය | 12526
 2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
 2014 |N | 1 |  බී|   කාණ්ඩය | 12505

Since this has a compound primary key, I am not clear on how to define
dataset.json [1] for this CF. Can somebody help me with how to do that?
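
For reference, the same rows can be written as plain CQL INSERT statements
(a sketch; I'm not sure whether a JSON dataset or a CQL script is the better
fit for cassandra-unit):

INSERT INTO corpus.bigram_time_category_ordered_frequency
  (year, category, frequency, word1, word2, id)
VALUES (2014, 'N', 1, 'සියළුම', 'යුද්ධ', 664);

INSERT INTO corpus.bigram_time_category_ordered_frequency
  (year, category, frequency, word1, word2, id)
VALUES (2014, 'N', 1, 'එච්', 'කාණ්ඩය', 12526);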

Thank You!

1.
https://github.com/jsevellec/cassandra-unit/wiki/What-can-you-set-into-a-dataSet

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: batch_size_warn_threshold_in_kb

2014-12-16 Thread Eric Stevens
 You are, of course, free to use batches in your application

I'm not looking to justify the use of batches, I'm looking for the path
forward that will give us the Best Results™ both near and long term, for
some definition of Best (which would be a balance of client throughput and
cluster pressure).  If individual writes are best for us, that's what I
want to do.  If batches are best for us, that's what I want to do.

I'm just struggling that I'm not able to reproduce your advice
experimentally, and it's not just a few percent difference, it's 5x to 8x
difference.  It's really difficult for me to adopt advice blindly when it
differs from my own observations by such a substantial amount.  That means
something is wrong either with my observations or with the advice, and I
would really like to know which.  I'm not trying to be argumentative or
push for a particular approach, I'm trying to resolve an inconsistency.


RE your questions: I'm sorry this turns into a wall of text, simple
questions about parallelism and distributed systems rarely can be
adequately answered in just a few words.  I'm trying to be open and
transparent about my testing approach because I want to find out where the
disconnect is here.  At the same time I'm trying to bridge the knowledge
gap since I'm working with a parallelism toolset with which you're not
familiar, and that could obviously have a substantial impact on the
results.  Hopefully someone else in the community familiar with Scala will
notice this and provide feedback that I'm not making a fundamental mistake.


1) My original runs were in EC2 being driven by a different server than the
Cassandra cluster, but in the same AZ as one of the Cassandra
servers (typical 3-AZ setup for Cassandra).  All four instances (3x C*, 1x
test driver) were i2.2xl, so they have gigabit network between them.


2) The system was under some moderate other load, this is our test cluster
that takes a steady stream of simulated data to provide other developers
with something to work against.  That load is quite constant and doesn't
work these servers particularly hard - only a few thousand records per
second typically.  Load averages between 1 and 3 most of the time.

Unfortunately I'm not successful getting cassandra-stress talking to this
cluster because of ssl configuration (it doesn't seem to actually pay
attention to the -ts and -tspw command line flags).  I can find out if our
ops guys would be ok with turning off ssl for a while, but that would break
our other applications using the same cluster and may block our other
engineers as a result.  So it has farther-reaching implications than just
being something I can happily turn on or off at whim.

I'm curious how you would expect the performance of my stress tool to
differ when the cluster was being overworked - could you explain what you
anticipate the change in results to look like?  I.e. would single-writes
remain about constant for performance while batches would degrade in
performance?


3) Well I specifically attempt to control for this by testing three
different concurrency models, which I named parallel, scatter, and
traverse (just aliases to make it easier to control the driver).  You
can see the code between the different approaches here - they are pretty
similar to each other, but probably involve some knowledge of how
concurrency works in Scala to really appreciate the differences:
https://gist.github.com/MightyE/1c98912fca104f6138fc/a7db68e72f99ac1215fcfb096d69391ee285c080#file-testsuite-L181-L203

I know you're not a Scala guy, so I'll explain roughly what they do, but
the point is that I'm trying hard to control for just having chosen a bad
concurrency model:

scatter - Take all of the Statements and call executeAsync() on them as
*fast* as the Session will let me.  This is the Unintelligent Brute Force
approach, and it's definitely not how I would model a typical production
application as it doesn't attempt to respond to system pressure at all, and
it's trying to gobble up as many resources as it can.  Use the Scala
Futures system to combine the set of async calls into a single Future
that completes when all the futures returned from executeAsync() have
completed.

traverse - Give all of the Statements to the Scala Futures system and tell
it to call executeAsync() on them all at the rate that it thinks is
appropriate.  This would be much closer to my recommendation on how to
model a production application, because in a real application, there's more
than a single class of work to be done, and the Futures system schedules
both this work and other work intelligently and configurably.  It gives us
a single awaitable Future that completes when it has finished all of its
work and all of the async calls have been completed.  You guys are using
Netty for your native protocol, and Netty offers true event driven
concurrency which gets along famously well with Scala's Futures system.

parallel - Use a Scala Parallel collection to 
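
For anyone who doesn't read Scala, here is a rough analogue of the scatter
approach using the DataStax Python driver (an illustrative sketch only; the
real test code is the Scala gist linked above, and the keyspace and table
names here are made up):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('test_ks')
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

def scatter(rows):
    # Fire every write immediately, with no backpressure, then wait on all
    # of the returned futures, i.e. push as fast as the Session allows.
    futures = [session.execute_async(insert, row) for row in rows]
    for f in futures:
        f.result()  # blocks until that write completes (or raises)

The traverse variant would, by contrast, cap how many execute_async() calls
are in flight at any one time rather than submitting everything up front.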

does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Howdy all,

Our use of Cassandra unfortunately involves lots of deletes.  Yes, I
know that C* is not well suited to this kind of workload, but that's where
we are, and before I go looking for an entirely new data layer I would
rather explore whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions always
occur by backend processes to clean up data after it has been processed,
and thus they do not need to be 100% available.  So this made me think,
what if I did the following?

   - gc_grace_seconds = 0, which ensures that tombstones are never created
   - replication factor = 3
   - for writes that are inserts, consistency = QUORUM, which ensures that
   writes can proceed even if 1 replica is slow/down
   - for deletes, consistency = ALL, which ensures that when we delete a
   record it disappears entirely (no need for tombstones)
   - for reads, consistency = QUORUM
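
(Client-side, with the DataStax Python driver, that would look roughly like
the sketch below; the keyspace, table, and function names are only
illustrative.)

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('my_ks')

def append_record(record_id, payload):
    # Inserts at QUORUM: succeed even if one of the three replicas is down.
    stmt = SimpleStatement(
        "INSERT INTO records (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(stmt, (record_id, payload))

def purge_record(record_id):
    # Deletes at ALL: fail unless every replica acknowledges the delete.
    stmt = SimpleStatement(
        "DELETE FROM records WHERE id = %s",
        consistency_level=ConsistencyLevel.ALL)
    session.execute(stmt, (record_id,))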

Also, I should clarify that our data is essentially append-only, so I don't
need to worry about inconsistencies created by partial updates (e.g. value
gets changed on one machine but not another).  Sometimes there will be
duplicate writes, but I think that should be fine since the value is always
identical.

Any red flags with this approach?  Has anyone tried it and have experiences
to share?  Also, I *think* that this means that I don't need to run
repairs, which from an ops perspective is great.

Thanks, as always,
- Ian


Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Eric Stevens
No, deletes are always written as a tombstone no matter the consistency.
This is because data at rest is written to sstables which are immutable
once written. The tombstone marks that a record in another sstable is now
deleted, and so a read of that value should be treated as if it doesn't
exist.

When sstables are later compacted, several sstables are merged into one and
any overlapping values between the tables are condensed into one. Values
which have a tombstone can be excluded from the new sstable. GC grace
period indicates how long a tombstone should be kept after all underlying
values have been compacted away so that the deleted value can't be
resurrected if a node that knew that value rejoins the cluster.
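
gc_grace_seconds is a per-table setting; for example (table name is just an
example, and 864000 seconds, i.e. ten days, is the default):

ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 864000;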




Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Robert Wille
Tombstones have to be created. The SSTables are immutable, so the data cannot 
be deleted. Therefore, a tombstone is required. The value you deleted will be 
physically removed during compaction.

My workload sounds similar to yours in some respects, and I was able to get C* 
working for me. I have large chunks of data which I periodically replace. I 
write the new data, update a reference, and then delete the old data. I 
designed my schema to be tombstone-friendly, and C* works great. For some of my 
tables I am able to delete entire partitions. Because of the reference that I 
updated, I never try to access the old data, and therefore the tombstones for 
these partitions are never read. The old data simply has to wait for 
compaction. Other tables require deleting records within partitions. These 
tombstones do get read, so there are performance implications. I was able to 
design my schema so that no partition ever has more than a few tombstones (one 
for each generation of deleted data, which is usually no more than one).

Hope this helps.

Robert





Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Ah, makes sense.  Thanks for the explanations!

- Ian







Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Jack Krupansky
When you say “no need for tombstones”, did you actually read that somewhere or 
were you just speculating? If the former, where exactly?

-- Jack Krupansky



Re: Hinted handoff not working

2014-12-16 Thread Robert Wille
Nope. I added millions of records and several GB to the cluster while one node 
was down, and then ran nodetool flush system hints on a couple of nodes that 
were up, and system/hints has less than 200K in it.

Here’s the relevant part of nodetool cfstats system.hints:

Keyspace: system
Read Count: 28572
Read Latency: 0.01806502869942601 ms.
Write Count: 351
Write Latency: 0.04547008547008547 ms.
Pending Tasks: 0
Table: hints
SSTable count: 1
Space used (live), bytes: 7446
Space used (total), bytes: 80062
SSTable Compression Ratio: 0.2651441528992549
Number of keys (estimate): 128
Memtable cell count: 1
Memtable data size, bytes: 1740

The hints are definitely not being stored.

Robert

On Dec 14, 2014, at 11:44 PM, Jens Rantil 
jens.ran...@tink.se wrote:

Hi Robert ,

Maybe you need to flush your memtables to actually see the disk usage increase? 
This applies to both hosts.

Cheers,
Jens




On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille 
rwi...@fold3.com wrote:

I have a cluster with RF=3. If I shut down one node and add a bunch of data to
the cluster, I don't see a bunch of records added to system.hints. Also, du of
/var/lib/cassandra/data/system/hints of the nodes that are up shows that hints 
aren’t being stored. When I start the down node, its data doesn’t grow until I 
run repair, which then takes a really long time because it is significantly out 
of date. Is there some magic setting I cannot find in the documentation to 
enable hinted handoff? I’m running 2.0.11. Any insights would be greatly 
appreciated.

Thanks

Robert





Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Hi Jonathan,

QUORUM = (sum_of_replication_factors / 2) + 1. For us, Quorum = (2/2) + 1 = 2.

Default CL is ONE and RF=2 with two nodes in the cluster. (I am a little
confused: what is my read CL and what is my write CL?)

So, does it mean that for every WRITE it will write to both nodes?

And for every READ, will it read from both nodes and give the result back to
the client?

Will the downgrading retry policy downgrade the CL if a node is down?

Regards

Neha

On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 I did a presentation on diagnosing performance problems in production at
 the US & Euro summits, in which I covered quite a few tools & preventative
 measures you should know when running a production cluster.  You may find
 it useful:
 http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/

 On ops center - I recommend it.  It gives you a nice dashboard.  I don't
 think it's completely comprehensive (but no tool really is) but it gets you
 90% of the way there.

 It's a good idea to run repairs, especially if you're doing deletes or
 querying at CL=ONE.  I assume you're not using quorum, because on RF=2
 that's the same as CL=ALL.

 I recommend at least RF=3 because if you lose 1 server, you're on the edge
 of data loss.


 On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi nehajtriv...@gmail.com
 wrote:

 Hi,
 We have Two Node Cluster Configuration in production with RF=2.

 Which means that the data is written on both nodes, and it's been
 running for about a month now and has a good amount of data.

 Questions?
 1. What are the best practices for maintenance?
 2. Is OpsCenter required to be installed, or can I manage with the nodetool
 utility?
 3. Is it necessary to run repair weekly?

 thanks
 regards
 Neha




Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Jason Kania
Hi,

I have been having a few exchanges with contributors to the project around
what is possible with Cassandra, and a common response that comes up when I
describe functionality as broken or missing is that I am not modelling my
data correctly. Unfortunately, I cannot seem to find comprehensive
documentation on modelling with Cassandra. In particular, I am finding
myself modelling by restriction rather than by what I would like to do.

Does such documentation exist? If not, is there any effort to create it?
The DataStax documentation on data modelling is far too weak to be
meaningful.

In particular, I am caught because:

1) I want to search on a specific column to make updates to it after further
processing; i.e. I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited, so I have a
circular dependency

Thanks,
Jason


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
CL quorum with RF2 is equivalent to ALL, writes will require
acknowledgement from both nodes, and reads will be from both nodes.

CL one will write to both replicas, but return success as soon as the first
one responds; reads will be from one node (the load balancing strategy
determines which one).

FWIW I've come around to dislike the downgrading retry policy. I now feel like
if I'm using downgrading, I'm effectively going to be using that downgraded
policy most of the time under server stress, so in practice that reduced
consistency is the effective consistency I'm asking for from my writes and
reads.
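
To make the arithmetic concrete: QUORUM is floor(RF / 2) + 1, so with RF=2
quorum is 2 (every replica, the same as ALL), while with RF=3 quorum is
still 2, which leaves room for one replica to be down.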






-- 
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Thanks Ryan.
So, as Jonathan recommended, we should have RF=3 with three nodes.
So Quorum = 2, i.e. CL = 2 (or I need the CL to be set to two), and I will
not need the downgrading retry policy in case one node goes down.

I can dynamically add a new node to my cluster.
Can I change my RF to 3 dynamically without affecting my nodes?

regards
Neha





Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
You'll have to run repair, and that will involve some load and streaming,
but this is a normal use case for Cassandra, and your cluster should be
sized load-wise to allow repair and bootstrapping of new nodes; otherwise,
when you're overwhelmed you won't be able to add more nodes easily.

If you need to reduce the cost of streaming to the existing cluster, just
set streaming throughput on your existing nodes to a lower number like 50
or 25.
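
For example (keyspace name and numbers are illustrative):

ALTER KEYSPACE my_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- then, on each node:
--   nodetool repair my_ks
--   nodetool setstreamthroughput 50   (megabits per second)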



-- 
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Thanks Ryan. We will get a new node and add it to the cluster. I will mail
if I have any questions regarding the same.





Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Ryan Svihla
Data Modeling a distributed application could be a book unto itself.
However, I will add, modeling by restriction is basically the entire
thought process in Cassandra data modeling since it's a distributed hash
table and a core aspect of that sort of application is you need to be able
to quickly locate which server owns the data you want in the cluster (which
is provided by the partition key).

In specific response to your questions:
1) as long as you know the primary key and the column name, this just
works; I'm not sure what the problem is.
2) Yes, the partition key tells you which server owns the data; otherwise
you'd have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace the idea
that:

   1. Querying a single server will be faster than querying many servers
   2. Multiple tables with the same data but with different partition keys
   are much easier to scale than a single table that you have to scan the
   whole cluster for your answer (sketched below).
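
For instance (an illustrative schema, not from this thread), the same user
data can be kept in two tables, each keyed for one query, so every lookup
hits exactly one partition:

CREATE TABLE users_by_id (
  user_id uuid PRIMARY KEY,
  email text,
  name text);

CREATE TABLE users_by_email (
  email text PRIMARY KEY,
  user_id uuid,
  name text);

The application writes to both tables on every update and reads whichever
one matches the lookup it needs.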


If you accept this, you've basically got the key principle down... most
other ideas are extensions of this; some nuances include dealing with
tombstones, partition size, and ordering, and I can answer any more
specifics.

I've been meaning to write a series of blog posts on this, but as I stated,
it's almost a book unto itself. Data modeling a distributed application
requires a fundamental rethink of all the assumptions we've been taught for
master/slave style databases.




-- 
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Defining DataSet.json for cassandra-unit testing

2014-12-16 Thread Ryan Svihla
I'd ask the author of cassandra-unit. I've not personally used that project.



-- 
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Changing replication factor of Cassandra cluster

2014-12-16 Thread Ryan Svihla
Repair performance is going to vary heavily with a large number of factors;
hours for one node to finish is within the range of what I see in the wild.
Again, there are so many factors that it's impossible to speculate on
whether that is good or bad for your cluster. Factors that matter include:

   1. speed of disk io
   2. amount of ram and cpu on each node
   3. network interface speed
   4. is this multidc or not
   5. are vnodes enabled or not
   6. what are the jvm tunings
   7. compaction settings
   8. current load on the cluster
   9. streaming settings

Suffice it to say that improving repair performance is a full-on tuning
exercise. Note that your current operation is going to be worse than a
traditional repair, as you're streaming copies of data around and not just
doing the normal Merkle tree work.

Restoring from backup to a new cluster (including how to handle token
ranges) is discussed in detail here
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html


On Mon, Dec 15, 2014 at 4:14 PM, Pranay Agarwal agarwalpran...@gmail.com
wrote:

 Hi All,


 I have a 20-node Cassandra cluster with 500 GB of data and a replication
 factor of 1. I increased the replication factor to 3 and ran nodetool
 repair on each node one by one as the docs say. But it takes hours for one
 node to finish repair. Is that normal or am I doing something wrong?

 Also, I took a backup of the Cassandra data on each node. How do I restore
 the graph in a new cluster of nodes using the backup? Do I have to have the
 token ranges backed up as well?

 -Pranay



-- 
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/


Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Jason Kania
Ryan,
Thanks for the response. It offers a bit more clarity.
I think a series of blog posts with good real world examples would go a long 
way to increasing usability of Cassandra. Right now I find the process like 
going through a mine field because I only discover what is not possible after 
trying something that I would find logical and failing.

For my specific questions, the problem is that since searching is only possible 
on columns in the primary key and the primary key cannot be updated, I am not 
sure what the appropriate solution is when data exists that needs to be 
searched and then updated. What is the preferable approach to this? Is the
expectation to maintain a series of tables, one for each stage of data 
manipulation with its own primary key?
Thanks,
Jason
  From: Ryan Svihla rsvi...@datastax.com
 To: user@cassandra.apache.org 
 Sent: Tuesday, December 16, 2014 12:36 PM
 Subject: Re: Comprehensive documentation on Cassandra Data modelling
   
Data Modeling a distributed application could be a book unto itself. However, I 
will add, modeling by restriction is basically the entire thought process in 
Cassandra data modeling since it's a distributed hash table and a core aspect 
of that sort of application is you need to be able to quickly locate which 
server owns the data you want in the cluster (which is provided by the 
partition key).

in specific response to your questions
1) as long as you know the primary key and the column name this just works. I'm 
not sure what the problem is
2) Yes, the partition key tells you which server owns the data, otherwise you'd 
have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace the idea that :

   
   - Querying a single server will be faster than querying many servers
   - Multiple tables with the same data but with different partition keys are
much easier to scale than a single table where you have to scan the whole
cluster for your answer.

If you accept this, you've basically got the key principle down... most other
ideas are extensions of this; some nuance includes dealing with tombstones,
partition size and order, and I can answer any more specifics.

I've been meaning to write a series of blog posts on this, but as I stated, 
it's almost a book unto itself. Data modeling a distributed application 
requires a fundamental rethink of all the assumptions we've been taught for 
master/slave style databases.




On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania jason.ka...@ymail.com wrote:
Hi,
I have been having a few exchanges with contributors to the project around what 
is possible with Cassandra and a common response that comes up when I describe 
functionality as broken or missing is that I am not modelling my data 
correctly. Unfortunately, I cannot seem to find comprehensive documentation on 
modelling with Cassandra. In particular, I am finding myself modelling by 
restriction rather than what I would like to do.

Does such documentation exist? If not, is there any effort to create such
documentation? The DataStax documentation on data modelling is far too weak to
be meaningful.

In particular, I am caught because:
1) I want to search on a specific column to make updates to it after further
processing; ie I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited so I have a
circular dependency
Thanks,
Jason



-- 
Ryan Svihla
Solution Architect
 

DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the worlds most innovative 
companies such as Netflix, Adobe, Intuit, and eBay. 


  

Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Ryan Svihla
There is a lot of stuff out there and the best thing you can do today is
watch Patrick McFadden's series. This was what I used before I started
at DataStax. Planet Cassandra has a data modeling playlist of videos you
can watch
https://www.youtube.com/playlist?list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8mmDA
including the McFadden videos I mentioned.

Finally, you hit a key point: a series of tables is the normal approach to
most data modeling; you model your tables around the queries you need. With
the exception of the nuance I referred to in the last email, this one
concept will get you through 80% of use cases fine.
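
To make that concrete, a minimal sketch (table and column names are invented
for illustration, not from this thread): the same data is written to two
tables, each keyed for exactly one query, and a value that is part of a key
gets "changed" with a delete plus an insert rather than an UPDATE:

CREATE TABLE readings_by_sensor (
    sensor_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id), ts)
);

CREATE TABLE readings_by_day (
    day       text,        -- e.g. '2014-12-16'
    sensor_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((day), sensor_id, ts)
);

-- every write goes to both tables; each table answers one query:
--   SELECT * FROM readings_by_sensor WHERE sensor_id = ?;
--   SELECT * FROM readings_by_day    WHERE day = ?;

-- moving a row to a different partition (the closest thing to updating a
-- key column) is a delete plus an insert:
BEGIN BATCH
  DELETE FROM readings_by_day
   WHERE day = '2014-12-16' AND sensor_id = 's1' AND ts = '2014-12-16 10:00:00';
  INSERT INTO readings_by_day (day, sensor_id, ts, value)
  VALUES ('2014-12-17', 's1', '2014-12-16 10:00:00', 1.5);
APPLY BATCH;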

On Tue, Dec 16, 2014 at 12:01 PM, Jason Kania jason.ka...@ymail.com wrote:

 Ryan,

 Thanks for the response. It offers a bit more clarity.

 I think a series of blog posts with good real world examples would go a
 long way to increasing usability of Cassandra. Right now I find the process
 like going through a mine field because I only discover what is not
 possible after trying something that I would find logical and failing.

 For my specific questions, the problem is that since searching is only
 possible on columns in the primary key and the primary key cannot be
 updated, I am not sure what the appropriate solution is when data exists
 that needs to be searched and then updated. What is the preferable
 approach to this? Is the expectation to maintain a series of tables, one
 for each stage of data manipulation with its own primary key?

 Thanks,

 Jason

   --
  *From:* Ryan Svihla rsvi...@datastax.com
 *To:* user@cassandra.apache.org
 *Sent:* Tuesday, December 16, 2014 12:36 PM
 *Subject:* Re: Comprehensive documentation on Cassandra Data modelling

 Data Modeling a distributed application could be a book unto itself.
 However, I will add, modeling by restriction is basically the entire
 thought process in Cassandra data modeling since it's a distributed hash
 table and a core aspect of that sort of application is you need to be able
 to quickly locate which server owns the data you want in the cluster (which
 is provided by the partition key).

 in specific response to your questions
 1) as long as you know the primary key and the column name this just
 works. I'm not sure what the problem is
 2) Yes, the partition key tells you which server owns the data, otherwise
 you'd have to scan all servers to find what you're asking for.
 3) I'm not sure I understand this.

 To summarize, all modeling can be understood when you embrace the idea
 that :


1. Querying a single server will be faster than querying many servers
    2. Multiple tables with the same data but with different partition
keys are much easier to scale than a single table where you have to scan the
whole cluster for your answer.


 If you accept this, you've basically got the key principle down... most
 other ideas are extensions of this; some nuance includes dealing with
 tombstones, partition size and order, and I can answer any more specifics.

 I've been meaning to write a series of blog posts on this, but as I
 stated, it's almost a book unto itself. Data modeling a distributed
 application requires a fundamental rethink of all the assumptions we've
 been taught for master/slave style databases.




 On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania jason.ka...@ymail.com
 wrote:

 Hi,

 I have been having a few exchanges with contributors to the project around
 what is possible with Cassandra and a common response that comes up when I
 describe functionality as broken or missing is that I am not modelling my
 data correctly. Unfortunately, I cannot seem to find comprehensive
 documentation on modelling with Cassandra. In particular, I am finding
 myself modelling by restriction rather than what I would like to do.

 Does such documentation exist? If not, is there any effort to create such
 documentation? The DataStax documentation on data modelling is far too weak
 to be meaningful.

 In particular, I am caught because:

 1) I want to search on a specific column to make updates to it after
 further processing; ie I don't know its value on first insert
 2) If I want to search on a column, it has to be part of the primary key
 3) If a column is part of the primary key, it cannot be edited so I have a
 circular dependency

 Thanks,

 Jason



 --
 [image: datastax_logo.png] http://www.datastax.com/
 Ryan Svihla
 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.





-- 


Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
I was speculating.  From the responses above, it now appears to me that
tombstones serve (at least) 2 distinct roles:

1. When reading within a single cassandra instance, they mark a new version
of a value (that value being deleted).  Without this, the prior version
would be the most recent and so reads would still return the last value
even after it was deleted.

2. They can resolve discrepancies when a client read receives conflicting
answers from Cassandra nodes (e.g. where one of the nodes is out of date
because it never saw the delete command).

So in the above I was only referring to #2, without realizing the role they
play in #1.

- Ian




On Tue, Dec 16, 2014 at 11:12 AM, Jack Krupansky j...@basetechnology.com
wrote:

   When you say “no need for tombstones”, did you actually read that
 somewhere or were you just speculating? If the former, where exactly?

 -- Jack Krupansky

  *From:* Ian Rose ianr...@fullstory.com
 *Sent:* Tuesday, December 16, 2014 10:22 AM
 *To:* user user@cassandra.apache.org
 *Subject:* does consistency=ALL for deletes obviate the need for
 tombstones?

  Howdy all,

 Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I
 know that C* is not well suited to this kind of workload, but that's where
 we are, and before I go looking for an entirely new data layer I would
 rather explore whether C* could be tuned to work well for us.

 However, deletions are never driven by users in our app - deletions always
 occur by backend processes to clean up data after it has been processed,
 and thus they do not need to be 100% available.  So this made me think,
 what if I did the following?

- gc_grace_seconds = 0, which ensures that tombstones are never
created
- replication factor = 3
- for writes that are inserts, consistency = QUORUM, which ensures
that writes can proceed even if 1 replica is slow/down
- for deletes, consistency = ALL, which ensures that when we delete a
record it disappears entirely (no need for tombstones)
- for reads, consistency = QUORUM

 Also, I should clarify that our data is essentially append only, so I don't
 need to worry about inconsistencies created by partial updates (e.g. value
 gets changed on one machine but not another).  Sometimes there will be
 duplicate writes, but I think that should be fine since the value is always
 identical.

 Any red flags with this approach?  Has anyone tried it and have
 experiences to share?  Also, I *think* that this means that I don't need to
 run repairs, which from an ops perspective is great.

 Thanks, as always,
 - Ian
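
For concreteness, the knobs involved are the table-level gc_grace_seconds and
per-statement consistency on the driver side. A minimal sketch with the 2.1
Java driver follows (keyspace/table/column names are made up; note that
gc_grace_seconds = 0 does not stop tombstones from being written, it only lets
compaction purge them immediately):

import com.datastax.driver.core.*;

public class DeleteAtAll {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");

        // tombstones become purgeable at the first compaction after the delete
        session.execute("ALTER TABLE events WITH gc_grace_seconds = 0");

        PreparedStatement ins = session.prepare(
            "INSERT INTO events (id, payload) VALUES (?, ?)");
        PreparedStatement del = session.prepare(
            "DELETE FROM events WHERE id = ?");

        // inserts and reads at QUORUM, deletes at ALL, per the scheme above
        session.execute(ins.bind(42L, "x")
            .setConsistencyLevel(ConsistencyLevel.QUORUM));
        session.execute(del.bind(42L)
            .setConsistencyLevel(ConsistencyLevel.ALL));

        cluster.close();
    }
}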




100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I have a three node cluster that has been sitting at a load of 4 (for each
node), 100% CPU utilization (although 92% nice) for the last 12 hours,
ever since some significant writes finished. I'm trying to determine what
tuning I should be doing to get it out of this state. The debug log is just
an endless series of:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
8000634880

iostat shows virtually no I/O.

Compaction may enter into this, but i don't really know what to make of
compaction stats since they never change:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 10
  compaction type   keyspace   table              completed    total        unit   progress
       Compaction   media      media_tracks_raw   271651482    563615497    bytes  48.20%
       Compaction   media      media_tracks_raw   30308910     21676695677  bytes  0.14%
       Compaction   media      media_tracks_raw   1198384080   1815603161   bytes  66.00%
Active compaction remaining time :   0h22m24s

5 minutes later:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 9
  compaction type   keyspace   table              completed    total        unit   progress
       Compaction   media      media_tracks_raw   271651482    563615497    bytes  48.20%
       Compaction   media      media_tracks_raw   30308910     21676695677  bytes  0.14%
       Compaction   media      media_tracks_raw   1198384080   1815603161   bytes  66.00%
Active compaction remaining time :   0h22m24s

Sure the pending tasks went down by one, but the rest is identical.
media_tracks_raw likely has a bunch of tombstones (can't figure out how to
get stats on that).

Is this behavior something that indicates that i need more Heap, larger new
generation? Should I be manually running compaction on tables with lots of
tombstones?

Any suggestions or places to educate myself better on performance tuning
would be appreciated.

arne
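
On the "can't figure out how to get stats on that" point, a couple of stock
commands can help gauge tombstone load (availability varies a little by
version, so treat this as a sketch; the data path below is just the default):

nodetool cfhistograms media media_tracks_raw    # sstables touched per read, latencies

# estimated droppable tombstones per sstable, if your install ships sstablemetadata
for f in /var/lib/cassandra/data/media/media_tracks_raw/*-Data.db; do
    echo "$f"; sstablemetadata "$f" | grep -i tombstone
done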


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jonathan Lacefield
Hello,

  What version of Cassandra are you running?

  If it's 2.0, we recently experienced something similar with 8447 [1],
which 8485 [2] should hopefully resolve.

  Please note that 8447 is not related to tombstones.  Tombstone processing
can put a lot of pressure on the heap as well. Why do you think you have a
lot of tombstones in that one particular table?

  [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
  [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

Jonathan

[image: datastax_logo.png]

Jonathan Lacefield

Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

[image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
facebook.png] https://www.facebook.com/datastax [image: twitter.png]
https://twitter.com/datastax [image: g+.png]
https://plus.google.com/+Datastax/about
http://feeds.feedburner.com/datastax https://github.com/datastax/

On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com wrote:

 I have a three node cluster that has been sitting at a load of 4 (for each
 node), 100% CPI utilization (although 92% nice) for that last 12 hours,
 ever since some significant writes finished. I'm trying to determine what
 tuning I should be doing to get it out of this state. The debug log is just
 an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make of
 compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table   completed
   total  unit  progress
Compaction   mediamedia_tracks_raw   271651482
   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw30308910
 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw  1198384080
  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table   completed
   total  unit  progress
Compaction   mediamedia_tracks_raw   271651482
   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw30308910
 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw  1198384080
  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 Sure the pending tasks went down by one, but the rest is identical.
 media_tracks_raw likely has a bunch of tombstones (can't figure out how to
 get stats on that).

 Is this behavior something that indicates that i need more Heap, larger
 new generation? Should I be manually running compaction on tables with lots
 of tombstones?

 Any suggestions or places to educate myself better on performance tuning
 would be appreciated.

 arne



Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's heap usage at?

On Tue, Dec 16, 2014 at 1:04 PM, Arne Claassen a...@emotient.com wrote:

 I have a three node cluster that has been sitting at a load of 4 (for each
 node), 100% CPI utilization (although 92% nice) for that last 12 hours,
 ever since some significant writes finished. I'm trying to determine what
 tuning I should be doing to get it out of this state. The debug log is just
 an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make of
 compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table   completed
   total  unit  progress
Compaction   mediamedia_tracks_raw   271651482
   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw30308910
 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw  1198384080
  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table   completed
   total  unit  progress
Compaction   mediamedia_tracks_raw   271651482
   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw30308910
 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw  1198384080
  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 Sure the pending tasks went down by one, but the rest is identical.
 media_tracks_raw likely has a bunch of tombstones (can't figure out how to
 get stats on that).

 Is this behavior something that indicates that i need more Heap, larger
 new generation? Should I be manually running compaction on tables with lots
 of tombstones?

 Any suggestions or places to educate myself better on performance tuning
 would be appreciated.

 arne



-- 

[image: datastax_logo.png] http://www.datastax.com/

Ryan Svihla

Solution Architect

[image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
http://www.linkedin.com/pub/ryan-svihla/12/621/727/

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I'm running 2.0.10.

The data is all time series data and as we change our pipeline, we've
periodically been reprocessing the data sources, which causes each time
series to be overwritten, i.e. every row per partition key is deleted and
re-written, so I assume i've been collecting a bunch of tombstones.

Also, I assumed the ever-present and never-completing compactions were an
artifact of tombstoning, but I fully admit that's conjecture based on the ~20
blog posts and stackoverflow questions I've surveyed.

I doubled the Heap on one node and it changed nothing regarding the load or
the ParNew log statements. New Generation Usage is 50%, Eden itself is 56%.

Anything else i should look at and report, let me know.
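
For reference, these are the stock commands that apply when a node is pegged
like this (<cassandra_pid> is a placeholder):

nodetool tpstats                       # blocked/pending thread pools, dropped messages
nodetool compactionstats               # does 'completed' actually move over time?
jstat -gcutil <cassandra_pid> 1s       # ParNew/CMS frequency and heap occupancy
top -H -p <cassandra_pid>              # which threads are burning the CPU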

On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447 [1],
 which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image: twitter.png]
 https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com wrote:

 I have a three node cluster that has been sitting at a load of 4 (for
 each node), 100% CPI utilization (although 92% nice) for that last 12
 hours, ever since some significant writes finished. I'm trying to determine
 what tuning I should be doing to get it out of this state. The debug log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make of
 compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table   completed
   total  unit  progress
Compaction   mediamedia_tracks_raw   271651482
   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw30308910
 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw  1198384080
  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table   completed
   total  unit  progress
Compaction   mediamedia_tracks_raw   271651482
   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw30308910
 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw  1198384080
  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 Sure the pending tasks went down by one, but the rest is identical.
 media_tracks_raw likely has a bunch of tombstones (can't figure out how to
 get stats on that).

 Is this behavior something that indicates that i need more Heap, larger
 new generation? Should I be manually running compaction on tables with lots
 of tombstones?

 Any suggestions or places to educate myself better on performance tuning
 would be appreciated.

 arne




Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's CPU, RAM, Storage layer, and data density per node? Exact heap
settings would be nice. In the logs look for TombstoneOverflowingException


On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've been
 periodically been reprocessing the data sources, which causes each time
 series to be overwritten, i.e. every row per partition key is deleted and
 re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing compaction
 types, i assumed were an artifact of tombstoning, but i fully admit to
 conjecture based on about ~20 blog posts and stackoverflow questions i've
 surveyed.

 I doubled the Heap on one node and it changed nothing regarding the load
 or the ParNew log statements. New Generation Usage is 50%, Eden itself is
 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447 [1],
 which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image: twitter.png]
 https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com wrote:

 I have a three node cluster that has been sitting at a load of 4 (for
 each node), 100% CPI utilization (although 92% nice) for that last 12
 hours, ever since some significant writes finished. I'm trying to determine
 what tuning I should be doing to get it out of this state. The debug log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make of
 compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 Sure the pending tasks went down by one, but the rest is identical.
 media_tracks_raw likely has a bunch of tombstones (can't figure out how to
 get stats on that).

 Is this behavior something that indicates that i need more Heap, larger
 new generation? Should I be manually running compaction on tables with lots
 of tombstones?

 Any suggestions or places to educate myself better on performance tuning
 would be appreciated.

 arne



-- 

[image: datastax_logo.png] http://www.datastax.com/

Ryan Svihla

Solution Architect

[image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
http://www.linkedin.com/pub/ryan-svihla/12/621/727/

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
might go c3.2xlarge instead if CPU is more important than RAM
Storage is EBS-optimized SSD (but iostat shows no real IO going on)
Each node only has about 10GB, with ownership of 67%, 64.7% and 68.3%.

On the node where I set the heap to 10GB from 6GB, the utilization has
dropped to 46% nice now, but the ParNew log messages still continue at the
same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
nice CPU further down.

No TombstoneOverflowingExceptions.

On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing compaction
 types, i assumed were an artifact of tombstoning, but i fully admit to
 conjecture based on about ~20 blog posts and stackoverflow questions i've
 surveyed.

 I doubled the Heap on one node and it changed nothing regarding the load
 or the ParNew log statements. New Generation Usage is 50%, Eden itself is
 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447 [1],
 which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image: twitter.png]
 https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4 (for
 each node), 100% CPI utilization (although 92% nice) for that last 12
 hours, ever since some significant writes finished. I'm trying to determine
 what tuning I should be doing to get it out of this state. The debug log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make of
 compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 Sure the pending tasks went down by one, but the rest is identical.
 media_tracks_raw likely has a bunch of tombstones (can't figure out how to
 get stats on that).

 Is this behavior something that indicates that i need more Heap, larger
 new 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
The others are 6GB

On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing compaction
 types, i assumed were an artifact of tombstoning, but i fully admit to
 conjecture based on about ~20 blog posts and stackoverflow questions i've
 surveyed.

 I doubled the Heap on one node and it changed nothing regarding the load
 or the ParNew log statements. New Generation Usage is 50%, Eden itself is
 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447 [1],
 which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image: twitter.png]
 https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4 (for
 each node), 100% CPI utilization (although 92% nice) for that last 12
 hours, ever since some significant writes finished. I'm trying to 
 determine
 what tuning I should be doing to get it out of this state. The debug log 
 is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
 (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
 (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java
 (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used;
 max is 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make
 of compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 Sure the pending tasks went down by one, but the rest is identical.

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
Checked my dev cluster to see if the ParNew log entries are just par for
the course, but not seeing them there. However, both have the following
every 30 seconds:

DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line
165) Started replayAllFailedBatches
DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
ColumnFamilyStore.java (line 866) forceFlush requested but everything is
clean in batchlog
DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line
200) Finished replayAllFailedBatches

Is that just routine scheduled house-keeping or a sign of something else?

On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
 The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing compaction
 types, i assumed were an artifact of tombstoning, but i fully admit to
 conjecture based on about ~20 blog posts and stackoverflow questions i've
 surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447
 [1], which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image: twitter.png]
 https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4 (for
 each node), 100% CPI utilization (although 92% nice) for that last 12
 hours, ever since some significant writes finished. I'm trying to 
 determine
 what tuning I should be doing to get it out of this state. The debug log 
 is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
 (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
 (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java
 (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used;
 max is 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make
 of compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction  

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a heap of that size without some tuning will create a number of problems
(high CPU usage being one of them). I suggest either an 8GB heap and 400MB
parnew (which I'd only set that low for that low a CPU count), or attempt the
tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150
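
Concretely, those two settings are the usual overrides in conf/cassandra-env.sh:

# conf/cassandra-env.sh -- set both, otherwise the auto-calculated defaults apply
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="400M"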

On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
 Checked my dev cluster to see if the ParNew log entries are just par for
 the course, but not seeing them there. However, both have the following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line
 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line
 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
 The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing compaction
 types, i assumed were an artifact of tombstoning, but i fully admit to
 conjecture based on about ~20 blog posts and stackoverflow questions i've
 surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden 
 itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447
 [1], which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you 
 think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image:
 twitter.png] https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4
 (for each node), 100% CPI utilization (although 92% nice) for that last 
 12
 hours, ever since some significant writes finished. I'm trying to 
 determine
 what tuning I should be doing to get it out of this state. The debug 
 log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
 (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; 
 max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
 (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; 
 max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java
 (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568
 used; max is 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make
 of compaction stats since they never change:

 [root@cassandra-37919c3a ~]# 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Also, based on the replayed batches... are you using batches to load data?

On Tue, Dec 16, 2014 at 3:12 PM, Ryan Svihla rsvi...@datastax.com wrote:

 So heap of that size without some tuning will create a number of problems
 (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew
 (which I'd only set that low for that low cpu count) , or attempt the
 tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
 Checked my dev cluster to see if the ParNew log entries are just par for
 the course, but not seeing them there. However, both have the following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
 The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is 
 deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing
 compaction types, i assumed were an artifact of tombstoning, but i fully
 admit to conjecture based on about ~20 blog posts and stackoverflow
 questions i've surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden 
 itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447
 [1], which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you 
 think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image:
 twitter.png] https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax
 https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4
 (for each node), 100% CPI utilization (although 92% nice) for that 
 last 12
 hours, ever since some significant writes finished. I'm trying to 
 determine
 what tuning I should be doing to get it out of this state. The debug 
 log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
 (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; 
 max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
 (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; 
 max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java
 (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568
 used; max is 8000634880

 iostat shows virtually no 

Best Time Series insert strategy

2014-12-16 Thread Arne Claassen
I have a time series table consisting of frame information for media. The
table is partitioned on the media ID and uses time and some other frame
level keys as clustering keys, i.e. all frames for one piece of media are
really one column family row, even though it is represented in CQL as an
ordered series of frame data. These sets vary from 5k to 200k rows per
media and are always inserted in one go, with the data available in memory
in ordered form. I'm currently fanning the inserts out via async
calls, using a queue to cap the max parallelism (set to 100 right now).

For some of the larger sets (50k and above) I sometimes get the following
exception:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
timeout during write query at consistency ONE (1 replica were required but
only 0 acknowledged the write)
at
com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.Responses$Error.asException(Responses.java:93)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:237)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:402)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]


I've tried reducing the max parallelism and increasing the timeout
threshold, but once the cluster gets humming from a bunch of inserts, even
going as low as 10 in parallel doesn't seem to completely avoid those
exceptions.

I realize that fanning out just means that previously ordered data is now
arriving at random nodes in random order and has to get to the partition
key owning nodes and be re-ordered as it arrives, which seems like the
wrong way to do it. However, the parallelism approach does increase
insert speed almost linearly, except for those timeouts.

I'm wondering what the best approach would be. The scenarios I can think of
are:

1) Retry and back off on Timeout Exceptions, but keep the fan out approach.

Seems like a good approach unless the Timeout really is just a warning that
I'm overloading things

2) Switch to BATCH inserts

Would this be better, since the data would go to only a single node and be
inserted in ordered form? And would this even alleviate timeouts, since now
giant batches need to be acknowledged by the replicas?
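
For this option the sketch I'd try is an UNLOGGED batch per partition key,
chunked so a 200k-row partition doesn't become one giant batch (the chunk size
here is a guess to be tuned):

import java.util.List;
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;

public final class PartitionBatchWriter {
    private static final int ROWS_PER_BATCH = 100;      // illustrative; tune against timeouts

    // All statements passed in are assumed to target the same partition key,
    // so the UNLOGGED batch stays on one replica set and skips the batch log.
    public static void writePartition(Session session, List<BoundStatement> rowsOfOnePartition) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        int inBatch = 0;
        for (BoundStatement row : rowsOfOnePartition) {
            batch.add(row);
            if (++inBatch == ROWS_PER_BATCH) {
                session.execute(batch);
                batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            session.execute(batch);
        }
    }
}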

3) Go to consistency ANY.

The docs seem to imply that TimeoutException isn't really a failure, just a
heads up. I don't really care about waiting for all replicas to be up to
date on these inserts anyhow, but is it really safe, or am I looking at
replicas drifting out of sync?
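
Setting this per statement is simple enough; a sketch, reusing the same
placeholder columns as the fan-out example above:

import java.util.Date;
import java.util.UUID;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public final class AnyConsistencyWriter {
    private final Session session;
    private final PreparedStatement insert;

    public AnyConsistencyWriter(Session session) {
        this.session = session;
        this.insert = session.prepare(
            "INSERT INTO media.media_tracks_raw (id, trackid, timestamp, payload) VALUES (?, ?, ?, ?)");
    }

    public void write(UUID mediaId, String trackId, Date ts, String payload) {
        BoundStatement bs = insert.bind(mediaId, trackId, ts, payload);
        // ANY: the coordinator acks even if no replica responds in time and may only
        // store a hint, so a read right after the write is not guaranteed to see it.
        bs.setConsistencyLevel(ConsistencyLevel.ANY);
        session.executeAsync(bs);
    }
}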

4) Figure out how to tune my cluster better and change nothing on the client

thanks,
arne


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
The starting configuration I had, which is still running on two of the
nodes, was a 6GB heap with 1024MB ParNew, which is close to what you are
suggesting, and those nodes have been pegged at a load of 4 for over 12 hours
with hardly any read or write traffic. I will set one to 8GB/400MB and see if
its load changes.

On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla rsvi...@datastax.com wrote:

 So heap of that size without some tuning will create a number of problems
 (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew
 (which I'd only set that low for that low cpu count) , or attempt the
 tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
 Checked my dev cluster to see if the ParNew log entries are just par for
 the course, but not seeing them there. However, both have the following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
 The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is 
 deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing
 compaction types, i assumed were an artifact of tombstoning, but i fully
 admit to conjecture based on about ~20 blog posts and stackoverflow
 questions i've surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden 
 itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447
 [1], which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you 
 think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image:
 twitter.png] https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax
 https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4
 (for each node), 100% CPI utilization (although 92% nice) for that 
 last 12
 hours, ever since some significant writes finished. I'm trying to 
 determine
 what tuning I should be doing to get it out of this state. The debug 
 log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
 (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; 
 max
 is 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
 (line 118) GC for ParNew: 165 ms for 10 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough
to run Cassandra well in, especially if you're going full bore on loads.
However, you may just flat out be CPU bound on your write throughput; how
many TPS and what size writes do you have? Also, what is your widest row?

Final question what is compaction throughput at?


On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen a...@emotient.com wrote:

 The starting configuration I had, which is still running on two of the
 nodes, was 6GB Heap, 1024MB parnew which is close to what you are
 suggesting and those have been pegged at load 4 for the over 12 hours with
 hardly and read or write traffic. I will set one to 8GB/400MB and see if
 its load changes.

 On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla rsvi...@datastax.com wrote:

 So heap of that size without some tuning will create a number of problems
 (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew
 (which I'd only set that low for that low cpu count) , or attempt the
 tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
 Checked my dev cluster to see if the ParNew log entries are just par for
 the course, but not seeing them there. However, both have the following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings 
 that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for 
 TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline,
 we've been periodically been reprocessing the data sources, which causes
 each time series to be overwritten, i.e. every row per partition key is
 deleted and re-written, so I assume i've been collecting a bunch of
 tombstones.

 Also, the presence of the ever present and never completing
 compaction types, i assumed were an artifact of tombstoning, but i fully
 admit to conjecture based on about ~20 blog posts and stackoverflow
 questions i've surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden 
 itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447
 [1], which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you 
 think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 [image: datastax_logo.png]

 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 [image: linkedin.png] http://www.linkedin.com/in/jlacefield/ [image:
 facebook.png] https://www.facebook.com/datastax [image:
 twitter.png] https://twitter.com/datastax [image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax
 https://github.com/datastax/

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen a...@emotient.com
 wrote:

 I have a three node cluster that has been sitting at a load of 4
 (for each node), 100% CPI utilization (although 92% nice) for that 
 last 12
 hours, ever since some 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Actually not sure why the machine was originally configured at 6GB since we
even started it on an r3.large with 15GB.

Re: Batches

Not using batches. I actually have that as a separate question on the list.
Currently I fan out async single inserts and I'm wondering if batches are
better since my data is inherently inserted in blocks of ordered rows for a
single partition key.


Re: Traffic

There isn't all that much traffic. Inserts come in as blocks per partition
key, but those can be 5k-200k rows for that partition key. Each of these
rows is less than 100k; it's small, but lots of ordered rows. It's frame and
sub-frame information for media, and the rows for one piece of media are
inserted at once (per partition key).

For the last 12 hours, while the load on all these machines has been stuck,
there's been virtually no traffic at all. The nodes are basically
sitting idle, except that they each have a load of 4.

BTW, how do you determine the widest row or, for that matter, the number of
tombstones in a row?

thanks,
arne

On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla rsvi...@datastax.com wrote:

 So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
 enough to run Cassandra well in, especially if you're going full bore on
 loads. However, you maybe just flat out be CPU bound on your write
 throughput, how many TPS and what size writes do you have? Also what is
 your widest row?

 Final question what is compaction throughput at?


 On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen a...@emotient.com wrote:

 The starting configuration I had, which is still running on two of the
 nodes, was 6GB Heap, 1024MB parnew which is close to what you are
 suggesting and those have been pegged at load 4 for the over 12 hours with
 hardly and read or write traffic. I will set one to 8GB/400MB and see if
 its load changes.

 On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 So heap of that size without some tuning will create a number of
 problems (high cpu usage one of them), I suggest either 8GB heap and 400mb
 parnew (which I'd only set that low for that low cpu count) , or attempt
 the tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com
 wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
 now. Checked my dev cluster to see if the ParNew log entries are just par
 for the course, but not seeing them there. However, both have the following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something
 else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because
 we might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization
 has dropped to 46%nice now, but the ParNew log messages still continue at
 the same pace. I'm gonna up the HEAP to 20GB for a bit, see if that 
 brings
 that nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact
 heap settings would be nice. In the logs look for
 TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline,
 we've been periodically been reprocessing the data sources, which 
 causes
 each time series to be overwritten, i.e. every row per partition key is
 deleted and re-written, so I assume i've been collecting a bunch of
 tombstones.

 Also, the presence of the ever present and never completing
 compaction types, i assumed were an artifact of tombstoning, but i 
 fully
 admit to conjecture based on about ~20 blog posts and stackoverflow
 questions i've surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden 
 itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

 Hello,

   What version of Cassandra are you 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Can you define what "virtually no traffic" means? Sorry to be repetitive about
that, but I've worked on a lot of clusters in the past year and people have
wildly different ideas of what that means.

Unlogged batches for the same partition key are definitely a performance
optimization. Typically async is much faster and easier on the cluster when
you would otherwise be using multi-partition-key batches.

nodetool cfhistograms keyspace tablename

On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen a...@emotient.com wrote:

 Actually not sure why the machine was originally configured at 6GB since
 we even started it on an r3.large with 15GB.

 Re: Batches

 Not using batches. I actually have that as a separate question on the
 list. Currently I fan out async single inserts and I'm wondering if batches
 are better since my data is inherently inserted in blocks of ordered rows
 for a single partition key.


 Re: Traffic

 There isn't all that much traffic. Inserts come in as blocks per partition
 key, but then can be 5k-200k rows for that partition key. Each of these
 rows is less than 100k. It's small, lots of ordered rows. It's frame and
 sub-frame information for media. and rows for one piece of media is
 inserted at once (the partition key).

 For the last 12 hours, where the load on all these machine has been stuck
 there's been virtually no traffic at all. This is the nodes basically
 sitting idle, except that they had  load of 4 each.

 BTW, how do you determine widest row or for that matter number of
 tombstones in a row?

 thanks,
 arne

 On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla rsvi...@datastax.com wrote:

 So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
 enough to run Cassandra well in, especially if you're going full bore on
 loads. However, you maybe just flat out be CPU bound on your write
 throughput, how many TPS and what size writes do you have? Also what is
 your widest row?

 Final question what is compaction throughput at?


 On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen a...@emotient.com wrote:

 The starting configuration I had, which is still running on two of the
 nodes, was 6GB Heap, 1024MB parnew which is close to what you are
 suggesting and those have been pegged at load 4 for the over 12 hours with
 hardly and read or write traffic. I will set one to 8GB/400MB and see if
 its load changes.

 On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 So heap of that size without some tuning will create a number of
 problems (high cpu usage one of them), I suggest either 8GB heap and 400mb
 parnew (which I'd only set that low for that low cpu count) , or attempt
 the tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com
 wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
 now. Checked my dev cluster to see if the ParNew log entries are just par
 for the course, but not seeing them there. However, both have the 
 following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something
 else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because
 we might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization
 has dropped to 46%nice now, but the ParNew log messages still continue 
 at
 the same pace. I'm gonna up the HEAP to 20GB for a bit, see if that 
 brings
 that nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact
 heap settings would be nice. In the logs look for
 TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com
 wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline,
 we've been periodically been reprocessing the data sources, which 
 causes
 each time series to be overwritten, i.e. every row per partition key 
 is
 deleted and re-written, so I assume i've been collecting a bunch of
 tombstones.

 Also, the presence of the ever present and never completing
 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
No problem with the follow up questions. I'm on a crash course here trying
to understand what makes C* tick so I appreciate all feedback.

We reprocessed all media (1200 partition keys) last night where partition
keys had somewhere between 4k and 200k rows. After that completed, no
traffic went to the cluster at all for ~8 hours, and throughout today we may
get a couple (fewer than 10) queries per second and maybe 3-4 write batches
per hour.

I assume the last value in the Partition Size histogram is the largest row:

20924300 bytes: 79
25109160 bytes: 57

The majority seems clustered around 20 bytes.

I will look at switching my inserts to unlogged batches since they are
always for one partition key.

On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla rsvi...@datastax.com wrote:

 Can you define what is virtual no traffic sorry to be repetitive about
 that, but I've worked on a lot of clusters in the past year and people have
 wildly different ideas what that means.

 unlogged batches of the same partition key are definitely a performance
 optimization. Typically async is much faster and easier on the cluster when
 you're using multip partition key batches.

 nodetool cfhistograms keyspace tablename

 On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen a...@emotient.com wrote:

 Actually not sure why the machine was originally configured at 6GB since
 we even started it on an r3.large with 15GB.

 Re: Batches

 Not using batches. I actually have that as a separate question on the
 list. Currently I fan out async single inserts and I'm wondering if batches
 are better since my data is inherently inserted in blocks of ordered rows
 for a single partition key.


 Re: Traffic

 There isn't all that much traffic. Inserts come in as blocks per
 partition key, but then can be 5k-200k rows for that partition key. Each of
 these rows is less than 100k. It's small, lots of ordered rows. It's frame
 and sub-frame information for media. and rows for one piece of media is
 inserted at once (the partition key).

 For the last 12 hours, where the load on all these machine has been stuck
 there's been virtually no traffic at all. This is the nodes basically
 sitting idle, except that they had  load of 4 each.

 BTW, how do you determine widest row or for that matter number of
 tombstones in a row?

 thanks,
 arne

 On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
 enough to run Cassandra well in, especially if you're going full bore on
 loads. However, you maybe just flat out be CPU bound on your write
 throughput, how many TPS and what size writes do you have? Also what is
 your widest row?

 Final question what is compaction throughput at?


 On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen a...@emotient.com
 wrote:

 The starting configuration I had, which is still running on two of the
 nodes, was 6GB Heap, 1024MB parnew which is close to what you are
 suggesting and those have been pegged at load 4 for the over 12 hours with
 hardly and read or write traffic. I will set one to 8GB/400MB and see if
 its load changes.

 On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 So heap of that size without some tuning will create a number of
 problems (high cpu usage one of them), I suggest either 8GB heap and 400mb
 parnew (which I'd only set that low for that low cpu count) , or attempt
 the tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com
 wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
 now. Checked my dev cluster to see if the ParNew log entries are just par
 for the course, but not seeing them there. However, both have the 
 following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something
 else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because
 we might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization
 has dropped to 46%nice now, but the ParNew log messages still continue 
 at
 the same pace. I'm gonna up the HEAP to 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
OK, based on those numbers I have a theory.

Can you show me nodetool tpstats for all 3 nodes?

On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen a...@emotient.com wrote:

 No problem with the follow up questions. I'm on a crash course here trying
 to understand what makes C* tick so I appreciate all feedback.

 We reprocessed all media (1200 partition keys) last night where partition
 keys had somewhere between 4k and 200k rows. After that completed, no
 traffic went to cluster at all for ~8 hours and throughout today, we may
 get a couple (less than 10) queries per second and maybe 3-4 write batches
 per hour.

 I assume the last value in the Partition Size histogram is the largest row:

 20924300 bytes: 79
 25109160 bytes: 57

 The majority seems clustered around 20 bytes.

 I will look at switching my inserts to unlogged batches since they are
 always for one partition key.

 On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla rsvi...@datastax.com wrote:

 Can you define what is virtual no traffic sorry to be repetitive about
 that, but I've worked on a lot of clusters in the past year and people have
 wildly different ideas what that means.

 unlogged batches of the same partition key are definitely a performance
 optimization. Typically async is much faster and easier on the cluster when
 you're using multip partition key batches.

 nodetool cfhistograms keyspace tablename

 On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen a...@emotient.com wrote:

 Actually not sure why the machine was originally configured at 6GB since
 we even started it on an r3.large with 15GB.

 Re: Batches

 Not using batches. I actually have that as a separate question on the
 list. Currently I fan out async single inserts and I'm wondering if batches
 are better since my data is inherently inserted in blocks of ordered rows
 for a single partition key.


 Re: Traffic

 There isn't all that much traffic. Inserts come in as blocks per
 partition key, but then can be 5k-200k rows for that partition key. Each of
 these rows is less than 100k. It's small, lots of ordered rows. It's frame
 and sub-frame information for media. and rows for one piece of media is
 inserted at once (the partition key).

 For the last 12 hours, where the load on all these machine has been
 stuck there's been virtually no traffic at all. This is the nodes basically
 sitting idle, except that they had  load of 4 each.

 BTW, how do you determine widest row or for that matter number of
 tombstones in a row?

 thanks,
 arne

 On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
 enough to run Cassandra well in, especially if you're going full bore on
 loads. However, you maybe just flat out be CPU bound on your write
 throughput, how many TPS and what size writes do you have? Also what is
 your widest row?

 Final question what is compaction throughput at?


 On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen a...@emotient.com
 wrote:

 The starting configuration I had, which is still running on two of the
 nodes, was 6GB Heap, 1024MB parnew which is close to what you are
 suggesting and those have been pegged at load 4 for the over 12 hours with
 hardly and read or write traffic. I will set one to 8GB/400MB and see if
 its load changes.

 On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 So heap of that size without some tuning will create a number of
 problems (high cpu usage one of them), I suggest either 8GB heap and 
 400mb
 parnew (which I'd only set that low for that low cpu count) , or attempt
 the tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen a...@emotient.com
 wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
 now. Checked my dev cluster to see if the ParNew log entries are just 
 par
 for the course, but not seeing them there. However, both have the 
 following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something
 else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen a...@emotient.com
 wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com
 wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB
 because we might go c3.2xlarge instead if CPU is more important than 
 RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7%  68.3%.

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Of course QA decided to start a test batch (still relatively low traffic),
so I hope it doesn't throw the tpstats off too much

Node 1:
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 0   13804928 0
0
ReadStage 0 0  10975 0
0
RequestResponseStage  0 07725378 0
0
ReadRepairStage   0 0   1247 0
0
ReplicateOnWriteStage 0 0  0 0
0
MiscStage 0 0  0 0
0
HintedHandoff 1 1 50 0
0
FlushWriter   0 0306 0
   31
MemoryMeter   0 0719 0
0
GossipStage   0 0 286505 0
0
CacheCleanupExecutor  0 0  0 0
0
InternalResponseStage 0 0  0 0
0
CompactionExecutor414159 0
0
ValidationExecutor0 0  0 0
0
MigrationStage0 0  0 0
0
commitlog_archiver0 0  0 0
0
AntiEntropyStage  0 0  0 0
0
PendingRangeCalculator0 0 11 0
0
MemtablePostFlusher   0 0   1781 0
0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION391041
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

Node 2:
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 0 997042 0
0
ReadStage 0 0   2623 0
0
RequestResponseStage  0 0 706650 0
0
ReadRepairStage   0 0275 0
0
ReplicateOnWriteStage 0 0  0 0
0
MiscStage 0 0  0 0
0
HintedHandoff 2 2 12 0
0
FlushWriter   0 0 37 0
4
MemoryMeter   0 0 70 0
0
GossipStage   0 0  14927 0
0
CacheCleanupExecutor  0 0  0 0
0
InternalResponseStage 0 0  0 0
0
CompactionExecutor4 7 94 0
0
ValidationExecutor0 0  0 0
0
MigrationStage0 0  0 0
0
commitlog_archiver0 0  0 0
0
AntiEntropyStage  0 0  0 0
0
PendingRangeCalculator0 0  3 0
0
MemtablePostFlusher   0 0114 0
0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

Node 3:
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 01539324 0
0
ReadStage 0 0   2571 0
0
RequestResponseStage  0 0 373300 0
0
ReadRepairStage   0 0325 0
0
ReplicateOnWriteStage 0 0  0 0
0
MiscStage 0 0  0 0
0
HintedHandoff 1 1 21 0
0
FlushWriter   0 0 38 0
5
MemoryMeter   0 0 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So you've got some blocked flush writers, but you have an incredibly large
number of dropped mutations. Are you using secondary indexes, and if so, how
many? What is your flush queue set to?

On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen a...@emotient.com wrote:

 Of course QA decided to start a test batch (still relatively low traffic),
 so I hope it doesn't throw the tpstats off too much

 Node 1:
 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 MutationStage 0 0   13804928 0
 0
 ReadStage 0 0  10975 0
 0
 RequestResponseStage  0 07725378 0
 0
 ReadRepairStage   0 0   1247 0
 0
 ReplicateOnWriteStage 0 0  0 0
 0
 MiscStage 0 0  0 0
 0
 HintedHandoff 1 1 50 0
 0
 FlushWriter   0 0306 0
31
 MemoryMeter   0 0719 0
 0
 GossipStage   0 0 286505 0
 0
 CacheCleanupExecutor  0 0  0 0
 0
 InternalResponseStage 0 0  0 0
 0
 CompactionExecutor414159 0
 0
 ValidationExecutor0 0  0 0
 0
 MigrationStage0 0  0 0
 0
 commitlog_archiver0 0  0 0
 0
 AntiEntropyStage  0 0  0 0
 0
 PendingRangeCalculator0 0 11 0
 0
 MemtablePostFlusher   0 0   1781 0
 0

 Message type   Dropped
 READ 0
 RANGE_SLICE  0
 _TRACE   0
 MUTATION391041
 COUNTER_MUTATION 0
 BINARY   0
 REQUEST_RESPONSE 0
 PAGED_RANGE  0
 READ_REPAIR  0

 Node 2:
 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 MutationStage 0 0 997042 0
 0
 ReadStage 0 0   2623 0
 0
 RequestResponseStage  0 0 706650 0
 0
 ReadRepairStage   0 0275 0
 0
 ReplicateOnWriteStage 0 0  0 0
 0
 MiscStage 0 0  0 0
 0
 HintedHandoff 2 2 12 0
 0
 FlushWriter   0 0 37 0
 4
 MemoryMeter   0 0 70 0
 0
 GossipStage   0 0  14927 0
 0
 CacheCleanupExecutor  0 0  0 0
 0
 InternalResponseStage 0 0  0 0
 0
 CompactionExecutor4 7 94 0
 0
 ValidationExecutor0 0  0 0
 0
 MigrationStage0 0  0 0
 0
 commitlog_archiver0 0  0 0
 0
 AntiEntropyStage  0 0  0 0
 0
 PendingRangeCalculator0 0  3 0
 0
 MemtablePostFlusher   0 0114 0
 0

 Message type   Dropped
 READ 0
 RANGE_SLICE  0
 _TRACE   0
 MUTATION 0
 COUNTER_MUTATION 0
 BINARY   0
 REQUEST_RESPONSE 0
 PAGED_RANGE  0
 READ_REPAIR  0

 Node 3:
 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 MutationStage 0 01539324 0
 0
 ReadStage 0 0   2571 0
 0
 RequestResponseStage  0 0 373300 0
 0
 ReadRepairStage   0 0325 0
 0
 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Not using any secondary indices, and memtable_flush_queue_size is the default 4.

But let me tell you how data is mutated right now; maybe that will give you
an insight into how this is happening.

Basically the frame data table has the following primary key: PRIMARY KEY 
((id), trackid, timestamp)

Generally data is inserted once, so day-to-day writes are all new rows.
However, when our process for generating analytics for these rows changes, we
run the media back through again, causing overwrites.

Up until last night, this was just a new insert because the PK never changed, so
it was always a 1-to-1 overwrite of every row.

Last night was the first time that a change went in where the PK could
actually change, so now the process is always: DELETE by partition key, insert
all rows for the partition key, repeat.
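
In driver terms the new flow is roughly this (a sketch; the "payload" column
stands in for the real frame columns). The DELETE writes one partition-level
tombstone that is kept for gc_grace_seconds:

import java.util.Date;
import java.util.List;
import java.util.UUID;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public final class Reprocessor {
    private final Session session;
    private final PreparedStatement deletePartition;
    private final PreparedStatement insertRow;

    public Reprocessor(Session session) {
        this.session = session;
        this.deletePartition = session.prepare(
            "DELETE FROM media.media_tracks_raw WHERE id = ?");
        this.insertRow = session.prepare(
            "INSERT INTO media.media_tracks_raw (id, trackid, timestamp, payload) VALUES (?, ?, ?, ?)");
    }

    public void reprocess(UUID mediaId, List<Frame> frames) {
        // One partition-level tombstone per reprocessed media id, retained for gc_grace_seconds.
        session.execute(deletePartition.bind(mediaId));
        for (Frame f : frames) {
            session.execute(insertRow.bind(mediaId, f.trackId, f.timestamp, f.payload));
        }
    }

    public static final class Frame {
        final String trackId;
        final Date timestamp;
        final String payload;
        public Frame(String trackId, Date timestamp, String payload) {
            this.trackId = trackId;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }
}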

We have two tables with similar frame data projections and some other
aggregates with a much smaller row count per partition key.

hope that helps,
arne

On Dec 16, 2014, at 2:46 PM, Ryan Svihla rsvi...@datastax.com wrote:

 so you've got some blocked flush writers but you have a incredibly large 
 number of dropped mutations, are you using secondary indexes? and if so how 
 many? what is your flush queue set to?
 
 On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen a...@emotient.com wrote:
 Of course QA decided to start a test batch (still relatively low traffic), so 
 I hope it doesn't throw the tpstats off too much
 
 Node 1:
 Pool NameActive   Pending  Completed   Blocked  All 
 time blocked
 MutationStage 0 0   13804928 0
  0
 ReadStage 0 0  10975 0
  0
 RequestResponseStage  0 07725378 0
  0
 ReadRepairStage   0 0   1247 0
  0
 ReplicateOnWriteStage 0 0  0 0
  0
 MiscStage 0 0  0 0
  0
 HintedHandoff 1 1 50 0
  0
 FlushWriter   0 0306 0
 31
 MemoryMeter   0 0719 0
  0
 GossipStage   0 0 286505 0
  0
 CacheCleanupExecutor  0 0  0 0
  0
 InternalResponseStage 0 0  0 0
  0
 CompactionExecutor414159 0
  0
 ValidationExecutor0 0  0 0
  0
 MigrationStage0 0  0 0
  0
 commitlog_archiver0 0  0 0
  0
 AntiEntropyStage  0 0  0 0
  0
 PendingRangeCalculator0 0 11 0
  0
 MemtablePostFlusher   0 0   1781 0
  0
 
 Message type   Dropped
 READ 0
 RANGE_SLICE  0
 _TRACE   0
 MUTATION391041
 COUNTER_MUTATION 0
 BINARY   0
 REQUEST_RESPONSE 0
 PAGED_RANGE  0
 READ_REPAIR  0
 
 Node 2:
 Pool NameActive   Pending  Completed   Blocked  All 
 time blocked
 MutationStage 0 0 997042 0
  0
 ReadStage 0 0   2623 0
  0
 RequestResponseStage  0 0 706650 0
  0
 ReadRepairStage   0 0275 0
  0
 ReplicateOnWriteStage 0 0  0 0
  0
 MiscStage 0 0  0 0
  0
 HintedHandoff 2 2 12 0
  0
 FlushWriter   0 0 37 0
  4
 MemoryMeter   0 0 70 0
  0
 GossipStage   0 0  14927 0
  0
 CacheCleanupExecutor  0 0  0 0
  0
 InternalResponseStage 0 0  0 0
  0
 CompactionExecutor4 7 94 0
  0
 ValidationExecutor0 0  0 0
  

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a delete is really another write for gc_grace_seconds (default 10 days);
if you get enough tombstones it can make managing your cluster a challenge on
its own. Open up cqlsh, turn on tracing and try a few queries: how many
tombstones are scanned for a given query? It's possible the heap problems
you're seeing are actually happening on the query side and not on the ingest
side. The severity of this depends on the driver and Cassandra version, but
older drivers and versions of Cassandra could easily overload the heap with
expensive selects; layered over tombstones, it certainly becomes a
possibility that this is your root cause.
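
If it's easier than cqlsh, the same check can be done from the Java driver by
enabling tracing on a statement and printing the trace events, then looking
for events that mention tombstoned cells (the exact wording varies by
Cassandra version). A rough sketch:

import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public final class TraceProbe {
    // Runs a query with tracing enabled and prints every trace event so you can
    // see how many sstables were touched and how many tombstones were scanned.
    public static void trace(Session session, String cql) {
        SimpleStatement stmt = new SimpleStatement(cql);
        stmt.enableTracing();
        ResultSet rs = session.execute(stmt);
        QueryTrace queryTrace = rs.getExecutionInfo().getQueryTrace();
        for (QueryTrace.Event event : queryTrace.getEvents()) {
            System.out.println(event.getDescription());
        }
    }
}

e.g. TraceProbe.trace(session, "select * from media_tracks_raw where id = 74fe9449-8ac4-accb-a723-4bad024101e3 limit 100");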

Now this will primarily create more load on compaction, and depending on
your Cassandra version there may be some other issue at work, but something
I can tell you is that every time I see 1 dropped mutation I see a cluster that
was overloaded enough that it had to shed load. If I see 200k, I see a
cluster/configuration/hardware setup that is badly overloaded.

I suggest the following

   - trace some of the queries used in prod
   - monitor your ingest rate, see at what levels you run into issues
   (GCInspector log messages, dropped mutations, etc)
   - heap configuration we mentioned earlier: go ahead and monitor heap
   usage; if it hits 75% repeatedly, this is an indication of heavy load
   - monitor dropped mutations: any dropped mutation is evidence of an
   overloaded server; again, the root cause can be many other problems that are
   solvable with current hardware, and LOTS of people run with nodes with
   similar configuration.


On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen a...@emotient.com wrote:

 Not using any secondary indicies and memtable_flush_queue_size is the
 default 4.

 But let me tell you how data is mutated right now, maybe that will give
 you an insight on how this is happening

 Basically the frame data table has the following primary key: PRIMARY KEY
 ((id), trackid, timestamp)

 Generally data is inserted once. So day to day writes are all new rows.
 However, when out process for generating analytics for these rows changes,
 we run the media back through again, causing overwrites.

 Up until last night, this was just a new insert because the PK never
 changed so it was always 1-to-1 overwrite of every row.

 Last night was the first time that a new change went in where the PK could
 actually change so now the process is always, DELETE by partition key,
 insert all rows for partition key, repeat.

 We two tables that have similar frame data projections and some other
 aggregates with much smaller row count per partition key.

 hope that helps,
 arne

 On Dec 16, 2014, at 2:46 PM, Ryan Svihla rsvi...@datastax.com wrote:

 so you've got some blocked flush writers but you have a incredibly large
 number of dropped mutations, are you using secondary indexes? and if so how
 many? what is your flush queue set to?

 On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen a...@emotient.com wrote:

 Of course QA decided to start a test batch (still relatively low
 traffic), so I hope it doesn't throw the tpstats off too much

 Node 1:
 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 MutationStage 0 0   13804928 0
   0
 ReadStage 0 0  10975 0
   0
 RequestResponseStage  0 07725378 0
   0
 ReadRepairStage   0 0   1247 0
   0
 ReplicateOnWriteStage 0 0  0 0
   0
 MiscStage 0 0  0 0
   0
 HintedHandoff 1 1 50 0
   0
 FlushWriter   0 0306 0
  31
 MemoryMeter   0 0719 0
   0
 GossipStage   0 0 286505 0
   0
 CacheCleanupExecutor  0 0  0 0
   0
 InternalResponseStage 0 0  0 0
   0
 CompactionExecutor414159 0
   0
 ValidationExecutor0 0  0 0
   0
 MigrationStage0 0  0 0
   0
 commitlog_archiver0 0  0 0
   0
 AntiEntropyStage  0 0  0 0
   0
 PendingRangeCalculator0 0 11 0
   0
 MemtablePostFlusher   0 0   1781 0
   0

 Message type   Dropped
 READ 0
 RANGE_SLICE  0
 _TRACE

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I just did a wide set of selects and ran across no tombstones. But while on the
subject of gc_grace_seconds, is there any reason, on a small cluster, not to set
it to something low like a single day? It seems like 10 days is only needed for
large clusters undergoing long partition splits, or am I misunderstanding
gc_grace_seconds?
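
For reference, I assume lowering it is just a table property change, roughly
like the sketch below (one day shown); correct me if that's the wrong knob:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public final class GcGraceTuning {
    public static void main(String[] args) {
        // Caveat: gc_grace_seconds must stay longer than the longest time a node
        // could be down without a repair, or deleted data can reappear on that node.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        session.execute("ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400");
        cluster.close();
    }
}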

Now, given all that, does any of this explain a high load when the cluster is 
idle? Is it compaction catching up and would manual forced compaction alleviate 
that?

thanks,
arne

On Dec 16, 2014, at 3:28 PM, Ryan Svihla rsvi...@datastax.com wrote:

 so a delete is really another write for gc_grace_seconds (default 10 days), 
 if you get enough tombstones it can make managing your cluster a challenge as 
 is. open up cqlsh, turn on tracing and try a few queries..how many tombstones 
 are scanned for a given query? It's possible the heap problems you're seeing 
 are actually happening on the query side and not on the ingest side, the 
 severity of this depends on driver and cassandra version, but older drivers 
 and versions of cassandra could easily overload heap with expensive selects, 
 when layered over tombstones it's certainly becomes a possibility this is 
 your root cause.
 
 Now this will primarily create more load on compaction and depending on your 
 cassandra version there maybe some other issue at work, but something I can 
 tell you is every time I see 1 dropped mutation I see a cluster that was 
 overloaded enough it had to shed load. If I see 200k I see a 
 cluster/configuration/hardware that is badly overloaded.
 
 I suggest the following
 trace some of the queries used in prod
 monitor your ingest rate, see at what levels you run into issues (GCInspector 
 log messages, dropped mutations, etc)
 heap configuration we mentioned earlier..go ahead and monitor heap usage, if 
 it hits 75% repeated this is an indication of heavy load
 monitor dropped mutations..any dropped mutation is evidence of an overloaded 
 server, again the root cause can be many other problems that are solvable 
 with current hardware, and LOTS of people runs with nodes with similar 
 configuration.
 
 On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen a...@emotient.com wrote:
 Not using any secondary indicies and memtable_flush_queue_size is the default 
 4.
 
 But let me tell you how data is mutated right now, maybe that will give you 
 an insight on how this is happening
 
 Basically the frame data table has the following primary key: PRIMARY KEY 
 ((id), trackid, timestamp)
 
 Generally data is inserted once. So day to day writes are all new rows.
 However, when out process for generating analytics for these rows changes, we 
 run the media back through again, causing overwrites.
 
 Up until last night, this was just a new insert because the PK never changed 
 so it was always 1-to-1 overwrite of every row.
 
 Last night was the first time that a new change went in where the PK could 
 actually change so now the process is always, DELETE by partition key, insert 
 all rows for partition key, repeat.
 
 We two tables that have similar frame data projections and some other 
 aggregates with much smaller row count per partition key.
 
 hope that helps,
 arne
 
 On Dec 16, 2014, at 2:46 PM, Ryan Svihla rsvi...@datastax.com wrote:
 
 so you've got some blocked flush writers but you have a incredibly large 
 number of dropped mutations, are you using secondary indexes? and if so how 
 many? what is your flush queue set to?
 
 On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen a...@emotient.com wrote:
 Of course QA decided to start a test batch (still relatively low traffic), 
 so I hope it doesn't throw the tpstats off too much
 
 Node 1:
 Pool NameActive   Pending  Completed   Blocked  All 
 time blocked
 MutationStage 0 0   13804928 0   
   0
 ReadStage 0 0  10975 0   
   0
 RequestResponseStage  0 07725378 0   
   0
 ReadRepairStage   0 0   1247 0   
   0
 ReplicateOnWriteStage 0 0  0 0   
   0
 MiscStage 0 0  0 0   
   0
 HintedHandoff 1 1 50 0   
   0
 FlushWriter   0 0306 0   
  31
 MemoryMeter   0 0719 0   
   0
 GossipStage   0 0 286505 0   
   0
 CacheCleanupExecutor  0 0  0 0   
   0
 InternalResponseStage 0 0  0 0   
   0
 CompactionExecutor414159 0   
   0
 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Manual forced compactions create more problems than they solve; if you have
no evidence of tombstones in your selects (which seems odd, can you share
some of the tracing output?), then I'm not sure what it would solve for you.

Compaction running could explain a high load. Log messages with ERROR, WARN,
or GCInspector are all meaningful there; I suggest searching Jira for your
version to see if there are any interesting bugs.



On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen a...@emotient.com wrote:

 I just did a wide set of selects and ran across no tombstones. But while
 on the subject of gc_grace_seconds, any reason, on a small cluster not to
 set it to something low like a single day. It seems like 10 days is only
 need to large clusters undergoing long partition splits, or am i
 misunderstanding gc_grace_seconds.

 Now, given all that, does any of this explain a high load when the cluster
 is idle? Is it compaction catching up and would manual forced compaction
 alleviate that?

 thanks,
 arne


 On Dec 16, 2014, at 3:28 PM, Ryan Svihla rsvi...@datastax.com wrote:

 so a delete is really another write for gc_grace_seconds (default 10
 days), if you get enough tombstones it can make managing your cluster a
 challenge as is. open up cqlsh, turn on tracing and try a few queries..how
 many tombstones are scanned for a given query? It's possible the heap
 problems you're seeing are actually happening on the query side and not on
 the ingest side, the severity of this depends on driver and cassandra
 version, but older drivers and versions of cassandra could easily overload
 heap with expensive selects, when layered over tombstones it's certainly
 becomes a possibility this is your root cause.

 Now this will primarily create more load on compaction and depending on
 your cassandra version there maybe some other issue at work, but something
 I can tell you is every time I see 1 dropped mutation I see a cluster that
 was overloaded enough it had to shed load. If I see 200k I see a
 cluster/configuration/hardware that is badly overloaded.

 I suggest the following

- trace some of the queries used in prod
- monitor your ingest rate, see at what levels you run into issues
(GCInspector log messages, dropped mutations, etc)
- heap configuration we mentioned earlier..go ahead and monitor heap
usage, if it hits 75% repeated this is an indication of heavy load
- monitor dropped mutations..any dropped mutation is evidence of an
overloaded server, again the root cause can be many other problems that are
solvable with current hardware, and LOTS of people runs with nodes with
similar configuration.


 On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen a...@emotient.com wrote:

 Not using any secondary indicies and memtable_flush_queue_size is the
 default 4.

 But let me tell you how data is mutated right now, maybe that will give
 you an insight on how this is happening

 Basically the frame data table has the following primary key: PRIMARY KEY
 ((id), trackid, timestamp)

 Generally data is inserted once. So day to day writes are all new rows.
 However, when out process for generating analytics for these rows
 changes, we run the media back through again, causing overwrites.

 Up until last night, this was just a new insert because the PK never
 changed so it was always 1-to-1 overwrite of every row.

 Last night was the first time that a new change went in where the PK
 could actually change so now the process is always, DELETE by partition
 key, insert all rows for partition key, repeat.

 We two tables that have similar frame data projections and some other
 aggregates with much smaller row count per partition key.

 hope that helps,
 arne

 On Dec 16, 2014, at 2:46 PM, Ryan Svihla rsvi...@datastax.com wrote:

 so you've got some blocked flush writers but you have a incredibly large
 number of dropped mutations, are you using secondary indexes? and if so how
 many? what is your flush queue set to?

 On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen a...@emotient.com wrote:

 Of course QA decided to start a test batch (still relatively low
 traffic), so I hope it doesn't throw the tpstats off too much

 Node 1:
 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 MutationStage 0 0   13804928 0
   0
 ReadStage 0 0  10975 0
   0
 RequestResponseStage  0 07725378 0
   0
 ReadRepairStage   0 0   1247 0
   0
 ReplicateOnWriteStage 0 0  0 0
   0
 MiscStage 0 0  0 0
   0
 HintedHandoff 1 1 50 0
   0
 FlushWriter   0 0306 0
   

Questions about bootrapping and compactions during bootstrapping

2014-12-16 Thread Donald Smith
Looking at the output of nodetool netstats I see that the bootstrapping node is
pulling from only two of the nine nodes currently in the datacenter. That
surprises me: I'd think the vnodes it pulls from would be randomly spread
across the existing nodes. We're using Cassandra 2.0.11 with 256 vnodes each.

I also notice that while bootstrapping, the node is quite busy doing
compactions. There are over 1000 pending compactions on the new node and it's
not finished bootstrapping. I'd think those would be unnecessary, since the
other nodes in the data center have zero pending compactions. Perhaps the
compactions explain why running du -hs /var/lib/cassandra/data on the new
node shows more disk space usage than on the old nodes.

Is it reasonable to do nodetool disableautocompaction on the bootstrapping 
node? Should that be the default???

If I start bootstrapping one node, it's not yet in the cluster but it decides 
which token ranges it owns and requests streams for that data. If  I then try 
to bootstrap a SECOND node concurrently, it will take over ownership of some 
token ranges from the first node. Will the first node then adjust what data it 
streams?

It seems to me the cassandra server needs to keep track of both the OLD token 
ranges and vnodes and the NEW ones.  I'm not convinced that running two 
bootstraps concurrently (starting the second one after several minutes of 
delay) is safe.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.commailto:dona...@audiencescience.com

[AudienceScience]



Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
That's just the thing. There is nothing in the logs except the constant ParNew 
collections like

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC 
for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high.

There's always some compaction going on just that one table, media_tracks_raw,
and those values rarely change (certainly the remaining time is meaningless)

pending tasks: 17
  compaction typekeyspace   table   completed   
total  unit  progress
   Compaction   mediamedia_tracks_raw   444294932  
1310653468 bytes33.90%
   Compaction   mediamedia_tracks_raw   131931354  
3411631999 bytes 3.87%
   Compaction   mediamedia_tracks_raw30308970 
23097672194 bytes 0.13%
   Compaction   mediamedia_tracks_raw   899216961  
1815591081 bytes49.53%
Active compaction remaining time :   0h27m56s

Here's a sample of a query trace:

 activity | timestamp | source | source_elapsed
----------+-----------+--------+---------------
 execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
 Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
 Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
 Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
 Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
 Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
 Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
 Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
 Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
 Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
 Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
 Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
 Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
 Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
 Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
 Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
 Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
 Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
 Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
 Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
 Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
 Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
 Bloom filter allows skipping sstable 1330 | 00:11:46,644 | 10.140.21.54 | 22441
 Bloom filter allows skipping sstable 1329 | 00:11:46,644 | 10.140.21.54 | 22452
 Bloom

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What version of Cassandra?
On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote:

 That's just the thing. There is nothing in the logs except the constant
 ParNew collections like

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634888

 But the load is staying continuously high.

 There's always some compaction on just that one table, media_tracks_raw
 going on and those values rarely changed (certainly the remaining time is
 meaningless)


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Cassandra 2.0.10 and Datastax Java Driver 2.1.1

On Dec 16, 2014, at 4:48 PM, Ryan Svihla rsvi...@datastax.com wrote:

 What version of Cassandra?
 
 On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote:
 That's just the thing. There is nothing in the logs except the constant 
 ParNew collections like
 
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) 
 GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888
 
 But the load is staying continuously high.
 
 There's always some compaction on just that one table, media_tracks_raw going 
 on and those values rarely changed (certainly the remaining time is 
 meaningless)
 

[Consistency on cqlsh command prompt]

2014-12-16 Thread nitin padalia
Hi,

When I set the consistency to QUORUM on the cqlsh command line, it says the
consistency is set to QUORUM:

cqlsh:testdb> CONSISTENCY QUORUM ;
Consistency level set to QUORUM.

However, when I check it back using the CONSISTENCY command at the prompt,
it says the consistency is 4, whereas it should be 2 since my replication
factor for the keyspace is 3:

cqlsh:testdb> CONSISTENCY ;
Current consistency level is 4.

Isn't QUORUM consistency calculated as (replication_factor / 2) + 1, where
replication_factor / 2 is rounded down?

If so, why is the consistency displayed as 4 when it should be 2
(floor(3 / 2) = 1, and 1 + 1 = 2)?
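
For example, here is what I expect, sketched in Python (the second part
assumes the DataStax Python driver that cqlsh is built on, so it is only a
guess at where the 4 might come from):

# Expected QUORUM replica count for my keyspace (RF = 3)
replication_factor = 3
print((replication_factor // 2) + 1)  # 2

# Or is the displayed "4" not a replica count at all, but the driver's
# numeric code for the QUORUM level?
from cassandra import ConsistencyLevel
print(ConsistencyLevel.QUORUM)        # 4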

I am using Cassandra version 2.1.2, cqlsh 5.0.1 and CQL spec 3.2.0.


Thanks in advance!
Nitin Padalia


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jens Rantil
Maybe checking which thread(s) are hogging the CPU would hint at what's going on? (see
http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/).
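
Something along these lines (an untested sketch) takes the Cassandra JVM pid
and the hot thread id reported by top -H, converts the thread id to the hex
"nid" form that jstack prints, and pulls out the matching stack:

import subprocess
import sys

jvm_pid = int(sys.argv[1])  # Cassandra JVM pid
tid = int(sys.argv[2])      # busy thread id from top -H (decimal)

nid = hex(tid)              # jstack reports the same id as nid=0x...
stacks = subprocess.check_output(['jstack', str(jvm_pid)]).decode()
for block in stacks.split('\n\n'):
    if ('nid=' + nid) in block:
        print(block)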

On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen a...@emotient.com wrote:

 Cassandra 2.0.10 and Datastax Java Driver 2.1.1
 On Dec 16, 2014, at 4:48 PM, Ryan Svihla rsvi...@datastax.com wrote:
 What version of Cassandra?
 
 On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote:
 That's just the thing. There is nothing in the logs except the constant 
 ParNew collections like
 
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) 
 GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888
 
 But the load is staying continuously high.
 
 There's always some compaction on just that one table, media_tracks_raw 
 going on and those values rarely changed (certainly the remaining time is 
 meaningless)
 
 pending tasks: 17
   compaction typekeyspace   table   completed
total  unit  progress
Compaction   mediamedia_tracks_raw   444294932
   1310653468 bytes33.90%
Compaction   mediamedia_tracks_raw   131931354
   3411631999 bytes 3.87%
Compaction   mediamedia_tracks_raw30308970
  23097672194 bytes 0.13%
Compaction   mediamedia_tracks_raw   899216961
   1815591081 bytes49.53%
 Active compaction remaining time :   0h27m56s
 
 Here's a sample of a query trace:
 
  activity
  | timestamp| source| source_elapsed
 --+--+---+
  
   execute_cql3_query | 00:11:46,612 | 10.140.22.236 |  0
  Parsing select * from media_tracks_raw where id 
 =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 
 10.140.22.236 | 47
  
  Preparing statement | 00:11:46,612 | 10.140.22.236 |234
  Sending 
 message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 |   7190
  Message 
 received from /10.140.22.236 | 00:11:46,622 |  10.140.21.54 | 12
  Executing single-partition 
 query on media_tracks_raw | 00:11:46,644 |  10.140.21.54 |  21971
  
 Acquiring sstable references | 00:11:46,644 |  10.140.21.54 |  22029
   
 Merging memtable tombstones | 00:11:46,644 |  10.140.21.54 |  22131
 Bloom filter allows 
 skipping sstable 1395 | 00:11:46,644 |  10.140.21.54 |  22245
 Bloom filter allows 
 skipping sstable 1394 | 00:11:46,644 |  10.140.21.54 |  22279
 Bloom filter allows 
 skipping sstable 1391 | 00:11:46,644 |  10.140.21.54 |  22293
 Bloom filter allows 
 skipping sstable 1381 | 00:11:46,644 |  10.140.21.54 |  22304
 Bloom filter allows 
 skipping sstable 1376 | 00:11:46,644 |  10.140.21.54 |  22317
 Bloom filter allows 
 skipping sstable 1368 | 00:11:46,644 |  10.140.21.54 |  22328
 Bloom filter allows 
 skipping sstable 1365 | 00:11:46,644 |  10.140.21.54 |  22340
 Bloom filter allows 
 skipping sstable 1351 | 00:11:46,644 |  10.140.21.54 |  22352
 Bloom filter allows 
 skipping sstable 1367 | 00:11:46,644 |  10.140.21.54 |  22363
 Bloom filter allows 
 skipping sstable 1380 | 00:11:46,644 |  10.140.21.54 |  22374
 Bloom filter allows 
 skipping sstable 1343 | 00:11:46,644 |  10.140.21.54 |  22386
 Bloom filter allows 
 skipping sstable 1342 | 00:11:46,644 |  10.140.21.54 |  22397
 Bloom filter allows 
 skipping sstable 1334 | 00:11:46,644 |