Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-10-29 Thread Duncan Sands
Hi Peter, are you using the hsha RPC server type on this node?  If you are, then 
it looks like rpc_max_threads threads will be allocated on startup in 2.0.11 
while this wasn't the case before.  This can exhaust your heap if the value of 
rpc_max_threads is too large (e.g. if you use the default).
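
A quick way to check whether a node is exposed (a rough sketch, assuming the
stock Debian package layout with the config in /etc/cassandra/cassandra.yaml;
the output below is just an example):

$ grep -E '^[# ]*(rpc_server_type|rpc_max_threads)' /etc/cassandra/cassandra.yaml
rpc_server_type: hsha
# rpc_max_threads: 2048
# rpc_server_type set to hsha with rpc_max_threads left commented out
# (i.e. at its unbounded default) appears to be the combination that
# triggers the huge allocation on 2.0.11 startup.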


Ciao, Duncan.

On 29/10/14 01:08, Peter Haggerty wrote:

On a 3 node test cluster we recently upgraded one node from 2.0.10 to
2.0.11. This is a cluster that had been happily running 2.0.10 for
weeks and that has very little load and very capable hardware. The
upgrade was just your typical package upgrade:

$ dpkg -s cassandra | egrep '^Ver|^Main'
Maintainer: Eric Evans eev...@apache.org
Version: 2.0.11

Immediately after it started, it ran a couple of ParNews and then started
executing CMS runs. In 10 minutes the node had become unreachable and
was marked as down by the two other nodes in the ring, which are still
on 2.0.10.

We have jstack output and the server logs but nothing seems to be
jumping out. Has anyone else run into this? What should we be looking
for?


Peter





Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-10-29 Thread Peter Haggerty
That definitely appears to be the issue. Thanks for pointing that out!

https://issues.apache.org/jira/browse/CASSANDRA-8116
It looks like 2.0.12 will check for the default and throw an exception
(thanks Mike Adamson) and also includes a bit more text in the config
file. I'm thinking that 2.0.12 should be pushed out sooner rather than
later, though, as anyone using hsha with the default settings will simply
have their cluster stop working a few minutes after the upgrade, without
any indication of the actual problem.
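
For anyone else who hits this before 2.0.12 is out, a possible workaround
(a sketch based on the ticket discussion, not something I've tested here;
it assumes the yaml still has the stock commented-out rpc_max_threads line)
is to cap rpc_max_threads explicitly before upgrading:

$ sudo sed -i 's/^# *rpc_max_threads:.*/rpc_max_threads: 2048/' /etc/cassandra/cassandra.yaml
$ sudo service cassandra restart
# With a finite rpc_max_threads the hsha server only allocates that many
# threads at startup, so the heap isn't exhausted the way it is with the
# unbounded default.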


Peter


On Wed, Oct 29, 2014 at 5:23 AM, Duncan Sands duncan.sa...@gmail.com wrote:
 Hi Peter, are you using the hsha RPC server type on this node?  If you are,
 then it looks like rpc_max_threads threads will be allocated on startup in
 2.0.11 while this wasn't the case before.  This can exhaust your heap if the
 value of rpc_max_threads is too large (e.g. if you use the default).

 Ciao, Duncan.


 On 29/10/14 01:08, Peter Haggerty wrote:

 On a 3 node test cluster we recently upgraded one node from 2.0.10 to
 2.0.11. This is a cluster that had been happily running 2.0.10 for
 weeks and that has very little load and very capable hardware. The
 upgrade was just your typical package upgrade:

 $ dpkg -s cassandra | egrep '^Ver|^Main'
 Maintainer: Eric Evans eev...@apache.org
 Version: 2.0.11

 Immediately after it started, it ran a couple of ParNews and then started
 executing CMS runs. In 10 minutes the node had become unreachable and
 was marked as down by the two other nodes in the ring, which are still
 on 2.0.10.

 We have jstack output and the server logs but nothing seems to be
 jumping out. Has anyone else run into this? What should we be looking
 for?


 Peter




RE: OldGen saturation

2014-10-29 Thread Adria Arcarons
Thank you Bryan and Mark. I have redesigned my schema so that I only have 
50 CFs, I've given the heap 2GB, and now it's working fine.

From: Mark Reddy [mailto:mark.l.re...@gmail.com]
Sent: Tuesday, 28 October 2014 18:31
To: user@cassandra.apache.org
Subject: Re: OldGen saturation

Hi Adrià,

We have about 50,000 CFs of varying size

Before I read any further, having 50,000 CFs is something that I would highly 
discourage. Each column family is allocated 1MB of available memory 
(CASSANDRA-2252: https://issues.apache.org/jira/browse/CASSANDRA-2252), so 
having anything over a few hundred on a 1GB heap would be the first thing I 
would reconsider. Also, 1GB isn't a heap I'd run a production or load-test 
Cassandra node on. If your test machine has only 4GB, give Cassandra half the 
total memory (2GB); for a production system you would want more than a 4GB machine.
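
If you do pin the heap yourself on a 4GB box, the usual place is
cassandra-env.sh (a sketch; paths differ between package and tarball
installs, and the 100MB-per-core new-gen guideline is just the one from
the file's own comments):

# /etc/cassandra/cassandra-env.sh  (or conf/cassandra-env.sh in a tarball install)
MAX_HEAP_SIZE="2G"      # about half the RAM of a 4GB test machine
HEAP_NEWSIZE="400M"     # roughly 100MB per physical CPU core
# Restart Cassandra afterwards for the new sizes to take effect.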

Here are some JIRAs and mailing list topics on the subject of large quantities 
of CFs:

https://issues.apache.org/jira/browse/CASSANDRA-7643
https://issues.apache.org/jira/browse/CASSANDRA-6794
https://issues.apache.org/jira/browse/CASSANDRA-7444
http://mail-archives.apache.org/mod_mbox/cassandra-user/201407.mbox/%3C10D771CCF4F243149C928D0CB32BCD78@JackKrupansky14%3E
http://mail-archives.apache.org/mod_mbox/cassandra-user/201408.mbox/%3ccaazu44m87c1yuffz08nzvtkqnww95yaw9bosy_ugu0fswl7...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201408.mbox/%3CCALRai9Ao=mdkrklowrbyajjp+fc4h5tpx-ejgdqxtayqj5u...@mail.gmail.com%3E


Regards,
Mark

On 28 October 2014 17:19, Bryan Talbot bryan.tal...@playnext.com wrote:
On Tue, Oct 28, 2014 at 9:02 AM, Adria Arcarons 
adria.arcar...@greenpowermonitor.com wrote:
Hi,
Hi



We have about 50,000 CFs of varying size



The writing test consists of a continuous flow of inserts. The inserts are done 
inside BATCH statements in groups of 1,000 to a single CF at a time to make 
them faster.



The problem I’m experiencing is that, eventually, when the script has been 
running for almost 40 minutes, the heap gets saturated. OldGen fills up and then 
there is intensive GC activity trying to free OldGen objects, but each pass can 
only free very little space. Then GC saturates the CPU. The graphs obtained with 
VisualVM show this behavior.


My total heap size is 1GB and the NewGen region is 256MB. The C* node has 
4GB RAM, Intel Xeon CPU E5520 @


Without looking at your VM graphs, I'm going to go out on a limb here and say 
that your host is woefully underpowered to host fifty-thousand column families 
and batch writes of one-thousand statements.

A 1GB Java heap is sometimes acceptable for a unit test or for playing around 
with, but you can't actually expect it to be adequate for a load test, can you?

Every CF consumes some permanent heap space for its metadata. Too many CFs are a 
bad thing. You probably have ten times more CFs than would be recommended as an 
upper limit.

-Bryan




Upgrade to 2.1.1 causes error

2014-10-29 Thread James Derieg
I have a Cassandra cluster that has been running fine under 2.1.0. I rebuilt 
the cluster using the same settings on 2.1.1 and get this error, even 
with only one node present:


Non-system keyspaces don't have the same replication settings, 
effective ownership information is meaningless


I only have one non-system keyspace.  To be sure, I even set system_auth 
and system_traces to use the same replication factor as my main 
keyspace, but the error persists.  I tried again with a 2.1.0 node, 
upgraded it to 2.1.1, and the cluster errored out again.  Any ideas or 
hints?  Here is what my keyspaces are set to for RF:


CREATE KEYSPACE system_auth WITH replication = {'class': 
'NetworkTopologyStrategy', 'us-east': '3'}  AND durable_writes = true;


CREATE KEYSPACE testspace WITH replication = {'class': 
'NetworkTopologyStrategy', 'us-east': '3'}  AND durable_writes = true;


CREATE KEYSPACE system_traces WITH replication = {'class': 
'NetworkTopologyStrategy', 'us-east': '3'}  AND durable_writes = true;


CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}  
AND durable_writes = true;





Re: OOM at Bootstrap Time

2014-10-29 Thread DuyHai Doan
Some ideas:

1) Turn on DEBUG logging on the joining node to see in detail what is going on
with the stream of 1500 files (a sketch of how follows below)

2) Check the stream ID to see whether it's a new stream or an old one that is
still pending
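
For 1), a sketch of how to do that without a restart, assuming C* 2.1 where
nodetool setlogginglevel is available (on 2.0 you would edit
log4j-server.properties instead):

$ nodetool setlogginglevel org.apache.cassandra.streaming DEBUG
$ tail -f /var/log/cassandra/system.log | grep -i -E 'stream|flush'
# Revert with: nodetool setlogginglevel org.apache.cassandra.streaming INFO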



On Wed, Oct 29, 2014 at 2:21 AM, Maxime maxim...@gmail.com wrote:

 Doan, thanks for the tip, I just read about it this morning; I'm just waiting
 for the new version to pop up on the Debian DataStax repo.

 Michael, I do believe you are correct about the general running of the
 cluster, and I've reset everything.

 So it took me a while to reply, but I finally got the SSTable count down, as
 seen in the OpsCenter graphs. I'm stumped, however, because when I bootstrap
 the new node I still see a very large number of files being streamed (~1500
 from some nodes) and the bootstrap process fails exactly as it did before,
 in a flurry of "Enqueuing flush of ..." messages.

 Any ideas? I'm reaching the end of what I know I can do. OpsCenter says
 around 32 SSTables per CF, but it's still streaming tons of files. :-/


 On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Tombstones will be a very important issue for me since the dataset is
 very much a rolling dataset using TTLs heavily.

  -- You can try the new DateTiered compaction strategy (
  https://issues.apache.org/jira/browse/CASSANDRA-6602), released in 2.1.1,
  if you have a time-series data model, to help eliminate tombstones
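
  For reference, switching a time-series table over looks something like this
  (a sketch; the keyspace/table names are placeholders, and if your cqlsh
  doesn't support -e, paste the ALTER into an interactive session instead):

  $ cqlsh -e "ALTER TABLE mykeyspace.events
      WITH compaction = {'class': 'DateTieredCompactionStrategy'};"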

 On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

  Again, from our experience with 2.0.x:

  Revert to the defaults - you are manually setting the heap way too high IMHO.

  On our small nodes we tried LCS - way too much compaction - so we switched all
  CFs to STCS.

 We do a major rolling compaction on our small nodes weekly during less
 busy hours - works great. Be sure you have enough disk.

  We never explicitly delete and only use TTLs or truncation. You can set
  gc_grace_seconds to 0 in that case, so tombstones are more readily expunged.
  There are a couple of threads on the list that discuss this... also, normal
  rolling repair becomes optional, reducing load (still repair if something
  unusual happens though...).
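
  For the TTL-only case, that would be something along these lines (keyspace
  and table names here are placeholders):

  $ cqlsh -e "ALTER TABLE mykeyspace.mytable WITH gc_grace_seconds = 0;"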

 In your current situation, you need to kickstart compaction - are there
 any CFs you can truncate at least temporarily? Then try compacting a small
 CF, then another, etc.

 Hopefully you can get enough headroom to add a node.

 ml




 On Sun, Oct 26, 2014 at 6:24 PM, Maxime maxim...@gmail.com wrote:

 Hmm, thanks for the reading.

 I initially followed some (perhaps too old) maintenance scripts, which
 included weekly 'nodetool compact'. Is there a way for me to undo the
 damage? Tombstones will be a very important issue for me since the dataset
 is very much a rolling dataset using TTLs heavily.

 On Sun, Oct 26, 2014 at 6:04 PM, DuyHai Doan doanduy...@gmail.com
 wrote:

  Should doing a major compaction on those nodes lead to a restructuring
  of the SSTables? -- Beware of major compaction on SizeTiered: it will
  create 2 giant SSTables, and the expired/outdated/tombstoned columns in
  these big files will never be cleaned, since the SSTables will never get a
  chance to be compacted again

  Essentially, to reduce the fragmentation into small SSTables, you can stay
  with SizeTiered compaction and play around with the compaction properties
  (the thresholds) to make C* group a bigger bunch of files each time it
  compacts, so that the file count shrinks to a reasonable number (see the
  example below).
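
  For example, letting SizeTiered pick up bigger groups of files per compaction
  might look like this (keyspace/table are placeholders; as far as I know the
  nodetool setting only lasts until restart, while ALTER TABLE ... WITH
  compaction persists it):

  $ nodetool getcompactionthreshold mykeyspace events
  $ nodetool setcompactionthreshold mykeyspace events 4 64
  # min_threshold stays at the default 4, max_threshold is raised from the
  # default 32 to 64, so each SizeTiered pass can merge up to 64 SSTables.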

  Since you're using C* 2.1, where anti-compaction has been introduced, I
  hesitate to advise you to use Leveled compaction as a workaround to reduce
  the SSTable count.

  Things are a little bit more complicated because of the incremental
  repair process (I don't know whether you're using incremental repair in
  production). The dev blog says that Leveled compaction is performed only
  on repaired SSTables, while the un-repaired ones still use SizeTiered; more
  details here:
  http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1

 Regards





 On Sun, Oct 26, 2014 at 9:44 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

  If the issue is related to I/O, you're going to want to determine whether
  you're saturated.  Take a look at `iostat -dmx 1`; you'll see avgqu-sz
  (queue size) and svctm (service time). The higher those numbers
  are, the more overwhelmed your disk is.

 On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan doanduy...@gmail.com
 wrote:
  Hello Maxime
 
  Increasing the flush writers won't help if your disk I/O is not
 keeping up.
 
   I've had a look at the log file; below are some remarks:

   1) There are a lot of SSTables on disk for some tables (events, for
   example, but not only). I've seen that some compactions are taking up to
   32 SSTables (which corresponds to the default max value for SizeTiered
   compaction).

   2) There is a secondary index that I found suspicious: loc.loc_id_idx. As
   its name implies, I have the impression that it's an index on the id of the
   loc, which would lead to almost a 1-to-1 relationship between the indexed
   value and 

Re: Upgrade to 2.1.1 causes error

2014-10-29 Thread Michael Shuler

On 10/29/2014 02:05 PM, James Derieg wrote:

I have a Cassandra cluster that has been running fine under 2.1.0. I rebuilt
the cluster using the same settings on 2.1.1 and get this error, even
with only one node present:

Non-system keyspaces don't have the same replication settings,
effective ownership information is meaningless


That's not an error. It is an informative message when using 'nodetool 
status' without specifying a keyspace.


https://issues.apache.org/jira/browse/CASSANDRA-7173
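
In other words, passing the keyspace makes the ownership column meaningful
and the message goes away, e.g. with the keyspace from your CREATE
statements:

$ nodetool status testspace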

--
Michael


Re: Upgrade to 2.1.1 causes error

2014-10-29 Thread James Derieg

Ah, thanks Michael!  Good to know that's not an error.

On 10/29/2014 1:25 PM, Michael Shuler wrote:

On 10/29/2014 02:05 PM, James Derieg wrote:

I have a Cassandra cluster that has been running fine under 2.1.0. I rebuilt
the cluster using the same settings on 2.1.1 and get this error, even
with only one node present:

Non-system keyspaces don't have the same replication settings,
effective ownership information is meaningless


That's not an error. It is an informative message when using 'nodetool 
status' without specifying a keyspace.


https://issues.apache.org/jira/browse/CASSANDRA-7173







Commissioning failure

2014-10-29 Thread venkat sam
 I have a 5-node Cassandra cluster and I commissioned 1 new node to the 
cluster. When I added the node, it received streams from 3 nodes, out of which 2 
completed successfully and one stream failed. How can I resume the stream 
that failed?

Re: Commissioning failure

2014-10-29 Thread Robert Coli
On Wed, Oct 29, 2014 at 12:49 PM, venkat sam samvenkat...@outlook.com
wrote:

   I have a 5-node Cassandra cluster and I commissioned 1 new node to the
  cluster. When I added the node, it received streams from 3 nodes, out of
  which 2 completed successfully and one stream failed. How can I resume
  the stream that failed?


You can't, you have to wipe the node and start over.
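
Roughly (a sketch assuming the Debian package's default data paths, and that
the half-joined node holds nothing you need to keep):

$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
$ sudo service cassandra start   # the node bootstraps again from scratch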

=Rob
http://twitter.com/rcolidba


tuning concurrent_reads param

2014-10-29 Thread Jimmy Lin
Hi,
Looking at the docs, the default value for concurrent_reads is 32, which
seems a bit small to me (compared to, say, an HTTP server), because if my
node is receiving slight traffic, any more than 32 concurrent read queries
will have to wait.(?)

The recommended rule is 16 * number of drives. Would that be different if I
have SSDs?

I am attempting to increase it because I have a few tables with wide rows
that the app fetches; the sheer size of the data may already be eating up the
thread time, which can cause other read threads to wait and everything to
slow down.

thanks


Re: tuning concurrent_reads param

2014-10-29 Thread Chris Lohfink
There's a bit to it; sometimes it can use tweaking, though.  It's a good
default for most systems, so I wouldn't increase it right off the bat. When
using SSDs or something with a lot of horsepower it could be higher, though
(e.g. i2.xlarge+ on EC2).  If you monitor the number of active threads in the
read thread pool (nodetool tpstats) you can see whether they are actually all
busy or not.  If it's near 32 (or whatever you set it at) all the time, it
may be a bottleneck.
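
A sketch of that check (the numbers below are made up, and the column layout
varies a bit between versions):

$ nodetool tpstats | egrep 'Pool Name|ReadStage'
Pool Name     Active   Pending   Completed   Blocked  All time blocked
ReadStage         32       118     4629482         0                 0
# Active pinned at concurrent_reads (32 here) with Pending piling up suggests
# the read pool is the limit; if iostat shows the disks are not saturated,
# raising concurrent_reads in cassandra.yaml and restarting may be worth trying.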

---
Chris Lohfink

On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 Hi,
 Looking at the docs, the default value for concurrent_reads is 32, which
 seems a bit small to me (compared to, say, an HTTP server), because if my
 node is receiving slight traffic, any more than 32 concurrent read queries
 will have to wait.(?)

 The recommended rule is 16 * number of drives. Would that be different if I
 have SSDs?

 I am attempting to increase it because I have a few tables with wide rows
 that the app fetches; the sheer size of the data may already be eating up the
 thread time, which can cause other read threads to wait and everything to
 slow down.

 thanks






Re: Commissioning failure

2014-10-29 Thread Aravindan T
What could be the reasons for the stream error other than SSTable corruption?



Aravind 

-Robert Coli rc...@eventbrite.com wrote: -
To: user@cassandra.apache.org user@cassandra.apache.org
From: Robert Coli rc...@eventbrite.com
Date: 10/30/2014 02:21AM
Subject: Re: Commissioning failure

On Wed, Oct 29, 2014 at 12:49 PM, venkat sam samvenkat...@outlook.com wrote:
 I have a 5-node Cassandra cluster and I commissioned 1 new node to the 
cluster. When I added the node, it received streams from 3 nodes, out of which 2 
completed successfully and one stream failed. How can I resume the stream 
that failed?

You can't, you have to wipe the node and start over.

=Rob
http://twitter.com/rcolidba