RE: Apt repositories

2011-04-25 Thread Gregory Szorc
If you don't want your APT-sourced packages to upgrade automatically, I suggest 
pinning the package.

The apt_preferences(5) man page tells you how to do this.

The gist is to add the following lines:

  Package: cassandra
  Pin: version 0.6.13
  Pin-Priority: 1100

(setting the version to the one you want to install, obviously) to a 
preferences file sourced by apt. On Ubuntu, just place the above 3 lines in the 
file /etc/apt/preferences.d/cassandra and you should be set. No matter what 
happens with the remote APT repository or how you run `apt-get upgrade`, your 
system will always use the version you specified in the preferences file.
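
Two extra notes. You can sanity-check that the pin took effect with apt-cache policy
(exact output varies with your apt version, so treat this as a rough check):

  apt-cache policy cassandra

Both the installed and candidate versions it reports should match the pinned version.
Also, apt_preferences(5) accepts glob patterns, so a pin like "Pin: version 0.6.*"
would hold you to the 0.6 series while still picking up point releases, if that's the
behavior you'd rather have.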

Greg

-Original Message-
From: David Strauss [mailto:da...@davidstrauss.net] 
Sent: Saturday, April 23, 2011 4:49 PM
To: user@cassandra.apache.org
Subject: Apt repositories

I just noticed that, following the Cassandra 0.8 beta release, the Apt 
repository is encouraging servers in my clusters to upgrade. Beta releases 
should probably be on different channels (or named differently) than stable 
ones.

Better yet would be naming the packages after the major release in order to prevent an 
inadvertent upgrade, even once the next release stabilizes. For example, having 
cassandra-0.7 and cassandra-0.8 packages would be great, with installation of the 
latter replacing any installed cassandra-0.7 package. This is common with PHP and 
MySQL packages, where it's not entirely safe to do a major upgrade inadvertently.
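
As a rough sketch of what I mean (hypothetical package names, not something that 
exists in the repository today), the debian/control entries could look something like:

  Package: cassandra-0.8
  Provides: cassandra
  Conflicts: cassandra-0.7
  Replaces: cassandra-0.7

so that moving to 0.8 requires explicitly installing the new package rather than being 
something a routine `apt-get upgrade` does on its own.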

Thanks,
David



RE: Nodes frozen in GC

2011-03-10 Thread Gregory Szorc
I do believe there is a fundamental issue with compactions allocating too much 
memory and incurring too many garbage collections (at least with 0.6.12).

On nearly every Cassandra node I operate, garbage collections simply get out of 
control during compactions of any reasonably sized CF (1GB or larger). I can reproduce 
it on CFs with many wide rows (thousands of columns) of small columns (tens to 
hundreds of bytes), on CFs with thin rows (around 20 columns) of large columns (tens 
of MB), and everything in between.

From the GC logs, I can infer that Cassandra is allocating upwards of 4GB/s. I 
once gave the JVM 30GB of heap and saw it run through the entire heap in a few 
seconds while doing a compaction! It would continuously blow through the heap, 
incur a stop-the-world collection, and repeat. Meanwhile, the compacted-bytes 
counter exposed through JMX was not increasing and the tmp sstable wasn't 
growing in size.
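
(As a back-of-the-envelope illustration of how a number like that falls out of the 
logs, rather than an exact measurement: with -Xmn256M, seeing the young generation 
fill roughly every 60ms works out to about

  0.25GB / 0.06s ≈ 4GB/s

The exact interval bounces around, but that's the rough scale.)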

My current/relevant JVM args are as follows (running on Sun JDK 1.6.0_24 w/ JNA 
3.2.7):

-Xms9G -Xmx9G -Xmn256M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+PrintClassHistogram -XX:+PrintTenuringDistribution 
-Xloggc:/var/log/cassandra/gc.log -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=3 
-XX:CMSInitiatingOccupancyFraction=40 -XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=1 
-XX:ParallelGCThreads=6
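
If anyone wants to eyeball the young-generation collection frequency themselves, a 
quick and dirty way (assuming the usual -XX:+PrintGCTimeStamps prefix of 
seconds-since-startup at the start of each line) is something like:

  grep ParNew /var/log/cassandra/gc.log | tail -n 50

and look at how little the leading timestamps advance between consecutive collections 
while a compaction is running.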

I've tweaked nearly every setting imaginable 
(http://www.md.pp.ru/~eu/jdk6options.html is a great resource, BTW) and can't 
control the problem. No matter what I do, Cassandra allocates objects faster than 
the GC can reclaim them. And when we're talking about allocation rates of 1GB/s 
and up, I don't think you can blame the GC for not keeping up.

Since there is no way to prevent these frequent stop-the-world collections, we 
get frequent client timeouts and the occasional unavailable response if we're 
unfortunate enough to have a couple of nodes compacting large CFs at the same time 
(which happens more often than I'd like).

For the past two weeks, we had N=replication factor adjacent nodes in our 
cluster that failed to perform their daily major compaction on a particular 
column family. All N would spew GCInspector logs, and the GC logs revealed a very 
heavy allocation rate. The only resolution was to restart Cassandra to abort 
the compaction. I isolated one node from network connectivity and restarted it 
as a cluster of one with no caching, no memtables, and no incoming operations. Even 
under these ideal compacting conditions, I still ran into the issue. I experimented 
with extremely large young generations (up to 10GB), a very low 
CMSInitiatingOccupancyFraction, etc., but Cassandra would always allocate faster 
than the JVM could collect, eventually leading to a stop-the-world collection.

Recently, we rolled out a change to the application accessing the cluster which 
effectively resaved every column in every row. When this was mostly done, the 
daily major compaction for the troublesome CF that had refused to compact for two 
weeks suddenly completed! Most interesting. (Although it still churned through 
memory heavily.)

One of my observations is that memory allocation during compaction seems to consist 
mostly of short-lived objects. The young generation almost never promotes 
objects to the tenured generation (we raised MaxTenuringThreshold to 3, 
from Cassandra's default of 1, to discourage early promotion; a default of 1 
seems rather silly to me). However, while the young generation is being 
collected (which happens VERY often during compactions because the allocation rate is 
so high), objects are allocated directly into the tenured generation. Even with 
relatively short ParNew collections (often 0.05s, almost always under 0.1s wall 
time), these tenured allocations quickly accumulate, initiating CMS and 
eventually a stop-the-world collection.
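
(For context on why the survivor spaces overflow so easily: with -Xmn256M and 
-XX:SurvivorRatio=8, each survivor space is only about 256MB / 10 ≈ 25MB. At the 
allocation rates described above, anything that stays live across even a couple of 
ParNew cycles can spill straight into the tenured generation, regardless of 
MaxTenuringThreshold.)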

Anyway, I'm not sure how much additional writing is going to help resolve this 
issue. I have gobs of GC logs and supplementary metrics data to back up my 
claims if those would help. But I have a feeling that if you just create a CF 
of a few GB and trigger a compaction with the JVM under a profiler, it will be 
pretty easy to identify the culprit. I've started down this path and will let 
you know if I find anything. But I'm no Java expert and am quite busy with 
other tasks, so don't expect anything useful from me anytime soon.

I hope this information helps. If you need anything else, just ask, and I'll 
see what I can do.

Gregory Szorc
gregory.sz...@xobni.com

 -Original Message-
 From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter
 Schuller
 Sent: Thursday, March 10, 2011 10:36 AM
 To: ruslan usifov
 Cc: user@cassandra.apache.org
 Subject: Re: Nodes frozen in GC
 
 I think it would be very useful to get to the bottom of this but without
 further details (like the asked for GC logs) I'm not sure what to do/suggest.
 
 It's clear that a single CF

RE: Request For 0.6.12 Release

2011-02-17 Thread Gregory Szorc
Aaron,

Thank you very much for initiating the voting process. I'm looking forward to 
running this release.

Was there any discussion around improving the communication of known issues 
with releases?

Gregory

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, February 17, 2011 4:21 PM
To: user@cassandra.apache.org
Subject: Re: Request For 0.6.12 Release

Gregory,
There is a vote going on for 0.6.12 now
http://www.mail-archive.com/dev@cassandra.apache.org/msg01808.html

If you have time, grab the binary and give it a test: 
http://people.apache.org/~eevans
Aaron

On 16 Feb 2011, at 09:21 PM, Aaron Morton aa...@thelastpickle.com wrote:
Have checked it's all in the 0.6 branch and asked the devs for a 0.6.12 
release. Will let you know how it goes.
cheers
Aaron

On 16 Feb 2011, at 08:38 AM, Aaron Morton aa...@thelastpickle.com wrote:
I worked on that ticket, will try to chase it up.


Aaron


On 15/02/2011, at 2:01 PM, Gregory Szorc gregory.sz...@gmail.com wrote:

The latest official 0.6.x releases, 0.6.10 and 0.6.11, have a very serious 
bug/regression when performing some quorum reads (CASSANDRA-2081), which is 
fixed in the head of the 0.6 branch. If there aren't any plans to cut 0.6.12 any 
time soon, as an end user, I request that an official and blessed release of 
0.6.x be made ASAP.

On a related note, I am frustrated that such a serious issue has lingered in 
the latest oldstable release. I would have liked to see one or more of the 
following:


1)  The issue documented prominently on the apache.org web site and inside the 
download archive so end users would know they are downloading and running 
known-broken software

2)  The 0.6.10 and 0.6.11 builds pulled after identification of the issue

3)  A 0.6.12 release cut immediately (with reasonable time for testing, of 
course) to address the issue

I understand that releases may not always be as stable as we all desire. But, I 
hope that when future bugs affecting the bread and butter properties of a 
distributed storage engine surface (especially when they are regressions) that 
the official project response (preferably via mailing list and the web site) is 
swift and maximizes the potential for data integrity and availability.

If there is anything I can do to help the process, I'd gladly give some of my 
time to help the overall community.

Gregory Szorc
gregory.sz...@gmail.com



Request For 0.6.12 Release

2011-02-14 Thread Gregory Szorc
The latest official 0.6.x releases, 0.6.10 and 0.6.11, have a very serious
bug/regression when performing some quorum reads (CASSANDRA-2081), which is
fixed in the head of the 0.6 branch. If there aren't any plans to cut 0.6.12
any time soon, as an end user, I request that an official and blessed
release of 0.6.x be made ASAP.

 

On a related note, I am frustrated that such a serious issue has lingered in
the latest oldstable release. I would have liked to see one or more of the
following:

 

1)  The issue documented prominently on the apache.org web site and
inside the download archive so end users would know they are downloading and
running known-broken software

2)  The 0.6.10 and 0.6.11 builds pulled after identification of the
issue

3)  A 0.6.12 release cut immediately (with reasonable time for testing,
of course) to address the issue

 

I understand that releases may not always be as stable as we all desire.
But, I hope that when future bugs affecting the bread and butter properties
of a distributed storage engine surface (especially when they are
regressions) that the official project response (preferably via mailing list
and the web site) is swift and maximizes the potential for data integrity
and availability.

 

If there is anything I can do to help the process, I'd gladly give some of
my time to help the overall community.

 

Gregory Szorc

gregory.sz...@gmail.com