Re: Dazed and confused with Cassandra on EC2 ...

2010-10-09 Thread Peter Schuller
> The main reason to set Xms=Xmx is so mlockall can tag the entire heap as "don't swap this out" on startup. Secondarily, whenever the heap resizes upwards the JVM does a stop-the-world GC, but no, not really a big deal when your uptime is in days or weeks.

I'm not sure where this is coming from,

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-09 Thread Peter Schuller
> Heap *shrinkage* on the other hand is another matter and for both CMS and G1, shrinkage never happens except on Full GC, unfortunately. I'm

And to be clear, the implication here is that shrinkage normally doesn't happen. The implication is *not* that you see fallbacks to full GC for the purpose

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-08 Thread Jonathan Ellis
On Fri, Oct 8, 2010 at 4:54 AM, Jedd Rashbrooke jedd.rashbro...@imagini.net wrote:
> On 8 October 2010 02:05, Matthew Dennis mden...@riptano.com wrote:
>> Also, in general, you probably want to set Xms = Xmx (regardless of the value you eventually decide on for that).
>
> Matthew - we'd just about

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-07 Thread Matthew Dennis
Also, in general, you probably want to set Xms = Xmx (regardless of the value you eventually decide on for that). If you set them equal, the JVM will just go ahead and allocate that amount on startup. If they're different, then when you grow above Xms it has to allocate more and move a bunch of

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-04 Thread Jedd Rashbrooke
Hi Peter, Thanks again for your time and thoughts on this problem. We think we've got a bit ahead of the problem by just scaling back (quite savagely) on the rate that we try to hit the cluster. Previously, with a surplus of optimism, we were throwing very big Hadoop jobs at Cassandra,

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-02 Thread Peter Schuller
(sorry for the delay in following up on this thread)

> Actually, there's a question - is it 'acceptable' do you think for GC to take out a small number of your nodes at a time, so long as the bulk (or at least where RF is nodes gone on STW GC) of the nodes are okay? I suspect this is a

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-28 Thread Jedd Rashbrooke
Peter - my apologies for the slow response - we had to divert down a 'Plan B' approach last week involving MySQL, memcache, redis and various other uglies.

On 20 September 2010 23:11, Peter Schuller peter.schul...@infidyne.com wrote:
> Are you running an old JVM by any chance? (Just grasping

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Dave Gardner
As a follow-up to this conversation: we are still having issues with our Cassandra cluster on EC2. It *looks* to be related to garbage collection; however, we aren't sure what the root cause of the problem is. Here is an extract from the logs:

INFO [GMFD:1] 2010-09-20 15:22:00,242 Gossiper.java

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Peter Schuller
> Can anyone help shed any light on why this might be happening? We've tried a variety of JVM settings to alleviate this; currently with no luck.

Extremely long ParNew (young generation) pause times are almost always due to swapping. Are you swapping?

--
/ Peter Schuller
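For anyone wanting to answer the "are you swapping?" question with numbers rather than by eye, here is a minimal sketch. It assumes a Linux host, where /proc/meminfo reports SwapTotal and SwapFree in kB. The function names and the sample values are made up for illustration and are not taken from the thread.

```python
def parse_meminfo(text):
    """Return {field: kB} for each 'Name:   value kB' line of /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(text):
    """Swap currently in use, in kB; 0 when no swap is configured at all."""
    info = parse_meminfo(text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


# Illustrative sample only (hypothetical values, not from the cluster in this thread).
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         1200000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""

print("swap in use: %d kB" % swap_used_kb(SAMPLE))
```

On a live node the same function can be fed the real file, e.g. swap_used_kb(open("/proc/meminfo").read()). A result of 0 with SwapTotal at 0 means swap is not enabled, in which case swapping cannot be the cause of the pauses.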

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Dave Gardner
Nope - no swap enabled.

top - 16:53:14 up 12 days, 6:11, 3 users, load average: 1.99, 2.63, 5.03
Tasks: 133 total, 1 running, 132 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 35840228k total, 33077580k used, 2762648k

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Dave Gardner
One other question for the list: I gather GMFD is gossip stage - but what does this actually mean? Is it an issue to have 203 pending operations?

Thanks
Dave

INFO [GC inspection] 2010-09-20 16:56:12,792 GCInspector.java (line 129) GC for ParNew: 127970 ms, 570382800 reclaimed leaving

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Peter Schuller
> Nope - no swap enabled.

Something is seriously weird, unless the system clock is broken... Given:

INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is 25907560448
INFO [GC inspection] 2010-09-20
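To pull figures like these out of a whole log instead of reading it line by line, something along these lines may help. It is only a sketch: the regular expression is written against the GCInspector line format quoted above, and the 1000 ms threshold and the function name are arbitrary choices for illustration.

```python
import re

# Matches the tail of a GCInspector line as quoted in this thread, e.g.
# "GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is 25907560448"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


def long_pauses(lines, threshold_ms=1000):
    """Return (collector, pause_ms, used_bytes, max_bytes) for pauses >= threshold_ms."""
    hits = []
    for line in lines:
        m = GC_LINE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            hits.append((
                m.group("collector"),
                int(m.group("ms")),
                int(m.group("used")),
                int(m.group("max")),
            ))
    return hits


# The one complete log line quoted above.
SAMPLE = [
    "INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) "
    "GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; "
    "max is 25907560448",
]

for collector, ms, used, max_heap in long_pauses(SAMPLE):
    print("%s pause of %.1f s with %.1f%% of max heap in use"
          % (collector, ms / 1000.0, 100.0 * used / max_heap))
```

Printing heap use next to the pause length makes the oddity easy to spot: a young-generation pause lasting minutes while only a small fraction of the maximum heap is in use.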

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Dave Viner
Hi Jedd,

I'm using Cassandra on EC2 as well - so I'm quite interested. Just to clarify your post - it sounds like you have 4 questions/issues:

1. Writes have slowed down significantly. What's the logical explanation? And what are the logical solutions/options to solve it?
2. You grew from 2

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Robert Coli
On 9/17/10 7:41 AM, Jedd Rashbrooke wrote:
> Happy times. This was when the cluster was modestly sized - 20-50GB. It's now about 200GB, and performance has dropped by an order of magnitude - perhaps 5-6 hours to do the same amount of work, using the same codebase and the same input

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Hi Dave,

Thank you for your response. I can clarify a couple of things here:

> 2. You grew from 2 nodes to 4, but the original 2 nodes have 200GB and the 2 new ones have 40 GB. What's the recommended practice for rebalancing (i.e., when should you do it), what's the actual procedure, and

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Hi Rob, Thanks for your suggestions. I should have been a bit more verbose in my platform description -- I'm using 64-bit instances, which I think in a Ben Black video I saw led to a sensible default usage of mmap when left at auto. Should I look at forcing this setting? You don't