Re: Dazed and confused with Cassandra on EC2 ...
> The main reason to set Xms=Xmx is so mlockall can tag the entire heap
> as "don't swap this out" on startup. Secondarily whenever the heap
> resizes upwards the JVM does a stop-the-world gc, but no, not really a
> big deal when your uptime is in days or weeks.

I'm not sure where this is coming from, assuming you mean a *full* STW GC. I cannot remember ever observing this in all my years of heap growth ;), and I don't see why it would be the case.

Heap *shrinkage* on the other hand is another matter, and for both CMS and G1, shrinkage never happens except on Full GC, unfortunately. I'm hoping G1 will eventually improve here, since its compacting design should fundamentally make it feasible to implement efficiently.

Disregarding the mlockall issue, which is pretty specific to Cassandra, the typical reason, as I have understood it, for setting Xms=Xmx on a machine dedicated to running some particular JVM is this: if you have to reserve Xmx amount of memory anyway, and perhaps even expect it to be reached, it is generally more efficient to just let the JVM use it all from the start - due to the usual effect of GC being more efficient with larger heap sizes, and in order to avoid bad policy decisions by the GC causing excessive GC activity rather than just growing the heap, which you're fine with anyway.

-- / Peter Schuller
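If you want to verify the locking actually took effect on a running node, something like the following works. A sketch only: the pgrep pattern is an assumption (match however your init script names the process), and Cassandra only attempts mlockall when JNA is available to it in the first place:

    # memlock ulimit must be high enough or mlockall fails with ENOMEM
    ulimit -l unlimited

    # VmLck should be roughly the heap size once Xms=Xmx and mlockall succeed
    grep VmLck /proc/$(pgrep -f CassandraDaemon)/status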
Re: Dazed and confused with Cassandra on EC2 ...
> Heap *shrinkage* on the other hand is another matter, and for both CMS
> and G1, shrinkage never happens except on Full GC, unfortunately.

And to be clear, the implication here is that shrinkage normally doesn't happen. The implication is *not* that you see fallbacks to full GC for the purpose of shrinkage.

-- / Peter Schuller
Re: Dazed and confused with Cassandra on EC2 ...
On Fri, Oct 8, 2010 at 4:54 AM, Jedd Rashbrooke jedd.rashbro...@imagini.net wrote:
> On 8 October 2010 02:05, Matthew Dennis mden...@riptano.com wrote:
>> Also, in general, you probably want to set Xms = Xmx (regardless of
>> the value you eventually decide on for that).
>
> Matthew - we'd just about reached that conclusion! Is it as big an
> issue for clusters that are up for days at a time, with maybe Xmx:3G?
> I'd kind of assumed that it'd get to its max pretty quickly. I haven't
> been watching that with jconsole, but might watch it on the next
> startup.

The main reason to set Xms=Xmx is so mlockall can tag the entire heap as "don't swap this out" on startup. Secondarily whenever the heap resizes upwards the JVM does a stop-the-world gc, but no, not really a big deal when your uptime is in days or weeks.

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Dazed and confused with Cassandra on EC2 ...
Also, in general, you probably want to set Xms = Xmx (regardless of the value you eventually decide on for that). If you set them equal, the JVM will just go ahead and allocate that amount on startup. If they're different, then when you grow above Xms it has to allocate more and move a bunch of stuff around. It may have to do this multiple times. Note that it does this at the worst time possible (i.e. under heavy load, which is likely what caused you to grow past Xms in the first place).

On Thu, Oct 7, 2010 at 2:49 PM, Peter Schuller peter.schul...@infidyne.com wrote:
>> There's some words on the 'Net - the recent pages on Riptano's site
>> in fact - that strongly encourage scaling left and right, rather than
>> beefing up the boxes - and certainly we're seeing far less bother
>> from GC using a much smaller heap - previously we'd been going up to
>> 16GB, or even higher. This is based on my previous positive
>> experiences of getting better performance from memory hog apps (eg.
>> Java) by giving them more memory. In any case, it seems that using
>> large amounts of memory on EC2 is just asking for trouble.
>
> Keep in mind that while GC tends to be more efficient with larger heap
> sizes, that does not always translate into better overall performance
> when other things have to be considered.
>
> In particular, in the case of Cassandra, if you waste 10-15 gigs of
> RAM on the JVM heap for a Cassandra instance which could live with
> e.g. 1 GB, you're actively taking away those 10-15 gigs of RAM from
> the operating system to use for the buffer cache. Particularly if
> you're I/O bound on reads then, this could have very detrimental
> effects (assuming the data set is sufficiently small and locality is
> such that 15 GB of extra buffer cache makes a difference; usually, but
> not always, this is the case).
>
> So with Cassandra, in the general case, you definitely want to keep
> your heap size reasonable in relation to the actual live set (amount
> of actually reachable data), rather than just cranking it up as much
> as possible. (The main issue here is also keeping it high enough to
> not OOM, given that exact memory demands are hard to predict; it would
> be absolutely great if the JVM was better at maintaining a reasonable
> heap size to live set size ratio so that much less tweaking of heap
> sizes was necessary, but this is not the case.)
>
> -- / Peter Schuller
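Both effects are easy to watch on a live node. A sketch using the stock Sun JDK 6 tools (the pgrep pattern is an assumption; substitute your actual PID):

    # committed generation sizes (NGC/OGC) step up toward their maxima
    # (NGCMX/OGCMX) as the heap grows past Xms - each step is the
    # reallocation described above
    jstat -gccapacity $(pgrep -f CassandraDaemon) 5s

    # and whatever the heap takes is gone from the OS buffer cache:
    # watch the "cached" column shrink as you size the heap up
    free -m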
Re: Dazed and confused with Cassandra on EC2 ...
Hi Peter,

Thanks again for your time and thoughts on this problem.

We think we've got a bit ahead of the problem by just scaling back (quite savagely) on the rate that we try to hit the cluster. Previously, with a surplus of optimism, we were throwing very big Hadoop jobs at Cassandra, including what I understand to be a worst-case usage (random reads). Now we're throttling right back on the number of parallel jobs that we fire from Hadoop, and we're seeing better performance, in terms of the boxes generally staying up as far as nodetool and other interactive sessions are concerned.

As discussed, we've adopted quite a number of different approaches with GC - at the moment we've returned to:

    JVM_OPTS=" \
            -ea \
            -Xms2G \
            -Xmx3G \
            -XX:+UseParNewGC \
            -XX:+UseConcMarkSweepGC \
            -XX:+CMSParallelRemarkEnabled \
            -XX:SurvivorRatio=8 \
            -XX:MaxTenuringThreshold=1 \
            -XX:+HeapDumpOnOutOfMemoryError \
            -Dcom.sun.management.jmxremote.port=8080 \
            -Dcom.sun.management.jmxremote.ssl=false \
            -Dcom.sun.management.jmxremote.authenticate=false"

... which is much closer to the default as shipped - the notable change is the heap size, which out of the box comes as 1G.

There's some words on the 'Net - the recent pages on Riptano's site in fact - that strongly encourage scaling left and right, rather than beefing up the boxes - and certainly we're seeing far less bother from GC using a much smaller heap - previously we'd been going up to 16GB, or even higher. This is based on my previous positive experiences of getting better performance from memory hog apps (eg. Java) by giving them more memory. In any case, it seems that using large amounts of memory on EC2 is just asking for trouble.

And because it's Amazon, more smaller machines generally works out as the same CPU grunt per dollar, of course ... although the management costs go up. To answer your last question there - we'd been using some pretty beefy EC2 boxes, but now we think we'll head back to the 2-core 7GB medium-ish sized machines. All IO still runs like a dog no matter how much money you spend, sadly.

cheers,
Jedd.
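One habit that helps with config churn like this: after editing the startup script, confirm the running daemon really picked the settings up. A sketch (the pgrep pattern and grep terms are assumptions; jmap -heap is the standard Sun JDK 6 tool):

    pid=$(pgrep -f CassandraDaemon)
    jmap -heap $pid | egrep 'MaxHeapSize|NewSize|SurvivorRatio'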
Re: Dazed and confused with Cassandra on EC2 ...
(sorry for the delay in following up on this thread)

> Actually, there's a question - is it 'acceptable' do you think for GC
> to take out a small number of your nodes at a time, so long as the
> bulk (or at least where RF > nodes gone on STW GC) of the nodes are
> okay? I suspect this is a question peculiar to Amazon EC2, as I've
> never seen a box rendered non-communicative by a single core
> flat-lining.

Well, first of all I still find it very strange that GC takes nodes down at all, unless one is specifically putting sufficient CPU load on the cluster that e.g. concurrent GC causes a problem. But in particular, if you're still seeing those crazy long GC pause times, IMO something is severely wrong and I would not personally recommend going to production with that unresolved, since whatever the cause is, it may suddenly start having other effects.

Severely long ParNew pause times are really not expected; the only two major reasons I can think of, at least when running on real hardware, and barring JVM bugs, are (1) swapping, and (2) possibly extreme performance penalties associated with a very full old generation, in which case the solution is a larger heap. I don't remember whether you indicated any heap statistics so I'm not sure whether (2) is a possibility. But I would expect OutOfMemory errors long before a ParNew takes 300+ *seconds*, just out of JVM policies w.r.t. acceptable GC efficiency.

Bottom line: 300+ seconds for a ParNew collection is *way way way* out there. 300 *milli*-seconds is more along the lines of what one might expect (usually lower than that). Even if you can seemingly lessen the impact by using the throughput collector, I wouldn't be comfortable with shrugging off whatever is happening.

That said, in terms of the effects on the cluster: I have not had much hands-on experience with this, but I believe you'd expect a definite visible impact from the point of view of clients. Cassandra is not optimized for instantly detecting slow nodes and transparently working around them with zero impact on clients; I don't think it is recommended to be running a cluster with nodes regularly bouncing in and out, for whatever reason, if it can be avoided.

Not sure what to say, other than to strongly recommend getting to the bottom of this problem, which seems non-specific to Cassandra, before relying on the system in a production setting. The extremity of the issues you're seeing is far beyond what I would ever expect, even allowing for who knows what EC2 is doing or what other people are running on the machine, except for the hypothesis that they over-commit memory and the extreme latencies are due to swapping. But if that is what is happening, that just tells me that EC2 is unusable for this type of thing - and I still think it's far fetched, since the impact should be significant on a great number of their customers.

I forget, and I didn't find it by briefly sifting through thread history: were you running on small EC2 instances or larger ones?

-- / Peter Schuller
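To tell (1) and (2) apart while a node is misbehaving, running these side by side during load is usually enough (a sketch; the pgrep pattern is an assumption):

    # YGCT is cumulative young-gen GC time in seconds - jumps of whole
    # seconds between samples mean pathological ParNew pauses
    jstat -gcutil $(pgrep -f CassandraDaemon) 5s

    # nonzero si/so columns mean the guest is actively paging
    vmstat 5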
Re: Dazed and confused with Cassandra on EC2 ...
Peter - my apologies for the slow response - we had to divert down a 'Plan B' approach last week involving MySQL, memcache, redis and various other uglies.

On 20 September 2010 23:11, Peter Schuller peter.schul...@infidyne.com wrote:
> Are you running an old JVM by any chance? (Just grasping for straws.)

JVM is Sun's 1.6 - I've been caught out once before with openjdk's performance challenges, so I'm particularly careful with this now.

> Hmm. I can see useless spinning decreasing efficiency, but the numbers
> from your log are really extreme. Do you have a URL / bug id or
> anything that one can read up on about this?

We've rebuilt the Cassandra cluster this week, avoiding Hadoop entirely - partly to reduce the variables in play, and partly because it looks like we'll only need two 'feeder' nodes for our jobs with the size of Cassandra cluster that we're likely going to end up with (10-12 ish). Any ratio higher than that seems to, on EC2 at least, cause too many fails on the Cassandra side.

Actually, there's a question - is it 'acceptable' do you think for GC to take out a small number of your nodes at a time, so long as the bulk (or at least where RF > nodes gone on STW GC) of the nodes are okay? I suspect this is a question peculiar to Amazon EC2, as I've never seen a box rendered non-communicative by a single core flat-lining.

By the end of this week we hope to have a better idea (mind, I've thought that for the past 5 weeks of experimenting). If I'm back to square one at that point I'll start pastebin'ing some logs and configs. Increasingly, I'm convinced that many of these problems would be solved if we hosted our own servers.

cheers,
Jedd.
Re: Dazed and confused with Cassandra on EC2 ...
As a follow up to this conversation: we are still having issues with our Cassandra cluster on EC2. It *looks* to be related to garbage collection; however, we aren't sure what the root cause of the problem is. Here is an extract from the logs:

INFO [GMFD:1] 2010-09-20 15:22:00,242 Gossiper.java (line 578) InetAddress /10.102.57.197 is now UP
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:22:00,242 HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /10.102.57.197
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:22:00,247 HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to endpoint /10.102.57.197
INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is 25907560448
INFO [WRITE-/10.102.7.187] 2010-09-20 15:27:42,052 OutboundTcpConnection.java (line 105) error writing to /10.102.7.187
INFO [GC inspection] 2010-09-20 15:27:42,082 GCInspector.java (line 150) Pool Name  Active  Pending
INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156) STREAM-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156) RESPONSE-STAGE 0 3
INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156) ROW-READ-STAGE625
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156) LB-OPERATIONS 0 0
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156) MISCELLANEOUS-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156) GMFD 1 129
INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156) CONSISTENCY-MANAGER 0 0
INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156) LB-TARGET 0 0
INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156) ROW-MUTATION-STAGE 1 1
INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156) MESSAGE-STREAMING-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156) LOAD-BALANCER-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156) FLUSH-SORTER-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER 0 0
INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156) AE-SERVICE-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 156) FLUSH-WRITER-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 156) HINTED-HANDOFF-POOL 0 0
INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 161) CompactionManager n/a 0
INFO [GMFD:1] 2010-09-20 15:27:42,088 Gossiper.java (line 592) Node /10.102.7.187 has restarted, now UP again
INFO [GMFD:1] 2010-09-20 15:27:42,089 StorageService.java (line 548) Node /10.102.7.187 state jump to normal
INFO [GMFD:1] 2010-09-20 15:27:42,089 StorageService.java (line 555) Will not change my token ownership to /10.102.7.187
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:27:42,089 HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /10.102.7.187
INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:27:42,112 HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to endpoint /10.102.7.187
INFO [WRITE-/10.102.7.187] 2010-09-20 15:27:42,389 OutboundTcpConnection.java (line 105) error writing to /10.102.7.187
INFO [WRITE-/10.102.57.197] 2010-09-20 15:34:15,911 OutboundTcpConnection.java (line 105) error writing to /10.102.57.197
INFO [GC inspection] 2010-09-20 15:34:15,924 GCInspector.java (line 129) GC for ParNew: 372272 ms, 82471240 reclaimed leaving 671616240 used; max is 25907560448
INFO [GC inspection] 2010-09-20 15:34:15,925 GCInspector.java (line 150) Pool Name  Active  Pending
INFO [GC inspection] 2010-09-20 15:34:15,925 GCInspector.java (line 156) STREAM-STAGE 0 0
INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156) RESPONSE-STAGE 0 21
INFO [Timer-0] 2010-09-20 15:34:15,926 Gossiper.java (line 180) InetAddress /10.102.7.187 is now dead.
INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156) ROW-READ-STAGE185
INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156) LB-OPERATIONS 0 0
INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156) MISCELLANEOUS-POOL 0 0
INFO [GC inspection] 2010-09-20 15:34:15,935
Re: Dazed and confused with Cassandra on EC2 ...
> Can anyone help shed any light on why this might be happening? We've
> tried a variety of JVM settings to alleviate this; currently with no
> luck.

Extremely long ParNew (young generation) pause times are almost always due to swapping. Are you swapping?

-- / Peter Schuller
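A few quick ways to answer that from a shell (all standard Linux tooling):

    swapon -s                    # configured swap devices, if any
    grep -i swap /proc/meminfo   # SwapTotal / SwapFree / SwapCached
    vmstat 5                     # nonzero si/so columns = actively paging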
Re: Dazed and confused with Cassandra on EC2 ...
Nope - no swap enabled.

top - 16:53:14 up 12 days, 6:11, 3 users, load average: 1.99, 2.63, 5.03
Tasks: 133 total, 1 running, 132 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  35840228k total, 33077580k used, 2762648k free, 263388k buffers
Swap:        0k total,        0k used,        0k free, 29156108k cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM      TIME+  SWAP COMMAND
 3398 root    20   0  203g 1.2g 186m S    0  3.5   5124095h  202g jsvc
 4162 hadoop  20   0 9095m  84m 9544 S    0  0.2   5124095h  8.8g java
32153 hadoop  20   0 9057m 154m 9416 S    0  0.4   5124095h  8.7g java
18091 hadoop  20   0 9257m 561m 9460 S    0  1.6  17232821w  8.5g java
 4267 hadoop  20   0 2400m  85m 9404 S    0  0.2   5124095h  2.3g java
 4289 hadoop  20   0 2337m  78m 9348 S    0  0.2    0:01.44  2.2g java

On 20 September 2010 17:48, Peter Schuller peter.schul...@infidyne.com wrote:
>> Can anyone help shed any light on why this might be happening? We've
>> tried a variety of JVM settings to alleviate this; currently with no
>> luck.
>
> Extremely long ParNew (young generation) pause times are almost always
> due to swapping. Are you swapping?
>
> -- / Peter Schuller
Re: Dazed and confused with Cassandra on EC2 ...
One other question for the list: I gather GMFD is the gossip stage - but what does this actually mean? Is it an issue to have 203 pending operations?

Thanks
Dave

INFO [GC inspection] 2010-09-20 16:56:12,792 GCInspector.java (line 129) GC for ParNew: 127970 ms, 570382800 reclaimed leaving 460688576 used; max is 6576406528
INFO [GC inspection] 2010-09-20 16:56:12,803 GCInspector.java (line 150) Pool Name  Active  Pending
INFO [GC inspection] 2010-09-20 16:56:12,803 GCInspector.java (line 156) STREAM-STAGE 0 0
INFO [GC inspection] 2010-09-20 16:56:12,803 GCInspector.java (line 156) RESPONSE-STAGE 2 2
INFO [GC inspection] 2010-09-20 16:56:12,804 GCInspector.java (line 156) ROW-READ-STAGE 1212
INFO [GC inspection] 2010-09-20 16:56:12,804 GCInspector.java (line 156) LB-OPERATIONS 0 0
INFO [GC inspection] 2010-09-20 16:56:12,804 GCInspector.java (line 156) MISCELLANEOUS-POOL 0 0
INFO [GC inspection] 2010-09-20 16:56:12,804 GCInspector.java (line 156) GMFD 1 203
INFO [GC inspection] 2010-09-20 16:56:12,805 GCInspector.java (line 156) CONSISTENCY-MANAGER 0 0
INFO [GC inspection] 2010-09-20 16:56:12,805 GCInspector.java (line 156) LB-TARGET 0 0
INFO [GC inspection] 2010-09-20 16:56:12,805 GCInspector.java (line 156) ROW-MUTATION-STAGE 0 0
INFO [GC inspection] 2010-09-20 16:56:12,806 GCInspector.java (line 156) MESSAGE-STREAMING-POOL 0 0
INFO [GC inspection] 2010-09-20 16:56:12,806 GCInspector.java (line 156) LOAD-BALANCER-STAGE 0 0
INFO [GC inspection] 2010-09-20 16:56:12,806 GCInspector.java (line 156) FLUSH-SORTER-POOL 0 0
INFO [GC inspection] 2010-09-20 16:56:12,806 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER 0 0
INFO [GC inspection] 2010-09-20 16:56:12,807 GCInspector.java (line 156) AE-SERVICE-STAGE 0 0
INFO [GC inspection] 2010-09-20 16:56:12,807 GCInspector.java (line 156) FLUSH-WRITER-POOL 0 0
INFO [GC inspection] 2010-09-20 16:56:12,807 GCInspector.java (line 156) HINTED-HANDOFF-POOL 0 0
INFO [GC inspection] 2010-09-20 16:56:12,808 GCInspector.java (line 161) CompactionManager n/a 1
Re: Dazed and confused with Cassandra on EC2 ...
> Nope - no swap enabled.

Something is seriously weird, unless the system clock is broken... Given:

INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is 25907560448
INFO [GC inspection] 2010-09-20 15:34:15,924 GCInspector.java (line 129) GC for ParNew: 372272 ms, 82471240 reclaimed leaving 671616240 used; max is 25907560448

We have *extremely* slow ParNews on a heap that is not even close to being full. I highly doubt fragmentation is causing this kind of extremity. I wonder if it is possible that EC2 instances are over-committed on memory such that swapping is happening behind the scenes on the host... but I have always assumed memory was not over-committed on EC2.

Can you run with -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps?

-- / Peter Schuller
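For anyone wanting to wire that up, the flags slot straight into the JVM_OPTS block quoted earlier in the thread; a sketch only - the log path is an assumption, and these are the standard HotSpot 6 spellings:

    JVM_OPTS="$JVM_OPTS \
            -XX:+PrintGC \
            -XX:+PrintGCDetails \
            -XX:+PrintGCTimeStamps \
            -Xloggc:/var/log/cassandra/gc.log"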
Re: Dazed and confused with Cassandra on EC2 ...
Hi Jedd,

I'm using Cassandra on EC2 as well - so I'm quite interested. Just to clarify your post - it sounds like you have 4 questions/issues:

1. Writes have slowed down significantly. What's the logical explanation? And what are the logical solutions/options to solve it?

2. You grew from 2 nodes to 4, but the original 2 nodes have 200GB and the 2 new ones have 40GB. What's the recommended practice for rebalancing (i.e., when should you do it), what's the actual procedure, and what's the expected impact of it?

3. Cassandra nodes disappear. (I'm not quite clear what this means.)

4. You took a machine offline without decommissioning it from the cluster. Now the machine is gone, but the other nodes (in Gossip logs) report that they are still looking for it. How do you stop nodes from looking for a removed node?

I'm not trying to put words in your mouth - but I want to make sure that I understand what you're asking about (because I have similar EC2-related thoughts). Let me know if this is an accurate summary.

Dave Viner

On Fri, Sep 17, 2010 at 7:41 AM, Jedd Rashbrooke jedd.rashbro...@imagini.net wrote:
> Howdi,
>
> I've just landed in an experiment to get Cassandra going, fed by PHP
> via Thrift via Hadoop, all running on EC2. I've been lurking a bit on
> the list for a couple of weeks, mostly reading any threads with the
> word 'performance' in them. Few people have anything polite to say
> about EC2, but I want to just throw out some observations and get some
> feedback on whether what I'm seeing is even approaching any kind of
> normal. My background is mostly *nix and networking, with a half-way
> decent understanding of DBs -- but Cassandra, Hadoop, Thrift and EC2
> are all fairly new to me.
>
> We're using a four-node decently-specced (m2.2xlarge, if you're
> EC2-aware) cluster - 32GB, 4-core, if you're not :) I'm using Ubuntu
> with the Deb packages for Cassandra and Hadoop, and some fairly
> conservative tweaks to things like JVM memory (bumping them up to 4GB,
> then 16GB).
>
> One of our insert jobs - a mapper-only process - was running pretty
> fast a few days ago. Somewhere around a million lines of input, split
> into a dozen files, inserting via a Hadoop job in about a half hour.
> Happy times. This was when the cluster was modestly sized - 20-50GB.
> It's now about 200GB, and performance has dropped by an order of
> magnitude - perhaps 5-6 hours to do the same amount of work, using the
> same codebase and the same input data. I've read that reads slow down
> as the DB grows, but had an expectation that writes would be
> consistently snappy. How surprising is this performance drop given the
> DB growth?
>
> My 4-node cluster started off as a 2-node - and now nodetool ring
> suggests the two original nodes are 200GB each, and the newer two are
> 40GB. Is this normal? Would a rebalance likely improve performance
> substantially? My feeling is that it would be expensive to perform.
>
> EC2 seems to get a bad rap, and we're feeling quite a bit of pain,
> which is sad given the (on paper) spec of the machines, and the cost -
> over US$3k/month for the cluster. I've split Cassandra commitlog,
> Cassandra data, hadoop (hdfs) and tmp onto separate 'spindles' -
> observations so far suggest late-'90s disk IO speed (15MB max
> sustained writes, one machine, one disk to another), and consistently
> inconsistent performance (identical machine next to it running the
> same task at the same time was getting 28MB) over several hours.
>
> Cassandra nodes seem to disappear too easily - even with just one core
> (out of four) maxed out with a jsvc task, minimal disk or network
> activity, the machine feels very sluggish. Tailing the cassandra logs
> hints that it's doing hinted handoffs and occasionally compaction
> tasks. I've never seen this kind of behaviour - and suspect this is
> more a feature of EC2.
>
> Gossip now seems to be pining the loss of an older machine (that I
> stupidly took offline briefly - EC2 gave it a new IP address when it
> came back). There's nothing in the storage-conf to refer to the old
> address, all 4 Cassandra daemons have been re-started several times
> since, but gossip occasionally (a day later) says that it is looking
> for it - and more worryingly that it is 'now part of the cluster'. I'm
> unsure if this is just an irritation or part of the underlying
> problem.
>
> What I'm going to do next is to try importing some data into a local
> machine - it's just time-consuming to pull in our S3 data - and see if
> I can fake up to around the same capacity and watch for performance
> degradation. I'm also toying with the idea of going from 4 to 8 nodes,
> but I'm clueless on whether / how much this would help.
>
> As I say, though, I'm keen on anyone else's observations on my
> observations - I'm painfully aware that I'm juggling a lot of unknown
> factors at the moment.
>
> cheers,
> Jedd.
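For questions 2 and 4 above, the usual answer in the 0.6 era is nodetool. A sketch only - the hostname follows Jedd's cass-01 convention, the tokens are placeholders, and command names should be checked against your own version's help output:

    nodetool -host cass-01 ring                 # see current tokens and load
    nodetool -host cass-01 move <new-token>     # rebalance: give a node an evenly spaced token
    nodetool -host cass-01 removetoken <token>  # make the ring forget a dead node's token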
Re: Dazed and confused with Cassandra on EC2 ...
On 9/17/10 7:41 AM, Jedd Rashbrooke wrote:
> Happy times. This was when the cluster was modestly sized - 20-50GB.
> It's now about 200GB, and performance has dropped by an order of
> magnitude - perhaps 5-6 hours to do the same amount of work, using the
> same codebase and the same input data.

You don't mention which version of the deb package you're using, but:

https://issues.apache.org/jira/browse/CASSANDRA-1214

is a performance hit which occurs as a result of the growth in data size, and could be your issue. Have you checked for swapping behavior like the above while your system is unhappy?

=Rob
Re: Dazed and confused with Cassandra on EC2 ...
Hi Dave,

Thank you for your response. I can clarify a couple of things here:

> 2. You grew from 2 nodes to 4, but the original 2 nodes have 200GB and
> the 2 new ones have 40GB. What's the recommended practice for
> rebalancing (i.e., when should you do it), what's the actual
> procedure, and what's the expected impact of it?

+ is it likely to cause a problem in the short term if I don't (ie. if I just wait until 'normal activity' to somehow even out the distribution of data).

> 3. Cassandra nodes disappear. (I'm not quite clear what this means.)

Nodetool reports the node as down. I'm seeing lots of "machine-x is DOWN" in the logs. Flapping, actually. I don't have any swap configured (which I've read somewhere might induce flapping). The machine also feels like it goes on a hiatus - separately, but typically observed at the same time. tail -f on the Cassandra logs delays for several minutes, pending ssh's to the box also stall until 'something' happens that releases the machine from its slumber. Typically that something is a message in the logs that a compaction or a hinted handoff has completed. As I say, nmon/top show minimal network/disk activity, and just one of the four cores flat-lining during this time. The machine *should* be more responsive.

Actually: http://pastebin.com/AeM2VgL3

All the machines referenced in there are ones that are in the cluster now.

> 4. You took a machine offline without decommissioning it from the
> cluster. Now the machine is gone, but the other nodes (in Gossip logs)
> report that they are still looking for it. How do you stop nodes from
> looking for a removed node?

I was attempting to drain the thing first, but that was stalling, so I stopped Cassandra then stopped the box. The storage and config were on EBS (persistent disk) so they came back - it's just that the IP address of the machine changed. I typically use my own assigned hostnames (cass-01, cass-02, etc, say) but for proper resolution I use the EC2 'internal hostnames', which were updated on all four Cassandra boxes; the other three instances of Cassandra were stopped, and then all four brought back up.

You say you have similar EC2-related thoughts .. have you done much on the EC2 hardware so far? Are you seeing the same kind of thing?

cheers,
Jedd.
Re: Dazed and confused with Cassandra on EC2 ...
Hi Rob,

Thanks for your suggestions. I should have been a bit more verbose in my platform description -- I'm using 64-bit instances, which - going from a Ben Black video I saw - I think leads to a sensible default usage of mmap when left at auto. Should I look at forcing this setting?

> You don't mention which version of the deb package you're using, but:

I'm using 0.6.5 - the ones bundled by Eric Evans eev...@apache.org

Because these are 32GB machines, I've not configured them with any swap at all. I've rarely done this in the past - but was aware there was this swap-hell scenario with JVMs, and the rationale makes sense -- it's better to have the JVM crash 'cleanly' than to have it grind the machine to a halt and make it impossible to get onto the machine to kill the process.

cheers,
Jedd.
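One cheap way to see whether mmap mode is actually in effect on a 64-bit node is to count the data files the JVM has mapped (a sketch; the process pattern and the 0.6 *-Data.db file naming are assumptions):

    grep -c 'Data.db' /proc/$(pgrep -f CassandraDaemon)/maps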