Re: TimeOutExceptions and Cluster Performance
The combination of 'too many open files' and lots of memtable flushes could mean you have tons and tons of sstables on disk. This can make reads especially slow.

If you are seeing the timeouts on reads a lot more often than on writes, then this explanation might make sense, and you should watch https://issues.apache.org/jira/browse/CASSANDRA-685.

Thanks,
Stu

-----Original Message-----
From: Jonathan Ellis jbel...@gmail.com
Sent: Friday, February 12, 2010 9:43pm
To: cassandra-user@incubator.apache.org
Subject: Re: TimeOutExceptions and Cluster Performance

There's a lot more details that would be useful, but if you are on the verge of OOMing and something actually running out, then that's probably the culprit; when the JVM gets low on RAM it will consume all your CPU trying to GC enough to continue. (You mentioned seeing high CPU on one core, which tends to corroborate this; to confirm, you can look at the thread using the CPU: http://publib.boulder.ibm.com/infocenter/javasdk/tools/index.jsp?topic=/com.ibm.java.doc.igaa/_1vg0001475cb4a-1190e2e0f74-8000_1007.html)

Look at your executor queues, in the output of nodeprobe tpstats, if you have no other metrics system. You are probably just swamping it with writes; if you have 1000s of ops in any of the pending queues, that's bad.

-Jonathan

On Fri, Feb 12, 2010 at 7:40 PM, Stephen Hamer stephen.ha...@gmail.com wrote:

Hi,

I'm running a 5 node Cassandra cluster and am having a very tough time getting reasonable performance from it. Many of the requests are failing with TimeOutException. This is making it difficult to use Cassandra in a production setting.

The cluster was running fine for a week or two (it was created 3 weeks ago) but has started to degrade in the last week. The cluster was originally only 3 nodes, but when performance started to degrade I added another two nodes. This doesn't seem to have helped though.

Requests being made from my application are being balanced across the cluster in a round robin fashion.
Many of these requests are failing with TimeOutException. When this occurs I can look at the DB servers and see several of them fully utilizing 1 core. I can turn off my application when this is going on (which stops all reads and writes to Cassandra). The cluster seems to stay in this state for another several hours before returning to a resting state.

When the CPU is loaded I see lots of messages about en-queuing, sorting, and writing memtables, so I have tried adjusting the memtable size down to 16MB and raised the MemtableFlushAfterMinutes to 1440. This doesn't seem to have affected anything though.

I was seeing errors about too many file descriptors being open, so I added "ulimit -n 32768" to cassandra.in.sh. This seems to have fixed it.

I was also seeing lots of out of memory exceptions, so I raised the heap size to 4GB. This has helped but not eliminated the OOM issues.

I'm not sure if it's related to any of the performance issues, but I see lots of log entries about DigestMismatchExceptions. I've included a sample of the exceptions below.

My Cassandra cluster is almost unusable in its current state because of the number of timeout exceptions that I'm seeing. I suspect that this is because of a configuration problem or because I have set something up improperly. It feels like the database has entered a bad state which is causing it to churn as much as it is, but I have no way to verify this.

What steps can I take to address the performance issues I am seeing and the consistent stream of TimeOutExceptions?

Thanks,
Stephen

Here are some specifics about the cluster configuration:

5 Large EC2 instances - 7.5 GB RAM, 2 cores (64bit, 1-1.2GHz), data and commit logs stored on separate EBS volumes. Boxes are running Debian 5.
r...@prod-cassandra4 ~/cassandra # bin/nodeprobe -host localhost ring
Address         Status  Load      Range                                      Ring
                                  101279862673517536112907910111793343978
10.254.55.191   Up      2.94 GB   27246729060092122727944947571993545        |<--|
10.214.119.127  Up      3.67 GB   34209800341332764076889844611182786881     |   ^
10.215.122.208  Up      11.86 GB  42649376116143870288751410571644302377     v   |
10.215.30.47    Up      6.37 GB   81374929113514034361049243620869663203     |   ^
10.208.246.160  Up      5.15 GB   101279862673517536112907910111793343978    |-->|

I am running the 0.5 release of Cassandra (at commit 44e8c2e...). Here are some of my configuration options:

Memory, disk, performance section of storage-conf.xml (I've only included options that I've changed from the defaults):

<Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
<ReplicationFactor>3</ReplicationFactor>
<SlicedBufferSizeInKB>512</SlicedBufferSizeInKB>
<FlushDataBufferSizeInMB>64</FlushDataBufferSizeInMB>
<FlushIndexBufferSizeInMB>16</FlushIndexBufferSizeInMB>
<ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
<MemtableSizeInMB>16</MemtableSizeInMB>
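
One rough way to check the "tons and tons of sstables on disk" theory from Stu's reply is to count the sstable data files per column family in a node's data directory. The following is a minimal Java sketch, not part of Cassandra: it assumes data files are named <ColumnFamily>-<generation>-Data.db, and the class name SstableCounter and the directory argument are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SstableCounter {

  /**
   * Counts file names ending in "-Data.db", grouped by the text before the
   * first '-' (assumed here to be the column family name).
   */
  public static Map<String, Integer> countByColumnFamily(List<String> fileNames) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String name : fileNames) {
      if (!name.endsWith("-Data.db")) {
        continue; // skip index files, filter files and anything unrelated
      }
      String columnFamily = name.substring(0, name.indexOf('-'));
      counts.merge(columnFamily, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Hypothetical usage: pass the keyspace data directory as the first argument.
    File dir = new File(args.length > 0 ? args[0] : ".");
    String[] names = dir.list(); // null if dir is missing or not a directory
    List<String> fileNames = new ArrayList<>();
    if (names != null) {
      fileNames.addAll(Arrays.asList(names));
    }
    System.out.println(countByColumnFamily(fileNames));
  }
}
```

A count that keeps growing for one column family would be consistent with flushes outpacing compaction; a small, stable count would point elsewhere.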
Re: How to unit test my code calling Cassandra with Thift
I've committed to trunk all the required code and posted about it, hope you find it useful:
http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/

On Sun, Jan 24, 2010 at 12:20 PM, Richard Grossman richie...@gmail.com wrote:

Great Ran,

I think I've missed the .setDaemon to keep the server alive.

Thanks
Richard

On Sun, Jan 24, 2010 at 12:02 PM, Ran Tavory ran...@gmail.com wrote:

Here's the code I've just written over the weekend and started using in test:

package com.outbrain.data.cassandra.service;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.service.CassandraDaemon;
import org.apache.cassandra.utils.FileUtils;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * An in-memory cassandra storage service that listens to the thrift interface.
 * Useful for unit testing,
 *
 * @author Ran Tavory (r...@outbain.com)
 *
 */
public class InProcessCassandraServer implements Runnable {

  private static final Logger log = LoggerFactory.getLogger(InProcessCassandraServer.class);

  CassandraDaemon cassandraDaemon;

  public void init() {
    try {
      prepare();
    } catch (IOException e) {
      log.error("Cannot prepare cassandra.", e);
    }
    try {
      cassandraDaemon = new CassandraDaemon();
      cassandraDaemon.init(null);
    } catch (TTransportException e) {
      log.error("TTransportException", e);
    } catch (IOException e) {
      log.error("IOException", e);
    }
  }

  @Override
  public void run() {
    cassandraDaemon.start();
  }

  public void stop() {
    cassandraDaemon.stop();
    rmdir("tmp");
  }

  /**
   * Creates all files and directories needed
   * @throws IOException
   */
  private void prepare() throws IOException {
    // delete tmp dir first
    rmdir("tmp");
    // make a tmp dir and copy storage-conf.xml and log4j.properties to it
    copy("/cassandra/storage-conf.xml", "tmp");
    copy("/cassandra/log4j.properties", "tmp");
    System.setProperty("storage-config", "tmp");

    // make cassandra directories.
    for (String s: DatabaseDescriptor.getAllDataFileLocations()) {
      mkdir(s);
    }
    mkdir(DatabaseDescriptor.getBootstrapFileLocation());
    mkdir(DatabaseDescriptor.getLogFileLocation());
  }

  /**
   * Copies a resource from within the jar to a directory.
   *
   * @param resource
   * @param directory
   * @throws IOException
   */
  private void copy(String resource, String directory) throws IOException {
    mkdir(directory);
    InputStream is = getClass().getResourceAsStream(resource);
    String fileName = resource.substring(resource.lastIndexOf("/") + 1);
    File file = new File(directory + System.getProperty("file.separator") + fileName);
    OutputStream out = new FileOutputStream(file);
    byte buf[] = new byte[1024];
    int len;
    while ((len = is.read(buf)) > 0) {
      out.write(buf, 0, len);
    }
    out.close();
    is.close();
  }

  /**
   * Creates a directory
   * @param dir
   * @throws IOException
   */
  private void mkdir(String dir) throws IOException {
    FileUtils.createDirectory(dir);
  }

  /**
   * Removes a directory from file system
   * @param dir
   */
  private void rmdir(String dir) {
    FileUtils.deleteDir(new File(dir));
  }
}

And in the test class:

public class XxxTest {

  private static InProcessCassandraServer cassandra;

  @BeforeClass
  public static void setup() throws TTransportException, IOException, InterruptedException {
    cassandra = new InProcessCassandraServer();
    cassandra.init();
    Thread t = new Thread(cassandra);
    t.setDaemon(true);
    t.start();
  }

  @AfterClass
  public static void shutdown() {
    cassandra.stop();
  }

  ... test
}

Now you can connect to localhost:9160.

Assumptions: The code assumes you have two files in your classpath: /cassandra/storage-conf.xml and /cassandra/log4j.properties (the two resources that prepare() copies). This is convenient if you use maven, just throw them at /src/test/resources/cassandra/

If you don't work with maven or would like to set up the configuration files differently it should be fairly easy, just change the prepare() method.

On Sun, Jan 24, 2010 at 10:54 AM, Richard Grossman richie...@gmail.com wrote:

So is there anybody? Unit testing is important, people ...
Thanks

On Thu, Jan 21, 2010 at 12:09 PM, Richard Grossman richie...@gmail.com wrote:

Here is the code I use

class startServer implements Runnable {

  @Override
  public void run() {
    try {
      CassandraDaemon cassandraDaemon = new CassandraDaemon();
      cassandraDaemon.init(null);
      cassandraDaemon.start();
    } catch (TTransportException e) {