Re: TimeOutExceptions and Cluster Performance

2010-02-13 Thread Stu Hood
The combination of 'too many open files' and lots of memtable flushes could 
mean you have tons and tons of sstables on disk. This can make reads especially 
slow.
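A quick way to test the "too many sstables" theory is to count data files per column family on each node. This is a minimal sketch, assuming SSTable data files are named `<ColumnFamily>-<generation>-Data.db` somewhere under the data directory (check your own DataFileDirectory setting). The class name `SSTableCounter` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class SSTableCounter {
    // Assumed naming scheme: <ColumnFamily>-<generation>-Data.db
    private static final Pattern DATA_FILE = Pattern.compile("^(.+)-(\\d+)-Data\\.db$");

    /** Counts SSTable data files per column family under dataDir, recursively. */
    public static Map<String, Integer> countDataFiles(Path dataDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(dataDir)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                Matcher m = DATA_FILE.matcher(p.getFileName().toString());
                if (m.matches()) {
                    counts.merge(m.group(1), 1, Integer::sum); // one more data file for this CF
                }
            });
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // First argument: the data directory to scan (defaults to the current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        countDataFiles(dir).forEach((cf, n) -> System.out.println(cf + ": " + n + " data files"));
    }
}
```

If a column family shows hundreds of data files, each read may have to consult many of them, which fits the slow-read explanation above.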

If you are seeing the timeouts on reads a lot more often than on writes, then 
this explanation might make sense, and you should watch 
https://issues.apache.org/jira/browse/CASSANDRA-685.

Thanks,
Stu

-Original Message-
From: Jonathan Ellis jbel...@gmail.com
Sent: Friday, February 12, 2010 9:43pm
To: cassandra-user@incubator.apache.org
Subject: Re: TimeOutExceptions and Cluster Performance

There are a lot more details that would be useful, but if you are on the
verge of OOMing and sometimes actually running out, then that's
probably the culprit: when the JVM gets low on RAM it will consume all
your CPU trying to GC enough to continue. (You mentioned seeing high
CPU on one core, which tends to corroborate this; to confirm, you can
look at the thread using the CPU:
http://publib.boulder.ibm.com/infocenter/javasdk/tools/index.jsp?topic=/com.ibm.java.doc.igaa/_1vg0001475cb4a-1190e2e0f74-8000_1007.html)
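One way to look at per-thread CPU and GC time from inside a JVM is the standard `java.lang.management` API. This is a minimal sketch that inspects the JVM it runs in; checking a Cassandra node would mean reading the same MBeans over JMX, which is not shown. The class name `BusiestThread` is hypothetical. Note that GC work runs on JVM-internal threads that `ThreadMXBean` generally does not list, so in the GC-thrashing case a fast-growing GC total is the more telling number.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BusiestThread {
    /** Name of the live Java thread with the most CPU time, or null if unavailable. */
    public static String busiestThreadName() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            return null;
        }
        long bestId = -1;
        long bestCpu = -1;
        for (long id : threads.getAllThreadIds()) {
            long cpu = threads.getThreadCpuTime(id); // -1 if the thread died or timing is disabled
            if (cpu > bestCpu) {
                bestCpu = cpu;
                bestId = id;
            }
        }
        if (bestId < 0) {
            return null;
        }
        ThreadInfo info = threads.getThreadInfo(bestId);
        return info == null ? null : info.getThreadName();
    }

    /** Total milliseconds spent in GC across all collectors since JVM start. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector doesn't report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("busiest Java thread: " + busiestThreadName());
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```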

Look at your executor queues, in the output of nodeprobe tpstats if
you have no other metrics system. You are probably just swamping it
with writes; if you have thousands of ops in any of the pending queues,
that's bad.
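To flag backed-up pools automatically, the tpstats output can be piped into a small filter. This is a sketch assuming each data row ends with three numeric columns (Active, Pending, Completed) after the pool name; if your version prints different columns, adjust the indexes. The class name `TpstatsCheck` is hypothetical, and the default threshold of 1000 simply follows the "1000s of ops" rule of thumb above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TpstatsCheck {
    /** Pool name -> pending count for rows whose Pending column is >= threshold. */
    public static Map<String, Long> backedUpPools(Reader input, long threshold) throws IOException {
        Map<String, Long> result = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4) {
                continue; // blank or too short to be a data row
            }
            int n = cols.length;
            long pending;
            try {
                Long.parseLong(cols[n - 3]);           // Active (assumed column)
                pending = Long.parseLong(cols[n - 2]); // Pending (assumed column)
                Long.parseLong(cols[n - 1]);           // Completed (assumed column)
            } catch (NumberFormatException e) {
                continue; // header row or other non-data line
            }
            if (pending >= threshold) {
                String pool = String.join(" ", Arrays.copyOfRange(cols, 0, n - 3));
                result.put(pool, pending);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        long threshold = args.length > 0 ? Long.parseLong(args[0]) : 1000;
        Map<String, Long> hot = backedUpPools(new InputStreamReader(System.in), threshold);
        if (hot.isEmpty()) {
            System.out.println("No pool has " + threshold + " or more pending tasks.");
        } else {
            hot.forEach((pool, p) -> System.out.println(pool + ": " + p + " pending"));
        }
    }
}
```

Usage: `bin/nodeprobe -host localhost tpstats | java TpstatsCheck 1000`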

-Jonathan

On Fri, Feb 12, 2010 at 7:40 PM, Stephen Hamer stephen.ha...@gmail.com wrote:
 Hi,
 I'm running a 5-node Cassandra cluster and am having a very tough time
 getting reasonable performance from it. Many of the requests are failing
 with TimeOutException. This is making it difficult to use Cassandra in a
 production setting.

 The cluster was running fine for a week or two (it was created 3 weeks ago)
 but has started to degrade in the last week. The cluster originally had only
 3 nodes, but when performance started to degrade I added two more nodes.
 This doesn't seem to have helped, though.

 Requests from my application are balanced across the cluster in a
 round-robin fashion. Many of these requests are failing with
 TimeOutException. When this occurs, I can look at the DB servers and see
 several of them fully utilizing one core. I can turn off my application while
 this is going on (which stops all reads and writes to Cassandra), but the
 cluster seems to stay in this state for several more hours before returning
 to a resting state.

 When the CPU is loaded I see lots of messages about enqueuing, sorting, and
 writing memtables, so I tried lowering the memtable size to 16MB and raising
 MemtableFlushAfterMinutes to 1440. This doesn't seem to have affected
 anything, though.

 I was seeing errors about too many open file descriptors, so I added
 “ulimit -n 32768” to Cassandra.in.sh. This seems to have fixed that. I was
 also seeing lots of out-of-memory exceptions, so I raised the heap size to
 4GB. This has helped but not eliminated the OOM issues.
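To confirm that a raised `ulimit -n` and heap size actually reached the JVM, you can read them back from the platform MXBeans. This is a minimal sketch for HotSpot/OpenJDK on Unix (it relies on `com.sun.management.UnixOperatingSystemMXBean`, which other JVMs may not provide) and it reports on the JVM it runs in; for a running Cassandra node you would read the same OperatingSystem MBean attributes over JMX, e.g. with jconsole. The class name `ResourceCheck` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class ResourceCheck {
    /** {open, max} file descriptors for this JVM, or null if the MXBean doesn't expose them. */
    public static long[] fileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            // The max value reflects the ulimit in effect when the JVM was launched.
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    /** Maximum heap the JVM will try to use, in MB (roughly the -Xmx setting). */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        long[] fds = fileDescriptors();
        if (fds == null) {
            System.out.println("File descriptor counts not available on this JVM/OS.");
        } else {
            System.out.println("open fds: " + fds[0] + " / limit: " + fds[1]);
        }
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```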

 I'm not sure if it's related to any of the performance issues but I see lots
 of log entries about DigestMismatchExceptions. I've included a sample of the
 exceptions below.

 My Cassandra cluster is almost unusable in its current state because of the
 number of timeout exceptions I'm seeing. I suspect this is caused by a
 configuration problem or something I have set up improperly. It feels like
 the database has entered a bad state that is causing it to churn as much as
 it is, but I have no way to verify this.

 What steps can I take to address the performance issues I am seeing and the
 consistent stream of TimeOutExceptions?

 Thanks,
 Stephen


 Here are some specifics about the cluster configuration:

 5 Large EC2 instances - 7.5 GB RAM, 2 cores (64-bit, 1-1.2 GHz), data and
 commit logs stored on separate EBS volumes. Boxes are running Debian 5.

 r...@prod-cassandra4 ~/cassandra # bin/nodeprobe -host localhost ring
 Address         Status  Load      Range                                     Ring
                                   101279862673517536112907910111793343978
 10.254.55.191   Up      2.94 GB   27246729060092122727944947571993545       |<--|
 10.214.119.127  Up      3.67 GB   34209800341332764076889844611182786881    |   ^
 10.215.122.208  Up      11.86 GB  42649376116143870288751410571644302377    v   |
 10.215.30.47    Up      6.37 GB   81374929113514034361049243620869663203    |   ^
 10.208.246.160  Up      5.15 GB   101279862673517536112907910111793343978   |-->|
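The Load column ranges from 2.94 GB to 11.86 GB, so one thing worth checking is how much of the token space each node's range covers. This sketch assumes RandomPartitioner tokens lie in [0, 2^127) and that a node owns the range (previous token, its token]. With ReplicationFactor 3, each node also stores replicas of neighboring ranges (under the default rack-unaware placement, as far as I know), so primary ownership is only a first approximation of Load. The class name `RingOwnership` and the evenly spaced `i * 2^127 / n` token scheme are illustrative, not taken from the thread.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.util.Arrays;

public class RingOwnership {
    // Assumption: RandomPartitioner tokens fall in [0, 2^127).
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    /** Each node's share of the ring, in ascending token order. */
    public static double[] ownership(BigInteger[] tokens) {
        BigInteger[] sorted = tokens.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double[] share = new double[n];
        for (int i = 0; i < n; i++) {
            BigInteger prev = sorted[(i + n - 1) % n]; // wraps around for the first node
            BigInteger range = sorted[i].subtract(prev).mod(RING);
            if (n == 1) {
                range = RING; // a single node owns the whole ring
            }
            share[i] = new BigDecimal(range)
                    .divide(new BigDecimal(RING), MathContext.DECIMAL64)
                    .doubleValue();
        }
        return share;
    }

    /** Evenly spaced tokens i * 2^127 / n, one per node. */
    public static BigInteger[] balancedTokens(int n) {
        BigInteger[] t = new BigInteger[n];
        for (int i = 0; i < n; i++) {
            t[i] = RING.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
        }
        return t;
    }

    public static void main(String[] args) {
        // Tokens copied from the ring output above, in ascending order.
        BigInteger[] tokens = {
            new BigInteger("27246729060092122727944947571993545"),
            new BigInteger("34209800341332764076889844611182786881"),
            new BigInteger("42649376116143870288751410571644302377"),
            new BigInteger("81374929113514034361049243620869663203"),
            new BigInteger("101279862673517536112907910111793343978"),
        };
        double[] share = ownership(tokens);
        for (int i = 0; i < share.length; i++) {
            System.out.printf("node %d: %.1f%% of the ring%n", i, share[i] * 100);
        }
        System.out.println("balanced tokens for 5 nodes: " + Arrays.toString(balancedTokens(5)));
    }
}
```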


 I am running the 0.5 release of Cassandra (at commit 44e8c2e...). Here are
 some of my configuration options:

 Memory, disk, performance section of storage-conf.xml (I've only included
 options that I've changed from the defaults):
 <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
 <ReplicationFactor>3</ReplicationFactor>

 <SlicedBufferSizeInKB>512</SlicedBufferSizeInKB>
 <FlushDataBufferSizeInMB>64</FlushDataBufferSizeInMB>
 <FlushIndexBufferSizeInMB>16</FlushIndexBufferSizeInMB>
 <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
 <MemtableSizeInMB>16</MemtableSizeInMB>
 

Re: How to unit test my code calling Cassandra with Thrift

2010-02-13 Thread Ran Tavory
I've committed all the required code to trunk and posted about it; I hope you
find it useful:
http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/


On Sun, Jan 24, 2010 at 12:20 PM, Richard Grossman richie...@gmail.com wrote:

 Great Ran,

 I think I've missed the .setDaemon to keep the server alive.
 Thanks

 Richard

 On Sun, Jan 24, 2010 at 12:02 PM, Ran Tavory ran...@gmail.com wrote:

 Here's the code I wrote over the weekend and have started using in
 tests:


 package com.outbrain.data.cassandra.service;

 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;

 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.service.CassandraDaemon;
 import org.apache.cassandra.utils.FileUtils;
 import org.apache.thrift.transport.TTransportException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 /**
  * An in-process Cassandra storage service that listens on the Thrift
  * interface. Useful for unit testing.
  *
  * @author Ran Tavory (r...@outbain.com)
  */
 public class InProcessCassandraServer implements Runnable {

   private static final Logger log =
       LoggerFactory.getLogger(InProcessCassandraServer.class);

   // Scratch directory for the copied config files. The value "tmp" is an
   // assumption: the original declaration of tmp was not included.
   private static final String tmp = "tmp";

   private CassandraDaemon cassandraDaemon;

   public void init() {
     try {
       prepare();
     } catch (IOException e) {
       log.error("Cannot prepare cassandra.", e);
     }
     try {
       cassandraDaemon = new CassandraDaemon();
       cassandraDaemon.init(null);
     } catch (TTransportException e) {
       log.error("TTransportException", e);
     } catch (IOException e) {
       log.error("IOException", e);
     }
   }

   @Override
   public void run() {
     cassandraDaemon.start();
   }

   public void stop() {
     cassandraDaemon.stop();
     rmdir(tmp);
   }

   /**
    * Creates all files and directories needed.
    * @throws IOException if a file cannot be copied or a directory created
    */
   private void prepare() throws IOException {
     // delete the tmp dir first
     rmdir(tmp);
     // make a tmp dir and copy storage-conf.xml and log4j.properties to it
     copy("/cassandra/storage-conf.xml", tmp);
     copy("/cassandra/log4j.properties", tmp);
     System.setProperty("storage-config", tmp);

     // make the cassandra directories
     for (String s : DatabaseDescriptor.getAllDataFileLocations()) {
       mkdir(s);
     }
     mkdir(DatabaseDescriptor.getBootstrapFileLocation());
     mkdir(DatabaseDescriptor.getLogFileLocation());
   }

   /**
    * Copies a resource from within the jar to a directory.
    *
    * @param resource classpath resource, e.g. "/cassandra/storage-conf.xml"
    * @param directory target directory
    * @throws IOException if the resource is missing or cannot be copied
    */
   private void copy(String resource, String directory) throws IOException {
     mkdir(directory);
     InputStream is = getClass().getResourceAsStream(resource);
     if (is == null) {
       throw new IOException("Resource not found on classpath: " + resource);
     }
     String fileName = resource.substring(resource.lastIndexOf('/') + 1);
     File file = new File(directory, fileName);
     try {
       OutputStream out = new FileOutputStream(file);
       try {
         byte[] buf = new byte[1024];
         int len;
         while ((len = is.read(buf)) > 0) {
           out.write(buf, 0, len);
         }
       } finally {
         out.close();
       }
     } finally {
       is.close();
     }
   }

   /**
    * Creates a directory.
    * @param dir directory to create
    * @throws IOException if the directory cannot be created
    */
   private void mkdir(String dir) throws IOException {
     FileUtils.createDirectory(dir);
   }

   /**
    * Removes a directory from the file system.
    * @param dir directory to remove
    */
   private void rmdir(String dir) {
     FileUtils.deleteDir(new File(dir));
   }
 }


 And in the test class:

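 import java.io.IOException;

 import org.apache.thrift.transport.TTransportException;
 import org.junit.AfterClass;
 import org.junit.BeforeClass;

 public class XxxTest {

   private static InProcessCassandraServer cassandra;

   @BeforeClass
   public static void setup() throws TTransportException, IOException,
       InterruptedException {
     cassandra = new InProcessCassandraServer();
     cassandra.init();
     // Run the server on a daemon thread so it doesn't keep the JVM alive
     // after the tests finish.
     Thread t = new Thread(cassandra);
     t.setDaemon(true);
     t.start();
   }

   @AfterClass
   public static void shutdown() {
     cassandra.stop();
   }

   // ... tests
 }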

 Now you can connect to localhost:9160.
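Because the server is started on a background thread, a test can run before the Thrift port is listening. One way to guard against that (a sketch, not part of Ran's code; the `PortWaiter` class and the timeouts are hypothetical) is to poll the port with plain `java.net` sockets until it accepts a connection:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortWaiter {
    /**
     * Polls host:port until a TCP connection succeeds or timeoutMillis elapses.
     * Returns true if the port accepted a connection in time.
     */
    public static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500); // 500 ms per attempt
                return true; // something is listening
            } catch (IOException notYet) {
                Thread.sleep(100); // not up yet; retry shortly
            }
        }
        return false;
    }
}
```

In `setup()`, after `t.start()`, you could add something like `if (!PortWaiter.waitForPort("localhost", 9160, 30000)) throw new IllegalStateException("Cassandra did not start");`.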

 Assumptions:
 The code assumes you have two files on your classpath:
 /cassandra/storage-conf.xml and /cassandra/log4j.properties. This is
 convenient if you use Maven: just put them in /src/test/resources/cassandra/.
 If you don't use Maven, or would like to locate the configuration files
 differently, it should be fairly easy: just change the prepare()
 method.



 On Sun, Jan 24, 2010 at 10:54 AM, Richard Grossman richie...@gmail.com wrote:

 So, is there anybody? Unit testing is important, people...
 Thanks


 On Thu, Jan 21, 2010 at 12:09 PM, Richard Grossman richie...@gmail.com wrote:

 Here is the code I use
 class startServer implements Runnable {

 @Override
 public void run() {
 try {
 CassandraDaemon cassandraDaemon = new CassandraDaemon();
 cassandraDaemon.init(null);
 cassandraDaemon.start();
 } catch (TTransportException e) {