RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Dr. Martin Grabmüller
 The other problem is: if I keep mixed writes and reads (e.g., 8 write threads
 plus 7 read threads) running continuously against the 2-node cluster, the read
 latency goes up gradually (along with the size of the Cassandra data files),
 and in the end it reaches ~40ms (up from ~20ms) even with only 15 threads.
 During this process the data files grew from 1.6GB to over 3GB even though I
 kept writing the same keys/values to Cassandra. It seems that Cassandra keeps
 appending to sstable data files and only cleans them up during node cleanup
 or compaction (please correct me if this is incorrect).

In my tests I have observed that good read latency depends on keeping
the number of data files low.  In my current test setup, I have stored
1.9 TB of data on a single node, which is in 21 data files, and read
latency is between 10 and 60ms (for small reads; larger reads of course
take more time).  In earlier stages of my test, I had up to 5000
data files, and read performance was quite bad: my configured 10-second
RPC timeout was regularly encountered.

The number of data files is reduced whenever Cassandra compacts them,
which happens either automatically, when enough data files have been
generated by continuous writing, or when triggered by nodeprobe compact,
cleanup, etc.
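
As a rough way to see whether compaction is keeping up, one can simply watch
the number of sstable data files; a minimal sketch (the data-directory path is
a placeholder, and it assumes the usual "-Data.db" suffix for sstable data
files):

import os, time

DATA_DIR = "/var/lib/cassandra/data/Keyspace1"  # placeholder path

def count_data_files(path):
    # each sstable contributes one -Data.db file; compaction merges several into one
    return sum(1 for name in os.listdir(path) if name.endswith("-Data.db"))

while True:
    print(time.strftime("%H:%M:%S"), "data files:", count_data_files(DATA_DIR))
    time.sleep(60)  # a steadily growing count means compaction is falling behind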

So my advice is to keep the write throughput low enough so that Cassandra
can keep up compacting the data files.  For high write throughput, you need
fast drives, if possible on different RAIDs, which are configured as
different DataDirectories for Cassandra.  On my setup (6 drives in a single
RAID-5 configuration), compaction is quite slow: sequential reads/writes
are done at 150 MB/s, whereas during compaction, read/write-performance
drops to a few MB/s.  You definitely want more than one logical drive,
so that Cassandra can alternate between them when flushing memtables and
when compacting.

I would really be interested whether my observations are shared by other
people on this list.

Thanks!

Martin


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller 
martin.grabmuel...@eleven.de wrote:

 In my tests I have observed that good read latency depends on keeping
 the number of data files low.  In my current test setup, I have stored
 1.9 TB of data on a single node, which is in 21 data files, and read
 latency is between 10 and 60ms (for small reads, larger read of course
 take more time).  In earlier stages of my test, I had up to 5000
 data files, and read performance was quite bad: my configured 10-second
 RPC timeout was regularly encountered.


I believe it is known that crossing sstables is O(NlogN) but I'm unable to
find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
enlighten me, but in any case I believe
https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
it.

Keeping write volume low enough that compaction can keep up is one solution,
and throwing hardware at the problem is another, if necessary.  Also, the
row caching in trunk (soon to be 0.6 we hope) helps greatly for repeat hits.

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Dumped 50mil records into my 2-node cluster overnight and made sure that
there aren't many data files (only around 30), per Martin's suggestion. The
size of the data directory is 63GB. Now when I read records from the cluster
the read latency is still ~44ms -- there's no write happening during the
read. And iostat shows that the disk (RAID10, 4x 250GB 15k SAS) is
saturated:

Device:    rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda         47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
sda1         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
sda2        47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
sda3         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00

CPU usage is low.

Does this mean disk i/o is the bottleneck in my case? Will it help if I
increase KeysCachedFraction (KCF) to cache all of the sstable indexes?
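
For interpretation, a quick back-of-the-envelope reading of those iostat
numbers (the sector size is assumed to be the usual 512 bytes):

# rough interpretation of the iostat sample above (512-byte sectors assumed)
rsec_per_s = 23933.33   # read sectors/s
r_per_s    = 190.33     # read requests/s
await_ms   = 25.25      # average time a request spends queued + serviced
util_pct   = 96.17      # fraction of time the device was busy

read_mb_per_s   = rsec_per_s * 512 / 1e6
kb_per_read_req = rsec_per_s * 512 / r_per_s / 1024

print(f"read throughput : {read_mb_per_s:.1f} MB/s")   # ~12 MB/s
print(f"avg read request: {kb_per_read_req:.0f} KB")   # ~63 KB
print(f"avg wait per req: {await_ms} ms at {util_pct}% util")
# ~12 MB/s of small random reads at ~96% utilisation: the array is seek-bound,
# so per-request latencies in the tens of milliseconds are what the disks deliver.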

Also, this is almost a read-only test; in reality our write/read ratio is
close to 1:1, so I'm guessing read latency will go even higher in that case,
because it will be difficult for Cassandra to find a good moment to compact
data files that are busy being written.

Thanks,
-Weijun


On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller 
 martin.grabmuel...@eleven.de wrote:

 In my tests I have observed that good read latency depends on keeping
 the number of data files low.  In my current test setup, I have stored
 1.9 TB of data on a single node, which is in 21 data files, and read
 latency is between 10 and 60ms (for small reads, larger read of course
 take more time).  In earlier stages of my test, I had up to 5000
 data files, and read performance was quite bad: my configured 10-second
 RPC timeout was regularly encountered.


 I believe it is known that crossing sstables is O(NlogN) but I'm unable to
 find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
 enlighten me, but in any case I believe
 https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
 it.

 Keeping write volume low enough that compaction can keep up is one
 solution, and throwing hardware at the problem is another, if necessary.
  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for
 repeat hits.

 -Brandon



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
One more thought about Martin's suggestion: is it possible to put the data
files into multiple directories that are located on different physical
disks? That should help with the i/o bottleneck.

Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?

-Weijun

On Tue, Feb 16, 2010 at 9:50 AM, Weijun Li weiju...@gmail.com wrote:

 Dumped 50mil records into my 2-node cluster overnight, made sure that
 there's not many data files (around 30 only) per Martin's suggestion. The
 size of the data directory is 63GB. Now when I read records from the cluster
 the read latency is still ~44ms, --there's no write happening during the
 read. And iostats shows that the disk (RAID10, 4 250GB 15k SAS) is
 saturated:

 Device:    rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
 sda         47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
 sda1         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
 sda2        47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
 sda3         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00

 CPU usage is low.

 Does this mean disk i/o is the bottleneck for my case? Will it help if I
 increase KCF to cache all sstable index?

 Also, this is the almost a read-only mode test, and in reality, our
 write/read ratio is close to 1:1 so I'm guessing read latency will even go
 higher in that case because there will be difficult for cassandra to find a
 good moment to compact the data files that are being busy written.

 Thanks,
 -Weijun



 On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams dri...@gmail.comwrote:

 On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller 
 martin.grabmuel...@eleven.de wrote:

 In my tests I have observed that good read latency depends on keeping
 the number of data files low.  In my current test setup, I have stored
 1.9 TB of data on a single node, which is in 21 data files, and read
 latency is between 10 and 60ms (for small reads, larger read of course
 take more time).  In earlier stages of my test, I had up to 5000
 data files, and read performance was quite bad: my configured 10-second
 RPC timeout was regularly encountered.


 I believe it is known that crossing sstables is O(NlogN) but I'm unable to
 find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
 enlighten me, but in any case I believe
 https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
 it.

 Keeping write volume low enough that compaction can keep up is one
 solution, and throwing hardware at the problem is another, if necessary.
  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for
 repeat hits.

 -Brandon





Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:50 AM, Weijun Li weiju...@gmail.com wrote:

 Dumped 50mil records into my 2-node cluster overnight, made sure that
 there's not many data files (around 30 only) per Martin's suggestion. The
 size of the data directory is 63GB. Now when I read records from the cluster
 the read latency is still ~44ms, --there's no write happening during the
 read. And iostats shows that the disk (RAID10, 4 250GB 15k SAS) is
 saturated:

 Device:    rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
 sda         47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
 sda1         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
 sda2        47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
 sda3         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00

 CPU usage is low.

 Does this mean disk i/o is the bottleneck for my case? Will it help if I
 increase KCF to cache all sstable index?


That's exactly what this means.  Disk is slow :(


 Also, this is the almost a read-only mode test, and in reality, our
 write/read ratio is close to 1:1 so I'm guessing read latency will even go
 higher in that case because there will be difficult for cassandra to find a
 good moment to compact the data files that are being busy written.


Reads that cause disk seeks are always going to slow things down, since disk
seeks are inherently the slowest operation in a machine.  Writes in
Cassandra should always be fast, as they do not cause any disk seeks.
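
As a rough model of why the number of data files matters so much for reads:
each sstable that has to be consulted for a key can cost one or more seeks, so
per-read latency grows roughly linearly with the file count. A toy estimate
follows; the per-I/O figures are ballpark numbers for 15k-rpm drives, not
measurements, and it deliberately ignores bloom filters and the OS page cache,
which is why real numbers come out lower:

# toy model: uncached read latency ~ sstables consulted x (seek + rotation + transfer)
seek_ms     = 5.5   # ballpark average seek for a 15k-rpm SAS drive
rotate_ms   = 2.0   # about half a rotation at 15k rpm
transfer_ms = 0.5   # reading a few tens of KB

def uncached_read_ms(sstables_touched, ios_per_sstable=2):
    # assumes an index lookup plus a data read per sstable touched
    return sstables_touched * ios_per_sstable * (seek_ms + rotate_ms + transfer_ms)

for n in (1, 5, 13, 27):
    print(f"{n:3d} sstables -> ~{uncached_read_ms(n):.0f} ms worst case per read")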

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:

 One more thoughts about Martin's suggestion: is it possible to put the data
 files into multiple directories that are located in different physical
 disks? This should help to improve the i/o bottleneck issue.


Yes, you can already do this, just add more DataFileDirectory directives
pointed at multiple drives.
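
For reference, the relevant fragment of a 0.5/0.6-era storage-conf.xml would
look roughly like this (the paths are placeholders; putting the commit log on
yet another spindle is the usual advice):

<!-- one DataFileDirectory per physical drive or RAID set -->
<DataFileDirectories>
    <DataFileDirectory>/disk1/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
</DataFileDirectories>
<!-- keep the commit log on a separate spindle if possible -->
<CommitLogDirectory>/disk3/cassandra/commitlog</CommitLogDirectory>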


 Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?


Row cache and key cache both help tremendously if your read pattern has a
decent repeat rate.  Completely random io can only be so fast, however.

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Thanks for the DataFileDirectory trick; I'll give it a try.

Just noticed the impact of the number of data files: node A has 13 data files
with a read latency of 20ms and node B has 27 files with a read latency of
60ms. After I ran nodeprobe compact on node B, its read latency went up to
150ms, while the read latency of node A dropped as low as 10ms. Is this normal
behavior? I'm using the random partitioner and the hardware/JVM settings are
exactly the same for these two nodes.

Another problem is that Java heap usage stays around 900MB out of the 6GB
available. Is there any way to utilize all of the heap space to decrease the
read latency?

-Weijun

On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:

 One more thoughts about Martin's suggestion: is it possible to put the
 data files into multiple directories that are located in different physical
 disks? This should help to improve the i/o bottleneck issue.


 Yes, you can already do this, just add more DataFileDirectory directives
 pointed at multiple drives.


 Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?


 Row cache and key cache both help tremendously if your read pattern has a
 decent repeat rate.  Completely random io can only be so fast, however.

 -Brandon



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 12:16 PM, Weijun Li weiju...@gmail.com wrote:

 Thanks for for DataFileDirectory trick and I'll give a try.

 Just noticed the impact of number of data files: node A has 13 data files
 with read latency of 20ms and node B has 27 files with read latency of 60ms.
 After I ran nodeprobe compact on node B its read latency went up to 150ms.
 The read latency of node A became as low as 10ms. Is this normal behavior?
 I'm using random partitioner and the hardware/JVM settings are exactly the
 same for these two nodes.


It sounds like the latency jumped to 150ms because the newly written file
was not in the OS cache.

Another problem is that Java heap usage is always 900mb out of 6GB? Is there
 any way to utilize all of the heap space to decrease the read latency?


By default, Cassandra will use a 1GB heap, as set in bin/cassandra.in.sh.
You can adjust the JVM heap there via the -Xmx option, but generally you want
to balance the JVM against the OS cache.  With 6GB, I would probably give 2GB
to the JVM.  If you aren't having issues now, increasing the JVM's memory
probably won't provide any performance gains, though it's worth noting that
with the row cache in 0.6 this may change.

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Stu Hood
 After I ran nodeprobe compact on node B its read latency went up to 150ms.
The compaction process can take a while to finish... in 0.5 you need to watch 
the logs to figure out when it has actually finished, and then you should start 
seeing the improvement in read latency.

 Is there any way to utilize all of the heap space to decrease the read 
 latency?
In 0.5 you can adjust the number of keys that are cached by changing the 
'KeysCachedFraction' parameter in your config file. In 0.6 you can additionally 
cache rows. You don't want to use up all of the memory on your box for those 
caches though: you'll want to leave at least 50% for your OS's disk cache, 
which will store the full row content.
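
To put rough numbers on that split for the setup discussed in this thread
(8 GB of RAM, ~63 GB of data per node; the hit/miss latencies below are
assumptions, not measurements):

ram_gb      = 8.0
jvm_heap_gb = 2.0                          # heap size suggested earlier in the thread
os_cache_gb = ram_gb - jvm_heap_gb - 0.5   # leave a little for the OS itself
data_gb     = 63.0

cacheable_fraction = os_cache_gb / data_gb
hit_ms, miss_ms = 0.5, 40.0                # assumed page-cache hit vs. seek-bound disk read

# with uniformly random keys, the hit rate is roughly the cacheable fraction
expected_ms = cacheable_fraction * hit_ms + (1 - cacheable_fraction) * miss_ms
print(f"~{cacheable_fraction:.0%} of the data fits in the page cache "
      f"-> expected read latency ~{expected_ms:.0f} ms")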


-Original Message-
From: Weijun Li weiju...@gmail.com
Sent: Tuesday, February 16, 2010 12:16pm
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

Thanks for for DataFileDirectory trick and I'll give a try.

Just noticed the impact of number of data files: node A has 13 data files
with read latency of 20ms and node B has 27 files with read latency of 60ms.
After I ran nodeprobe compact on node B its read latency went up to 150ms.
The read latency of node A became as low as 10ms. Is this normal behavior?
I'm using random partitioner and the hardware/JVM settings are exactly the
same for these two nodes.

Another problem is that Java heap usage is always 900mb out of 6GB? Is there
any way to utilize all of the heap space to decrease the read latency?

-Weijun

On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:

 One more thoughts about Martin's suggestion: is it possible to put the
 data files into multiple directories that are located in different physical
 disks? This should help to improve the i/o bottleneck issue.


 Yes, you can already do this, just add more DataFileDirectory directives
 pointed at multiple drives.


 Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?


 Row cache and key cache both help tremendously if your read pattern has a
 decent repeat rate.  Completely random io can only be so fast, however.

 -Brandon





Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Still seeing high read latency with 50mil records in the 2-node cluster
(replication factor 2). I restarted both nodes but read latency is still above
60ms and disk i/o saturation is high. I tried compact and repair but they
don't help much. When I reduced the client threads from 15 to 5 it looks a lot
better, but throughput is kind of low. I changed the number of flushing
threads to 16 instead of the default 8; could that cause the disk saturation
issue?

For the benchmarks with decent throughput and latency, how many client threads
were used? Can anyone share the storage-conf.xml of a well-tuned, high-volume
cluster?

-Weijun

On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood stu.h...@rackspace.com wrote:

  After I ran nodeprobe compact on node B its read latency went up to
 150ms.
 The compaction process can take a while to finish... in 0.5 you need to
 watch the logs to figure out when it has actually finished, and then you
 should start seeing the improvement in read latency.

  Is there any way to utilize all of the heap space to decrease the read
 latency?
 In 0.5 you can adjust the number of keys that are cached by changing the
 'KeysCachedFraction' parameter in your config file. In 0.6 you can
 additionally cache rows. You don't want to use up all of the memory on your
 box for those caches though: you'll want to leave at least 50% for your OS's
 disk cache, which will store the full row content.


 -Original Message-
 From: Weijun Li weiju...@gmail.com
 Sent: Tuesday, February 16, 2010 12:16pm
 To: cassandra-user@incubator.apache.org
 Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

 Thanks for for DataFileDirectory trick and I'll give a try.

 Just noticed the impact of number of data files: node A has 13 data files
 with read latency of 20ms and node B has 27 files with read latency of
 60ms.
 After I ran nodeprobe compact on node B its read latency went up to
 150ms.
 The read latency of node A became as low as 10ms. Is this normal behavior?
 I'm using random partitioner and the hardware/JVM settings are exactly the
 same for these two nodes.

 Another problem is that Java heap usage is always 900mb out of 6GB? Is
 there
 any way to utilize all of the heap space to decrease the read latency?

 -Weijun

 On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com
 wrote:

  On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:
 
  One more thoughts about Martin's suggestion: is it possible to put the
  data files into multiple directories that are located in different
 physical
  disks? This should help to improve the i/o bottleneck issue.
 
 
  Yes, you can already do this, just add more DataFileDirectory
 directives
  pointed at multiple drives.
 
 
  Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
 
 
  Row cache and key cache both help tremendously if your read pattern has a
  decent repeat rate.  Completely random io can only be so fast, however.
 
  -Brandon
 





Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Jonathan Ellis
Have you tried increasing KeysCachedFraction?

On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li weiju...@gmail.com wrote:
 Still have high read latency with 50mil records in the 2-node cluster
 (replica 2). I restarted both nodes but read latency is still above 60ms and
 disk i/o saturation is high. Tried compact and repair but doesn't help much.
 When I reduced the client threads from 15 to 5 it looks a lot better but
 throughput is kind of low. I changed using flushing thread of 16 instead the
 defaulted 8, could that cause the disk saturation issue?

 For benchmark with decent throughput and latency, how many client threads do
 they use? Can anyone share your storage-conf.xml in well-tuned high volume
 cluster?

 -Weijun

 On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood stu.h...@rackspace.com wrote:

  After I ran nodeprobe compact on node B its read latency went up to
  150ms.
 The compaction process can take a while to finish... in 0.5 you need to
 watch the logs to figure out when it has actually finished, and then you
 should start seeing the improvement in read latency.

  Is there any way to utilize all of the heap space to decrease the read
  latency?
 In 0.5 you can adjust the number of keys that are cached by changing the
 'KeysCachedFraction' parameter in your config file. In 0.6 you can
 additionally cache rows. You don't want to use up all of the memory on your
 box for those caches though: you'll want to leave at least 50% for your OS's
 disk cache, which will store the full row content.


 -Original Message-
 From: Weijun Li weiju...@gmail.com
 Sent: Tuesday, February 16, 2010 12:16pm
 To: cassandra-user@incubator.apache.org
 Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

 Thanks for for DataFileDirectory trick and I'll give a try.

 Just noticed the impact of number of data files: node A has 13 data files
 with read latency of 20ms and node B has 27 files with read latency of
 60ms.
 After I ran nodeprobe compact on node B its read latency went up to
 150ms.
 The read latency of node A became as low as 10ms. Is this normal behavior?
 I'm using random partitioner and the hardware/JVM settings are exactly the
 same for these two nodes.

 Another problem is that Java heap usage is always 900mb out of 6GB? Is
 there
 any way to utilize all of the heap space to decrease the read latency?

 -Weijun

 On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com
 wrote:

  On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:
 
  One more thoughts about Martin's suggestion: is it possible to put the
  data files into multiple directories that are located in different
  physical
  disks? This should help to improve the i/o bottleneck issue.
 
 
  Yes, you can already do this, just add more DataFileDirectory
  directives
  pointed at multiple drives.
 
 
  Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
 
 
  Row cache and key cache both help tremendously if your read pattern has
  a
  decent repeat rate.  Completely random io can only be so fast, however.
 
  -Brandon
 






Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Yes, my KeysCachedFraction is already 0.3 but it doesn't relieve the disk i/o.
I compacted the data down to a single 60GB file (it took quite a while to
finish and increased latency while running, as expected), but that doesn't
help much either.

If I set KCF to 1 (meaning cache the entire sstable index), how much memory
will it take for 50mil keys? Is the index a straight key-to-offset map? I
guess a key is 16 bytes and an offset is 8 bytes. Will KCF=1 help to reduce
disk i/o?
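
A back-of-the-envelope estimate along those lines; the ~64 bytes of per-entry
JVM overhead (object headers, references, map entry) is an assumption, not a
measured figure:

keys         = 50_000_000
key_bytes    = 16      # the key itself, per the guess above
offset_bytes = 8       # a long offset into the data file
overhead     = 64      # assumed JVM overhead per cache entry

gb = keys * (key_bytes + offset_bytes + overhead) / 2**30
print(f"caching all {keys:,} keys would take roughly {gb:.1f} GB of heap")

Under these assumptions that is around 4 GB, i.e. most of a 6GB heap, which is
why leaving a large share of RAM to the OS page cache tends to be the better
trade-off.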

-Weijun

On Tue, Feb 16, 2010 at 5:18 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Have you tried increasing KeysCachedFraction?

 On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li weiju...@gmail.com wrote:
  Still have high read latency with 50mil records in the 2-node cluster
  (replica 2). I restarted both nodes but read latency is still above 60ms
 and
  disk i/o saturation is high. Tried compact and repair but doesn't help
 much.
  When I reduced the client threads from 15 to 5 it looks a lot better but
  throughput is kind of low. I changed using flushing thread of 16 instead
 the
  defaulted 8, could that cause the disk saturation issue?
 
  For benchmark with decent throughput and latency, how many client threads
 do
  they use? Can anyone share your storage-conf.xml in well-tuned high
 volume
  cluster?
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood stu.h...@rackspace.com
 wrote:
 
   After I ran nodeprobe compact on node B its read latency went up to
   150ms.
  The compaction process can take a while to finish... in 0.5 you need to
  watch the logs to figure out when it has actually finished, and then you
  should start seeing the improvement in read latency.
 
   Is there any way to utilize all of the heap space to decrease the read
   latency?
  In 0.5 you can adjust the number of keys that are cached by changing the
  'KeysCachedFraction' parameter in your config file. In 0.6 you can
  additionally cache rows. You don't want to use up all of the memory on
 your
  box for those caches though: you'll want to leave at least 50% for your
 OS's
  disk cache, which will store the full row content.
 
 
  -Original Message-
  From: Weijun Li weiju...@gmail.com
  Sent: Tuesday, February 16, 2010 12:16pm
  To: cassandra-user@incubator.apache.org
  Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?
 
  Thanks for for DataFileDirectory trick and I'll give a try.
 
  Just noticed the impact of number of data files: node A has 13 data
 files
  with read latency of 20ms and node B has 27 files with read latency of
  60ms.
  After I ran nodeprobe compact on node B its read latency went up to
  150ms.
  The read latency of node A became as low as 10ms. Is this normal
 behavior?
  I'm using random partitioner and the hardware/JVM settings are exactly
 the
  same for these two nodes.
 
  Another problem is that Java heap usage is always 900mb out of 6GB? Is
  there
  any way to utilize all of the heap space to decrease the read latency?
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com
  wrote:
 
   On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com
 wrote:
  
   One more thoughts about Martin's suggestion: is it possible to put
 the
   data files into multiple directories that are located in different
   physical
   disks? This should help to improve the i/o bottleneck issue.
  
  
   Yes, you can already do this, just add more DataFileDirectory
   directives
   pointed at multiple drives.
  
  
   Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
  
  
   Row cache and key cache both help tremendously if your read pattern
 has
   a
   decent repeat rate.  Completely random io can only be so fast,
 however.
  
   -Brandon
  
 
 
 
 



RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-15 Thread Weijun Li
It seems that read latency is sensitive to the number of threads (or thrift
clients): after reducing the number of threads to 15, read latency decreased
to ~20ms.
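
That is what one would expect from queueing alone: with a closed-loop
benchmark, per-request latency is roughly the number of in-flight requests
divided by throughput (Little's law). A quick check against the numbers in
this thread:

# Little's law for a closed-loop benchmark: latency ~= concurrency / throughput
def latency_ms(threads, reads_per_sec):
    return threads / reads_per_sec * 1000

print(latency_ms(100, 750))   # ~133 ms, the figure from the original benchmark
print(latency_ms(15, 750))    # ~20 ms if the cluster sustains the same throughput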

The other problem is: if I keep mixed writes and reads (e.g., 8 write threads
plus 7 read threads) running continuously against the 2-node cluster, the read
latency goes up gradually (along with the size of the Cassandra data files),
and in the end it reaches ~40ms (up from ~20ms) even with only 15 threads.
During this process the data files grew from 1.6GB to over 3GB even though I
kept writing the same keys/values to Cassandra. It seems that Cassandra keeps
appending to sstable data files and only cleans them up during node cleanup or
compaction (please correct me if this is incorrect).
 
Here are my test settings:

JVM xmx: 6GB
KCF: 0.3
Memtable: 512MB.
Number of records: 1 million (payload is 1000 bytes)

I used JMX and iostat to watch the cluster but can't find any clue to the
increasing read latency: JVM memory, GC, CPU usage, tpstats and i/o
saturation all seem to be clean. One exception is that the wait time in
iostat spikes once in a while but is small most of the time. Another thing I
noticed is that the JVM doesn't use more than 1GB of memory (out of the 6GB I
specified) even though I set KCF to 0.3 and increased the memtable size to
512MB.

Did I miss anything here? How can I diagnose this kind of increasing read
latency issue? Is there any performance tuning guide available?

Thanks,
-Weijun


-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Sunday, February 14, 2010 6:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

are you i/o bound?  what is your on-disk data set size?  what does
iostats tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li weiju...@gmail.com wrote:
 Hello,



 I saw some Cassandra benchmark reports mentioning read latency that is
less
 than 50ms or even 30ms. But my benchmark with 0.5 doesn't seem to support
 that. Here's my settings:



 Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM

 ReplicationFactor=2 Partitioner=Random

 JVM Xmx: 4GB

 Memory table size: 512MB (haven't figured out how to enable binary
memtable
 so I set both memtable number to 512mb)

 Flushing threads: 2-4

 Payload: ~1000 bytes, 3 columns in one CF.

 Read/write time measure: get startTime right before each Java thrift call,
 transport objects are pre-created upon creation of each thread.



 The result shows that total write throughput is around 2000/sec (for 2
nodes
 in the cluster) which is not bad, and read throughput is just around
 750/sec. However for each thread the average read latency is more than
 100ms. I'm running 100 threads for the testing and each thread randomly
pick
 a node for thrift call. So the read/sec of each thread is just around 7.5,
 meaning duration of each thrift call is 1000/7.5=133ms. Without
replication
 the cluster write throughput is around 3300/s, and read throughput is
around
 1400/s, so the read latency is still around 70ms without replication.



 Is there anything wrong in my benchmark test? How can I achieve a reasonable
 read latency (< 30ms)?



 Thanks,

 -Weijun







Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-14 Thread Weijun Li
Hello,

 

I saw some Cassandra benchmark reports mentioning read latency that is less 
than 50ms or even 30ms. But my benchmark with 0.5 doesn’t seem to support that. 
Here are my settings:

 

Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM

ReplicationFactor=2 Partitioner=Random

JVM Xmx: 4GB

Memory table size: 512MB (haven’t figured out how to enable the binary memtable,
so I set both memtable settings to 512MB)

Flushing threads: 2-4

Payload: ~1000 bytes, 3 columns in one CF.

Read/write time measurement: startTime is taken right before each Java thrift
call; transport objects are pre-created when each thread is created.

 

The result shows that total write throughput is around 2000/sec (for the 2 nodes
in the cluster), which is not bad, while read throughput is just around 750/sec.
However, for each thread the average read latency is more than 100ms. I’m
running 100 threads for the test and each thread randomly picks a node for each
thrift call. So the reads/sec per thread is just around 7.5, meaning the
duration of each thrift call is 1000/7.5 = 133ms. Without replication the cluster
write throughput is around 3300/s and read throughput is around 1400/s, so the
read latency is still around 70ms even without replication.

 

Is there anything wrong in my benchmark test? How can I achieve a reasonable 
read latency (< 30ms)?

 

Thanks,

-Weijun

 

 



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-14 Thread Jonathan Ellis
are you i/o bound?  what is your on-disk data set size?  what does
iostat tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li weiju...@gmail.com wrote:
 Hello,



 I saw some Cassandra benchmark reports mentioning read latency that is less
 than 50ms or even 30ms. But my benchmark with 0.5 doesn’t seem to support
 that. Here’s my settings:



 Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM

 ReplicationFactor=2 Partitioner=Random

 JVM Xmx: 4GB

 Memory table size: 512MB (haven’t figured out how to enable binary memtable
 so I set both memtable number to 512mb)

 Flushing threads: 2-4

 Payload: ~1000 bytes, 3 columns in one CF.

 Read/write time measure: get startTime right before each Java thrift call,
 transport objects are pre-created upon creation of each thread.



 The result shows that total write throughput is around 2000/sec (for 2 nodes
 in the cluster) which is not bad, and read throughput is just around
 750/sec. However for each thread the average read latency is more than
 100ms. I’m running 100 threads for the testing and each thread randomly pick
 a node for thrift call. So the read/sec of each thread is just around 7.5,
 meaning duration of each thrift call is 1000/7.5=133ms. Without replication
 the cluster write throughput is around 3300/s, and read throughput is around
 1400/s, so the read latency is still around 70ms without replication.



 Is there anything wrong in my benchmark test? How can I achieve a reasonable
 read latency (< 30ms)?



 Thanks,

 -Weijun