Re: Stress testing disk configurations. Your thoughts?

2011-04-18 Thread Jonathan Ellis
Separate commitlog matters the most when you are

(a) doing mixed read/write workload (i.e. most real-world scenarios) and
(b) using full commitlog durability (batch mode rather than the default periodic sync)

If your hot data set fits in memory, reads are about as fast as
writes. Otherwise they will be substantially slower, since they have to
do random I/O.

I definitely recommend #2 over #3, btw.
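
For reference, the relevant knobs live in cassandra.yaml. A minimal sketch for
a 0.7-era config (paths are illustrative, not a recommendation for your layout):

    commitlog_directory: /mnt/disk1/cassandra/commitlog   # dedicated spindle
    data_file_directories:
        - /mnt/disk2/cassandra/data

    # default: writes are acked immediately, commitlog is fsynced every period
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

    # full durability (batch mode): a write is not acked until the commitlog
    # has been synced to disk; syncs are grouped within the batch window
    # commitlog_sync: batch
    # commitlog_sync_batch_window_in_ms: 50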

On Thu, Apr 14, 2011 at 11:34 AM, Nathan Milford nat...@milford.io wrote:
 Ahoy,
 I'm building out a new 0.7.4 cluster to migrate our 0.6.6 cluster to.
 While I'm waiting for the dev-side to get time to work on their side of the
 project, I have a 10 node cluster evenly split across two data centers (NY &
 LA) and was looking to do some testing while I could.
 My primary focus is on disk configurations.  Space isn't a huge issue; our
 current data set is ~30G on each node and I imagine that'll go up since I
 intend to tweak the RF on the new cluster.
 Each node has 6 x 146G 10K SAS drives.  I want to test:
 1) 6 disks in RAID 0, with everything (commitlog and data) on the same stripe.
 2) 1 disk for OS + commitlog and 5 disks in RAID 0 for data.
 3) 1 disk for OS + commitlog and 5 individual disks defined
 as separate data_file_directories.
 I suspect I'll see the best performance with option 3, but the issue has become
 political/religious and there are internal doubts that separating the commit
 log and data will truly improve performance, despite documentation and logic
 indicating otherwise.  Thus the test :)
 Right now I've been tinkering and not being very scientific while I work out
 a testing methodology and get used to the tools.  I've just been running
 zznate's cassandra-stress against a single node and measuring the time it
 takes to read and write N rows.
 Unscientifically I've found that they all perform about the same. It is hard
 to judge because, when testing against a single node, reads take roughly an
 order of magnitude longer than writes.  Writing 10M rows may take ~500 seconds,
 but reading will take ~5000 seconds.  I'm sure this will even out when I test
 across more than one node.
 Early next week I'll be able to test against all 10 nodes with a realistic
 replication factor.
 I'd really love to hear some people's thoughts on methodologies and what I
 should be looking at/for other than iostat and the time for the test to
 insert/read.
 Thanks,
 nathan




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Stress testing disk configurations. Your thoughts?

2011-04-14 Thread Nathan Milford
Ahoy,

I'm building out a new 0.7.4 cluster to migrate our 0.6.6 cluster to.

While I'm waiting for the dev-side to get time to work on their side of the
project, I have a 10 node cluster evenly split across two data centers (NY &
LA) and was looking to do some testing while I could.

My primary focus is on disk configurations.  Space isn't a huge issue; our
current data set is ~30G on each node and I imagine that'll go up since I
intend to tweak the RF on the new cluster.

Each node has 6 x 146G 10K SAS drives.  I want to test:

1) 6 disks in RAID 0, with everything (commitlog and data) on the same stripe.
2) 1 disk for OS + commitlog and 5 disks in RAID 0 for data.
3) 1 disk for OS + commitlog and 5 individual disks defined
as separate data_file_directories.
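
To make options 2 and 3 concrete, here is roughly how the cassandra.yaml
directory settings would differ (mount points below are made up for
illustration):

    # Option 2: commitlog on the OS disk, data on one RAID 0 volume
    commitlog_directory: /var/lib/cassandra/commitlog
    data_file_directories:
        - /mnt/raid0/cassandra/data

    # Option 3: commitlog on the OS disk, five individual data disks
    # commitlog_directory: /var/lib/cassandra/commitlog
    # data_file_directories:
    #     - /mnt/disk1/cassandra/data
    #     - /mnt/disk2/cassandra/data
    #     - /mnt/disk3/cassandra/data
    #     - /mnt/disk4/cassandra/data
    #     - /mnt/disk5/cassandra/data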

I suspect I'll see the best performance with option 3, but the issue has become
political/religious and there are internal doubts that separating the commit
log and data will truly improve performance, despite documentation and logic
indicating otherwise.  Thus the test :)

Right now I've been tinkering and not being very scientific while I work out
a testing methodology and get used to the tools.  I've just been running
zznate's cassandra-stress against a single node and measuring the time it
takes to read and write N rows.
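
The write-then-read pattern looks something like this with the py_stress tool
that ships in contrib/ (zznate's Java tool takes a similar approach; the exact
flags below are from memory and may need checking, and the node IP is a
placeholder):

    # load 10M rows into a single node, then read them back
    python stress.py -d 10.1.1.1 -o insert -n 10000000 -t 50
    python stress.py -d 10.1.1.1 -o read   -n 10000000 -t 50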

Unscientifically I've found that they all perform about the same. It is hard
to judge because, when testing against a single node, reads take roughly an
order of magnitude longer than writes.  Writing 10M rows may take ~500 seconds,
but reading will take ~5000 seconds.  I'm sure this will even out when I test
across more than one node.

Early next week I'll be able to test against all 10 nodes with a realistic
replication factor.

I'd really love to hear some people's thoughts on methodologies and what I
should be looking at/for other than iostat and the time for the test to
insert/read.
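
One low-effort addition, sketched below (host and intervals are placeholders):
capture extended iostat plus nodetool's thread-pool and column-family stats
during each run, so you can tell whether the commitlog disk or the data disks
are the ones saturating, and whether pending operations are skewing the read
numbers.

    # disk-level view: utilization and await per device, sampled every 10s
    iostat -x 10 > iostat-$(hostname).log &

    # Cassandra-side view: pending/blocked stages and per-CF read/write latency
    while true; do
        date
        nodetool -h localhost tpstats
        nodetool -h localhost cfstats
        sleep 30
    done > nodetool-$(hostname).log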

Thanks,
nathan