> Firstly, RAID5 is working against you; it's one of the slowest RAID
> modes (every logical write, for example, is two physical reads and two
> physical writes).

I'm surprised you say this. I thought an append-only file arrangement (like
couch) combined with a battery-backed RAID controller would be the perfect
setup for RAID5. I have 6 disks, so each stripe should be 5 data blocks plus
a parity block. If the 5 data blocks are written out in succession, there
should be no need to do any reads to calculate the parity. Running with a
battery-backed unit means the controller should be able to calculate parity
lazily like this, since there is no requirement to get the data and its
associated parity written to disk immediately.
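The disagreement comes down to full-stripe writes versus partial-stripe (read-modify-write) updates. A minimal sketch, assuming simple XOR parity over a 5+1 stripe and ignoring parity rotation, controller caching and any particular controller's behaviour; the function names are hypothetical, not anything a real controller exposes:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_write(new_data):
    """Parity for a complete stripe comes from the new data alone.

    Returns (parity, physical_reads, physical_writes): no reads are needed,
    and every data block plus the parity block is written once.
    """
    parity = xor_blocks(new_data)
    return parity, 0, len(new_data) + 1


def small_write(old_block, old_parity, new_block):
    """Read-modify-write of a single block inside an existing stripe.

    The old data and old parity must be read (2 reads) so the new parity can
    be derived as old_parity ^ old_block ^ new_block; then the new data and
    new parity are written (2 writes).
    """
    new_parity = xor_blocks([old_parity, old_block, new_block])
    return new_parity, 2, 2
```

The "two reads and two writes" figure is the `small_write` case; whether a given controller actually coalesces sequential appends into full-stripe writes is something only a benchmark on that controller can confirm.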

But I'll admit I have tried to research this many times and have found it nigh
on impossible to find a decent answer. I have come across many people claiming
RAID5 is seriously slow, and a few others saying that while that was true many
years ago, RAID controllers are now much better optimised than they used to be
and can do the sort of thing I'm suggesting above.

Anyway, my blocks read per second are at least 10 times my blocks written per
second, so I'm not overly concerned about write performance.
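The usual small-random-I/O rule of thumb (a write penalty of 4 physical I/Os for RAID5 and 2 for RAID10) gives a rough feel for how much a read-heavy mix softens the RAID5 penalty. A back-of-envelope sketch; the 140 IOPS per 10k SAS spindle is an assumed round number, not a measurement, and iostat block counts are not the same thing as operation counts:

```python
# Physical I/Os per logical small random write (rule-of-thumb values).
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4}

# Hypothetical per-spindle figure for a 10k SAS drive; not measured here.
ASSUMED_IOPS_PER_SPINDLE = 140


def effective_iops(raw_iops_total, read_fraction, level):
    """Front-end IOPS an array can sustain for a given read/write mix.

    Each logical read costs one physical I/O; each logical write costs
    WRITE_PENALTY[level] physical I/Os.
    """
    write_fraction = 1.0 - read_fraction
    return raw_iops_total / (read_fraction + write_fraction * WRITE_PENALTY[level])


if __name__ == "__main__":
    raw = 6 * ASSUMED_IOPS_PER_SPINDLE  # six drives
    read_fraction = 10 / 11             # 10 reads for every write
    for level in ("RAID5", "RAID10"):
        print(level, round(effective_iops(raw, read_fraction, level)))
```

Under these assumptions a 10:1 mix gives roughly 660 IOPS for RAID5 against 770 for RAID10, so the gap is real but much smaller than for a write-heavy load. Full-stripe writes behind a battery-backed cache would narrow it further, which this model ignores.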

> Suggest RAID-10 instead. Secondly, have you compacted
> the db lately? This will reduce its total size and also organize it
> better on disk.

I do need to do that. It compacts to about 500GB. One of the problems I'm
having is that the compaction process takes a few weeks to run. Again, I'm
looking for ways to speed this up, and I was wondering if SSDs might be the
answer.
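Using the sizes mentioned in this thread (~1.1TB on disk, ~500GB after compaction), more than half the file is superseded data, which is part of why compaction on this box is dominated by scattered reads. A minimal sketch of that arithmetic plus a helper that starts a compaction through CouchDB's `POST /{db}/_compact` endpoint; `base_url` and `db` are hypothetical placeholders, and the helper assumes an endpoint that does not need separate authentication handling:

```python
import json
import urllib.request


def stale_fraction(disk_bytes, live_bytes):
    """Fraction of the database file that compaction would reclaim."""
    return 1.0 - live_bytes / disk_bytes


def start_compaction(base_url, db):
    """Ask CouchDB to compact `db` via POST /{db}/_compact.

    The server runs compaction in the background; this only returns the
    JSON acknowledgement. base_url (e.g. "http://localhost:5984") is a
    placeholder, and admin credentials are not handled here.
    """
    req = urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Figures from this thread, in GB: about 55% of the file is stale.
    print(f"{stale_fraction(1100, 500):.0%} reclaimable")
```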

> Finally, it's not the number of 'active documents' that
> matters (b-tree performance is typically not predicated on the high
> probability that the leaf nodes are cached), it's the inner b-tree
> nodes that matter (the more that are in the disk cache, the better);
> this is a function of document count. The cost of a cache miss is
> dependent on the speed of your disk array.

Is there any way I can estimate the number of blocks being used by the inner
b-tree? My database currently holds 60 million documents. Since my documents
are relatively large, I would imagine that all the inner nodes should fit
comfortably into available RAM and therefore should mostly be in the disk
cache, since most are likely to be read far more often than the leaf nodes.
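One rough way to estimate it, assuming a B-tree in which every node holds the same number of entries: count the leaf nodes, then keep dividing by the fan-out until a single root remains. In this sketch the fan-out and node size are hypothetical inputs, not CouchDB's real on-disk values (its nodes are split by byte size, so the actual fan-out depends on key length). CouchDB also keeps more than one tree per database (by id and by sequence), and an uncompacted append-only file scatters the current inner nodes among superseded copies, so treat the result as a lower bound on what the cache has to cover:

```python
import math


def inner_node_estimate(n_docs, fanout, node_bytes):
    """Estimate (leaf_nodes, inner_nodes, inner_bytes) for a uniform B-tree.

    fanout and node_bytes are assumed, illustrative values.
    """
    leaves = math.ceil(n_docs / fanout)
    inner = 0
    level = leaves
    # Each level above the leaves needs ceil(children / fanout) nodes,
    # stopping once a single root node is reached.
    while level > 1:
        level = math.ceil(level / fanout)
        inner += level
    return leaves, inner, inner * node_bytes


if __name__ == "__main__":
    for fanout in (100, 30):
        leaves, inner, size = inner_node_estimate(60_000_000, fanout, 4096)
        print(f"fanout={fanout}: {inner} inner nodes, ~{size / 1e6:.0f} MB")
```

With 60 million documents and 4KB nodes this comes to about 25MB of inner nodes at a fan-out of 100 and about 282MB at a pessimistic fan-out of 30, both small next to 48GB of RAM.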

>
> SSD drives will be considerably better because there is no seek cost,
> but the cost of the drives is still prohibitive for many cases.
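The seek-cost point can be put in numbers: on a spinning disk each random read pays an average seek plus, on average, half a revolution of rotational latency, while an SSD pays neither. A back-of-envelope sketch; the 4 ms seek and the 0.1 ms SSD access time are assumed round figures, not specifications for any particular drive:

```python
def hdd_random_read_ms(rpm, avg_seek_ms, transfer_ms=0.0):
    """Average service time of one random read on a spinning disk.

    Rotational latency averages half a revolution: 0.5 * 60000 / rpm ms.
    """
    rotational_ms = 0.5 * 60_000 / rpm
    return avg_seek_ms + rotational_ms + transfer_ms


def iops(service_time_ms):
    """Random operations per second for a single device with no queueing."""
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    hdd_ms = hdd_random_read_ms(10_000, avg_seek_ms=4.0)  # assumed seek time
    ssd_ms = 0.1                                          # assumed SSD access time
    print(f"10k HDD: {hdd_ms:.1f} ms/read, ~{iops(hdd_ms):.0f} IOPS")
    print(f"SSD:     {ssd_ms:.1f} ms/read, ~{iops(ssd_ms):.0f} IOPS")
```

Under these assumptions a 10k drive manages roughly 140 random reads per second, so six of them are worth well under a thousand, whereas a single SSD's figure is limited by its controller rather than by mechanics.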

I think I shall try to benchmark it myself. I'm thinking of setting up a
machine with a sizeable uncompacted couch database and roughly a 1:20 ratio of
RAM to database size, then trying a compaction on an SSD vs a single spinning
disk. That will hopefully give me something to go on, even though I'm
ultimately considering replacing 6 spinning disks with 6 SSDs.
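Alongside a real compaction run, a crude random-versus-sequential read test can show how far apart the two devices are on the access pattern that matters here. A minimal sketch; `read_benchmark` is a hypothetical helper, and for meaningful numbers the test file must be much larger than RAM (or the OS page cache dropped first), otherwise the reads are served from memory rather than the disk:

```python
import os
import random
import tempfile
import time


def read_benchmark(path, block_size=4096, n_reads=1000, sequential=False, seed=0):
    """Time n_reads block-sized reads from `path`.

    Reads are either in file order or at random block-aligned offsets.
    Returns (bytes_read, seconds).
    """
    n_blocks = os.path.getsize(path) // block_size
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    if sequential:
        offsets = [(i % n_blocks) * block_size for i in range(n_reads)]
    else:
        offsets = [rng.randrange(n_blocks) * block_size for _ in range(n_reads)]
    bytes_read = 0
    # Unbuffered, so each read() call maps to one read of block_size bytes.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            bytes_read += len(f.read(block_size))
        elapsed = time.perf_counter() - start
    return bytes_read, elapsed


if __name__ == "__main__":
    # Self-check on a small temp file; it will sit in the page cache,
    # so these timings say nothing about the disk itself.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4096 * 256))
        path = tmp.name
    try:
        for seq in (True, False):
            n, secs = read_benchmark(path, n_reads=500, sequential=seq)
            print("sequential" if seq else "random", n, f"{secs:.4f}s")
    finally:
        os.remove(path)
```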

I may also try RAID5 vs RAID10 if I find the time.

Thanks for the feedback. If I do come up with any numbers I'll share them.

>
> B.
>
> On 6 Dec 2011, at 04:05, Paul Hirst wrote:
>
> > Has anyone done any performance testing of couch on SSD drives?
> >
> > I have a strong suspicion that my disks are constantly seeking in
> > order to satisfy read requests, and therefore the performance is
> > rubbish. The system is a RAID5 with six 10k SAS drives. I'm wondering if
> > upgrading to SSD drives might give a significant performance boost.
> > It's either that or spreading the load across multiple boxes using
> > something like BigCouch.
> >
> > To give a bit more background....
> >
> > I have a ~1.1TB database at the moment running on a single box with
> > ~48GB of RAM. I strongly suspect that the number of active documents
> > (ones which are seeing updates) is a larger set than will fit into RAM,
> > and therefore I assume most document requests are hitting the disk. My
> > disk is ~100% utilized all the time and I'm not keeping up with the
> > number of reads and writes I need to make.
> >
> > The average wait time for disk IO is around 5ms; however, the CPU load
> > is minimal.
> >
> > Finally, I did a test on the box to compare the disk throughput when
> > reading a large sequential file. Even without stopping couch, reading a
> > sequential file managed to drag about 7 times more data off the disk
> > than the system was normally achieving.
> >
> > So even though I might eventually switch to BigCouch or similar, I'd
> > really like to balance out the CPU power and the disk power in my box,
> > since at the moment the system seems totally over-specced CPU-wise and
> > totally under-specced disk-wise. Could SSD drives be the answer?
> >
> > Thanks.
> >
> > ________________________________
> > Sophos Limited, The Pentagon, Abingdon Science Park, Abingdon, OX14
> 3YP, United Kingdom.
> > Company Reg No 2096520. VAT Reg No GB 991 2418 08.

