I think the setup makes sense: each search retrieves 1K documents (for textual
analysis, i.e. they are probably large docs) and uses Solr as storage (which is
totally fine), so sharding across multiple drives parallelizes the disk I/O and
speeds things up. The index is probably large, so it is unrealistic to have
enough RAM to cache the most-used parts (if the queries hit different docs all
the time). I'm curious, as Toke points out, what RAID configuration you ran it
on initially.

Best,

roman

On Tue, Jan 20, 2015 at 12:43 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> Nimrod Cohen [nimrod.co...@nice.com] wrote:
> > We need to get 1K documents out of 100M documents each
> > time we query Solr and send them to text analysis.
> > First configuration had 8 shards on one RAID (Disk F); we
> > got the 1K in around 15 seconds.
> > Second configuration we removed the RAID and worked on 8
> > different disks, each shard on one disk, and got the 1K
> > documents in 2-3 seconds.
>
> Which RAID level? 0, 1, maybe 5 or 6? If you did a RAID 0, it should be
> about the same performance as shards on individual disks, due to striping.
> If you did a RAID 1 with, for example, 2*4 disks, your performance would be
> markedly worse. If you did a RAID 1 of 8*1 disk, it would be better than
> individual drives as it would mitigate the "slowest drive dictates overall
> speed" problem. If your RAID is not really a RAID but instead JBOD or
> similar (http://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD),
> then the poor performance is to be expected as chances are all your data
> would reside on the same physical disk.
>
> Please describe your RAID setup in detail.
>
> Also, is 2-3 second response time satisfactory to you? If not, what are
> you aiming at?
>
> - Toke Eskildsen
>