Deepak,

It might be a bit outside what you're willing to consider, but you can build a 
RAID array out of your spinning disks and then use your SSD(s) as a dm-cache 
device to accelerate reads and writes to the RAID device. If you're putting 
Lucene indexes on a mixed bag of spinning disks and SSDs without any control 
over what goes where, you'd want to use the SSDs to accelerate the spinning 
disks anyway. See http://lwn.net/Articles/540996/ for more information on 
dm-cache.
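
For example, newer LVM versions expose dm-cache as "lvmcache", so a setup 
along these lines is possible (just a sketch -- the device names, volume 
names, and sizes below are made up and depend on your hardware; on an older 
LVM you would drive dm-cache directly with dmsetup as described in the LWN 
article):

    # Spinning disks already assembled into a RAID device, e.g. /dev/md0;
    # /dev/sdf is the SSD that will act as the cache.
    pvcreate /dev/md0 /dev/sdf
    vgcreate vg_solr /dev/md0 /dev/sdf
    # Data volume for the index, allocated on the RAID device only.
    lvcreate -n lv_index -L 2T vg_solr /dev/md0
    # Cache pool on the SSD, then attach it to the data volume.
    lvcreate --type cache-pool -L 100G -n lv_cache vg_solr /dev/sdf
    lvconvert --type cache --cachepool vg_solr/lv_cache vg_solr/lv_index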

Thanks,
Greg

-----Original Message-----
From: Deepak Konidena [mailto:deepakk...@gmail.com] 
Sent: Wednesday, September 11, 2013 3:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributing lucene segments across multiple disks.

I guess at this point in the discussion, I should probably give some more 
background on why I am doing what I am doing. Having a single Solr shard 
(multiple segments) on the same disk is causing severe performance problems 
under load, in that calls to Solr hit a lot of connection timeouts. When we 
looked at the Ganglia stats for the Solr box, we saw that while memory, CPU, 
and network usage were quite normal, the I/O wait spiked. We are unsure what 
caused the I/O wait and why there were no spikes in CPU/memory usage. Since 
the Solr box is a beefy machine (multi-core setup, huge RAM, SSD), we'd like to 
distribute the segments across multiple locations (disks) and see whether this 
improves performance under load.
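
(For what it's worth, the I/O wait shows up per device in iostat -- something 
like the following, though the exact flags and columns depend on the sysstat 
version installed:

    iostat -x 5
    # watch the await and %util columns for the device holding the index

That should at least tell us which disk is saturating.)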

@Greg - Thanks for clarifying that.  I just learned that I can't put them all 
in a RAID array, since some of the drives are SSDs and others are SATA 
(spinning) disks.

@Shawn Heisey - Could you elaborate on the "broker" core and how it delegates 
requests to the other cores?


-Deepak



On Wed, Sep 11, 2013 at 1:10 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 9/11/2013 1:07 PM, Deepak Konidena wrote:
>
>> Are you suggesting a multi-core setup, where all the cores share the 
>> same schema, and the cores lie on different disks?
>>
>> Basically, I'd like to know if I can distribute shards/segments on a 
>> single machine (with multiple disks) without the use of zookeeper.
>>
>
> Sure, you can do it all manually.  At that point you would not be using
> SolrCloud at all, because the way to enable SolrCloud is to tell Solr where
> zookeeper lives.
>
> Without SolrCloud, there is no cluster automation at all.  There is no
> "collection" paradigm; you just have cores.  You have to send updates to
> the correct core; they will not be redirected for you.  Similarly, queries will
> not be load balanced automatically.  For Java clients, the CloudSolrServer
> object can work seamlessly when servers go down.  If you're not using
> SolrCloud, you can't use CloudSolrServer.
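>
> Just as a sketch (the host, port, and core names here are placeholders),
> every update has to be aimed at the specific core that should hold the
> document, for example:
>
>     curl 'http://localhost:8983/solr/core1/update?commit=true' \
>       -H 'Content-Type: text/xml' \
>       -d '<add><doc><field name="id">doc1</field></doc></add>'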
>
> You would be in charge of creating the shards parameter yourself.  The way
> that I do this on my index is that I have a "broker" core that has no index
> of its own, but its solrconfig.xml has the shards and shards.qt parameters
> in all the request handler definitions.  You can also include the parameter
> with the query.
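>
> As a rough sketch, a handler definition in the broker core's solrconfig.xml
> might look something like this (the handler name, hosts, and core names are
> invented):
>
>     <requestHandler name="/search" class="solr.SearchHandler">
>       <lst name="defaults">
>         <str name="shards">host1:8983/solr/coreA,host1:8983/solr/coreB</str>
>         <str name="shards.qt">/search</str>
>       </lst>
>     </requestHandler>
>
> or the same shards value can go on the query itself, e.g.
>
>     http://host1:8983/solr/broker/select?q=*:*&shards=host1:8983/solr/coreA,host1:8983/solr/coreB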
>
> You would also have to handle redundancy yourself, either with replication
> or with independently updated indexes.  I use the latter method, because it
> offers a lot more flexibility than replication.
>
> As mentioned in another reply, setting up RAID with a lot of disks may be
> better than trying to split your index up on different filesystems that
> each reside on different disks.  I would recommend RAID10 for Solr, and it
> works best if it's hardware RAID and the controller has battery-backed (or
> NVRAM) cache.
>
> Thanks,
> Shawn
>
>
