Pretty sure that master/slave was in Solr 1.2. That was very nearly ten years 
ago.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 26, 2017, at 9:52 AM, David Hastings <hastings.recurs...@gmail.com> 
> wrote:
> 
> I'm curious about this.  When you say "and signal the three Solr servers
> when the updated index is available," how does it send the signal?  I.e.,
> what command -- just a reload?  Also, what prevents them from doing a merge
> on their own?  Thanks
> 
> On Fri, May 26, 2017 at 12:09 PM, Robert Haschart <rh...@virginia.edu>
> wrote:
> 
>> We have run this exact scenario for several years.  We have three
>> Solr servers sitting behind a load balancer, with all three accessing the
>> same Solr index stored on read-only network-addressable storage.  A fourth
>> machine is used to update the index (typically daily) and signal the three
>> Solr servers when the updated index is available.  Our index is primarily
>> bibliographic information; it contains about 8 million documents and is
>> about 30GB in size.  We've used this configuration since before ZooKeeper
>> and cloud-based Solr, or even Java-based master/slave replication, were
>> available.  I cannot say whether this configuration has any benefits over
>> the currently accepted way of load balancing, but it has worked well for us
>> for several years and we've never had a corrupted-index problem.
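>> 
>> As a rough illustration of what such a signal can look like (the host
>> and core names below are hypothetical, and this is a sketch rather than
>> our exact tooling), the updating machine can simply hit Solr's CoreAdmin
>> RELOAD action on each search server, so that each one reopens the index
>> from the shared volume:
>> 
>>     # Hypothetical signaling script; hosts and core name are placeholders.
>>     import urllib.request
>> 
>>     SEARCH_HOSTS = ["solr1.example.edu", "solr2.example.edu", "solr3.example.edu"]
>>     CORE = "bibdata"  # hypothetical core name
>> 
>>     def signal_reload(host):
>>         # CoreAdmin RELOAD makes the core reopen its searcher, so it
>>         # picks up the newly published index files on the shared volume.
>>         url = ("http://%s:8983/solr/admin/cores?action=RELOAD&core=%s"
>>                % (host, CORE))
>>         with urllib.request.urlopen(url) as resp:
>>             print(host, resp.status)
>> 
>>     for h in SEARCH_HOSTS:
>>         signal_reload(h)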
>> 
>> 
>> -Bob Haschart
>> University of Virginia Library
>> 
>> 
>> 
>> On 5/23/2017 10:05 PM, Shawn Heisey wrote:
>> 
>>> On 5/19/2017 8:33 AM, Ravi Kumar Taminidi wrote:
>>> 
>>>> Hello.  Scenario: currently we have two Solr servers running on two
>>>> different machines (Linux).  Is there any way we can locate the core
>>>> on a NAS or network shared drive so that both Solr instances use the
>>>> same index?
>>>> 
>>>> Let me know if there are any performance issues; our index is
>>>> approximately 1GB in size.
>>>> 
>>> I think it's a very bad idea to try to share indexes between multiple
>>> Solr instances.  You can override the locking and get it to work, and
>>> you may be able to find advice on the Internet about how to do it.  I
>>> can tell you that it's outside the design intent for both Lucene and
>>> Solr.  Lucene works aggressively to *prevent* multiple processes from
>>> sharing an index.
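>>> 
>>> For reference, the locking knob is the <lockType> setting in the
>>> <indexConfig> section of solrconfig.xml.  A minimal sketch of the kind
>>> of override people use (shown only to illustrate the mechanism, not as
>>> a recommendation):
>>> 
>>>     <indexConfig>
>>>       <!-- "native" (the default) uses OS-level locking; "simple" uses
>>>            a plain lock file; "none" disables locking entirely, which
>>>            is what shared-index setups typically resort to. -->
>>>       <lockType>none</lockType>
>>>     </indexConfig>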
>>> 
>>> In general, network storage is not a good idea for Solr.  There's added
>>> latency for accessing any data, and frequently the filesystem won't
>>> support the kind of locking that Lucene wants to use, but the biggest
>>> potential problem is disk caching.  Solr/Lucene is absolutely reliant on
>>> disk caching in the Solr server's local memory for good performance.  If
>>> the network filesystem cannot be cached by the client that has mounted
>>> the storage, which I believe is the case for most network filesystem
>>> types, then you're reliant on disk caching in the network server(s).
>>> For VERY large indexes, which is really the only viable use case I can
>>> imagine for network storage, it is highly unlikely that the network
>>> server(s) will have enough memory to effectively cache the data.
>>> 
>>> Solr has explicit support for HDFS storage, but as I understand it, HDFS
>>> includes the ability for a client to allocate memory that gets used
>>> exclusively for caching on the client side, which allows HDFS to
>>> function like a local filesystem in ways that I don't think NFS can.
>>> Getting back to my advice about not sharing indexes -- even with
>>> SolrCloud on HDFS, multiple replicas generally do NOT share an index.
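>>> 
>>> For anyone curious, HDFS storage is enabled through the directory
>>> factory in solrconfig.xml.  A rough sketch (the HDFS path and cache
>>> settings here are placeholders; see the Solr Reference Guide for the
>>> full set of options):
>>> 
>>>     <directoryFactory name="DirectoryFactory"
>>>                       class="solr.HdfsDirectoryFactory">
>>>       <!-- hypothetical HDFS location for the index data -->
>>>       <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
>>>       <!-- the client-side block cache described above, held in
>>>            direct memory on the Solr node itself -->
>>>       <bool name="solr.hdfs.blockcache.enabled">true</bool>
>>>       <int name="solr.hdfs.blockcache.slab.count">1</int>
>>>     </directoryFactory>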
>>> 
>>> A 1GB index is very small, so there's no good reason I can think of to
>>> involve network storage.  I would strongly recommend local storage, and
>>> you should abandon any attempt to share the same index data between more
>>> than one Solr instance.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> 
