Re: Two instances of solr - the same datadir?

Peter Sturge Wed, 05 Jun 2013 01:04:23 -0700

Hi,
We use this very same scenario to great effect - 2 instances using the same
dataDir with many cores - 1 is a writer (no caching), the other is a
searcher (lots of caching).
To get the searcher to see the index changes from the writer, you need the
searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
This will refresh the caches (including autowarming), [re]build the
relevant searchers etc. and make any index changes visible to the RO
instance.
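
As a rough illustration of the empty commit described above, here is a minimal Python sketch. It assumes the searcher exposes the standard `/update` handler with commits enabled, and it reuses the port and core name that appear elsewhere in this thread; `commit_url` and `empty_commit` are hypothetical helper names, not part of Solr.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(base_url, core):
    """Build the URL for a commit request that carries no documents."""
    query = urlencode({"commit": "true", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/update?{query}"


def empty_commit(base_url, core, timeout=30):
    """Send the empty commit to the searcher and return the HTTP status code."""
    with urlopen(commit_url(base_url, core), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Assumed searcher address; adjust host, port and core to your setup.
    print(commit_url("http://localhost:5005/solr", "collection1"))
```

Calling `empty_commit(...)` from a scheduler or from your indexing code after the writer commits would be one way to use it.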
Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
ensure the two instances don't try to commit at the same time.
There are several ways to trigger a commit:
1. Call commit() periodically within your own code.
2. Use autoCommit in solrconfig.xml.
3. Use an RPC/IPC mechanism between the 2 instance processes to tell the
   searcher the index has changed, and have the searcher commit when it is
   notified (more complex coding, but good if the index changes on an ad-hoc
   basis).
Note, doing things this way isn't really suitable for an NRT environment.
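
For the ad-hoc case, a simpler stand-in for a real RPC/IPC channel is to poll the shared index directory for a new Lucene commit point. The sketch below is only an assumption-laden illustration: it relies on Lucene naming its commit files `segments_N` with N in base 36, and `latest_generation`, `watch` and the `on_change` callback are hypothetical names. The callback is where the searcher would issue its empty commit.

```python
import os
import re
import time

# Lucene commit points are files named segments_<generation in base 36>.
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def latest_generation(index_dir):
    """Return the highest segments_N generation in index_dir, or -1 if none."""
    latest = -1
    for name in os.listdir(index_dir):
        match = SEGMENTS_RE.match(name)
        if match:
            latest = max(latest, int(match.group(1), 36))
    return latest


def watch(index_dir, on_change, interval=5.0, max_polls=None):
    """Poll index_dir; call on_change(generation) when a newer commit appears."""
    seen = latest_generation(index_dir)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = latest_generation(index_dir)
        if current > seen:
            seen = current
            on_change(current)
        polls += 1
```

Polling keeps the two processes decoupled, at the cost of a delay of up to `interval` seconds before the searcher notices a change.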


HTH,
Peter



On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> Replication is fine, I am going to use it, but I wanted it for instances
> *distributed* across several (physical) machines - but here I have one
> physical machine, it has many cores. I want to run 2 instances of solr
> because I think it has these benefits:
>
> 1) I can give less RAM to the writer (4GB), and use more RAM for the
> searcher (28GB)
> 2) I can deactivate warming for the writer and keep it for the searcher
> (this considerably speeds up indexing - each time we commit, the server is
> rebuilding a citation network of 80M edges)
> 3) saving disk space and better OS caching (OS should be able to use more
> RAM for the caching, which should result in faster operations - the two
> processes are accessing the same index)
>
> Maybe I should just forget it and go with the replication, but it doesn't
> 'feel right' IFF it is on the same physical machine. And Lucene
> specifically has a method for discovering changes and re-opening the index
> (DirectoryReader.openIfChanged)
>
> Am I not seeing something?
>
> roman
>
>
>
> On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
>
> > Roman,
> >
> > Could you be more specific as to why replication doesn't meet your
> > requirements?  It was geared explicitly for this purpose, including the
> > automatic discovery of changes to the data on the index master.
> >
> > Jason
> >
> > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> >
> > > OK, so I have verified that the two instances can run side by side,
> > > sharing the same datadir.
> > >
> > > All update handlers are inaccessible in the read-only master:
> > >
> > > <updateHandler class="solr.DirectUpdateHandler2"
> > >                 enable="${solr.can.write:true}">
> > >
> > > java -Dsolr.can.write=false .....
> > >
> > > And I can reload the index manually:
> > >
> > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> > >
> > > But this is not an ideal solution; I'd like for the read-only server to
> > > discover index changes on its own. Any pointers?
> > >
> > > Thanks,
> > >
> > >  roman
> > >
> > >
> > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I need your expert advice. I am thinking about running two instances of
> > >> solr that share the same data directory. The *reason* being: the indexing
> > >> instance is constantly rebuilding its cache after every commit (we have a
> > >> big cache) and this slows it down. But indexing doesn't need much RAM,
> > >> only the search does (and the server has lots of CPUs).
> > >>
> > >> So, it is like having two solr instances
> > >>
> > >> 1. solr-indexing-master
> > >> 2. solr-read-only-master
> > >>
> > >> In the solrconfig.xml I can disable update components; that should be
> > >> fine. However, I don't know how to 'trigger' index re-opening on (2)
> > >> after the commit happens on (1).
> > >>
> > >> Ideally, the second instance could monitor the disk and re-open the index
> > >> after new files appear there. Do I have to implement a custom
> > >> IndexReaderFactory? Or something else?
> > >>
> > >> Please note: I know about replication; this use case is IMHO slightly
> > >> different - in fact, the write-only master (1) is also a replication
> > >> master.
> > >>
> > >> Googling turned up only this:
> > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 -
> > >> no pointers there.
> > >>
> > >> But if I am approaching the problem wrongly, please don't hesitate to
> > >> 're-educate' me :)
> > >>
> > >> Thanks!
> > >>
> > >>  roman
> > >>
> >
> >
>
