Re: Two instances of solr - the same datadir?
I have spent lot of time in the past day playing with this setup, and made it work finally, here are few bits of interest: - solr v40 - linux, java7, local filesystem - big index, 1 RW instance + 2 RO instances (sharing the same index) lock is acquired when solr is writing data - if you happen to be starting your RO instance at this moment and you are using 'native' lock, it will fail. However, when using RW instance with 'native' lock, and 2 RO instances 'single' lock, the RO instances can start, but they will eventually get into troubles too - our index is too big and so when core RELOAD is called and indexing is under way, the RO instances time out. core reload, when using 'native' lock, seems to work fine - if you were lucky and all instances managed to start - HOWEVER, the core is unresponsive until fully loaded (makes sense), but this is actually terrible - your search is gone for seconds/minutes the best setup is as described in my original post - RO instances MUST NOT commit anything - neither use reload (because during reload solr tries to acquire lock). Instead, they should just reopen the searcher - i repeat: you should make sure that nothing is every going to write on the RO instance. And because there is no public api for reopening the searcher, I wrote a simple handler which just calls: req.getCore().getSearcher(true, false, null, false); when called, the RO instances continue to handle requests using the old searcher, warming in the background, once ready, the new searcher takes over [to repeat: i am triggering this refresh from the RW instance, it does 'curl http://foo/solr/myhandler?command=reopenSearcher] the bad thing: when the RO instance dies (eg OOM error) and the RW is just in the middle of writing data, you can't restart RO instance (unless you use lock 'single' or some other lock) HTH, roman On Tue, Jul 2, 2013 at 5:35 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Wouldn't it be better to do a RELOAD? http://wiki.apache.org/solr/CoreAdmin#RELOAD Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge peter.stu...@gmail.com wrote: The RO instance commit isn't (or shouldn't be) doing any real writing, just an empty commit to force new searchers, autowarm/refresh caches etc. Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in this area. As long as you don't have autocommit in solrconfig.xml, there wouldn't be any commits 'behind the scenes' (we do all our commits via a local solrj client so it can be fully managed). The only caveat might be NRT/soft commits, but I'm not too familiar with this in 4.0. In any case, your RO instance must be getting updated somehow, otherwise how would it know your write instance made any changes? Perhaps your write instance notifies the RO instance externally from Solr? (a perfectly valid approach, and one that would allow a 'single' lock to work without contention) On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote: Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more... Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen. roman On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote: Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the
Re: Two instances of solr - the same datadir?
You can do a reload, yes, but a commit() is considerably faster. On Tue, Jul 2, 2013 at 10:35 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Wouldn't it be better to do a RELOAD? http://wiki.apache.org/solr/CoreAdmin#RELOAD Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge peter.stu...@gmail.com wrote: The RO instance commit isn't (or shouldn't be) doing any real writing, just an empty commit to force new searchers, autowarm/refresh caches etc. Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in this area. As long as you don't have autocommit in solrconfig.xml, there wouldn't be any commits 'behind the scenes' (we do all our commits via a local solrj client so it can be fully managed). The only caveat might be NRT/soft commits, but I'm not too familiar with this in 4.0. In any case, your RO instance must be getting updated somehow, otherwise how would it know your write instance made any changes? Perhaps your write instance notifies the RO instance externally from Solr? (a perfectly valid approach, and one that would allow a 'single' lock to work without contention) On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote: Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more... Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen. roman On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote: Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index
Re: Two instances of solr - the same datadir?
as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master:http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher
Re: Two instances of solr - the same datadir?
Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master:http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time.
Re: Two instances of solr - the same datadir?
Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more... Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen. roman On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote: Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master: http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote:
Re: Two instances of solr - the same datadir?
The RO instance commit isn't (or shouldn't be) doing any real writing, just an empty commit to force new searchers, autowarm/refresh caches etc. Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in this area. As long as you don't have autocommit in solrconfig.xml, there wouldn't be any commits 'behind the scenes' (we do all our commits via a local solrj client so it can be fully managed). The only caveat might be NRT/soft commits, but I'm not too familiar with this in 4.0. In any case, your RO instance must be getting updated somehow, otherwise how would it know your write instance made any changes? Perhaps your write instance notifies the RO instance externally from Solr? (a perfectly valid approach, and one that would allow a 'single' lock to work without contention) On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote: Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more... Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen. roman On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote: Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master: http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the
Re: Two instances of solr - the same datadir?
Wouldn't it be better to do a RELOAD? http://wiki.apache.org/solr/CoreAdmin#RELOAD Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge peter.stu...@gmail.com wrote: The RO instance commit isn't (or shouldn't be) doing any real writing, just an empty commit to force new searchers, autowarm/refresh caches etc. Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in this area. As long as you don't have autocommit in solrconfig.xml, there wouldn't be any commits 'behind the scenes' (we do all our commits via a local solrj client so it can be fully managed). The only caveat might be NRT/soft commits, but I'm not too familiar with this in 4.0. In any case, your RO instance must be getting updated somehow, otherwise how would it know your write instance made any changes? Perhaps your write instance notifies the RO instance externally from Solr? (a perfectly valid approach, and one that would allow a 'single' lock to work without contention) On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote: Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more... Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen. roman On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote: Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky. For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance. What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate. If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted. For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener
Re: Two instances of solr - the same datadir?
If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master:http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged) Am I not seeing something? roman On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to
Re: Two instances of solr - the same datadir?
I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine. My main concern was re-using the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master= http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master:http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the
Re: Two instances of solr - the same datadir?
Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged) Am I not seeing something? roman On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers are unaccessible in the read-only master updateHandler class=solr.DirectUpdateHandler2 enable=${solr.can.write:true} java -Dsolr.can.write=false . And I can reload the index manually: curl http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1 But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs) So, it is like having two solr instances 1. solr-indexing-master 2. solr-read-only-master In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else? Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication master Googling turned out only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But If I am approaching the problem wrongly, please don't hesitate to 're-educate' me :) Thanks! roman
Re: Two instances of solr - the same datadir?
Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged) Am I not seeing something? roman On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers are unaccessible in the read-only master updateHandler class=solr.DirectUpdateHandler2 enable=${solr.can.write:true} java -Dsolr.can.write=false . And I can reload the index manually: curl http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1 But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs) So, it is like having two solr instances 1. solr-indexing-master 2. solr-read-only-master In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else? Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication
Re: Two instances of solr - the same datadir?
So here it is for a record how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes: 1. all index changing components have this bit, enable=${montysolr.master:true} - ie. updateHandler class=solr.DirectUpdateHandler2 enable=${montysolr.master:true} 2. for cache warming de/activation listener event=newSearcher class=solr.QuerySenderListener enable=${montysolr.enable.warming:true}... 3. to trigger refresh of the read-only-master (from write-master): listener event=postCommit class=solr.RunExecutableListener enable=${montysolr.master:true} str name=execurl/str str name=dir./str bool name=waitfalse/bool arr name=args str${montysolr.read.master:http://localhost }/solr/admin/cores?wt=jsonamp;action=RELOADamp;core=collection1/str/arr /listener This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.comwrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use lockTypenative/lockType in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged) Am I not seeing something? roman On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers are unaccessible in the read-only master updateHandler class=solr.DirectUpdateHandler2 enable=${solr.can.write:true} java -Dsolr.can.write=false . And I can reload the index manually: curl http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1
Two instances of solr - the same datadir?
Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs) So, it is like having two solr instances 1. solr-indexing-master 2. solr-read-only-master In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else? Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication master Googling turned out only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But If I am approaching the problem wrongly, please don't hesitate to 're-educate' me :) Thanks! roman
Re: Two instances of solr - the same datadir?
OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers are unaccessible in the read-only master updateHandler class=solr.DirectUpdateHandler2 enable=${solr.can.write:true} java -Dsolr.can.write=false . And I can reload the index manually: curl http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1 But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs) So, it is like having two solr instances 1. solr-indexing-master 2. solr-read-only-master In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else? Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication master Googling turned out only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But If I am approaching the problem wrongly, please don't hesitate to 're-educate' me :) Thanks! roman
Re: Two instances of solr - the same datadir?
Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers are unaccessible in the read-only master updateHandler class=solr.DirectUpdateHandler2 enable=${solr.can.write:true} java -Dsolr.can.write=false . And I can reload the index manually: curl http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1 But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs) So, it is like having two solr instances 1. solr-indexing-master 2. solr-read-only-master In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else? Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication master Googling turned out only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But If I am approaching the problem wrongly, please don't hesitate to 're-educate' me :) Thanks! roman
Re: Two instances of solr - the same datadir?
Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged) Am I not seeing something? roman On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers are unaccessible in the read-only master updateHandler class=solr.DirectUpdateHandler2 enable=${solr.can.write:true} java -Dsolr.can.write=false . And I can reload the index manually: curl http://localhost:5005/solr/admin/cores?wt=jsonaction=RELOADcore=collection1 But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs) So, it is like having two solr instances 1. solr-indexing-master 2. solr-read-only-master In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else? Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication master Googling turned out only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But If I am approaching the problem wrongly, please don't hesitate to 're-educate' me :) Thanks! roman