Re: Two instances of solr - the same datadir?

2013-07-04 Thread Roman Chyla
I have spent a lot of time over the past day playing with this setup and
finally made it work. Here are a few bits of interest:

- Solr 4.0
- linux, java7, local filesystem
- big index, 1 RW instance + 2 RO instances (sharing the same index)


A lock is acquired while Solr is writing data. If you happen to start your
RO instance at that moment and it uses the 'native' lock, startup will
fail. With a 'native' lock on the RW instance and a 'single' lock on the 2 RO
instances, the RO instances can start, but they will eventually get into
trouble too: our index is too big, so when a core RELOAD is called while
indexing is under way, the RO instances time out.
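To make the failure mode concrete: as far as I know, Lucene's 'native' lock is built on java.nio file locks, so a second party asking for the same lock file while it is held simply gets refused. Below is a minimal sketch of that behaviour using only the JDK; it is not Solr's or Lucene's actual code, and the class name NativeLockProbe and the temp file standing in for write.lock are assumptions made up for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Solr or Lucene.
public class NativeLockProbe {

    /** Returns true if an exclusive OS-level lock on lockFile could be taken (it is released again). */
    public static boolean canLock(Path lockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock;
            try {
                lock = ch.tryLock();
            } catch (OverlappingFileLockException e) {
                return false; // already held by another channel in this JVM
            }
            if (lock == null) {
                return false; // already held by another process
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the index's lock file (assumption for the demo).
        Path lockFile = Files.createTempFile("write", ".lock");
        try (FileChannel writer = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileLock held = writer.lock()) {
            // Simulates a reader starting up while the writer holds the lock.
            System.out.println("can lock while writer is busy: " + canLock(lockFile));
        }
        System.out.println("can lock after writer is done: " + canLock(lockFile));
        Files.delete(lockFile);
    }
}
```

This only illustrates the locking primitive; it says nothing about when Solr takes or releases the lock during startup, commit or reload.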

Core reload with the 'native' lock seems to work fine, provided you were
lucky and all instances managed to start. HOWEVER, the core is
unresponsive until it is fully loaded (makes sense), and that is actually
terrible: your search is gone for seconds or minutes.

The best setup is the one described in my original post: RO instances MUST NOT
commit anything, and must not use reload either (because during reload Solr
tries to acquire the lock). Instead, they should just reopen the searcher. I
repeat: you should make sure that nothing is ever going to write on the RO
instance. And because there is no public API for reopening the searcher, I
wrote a simple handler which just calls:

req.getCore().getSearcher(true, false, null, false);

When called, the RO instances continue to handle requests using the old
searcher while the new one warms in the background; once ready, the new
searcher takes over. [To repeat: I am triggering this refresh from the RW
instance, which runs 'curl http://foo/solr/myhandler?command=reopenSearcher'.]
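If you would rather trigger the refresh from Java than from curl, something along these lines should do. It is only a sketch with the plain JDK HTTP client: the /myhandler path and command=reopenSearcher parameter are the ones from the curl call above, while the class name, the method names and the 5 second timeouts are my own assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical client-side helper; the handler itself lives on the RO instance.
public class ReopenSearcherTrigger {

    /** Builds the refresh URL for one RO instance, e.g. http://foo/solr/myhandler?command=reopenSearcher */
    public static String reopenUrl(String solrBaseUrl) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/myhandler?command=reopenSearcher";
    }

    /** Sends the GET request and returns the HTTP status code. */
    public static int trigger(String solrBaseUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(reopenUrl(solrBaseUrl)).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeouts, tune for your setup
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Each argument is the base URL of one RO instance, e.g. http://foo/solr
        for (String ro : args) {
            try {
                System.out.println(ro + " -> HTTP " + trigger(ro));
            } catch (IOException e) {
                System.err.println(ro + " -> failed: " + e.getMessage());
            }
        }
    }
}
```

Run it after a commit on the RW instance, passing one base URL per RO instance. A failed call only means that instance keeps serving from its old searcher until the next trigger.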


The bad thing: when an RO instance dies (e.g. OOM error) while the RW instance
is in the middle of writing data, you can't restart the RO instance (unless
you use the 'single' lock or some other lock type).

HTH,

  roman




On Tue, Jul 2, 2013 at 5:35 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Wouldn't it be better to do a RELOAD?

 http://wiki.apache.org/solr/CoreAdmin#RELOAD


Re: Two instances of solr - the same datadir?

2013-07-03 Thread Peter Sturge
You can do a reload, yes, but a commit() is considerably faster.


On Tue, Jul 2, 2013 at 10:35 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Wouldn't it be better to do a RELOAD?

 http://wiki.apache.org/solr/CoreAdmin#RELOAD


Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
As I discovered, it is not good to use the 'native' lockType in this scenario;
actually, there is a note in solrconfig.xml which says the same.

When a core is reloaded and Solr tries to grab the lock, it will fail, even if
the instance is configured to be read-only. So I am using the 'single' lock for
the readers and 'native' for the writer, which seems to work OK.

roman


On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote:

 I have auto commit after 40k RECs/1800secs. But I only tested with manual
 commit, but I don't see why it should work differently.
 Roman
 On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote:

 If it makes you feel better, I also considered this approach when I was in
 the same situation with a separate indexer and searcher on one Physical
 linux machine.

 My main concern was re-using the FS cache between both instances - If I
 replicated to myself there would be two independent copies of the index,
 FS-cached separately.

 I like the suggestion of using autoCommit to reload the index. If I'm
 reading that right, you'd set an autoCommit on 'zero docs changing', or
 just 'every N seconds'? Did that work?

 Best of luck!

 Tim


 On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote:

  So here it is, for the record, how I am solving it right now:
 
  Write-master is started with: -Dmontysolr.warming.enabled=false
  -Dmontysolr.write.master=true -Dmontysolr.read.master=
  http://localhost:5005
  Read-master is started with: -Dmontysolr.warming.enabled=true
  -Dmontysolr.write.master=false
 
 
  solrconfig.xml changes:
 
  1. all index changing components have this bit,
  enable="${montysolr.master:true}" - ie.
 
  <updateHandler class="solr.DirectUpdateHandler2"
                 enable="${montysolr.master:true}">
 
  2. for cache warming de/activation
 
  <listener event="newSearcher"
            class="solr.QuerySenderListener"
            enable="${montysolr.enable.warming:true}">...
 
  3. to trigger refresh of the read-only-master (from write-master):
 
  <listener event="postCommit"
            class="solr.RunExecutableListener"
            enable="${montysolr.master:true}">
    <str name="exe">curl</str>
    <str name="dir">.</str>
    <bool name="wait">false</bool>
    <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
  </listener>
 
  This works, I still don't like the reload of the whole core, but it seems
  like the easiest thing to do now.
 
  -- roman
 
 
  On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com
  wrote:
 
   Hi Peter,
  
   Thank you, I am glad to read that this usecase is not alien.
  
   I'd like to make the second instance (searcher) completely read-only, so I
   have disabled all the components that can write.
  
   (being lazy ;)) I'll probably use
   http://wiki.apache.org/solr/CollectionDistribution to call the curl after
   commit, or write some IndexReaderFactory that checks for changes
  
   The problem with calling the 'core reload' - is that it seems lots of work
   for just opening a new searcher, eeekkk...somewhere I read that it is cheap
   to reload a core, but re-opening the index searches must be definitely
   cheaper...
  
   roman
  
  
   On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote:
  
   Hi,
   We use this very same scenario to great effect - 2 instances using the same
   dataDir with many cores - 1 is a writer (no caching), the other is a
   searcher (lots of caching).
   To get the searcher to see the index changes from the writer, you need the
   searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
   This will refresh the caches (including autowarming), [re]build the
   relevant searchers etc. and make any index changes visible to the RO
   instance.
   Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
   ensure the two instances don't try to commit at the same time.
   There are several ways to trigger a commit:
   - Call commit() periodically within your own code.
   - Use autoCommit in solrconfig.xml.
   - Use an RPC/IPC mechanism between the 2 instance processes to tell the
     searcher the index has changed, then call commit when called (more complex
     coding, but good if the index changes on an ad-hoc basis).
   Note, doing things this way isn't really suitable for an NRT environment.
  
   HTH,
   Peter
  
  
  
    On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote:
   
    Replication is fine, I am going to use it, but I wanted it for instances
    *distributed* across several (physical) machines - but here I have one
    physical machine, it has many cores. I want to run 2 instances of solr
    because I think it has these benefits:
   
    1) I can give less RAM to the writer (4GB), and use more RAM for the
    searcher (28GB)
    2) I can deactivate warming for the writer and keep it for the searcher

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Peter Sturge
Hmmm, single lock sounds dangerous. It probably works ok because you've
been [un]lucky.
For example, even with a RO instance, you still need to do a commit in
order to reload caches/changes from the other instance.
What happens if this commit gets called in the middle of the other
instance's commit? I've not tested this scenario, but it's very possible
with a 'single' lock the results are indeterminate.
If the 'single' lock mechanism is making assumptions e.g. no other process
will interfere, and then one does, the Lucene index could very well get
corrupted.

For the error you're seeing using 'native', we use native lockType for both
write and RO instances, and it works fine - no contention.
Which version of Solr are you using? Perhaps there's been a change in
behaviour?

Peter


On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote:

 as i discovered, it is not good to use 'native' locktype in this scenario,
 actually there is a note in the solrconfig.xml which says the same

 when a core is reloaded and solr tries to grab lock, it will fail - even if
 the instance is configured to be read-only, so i am using 'single' lock for
 the readers and 'native' for the writer, which seems to work OK

 roman



Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
Interesting, we are running 4.0, and Solr will refuse to start (or
reload) the core. But from looking at the code I don't see it doing
any writing - but I should dig more...

Are you sure it needs to do writing? Because I am not calling commits; in
fact I have deactivated *all* components that write into the index, so unless
there is something deep inside which automatically calls commit, it
should never happen.

roman


On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com wrote:

 Hmmm, single lock sounds dangerous. It probably works ok because you've
 been [un]lucky.
 For example, even with a RO instance, you still need to do a commit in
 order to reload caches/changes from the other instance.
 What happens if this commit gets called in the middle of the other
 instance's commit? I've not tested this scenario, but it's very possible
 with a 'single' lock the results are indeterminate.
 If the 'single' lock mechanism is making assumptions e.g. no other process
 will interfere, and then one does, the Lucene index could very well get
 corrupted.

 For the error you're seeing using 'native', we use native lockType for both
 write and RO instances, and it works fine - no contention.
 Which version of Solr are you using? Perhaps there's been a change in
 behaviour?

 Peter



Re: Two instances of solr - the same datadir?

2013-07-02 Thread Peter Sturge
The RO instance commit isn't (or shouldn't be) doing any real writing, just
an empty commit to force new searchers, autowarm/refresh caches etc.
Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in
this area.
As long as you don't have autocommit in solrconfig.xml, there wouldn't be
any commits 'behind the scenes' (we do all our commits via a local solrj
client so it can be fully managed).
The only caveat might be NRT/soft commits, but I'm not too familiar with
this in 4.0.
In any case, your RO instance must be getting updated somehow, otherwise
how would it know your write instance made any changes?
Perhaps your write instance notifies the RO instance externally from Solr?
(a perfectly valid approach, and one that would allow a 'single' lock to
work without contention)



On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Interesting, we are running 4.0 - and solr will refuse the start (or
 reload) the core. But from looking at the code I am not seeing it is doing
 any writing - but I should digg more...

 Are you sure it needs to do writing? Because I am not calling commits, in
 fact I have deactivated *all* components that write into index, so unless
 there is something deep inside, which automatically calls the commit, it
 should never happen.

 roman



Re: Two instances of solr - the same datadir?

2013-07-02 Thread Michael Della Bitta
Wouldn't it be better to do a RELOAD?

http://wiki.apache.org/solr/CoreAdmin#RELOAD

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge peter.stu...@gmail.com wrote:

 The RO instance commit isn't (or shouldn't be) doing any real writing, just
 an empty commit to force new searchers, autowarm/refresh caches etc.
 Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in
 this area.
 As long as you don't have autocommit in solrconfig.xml, there wouldn't be
 any commits 'behind the scenes' (we do all our commits via a local solrj
 client so it can be fully managed).
 The only caveat might be NRT/soft commits, but I'm not too familiar with
 this in 4.0.
 In any case, your RO instance must be getting updated somehow, otherwise
 how would it know your write instance made any changes?
 Perhaps your write instance notifies the RO instance externally from Solr?
 (a perfectly valid approach, and one that would allow a 'single' lock to
 work without contention)



 On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote:

  Interesting, we are running 4.0 - and solr will refuse the start (or
  reload) the core. But from looking at the code I am not seeing it is
 doing
  any writing - but I should digg more...
 
  Are you sure it needs to do writing? Because I am not calling commits, in
  fact I have deactivated *all* components that write into index, so unless
  there is something deep inside, which automatically calls the commit, it
  should never happen.
 
  roman
 
 
  On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com
  wrote:
 
   Hmmm, single lock sounds dangerous. It probably works ok because you've
   been [un]lucky.
   For example, even with a RO instance, you still need to do a commit in
   order to reload caches/changes from the other instance.
   What happens if this commit gets called in the middle of the other
   instance's commit? I've not tested this scenario, but it's very
 possible
   with a 'single' lock the results are indeterminate.
   If the 'single' lock mechanism is making assumptions e.g. no other
  process
   will interfere, and then one does, the Lucene index could very well get
   corrupted.
  
   For the error you're seeing using 'native', we use native lockType for
  both
   write and RO instances, and it works fine - no contention.
   Which version of Solr are you using? Perhaps there's been a change in
   behaviour?
  
   Peter
  
  
   On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com
  wrote:
  
    As I discovered, it is not good to use the 'native' lockType in this
    scenario; actually there is a note in the solrconfig.xml which says the same.

    When a core is reloaded and solr tries to grab the lock, it will fail - even
    if the instance is configured to be read-only, so I am using the 'single'
    lock for the readers and 'native' for the writer, which seems to work OK.
   
roman
   
   

Re: Two instances of solr - the same datadir?

2013-06-07 Thread Tim Vaillancourt
If it makes you feel better, I also considered this approach when I was in
the same situation with a separate indexer and searcher on one physical
Linux machine.

My main concern was re-using the FS cache between both instances - if I
replicated to myself there would be two independent copies of the index,
FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm
reading that right, you'd set an autoCommit on 'zero docs changing', or
just 'every N seconds'? Did that work?

Best of luck!

Tim



Re: Two instances of solr - the same datadir?

2013-06-07 Thread Roman Chyla
I have autoCommit set to 40k records / 1800 secs. I have only tested with
manual commits so far, but I don't see why it should work differently.
Roman

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Peter Sturge
Hi,
We use this very same scenario to great effect - 2 instances using the same
dataDir with many cores - 1 is a writer (no caching), the other is a
searcher (lots of caching).
To get the searcher to see the index changes from the writer, you need the
searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
This will refresh the caches (including autowarming), [re]build the
relevant searchers etc. and make any index changes visible to the RO
instance.
Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
ensure the two instances don't try to commit at the same time.
There are several ways to trigger a commit:
- Call commit() periodically within your own code.
- Use autoCommit in solrconfig.xml.
- Use an RPC/IPC mechanism between the 2 instance processes to tell the
  searcher the index has changed, then call commit when called (more complex
  coding, but good if the index changes on an ad-hoc basis).
Note, doing things this way isn't really suitable for an NRT environment.
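
For context on the 'native' lock type: as far as I know it is backed by
Lucene's NativeFSLockFactory, which takes an OS-level lock on a write.lock
file in the index directory. Below is a minimal JDK-only sketch of that
behaviour. The class and method names (NativeLockSketch, open, tryAcquire)
are made up for illustration and are not Solr or Lucene API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class NativeLockSketch {

    /** Opens the lock file for writing, creating it if it does not exist yet. */
    static FileChannel open(Path lockFile) throws IOException {
        return FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Returns the acquired lock, or null when the lock is already held elsewhere. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            // tryLock() returns null when another process holds the lock
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // this JVM already holds the lock through another channel
            return null;
        }
    }
}
```

Within a single JVM the second attempt surfaces as an
OverlappingFileLockException, while a second process would get null from
tryLock(); the sketch treats both cases as "the lock is held".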

HTH,
Peter




Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
Hi Peter,

Thank you, I am glad to read that this usecase is not alien.

I'd like to make the second instance (searcher) completely read-only, so I
have disabled all the components that can write.

(being lazy ;)) I'll probably use
http://wiki.apache.org/solr/CollectionDistribution to call the curl after
commit, or write some IndexReaderFactory that checks for changes

The problem with calling 'core reload' is that it seems like a lot of work
for just opening a new searcher, eeekkk... Somewhere I read that it is cheap
to reload a core, but re-opening the index searcher must definitely be
cheaper...

roman


Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
So, for the record, here is how I am solving it right now:

Write-master is started with: -Dmontysolr.warming.enabled=false
-Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
Read-master is started with: -Dmontysolr.warming.enabled=true
-Dmontysolr.write.master=false


solrconfig.xml changes:

1. all index changing components have this bit,
enable="${montysolr.master:true}" - i.e.

<updateHandler class="solr.DirectUpdateHandler2"
 enable="${montysolr.master:true}">

2. for cache warming de/activation

<listener event="newSearcher"
  class="solr.QuerySenderListener"
  enable="${montysolr.enable.warming:true}">...

3. to trigger refresh of the read-only-master (from write-master):

<listener event="postCommit"
  class="solr.RunExecutableListener"
  enable="${montysolr.master:true}">
  <str name="exe">curl</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
</listener>

This works, I still don't like the reload of the whole core, but it seems
like the easiest thing to do now.

-- roman



Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
Hello,

I need your expert advice. I am thinking about running two instances of
solr that share the same data directory. The *reason* being: the indexing
instance is constantly rebuilding its cache after every commit (we have a big
cache) and this slows it down. But indexing doesn't need much RAM, only the
search does (and the server has lots of CPUs)

So, it is like having two solr instances

1. solr-indexing-master
2. solr-read-only-master

In the solrconfig.xml I can disable update components, so that should be fine.
However, I don't know how to 'trigger' index re-opening on (2) after the
commit happens on (1).

Ideally, the second instance could monitor the disk and re-open the index
after new files appear there. Do I have to implement a custom
IndexReaderFactory? Or something else?
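
One way to "monitor the disk" without touching Solr internals: Lucene writes
a new segments_N file on every commit, where N is the commit generation in
base 36. Here is a minimal sketch, assuming the polling process can read the
index directory. CommitWatcherSketch and latestGeneration are hypothetical
names, not part of Solr or Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class CommitWatcherSketch {

    private static final String PREFIX = "segments_";

    /** Returns the highest commit generation found in indexDir, or -1 if there is none. */
    static long latestGeneration(Path indexDir) throws IOException {
        long latest = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                String suffix = file.getFileName().toString().substring(PREFIX.length());
                try {
                    // the generation is encoded in base 36 (Character.MAX_RADIX)
                    latest = Math.max(latest, Long.parseLong(suffix, Character.MAX_RADIX));
                } catch (NumberFormatException ignored) {
                    // suffix is not a generation number, so this is not a commit file
                }
            }
        }
        return latest;
    }
}
```

A reader-side timer could call latestGeneration() every few seconds and ask
for a new searcher (or a core reload) only when the value grows.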

Please note: I know about the replication, this usecase is IMHO slightly
different - in fact, write-only-master (1) is also a replication master

Googling turned up only this
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no
pointers there.

But if I am approaching the problem wrongly, please don't hesitate to
're-educate' me :)

Thanks!

  roman


Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
OK, so I have verified the two instances can run alongside, sharing the
same datadir

All update handlers are inaccessible in the read-only master

<updateHandler class="solr.DirectUpdateHandler2"
 enable="${solr.can.write:true}">

java -Dsolr.can.write=false .
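
The ${name:default} placeholder above is filled from a JVM system property,
with the text after the colon used when the property is not set. The lookup
rule can be sketched as follows. This is an illustration only, not Solr's
own implementation; PlaceholderSketch and resolve are hypothetical names,
and throwing on a missing value is a choice made for this sketch.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {

    // ${name} or ${name:default}; the default may itself contain colons
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Replaces every placeholder in text using props, falling back to the inline default. */
    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) {
                value = m.group(2); // inline default, may be absent
            }
            if (value == null) {
                throw new IllegalArgumentException("no value for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling resolve("${solr.can.write:true}", System.getProperties()) in a JVM
started with -Dsolr.can.write=false yields "false"; without the flag it
yields "true".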

And I can reload the index manually:

curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
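
The same reload request can be sent from code instead of curl. This is a
minimal sketch using the JDK HTTP client (Java 11+); CoreReloadSketch and its
methods are hypothetical names, and the base URL and core name are simply
the ones from the command above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class CoreReloadSketch {

    /** Builds the CoreAdmin RELOAD URL for the given Solr base URL and core name. */
    static URI reloadUri(String baseUrl, String core) {
        return URI.create(baseUrl + "/solr/admin/cores?wt=json&action=RELOAD&core=" + core);
    }

    /** Sends the reload request and returns the HTTP status code. */
    static int reload(String baseUrl, String core) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(reloadUri(baseUrl, core)).GET().build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

For the setup in this message the call would be
CoreReloadSketch.reload("http://localhost:5005", "collection1").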


But this is not an ideal solution; I'd like for the read-only server to
discover index changes on its own. Any pointers?

Thanks,

  roman



Re: Two instances of solr - the same datadir?

2013-06-04 Thread Jason Hellman
Roman,

Could you be more specific as to why replication doesn't meet your 
requirements?  It was geared explicitly for this purpose, including the 
automatic discovery of changes to the data on the index master.  

Jason


Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
Replication is fine, I am going to use it, but I wanted it for instances
*distributed* across several (physical) machines - but here I have one
physical machine, it has many cores. I want to run 2 instances of solr
because I think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the
searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher
(this considerably speeds up indexing - each time we commit, the server is
rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (OS should be able to use more
RAM for the caching, which should result in faster operations - the two
processes are accessing the same index)

Maybe I should just forget it and go with replication, but it doesn't
'feel right' if it all happens on the same physical machine. And Lucene
specifically has a method for discovering changes and re-opening the index
(DirectoryReader.openIfChanged).
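
For illustration, a minimal sketch of how a read-only process might notice a new commit using only the JDK. It assumes Lucene's convention of writing each commit as a segments_N file whose suffix is the commit generation in base 36; CommitWatcher and its methods are hypothetical names. With Lucene on the classpath, DirectoryReader.openIfChanged does this check itself and returns null when nothing has changed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: detects a new commit by looking at segments_N files. */
final class CommitWatcher {
    private static final String PREFIX = "segments_";

    private final Path indexDir;
    private long lastSeenGeneration = -1;

    CommitWatcher(Path indexDir) {
        this.indexDir = indexDir;
    }

    /** Returns the highest commit generation in the directory, or -1 if none. */
    static long latestGeneration(Path indexDir) throws IOException {
        long latest = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                String suffix = file.getFileName().toString().substring(PREFIX.length());
                try {
                    // Assumption: the suffix is the generation in radix 36.
                    latest = Math.max(latest, Long.parseLong(suffix, Character.MAX_RADIX));
                } catch (NumberFormatException e) {
                    // Not a commit file; ignore it.
                }
            }
        }
        return latest;
    }

    /** Returns true once for each newly observed commit generation. */
    boolean hasNewCommit() throws IOException {
        long current = latestGeneration(indexDir);
        if (current > lastSeenGeneration) {
            lastSeenGeneration = current;
            return true;
        }
        return false;
    }
}
```

A searcher process could call hasNewCommit() on a timer and refresh its reader only when it returns true. Note that the first call returns true if the directory already holds a commit.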

Am I not seeing something?

roman



On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman 
jhell...@innoventsolutions.com wrote:

 Roman,

 Could you be more specific as to why replication doesn't meet your
 requirements?  It was geared explicitly for this purpose, including the
 automatic discovery of changes to the data on the index master.

 Jason

 On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:

  OK, so I have verified the two instances can run alongside each other,
  sharing the same datadir.
 
  All update handlers are inaccessible in the read-only master:
 
  <updateHandler class="solr.DirectUpdateHandler2"
  enable="${solr.can.write:true}">
 
  java -Dsolr.can.write=false .
 
  And I can reload the index manually:
 
  curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
  
 
  But this is not an ideal solution; I'd like for the read-only server to
  discover index changes on its own. Any pointers?
 
  Thanks,
 
   roman
 
 
  On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote:
 
  Hello,
 
  I need your expert advice. I am thinking about running two instances of
  Solr that share the same data directory. The *reason* being: the indexing
  instance constantly rebuilds its cache after every commit (we have a big
  cache), and this slows it down. But indexing doesn't need much RAM, only
  search does (and the server has lots of CPUs).
 
  So it would be like having two Solr instances:
 
  1. solr-indexing-master
  2. solr-read-only-master
 
  In solrconfig.xml I can disable the update components; that should be fine.
  However, I don't know how to 'trigger' index re-opening on (2) after a
  commit happens on (1).
 
  Ideally, the second instance could monitor the disk and re-open the index
  after new files appear there. Do I have to implement a custom
  IndexReaderFactory? Or something else?
 
  Please note: I know about replication; this use case is IMHO slightly
  different - in fact, the write-only master (1) is also a replication master.
 
  Googling turned up only this:
  http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no
  pointers there.
 
  But if I am approaching the problem wrongly, please don't hesitate to
  're-educate' me :)
 
  Thanks!
 
   roman