Re: should slave replication be turned off / on during master clean and re-index?

2012-05-03 Thread geeky2
thanks for all of the advice / help.

i appreciate it ;)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3959088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: should slave replication be turned off / on during master clean and re-index?

2012-05-03 Thread Shawn Heisey

On 5/1/2012 6:55 AM, geeky2 wrote:

> you said, you don't use autocommit.  if so - then why don't you use / like
> autocommit?


It's not really that I don't like it, I just don't need it.  I think 
that it actually caused me problems when I first started using Solr 
(1.4.0), but that's been long enough ago that I no longer remember.


I use the live/build core method, so I do not need to be able to search 
the documents as they are being added.  A commit at the end is good 
enough.  It already creates multiple Lucene segments when 
ramBufferSizeMB fills up.


I used to use the dataimporter for everything, with a Perl-based build 
system using cron and LWP.  Now I have a multi-threaded SolrJ 
application that only uses the importer for full rebuilds, which are very 
rare.  Because I could not do replication between 1.4.1 and 3.x, I had 
to abandon replication in order to upgrade Solr.  The new build program 
updates both of my index chains in parallel.


Thanks,
Shawn



Re: should slave replication be turned off / on during master clean and re-index?

2012-05-02 Thread Erick Erickson
Simply turn off replication during your rebuild-from-scratch. See
the disablereplication command at:
http://wiki.apache.org/solr/SolrReplication#HTTP_API

The autocommit comment was, I think, about keeping a partially
rebuilt index from being replicated.
Autocommit is usually a fine thing.

So your full rebuild looks like this:
1. disable replication on the master
2. rebuild the index (autocommit on or off makes little difference as
   far as replication is concerned)
3. enable replication on the master
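
The three steps above can be sketched in Python as follows. This is only an illustration under assumptions: it presumes the master's ReplicationHandler is reachable at <core URL>/replication, as in the wiki examples, and the function names, host, port and the rebuild callable are placeholders that do not come from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core_url, command):
    """Build a ReplicationHandler URL, e.g. .../replication?command=disablereplication."""
    return core_url.rstrip("/") + "/replication?" + urlencode({"command": command})


def http_get(url):
    """Default sender: issue a plain HTTP GET and return the response body."""
    with urlopen(url) as resp:
        return resp.read()


def full_rebuild(master_core_url, rebuild, send=http_get):
    """Disable replication, run the caller's rebuild, then enable replication."""
    send(replication_url(master_core_url, "disablereplication"))
    # 'rebuild' is a hypothetical callable that performs the full re-index.
    rebuild()
    # Reached only if rebuild() did not raise, so a failed rebuild leaves
    # replication disabled and the slaves keep serving the old index.
    send(replication_url(master_core_url, "enablereplication"))
```

The send parameter exists so the sequence can be dry-run without a live server, for example by passing print in place of the default HTTP sender.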

Best
Erick



Re: should slave replication be turned off / on during master clean and re-index?

2012-05-01 Thread geeky2
hello shawn,

thanks for the reply.

ok - i did some testing and yes you are correct.  

autocommit is doing the commit work in chunks. yes - the slaves are also
going from having everything to nothing, then slowly building back up again,
lagging behind the master.

... and yes - this is probably not what we need - as far as a replication
strategy for the slaves.

you said, you don't use autocommit.  if so - then why don't you use / like
autocommit?

since we have not done this here - there is no established reference point,
from an operations perspective.

i am looking to formulate some sort of operation strategy, so ANY ideas or
input is really welcome.



it seems to me that we have to account for two operational strategies - 

the first operational mode is a daily append to the solr core after the
database tables have been updated.  this can probably be done with a simple
delta import.  i would think that autocommit could remain on for the master
and replication could also be left on so the slaves picked up the changes
ASAP.  this seems like the mode that we would / should be in most of the
time.


the second operational mode would be a build from scratch mode, where
changes in the schema necessitated a full re-index of the data.  given that
our site (powered by solr) must be up all of the time, and that our full
index time on the master (for the moment) is hovering somewhere around 16
hours - it makes sense that some sort of parallel path - with a cut-over,
must be used.

in this situation is it possible to have the indexing process going on in
the background - then have one commit at the end - then turn replication on
for the slaves?

are there disadvantages to this approach?

also - i really like your suggestion of a build core and live core.  is
this the approach you use?

thank you for all of the great input






Re: should slave replication be turned off / on during master clean and re-index?

2012-04-29 Thread Shawn Heisey

On 4/27/2012 8:33 PM, geeky2 wrote:

> well, in this case when i say, clean  (on the Master), i mean selecting
> the Full Import with Cleaning button from the DataImportHandler
> Development Console page in solr.  at the top of the page, i have the check
> boxes selected for verbose and clean (*but i don't have the commit checkbox
> selected*).
>
> by doing the above process - doesn't this issue a deletion query - then
> start the import?
>
> and as a follow-up - when actually is the commit being done?
>
> here is my updateHandler section from my solrconfig.xml file on the master
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <autoCommit>
>       <maxTime>6</maxTime>
>       <maxDocs>1000</maxDocs>
>     </autoCommit>
>     <maxPendingDeletes>10</maxPendingDeletes>
>   </updateHandler>


With commit turned off on the import, the *import* will not do a commit 
at any time, so something else has to do the commit or you will never 
see the new index.
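
If a single commit at the end of the import is preferred over relying on autocommit, the DataImportHandler accepts clean and commit parameters on its full-import command. A minimal sketch that only builds such a URL; it assumes the handler is registered at /dataimport (a common choice, but an assumption here), and the function name and host are placeholders:

```python
from urllib.parse import urlencode


def full_import_url(core_url, clean=True, commit=True):
    """Build a DataImportHandler full-import URL with explicit clean/commit flags."""
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),    # delete existing documents first
        "commit": str(commit).lower(),  # commit once when the import finishes
    }
    # Assumption: the handler path is /dataimport under the core URL.
    return core_url.rstrip("/") + "/dataimport?" + urlencode(params)


print(full_import_url("http://master_host:8983/solr"))
```

Requesting the printed URL would start the import; the sketch deliberately stops at building it.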


In your case, you are relying on autocommit.  Because I don't use 
autocommit, I can't say for sure that the following is right, but I 
believe that it is:  With your settings during a full import, your index 
will go from having everything in it to having 1000 documents or less 
within one minute of the import starting.


If that is indeed what happens (and you should definitely test to make 
sure) and you have replication active, your slaves would have a reduced 
index that would slowly build back up as the import progressed on the 
master.  I am pretty sure that's not what you want, so it is a good idea 
to disable replication until the full import is complete.
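
A toy model of that behaviour, purely illustrative and not a description of Solr internals: it assumes the delete-all from the clean step becomes visible with the first autocommit, that commits fire only on maxDocs (the maxTime trigger is ignored), and the document counts are made up.

```python
def visible_counts(old_docs, new_docs, max_docs):
    """Return the document count searchers (and slaves) would see after each commit."""
    visible = [old_docs]  # before the import, the old index is fully visible
    indexed = 0
    for _ in range(new_docs):
        indexed += 1
        if indexed % max_docs == 0:
            # Each autocommit exposes the delete-all plus the documents added so far.
            visible.append(indexed)
    return visible


# An old index of 5000 documents, a re-import of 2500, autocommit every 1000.
print(visible_counts(5000, 2500, 1000))  # [5000, 1000, 2000]
```

With replication enabled, each of those intermediate states is something a polling slave could pick up.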


There is another option, one that would be a good idea if you make 
additions/deletions to your index on an interval that is smaller than 
the time it takes for a full-import:  Maintain a live core and a build 
core on your master server.  Build a new index in the build core while 
simultaneously keeping the live core up to date.  When the build is 
complete, update it to be current and then swap the live core and build 
core.  If replication is set up correctly, the slaves should replicate 
the new index as soon as the cores are swapped.
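
The swap itself can be requested through the CoreAdmin handler's SWAP action. A small sketch that builds the request URL; the core names live and build, the host, and the default /admin/cores path are assumptions for illustration:

```python
from urllib.parse import urlencode


def swap_cores_url(solr_url, live="live", build="build"):
    """Build a CoreAdmin SWAP URL that exchanges the live and build cores."""
    params = {"action": "SWAP", "core": live, "other": build}
    # Assumption: the CoreAdmin handler sits at its default /admin/cores path.
    return solr_url.rstrip("/") + "/admin/cores?" + urlencode(params)


print(swap_cores_url("http://master_host:8983/solr"))
```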


Thanks,
Shawn



should slave replication be turned off / on during master clean and re-index?

2012-04-27 Thread geeky2
hello all,

i am just getting replication going on our master and two (2) slaves.

from time to time, i may need to do a complete re-index and clean on the
master.

should replication on the slave - remain On or Off during a full clean and
re-index on the Master?

thank you,



Re: should slave replication be turned off / on during master clean and re-index?

2012-04-27 Thread Jeff Schmidt
Does a clean mean issuing a deletion query (e.g. 
<delete><query>*:*</query></delete>) prior to re-indexing all of your content?  I 
don't think the slaves will download any changes until you've committed at some 
point on the master.  If you delete everything and then commit, and proceed to 
re-index, then the slaves will pick that up at some point, perhaps sooner than 
you'd like.

If you expect the slaves to continue to serve queries during this process, then 
don't commit on the master until you want the slaves to be aware of what you've 
done.  If it's more complicated where you're going to perform multiple commits 
on the master, or shut it down and remove the data files etc., then I think it 
makes sense to turn off replication during that interval. Once you're happy 
with your update on the master, then enable replication again.

Cheers,

Jeff




--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068


Re: should slave replication be turned off / on during master clean and re-index?

2012-04-27 Thread geeky2
hello,

thank you for the reply,


> Does a clean mean issuing a deletion query (e.g.
> <delete><query>*:*</query></delete>) prior to re-indexing all of your content?  I
> don't think the slaves will download any changes until you've committed at
> some point on the master.


well, in this case when i say, clean  (on the Master), i mean selecting
the Full Import with Cleaning button from the DataImportHandler
Development Console page in solr.  at the top of the page, i have the check
boxes selected for verbose and clean (*but i don't have the commit checkbox
selected*).

by doing the above process - doesn't this issue a deletion query - then
start the import?

and as a follow-up - when actually is the commit being done?


here is my updateHandler section from my solrconfig.xml file on the master

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>6</maxTime>
      <maxDocs>1000</maxDocs>
    </autoCommit>
    <maxPendingDeletes>10</maxPendingDeletes>
  </updateHandler>








Re: should slave replication be turned off / on during master clean and re-index?

2012-04-27 Thread Jeevanandam Madanagopal
I guess you're looking for 'disabling replication poll on slave'

go to 'Replication dashboard[1]', there you have options like Enable/Disable 
Poll, Force replication, Abort replication
dashboard url: http://slave_host:port/solr/corename/admin/replication/index.jsp

Poll Disabled = slave will not poll master for replication
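
The same switch is also exposed through the replication HTTP API as the disablepoll and enablepoll commands, which can be handy for scripting across several slaves. A sketch that only builds the URLs; it assumes each slave's handler is at <core URL>/replication, and the function name and host names are placeholders:

```python
from urllib.parse import urlencode


def poll_urls(slave_core_urls, enabled):
    """Build one enablepoll/disablepoll URL per slave core URL."""
    command = "enablepoll" if enabled else "disablepoll"
    query = urlencode({"command": command})
    return [url.rstrip("/") + "/replication?" + query for url in slave_core_urls]


# Hypothetical slave hosts; stop polling before the master rebuild starts.
for url in poll_urls(["http://slave1:8983/solr", "http://slave2:8983/solr"], enabled=False):
    print(url)
```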

- Jeevanandam
[1] http://wiki.apache.org/solr/SolrReplication#Replication_Dashboard
