Re: Conditions for replication to copy full index

Chris Ulicny Mon, 06 Mar 2017 09:29:58 -0800

Thanks Erick. I love Mike's video on segment merging.

However, I do not believe that a large number of merged segments or an
accidental optimization is the issue. The data in the core is mostly static,
and so far there is no evidence that a large number of merges took place.
Usually the only updates the index receives are deletes.

The other reason I assume it was a copy of the entire data directory is
that the log lines for the IndexFetcher threads have the fullCopy flag set
to true, whereas the usual replication seems to have it set to false. This
fullCopy for the core in question was preceded by a failure to fetch the
index on the previous replication attempt, yet the subsequent check shows
matching generations on the slave and the master. I've included the
indexFetcher thread's log lines for the core below.

11:13:00,138 ERROR [org.apache.solr.handler.IndexFetcher]
(indexFetcher-23-thread-1) Master at: <master> is not available. Index
fetch failed. Exception: IOException occured when talking to server at:
<master>
11:14:00,036 INFO  [org.apache.solr.handler.IndexFetcher]
(indexFetcher-23-thread-1) Master's generation: 182823
11:14:00,044 INFO  [org.apache.solr.handler.IndexFetcher]
(indexFetcher-23-thread-1) Slave's generation: 182823
11:14:00,081 INFO  [org.apache.solr.handler.IndexFetcher]
(indexFetcher-23-thread-1) Starting replication process
11:14:00,422 INFO  [org.apache.solr.handler.IndexFetcher]
(indexFetcher-23-thread-1) Number of files in latest index in master: 404
11:14:00,435 INFO  [org.apache.solr.core.CachingDirectoryFactory]
(indexFetcher-23-thread-1) return new directory for
/<path>/data/index.20170306111400434
11:14:00,555 INFO  [org.apache.solr.handler.IndexFetcher]
(indexFetcher-23-thread-1) Starting download to
NRTCachingDirectory(MMapDirectory@/<path>/data/index.20170306111400434
lockFactory=org.apache.lucene.store.NativeFSLockFactory@6a453731;
maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
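
For anyone who wants to scan their own logs for the same pattern, here is a
rough Python sketch that pulls the two generations and the fullCopy flag out
of log text shaped like the excerpt above. The function name and the regular
expressions are my own assumptions based only on these lines, not on any
documented Solr log format, so adjust them to your logging layout.

```python
import re

# Patterns assumed from the excerpt above; \s+ tolerates wrapped log lines.
GENERATION_RE = re.compile(r"(Master|Slave)'s\s+generation:\s+(\d+)")
FULL_COPY_RE = re.compile(r"fullCopy=(true|false)")


def summarize_fetch(log_text):
    """Collect the generations and the fullCopy flag from one fetch cycle."""
    summary = {
        "master_generation": None,
        "slave_generation": None,
        "full_copy": None,
    }
    for match in GENERATION_RE.finditer(log_text):
        key = match.group(1).lower() + "_generation"
        summary[key] = int(match.group(2))
    flag = FULL_COPY_RE.search(log_text)
    if flag:
        summary["full_copy"] = flag.group(1) == "true"
    return summary


def is_suspicious(summary):
    """Flag a full copy that happened even though the generations matched."""
    return (
        summary["full_copy"] is True
        and summary["master_generation"] is not None
        and summary["master_generation"] == summary["slave_generation"]
    )
```

Feeding it the text of a single replication cycle and passing the result to
is_suspicious() picks out cycles like the one above, where a full copy was
started although both sides reported the same generation.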

Thanks

On Mon, Mar 6, 2017 at 11:30 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> We need to be pretty nit-picky here.
>
> bq: do a full copy of an index instead of only the necessary files
>
> It's all about "necessary files". "Necessary" here means
> all changed segments. Since segments are not changed
> after a commit, replication can safely ignore any segment
> files it already has and copy only new segments.
>
> The rub is that "new" includes merged segments. And it's
> possible that _all_ current segments are merged into a new
> segment. At that point, technically, a full copy is done.
>
> You can force this with an optimize (not recommended) or,
> perhaps, with the expungeDeletes option.
>
> Here's a great video of segment merging, the third one down
> is the TieredMergePolicy which has been the default for some
> time.
>
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> And, if you want to force a full replication, shut down the slave,
> run "rm -rf data" (data should be the parent of the "index" dir), and
> restart Solr.
>
> Best,
> Erick
>
> On Mon, Mar 6, 2017 at 8:06 AM, Chris Ulicny <culicny@iq.media> wrote:
> > Hi all,
> >
> > We've recently had some issues with a 5.1.0 core copying the whole index
> > when it was set to replicate from a master core.
> >
> > I've read that if there are documents that have been added to the slave
> > core by mistake, it will do a full copy. Though we are still
> > investigating, this is probably not the cause of it.
> >
> > Are there any other conditions in which the slave core will do a full
> > copy of an index instead of only the necessary files?
> >
> > Thanks,
> > Chris
>
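
To make Erick's point about "necessary files" concrete, here is a deliberately
simplified Python sketch: the slave fetches only the files it lacks, and when
merging has replaced every segment, that set is the whole index. This is a toy
model built from the description in this thread, not Solr's actual
IndexFetcher logic (which considers more than file names), and the file names
in the usage note are hypothetical.

```python
def files_to_fetch(master_files, slave_files):
    """Return (missing_files, is_full_copy) for a name-based comparison.

    missing_files: master files the slave does not already have, sorted.
    is_full_copy:  True when every master file has to be fetched.
    """
    master = set(master_files)
    missing = sorted(master - set(slave_files))
    is_full_copy = bool(master) and len(missing) == len(master)
    return missing, is_full_copy
```

With a slave holding ["_a.cfs", "_b.cfs"] and a master holding
["_a.cfs", "_b.cfs", "_c.cfs"], only "_c.cfs" is missing. If the master has
merged everything into a single new "_d.cfs", the slave has none of the
current files, so the incremental fetch is in effect a full copy.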
