Erick,

I haven't changed maxCommitsToKeep yet.
We stopped the slave that was having issues and removed the data dir as you
suggested, and after starting it again everything went back to normal.
I guess that at some point someone committed on the slave, or even copied the
master's files over, and made this mess. I'll review our internal docs to
prevent this from happening again.
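
For reference, this is where that setting would live in solrconfig.xml. We
have not customized it, so this is only a sketch with the stock example
values, not something taken from our actual config:

<!-- solrconfig.xml: deletion policy controlling how many commit points
     are kept around (illustrative stock values only) -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- keep only the most recent commit point -->
  <str name="maxCommitsToKeep">1</str>
  <!-- keep no extra optimized commit points -->
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>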

Thanks for explaining the whole concept; it will be useful for understanding
the process.

Best,
Alexandre

On Fri, Mar 23, 2012 at 4:05 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Alexandre:
>
> Have you changed anything like <maxCommitsToKeep> on your slave?
> And do you have more than one slave? If you do, have you considered
> just blowing away the entire .../data directory on the slave and letting
> it re-start from scratch? I'd take the slave out of service for the
> duration of this operation, or do it when you are OK with some number of
> requests going to an empty index....
>
> Because having an index.<timestamp> directory indicates that at some point
> someone forced the slave to get out of sync, possibly, as you say, by
> doing a commit, or by sending docs to it to be indexed, or some such. Starting
> the slave over should fix that if it's the root of your problem.
>
> Note a curious thing about the <timestamp>. When you start indexing, the
> index version is a timestamp. However, from that point on when the index
> changes, the version number is just incremented (not made the current
> time). This is to avoid problems with masters and slaves having different
> times. But a consequence of that is if your slave somehow gets an index
> that's newer, the replication process does the best it can to not delete
> indexes that are out of sync with the master and saves them away. This
> might be what you're seeing.
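>
> (If it helps, the replication handler can report this directly, so you can
> compare the two sides without guessing. Something along these lines -- the
> host and core name below are just taken from the masterUrl you posted, so
> adjust them to whatever your boxes actually use:
>
> http://master:8984/solr/Index/replication?command=indexversion
> http://<slave host>:<port>/solr/Index/replication?command=details
>
> The first returns the index version and generation on the master; "details"
> on the slave gives a fuller view of its replication state.)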
>
> I'm grasping at straws a bit here, but this seems possible.
>
> Best
> Erick
>
> On Fri, Mar 23, 2012 at 1:16 PM, Alexandre Rocco <alel...@gmail.com>
> wrote:
> > Tomás,
> >
> > The 300+GB size is only inside the index.20110926152410 dir. Inside there
> > are a lot of files.
> > I am almost convinced that something is messed up, like someone committed
> > on this slave machine.
> >
> > Thanks
> >
> > 2012/3/23 Tomás Fernández Löbbe <tomasflo...@gmail.com>
> >
> >> Alexandre, additionally to what Erick said, you may want to check in the
> >> slave if what's 300+GB is the "data" directory or the "index.<timestamp>"
> >> directory.
> >>
> >> On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >> > not really, unless perhaps you're issuing commits or optimizes
> >> > on the _slave_ (which you should NOT do).
> >> >
> >> > Replication happens based on the version of the index on the master.
> >> > True, it starts out as a timestamp, but then successive versions
> >> > just have that number incremented. The version number
> >> > in the index on the slave is compared against the one on the master,
> >> > but the actual time (on the slave or master) is irrelevant. This is
> >> > explicitly to avoid problems with time synching across
> >> > machines/timezones/whatever....
> >> >
> >> > It would be instructive to look at the admin/info page to see what
> >> > the index version is on the master and slave.
> >> >
> >> > But, if you optimize or commit (I think) on the _slave_, you might
> >> > change the timestamp and mess things up (although I'm reaching
> >> > here, I don't know this for certain).
> >> >
> >> > What does the index look like on the slave as compared to the master?
> >> > Are there just a bunch of files on the slave? Or a bunch of
> >> > directories?
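> >> >
> >> > (For comparison, a slave that has gotten out of sync often ends up with
> >> > something like this under <solr home>/data -- purely illustrative, and
> >> > from memory, so take the exact file names with a grain of salt:
> >> >
> >> >   index/                   the "live" index directory
> >> >   index.<timestamp>/       an extra timestamped copy left behind by a
> >> >                            forced/failed sync
> >> >   index.properties         records which index directory is in use
> >> >   replication.properties   replication bookkeeping
> >> >
> >> > whereas a healthy slave normally has just a single index/ directory.)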
> >> >
> >> > Instead of re-indexing on the master, you could try to bring down the
> >> > slave, blow away the entire index and start it back up. Since this is a
> >> > production system, I'd only try this if I had more than one slave.
> >> > Although you could bring up a new slave and attach it to the master and
> >> > see what happens there. You wouldn't affect production if you didn't
> >> > point incoming requests at it...
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Fri, Mar 23, 2012 at 11:03 AM, Alexandre Rocco <alel...@gmail.com>
> >> > wrote:
> >> > > Erick,
> >> > >
> >> > > We're using Solr 3.3 on Linux (CentOS 5.6).
> >> > > The /data dir on master is actually 1.2G.
> >> > >
> >> > > I haven't tried to recreate the index yet. Since it's a production
> >> > > environment, I guess that I can stop replication and indexing and then
> >> > > recreate the master index to see if it makes any difference.
> >> > >
> >> > > Also, I just noticed another thread here named "Simple Slave
> >> > > Replication Question" which says it could be a problem if I'm seeing a
> >> > > /data/index directory with a timestamp on the slave node.
> >> > > Is this info relevant to this issue?
> >> > >
> >> > > Thanks,
> >> > > Alexandre
> >> > >
> >> > > On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> > >
> >> > >> What version of Solr and what operating system?
> >> > >>
> >> > >> But regardless, this shouldn't be happening. Indexes can
> >> > >> temporarily double in size, but any extras should be
> >> > >> cleaned up relatively soon.
> >> > >>
> >> > >> On the master, what's the total size of the <solr home>/data
> >> > >> directory?
> >> > >> I'm a little suspicious of the <backupAfter> on your master, but I
> >> > >> don't think that's the root of your problem....
> >> > >>
> >> > >> Are you recreating the index on the master (by deleting the
> >> > >> index directory and starting over)?
> >> > >>
> >> > >> This is unusual, and I suspect it's something odd in your
> >> > >> configuration, but I confess I'm at a loss as to what.
> >> > >>
> >> > >> Best
> >> > >> Erick
> >> > >>
> >> > >> On Fri, Mar 23, 2012 at 10:28 AM, Alexandre Rocco <alel...@gmail.com> wrote:
> >> > >> > Hello,
> >> > >> >
> >> > >> > We have a Solr index that averages 1.19 GB in size.
> >> > >> > After configuring replication, the slave machine's index size is
> >> > >> > growing exponentially.
> >> > >> > Currently we have a slave that is 323.44 GB in size.
> >> > >> > Is there anything that could cause this behavior?
> >> > >> > The current replication config is below.
> >> > >> >
> >> > >> > Master:
> >> > >> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >> > >> >   <lst name="master">
> >> > >> >     <str name="replicateAfter">commit</str>
> >> > >> >     <str name="replicateAfter">startup</str>
> >> > >> >     <str name="backupAfter">startup</str>
> >> > >> >     <str name="confFiles">elevate.xml,protwords.txt,schema.xml,spellings.txt,stopwords.txt,synonyms.txt</str>
> >> > >> >   </lst>
> >> > >> > </requestHandler>
> >> > >> >
> >> > >> > Slave:
> >> > >> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >> > >> >   <lst name="slave">
> >> > >> >     <str name="masterUrl">http://master:8984/solr/Index/replication</str>
> >> > >> >   </lst>
> >> > >> > </requestHandler>
> >> > >> >
> >> > >> > Any pointers will be useful.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Alexandre
> >> > >>
> >> >
> >>
>
