> Thanks for the answers, more questions below.
>
> On 2/16/2011 3:37 PM, Markus Jelsma wrote:
> > 200.000 stored fields? I assume that number includes your number of
> > documents? Sounds crazy =)
>
> Nope, I wasn't clear. I have less than a dozen stored fields, but the
> value of a stored field can sometimes be as large as 200kb.
>
> > You can set mergeFactor to 2, not lower.
>
> Am I right though that manually running an 'optimize' is the equivalent
> of a mergeFactor=1? So there's no way to get Solr to keep the index in
> an 'always optimized' state, if I'm understanding correctly? Cool. Just
> want to understand what's going on.

That should be it. If I remember correctly, a second segment is always
written; new updates aren't merged immediately.

> > This depends on commit rate and if there are a lot of updates and deletes
> > instead of adds. Setting it very low will indeed cause a lot of merging
> > and slow commits. It will also be very slow in replication because
> > merged files are copied over again and again, causing high I/O on your
> > slaves.
> >
> > There is always a `break even` but it depends (as usual) on your scenario
> > and business demands.
>
> There are indeed sadly lots of updates and deletes, which is why I need
> to run optimize periodically. I am aware that this will cause more work
> for replication -- I think this is true whether I manually issue an
> optimize before replication _or_ whether I just keep the mergeFactor
> very low, right? Same issue either way.

Yes. But having several segments shouldn't make that much of a difference.
If search latency is just a few additional milliseconds, then I'd rather
have a few more segments being copied over more quickly.

> So... if I'm going to do lots of updates and deletes, and my other
> option is running an optimize before replication anyway.... is there
> any reason it's going to be completely stupid to set the mergeFactor to
> 2 on the master? I realize it'll mean all index files are going to have
> to be replicated, but that would be the case if I ran a manual optimize
> in the same situation before replication too, I think.

No, it's not stupid if you allow for slow indexing and slow copying of
files but want a very quick search.

> Jonathan
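
To make the trade-off above a bit more concrete, here is a rough toy model in Python. It is only a sketch and not Lucene's actual merge policy: it assumes every commit flushes one segment of a single document, ignores deletes and byte sizes, and merges whenever `merge_factor` equally sized segments sit at the tail. The function name `simulate` and the `docs_rewritten` counter are made up for illustration.

```python
def simulate(merge_factor, commits):
    """Toy log-merge model (an assumption, not Lucene's real policy).

    Each commit flushes one 1-document segment. Whenever the newest
    merge_factor segments have the same size, they are merged into one.
    Returns (segment sizes oldest first, documents rewritten by merges).
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments = []        # segment sizes, oldest first
    docs_rewritten = 0   # documents copied into new files by merges
    for _ in range(commits):
        segments.append(1)
        # Cascade: a merge can produce a segment that triggers another merge.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            docs_rewritten += merged
    return segments, docs_rewritten


if __name__ == "__main__":
    for factor in (2, 10):
        segs, rewritten = simulate(factor, 7)
        print(factor, segs, rewritten)
```

In this model, seven commits with a factor of 2 leave three segments and rewrite ten documents along the way, while a factor of 10 leaves seven small segments and rewrites nothing. So even at 2 the index is not always a single segment, and the low setting pays for fewer segments with repeated rewriting, which is the extra merging and copying described above.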