Re: Upgrading cluster from 4 to 5. Slow replication detected.
I am guessing that the index somehow got corrupted, so I deleted the data directory on the slave. It has started copying the index over. I'll report here once that completes. If you have any other suggestions, please reply in the meantime. Thanks.

On Wed, Apr 19, 2017 at 12:21 PM, Himanshu Sachdeva <himan...@limeroad.com> wrote:
> Hello Shawn,
>
> Thanks for taking the time to help me. I had assigned 45GB to the heap as
> both its starting and maximum size. The logs show the following two
> warnings repeatedly:
>
> - IndexFetcher: Cannot complete replication attempt because file already
>   exists.
> - IndexFetcher: Replication attempt was not successful - trying a full
>   index replication reloadCore=false.
>
> On Tue, Apr 18, 2017 at 6:58 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> On 4/14/2017 2:10 AM, Himanshu Sachdeva wrote:
>>> We're starting to upgrade our Solr cluster to version 5.5. We removed
>>> one slave node from the cluster, installed Solr 5.5.4 on it, and
>>> started Solr, which began copying the index from the master. However,
>>> we noticed a drop in replication speed compared to the other nodes
>>> still running Solr 4. For a fair comparison, I removed another slave
>>> node from the cluster and disabled replication on it until the new
>>> node had caught up. When both nodes were at the same index generation,
>>> I turned replication back on for both. It has now been over 15 hours
>>> since this exercise, and the new node has again started lagging
>>> behind. Currently, the node running Solr 5.5 is seven generations
>>> behind the other node.
>>
>> Version 5 is capable of replication bandwidth throttling, but unless you
>> actually configure the maxWriteMBPerSec attribute in the replication
>> handler definition, this should not happen by default.
>>
>> One problem that I think might be possible is that the heap has been
>> left at the default 512MB on the new 5.5.4 install, and therefore the
>> machine is doing constant full garbage collections to free up memory for
>> normal operation, which would make Solr run EXTREMELY slowly.
>> Eventually a machine in this state would most likely encounter an
>> OutOfMemoryError. On non-Windows systems, OOME will cause a forced halt
>> of the entire Solr instance.
>>
>> The heap might not be the problem ... if it's not, then I do not know
>> what is going on. Are there any errors or warnings in solr.log?
>>
>> Thanks,
>> Shawn

--
Himanshu Sachdeva
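Shawn's maxWriteMBPerSec note quoted above refers to the replication handler definition in solrconfig.xml. A sketch of what that might look like on the master; the 50 MB/s value is illustrative only, and the exact placement of the element should be verified against the reference guide for your Solr version:

```xml
<!-- Sketch only: master-side replication handler with write throttling.
     maxWriteMBPerSec caps replication bandwidth; if it is absent,
     replication is not throttled. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- Illustrative value; omit entirely to disable throttling. -->
  <str name="maxWriteMBPerSec">50</str>
</requestHandler>
```

The point of the check is the inverse: if this element is present anywhere in the handler definition, it would explain a slow full-index copy.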
Re: Upgrading cluster from 4 to 5. Slow replication detected.
Hello Shawn,

Thanks for taking the time to help me. I had assigned 45GB to the heap as both its starting and maximum size. The logs show the following two warnings repeatedly:

- IndexFetcher: Cannot complete replication attempt because file already exists.
- IndexFetcher: Replication attempt was not successful - trying a full index replication reloadCore=false.

On Tue, Apr 18, 2017 at 6:58 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 4/14/2017 2:10 AM, Himanshu Sachdeva wrote:
>> We're starting to upgrade our Solr cluster to version 5.5. We removed
>> one slave node from the cluster, installed Solr 5.5.4 on it, and
>> started Solr, which began copying the index from the master. However,
>> we noticed a drop in replication speed compared to the other nodes
>> still running Solr 4. For a fair comparison, I removed another slave
>> node from the cluster and disabled replication on it until the new
>> node had caught up. When both nodes were at the same index generation,
>> I turned replication back on for both. It has now been over 15 hours
>> since this exercise, and the new node has again started lagging behind.
>> Currently, the node running Solr 5.5 is seven generations behind the
>> other node.
>
> Version 5 is capable of replication bandwidth throttling, but unless you
> actually configure the maxWriteMBPerSec attribute in the replication
> handler definition, this should not happen by default.
>
> One problem that I think might be possible is that the heap has been
> left at the default 512MB on the new 5.5.4 install, and therefore the
> machine is doing constant full garbage collections to free up memory for
> normal operation, which would make Solr run EXTREMELY slowly.
> Eventually a machine in this state would most likely encounter an
> OutOfMemoryError. On non-Windows systems, OOME will cause a forced halt
> of the entire Solr instance.
>
> The heap might not be the problem ... if it's not, then I do not know
> what is going on. Are there any errors or warnings in solr.log?
>
> Thanks,
> Shawn

--
Himanshu Sachdeva
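Regarding the default-512MB scenario Shawn raises: on Solr 5.x the `bin/solr` script accepts `-m` to set both the starting and maximum heap at startup. A sketch, with install path and heap size assumed for illustration:

```shell
# Sketch: start Solr 5.x with an explicit heap instead of the 512MB default.
# The -m flag sets both -Xms and -Xmx to the same value.
SOLR_INSTALL="/opt/solr"   # install path assumed
HEAP="8g"                  # illustrative size, not a recommendation
START_CMD="${SOLR_INSTALL}/bin/solr start -m ${HEAP}"
echo "${START_CMD}"
```

The heap can also be set persistently via `SOLR_HEAP` in `bin/solr.in.sh` so that restarts do not silently fall back to the default.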
Upgrading cluster from 4 to 5. Slow replication detected.
Hi,

We're starting to upgrade our Solr cluster to version 5.5. We removed one slave node from the cluster, installed Solr 5.5.4 on it, and started Solr, which began copying the index from the master. However, we noticed a drop in replication speed compared to the other nodes still running Solr 4. For a fair comparison, I removed another slave node from the cluster and disabled replication on it until the new node had caught up. When both nodes were at the same index generation, I turned replication back on for both. It has now been over 15 hours since this exercise, and the new node has again started lagging behind. Currently, the node running Solr 5.5 is seven generations behind the other node.

Is it because the master is running Solr 4 and this node is running Solr 5? Has anyone else faced a similar problem while upgrading?

--
Himanshu Sachdeva
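For comparing generations across nodes like this, the replication handler's `details` command reports each slave's current index version, generation, and fetch progress. A sketch of the request; host, port, and core name are assumed for illustration:

```shell
# Build the replication-details URL for one node (host and core assumed).
HOST="localhost:8983"
CORE="products"
URL="http://${HOST}/solr/${CORE}/replication?command=details&wt=json"
echo "${URL}"
# Fetch with: curl "${URL}"
```

Running this against both the Solr 4 and Solr 5.5 slaves during a fetch would show whether the lagging node is transferring slowly or repeatedly restarting full copies.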
Re: Solr Index size keeps fluctuating, becomes ~4x normal size.
Hi Toke,

Thanks for your time and quick response. As you said, I changed our logging level from SEVERE to INFO and indeed found the performance warning *Overlapping onDeckSearchers=2* in the logs. I am considering limiting the *maxWarmingSearchers* count in configuration, but want to be sure that nothing breaks in production in case simultaneous commits do happen afterwards. What would happen if we set the *maxWarmingSearchers* count to 1 and made simultaneous commits from different endpoints? I understand that Solr will prevent opening a new searcher for the second commit, but is that all there is to it? Does it mean Solr will serve stale data (i.e. send stale data to the slaves), ignoring the changes from the second commit? Will those changes be reflected only when a new searcher is initialized, and be ignored until then? Do we even need searchers on the master, given that we will be querying only the slaves? What purpose do the searchers serve exactly?

Your time and guidance will be very much appreciated. Thank you.

On Thu, Apr 6, 2017 at 6:12 PM, Toke Eskildsen <t...@kb.dk> wrote:
> On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
>> We monitored the index size for a few days and found that it varies
>> widely from 11GB to 43GB.
>
> Lucene/Solr indexes consist of segments, each holding a number of
> documents. When a document is deleted, its bytes are not removed
> immediately, only marked. When a document is updated, it is effectively
> a delete and an add.
>
> If you have an index with 3 documents
>   segment-0 (live docs [0, 1, 2], deleted docs [])
> and update documents 0 and 1, you will have
>   segment-0 (live docs [2], deleted docs [0, 1])
>   segment-1 (live docs [0, 1], deleted docs [])
> If you then update document 1 again, you will have
>   segment-0 (live docs [2], deleted docs [0, 1])
>   segment-1 (live docs [0], deleted docs [1])
>   segment-2 (live docs [1], deleted docs [])
> for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.
>
> The space is reclaimed when segments are merged, but depending on your
> setup and update pattern that may take some time. Furthermore, there is
> a temporary overhead while merging, when the merged segment is being
> written and the old segments are still available. 4x the minimum size is
> fairly large, but not unrealistic, with enough index updates.
>
>> Recently, we started getting a lot of out of memory errors on the
>> master. Every time, Solr becomes unresponsive and we need to restart
>> jetty to bring it back up. At the same time we observed the variation
>> in index size. We are suspecting that these two problems may be linked.
>
> Quick sanity check: Look for "Overlapping onDeckSearchers" in your
> solr.log to see if your memory problems are caused by multiple open
> searchers:
> https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
>
> --
> Toke Eskildsen, Royal Danish Library

--
Himanshu Sachdeva
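Toke's segment arithmetic quoted above can be sketched in code. This is a toy model, not Lucene's actual flush/merge machinery: an update marks the old copy as deleted in whatever segment holds it and queues the new copy for the next flushed segment, so deleted documents keep occupying space until a merge:

```python
# Toy model of Lucene-style segments: updates mark the old copy deleted
# in its current segment and add the new copy to a fresh segment on flush.
class Index:
    def __init__(self, docs):
        # Each segment is a (live_docs, deleted_docs) pair of sets.
        self.segments = [(set(docs), set())]
        self.pending = set()  # docs updated since the last flush

    def update(self, doc):
        # Mark the live copy (if any) as deleted; space is NOT reclaimed.
        for live, deleted in self.segments:
            if doc in live:
                live.discard(doc)
                deleted.add(doc)
        self.pending.add(doc)

    def flush(self):
        # Write the pending updates out as a new segment.
        if self.pending:
            self.segments.append((set(self.pending), set()))
            self.pending = set()

    def stored_docs(self):
        # Deleted docs still take disk space until segments are merged.
        return sum(len(live) + len(deleted) for live, deleted in self.segments)
```

Replaying Toke's example (update docs 0 and 1, flush, update doc 1 again, flush) yields three segments storing 6 documents for an index of 3 live ones, which is the mechanism behind the 11GB-to-43GB swing.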
Solr Index size keeps fluctuating, becomes ~4x normal size.
Hi all,

We use Solr on our website for product search. Currently, we have 2.1 million documents in the products core, and each document has around 350 fields, >90% of which are indexed. This master instance of Solr runs with 15GB RAM and a 200GB drive. We have also configured 10 slaves for handling reads from the website; the slaves poll the master at an interval of 20 minutes.

We monitored the index size for a few days and found that it varies widely, from 11GB to 43GB. Recently, we started getting a lot of out-of-memory errors on the master. Every time, Solr becomes unresponsive and we need to restart jetty to bring it back up. At the same time, we observed the variation in index size, so we suspect the two problems may be linked. What could be the reason that the index size becomes almost 4x? Why does it vary so much? Any pointers will be appreciated. If you need any more details on the config, please let me know.

--
Himanshu Sachdeva
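For anyone hitting the same fluctuation, the searcher and commit knobs discussed elsewhere in this thread live in solrconfig.xml. A sketch of the relevant sections; the values shown are illustrative, not recommendations, and placement should be checked against the reference guide for your Solr version:

```xml
<!-- Sketch only: solrconfig.xml knobs related to overlapping warming
     searchers and commit frequency. Values are illustrative. -->
<query>
  <!-- Cap concurrent warming searchers; commits that would exceed this
       limit trigger the "exceeded limit of maxWarmingSearchers" error. -->
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Let Solr batch commits instead of issuing simultaneous explicit
       commits from several endpoints. -->
  <autoCommit>
    <maxTime>60000</maxTime>          <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher> <!-- durability without a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>          <!-- visibility every 5 minutes -->
  </autoSoftCommit>
</updateHandler>
```

On a master that is only replicated from, frequent searcher opens mainly cost warming memory, which is why `openSearcher=false` on the hard commit is the commonly suggested shape.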