Re: Possible memory leaks with frequent replication
I hadn't looked at the code, am not familiar with Solr code, and can't say what that code does. But I have experienced issues that I _believe_ were caused by too-frequent commits causing overlapping searcher preparation. And I've definitely seen Solr documentation that suggests this is an issue. Let me find it now to see if the experts think these documented suggestions are still correct or not:

> On the other hand, autowarming (populating) a new collection could take a lot of time, especially since it uses only one thread and one CPU. If your settings fire off snapinstaller too frequently, then a Solr slave could be in the undesirable condition of handing-off queries to one (old) collection, and, while warming a new collection, a second “new” one could be snapped and begin warming! If we attempted to solve such a situation, we would have to invalidate the first “new” collection in order to use the second one, then when a “third” new collection would be snapped and warmed, we would have to invalidate the “second” new collection, and so on ad infinitum. A completely warmed collection would never make it to full term before it was aborted. This can be prevented with a properly tuned configuration so new collections do not get installed too rapidly.

http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs

I think I've seen that same advice on another wiki page, not specifically about replication but about commit frequency balanced against auto-warming, leading to overlapping warming, leading to spiraling RAM/CPU usage -- but NOT an exception being thrown or an HTTP error delivered. I can't find it on the wiki, but here's a listserv post with someone reporting findings that match my understanding:

http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00528.html

How does this advice square with the code Lance found?
Is my understanding of how frequent commits can interact with the time it takes to warm a new collection correct? Appreciate any additional info.

Lance Norskog wrote:

Isn't that what this code does?

  onDeckSearchers++;
  if (onDeckSearchers < 1) {
    // should never happen... just a sanity check
    log.error(logid + "ERROR!!! onDeckSearchers is " + onDeckSearchers);
    onDeckSearchers = 1;  // reset
  } else if (onDeckSearchers > maxWarmingSearchers) {
    onDeckSearchers--;
    String msg = "Error opening new searcher. exceeded limit of maxWarmingSearchers=" + maxWarmingSearchers + ", try again later.";
    log.warn(logid + "" + msg);
    // HTTP 503==service unavailable, or 409==Conflict
    throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE, msg, true);
  } else if (onDeckSearchers > 1) {
    log.info(logid + "PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
  }

On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind rochk...@jhu.edu wrote: It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. [...]
Re: Possible memory leaks with frequent replication
Ah, but reading Peter's email message I reference more carefully, it seems that Solr already DOES provide an info-level log warning about overlapping warming, awesome. (But again, I'm pretty sure it does NOT throw or deliver an HTTP error in that condition, based on my and others' experience.)

> To check if your Solr environment is suffering from this, turn on INFO level logging, and look for: 'PERFORMANCE WARNING: Overlapping onDeckSearchers=x'.

Sweet, good to know, and I'll definitely add this to my debugging toolbox.

Peter's listserv message really ought to be a wiki page, I think. Any reason for me not to just add it as a new one with a title like "Commit frequency and auto-warming"? Unless it's already in the wiki somewhere I haven't found -- and assuming the wiki will let an ordinary user-created account add a new page.

Jonathan Rochkind wrote: I hadn't looked at the code, am not familiar with Solr code, and can't say what that code does. But I have experienced issues that I _believe_ were caused by too-frequent commits causing overlapping searcher preparation. [...]
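The "turn on INFO level logging and look for the warning" check described above is easy to script. A minimal sketch -- the log lines and the /tmp path here are hypothetical stand-ins; in practice you would grep whatever file your servlet container writes Solr's logs to:

```shell
# Build a small sample log excerpt (hypothetical lines), then count
# overlapping-searcher warnings in it with grep -c.
printf '%s\n' \
  'Nov 3, 2010 8:00:01 AM org.apache.solr.core.SolrCore getSearcher' \
  'INFO: [] PERFORMANCE WARNING: Overlapping onDeckSearchers=2' \
  'Nov 3, 2010 8:00:02 AM org.apache.solr.core.SolrCore registerSearcher' \
  > /tmp/solr-excerpt.log
grep -c 'PERFORMANCE WARNING: Overlapping onDeckSearchers' /tmp/solr-excerpt.log
```

A nonzero count means searchers were warming concurrently, i.e. commits or replications are arriving faster than warming completes.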
Re: Possible memory leaks with frequent replication
Do you use EmbeddedSolr in the query server? There is a memory leak that shows up when doing a lot of replications.

On Wed, Nov 3, 2010 at 8:28 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Ah, but reading Peter's email message I reference more carefully, it seems that Solr already DOES provide an info-level log warning about overlapping warming, awesome. [...]
Re: Possible memory leaks with frequent replication
On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. That's our current solution - I was just wondering if there was anything I was missing. Thanks!
Re: Possible memory leaks with frequent replication
On Tue, Nov 2, 2010 at 12:32 PM, Simon Wistow si...@thegestalt.org wrote:
> On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said:
> > You should query against the indexer. I'm impressed that you got 5s
> > replication to work reliably.
> That's our current solution - I was just wondering if there was anything
> I was missing.

You could also try dialing down maxWarmingSearchers to 1 - that should prevent multiple searchers warming at the same time, which may be the source of your running out of memory.

-Yonik
http://www.lucidimagination.com
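Yonik's suggestion corresponds to a one-line setting in the slave's solrconfig.xml. A sketch of the relevant element (placement shown as in stock Solr configs of this era; verify against your own config file):

```xml
<config>
  <!-- Allow only one searcher to warm at a time. A commit/replication
       that would open a second warming searcher is rejected with a 503
       (per the SolrCore code quoted in this thread) instead of piling
       additional warming searchers into the heap. -->
  <maxWarmingSearchers>1</maxWarmingSearchers>
</config>
```

The trade-off is that rejected commits must be retried later, but the slave can no longer accumulate warming searchers until it runs out of memory.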
Re: Possible memory leaks with frequent replication
It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. It's in some documentation somewhere I saw, for sure.

The advice to 'just query against the master' is kind of odd, because then... why have a slave at all, if you aren't going to query against it? I guess just for backup purposes. But even with just one Solr, or querying the master, if you commit at a rate such that commits come before the warming queries can complete, you're going to have the same issue.

The only answer I know of is: don't commit (or replicate) at a faster rate than it takes your warming to complete. You can reduce your warming queries/operations, or reduce your commit/replicate frequency.

It would be interesting/useful if Solr noticed this going on and gave you some kind of error in the log (or even an exception when started with a certain parameter, for testing): "Overlapping warming queries, you're committing too fast" or something. Because it's easy to make this happen without realizing it, and then your Solr does what Simon says: runs out of RAM and/or uses a whole lot of CPU and disk I/O.

Lance Norskog wrote: You should query against the indexer. I'm impressed that you got 5s replication to work reliably.

On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). [...]
Re: Possible memory leaks with frequent replication
Isn't that what this code does?

  onDeckSearchers++;
  if (onDeckSearchers < 1) {
    // should never happen... just a sanity check
    log.error(logid + "ERROR!!! onDeckSearchers is " + onDeckSearchers);
    onDeckSearchers = 1;  // reset
  } else if (onDeckSearchers > maxWarmingSearchers) {
    onDeckSearchers--;
    String msg = "Error opening new searcher. exceeded limit of maxWarmingSearchers=" + maxWarmingSearchers + ", try again later.";
    log.warn(logid + "" + msg);
    // HTTP 503==service unavailable, or 409==Conflict
    throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE, msg, true);
  } else if (onDeckSearchers > 1) {
    log.info(logid + "PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
  }

On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind rochk...@jhu.edu wrote: It's definitely a known 'issue' that you can't replicate (or do any other kind of index change, including a commit) at a faster frequency than your warming queries take to complete, or you'll wind up with something like you've seen. [...]

--
Lance Norskog
goks...@gmail.com
Possible memory leaks with frequent replication
We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second, but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding, from what looks like it running out of memory:

  org.apache.catalina.core.StandardWrapperValve invoke
  SEVERE: Servlet.service() for servlet jsp threw exception
  java.lang.OutOfMemoryError: Java heap space

(our monitoring seems to confirm this).

Looking around, my suspicion is that it takes new Readers longer to warm than the gap between replications, and thus they just build up until all memory is consumed (which, I suppose, isn't really memory 'leaking' per se, more just resource consumption). That said, we've tried turning off caching on the slave and that didn't help either, so it's possible I'm wrong.

Is there anything we can do about this? I'm reluctant to increase the heap space, since I suspect that will just mean a longer period between failures. Might Zoie help here? Or should we just query against the Master?

Thanks,

Simon
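Simon's build-up hypothesis is easy to quantify: if a searcher takes longer to warm than the replication interval, the number of searchers alive at once grows toward ceil(warm_time / interval), each holding its own caches on the heap. A back-of-the-envelope sketch (the 20s warm time is a hypothetical figure, not a number from this thread):

```shell
# Hypothetical numbers: a new searcher takes ~20s to warm, and replication
# triggers a commit every 5s. Steady-state count of concurrently warming
# searchers is the ceiling of warm/interval, computed here with integer math.
warm=20
interval=5
echo $(( (warm + interval - 1) / interval ))
```

With these numbers the slave holds four warming searchers plus the live one at any moment, which matches the "they just build up until all memory is consumed" symptom.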
Re: Possible memory leaks with frequent replication
You should query against the indexer. I'm impressed that you got 5s replication to work reliably.

On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow si...@thegestalt.org wrote: We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). [...]

--
Lance Norskog
goks...@gmail.com