Re: replication, disk space
Hey Jonathan,

Any update? We are experiencing the same thing you describe. As days go on, these index directories continue to collect. We have deleted timestamped indices that are not currently in use, but I've been nervous to remove the one simply called 'index'. Did you end up doing that successfully?

Some days, instead of getting additional directories, the current index doubles in size. It looks like the files are getting mv'ed into it during replication, but they have different filenames and so don't overwrite the files that were already in there. Have you seen this at all?

Some observations:
- We wiped the slave index and triggered a fresh replication. The problem was better but not solved for about a week (we only had 2 full-size indices, instead of getting a new one every day). The problem came back in force after the master index was deleted and recreated.
- We also have memory issues on both our master and slave machines right now; we're in the process of moving over to 64-bit servers to alleviate that problem.
- We are also running Red Hat (6) and Solr 1.4.

Best,
Anna

On Thu, Jan 19, 2012 at 13:25, Dyer, James james.d...@ingrambook.com wrote:

You can do all the steps to rename the timestamped dir back to index, but I don't think you have to. Solr will know on restart to use the timestamped directory so long as it is in the properties file (sorry, I must have told you to look at the wrong file... I'm working from old memories here). You might want to test this in your dev environment, but I think it's going to work. The only thing is if it really bothers you that the index isn't being stored in 'index'...
The reason you get into this situation with the timestamped directory is explained here:
http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Thursday, January 19, 2012 11:43 AM
To: solr-user@lucene.apache.org
Cc: Dyer, James
Subject: Re: replication, disk space

Okay, I do have an index.properties file too, and THAT one does contain the name of an index directory. But it's got the name of the timestamped index directory! Not sure how that happened; it could have been Solr trying to recover from running out of disk space in the middle of a replication? I certainly never did that intentionally. But okay, can someone confirm whether this plan makes sense to restore things without downtime:

1. rm the 'index' directory, which seems to be an old copy of the index at this point
2. 'mv index.20120113121302 index'
3. Manually edit index.properties to have index=index, not index=index.20120113121302
4. Send a reload core command.

Does this make sense? (I just experimentally tried a reload core command, and even though it's not supposed to, it DID result in about 20 seconds of unresponsiveness from my Solr server. Not sure why; it could just be lack of CPU or RAM on the server for what's being asked of it. But if that's the best I can do, 20 seconds of unavailability, I'll take it.)

On 1/19/2012 12:37 PM, Jonathan Rochkind wrote:

Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do, there it is! I guess the replication process makes this file? Okay, I don't see an index directory in the replication.properties file at all, though. Below is my complete replication.properties. So I'm still not sure how to properly recover from this situation without downtime. It _looks_ to me like the timestamped directory is actually the live/recent one.
Its files have a more recent timestamp, and it's the one that /admin/replication.jsp mentions.

replication.properties:

#Replication details
#Wed Jan 18 10:58:25 EST 2012
confFilesReplicated=[solrconfig.xml, schema.xml]
timesIndexReplicated=350
lastCycleBytesDownloaded=6524299012
replicationFailedAtList=1326902305288,1326406990614,1326394654410,1326218508294,1322150197956,1321987735253,1316104240679,1314371534794,1306764945741,1306678853902
replicationFailedAt=1326902305288
timesConfigReplicated=1
indexReplicatedAtList=1326902305288,1326825419865,1326744428192,1326645554344,1326569088373,1326475488777,1326406990614,1326394654410,1326303313747,1326218508294
confFilesReplicatedAt=1316547200637
previousCycleTimeInSeconds=295
timesFailed=54
indexReplicatedAt=1326902305288

On 1/18/2012 1:41 PM, Dyer, James wrote:

I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies need to exist until the swap is complete.
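For what it's worth, the cleanup Anna describes (deleting timestamped copies while leaving the live one alone) can be sketched in shell. This is only a sketch against a faked layout built with mktemp; the directory names are assumptions, not taken from a real install. The key idea is that index.properties names the live directory, so anything it doesn't name should be safe to remove; point DATA_DIR at a real data dir only after double-checking.

```shell
# Sketch: delete timestamped index.* copies EXCEPT the one that
# index.properties names as live. The layout below is faked so the
# sketch is self-contained.
DATA_DIR=$(mktemp -d)
mkdir "$DATA_DIR/index" "$DATA_DIR/index.20120113121302" "$DATA_DIR/index.20120114093000"
printf 'index=index.20120114093000\n' > "$DATA_DIR/index.properties"

LIVE=$(sed -n 's/^index=//p' "$DATA_DIR/index.properties")
for d in "$DATA_DIR"/index.*; do
  name=$(basename "$d")
  [ "$name" = "$LIVE" ] && continue              # never touch the live dir
  [ "$name" = "index.properties" ] && continue   # the glob matches this file too
  rm -rf "$d"                                    # an unused timestamped copy
done
# The plain 'index' dir is deliberately left alone here, since this
# thread shows it is sometimes the live one.
```

The nervousness about the plain 'index' dir is justified: delete it only after confirming index.properties points elsewhere.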
Re: replication, disk space
Thanks for the response. I am using Linux (RedHat). It sounds like it may possibly be related to that bug. But the thing is, the timestamped index directory looks to me like it's the _current_ one, with the non-timestamped one being an old, out-of-date one. So that does not seem to be quite the same thing reported in that bug, although it may very well be related.

At this point, I'm just trying to figure out how to clean up: how to verify which of those copies really is the current one, the one currently being used by Solr -- and, if it's the timestamped one, how to restore things to the state where there's only one non-timestamped index dir, ideally without downtime to Solr. Anyone have any advice or ideas on those questions?

On 1/18/2012 1:23 PM, Artem Lokotosh wrote:

Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4. I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302

So does this mean the index.20120113121302 directory is actually the one currently being used live, not the straight 'index'? Why? I can't afford the disk space to leave both of these around indefinitely. After replication completes and is committed, why would two index dirs be left? And how can I restore this to one index dir, without downtime? If it's really using the index.X directory, then I could just delete the index directory, but that's a bad idea, because next time the server starts it's going to be looking for index, not index.X.
And if it's using the timestamped index directory now, I can't delete THAT one now either. If I was willing to restart the Tomcat container, then I could delete one, rename the other, etc. But I don't want downtime. I really don't understand what's going on or how it got into this state. Any ideas?

Jonathan
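To answer the "which copy is current" question concretely: as this thread eventually establishes, the slave records the live directory name in data/index.properties, and /admin/replication/index.jsp reports it as "Location". A minimal sketch, with the file faked via mktemp so it runs anywhere; on a real slave the path would be something like the /opt/solr/solr_searcher/prod/data/index.properties from this thread.

```shell
# Sketch: read the live index directory name out of index.properties.
# The file is faked here so the sketch is self-contained.
DATA_DIR=$(mktemp -d)
printf 'index=index.20120113121302\n' > "$DATA_DIR/index.properties"

# The value of the 'index' key is the directory Solr is using: plain
# 'index' means the default dir, anything else a timestamped copy.
LIVE=$(sed -n 's/^index=//p' "$DATA_DIR/index.properties")
echo "live index dir: $LIVE"
```

Cross-checking this against the Location line on /admin/replication/index.jsp, as Jonathan did, is a good sanity check before deleting anything.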
Re: replication, disk space
Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do, there it is! I guess the replication process makes this file? Okay, I don't see an index directory in the replication.properties file at all, though. Below is my complete replication.properties. So I'm still not sure how to properly recover from this situation without downtime. It _looks_ to me like the timestamped directory is actually the live/recent one. Its files have a more recent timestamp, and it's the one that /admin/replication.jsp mentions.

replication.properties:

#Replication details
#Wed Jan 18 10:58:25 EST 2012
confFilesReplicated=[solrconfig.xml, schema.xml]
timesIndexReplicated=350
lastCycleBytesDownloaded=6524299012
replicationFailedAtList=1326902305288,1326406990614,1326394654410,1326218508294,1322150197956,1321987735253,1316104240679,1314371534794,1306764945741,1306678853902
replicationFailedAt=1326902305288
timesConfigReplicated=1
indexReplicatedAtList=1326902305288,1326825419865,1326744428192,1326645554344,1326569088373,1326475488777,1326406990614,1326394654410,1326303313747,1326218508294
confFilesReplicatedAt=1316547200637
previousCycleTimeInSeconds=295
timesFailed=54
indexReplicatedAt=1326902305288

On 1/18/2012 1:41 PM, Dyer, James wrote:

I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies need to exist until the swap is complete. I remember having the same concern about restarts, but I believe I tested this, and Solr will look at the replication.properties file on startup and determine the correct index dir to use from that.
So (if my memory is correct) you can safely delete 'index' so long as replication.properties points to the other directory. I wasn't familiar with SOLR-1781. Maybe replication is supposed to clean up the extra directories and sometimes doesn't? In any case, I've found that whenever it happens it's OK to go out and delete the one(s) not being used, even if that means deleting 'index'.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Artem Lokotosh [mailto:arco...@gmail.com]
Sent: Wednesday, January 18, 2012 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: replication, disk space

Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4. I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302

So does this mean the index.20120113121302 directory is actually the one currently being used live, not the straight 'index'? Why? I can't afford the disk space to leave both of these around indefinitely. After replication completes and is committed, why would two index dirs be left? And how can I restore this to one index dir, without downtime? If it's really using the index.X directory, then I could just delete the index directory, but that's a bad idea, because next time the server starts it's going to be looking for index, not index.X. And if it's using the timestamped index directory now, I can't delete THAT one now either. If I was willing to restart the Tomcat container, then I could delete one, rename the other, etc.
But I don't want downtime. I really don't understand what's going on or how it got into this state. Any ideas?

Jonathan
Re: replication, disk space
On 1/18/2012 1:53 PM, Tomás Fernández Löbbe wrote:

As far as I know, replication is supposed to delete the old index directory. However, the initial question is why this new index directory is being created. Are you adding/updating documents in the slave? What about optimizing it? Are you rebuilding the index from scratch in the master?

Thanks for the response. Not adding/updating in the slave. Not optimizing in the slave. YES, sometimes rebuilding the index from scratch in the master. I am on Linux, RedHat 5. This server has also occasionally been having out-of-disk problems, which caused some replications to fail; an aborted replication could also possibly account for the extra index directory, perhaps? (It now has enough disk space to avoid that problem.) At this point, my main concern is getting things back into an expected stable state: eliminating the extra index dir, ideally without downtime.
Re: replication, disk space
Okay, I do have an index.properties file too, and THAT one does contain the name of an index directory. But it's got the name of the timestamped index directory! Not sure how that happened; it could have been Solr trying to recover from running out of disk space in the middle of a replication? I certainly never did that intentionally. But okay, can someone confirm whether this plan makes sense to restore things without downtime:

1. rm the 'index' directory, which seems to be an old copy of the index at this point
2. 'mv index.20120113121302 index'
3. Manually edit index.properties to have index=index, not index=index.20120113121302
4. Send a reload core command.

Does this make sense? (I just experimentally tried a reload core command, and even though it's not supposed to, it DID result in about 20 seconds of unresponsiveness from my Solr server. Not sure why; it could just be lack of CPU or RAM on the server for what's being asked of it. But if that's the best I can do, 20 seconds of unavailability, I'll take it.)

On 1/19/2012 12:37 PM, Jonathan Rochkind wrote:

Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do, there it is! I guess the replication process makes this file? Okay, I don't see an index directory in the replication.properties file at all, though. Below is my complete replication.properties. So I'm still not sure how to properly recover from this situation without downtime. It _looks_ to me like the timestamped directory is actually the live/recent one. Its files have a more recent timestamp, and it's the one that /admin/replication.jsp mentions.
replication.properties:

#Replication details
#Wed Jan 18 10:58:25 EST 2012
confFilesReplicated=[solrconfig.xml, schema.xml]
timesIndexReplicated=350
lastCycleBytesDownloaded=6524299012
replicationFailedAtList=1326902305288,1326406990614,1326394654410,1326218508294,1322150197956,1321987735253,1316104240679,1314371534794,1306764945741,1306678853902
replicationFailedAt=1326902305288
timesConfigReplicated=1
indexReplicatedAtList=1326902305288,1326825419865,1326744428192,1326645554344,1326569088373,1326475488777,1326406990614,1326394654410,1326303313747,1326218508294
confFilesReplicatedAt=1316547200637
previousCycleTimeInSeconds=295
timesFailed=54
indexReplicatedAt=1326902305288

On 1/18/2012 1:41 PM, Dyer, James wrote:

I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies need to exist until the swap is complete. I remember having the same concern about restarts, but I believe I tested this, and Solr will look at the replication.properties file on startup and determine the correct index dir to use from that. So (if my memory is correct) you can safely delete 'index' so long as replication.properties points to the other directory. I wasn't familiar with SOLR-1781. Maybe replication is supposed to clean up the extra directories and sometimes doesn't? In any case, I've found that whenever it happens it's OK to go out and delete the one(s) not being used, even if that means deleting 'index'.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Artem Lokotosh [mailto:arco...@gmail.com]
Sent: Wednesday, January 18, 2012 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: replication, disk space

Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4. I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302

So does this mean the index.20120113121302 directory is actually the one currently being used live, not the straight 'index'? Why? I can't afford the disk space to leave both of these around indefinitely. After replication completes and is committed, why would two index dirs be left? And how can I restore this to one index dir, without downtime? If it's really using the index.X directory, then I could just delete the index directory, but that's a bad idea, because next time the server starts it's going to be looking for index, not index.X. And if it's using the timestamped index directory now, I can't delete THAT one now either. If I was willing to restart the Tomcat container, then I could delete one, rename the other, etc.
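Jonathan's four-step plan can be rehearsed in a scratch directory before touching the real data dir. A sketch under stated assumptions: the layout is faked with mktemp, sed -i is the GNU (Linux) form, and the core name 'prod' in the commented reload URL is hypothetical; on Solr 1.4 the reload would go through the CoreAdmin handler.

```shell
# Dry-run of the four recovery steps in a scratch directory.
# Everything here is simulated; run the real equivalent only after
# confirming which directory is actually live.
DATA_DIR=$(mktemp -d)
cd "$DATA_DIR"
mkdir index index.20120113121302                          # simulate both copies
printf 'index=index.20120113121302\n' > index.properties  # slave points at timestamped dir

rm -rf index                                         # 1. drop the stale 'index' copy
mv index.20120113121302 index                        # 2. rename the timestamped dir
sed -i 's/^index=.*/index=index/' index.properties   # 3. repoint index.properties
# 4. reload the core so Solr picks up the change, e.g. (core name is hypothetical):
#    curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=prod'

cat index.properties
```

The ordering matters: the rm must come only after verifying that index.properties names the timestamped directory, otherwise step 1 deletes the live index.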
RE: replication, disk space
You can do all the steps to rename the timestamped dir back to index, but I don't think you have to. Solr will know on restart to use the timestamped directory so long as it is in the properties file (sorry, I must have told you to look at the wrong file... I'm working from old memories here). You might want to test this in your dev environment, but I think it's going to work. The only thing is if it really bothers you that the index isn't being stored in 'index'...

The reason you get into this situation with the timestamped directory is explained here:
http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Thursday, January 19, 2012 11:43 AM
To: solr-user@lucene.apache.org
Cc: Dyer, James
Subject: Re: replication, disk space

Okay, I do have an index.properties file too, and THAT one does contain the name of an index directory. But it's got the name of the timestamped index directory! Not sure how that happened; it could have been Solr trying to recover from running out of disk space in the middle of a replication? I certainly never did that intentionally. But okay, can someone confirm whether this plan makes sense to restore things without downtime:

1. rm the 'index' directory, which seems to be an old copy of the index at this point
2. 'mv index.20120113121302 index'
3. Manually edit index.properties to have index=index, not index=index.20120113121302
4. Send a reload core command.

Does this make sense? (I just experimentally tried a reload core command, and even though it's not supposed to, it DID result in about 20 seconds of unresponsiveness from my Solr server. Not sure why; it could just be lack of CPU or RAM on the server for what's being asked of it. But if that's the best I can do, 20 seconds of unavailability, I'll take it.)
On 1/19/2012 12:37 PM, Jonathan Rochkind wrote:

Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do, there it is! I guess the replication process makes this file? Okay, I don't see an index directory in the replication.properties file at all, though. Below is my complete replication.properties. So I'm still not sure how to properly recover from this situation without downtime. It _looks_ to me like the timestamped directory is actually the live/recent one. Its files have a more recent timestamp, and it's the one that /admin/replication.jsp mentions.

replication.properties:

#Replication details
#Wed Jan 18 10:58:25 EST 2012
confFilesReplicated=[solrconfig.xml, schema.xml]
timesIndexReplicated=350
lastCycleBytesDownloaded=6524299012
replicationFailedAtList=1326902305288,1326406990614,1326394654410,1326218508294,1322150197956,1321987735253,1316104240679,1314371534794,1306764945741,1306678853902
replicationFailedAt=1326902305288
timesConfigReplicated=1
indexReplicatedAtList=1326902305288,1326825419865,1326744428192,1326645554344,1326569088373,1326475488777,1326406990614,1326394654410,1326303313747,1326218508294
confFilesReplicatedAt=1316547200637
previousCycleTimeInSeconds=295
timesFailed=54
indexReplicatedAt=1326902305288

On 1/18/2012 1:41 PM, Dyer, James wrote:

I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies need to exist until the swap is complete. I remember having the same concern about restarts, but I believe I tested this, and Solr will look at the replication.properties file on startup and determine the correct index dir to use from that.
So (if my memory is correct) you can safely delete 'index' so long as replication.properties points to the other directory. I wasn't familiar with SOLR-1781. Maybe replication is supposed to clean up the extra directories and sometimes doesn't? In any case, I've found that whenever it happens it's OK to go out and delete the one(s) not being used, even if that means deleting 'index'.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Artem Lokotosh [mailto:arco...@gmail.com]
Sent: Wednesday, January 18, 2012 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: replication, disk space

Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4. I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it.
Re: replication, disk space
Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4. I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302

So does this mean the index.20120113121302 directory is actually the one currently being used live, not the straight 'index'? Why? I can't afford the disk space to leave both of these around indefinitely. After replication completes and is committed, why would two index dirs be left? And how can I restore this to one index dir, without downtime? If it's really using the index.X directory, then I could just delete the index directory, but that's a bad idea, because next time the server starts it's going to be looking for index, not index.X. And if it's using the timestamped index directory now, I can't delete THAT one now either. If I was willing to restart the Tomcat container, then I could delete one, rename the other, etc. But I don't want downtime. I really don't understand what's going on or how it got into this state. Any ideas?

Jonathan

--
Best regards,
Artem Lokotosh
mailto:arco...@gmail.com
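The "replicate command" Jonathan issues by hand is, in Solr 1.4, the ReplicationHandler's fetchindex command on the slave, and the details command reports replication status. The host, port, and path below are assumptions; a sketch that only builds the URLs, which you would then run with curl against your own slave:

```shell
# Build the Solr 1.4 ReplicationHandler URLs for a manual pull.
# SLAVE_URL is an assumption -- substitute your slave's host/port/core path.
SLAVE_URL='http://localhost:8983/solr'
FETCH_URL="$SLAVE_URL/replication?command=fetchindex"
DETAILS_URL="$SLAVE_URL/replication?command=details"
echo "$FETCH_URL"
echo "$DETAILS_URL"
# then: curl "$FETCH_URL"     # trigger a replication pull on the slave
#       curl "$DETAILS_URL"   # inspect status and the current index dir
```

The details output is also a useful cross-check for the two-directory problem, since it reports which index location the slave considers current.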
RE: replication, disk space
I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies need to exist until the swap is complete. I remember having the same concern about restarts, but I believe I tested this, and Solr will look at the replication.properties file on startup and determine the correct index dir to use from that. So (if my memory is correct) you can safely delete 'index' so long as replication.properties points to the other directory. I wasn't familiar with SOLR-1781. Maybe replication is supposed to clean up the extra directories and sometimes doesn't? In any case, I've found that whenever it happens it's OK to go out and delete the one(s) not being used, even if that means deleting 'index'.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Artem Lokotosh [mailto:arco...@gmail.com]
Sent: Wednesday, January 18, 2012 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: replication, disk space

Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4. I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302

So does this mean the index.20120113121302 directory is actually the one currently being used live, not the straight 'index'? Why? I can't afford the disk space to leave both of these around indefinitely. After replication completes and is committed, why would two index dirs be left? And how can I restore this to one index dir, without downtime? If it's really using the index.X directory, then I could just delete the index directory, but that's a bad idea, because next time the server starts it's going to be looking for index, not index.X. And if it's using the timestamped index directory now, I can't delete THAT one now either. If I was willing to restart the Tomcat container, then I could delete one, rename the other, etc. But I don't want downtime. I really don't understand what's going on or how it got into this state. Any ideas?

Jonathan

--
Best regards,
Artem Lokotosh
mailto:arco...@gmail.com
Re: replication, disk space
As far as I know, replication is supposed to delete the old index directory. However, the initial question is why this new index directory is being created. Are you adding/updating documents in the slave? What about optimizing it? Are you rebuilding the index from scratch in the master? Also, what OS are you on?

Tomás

On Wed, Jan 18, 2012 at 3:41 PM, Dyer, James james.d...@ingrambook.com wrote:

I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies need to exist until the swap is complete. I remember having the same concern about restarts, but I believe I tested this, and Solr will look at the replication.properties file on startup and determine the correct index dir to use from that. So (if my memory is correct) you can safely delete 'index' so long as replication.properties points to the other directory. I wasn't familiar with SOLR-1781. Maybe replication is supposed to clean up the extra directories and sometimes doesn't? In any case, I've found that whenever it happens it's OK to go out and delete the one(s) not being used, even if that means deleting 'index'.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Artem Lokotosh [mailto:arco...@gmail.com]
Sent: Wednesday, January 18, 2012 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: replication, disk space

Which OS are you using? Maybe it's related to this Solr bug: https://issues.apache.org/jira/browse/SOLR-1781

On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

So, Solr 1.4.
I have a Solr master/slave setup where the slave actually doesn't poll for replication; it only replicates irregularly, when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory. The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302

So does this mean the index.20120113121302 directory is actually the one currently being used live, not the straight 'index'? Why? I can't afford the disk space to leave both of these around indefinitely. After replication completes and is committed, why would two index dirs be left? And how can I restore this to one index dir, without downtime? If it's really using the index.X directory, then I could just delete the index directory, but that's a bad idea, because next time the server starts it's going to be looking for index, not index.X. And if it's using the timestamped index directory now, I can't delete THAT one now either. If I was willing to restart the Tomcat container, then I could delete one, rename the other, etc. But I don't want downtime. I really don't understand what's going on or how it got into this state. Any ideas?

Jonathan

--
Best regards,
Artem Lokotosh
mailto:arco...@gmail.com