Re: Move index directory to another partition
Thanks all for your comments. I followed Shawn's steps (rsync) because everything (ZooKeeper, Solr home, and data) lives on that volume, and everything went great.

Thanks again,
Mahmoud

On Sun, Aug 6, 2017 at 12:47 AM, Erick Erickson wrote:
Re: Move index directory to another partition
bq: I was envisioning a scenario where the entire solr home is on the old volume that's going away. If I were setting up a Solr install where the large/fast storage was a separate filesystem, I would put the solr home (or possibly even the entire install) under that mount point. It would be a lot easier than setting dataDir in core.properties for every core, especially in a cloud install.

Agreed. Nothing in what I said precludes this. If you don't specify dataDir, then the index for a new replica goes in the default place, i.e. usually under your install directory; in your case, under your new mount point. I usually don't recommend trying to take control of where dataDir points, just let it default. I only mentioned it so you'd be aware it exists. So if your new install is associated with a bigger/better EBS volume, it's all automatic.

bq: If the dataDir property is already in use to relocate index data, then ADDREPLICA and DELETEREPLICA would be a great way to go. I would not expect most SolrCloud users to use that method.

I really don't understand this. Each Solr replica has an associated dataDir whether you specified it or not (the default is relative to the core.properties file). ADDREPLICA creates a new replica in a new place; initially the data directory and index are empty. The new replica goes into recovery and uses the standard replication process to copy the index via HTTP from a healthy replica and write it to its data directory. Once that's done, the replica becomes live. There's nothing about dataDir already being in use here at all.

When you start Solr there's a default place Solr expects to find the replicas. This is not necessarily where Solr is executing from; see the "-s" option to "bin/solr start".

If you're talking about using dataDir to point to an existing index, yes, that would be a problem, and not something I meant to imply at all.

Why wouldn't most SolrCloud users use ADDREPLICA/DELETEREPLICA? It's commonly used to move replicas around a cluster.

Best,
Erick

On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey wrote:
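As a rough illustration of the "-s" option Erick mentions: if the new, larger volume is mounted somewhere and holds the solr home, startup would point at it like this (the path and cloud flag are assumptions, not from this thread; the command is echoed so the sketch is safe to run without a Solr install):

```shell
# Hypothetical: start Solr with its solr home on the new, larger mount.
SOLR_HOME=/mnt/bigebs/solr/home
# Drop the leading "echo" to actually start Solr:
echo bin/solr start -cloud -s "$SOLR_HOME"
```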
Re: Move index directory to another partition
On 8/2/2017 9:17 AM, Erick Erickson wrote:

I was envisioning a scenario where the entire solr home is on the old volume that's going away. If I were setting up a Solr install where the large/fast storage was a separate filesystem, I would put the solr home (or possibly even the entire install) under that mount point. It would be a lot easier than setting dataDir in core.properties for every core, especially in a cloud install.

If the dataDir property is already in use to relocate index data, then ADDREPLICA and DELETEREPLICA would be a great way to go. I would not expect most SolrCloud users to use that method.

Thanks,
Shawn
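For reference, the per-core dataDir setting Shawn mentions would look roughly like this in a core's core.properties file (core name and path are hypothetical, which is exactly why doing this for every core in a cloud install is tedious):

```properties
# Hypothetical core.properties relocating one core's index data
name=mycoll_shard1_replica1
dataDir=/mnt/bigebs/solr/data/mycoll_shard1_replica1
```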
Re: Move index directory to another partition
Shawn:

Not entirely sure about AWS intricacies, but getting a new replica to use a particular index directory in the general case is just specifying dataDir=some_directory on the ADDREPLICA command. The index just needs an HTTP connection (it uses the old replication process), so nothing huge there. Then DELETEREPLICA for the old one. There's nothing that ZK has to know about to make this work; it's all local to the Solr instance.

Or I'm completely out in the weeds.

Best,
Erick

On Tue, Aug 1, 2017 at 7:52 PM, Dave wrote:
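Erick's ADDREPLICA-then-DELETEREPLICA approach would look something like the following Collections API calls. Collection, shard, replica, and path names here are invented for the example, and parameter support varies by Solr version, so check the Collections API documentation for your release. The commands are echoed so the sketch runs safely without a cluster; drop the `echo` to execute against a real Solr.

```shell
# Sketch: add a replica whose data lives on the new volume, then,
# once it is active, drop the old replica. All names are hypothetical.
SOLR=http://localhost:8983/solr
COLL=mycoll

# 1. Create the new replica with its dataDir on the new EBS mount.
echo curl "$SOLR/admin/collections?action=ADDREPLICA&collection=$COLL&shard=shard1&dataDir=/mnt/bigebs/solr/data/new_replica"

# 2. After the new replica shows "active" in CLUSTERSTATUS, delete the old one.
echo curl "$SOLR/admin/collections?action=DELETEREPLICA&collection=$COLL&shard=shard1&replica=core_node1"
```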
Re: Move index directory to another partition
To add to this: not sure if SolrCloud uses it, but you're going to want to delete the write.lock file as well.

On Aug 1, 2017, at 9:31 PM, Shawn Heisey wrote:
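If stale lock files do block startup after the copy, clearing them is a one-liner with find. The directory below is a temp-dir stand-in for the real data directory, and this should only ever be done while no Solr process is running:

```shell
# Simulate a copied data directory containing a stale write.lock,
# then remove every write.lock under it. Only run while Solr is stopped.
DATA_DIR=$(mktemp -d)                         # stand-in for the real data dir
mkdir -p "$DATA_DIR/core1/data/index"
touch "$DATA_DIR/core1/data/index/write.lock"

find "$DATA_DIR" -name write.lock -type f -delete
```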
Re: Move index directory to another partition
On 8/1/2017 7:09 PM, Erick Erickson wrote:

I did consider mentioning that as a possible way forward, but I hate to rely on special configurations with core.properties, particularly if the newly built replica core instanceDirs aren't in the solr home (or coreRootDirectory) at all. I didn't want to try to explain the precise steps required to get that plan to work. I would expect to need some arcane Collections API work or manual ZK modification to reach a correct state -- steps that would be prone to error.

The idea I mentioned seemed to me to be the way forward that would require the least specialized knowledge. Here's a simplified statement of the steps:

* Mount the new volume somewhere.
* Use multiple rsync passes to get the data copied.
* Stop Solr.
* Do a final rsync pass.
* Unmount the original volume.
* Remount the new volume in the original location.
* Start Solr.

Thanks,
Shawn
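Shawn's step list could be scripted roughly as follows. Every device name and path is an assumption, and the `run` helper only prints each command so the whole sequence can be reviewed safely; swap it for real execution once the paths are verified for your hosts.

```shell
# Dry-run sketch of the volume cutover. All values are hypothetical.
run() { echo "+ $*"; }        # swap for: run() { "$@"; }  to execute for real

OLD=/var/solr                 # current solr home on the full volume
NEW_DEV=/dev/xvdf             # new, larger EBS volume
STAGE=/mnt/newsolr            # temporary mount point for the copy

run mount "$NEW_DEV" "$STAGE"                # 1. mount the new volume
run rsync -avH --delete "$OLD/" "$STAGE/"    # 2. bulk copy (repeat until fast)
run bin/solr stop -all                       # 3. stop Solr
run rsync -avH --delete "$OLD/" "$STAGE/"    # 4. final, tiny pass
run umount "$STAGE"                          #    release the staging mount
run umount "$OLD"                            # 5. unmount the original volume
run mount "$NEW_DEV" "$OLD"                  # 6. remount new volume in place
run bin/solr start -cloud                    # 7. start Solr again
```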
Re: Move index directory to another partition
WARNING: what I currently understand about the limitations of AWS could fill volumes, so I might be completely out to lunch.

If you ADDREPLICA with the new replica's data residing on the new EBS volume, then wait for it to sync (which it'll do all by itself), then DELETEREPLICA on the original, you'll be all set.

In recent Solr versions, there's also the MOVEREPLICA Collections API call.

Best,
Erick

On Tue, Aug 1, 2017 at 6:03 PM, Shawn Heisey wrote:
Re: Move index directory to another partition
On 8/1/2017 4:00 PM, Mahmoud Almokadem wrote:

The first time you copy the data, which you could do with cp if you want, the time required will be limited by the size of the data and the speed of the disks. Depending on the size, it could take several hours like you estimated. I would suggest using rsync for the first copy just because you're going to need the same command again for the later passes.

Doing a second pass with rsync should go very quickly. How fast would depend on the rate that the index data is changing. You might need to do this step more than once just so that it gets faster each time, in preparation for the final pass.

A final pass with rsync might only take a few seconds, and if Solr is stopped before that final copy is started, then there's no way the index data can change.

Thanks,
Shawn
Re: Move index directory to another partition
Thanks Shawn,

I'm using Ubuntu and I'll try the rsync command. Unfortunately I'm using a replication factor of one, but I think the downtime will be less than five minutes after following your steps.

But how can I start Solr back up, or why should I restart it, given that I copied the index and changed the path?

And what do you mean by "Using multiple passes with rsync"?

Thanks,
Mahmoud

On Tuesday, August 1, 2017, Shawn Heisey wrote:
Re: Move index directory to another partition
Way back in the 1.x days, replication was done with shell scripts and rsync, right?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Aug 1, 2017, at 2:45 PM, Shawn Heisey wrote:
Re: Move index directory to another partition
On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote:

Use rsync to do the copy. Do an initial copy while Solr is running, then do a second copy, which should be pretty fast because rsync will see the data from the first copy. Then shut Solr down and do a third rsync, which will only copy a VERY small changeset. Reconfigure Solr and/or the OS to use the new location, and start Solr back up. Because you mentioned "cp" I am assuming that you're NOT on Windows, and that the OS will most likely allow you to do anything you need with index files while Solr has them open.

If you have set up your replicas with SolrCloud properly, then your collections will not go offline when one Solr instance is shut down, and that instance will be brought back into sync with the rest of the cluster when it starts back up. Using multiple passes with rsync should mean that Solr will not need to be shut down for very long.

The options I typically use for this kind of copy with rsync are "-avH --delete". I would recommend that you research rsync options so that you fully understand what I have suggested.

Thanks,
Shawn
Move index directory to another partition
Hello,

I have a SolrCloud of four instances on Amazon, and the EBS volumes that contain the data on every node are going to be full. Unfortunately, Amazon doesn't support expanding the EBS volumes, so I'll attach larger EBS volumes to move the index to.

I can stop the updates on the index, but I'm afraid to use the "cp" command to copy files that are in the middle of a merge operation.

The copy operation may take several hours.

How can I move the data directory without stopping the instance?

Thanks,
Mahmoud