Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
Hi,

I just wanted to make sure that our latest findings reach the OP of this thread. We posted them in a different thread [1] and hope they help some of you.

It is possible to migrate a journal from one partition to another almost without downtime of the OSD. But it's *not* sufficient to dd the journal to the new partition and replace the symlink. The OSD will restart successfully only if the old partition still exists, and you'll find references to it in /proc/<pid>/fd/ of the OSD process. Removing the old partition will prevent the OSD from starting. You can find details in the provided link [1].

We managed to replace the journals of six 1 TB OSDs residing on the same host within 25 minutes in our production environment.

Note: this only applies if the wal/db already reside on a separate partition.

Currently, I'm looking for a way to extract the journal of an all-in-one OSD (bluestore) into a separate partition. I thought maybe "ceph-objectstore-tool --op dump-journal" could do the trick, but this command doesn't work for me. Does anyone have any insights on this?

Regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025930.html

Date: Fri, 17 Nov 2017 17:04:36 +0100
From: Ronny Aasen <ronny+ceph-us...@aasen.cx>
Subject: Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
To: ceph-users@lists.ceph.com

> On 16.11.2017 09:45, Loris Cuoghi wrote:
>> What stops you from dd'ing the DB/WAL's partitions on another disk and
>> updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?
>
> This probably works when you deployed bluestore with partitions, but if
> you did not create partitions for block.db on original bluestore
> creation, there is no block.db symlink; db and wal are mixed into the
> block partition and are not easy to extract.
>
> Also, just dd'ing the block device may not help if you want to change
> the size of the db partition. This needs more testing. Tools can
> probably be created in the future for resizing db and wal partitions,
> and for extracting db data from block into a separate block.db
> partition.
>
> dd'ing block.db would probably work when you need to replace a worn-out
> SSD drive, but not so much if you want to deploy a separate block.db
> from a bluestore made without block.db.
>
> Kind regards,
> Ronny Aasen
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
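The observation in Eugen's message, that a restarted OSD still holds references to the old partition, can be checked by reading the per-process file descriptor links under /proc. The following is only a minimal sketch of such a check, not the procedure from the linked thread. The function names and the `proc_root` parameter are made up for illustration, and the sketch assumes that the kernel reports the resolved device node (for example a hypothetical /dev/sdb1) as the link target.

```python
import os


def open_fd_targets(pid, proc_root="/proc"):
    """Map each open file descriptor of `pid` to the path it points at."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[name] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def fds_referencing(pid, device_path, proc_root="/proc"):
    """Return the descriptor numbers of `pid` that point at `device_path`."""
    targets = open_fd_targets(pid, proc_root)
    return sorted(
        int(fd) for fd, target in targets.items() if target == device_path
    )
```

Called as `fds_referencing(osd_pid, "/dev/sdb1")` with the PID of the ceph-osd process and the old partition (both placeholders here), a non-empty result would mean the OSD still has the old partition open. Reading another user's /proc entries normally requires root.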
Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
On 16.11.2017 09:45, Loris Cuoghi wrote:
> Hello,
>
> What stops you from dd'ing the DB/WAL's partitions on another disk and
> updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?

This probably works when you deployed bluestore with partitions, but if you did not create partitions for block.db on original bluestore creation, there is no block.db symlink; db and wal are mixed into the block partition and are not easy to extract.

Also, just dd'ing the block device may not help if you want to change the size of the db partition. This needs more testing. Tools can probably be created in the future for resizing db and wal partitions, and for extracting db data from block into a separate block.db partition.

dd'ing block.db would probably work when you need to replace a worn-out SSD drive, but not so much if you want to deploy a separate block.db from a bluestore made without block.db.

Kind regards,
Ronny Aasen
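Whether an OSD falls into the "deployed with partitions" case Ronny describes can be seen from the symlinks in its directory. Here is a small sketch of that check. It assumes the OSD directory lives under /var/lib/ceph/osd (the exact directory name, such as ceph-0, is an assumption) and that separate devices show up as symlinks named block.db and block.wal next to block; the function name is hypothetical.

```python
import os


def bluestore_device_links(osd_dir):
    """Report where block, block.db and block.wal point, or None if absent."""
    layout = {}
    for name in ("block", "block.db", "block.wal"):
        path = os.path.join(osd_dir, name)
        # Only a symlink counts; a missing entry means db/wal share `block`.
        layout[name] = os.readlink(path) if os.path.islink(path) else None
    return layout
```

If `bluestore_device_links("/var/lib/ceph/osd/ceph-0")["block.db"]` is `None`, the DB lives inside the main block device and, as described above, there is no separate partition to copy.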
Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
On Wed, 15 Nov 2017 19:46:48 +, Shawn Edwards wrote:
> This. Exactly this. Not being able to move the .db and .wal data on
> and off the main storage disk on Bluestore is a regression.

Hello,

What stops you from dd'ing the DB/WAL partitions onto another disk and updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?
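The dd-and-relink idea from this message could be sketched roughly as follows. This is an illustration of the proposal only, under several assumptions: the OSD is stopped while it runs, the new partition is at least as large as the old one, and `copy_and_repoint` and its parameters are hypothetical names. Note that the follow-up at the top of this thread reports that copying and relinking alone was not sufficient in practice, so treat this as a starting point for experiments on a test cluster.

```python
import os
import shutil


def device_size(path):
    """Size in bytes; seeking to the end also works for block devices."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def copy_and_repoint(link_path, new_target, chunk_size=4 * 1024 * 1024):
    """Copy the data behind `link_path` to `new_target`, then repoint the link."""
    old_target = os.path.realpath(link_path)
    if device_size(new_target) < device_size(old_target):
        raise ValueError("new target is smaller than the current one")

    # Byte-for-byte copy, the equivalent of a plain dd.
    with open(old_target, "rb") as src, open(new_target, "r+b") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()
        os.fsync(dst.fileno())

    # Create the new symlink under a temporary name and rename it over the
    # old one, so the link is never missing.
    tmp_link = link_path + ".tmp"
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)
    return old_target
```

The size check reflects the concern raised elsewhere in the thread that a plain copy does not help when the partition size changes. Ownership and permissions of the new device are not handled here.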
Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
On Wed, Nov 15, 2017, 11:07 David Turner wrote:
> Flushing journals, replacing SSDs, and bringing it all back online was a
> slick process. Formatting the HDDs and backfilling back onto the same
> disks sounds like a big regression. A process to migrate the WAL and DB
> onto the HDD and then back off to a new device would be very helpful.

This. Exactly this. Not being able to move the .db and .wal data on and off the main storage disk on Bluestore is a regression.
Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
I'm not going to lie. This makes me dislike Bluestore quite a bit. Sharing an SSD journal among multiple OSDs allowed you to monitor the write durability of the SSD and replace it without having to out and re-add all of the OSDs on the device. Having to now out and backfill back onto the HDDs is awful, and it would have made the time when I realized that 20 journal SSDs had all run low on writes at once nearly impossible to recover from.

Flushing journals, replacing SSDs, and bringing it all back online was a slick process. Formatting the HDDs and backfilling back onto the same disks sounds like a big regression. A process to migrate the WAL and DB onto the HDD and then back off to a new device would be very helpful.

On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco wrote:
> It seems it is not possible. I recreated the OSD
Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation
It seems it is not possible. I recreated the OSD.

2017-11-12 17:44 GMT+01:00 Shawn Edwards:
> I've created some Bluestore OSD with all data (wal, db, and data) all on
> the same rotating disk. I would like to now move the wal and db onto an
> nvme disk. Is that possible without re-creating the OSD?
[ceph-users] Moving bluestore WAL and DB after bluestore creation
I've created some Bluestore OSDs with all data (wal, db, and data) on the same rotating disk. I would now like to move the wal and db onto an NVMe disk. Is that possible without re-creating the OSD?