Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2018-04-10 Thread Eugen Block

Hi,

I just wanted to make sure that our latest findings reach the OP of  
this thread. We posted it in a different thread [1] and hope this  
helps some of you.
It is possible to migrate a journal from one partition to another  
almost without downtime of the OSD. But it's *not* sufficient to dd  
the journal to the new partition and replace the symlink. The OSD will  
restart successfully only if the old partition still exists, and  
you'll find references to it in /proc/fd/. Removing the old  
partition will prevent the OSD from starting. You can find details in  
the provided link [1].


We managed to replace the journals of six 1 TB OSDs residing on the  
same host within 25 minutes in our production environment.


Note: this only applies if the wal/db already reside on a separate partition.

Currently, I'm looking for a way to extract the journal of an  
all-in-one OSD (bluestore) into a separate partition, I thought maybe  
"ceph-objectstore-tool --op dump-journal" could do the trick, but this  
command doesn't work for me. Has anyone any insights on this?


Regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025930.html



<ronny+ceph-us...@aasen.cx> -
 Datum: Fri, 17 Nov 2017 17:04:36 +0100
   Von: Ronny Aasen <ronny+ceph-us...@aasen.cx>
Betreff: Re: [ceph-users] Moving bluestore WAL and DB after  
bluestore creation

An: ceph-users@lists.ceph.com

On 16.11.2017 09:45, Loris Cuoghi wrote:

Le Wed, 15 Nov 2017 19:46:48 +,
Shawn Edwards <lesser.e...@gmail.com> a écrit :


On Wed, Nov 15, 2017, 11:07 David Turner <drakonst...@gmail.com>
wrote:


I'm not going to lie.  This makes me dislike Bluestore quite a
bit.  Using multiple OSDs to an SSD journal allowed for you to
monitor the write durability of the SSD and replace it without
having to out and re-add all of the OSDs on the device.  Having to
now out and backfill back onto the HDDs is awful and would have
made a time when I realized that 20 journal SSDs all ran low on
writes at the same time nearly impossible to recover from.

Flushing journals, replacing SSDs, and bringing it all back online
was a slick process.  Formatting the HDDs and backfilling back onto
the same disks sounds like a big regression.  A process to migrate
the WAL and DB onto the HDD and then back off to a new device would
be very helpful.

On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco
<mgiamma...@gmail.com> wrote:


It seems it is not possible. I recreated the OSD

2017-11-12 17:44 GMT+01:00 Shawn Edwards <lesser.e...@gmail.com>:


I've created some Bluestore OSD with all data (wal, db, and data)
all on the same rotating disk.  I would like to now move the wal
and db onto an nvme disk.  Is that possible without re-creating
the OSD?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



This.  Exactly this.  Not being able to move the .db and .wal data on
and off the main storage disk on Bluestore is a regression.


Hello,

What stops you from dd'ing the DB/WAL's partitions on another disk and
updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?



this probably works when you deployed bluestore with partitions, but  
if you did not create partitions for block.db on orginal bluestore  
creation there is no block.db symlink, db and wal are mixed into the  
block partition and not easy to extract.  also just dd the block  
device may not help if you want to change the size of the db  
partition. this needs more testing.  probably tools can be created  
in the future for resizing  db and wal partitions, and for  
extracting db data from block into a separate block.db partition.


dd block.db would probably work when you need to replace a worn out  
ssd drive. but not so much if you want to deploy separate block.db  
from a bluestore made without block.db



kind regards
Ronny Aasen





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

 Vorsitzende des Aufsichtsrates: Angelika Mozdzen
   Sitz und Registergericht: Hamburg, HRB 90934
   Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983


--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsi

Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-17 Thread Ronny Aasen

On 16.11.2017 09:45, Loris Cuoghi wrote:

Le Wed, 15 Nov 2017 19:46:48 +,
Shawn Edwards  a écrit :


On Wed, Nov 15, 2017, 11:07 David Turner 
wrote:


I'm not going to lie.  This makes me dislike Bluestore quite a
bit.  Using multiple OSDs to an SSD journal allowed for you to
monitor the write durability of the SSD and replace it without
having to out and re-add all of the OSDs on the device.  Having to
now out and backfill back onto the HDDs is awful and would have
made a time when I realized that 20 journal SSDs all ran low on
writes at the same time nearly impossible to recover from.

Flushing journals, replacing SSDs, and bringing it all back online
was a slick process.  Formatting the HDDs and backfilling back onto
the same disks sounds like a big regression.  A process to migrate
the WAL and DB onto the HDD and then back off to a new device would
be very helpful.

On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco
 wrote:
  

It seems it is not possible. I recreated the OSD

2017-11-12 17:44 GMT+01:00 Shawn Edwards :
  

I've created some Bluestore OSD with all data (wal, db, and data)
all on the same rotating disk.  I would like to now move the wal
and db onto an nvme disk.  Is that possible without re-creating
the OSD?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  

This.  Exactly this.  Not being able to move the .db and .wal data on
and off the main storage disk on Bluestore is a regression.


Hello,

What stops you from dd'ing the DB/WAL's partitions on another disk and
updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?



this probably works when you deployed bluestore with partitions, but if 
you did not create partitions for block.db on orginal bluestore creation 
there is no block.db symlink, db and wal are mixed into the block 
partition and not easy to extract.  also just dd the block device may 
not help if you want to change the size of the db partition. this needs 
more testing.  probably tools can be created in the future for resizing  
db and wal partitions, and for extracting db data from block into a 
separate block.db partition.


dd block.db would probably work when you need to replace a worn out ssd 
drive. but not so much if you want to deploy separate block.db from a 
bluestore made without block.db



kind regards
Ronny Aasen





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-16 Thread Loris Cuoghi
Le Wed, 15 Nov 2017 19:46:48 +,
Shawn Edwards  a écrit :

> On Wed, Nov 15, 2017, 11:07 David Turner 
> wrote:
> 
> > I'm not going to lie.  This makes me dislike Bluestore quite a
> > bit.  Using multiple OSDs to an SSD journal allowed for you to
> > monitor the write durability of the SSD and replace it without
> > having to out and re-add all of the OSDs on the device.  Having to
> > now out and backfill back onto the HDDs is awful and would have
> > made a time when I realized that 20 journal SSDs all ran low on
> > writes at the same time nearly impossible to recover from.
> >
> > Flushing journals, replacing SSDs, and bringing it all back online
> > was a slick process.  Formatting the HDDs and backfilling back onto
> > the same disks sounds like a big regression.  A process to migrate
> > the WAL and DB onto the HDD and then back off to a new device would
> > be very helpful.
> >
> > On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco
> >  wrote:
> >  
> >> It seems it is not possible. I recreated the OSD
> >>
> >> 2017-11-12 17:44 GMT+01:00 Shawn Edwards :
> >>  
> >>> I've created some Bluestore OSD with all data (wal, db, and data)
> >>> all on the same rotating disk.  I would like to now move the wal
> >>> and db onto an nvme disk.  Is that possible without re-creating
> >>> the OSD?
> >>>
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>>  
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
> >
> >  
> This.  Exactly this.  Not being able to move the .db and .wal data on
> and off the main storage disk on Bluestore is a regression.
> 

Hello,

What stops you from dd'ing the DB/WAL's partitions on another disk and
updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-15 Thread Shawn Edwards
On Wed, Nov 15, 2017, 11:07 David Turner  wrote:

> I'm not going to lie.  This makes me dislike Bluestore quite a bit.  Using
> multiple OSDs to an SSD journal allowed for you to monitor the write
> durability of the SSD and replace it without having to out and re-add all
> of the OSDs on the device.  Having to now out and backfill back onto the
> HDDs is awful and would have made a time when I realized that 20 journal
> SSDs all ran low on writes at the same time nearly impossible to recover
> from.
>
> Flushing journals, replacing SSDs, and bringing it all back online was a
> slick process.  Formatting the HDDs and backfilling back onto the same
> disks sounds like a big regression.  A process to migrate the WAL and DB
> onto the HDD and then back off to a new device would be very helpful.
>
> On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco 
> wrote:
>
>> It seems it is not possible. I recreated the OSD
>>
>> 2017-11-12 17:44 GMT+01:00 Shawn Edwards :
>>
>>> I've created some Bluestore OSD with all data (wal, db, and data) all on
>>> the same rotating disk.  I would like to now move the wal and db onto an
>>> nvme disk.  Is that possible without re-creating the OSD?
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
This.  Exactly this.  Not being able to move the .db and .wal data on and
off the main storage disk on Bluestore is a regression.


>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-15 Thread David Turner
I'm not going to lie.  This makes me dislike Bluestore quite a bit.  Using
multiple OSDs to an SSD journal allowed for you to monitor the write
durability of the SSD and replace it without having to out and re-add all
of the OSDs on the device.  Having to now out and backfill back onto the
HDDs is awful and would have made a time when I realized that 20 journal
SSDs all ran low on writes at the same time nearly impossible to recover
from.

Flushing journals, replacing SSDs, and bringing it all back online was a
slick process.  Formatting the HDDs and backfilling back onto the same
disks sounds like a big regression.  A process to migrate the WAL and DB
onto the HDD and then back off to a new device would be very helpful.

On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco 
wrote:

> It seems it is not possible. I recreated the OSD
>
> 2017-11-12 17:44 GMT+01:00 Shawn Edwards :
>
>> I've created some Bluestore OSD with all data (wal, db, and data) all on
>> the same rotating disk.  I would like to now move the wal and db onto an
>> nvme disk.  Is that possible without re-creating the OSD?
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-15 Thread Mario Giammarco
It seems it is not possible. I recreated the OSD

2017-11-12 17:44 GMT+01:00 Shawn Edwards :

> I've created some Bluestore OSD with all data (wal, db, and data) all on
> the same rotating disk.  I would like to now move the wal and db onto an
> nvme disk.  Is that possible without re-creating the OSD?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-12 Thread Shawn Edwards
I've created some Bluestore OSD with all data (wal, db, and data) all on
the same rotating disk.  I would like to now move the wal and db onto an
nvme disk.  Is that possible without re-creating the OSD?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com