Re: [ceph-users] Best way to reformat OSD drives?
On 03.09.2013, at 16:27, Sage Weil wrote:

>> ceph osd create # this should give you back the same osd number as
>> the one you just removed
>
> OSD=`ceph osd create`   # may or may not be the same osd id

good point - so far it has been good to us!

>> umount ${PART}1
>> parted $PART rm 1                    # remove the old partition
>> parted $PART mkpart primary 0% 100%  # and create a new one
>
> I don't think the partition removal/add step is needed.

it isn't - I'm still learning the ropes :)

> Otherwise it looks fine!

ok - I have tried a simplified version (one that doesn't take the OSD out) that just "simulates" a disk failure: it stops the OSD, reformats the drive, recreates the OSD structure and starts the process again. This seems to work, but rebuilding the disk is really slow (we see write speeds of 4-20 MB/s, and it takes ages to refill around 100 GB of data).

I don't dare to run this on multiple OSDs at the same time for fear of losing data, so the "slower/longer" process - first marking all OSDs of a server as out, waiting for them to empty, then batch-formatting all OSDs on the server and waiting for the cluster to be stable again - might be faster in the end.

cheers
jc
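If the slow backfill itself is the bottleneck, one thing worth trying is to raise the per-OSD recovery throttles while the disk refills. A minimal sketch - the option names are standard Ceph settings, but the values here are only illustrative, so check your release's defaults before reverting:

# allow more concurrent backfill/recovery work per OSD while refilling
ceph tell osd.\* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'
# ...wait for the refill to finish, then put the throttles back to
# your release's defaults the same way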
Re: [ceph-users] Best way to reformat OSD drives?
On Mon, 2 Sep 2013, Jens-Christian Fischer wrote:
> Hi all
>
> we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally
> formatted the OSDs with btrfs but have had numerous problems (server
> kernel panics) that we could trace back to btrfs. We are therefore in
> the process of reformatting our OSDs to XFS. We have a process that
> works, but I was wondering if there is a simpler / faster way.
>
> Currently we 'ceph osd out' all drives of a server and wait for the
> data to migrate away, then delete the OSD, recreate it and start the
> OSD processes again. This takes at least 1-2 days per server (mostly
> waiting for the data to migrate back and forth).
>
> Here's the script we are using:
>
> --- cut ---
> #! /bin/bash
>
> OSD=$1
> PART=$2
> HOST=$3
>
> echo "changing partition ${PART}1 to XFS for OSD: $OSD on host: $HOST"
> read -p "continue or CTRL-C"
>
> service ceph -a stop osd.$OSD
> ceph osd crush remove osd.$OSD
> ceph auth del osd.$OSD
> ceph osd rm $OSD
> ceph osd create # this should give you back the same osd number as
> the one you just removed

OSD=`ceph osd create`   # may or may not be the same osd id

> umount ${PART}1
> parted $PART rm 1                    # remove the old partition
> parted $PART mkpart primary 0% 100%  # and create a new one

I don't think the partition removal/add step is needed.

> mkfs.xfs -f -i size=2048 ${PART}1 -L osd.$OSD
> mount -o inode64,noatime ${PART}1 /var/lib/ceph/osd/ceph-$OSD
> ceph-osd -i $OSD --mkfs --mkkey --mkjournal
> ceph auth add osd.$OSD osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSD}/keyring
> ceph osd crush set $OSD 1 root=default host=$HOST
> service ceph -a start osd.$OSD

Otherwise it looks fine!

sage

> --- cut ---
>
> cheers
> Jens-Christian
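Folding Sage's two corrections into the script - capturing the id that `ceph osd create` actually returns instead of assuming it stays the same, and dropping the parted steps - would give something like this. A sketch only, untested; the added mkdir covers the case where a new id comes back:

--- cut ---
#! /bin/bash

OSD=$1
PART=$2
HOST=$3

echo "changing partition ${PART}1 to XFS for OSD: $OSD on host: $HOST"
read -p "continue or CTRL-C"

service ceph -a stop osd.$OSD
ceph osd crush remove osd.$OSD
ceph auth del osd.$OSD
ceph osd rm $OSD
OSD=`ceph osd create`   # may or may not be the same osd id

umount ${PART}1
mkfs.xfs -f -i size=2048 ${PART}1 -L osd.$OSD
mkdir -p /var/lib/ceph/osd/ceph-$OSD   # in case the new id differs from the old one
mount -o inode64,noatime ${PART}1 /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --mkjournal
ceph auth add osd.$OSD osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSD}/keyring
ceph osd crush set $OSD 1 root=default host=$HOST
service ceph -a start osd.$OSD
--- cut ---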
Re: [ceph-users] Best way to reformat OSD drives?
> Why wait for the data to migrate away? Normally you have replicas of
> the whole osd data, so you can simply stop the osd, reformat the disk
> and restart it again. It'll join the cluster and automatically get all
> the data it's missing. Of course the risk of data loss is a bit higher
> during that time, but normally that should be ok, because it's not
> different from an ordinary disk failure, which can happen at any time.
>
> I just found a similar question from one year ago:
> http://www.spinics.net/lists/ceph-devel/msg05915.html
> I didn't read the whole thread, but probably you can find some other
> ideas there.
>
> service ceph osd stop $OSD
> mkfs -t xfs /dev/XXX
> ceph-osd -i $OSD --mkfs --mkkey --mkjournal
> service ceph osd start $OSD

this is what I did now:

ceph osd set noout
service ceph stop osd.X
umount /dev/sdX1
mkfs.xfs -f -i size=2048 /dev/sdX1 -L osd.X
vim /etc/fstab   # edit the line for /dev/sdX1
mount /dev/sdX1
ceph-osd -i X --mkfs --mkkey --mkjournal
ceph auth add osd.X osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-X/keyring
service ceph start osd.X

seems to work so far - the OSD is busy retrieving data, and I didn't have to wait for the OSD to become empty.

cheers
jc
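One follow-up this list leaves open: the noout flag set in the first step stays in place afterwards. Once the OSD has caught up it should be cleared again, so that genuinely failed disks get marked out automatically - something like:

# watch until the cluster reports HEALTH_OK again
ceph -w
# then clear the flag
ceph osd unset noout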
Re: [ceph-users] Best way to reformat OSD drives?
Hi Martin

> On 2013-09-02 19:37, Jens-Christian Fischer wrote:
>> we have a Ceph Cluster with 64 OSD drives in 10 servers. We
>> originally formatted the OSDs with btrfs but have had numerous
>> problems (server kernel panics) that we could trace back to btrfs.
>> We are therefore in the process of reformatting our OSDs to XFS. We
>> have a process that works, but I was wondering if there is a simpler
>> / faster way.
>>
>> Currently we 'ceph osd out' all drives of a server and wait for the
>> data to migrate away, then delete the OSD, recreate it and start the
>> OSD processes again. This takes at least 1-2 days per server (mostly
>> waiting for the data to migrate back and forth).
>
> The first thing I'd try is doing one osd at a time, rather than the
> entire server; in theory, this should allow for (as opposed to
> definitely make it happen) data to move from one osd to the other,
> rather than having to push it across the network from other nodes.

Isn't that dependent on the CRUSH map and some rules?

> depending on just how much data you have on an individual osd, you
> could stop two, blow the first away, copy the data from osd 2 to the
> disk osd 1 was using, change the mount-points, then bring osd 2 back
> up again; in theory, osd 2 will only need to resync changes that have
> occurred while it was offline. This, of course, presumes that there's
> no change in the on-disk layout between btrfs and xfs...

We were actually thinking of doing that, but I wanted to hear the wisdom of the crowd… The thread from a year ago (which I just read) cautioned against that procedure, though.

cheers
jc
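It is indeed the CRUSH rules that decide this. A quick way to check is to dump the rules and look at the chooseleaf step (a sketch; rule names and output vary per cluster):

# "type host" in the chooseleaf step means replicas are spread across
# hosts, so a drained OSD refills over the network; "type osd" would
# allow replicas to sit on other disks in the same server
ceph osd crush rule dump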
Re: [ceph-users] Best way to reformat OSD drives?
> Why wait for the data to migrate away? Normally you have replicas of
> the whole osd data, so you can simply stop the osd, reformat the disk
> and restart it again. It'll join the cluster and automatically get all
> the data it's missing. Of course the risk of data loss is a bit higher
> during that time, but normally that should be ok, because it's not
> different from an ordinary disk failure, which can happen at any time.

Because I lost 2 objects the last time I did that trick (probably caused by additional user (i.e. me) stupidity in the first place, but I don't really fancy taking chances this time :) )

> I just found a similar question from one year ago:
> http://www.spinics.net/lists/ceph-devel/msg05915.html
> I didn't read the whole thread, but probably you can find some other
> ideas there.

I read it, but it is the usual to and fro - no definitive solution...

> service ceph osd stop $OSD
> mkfs -t xfs /dev/XXX
> ceph-osd -i $OSD --mkfs --mkkey --mkjournal
> service ceph osd start $OSD

I'll give that a whirl - I have enough OSDs to try on - as soon as the cluster has recovered from the 9 disks I formatted on Saturday.

cheers
jc
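Given the earlier object loss, it seems prudent to verify that every placement group is fully replicated before the next disk goes down - a short check list (standard commands, nothing cluster-specific assumed):

# the cluster should be HEALTH_OK, with no degraded or unfound objects
ceph health detail
# list any PGs that are stuck unclean
ceph pg dump_stuck unclean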
Re: [ceph-users] Best way to reformat OSD drives?
Hi Jens,

On 2013-09-02 19:37, Jens-Christian Fischer wrote:
> we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally
> formatted the OSDs with btrfs but have had numerous problems (server
> kernel panics) that we could trace back to btrfs. We are therefore in
> the process of reformatting our OSDs to XFS. We have a process that
> works, but I was wondering if there is a simpler / faster way.
>
> Currently we 'ceph osd out' all drives of a server and wait for the
> data to migrate away, then delete the OSD, recreate it and start the
> OSD processes again. This takes at least 1-2 days per server (mostly
> waiting for the data to migrate back and forth).

The first thing I'd try is doing one osd at a time, rather than the entire server; in theory, this should allow for (as opposed to definitely make it happen) data to move from one osd to the other, rather than having to push it across the network from other nodes.

depending on just how much data you have on an individual osd, you could stop two, blow the first away, copy the data from osd 2 to the disk osd 1 was using, change the mount-points, then bring osd 2 back up again; in theory, osd 2 will only need to resync changes that have occurred while it was offline. This, of course, presumes that there's no change in the on-disk layout between btrfs and xfs...

--
Martin Rudat
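A rough sketch of that copy procedure, purely to illustrate the idea - the device names, ids and mount point here are made up, and Jens-Christian's reply points to an older discussion cautioning against this approach:

--- cut ---
# osd.1's disk (/dev/sdb1 here) gets reformatted; osd.2 donates a copy
service ceph -a stop osd.1
service ceph -a stop osd.2

mkfs.xfs -f -i size=2048 /dev/sdb1 -L osd.2
mount -o inode64,noatime /dev/sdb1 /mnt/newdisk
# -X keeps the xattrs the OSD relies on
rsync -aX /var/lib/ceph/osd/ceph-2/ /mnt/newdisk/

# swap mounts so osd.2 now runs from the freshly formatted disk; it
# should only need to resync what changed while it was offline
umount /var/lib/ceph/osd/ceph-2
umount /mnt/newdisk
mount -o inode64,noatime /dev/sdb1 /var/lib/ceph/osd/ceph-2
service ceph -a start osd.2
--- cut ---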
Re: [ceph-users] Best way to reformat OSD drives?
On 02.09.2013 11:37, Jens-Christian Fischer wrote:
> we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally
> formatted the OSDs with btrfs but have had numerous problems (server
> kernel panics) that we could trace back to btrfs. We are therefore in
> the process of reformatting our OSDs to XFS. We have a process that
> works, but I was wondering if there is a simpler / faster way.
>
> Currently we 'ceph osd out' all drives of a server and wait for the
> data to migrate away, then delete the OSD, recreate it and start the
> OSD processes again. This takes at least 1-2 days per server (mostly
> waiting for the data to migrate back and forth).

Why wait for the data to migrate away? Normally you have replicas of the whole osd data, so you can simply stop the osd, reformat the disk and restart it again. It'll join the cluster and automatically get all the data it's missing. Of course the risk of data loss is a bit higher during that time, but normally that should be ok, because it's not different from an ordinary disk failure, which can happen at any time.

I just found a similar question from one year ago:
http://www.spinics.net/lists/ceph-devel/msg05915.html
I didn't read the whole thread, but probably you can find some other ideas there.

service ceph osd stop $OSD
mkfs -t xfs /dev/XXX
ceph-osd -i $OSD --mkfs --mkkey --mkjournal
service ceph osd start $OSD

Corin
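The four-liner glosses over a couple of steps; spelled out (and close to what Jens-Christian reports running further up the thread), it might look like this - device and id are placeholders:

service ceph stop osd.$OSD
umount /var/lib/ceph/osd/ceph-$OSD
mkfs.xfs -f -i size=2048 /dev/XXX -L osd.$OSD
mount -o inode64,noatime /dev/XXX /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --mkjournal
# --mkkey generates a fresh key, so it has to be registered again
ceph auth add osd.$OSD osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-$OSD/keyring
service ceph start osd.$OSD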
[ceph-users] Best way to reformat OSD drives?
Hi all

we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally formatted the OSDs with btrfs but have had numerous problems (server kernel panics) that we could trace back to btrfs. We are therefore in the process of reformatting our OSDs to XFS. We have a process that works, but I was wondering if there is a simpler / faster way.

Currently we 'ceph osd out' all drives of a server and wait for the data to migrate away, then delete the OSD, recreate it and start the OSD processes again. This takes at least 1-2 days per server (mostly waiting for the data to migrate back and forth).

Here's the script we are using:

--- cut ---
#! /bin/bash

OSD=$1
PART=$2
HOST=$3

echo "changing partition ${PART}1 to XFS for OSD: $OSD on host: $HOST"
read -p "continue or CTRL-C"

service ceph -a stop osd.$OSD
ceph osd crush remove osd.$OSD
ceph auth del osd.$OSD
ceph osd rm $OSD
ceph osd create # this should give you back the same osd number as the one you just removed

umount ${PART}1
parted $PART rm 1                    # remove the old partition
parted $PART mkpart primary 0% 100%  # and create a new one

mkfs.xfs -f -i size=2048 ${PART}1 -L osd.$OSD
mount -o inode64,noatime ${PART}1 /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --mkjournal
ceph auth add osd.$OSD osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSD}/keyring
ceph osd crush set $OSD 1 root=default host=$HOST
service ceph -a start osd.$OSD
--- cut ---

cheers
Jens-Christian

--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch
http://www.switch.ch/socialmedia
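For reference, an invocation of the script would look something like this (the script name, osd id, device and hostname are all hypothetical):

./reformat-osd.sh 12 /dev/sdc ceph-node-3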