Re: [ceph-users] Best way to reformat OSD drives?

2013-09-03 Thread Jens-Christian Fischer

On 03.09.2013, at 16:27, Sage Weil wrote:

>> ceph osd create # this should give you back the same osd number as the one
>> you just removed
> 
> OSD=`ceph osd create` # may or may not be the same osd id

good point - so far it has been good to us!
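
For the record, an untested sketch (same variable names as in my script) that 
would make the later steps use whatever id actually comes back:

OSD=$(ceph osd create)                 # may differ from the id just removed
mkfs.xfs -f -i size=2048 -L osd.$OSD ${PART}1
mkdir -p /var/lib/ceph/osd/ceph-$OSD   # the mount point may not exist yet for a new id
mount -o inode64,noatime ${PART}1 /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --mkjournal

(and likewise osd.$OSD / $OSD in the auth, crush and start steps)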

> 
>> 
>> umount ${PART}1
>> parted $PART rm 1 # remove the old partition
>> parted $PART mkpart primary 0% 100%  # create a new one spanning the whole disk
> 
> I don't think the partition removal/add step is needed.

it isn't - I'm still learning the ropes :)


> 
> Otherwise it looks fine!

ok - I have tried a simplified version (that doesn't take the OSD out) and 
just "simulates" a disk failure (i.e. it stops the OSD, reformats the drive, 
recreates the OSD structure and starts the process again). This seems to 
work, but rebuilding the disk is really slow (we see write speeds of 4-20 
MB/s, and it takes ages to refill around 100 GB of data).
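
(Back of the envelope: at 4-20 MB/s, refilling ~100 GB works out to somewhere 
between roughly 1.5 and 7 hours per disk.)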

I don't dare to run this on multiple OSDs at the same time for fear of losing 
data, so the "slower/longer" process of first marking all OSDs of a server as 
out, waiting for them to empty, then batch-formatting all OSDs on the server 
and waiting for the cluster to be stable again might be faster in the end.
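
Something along these lines for the per-server variant (untested, osd ids just 
for illustration, as listed by 'ceph osd tree' for that host):

for id in 10 11 12 13 14 15; do   # the osds living on this server
    ceph osd out $id
done
ceph -w                           # watch until all PGs are active+clean again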

cheers
jc


Re: [ceph-users] Best way to reformat OSD drives?

2013-09-03 Thread Sage Weil
On Mon, 2 Sep 2013, Jens-Christian Fischer wrote:
> Hi all
> we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally
> formatted the OSDs with btrfs but have had numerous problems (server kernel
> panics) that we could point back to btrfs. We are therefore in the process
> of reformatting our OSDs to XFS. We have a process that works, but I was
> wondering, if there is a simpler / faster way.
> 
> Currently we 'ceph osd out' all drives of a server and wait for the data to
> migrate away, then delete the OSD, recreate it and start the OSD processes
> again. This takes at least 1-2 days per server (mostly waiting for the data
> to migrate back and forth)
> 
> Here's the script we are using:
> 
> --- cut ---
> #! /bin/bash
> 
> OSD=$1
> PART=$2
> HOST=$3
> echo "changing partition ${PART}1 to XFS for OSD: $OSD on host: $HOST"
> read -p "continue or CTRL-C"
> 
> 
> service ceph -a stop osd.$OSD
> ceph osd crush remove osd.$OSD
> ceph auth del osd.$OSD
> ceph osd rm $OSD
> ceph osd create # this should give you back the same osd number as the one
> you just removed

OSD=`ceph osd create` # may or may not be the same osd id

> 
> umount ${PART}1
> parted $PART rm 1 # remove the old partition
> parted $PART mkpart primary 0% 100%  # create a new one spanning the whole disk

I don't think the partition removal/add step is needed.

> mkfs.xfs -f -i size=2048 ${PART}1 -L osd.$OSD
> mount -o inode64,noatime ${PART}1 /var/lib/ceph/osd/ceph-$OSD
> ceph-osd -i $OSD --mkfs --mkkey --mkjournal
> ceph auth add osd.$OSD osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSD}/keyring
> ceph osd crush set $OSD 1 root=default host=$HOST
> service ceph -a start osd.$OSD

Otherwise it looks fine!

sage


> 
> --- cut ---
> 
> cheers
> Jens-Christian
> 
> -- 
> SWITCH
> Jens-Christian Fischer, Peta Solutions
> Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
> phone +41 44 268 15 15, direct +41 44 268 15 71
> jens-christian.fisc...@switch.ch
> http://www.switch.ch
> 
> http://www.switch.ch/socialmedia
> 
> 


Re: [ceph-users] Best way to reformat OSD drives?

2013-09-03 Thread Jens-Christian Fischer
> Why wait for the data to migrate away? Normally you have replicas of the 
> whole osd data, so you can simply stop the osd, reformat the disk and restart 
> it again. It'll join the cluster and automatically get all the data it's missing. 
> Of course the risk of data loss is a bit higher during that time, but normally 
> that should be ok, because it's no different from an ordinary disk failure, 
> which can happen at any time.
> 
> I just found a similar question from one year ago: 
> http://www.spinics.net/lists/ceph-devel/msg05915.html I didn't read the whole 
> thread, but probably you can find some other ideas there.
> 
> service ceph stop osd.$OSD
> mkfs -t xfs /dev/XXX
> ceph-osd -i $OSD --mkfs --mkkey --mkjournal
> service ceph start osd.$OSD


this is what I did now:

ceph osd set noout
service ceph stop osd.X
umount /dev/sdX1
mkfs.xfs -f -i size=2048 /dev/sdX1 -L osd.X
vim /etc/fstab # edit line for /dev/sdX1
mount /dev/sdX1
ceph-osd -i X --mkfs --mkkey --mkjournal
ceph auth add osd.X osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-X/keyring
service ceph start osd.X

Seems to work so far - the OSD is busy retrieving data, and I didn't have to 
wait for it to become empty first.
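
(Note to self, assuming the noout flag set above: once the OSD is back up and 
in, clear it again with

ceph osd unset noout

so that real disk failures are handled normally again.)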

cheers
jc


Re: [ceph-users] Best way to reformat OSD drives?

2013-09-02 Thread Jens-Christian Fischer
Hi Martin

> On 2013-09-02 19:37, Jens-Christian Fischer wrote:
>> we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally 
>> formatted the OSDs with btrfs but have had numerous problems (server kernel 
>> panics) that we could point back to btrfs. We are therefore in the process 
>> of reformatting our OSDs to XFS. We have a process that works, but I was 
>> wondering, if there is a simpler / faster way.
>> 
>> Currently we 'ceph osd out' all drives of a server and wait for the data to 
>> migrate away, then delete the OSD, recreate it and start the OSD processes 
>> again. This takes at least 1-2 days per server (mostly waiting for the data 
>> to migrate back and forth)
>> 
> 
> The first thing I'd try is doing one osd at a time, rather than the entire 
> server; in theory, this should allow (though not guarantee) data to move 
> from one osd to the other, rather than having to push it across the network 
> from other nodes.

Doesn't that depend on the CRUSH map and the placement rules?

> 
> depending on just how much data you have on an individual osd, you could stop 
> two, blow the first away, copy the data from osd 2 to the disk osd 1 was 
> using, change the mount-points, then bring osd 2 back up again; in theory, 
> osd 2 will only need to resync changes that have occurred while it was 
> offline. This, of course, presumes that there's no change in the on-disk 
> layout between btrfs and xfs...

We were actually thinking of doing that, but I wanted to hear the wisdom of the 
crowd… The thread from a year ago (that I just read) cautioned against that 
procedure though. 

cheers
jc


Re: [ceph-users] Best way to reformat OSD drives?

2013-09-02 Thread Jens-Christian Fischer
> 
> Why wait for the data to migrate away? Normally you have replicas of the 
> whole osd data, so you can simply stop the osd, reformat the disk and restart 
> it again. It'll join the cluster and automatically get all the data it's missing. 
> Of course the risk of data loss is a bit higher during that time, but normally 
> that should be ok, because it's no different from an ordinary disk failure, 
> which can happen at any time.

Because I lost 2 objects last time I did that trick (probably caused by 
additional user (i.e. me) stupidity in the first place, but I don't really 
fancy taking chances this time :) )

> 
> I just found a similar question from one year ago: 
> http://www.spinics.net/lists/ceph-devel/msg05915.html I didn't read the whole 
> thread, but probably you can find some other ideas there.

I read it, but it is the usual to and fro - no definitive solution...

> 
> service ceph stop osd.$OSD
> mkfs -t xfs /dev/XXX
> ceph-osd -i $OSD --mkfs --mkkey --mkjournal
> service ceph start osd.$OSD

I'll give that a whirl - I have enough OSDs to try - as soon as the cluster has 
recovered from the 9 disks I formatted on Saturday.

cheers
jc



Re: [ceph-users] Best way to reformat OSD drives?

2013-09-02 Thread Martin Rudat

Hi Jens,

On 2013-09-02 19:37, Jens-Christian Fischer wrote:
we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally 
formatted the OSDs with btrfs but have had numerous problems (server 
kernel panics) that we could point back to btrfs. We are therefore in 
the process of reformatting our OSDs to XFS. We have a process that 
works, but I was wondering, if there is a simpler / faster way.


Currently we 'ceph osd out' all drives of a server and wait for the 
data to migrate away, then delete the OSD, recreate it and start the 
OSD processes again. This takes at least 1-2 days per server (mostly 
waiting for the data to migrate back and forth)




The first thing I'd try is doing one osd at a time, rather than the 
entire server; in theory, this should allow (though not guarantee) 
data to move from one osd to the other, rather than having to push it 
across the network from other nodes.


depending on just how much data you have on an individual osd, you could 
stop two, blow the first away, copy the data from osd 2 to the disk osd 
1 was using, change the mount-points, then bring osd 2 back up again; in 
theory, osd 2 will only need to resync changes that have occurred while 
it was offline. This, of course, presumes that there's no change in the 
on-disk layout between btrfs and xfs...


--
Martin Rudat




Re: [ceph-users] Best way to reformat OSD drives?

2013-09-02 Thread Corin Langosch

On 02.09.2013 11:37, Jens-Christian Fischer wrote:
we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally 
formatted the OSDs with btrfs but have had numerous problems (server kernel 
panics) that we could point back to btrfs. We are therefore in the process of 
reformatting our OSDs to XFS. We have a process that works, but I was 
wondering, if there is a simpler / faster way.


Currently we 'ceph osd out' all drives of a server and wait for the data to 
migrate away, then delete the OSD, recreate it and start the OSD processes 
again. This takes at least 1-2 days per server (mostly waiting for the data to 
migrate back and forth)





Why wait for the data to migrate away? Normally you have replicas of the whole 
osd data, so you can simply stop the osd, reformat the disk and restart it 
again. It'll join the cluster and automatically get all the data it's missing. Of 
course the risk of data loss is a bit higher during that time, but normally that 
should be ok, because it's no different from an ordinary disk failure, which can 
happen at any time.


I just found a similar question from one year ago: 
http://www.spinics.net/lists/ceph-devel/msg05915.html I didn't read the whole 
thread, but probably you can find some other ideas there.


service ceph stop osd.$OSD
mkfs -t xfs /dev/XXX
ceph-osd -i $OSD --mkfs --mkkey --mkjournal
service ceph start osd.$OSD

Corin



[ceph-users] Best way to reformat OSD drives?

2013-09-02 Thread Jens-Christian Fischer
Hi all

we have a Ceph Cluster with 64 OSD drives in 10 servers. We originally 
formatted the OSDs with btrfs but have had numerous problems (server kernel 
panics) that we could point back to btrfs. We are therefore in the process of 
reformatting our OSDs to XFS. We have a process that works, but I was 
wondering, if there is a simpler / faster way.

Currently we 'ceph osd out' all drives of a server and wait for the data to 
migrate away, then delete the OSD, recreate it and start the OSD processes 
again. This takes at least 1-2 days per server (mostly waiting for the data to 
migrate back and forth)

Here's the script we are using:

--- cut ---
#! /bin/bash

OSD=$1
PART=$2
HOST=$3
echo "changing partition ${PART}1 to XFS for OSD: $OSD on host: $HOST"
read -p "continue or CTRL-C"


service ceph -a stop osd.$OSD
ceph osd crush remove osd.$OSD
ceph auth del osd.$OSD
ceph osd rm $OSD
ceph osd create # this should give you back the same osd number as the one you 
just removed

umount ${PART}1
parted $PART rm 1 # remove the old partition
parted $PART mkpart primary 0% 100%  # create a new one spanning the whole disk
mkfs.xfs -f -i size=2048 ${PART}1 -L osd.$OSD
mount -o inode64,noatime ${PART}1 /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --mkjournal
ceph auth add osd.$OSD osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSD}/keyring
ceph osd crush set $OSD 1 root=default host=$HOST
service ceph -a start osd.$OSD

--- cut ---
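
The script is called with the osd id, the device holding its data partition and 
the host name, e.g. (script name and device made up):

./reformat-osd.sh 23 /dev/sdc ceph-node-3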

cheers
Jens-Christian

-- 
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/socialmedia

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com