[ceph-users] Replacing a failed OSD disk drive (or replace XFS with BTRFS)
I have been experimenting with Ceph, and have some OSDs whose drives contain XFS filesystems that I want to change to BTRFS. (I started with BTRFS, then started again from scratch with XFS [currently recommended] in order to eliminate it as a potential cause of some issues; now, with further experience, I want to go back to BTRFS, but I have data in my cluster and I don't want to scrap it.) This is exactly equivalent to the case in which an OSD's drive starts to show errors: I would need to replace the drive and recreate the Ceph structures on it.

So, I mark the OSD out, and the cluster discards its notion of the data stored on that OSD and creates copies of the affected PGs elsewhere to make the cluster healthy again. All of the disk replacement instructions that I have seen then tell me to follow an OSD removal process: this removes the OSD from the CRUSH map, removes its authentication key, removes the OSD from the OSD map, and removes the OSD from the ceph.conf file.

This seems too heavy-handed to me. I'm worried about doing this and then adding a new OSD that ends up with the same id number as the OSD I apparently unnecessarily removed. I don't actually want to remove the OSD. The OSD is fine; I just want to replace the disk drive that it uses.

This suggests that what I really want is to take the OSD out, allow the cluster to get healthy again, then (replace the disk, if this is due to a failure,) create a new BTRFS/XFS filesystem, remount the drive, and recreate the Ceph structures on the disk so that they are compatible with the original OSD it was attached to. The OSD then gets marked back in, and the cluster says "hello again, we missed you, but it's good to see you back, here are some PGs".

What I'm saying is that I really don't want to destroy the OSD; I want to refresh it with a new disk/filesystem and put it back to work. Is there some fundamental reason why this can't be done? If not, how should I do it?

Best regards,
David
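To make the intent concrete, the procedure I have in mind looks roughly like the sketch below. This is only a sketch of the idea (assuming osd.2 with its data on /dev/sdb at the default mount point, and assuming the existing CRUSH entry and authentication key stay in place); I have not verified that the monitors are happy with a recreated data directory, and the recorded OSD uuid may also need to be carried over:

    # let the cluster re-replicate, then stop the daemon
    ceph osd out 2
    /etc/init.d/ceph stop osd.2

    # replace the drive if it has failed, then build the new filesystem
    mkfs.btrfs /dev/sdb
    mount -o noatime /dev/sdb /var/lib/ceph/osd/ceph-2

    # recreate the Ceph structures for the existing OSD id
    ceph-osd -i 2 --mkfs --mkjournal

    # put the existing authentication key back into the new data directory
    ceph auth get osd.2 -o /var/lib/ceph/osd/ceph-2/keyring

    # start the daemon and mark the OSD back in
    /etc/init.d/ceph start osd.2
    ceph osd in 2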
Re: [ceph-users] Ceph User Teething Problems
that I can increase the size of pools in line with increasing OSD numbers. I felt that this had to be the case, otherwise the 'scalable' claim becomes a bit limited.

Returning from these digressions to my own experience: I set up my cephfs filesystem as illuminated by John Spray. I mounted it and started to rsync a multi-terabyte filesystem to it. This is my test; if cephfs handles it without grinding to a snail's pace or failing, I will be ready to start committing my data to it.

My OSD disk lights started to flash and flicker, and a comforting sound of drive activity issued forth. I checked the OSD logs and, to my dismay, there were crash reports in them all. However, a closer look revealed that I am getting 'too many open files' messages preceding the failures, so I can see that this is not an OSD failure but a resource limit issue. I completely acknowledge that I must now RTFM, but I will ask whether anybody can give any guidance, based on experience, with respect to this issue.
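My first guess, which I have not yet tried, is that I am hitting the per-process file descriptor limit: the OSDs keep a lot of object files and sockets open at once. As I understand it, there is a 'max open files' option in ceph.conf that the init script applies when it starts the daemons (the value below is an arbitrary example, and the daemons need restarting for it to take effect):

    [global]
            ; raise the per-daemon file descriptor limit,
            ; applied by the init script at daemon start
            max open files = 131072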
Thank you again to all for the previous prompt and invaluable advice and information.

David

On Wed, 4 Mar 2015 20:27:51 + Datatone Lists li...@datatone.co.uk wrote:

I have been following Ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and Ceph reaches higher version numbers. I am now trying Ceph 0.93 and kernel 4.0-rc1.

Q1) Is it still considered that btrfs is not robust enough, and that XFS should be used instead? [I am trying with btrfs.]

I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several OSDs running and apparently working. The instructions fizzle out without explaining how to set up the MDS. I went back to mkcephfs and got things set up that way. The MDS starts. [Please don't mention ceph-deploy.]

The first thing that I noticed is that (whether I set up the mon and OSDs by following the manual deployment, or by using mkcephfs), the expected default pools were not created:

bash-4.3# ceph osd lspools
0 rbd,
bash-4.3#

I get only 'rbd' created automatically. I deleted this pool and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg_num in order to avoid the 'too many pgs for osd' warning. I have three OSDs running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg_num for each pool to a number that makes (N-pools x pg_num)/N-osds come to the right kind of number. This implies that I can't really expand a set of pools by adding OSDs at a later date.

Q2) Is there any obvious reason why my default pools are not getting created automatically as expected?

Q3) Can pg_num be modified for a pool later? (If the number of OSDs is increased dramatically.)

Finally, when I try to mount cephfs, I get a mount error 5. A mount error 5 typically occurs if an MDS server is laggy or if it crashed; ensure at least one MDS is up and running, and the cluster is active + healthy. My MDS is running, but its log is not terribly active:

2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors {default=true}

(This is all there is in the log.) I think that a key indicator of the problem must be this, from the monitor log:

2015-03-04 16:53:20.715132 7f3cd0014700  1 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.? [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem disabled

(I have obscured parts of the IP address.)

Q4) Can you give me an idea of what is wrong that causes the MDS to not play properly?

I think that there are some typos on the manual deployment pages, for example:

ceph-osd id={osd-num}

This is not right. As far as I am aware it should be:

ceph-osd -i {osd-num}

An observation: in principle, setting things up manually is not all that complicated, provided that clear and unambiguous instructions are provided. This simple piece of documentation is very important. My view is that the existing manual deployment instructions get a bit confused and confusing when they reach the OSD setup, and the MDS setup is completely absent. For someone who knows, reviewing and revising this part of the documentation would be a fairly simple and quick operation. I suspect that this part suffers from being really obvious stuff to the well initiated. For those of us closer to the start, it forms the ends of the threads that have to be picked up before the journey can be made.

Very best regards,
David
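P.S. For anyone else who lands on the 'up but filesystem disabled' message: as I now understand it (this is my reading, not a quote of the replies I received), recent releases no longer create a CephFS filesystem or its pools by default, and the MDS sits idle until one is created explicitly; that would also explain why only 'rbd' existed. The pool names and pg counts below are only examples:

    # create the data and metadata pools, then the filesystem itself
    ceph osd pool create cephfs_data 64
    ceph osd pool create cephfs_metadata 64
    ceph fs new cephfs cephfs_metadata cephfs_data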
[ceph-users] Ceph User Teething Problems
I have been following Ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and Ceph reaches higher version numbers. I am now trying Ceph 0.93 and kernel 4.0-rc1.

Q1) Is it still considered that btrfs is not robust enough, and that XFS should be used instead? [I am trying with btrfs.]

I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several OSDs running and apparently working. The instructions fizzle out without explaining how to set up the MDS. I went back to mkcephfs and got things set up that way. The MDS starts. [Please don't mention ceph-deploy.]

The first thing that I noticed is that (whether I set up the mon and OSDs by following the manual deployment, or by using mkcephfs), the expected default pools were not created:

bash-4.3# ceph osd lspools
0 rbd,
bash-4.3#

I get only 'rbd' created automatically. I deleted this pool and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg_num in order to avoid the 'too many pgs for osd' warning. I have three OSDs running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg_num for each pool to a number that makes (N-pools x pg_num)/N-osds come to the right kind of number. This implies that I can't really expand a set of pools by adding OSDs at a later date.

Q2) Is there any obvious reason why my default pools are not getting created automatically as expected?

Q3) Can pg_num be modified for a pool later? (If the number of OSDs is increased dramatically.)

Finally, when I try to mount cephfs, I get a mount error 5. A mount error 5 typically occurs if an MDS server is laggy or if it crashed; ensure at least one MDS is up and running, and the cluster is active + healthy. My MDS is running, but its log is not terribly active:

2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors {default=true}

(This is all there is in the log.) I think that a key indicator of the problem must be this, from the monitor log:

2015-03-04 16:53:20.715132 7f3cd0014700  1 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.? [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem disabled

(I have obscured parts of the IP address.)

Q4) Can you give me an idea of what is wrong that causes the MDS to not play properly?

I think that there are some typos on the manual deployment pages, for example:

ceph-osd id={osd-num}

This is not right. As far as I am aware it should be:

ceph-osd -i {osd-num}

An observation: in principle, setting things up manually is not all that complicated, provided that clear and unambiguous instructions are provided. This simple piece of documentation is very important. My view is that the existing manual deployment instructions get a bit confused and confusing when they reach the OSD setup, and the MDS setup is completely absent. For someone who knows, reviewing and revising this part of the documentation would be a fairly simple and quick operation. I suspect that this part suffers from being really obvious stuff to the well initiated. For those of us closer to the start, it forms the ends of the threads that have to be picked up before the journey can be made.
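For reference, the per-OSD sequence I pieced together from that page (with the corrected flag) was roughly the following; the id, weight and host name are just placeholders of mine, and I may well have missed a step:

    # allocate an id, then build the data directory and key
    # (the OSD's filesystem is assumed to be already mounted at the data directory)
    ceph osd create
    mkdir -p /var/lib/ceph/osd/ceph-0
    ceph-osd -i 0 --mkfs --mkkey

    # register the key and add the OSD to the CRUSH map (the host bucket must already exist)
    ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
    ceph osd crush add osd.0 1.0 host=ceph-osd-00

    # start the daemon with the corrected flag
    ceph-osd -i 0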
Very best regards,
David
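On Q3 above, for what it's worth: pg_num on an existing pool can be raised later (as far as I know it cannot be reduced), and pgp_num then has to be raised to the same value before the cluster actually starts placing data into the new groups. The pool name and target count below are only illustrative:

    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256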