Re: [ceph-users] Ceph User Teething Problems
Hi David,

I also see only the RBD pool getting created by default in 0.93. With regard to resizing placement groups, I believe you can use:

ceph osd pool set [pool name] pg_num [value]
ceph osd pool set [pool name] pgp_num [value]

Be forewarned, this will trigger data migration.

Cheers,
Lincoln

On Mar 4, 2015, at 2:27 PM, Datatone Lists wrote:

> I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers. I am now trying ceph 0.93 and kernel 4.0-rc1.
>
> Q1) Is it still considered that btrfs is not robust enough, and that xfs should be used instead? [I am trying with btrfs].
>
> I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several osds running and apparently working. The instructions fizzle out without explaining how to set up the mds. I went back to mkcephfs and got things set up that way. The mds starts. [Please don't mention ceph-deploy]
>
> The first thing that I noticed is that (whether I set up the mon and osds by following the manual deployment, or using mkcephfs), the correct default pools were not created.
>
> bash-4.3# ceph osd lspools
> 0 rbd,
> bash-4.3#
>
> I get only 'rbd' created automatically. I deleted this pool, and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg-num in order to avoid the 'too many pgs for osd' warning. I have three osds running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg-num for each pool to a number that makes (N-pools x pg-num)/N-osds come to the right kind of number. This implies that I can't really expand a set of pools by adding osds at a later date.
>
> Q2) Is there any obvious reason why my default pools are not getting created automatically as expected?
>
> Q3) Can pg-num be modified for a pool later? (If the number of osds is increased dramatically.)
>
> Finally, when I try to mount cephfs, I get a mount 5 error. A mount 5 error typically occurs if an MDS server is laggy or if it crashed. Ensure at least one MDS is up and running, and the cluster is active + healthy. My mds is running, but its log is not terribly active:
>
> 2015-03-04 17:47:43.177349 7f42da2c47c0 0 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
> 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors {default=true}
>
> (This is all there is in the log.) I think that a key indicator of the problem must be this from the monitor log:
>
> 2015-03-04 16:53:20.715132 7f3cd0014700 1 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.? [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem disabled
>
> (I have obscured parts of my IP address.)
>
> Q4) Can you give me an idea of what is wrong that causes the mds to not play properly?
>
> I think that there are some typos on the manual deployment pages, for example:
>
> ceph-osd id={osd-num}
>
> This is not right. As far as I am aware it should be:
>
> ceph-osd -i {osd-num}
>
> An observation. In principle, setting things up manually is not all that complicated, provided that clear and unambiguous instructions are provided. This simple piece of documentation is very important. My view is that the existing manual deployment instructions get a bit confused and confusing when they reach the osd setup, and the mds setup is completely absent.
> For someone who knows, reviewing and revising this part of the documentation would be a fairly simple and fairly quick operation. I suspect that this part suffers from being really obvious stuff to the well initiated. For those of us closer to the start, it forms the ends of the threads that have to be picked up before the journey can be made.
>
> Very best regards,
> David
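To make the commands in this exchange concrete: David's manual re-creation of the pools boils down to something like the sketch below, and Lincoln's resize suggestion follows it. The pool names match the thread; the PG counts are placeholders, derived from the common rule of thumb of roughly (100 x number of OSDs) / replica count placement groups in total, which for three OSDs with three-way replication is about 100, or about 32 per pool across three pools:

ceph osd pool create data 32 32
ceph osd pool create metadata 32 32
ceph osd pool create rbd 32 32

(The second argument is pgp_num; if it is omitted it defaults to pg_num.) Resizing later, per Lincoln, with 64 as an arbitrary example target:

ceph osd pool set data pg_num 64
ceph osd pool set data pgp_num 64
ceph -s    # expect a period of data migration; let it settle before further changes

pg_num is raised first and pgp_num is then brought up to match it; as Robert LeBlanc notes further down the thread, these values can be increased but not decreased.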
Re: [ceph-users] Ceph User Teething Problems
Thank you all for such wonderful feedback. Thank you to John Spray for putting me on the right track. I now see that the cephfs aspect of the project is being de-emphasised, so the manual deployment instructions describe how to set up the object store, and cephfs is a separate matter that needs to be explicitly set up and configured in its own right. That explains why the cephfs pools are not created by default, and why the required cephfs pools are now referred to not as 'data' and 'metadata', but as 'cephfs_data' and 'cephfs_metadata'. I have created these pools and created a new cephfs filesystem, and I can mount it without problem.

This confirms my suspicion that the manual deployment pages are in need of review and revision. They still refer to three default pools. I am happy for this section to deal with the object store setup only, but I still think that the osd part is a bit confused and confusing, particularly with respect to what is done on which machine. It would then be useful to close with something like: "This completes the configuration of the basic store. If you wish to use cephfs, you must set up a metadata server, appropriate pools, and a cephfs filesystem. (See http://...)"

I was not trying to be smart or obscure when I made a brief and apparently dismissive reference to ceph-deploy. I railed against it and the demise of mkcephfs on this list at the point when mkcephfs was discontinued in the releases. That drew a few supportive responses at the time, so I know that I'm not alone. I did not wish to trawl over those arguments again unnecessarily.

There is a principle that is being missed. The 'ceph' code contains everything required to set up and operate a ceph cluster. There should be documentation detailing how this is done. 'Ceph-deploy' is a separate thing. It is one of several tools that promise to make setting things up easy. However, my resistance is based on two factors. If I recall correctly, it is one of those projects in which the configuration needs to know which 'distribution' is being used (presumably to try to deduce where various things are located). So if one is not using one of these 'distributions', one is stuffed right from the start. Secondly, the challenge that we are trying to overcome is learning what the various ceph components need, and how they need to be set up and configured. I don't think that the "don't worry your pretty little head about that, we have a natty tool to do it for you" approach is particularly useful. So I am not knocking ceph-deploy, Travis; it is just that I do not believe that it is relevant or useful to me at this point in time. I see that Lionel Bouton seems to share my views here.

In general, the ceph documentation (in my humble opinion) needs to be drafted with a keen eye on the required scope. Deal with ceph; don't let it get contaminated with 'ceph-deploy', 'upstart', 'systemd', or anything else that is not actually part of ceph. As an example, once you have configured your osd, you start it with:

ceph-osd -i {osd-number}

It is as simple as that! If it is required to start the osd automatically, then that will be done using sysvinit, upstart, systemd, or whatever else is being used to bring the system up in the first place. It is unnecessary and confusing to try to second-guess the environment in which ceph may be being used, and to contaminate the documentation with such details.
(Having said that, I see no problem with adding separate, helpful sections such as "Suggestions for starting using 'upstart'" or "Suggestions for starting using 'systemd'".)

So I would reiterate the point that the really important documentation is probably quite simple for an expert to produce. Just spell out what each component needs in terms of keys, access to keys, files, and so on. Spell out how to set everything up, and also how to change things after the event, so that 'trial and error' does not have to involve really expensive errors. Once we understand the fundamentals, getting fancy and efficient is a completely separate further goal, and is not really a responsibility of core ceph development.

I have an inexplicable emotional desire to see ceph working well with btrfs, which I like very much and have been using since the very early days. Despite all the 'not ready for production' warnings, I adopted it with enthusiasm, and have never had cause to regret it; only once or twice have I experienced a failure that was painful to me. However, as I have experimented with ceph over the years, it has been very clear that ceph seems to be the most ruthless stress test for it, and it has always broken quite quickly (I also used xfs for comparison). I have seen evidence of much work going into btrfs in kernel development now that the lead developer has moved from Oracle to, I think, Facebook. I now share the view that I think Robert LeBlanc has, that maybe btrfs will now stand the ceph test. Thanks, Lincoln Bryant, for confirming that only the rbd pool is created by default, and for the pointer on resizing pg_num and pgp_num.
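To make the cephfs setup David describes at the top of this message concrete, a minimal sketch follows; the PG counts, the filesystem name 'cephfs', the monitor address and the mount point are illustrative placeholders, not values taken from this thread:

ceph osd pool create cephfs_data 32
ceph osd pool create cephfs_metadata 32
ceph fs new cephfs cephfs_metadata cephfs_data
ceph mds stat                                           # should show the MDS becoming active
mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secret=<admin-key>

Note that ceph fs new takes the metadata pool first and the data pool second.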
Re: [ceph-users] Ceph User Teething Problems
David,

You will need to up the limit of open files in the Linux system; check /etc/security/limits.conf. It is explained somewhere in the docs, and the autostart scripts 'fix' the issue for most people. When I did a manual deploy for the same reasons you are doing one, I ran into this too.

Robert LeBlanc

Sent from a mobile device; please excuse any typos.

On Mar 5, 2015 3:14 AM, Datatone Lists li...@datatone.co.uk wrote:

> Thank you all for such wonderful feedback. Thank you to John Spray for putting me on the right track. [...]
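To put Robert's open-files advice in concrete terms, the relevant entries look roughly like this; the user field and the value are illustrative assumptions rather than figures from this thread, so size them to your own OSD count:

# /etc/security/limits.conf
*    soft    nofile    65536
*    hard    nofile    65536

There is also a 'max open files' option in ceph.conf which, if set, the Ceph daemons apply to themselves; check the documentation for the release you are running.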
Re: [ceph-users] Ceph User Teething Problems
On 04/03/2015 20:27, Datatone Lists wrote:

> I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers. I am now trying ceph 0.93 and kernel 4.0-rc1.
>
> Q1) Is it still considered that btrfs is not robust enough, and that xfs should be used instead? [I am trying with btrfs].

XFS is still the recommended default backend (http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/#filesystems).

> I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several osds running and apparently working. The instructions fizzle out without explaining how to set up the mds. I went back to mkcephfs and got things set up that way. The mds starts. [Please don't mention ceph-deploy]

This kind of comment isn't very helpful unless there is a specific issue with ceph-deploy that is preventing you from using it and causing you to resort to manual steps. I happen to find ceph-deploy very useful, so I'm afraid I'm going to mention it anyway :-)

> The first thing that I noticed is that (whether I set up the mon and osds by following the manual deployment, or using mkcephfs), the correct default pools were not created.

This is not a bug. The 'data' and 'metadata' pools are no longer created by default. See http://docs.ceph.com/docs/master/cephfs/createfs/

> I get only 'rbd' created automatically. I deleted this pool, and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg-num in order to avoid the 'too many pgs for osd' warning. I have three osds running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg-num for each pool to a number that makes (N-pools x pg-num)/N-osds come to the right kind of number. This implies that I can't really expand a set of pools by adding osds at a later date.

You should pick an appropriate number of PGs for the number of OSDs you have at the present time. When you add more OSDs, you can increase the number of PGs. You would not want to create the larger number of PGs initially, as they could exceed the resources available on your initial small number of OSDs.

> Q4) Can you give me an idea of what is wrong that causes the mds to not play properly?

You have to explicitly enable the filesystem now (see, again, http://docs.ceph.com/docs/master/cephfs/createfs/).

> I think that there are some typos on the manual deployment pages, for example:
>
> ceph-osd id={osd-num}
>
> This is not right. As far as I am aware it should be:
>
> ceph-osd -i {osd-num}

ceph-osd id={osd-num} is an upstart invocation (i.e. it is prefaced with sudo start on the manual deployment page). In that context it is correct afaik, unless you're finding otherwise?

John
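To spell out the distinction John is drawing, and assuming his description of the manual deployment page is accurate, the two invocations look like this; the osd id 0 is a placeholder:

ceph-osd -i 0              # run the daemon directly, as David does
sudo start ceph-osd id=0   # upstart job invocation, as written on the page

Both forms start the same daemon; the upstart form only works on systems where the Ceph upstart jobs are installed.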
Re: [ceph-users] Ceph User Teething Problems
I can't help much on the MDS front, but here are some answers and my view on some of it.

On Wed, Mar 4, 2015 at 1:27 PM, Datatone Lists li...@datatone.co.uk wrote:

> I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers. I am now trying ceph 0.93 and kernel 4.0-rc1.
>
> Q1) Is it still considered that btrfs is not robust enough, and that xfs should be used instead? [I am trying with btrfs].

We are moving forward with btrfs on our production cluster, aware that there may be performance issues. So far, it seems the later kernels have resolved the issues we've seen with snapshots. As the system grows we will keep an eye on it and are prepared to move to XFS if needed.

> I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several osds running and apparently working. The instructions fizzle out without explaining how to set up the mds. I went back to mkcephfs and got things set up that way. The mds starts. [Please don't mention ceph-deploy]
>
> The first thing that I noticed is that (whether I set up the mon and osds by following the manual deployment, or using mkcephfs), the correct default pools were not created.
>
> bash-4.3# ceph osd lspools
> 0 rbd,
> bash-4.3#
>
> I get only 'rbd' created automatically. I deleted this pool, and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg-num in order to avoid the 'too many pgs for osd' warning. I have three osds running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg-num for each pool to a number that makes (N-pools x pg-num)/N-osds come to the right kind of number. This implies that I can't really expand a set of pools by adding osds at a later date.
>
> Q2) Is there any obvious reason why my default pools are not getting created automatically as expected?

Since Giant, these pools are not automatically created; only the rbd pool is.

> Q3) Can pg-num be modified for a pool later? (If the number of osds is increased dramatically.)

pg_num and pgp_num can be increased (not decreased) on the fly later to expand with more OSDs.

> [...]
>
> Q4) Can you give me an idea of what is wrong that causes the mds to not play properly?
>
> I think that there are some typos on the manual deployment pages, for example:
>
> ceph-osd id={osd-num}
>
> This is not right. As far as I am aware it should be:
>
> ceph-osd -i {osd-num}

There are a few of these; usually running --help for the command gives you the right syntax needed for the version you have installed. But it is still very confusing.
> [...]
>
> Very best regards,
> David
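As a quick way to act on Robert's tip above about checking syntax, the daemon binaries and the ceph CLI print their usage when asked; a small example (output details vary by release):

ceph --help        # lists the CLI subcommands
ceph-osd --help
ceph-mon --help
ceph-mds --help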
Re: [ceph-users] Ceph User Teething Problems
On Wed, Mar 4, 2015 at 4:43 PM, Lionel Bouton lionel-subscript...@bouton.name wrote:

> On 03/04/15 22:18, John Spray wrote:
>> On 04/03/2015 20:27, Datatone Lists wrote:
>>> [...]
>>> [Please don't mention ceph-deploy]
>> This kind of comment isn't very helpful unless there is a specific issue with ceph-deploy that is preventing you from using it, and causing you to resort to manual steps.

As a new maintainer of ceph-deploy, I'm happy to hear all gripes. :)

> ceph-deploy is a subject I never took the time to give feedback on. We can't use it (we use Gentoo, which isn't supported by ceph-deploy) and even if we could I probably wouldn't allow it: I believe that for important pieces of infrastructure like Ceph you have to understand its inner workings to the point where you can hack your way out when problems arise and build tools to integrate them better with your environment (you can understand one of the reasons why we use Gentoo in production alongside other distributions...). I believe using ceph-deploy makes it more difficult to acquire the knowledge to do so.
>
> For example, we have a script to replace a defective OSD (destroying an existing one and replacing it with a new one) which locks data in place as long as we can, so that crush map changes do not trigger movements until the map reaches its original state again; this minimizes the total amount of data copied around. It might have been possible to achieve this with ceph-deploy, but I doubt we would have achieved it as easily (from understanding the causes of data movements, through understanding the osd identifier allocation process, to implementing the script) if we hadn't created OSDs by hand repeatedly before scripting some of the processes.

Thanks for this feedback. I share a lot of your sentiments, especially that it is good to understand as much of the system as you can. Everyone's skill level and use-case is different, and ceph-deploy is targeted more towards PoC use-cases. It tries to make things as easy as possible, but that necessarily abstracts most of the details away.

> Last time I searched for documentation on manual configuration it was much harder to find (mds manual configuration was indeed something I didn't find at all, either).
>
> Best regards,
> Lionel
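Lionel's replacement script is not shown in the thread, so the following is only a guess at the flavour of the "lock data in place" step he mentions; Ceph does provide cluster-wide flags for exactly this kind of maintenance, though whether and how his script uses them is an assumption on my part:

ceph osd set noout        # stop down OSDs being marked out, so CRUSH placement is preserved
ceph osd set nobackfill   # defer backfill while the disk is being swapped
# replace the failed OSD, ideally reusing its osd id so the CRUSH map returns to its original state
ceph osd unset nobackfill
ceph osd unset noout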
Re: [ceph-users] Ceph User Teething Problems
On 03/04/15 22:50, Travis Rhoden wrote:

> [...]
>
> Thanks for this feedback. I share a lot of your sentiments, especially that it is good to understand as much of the system as you can. Everyone's skill level and use-case is different, and ceph-deploy is targeted more towards PoC use-cases. It tries to make things as easy as possible, but that necessarily abstracts most of the details away.

To follow up on this subject: assuming ceph-deploy worked with Gentoo, one feature which would make it really useful to us would be for it to dump each and every one of the commands it uses, so that they might be replicated manually. Documentation might be inaccurate or hard to browse for various reasons, but a tool which achieves its purpose can't be wrong about the commands it uses (assuming it simply calls standard command-line tools and not some API over a socket...). There might be a way to do this already (it seems like something you would want at least when developing the tool), but obviously I didn't check.

Lionel