Re: [ceph-users] Ceph User Teething Problems

2015-03-23 Thread Lincoln Bryant
Hi David,

I also see only the RBD pool getting created by default in 0.93.

With regards to resizing placement groups, I believe you can use:
ceph osd pool set [pool name] pg_num [new pg count]
ceph osd pool set [pool name] pgp_num [new pg count]

Be forewarned, this will trigger data migration.
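
For example (a minimal sketch with a hypothetical pool name and PG count; pick
values appropriate for your own cluster), to grow a pool called 'mypool' to 256
placement groups:

ceph osd pool set mypool pg_num 256
ceph osd pool set mypool pgp_num 256

pgp_num needs to be raised to match pg_num once the new PGs exist; until then
the new placement groups are created but data isn't actually rebalanced onto
them.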

Cheers,
Lincoln

On Mar 4, 2015, at 2:27 PM, Datatone Lists wrote:

 I have been following ceph for a long time. I have yet to put it into
 service, and I keep coming back as btrfs improves and ceph reaches
 higher version numbers.
 
 I am now trying ceph 0.93 and kernel 4.0-rc1.
 
 Q1) Is it still considered that btrfs is not robust enough, and that
 xfs should be used instead? [I am trying with btrfs].
 
 I followed the manual deployment instructions on the web site 
 (http://ceph.com/docs/master/install/manual-deployment/) and I managed
 to get a monitor and several osds running and apparently working. The
 instructions fizzle out without explaining how to set up mds. I went
 back to mkcephfs and got things set up that way. The mds starts.
 
 [Please don't mention ceph-deploy]
 
 The first thing that I noticed is that (whether I set up mon and osds
 by following the manual deployment, or using mkcephfs), the correct
 default pools were not created.
 
 bash-4.3# ceph osd lspools
 0 rbd,
 bash-4.3# 
 
 I get only 'rbd' created automatically. I deleted this pool, and
 re-created data, metadata and rbd manually. When doing this, I had to
 juggle with the pg-num in order to avoid the 'too many pgs for osd'.
 I have three osds running at the moment, but intend to add to these
 when I have some experience of things working reliably. I am puzzled,
 because I seem to have to set the pg-num for the pool to a number that
 makes (N-pools x pg-num)/N-osds come to the right kind of number. So
 this implies that I can't really expand a set of pools by adding osds
 at a later date. 
 
 Q2) Is there any obvious reason why my default pools are not getting
 created automatically as expected?
 
 Q3) Can pg-num be modified for a pool later? (If the number of osds is 
 increased dramatically).
 
 Finally, when I try to mount cephfs, I get a mount 5 error.
 
 "A mount 5 error typically occurs if a MDS server is laggy or if it
 crashed. Ensure at least one MDS is up and running, and the cluster is
 active + healthy."
 
 My mds is running, but its log is not terribly active:
 
 2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 
 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors 
 {default=true}
 
 (This is all there is in the log).
 
 I think that a key indicator of the problem must be this from the
 monitor log:
 
 2015-03-04 16:53:20.715132 7f3cd0014700  1
 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.?
 [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem
 disabled
 
 (I have added the '' sections to obscure my ip address)
 
 Q4) Can you give me an idea of what is wrong that causes the mds to not
 play properly?
 
 I think that there are some typos on the manual deployment pages, for
 example:
 
 ceph-osd id={osd-num}
 
 This is not right. As far as I am aware it should be:
 
 ceph-osd -i {osd-num}
 
 An observation. In principle, setting things up manually is not all
 that complicated, provided that clear and unambiguous instructions are
 provided. This simple piece of documentation is very important. My view
 is that the existing manual deployment instructions get a bit confused
 and confusing when it comes to the osd setup, and the mds setup is
 completely absent.
 
 For someone who knows, it would be a fairly simple and fairly quick
 operation to review and revise this part of the documentation. I
 suspect that this part suffers from being really obvious stuff to the
 well initiated. For those of us closer to the start, this forms the
 ends of the threads that have to be picked up before the journey can be
 made.
 
 Very best regards,
 David

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Teething Problems

2015-03-05 Thread Datatone Lists

Thank you all for such wonderful feedback.

Thank you to John Spray for putting me on the right track. I now see
that the cephfs aspect of the project is being de-emphasised, so that
the manual deployment instructions tell how to set up the object store,
and then the cephfs is a separate issue that needs to be explicitly set
up and configured in its own right. So that explains why the cephfs
pools are not created by default, and why the required cephfs pools are
now referred to, not as 'data' and 'metadata', but 'cephfs_data' and
'cephfs_metadata'. I have created these pools, and created a new cephfs
filesystem, and I can mount it without problem.
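
For anyone else following along, the commands involved are along these lines
(the pool names follow the createfs page; the PG counts and mount options are
only what suited my three-osd test setup, not a recommendation):

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new cephfs cephfs_metadata cephfs_data
mount -t ceph {mon-address}:6789:/ /mnt/cephfs -o name=admin,secret={admin-key}

The 'ceph fs new' step was the part I was missing; without it the mds sits idle
and the mount fails with the mount 5 error described earlier.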

This confirms my suspicion that the manual deployment pages are in need
of review and revision. They still refer to three default pools. I am
happy that this section should deal with the object store setup only,
but I still think that the osd part is a bit confused and confusing,
particularly with respect to what is done on which machine. It would
then be useful to say something like "This completes the configuration
of the basic store. If you wish to use cephfs, you must set up a
metadata server, appropriate pools, and a cephfs filesystem (see
http://...)".

I was not trying to be smart or obscure when I made a brief and
apparently dismissive reference to ceph-deploy. I railed against it and
the demise of mkcephfs on this list at the point that mkcephfs was
discontinued in the releases. That caused a few supportive responses at
the time, so I know that I'm not alone. I did not wish to trawl over
those arguments again unnecessarily.

There is a principle that is being missed. The 'ceph' code contains
everything required to set up and operate a ceph cluster. There should
be documentation detailing how this is done.

'Ceph-deploy' is a separate thing. It is one of several tools that
promise to make setting things up easy. However, my resistance is based
on two factors. If I recall correctly, it is one of those projects in
which the configuration needs to know what 'distribution' is being
used. (Presumably, this is to try to deduce where various things are
located). So if one is not using one of these 'distributions', one is
stuffed right from the start. Secondly, the challenge that we are
trying to overcome is learning what the various ceph components need,
and how they need to be set up and configured. I don't think that the
"don't worry your pretty little head about that, we have a natty tool
to do it for you" approach is particularly useful.

So I am not knocking ceph-deploy, Travis, it is just that I do not
believe that it is relevant or useful to me at this point in time.

I see that Lionel Bouton seems to share my views here.

In general, the ceph documentation (in my humble opinion) needs to be
draughted with a keen eye on the required scope. Deal with ceph; don't
let it get contaminated with 'ceph-deploy', 'upstart', 'systemd', or
anything else that is not actually part of ceph.

As an example, once you have configured your osd, you start it with:

ceph-osd -i {osd-number}

It is as simple as that! 

If it is required to start the osd automatically, then that will be
done using sysvinit, upstart, systemd, or whatever else is being used
to bring the system up in the first place. It is unnecessary and
confusing to try to second-guess the environment in which ceph may be
being used, and contaminate the documentation with such details.
(Having said that, I see no problem with adding separate, helpful,
sections such as "Suggestions for starting using 'upstart'" or
"Suggestions for starting using 'systemd'").
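
To illustrate what such a suggestion might contain (only a sketch; the binary
path and the osd id 0 here are assumptions for the sake of the example), a
minimal systemd unit needs little more than:

[Unit]
Description=Ceph object storage daemon osd.0
After=network-online.target

[Service]
# -f keeps ceph-osd in the foreground, which is what systemd expects here
ExecStart=/usr/bin/ceph-osd -f -i 0
Restart=on-failure

[Install]
WantedBy=multi-user.target

The ceph-osd invocation stays exactly the same; only the init glue around it
changes, and that glue belongs in its own optional section of the docs.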

So I would reiterate the point that the really important documentation
is probably quite simple for an expert to produce. Just spell out what
each component needs in terms of keys, access to keys, files, and so
on. Spell out how to set everything up. Also how to change things after
the event, so that 'trial and error' does not have to contain really
expensive errors. Once we understand the fundamentals, getting fancy
and efficient is a completely separate further goal, and is not really
a responsibility of core ceph development.

I have an inexplicable emotional desire to see ceph working well with
btrfs, which I like very much and have been using since the very early
days. Despite all the 'not ready for production' warnings, I adopted it
with enthusiasm, and have never had cause to regret it, and only once
or twice experienced a failure that was painful to me. However, as I
have experimented with ceph over the years, it has been very clear that
ceph seems to be the most ruthless stress test for it, and it has
always broken quite quickly (I also used xfs for comparison). I have
seen evidence of much work going into btrfs in the kernel development
now that the lead developer has moved from Oracle to, I think, Facebook.

I now share the view that I think Robert LeBlanc has, that maybe btrfs
will now stand the ceph test.

Thanks, Lincoln Bryant, for confirming 

Re: [ceph-users] Ceph User Teething Problems

2015-03-05 Thread Robert LeBlanc
David,

You will need to up the limit of open files in the Linux system. Check
/etc/security/limits.conf. It is explained somewhere in the docs, and the
autostart scripts 'fix' the issue for most people. When I did a manual
deploy, for the same reasons as you, I ran into this too.
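
Roughly something like this in /etc/security/limits.conf (the numbers are only
placeholders, tune them for your cluster):

*    soft    nofile    65536
*    hard    nofile    65536

If I remember right there is also a 'max open files' option for the [global]
section of ceph.conf; the init scripts apply it for you, which is the 'fix' I
mentioned above.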

Robert LeBlanc

Sent from a mobile device, please excuse any typos.
On Mar 5, 2015 3:14 AM, Datatone Lists li...@datatone.co.uk wrote:


Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread John Spray

On 04/03/2015 20:27, Datatone Lists wrote:

I have been following ceph for a long time. I have yet to put it into
service, and I keep coming back as btrfs improves and ceph reaches
higher version numbers.

I am now trying ceph 0.93 and kernel 4.0-rc1.

Q1) Is it still considered that btrfs is not robust enough, and that
xfs should be used instead? [I am trying with btrfs].
XFS is still the recommended default backend 
(http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/#filesystems)


I followed the manual deployment instructions on the web site
(http://ceph.com/docs/master/install/manual-deployment/) and I managed
to get a monitor and several osds running and apparently working. The
instructions fizzle out without explaining how to set up mds. I went
back to mkcephfs and got things set up that way. The mds starts.

[Please don't mention ceph-deploy]
This kind of comment isn't very helpful unless there is a specific issue 
with ceph-deploy that is preventing you from using it, and causing you 
to resort to manual steps.  I happen to find ceph-deploy very useful, so
I'm afraid I'm going to mention it anyway :-)


The first thing that I noticed is that (whether I set up mon and osds
by following the manual deployment, or using mkcephfs), the correct
default pools were not created.
This is not a bug.  The 'data' and 'metadata' pools are no longer 
created by default. http://docs.ceph.com/docs/master/cephfs/createfs/

  I get only 'rbd' created automatically. I deleted this pool, and
  re-created data, metadata and rbd manually. When doing this, I had to
  juggle with the pg-num in order to avoid the 'too many pgs for osd'.
  I have three osds running at the moment, but intend to add to these
  when I have some experience of things working reliably. I am puzzled,
  because I seem to have to set the pg-num for the pool to a number that
  makes (N-pools x pg-num)/N-osds come to the right kind of number. So
  this implies that I can't really expand a set of pools by adding osds
  at a later date.
You should pick an appropriate number of PGs for the number of OSDs you 
have at the present time.  When you add more OSDs, you can increase the 
number of PGs.  You would not want to create the larger number of PGs 
initially, as they could exceed the resources available on your initial 
small number of OSDs.
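
As a rough illustration (my numbers here, not a hard rule): the usual starting
guideline is around (number of OSDs x 100) / replica count in total across all
pools, rounded up to a power of two.  With 3 OSDs and 3 replicas that works out
at about 100, so something like 128 PGs split across your pools, which you can
then raise as OSDs are added.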

Q4) Can you give me an idea of what is wrong that causes the mds to not
play properly?
You have to explicitly enable the filesystem now (also 
http://docs.ceph.com/docs/master/cephfs/createfs/)

I think that there are some typos on the manual deployment pages, for
example:

ceph-osd id={osd-num}

This is not right. As far as I am aware it should be:

ceph-osd -i {osd-num}
ceph-osd id={osd-num} is an upstart invocation (i.e. it's prefaced with
'sudo start' on the manual deployment page).  In that context it's
correct afaik, unless you're finding otherwise?
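
For clarity, the two forms side by side (osd.0 is just an arbitrary example):

sudo start ceph-osd id=0      # the upstart job, as written on that page
ceph-osd -i 0                 # invoking the daemon binary directly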


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Robert LeBlanc
I can't help much on the MDS front, but here are some answers and my
view on some of it.

On Wed, Mar 4, 2015 at 1:27 PM, Datatone Lists li...@datatone.co.uk wrote:
 I have been following ceph for a long time. I have yet to put it into
 service, and I keep coming back as btrfs improves and ceph reaches
 higher version numbers.

 I am now trying ceph 0.93 and kernel 4.0-rc1.

 Q1) Is it still considered that btrfs is not robust enough, and that
 xfs should be used instead? [I am trying with btrfs].

We are moving forward with btrfs on our production cluster, aware that
there may be performance issues. So far, it seems the later kernels
have resolved the issues we've seen with snapshots. As the system
grows we will keep an eye on it and are prepared to move to XFS if
needed.

 I followed the manual deployment instructions on the web site
 (http://ceph.com/docs/master/install/manual-deployment/) and I managed
 to get a monitor and several osds running and apparently working. The
 instructions fizzle out without explaining how to set up mds. I went
 back to mkcephfs and got things set up that way. The mds starts.

 [Please don't mention ceph-deploy]

 The first thing that I noticed is that (whether I set up mon and osds
 by following the manual deployment, or using mkcephfs), the correct
 default pools were not created.

 bash-4.3# ceph osd lspools
 0 rbd,
 bash-4.3#

  I get only 'rbd' created automatically. I deleted this pool, and
  re-created data, metadata and rbd manually. When doing this, I had to
  juggle with the pg-num in order to avoid the 'too many pgs for osd'.
  I have three osds running at the moment, but intend to add to these
  when I have some experience of things working reliably. I am puzzled,
  because I seem to have to set the pg-num for the pool to a number that
  makes (N-pools x pg-num)/N-osds come to the right kind of number. So
  this implies that I can't really expand a set of pools by adding osds
  at a later date.

 Q2) Is there any obvious reason why my default pools are not getting
 created automatically as expected?

Since Giant, these pools are not automatically created, only the rbd pool is.

 Q3) Can pg-num be modified for a pool later? (If the number of osds is
 increased dramatically).

pg_num and pgp_num can be increased (not decreased) on the fly later
to expand with more OSDs.

 Finally, when I try to mount cephfs, I get a mount 5 error.

 "A mount 5 error typically occurs if a MDS server is laggy or if it
 crashed. Ensure at least one MDS is up and running, and the cluster is
 active + healthy."

 My mds is running, but its log is not terribly active:

 2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93
 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors
 {default=true}

 (This is all there is in the log).

 I think that a key indicator of the problem must be this from the
 monitor log:

 2015-03-04 16:53:20.715132 7f3cd0014700  1
 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.?
 [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem
 disabled

 (I have added the '' sections to obscure my ip address)

 Q4) Can you give me an idea of what is wrong that causes the mds to not
 play properly?

 I think that there are some typos on the manual deployment pages, for
 example:

 ceph-osd id={osd-num}

 This is not right. As far as I am aware it should be:

 ceph-osd -i {osd-num}

There are a few of these; usually, running --help for the command gives
you the right syntax for the version you have installed. But it
is still very confusing.

 An observation. In principle, setting things up manually is not all
 that complicated, provided that clear and unambiguous instructions are
 provided. This simple piece of documentation is very important. My view
 is that the existing manual deployment instructions get a bit confused
 and confusing when it comes to the osd setup, and the mds setup is
 completely absent.

 For someone who knows, it would be a fairly simple and fairly quick
 operation to review and revise this part of the documentation. I
 suspect that this part suffers from being really obvious stuff to the
 well initiated. For those of us closer to the start, this forms the
 ends of the threads that have to be picked up before the journey can be
 made.

 Very best regards,
 David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Travis Rhoden
On Wed, Mar 4, 2015 at 4:43 PM, Lionel Bouton
lionel-subscript...@bouton.name wrote:
 On 03/04/15 22:18, John Spray wrote:
 On 04/03/2015 20:27, Datatone Lists wrote:
 [...] [Please don't mention ceph-deploy]
 This kind of comment isn't very helpful unless there is a specific
 issue with ceph-deploy that is preventing you from using it, and
 causing you to resort to manual steps.

As a new maintainer of ceph-deploy, I'm happy to hear all gripes.  :)


 ceph-deploy is a subject I never took the time to give feedback on.

 We can't use it (we use Gentoo which isn't supported by ceph-deploy) and
 even if we could I probably wouldn't allow it: I believe that for
 important pieces of infrastructure like Ceph you have to understand its
 inner workings to the point where you can hack your way out in cases of
 problems and build tools to integrate them better with your environment
 (you can understand one of the reasons why we use Gentoo in production
 with other distributions...).
 I believe using ceph-deploy makes it more difficult to acquire the
 knowledge to do so.
 For example, we have a script to replace a defective OSD (destroying an
 existing one and replacing it with a new one) that locks data in place as
 long as we can, so that crush map changes don't trigger movements until the
 map reaches its original state again; this minimizes the total amount of
 data copied around. It might have been possible to achieve this with
 ceph-deploy, but I doubt we would have achieved it as easily (from
 understanding the causes of data movements, through the osd identifier
 allocation process, to implementing the script) if we hadn't
 created the OSD by hand repeatedly before scripting some processes.

Thanks for this feedback.  I share a lot of your sentiments,
especially that it is good to understand as much of the system as you
can.  Everyone's skill level and use-case are different, and
ceph-deploy is targeted more towards PoC use-cases. It tries to make
things as easy as possible, but that necessarily abstracts most of the
details away.


 Last time I searched for documentation on manual configuration it was
 much harder to find (mds manual configuration was indeed something I
 didn't find at all, either).

 Best regards,

 Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Lionel Bouton
On 03/04/15 22:50, Travis Rhoden wrote:
 [...]
 Thanks for this feedback.  I share a lot of your sentiments,
 especially that it is good to understand as much of the system as you
 can.  Everyone's skill level and use-case is different, and
 ceph-deploy is targeted more towards PoC use-cases. It tries to make
 things as easy as possible, but that necessarily abstracts most of the
 details away.

To follow up on this subject, assuming ceph-deploy worked with Gentoo,
one feature which would make it really useful to us would be for it to
dump each and every one of the commands it uses so that they might be
replicated manually. Documentation might be inaccurate or hard to browse
for various reasons, but a tool which achieves its purpose can't be
wrong about the commands it uses (assuming it simply calls standard
command-line tools and not some API over a socket...).
There might be a way to do it already (it seems like something you would
want at least when developing it), but obviously I didn't check.

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com