Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-07-17 Thread Marc Roos
 
I still wanted to thank you for the nicely detailed arguments regarding 
this, it is much appreciated. It really gives me the broader perspective 
I was lacking. 



-Original Message-
From: Warren Wang [mailto:warren.w...@walmart.com] 
Sent: Monday, June 11, 2018 17:30
To: Konstantin Shalygin; ceph-users@lists.ceph.com; Marc Roos
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

I'll chime in as a large scale operator, and a strong proponent of 
ceph-volume.
Ceph-disk wasn't accomplishing what was needed with anything other than 
vanilla use cases (even then, still kind of broken). I'm not going to 
re-hash Sage's valid points too much, but trying to manipulate the old 
ceph-disk to work with your own LVM (or another block manager) was 
painful. As far as the pain of doing something new goes: yes, sometimes 
moving to newer, more flexible methods results in a large amount of work. 
Trust me, I feel that pain when we're talking about things like 
ceph-volume, bluestore, etc., but these changes are not made without 
reason.

As far as LVM performance goes, I think that's well understood in the 
larger Linux community. We accept that minimal overhead to accomplish 
some of the setups that we're interested in, such as encrypted, 
lvm-cached OSDs. The above is not a trivial thing to do using ceph-disk. 
We know, we run that in production, at large scale. It's plagued with 
problems, and since it's done without Ceph itself, it is difficult to 
tie the two together. Having it managed directly by Ceph, via 
ceph-volume, makes much more sense. 
We're not alone in this, so I know it will benefit others as well, at 
the cost of technical expertise.

There are maintainers now for ceph-volume, so if there's something you 
don't like, I suggest proposing a change. 

Warren Wang

On 6/8/18, 11:05 AM, "ceph-users on behalf of Konstantin Shalygin" 
 wrote:

> - ceph-disk was replaced for two reasons: (1) Its design was
> centered around udev, and it was terrible.  We have been plagued for years
> with bugs due to race conditions in the udev-driven activation of OSDs,
> mostly variations of "I rebooted and not all of my OSDs started."  It's
> horrible to observe and horrible to debug. (2) It was based on GPT
> partitions, lots of people had block layer tools they wanted to use
> that were LVM-based, and the two didn't mix (no GPT partitions on top of
> LVs).
>
> - We designed ceph-volume to be *modular* because we anticipate that there
> are going to be lots of ways that people provision the hardware devices
> that we need to consider.  There are already two: legacy ceph-disk devices
> that are still in use and have GPT partitions (handled by 'simple'), and
> lvm.  SPDK devices where we manage NVMe devices directly from userspace
> are on the immediate horizon--obviously LVM won't work there since the
> kernel isn't involved at all.  We can add any other schemes we like.
>
> - If you don't like LVM (e.g., because you find that there is a measurable
> overhead), let's design a new approach!  I wouldn't bother unless you can
> actually measure an impact.  But if you can demonstrate a measurable cost,
> let's do it.
>
> - LVM was chosen as the default approach for new devices for a few
> reasons:
>- It allows you to attach arbitrary metadata to each device, like which
> cluster uuid it belongs to, which osd uuid it belongs to, which type of
> device it is (primary, db, wal, journal), any secrets needed to fetch its
> decryption key from a keyserver (the mon by default), and so on.
>- One of the goals was to enable lvm-based block layer modules beneath
> OSDs (dm-cache).  All of the other devicemapper-based tools we are
> aware of work with LVM.  It was a hammer that hit all nails.
>
> - The 'simple' mode is the current 'out' that avoids using LVM if it's not
> an option for you.  We only implemented scan and activate because that was
> all that we saw a current need for.  It should be quite easy to add the
> ability to create new OSDs.
>
> I would caution you, though, that simple relies on a file in /etc/ceph
> that has the metadata about the devices.  If you lose that file you need
> to have some way to rebuild it or we won't know what to do with your
> devices.  That means you should make the devices self-describing in some
> way... not, say, a raw device with dm-crypt layered directly on top, or
> some other option that makes it impossible to tell what it is.  As long as
> you can implement 'scan' and get any other info you need (e.g., whatever
> is necessary to fetch decryption keys) then great.


Thanks, I got what I wanted. This is the form in which deprecations should 
be submitted to the community: "why we are doing this, and what it will 
give us," rather than how it was actually presented: "we kill the tool 
along with its functionality; you should use the new one as is, even if 
you do not know what it does."

Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-11 Thread Warren Wang
I'll chime in as a large scale operator, and a strong proponent of ceph-volume.
Ceph-disk wasn't accomplishing what was needed with anything other than 
vanilla use cases (even then, still kind of broken). I'm not going to re-hash 
Sage's valid points too much, but trying to manipulate the old ceph-disk to 
work with your own LVM (or another block manager) was painful. As far as the 
pain of doing something new goes: yes, sometimes moving to newer, more 
flexible methods results in a large amount of work. Trust me, I feel that 
pain when we're talking about things like ceph-volume, bluestore, etc., but 
these changes are not made without reason.

As far as LVM performance goes, I think that's well understood in the larger
Linux community. We accept that minimal overhead to accomplish some 
of the setups that we're interested in, such as encrypted, lvm-cached 
OSDs. The above is not a trivial thing to do using ceph-disk. We know, we 
run that in production, at large scale. It's plagued with problems, and since 
it's done without Ceph itself, it is difficult to tie the two together. 
Having it managed directly by Ceph, via ceph-volume, makes much more sense. 
We're not alone in this, so I know it will benefit others as well, at the cost 
of technical expertise.
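
To make that concrete, a minimal sketch of the two setups mentioned above; 
the device and VG/LV names are placeholders, so check the flags against 
your ceph-volume release before copying anything:

  # encrypted BlueStore OSD, with the DB on a faster device
  ceph-volume lvm create --bluestore --dmcrypt \
      --data /dev/sdb --block.db /dev/nvme0n1p1

  # or hand ceph-volume a logical volume you prepared yourself, e.g. one
  # sitting on top of an LVM-cache/dm-cache stack
  ceph-volume lvm create --bluestore --data cachevg/osd0

ceph-disk had no comparable path for the second case, which is the point 
being made here.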

There are maintainers now for ceph-volume, so if there's something you 
don't like, I suggest proposing a change. 

Warren Wang

On 6/8/18, 11:05 AM, "ceph-users on behalf of Konstantin Shalygin" 
 wrote:

> - ceph-disk was replaced for two reasons: (1) Its design was
> centered around udev, and it was terrible.  We have been plagued for years
> with bugs due to race conditions in the udev-driven activation of OSDs,
> mostly variations of "I rebooted and not all of my OSDs started."  It's
> horrible to observe and horrible to debug. (2) It was based on GPT
> partitions, lots of people had block layer tools they wanted to use
> that were LVM-based, and the two didn't mix (no GPT partitions on top of
> LVs).
>
> - We designed ceph-volume to be *modular* because we anticipate that there
> are going to be lots of ways that people provision the hardware devices
> that we need to consider.  There are already two: legacy ceph-disk devices
> that are still in use and have GPT partitions (handled by 'simple'), and
> lvm.  SPDK devices where we manage NVMe devices directly from userspace
> are on the immediate horizon--obviously LVM won't work there since the
> kernel isn't involved at all.  We can add any other schemes we like.
>
> - If you don't like LVM (e.g., because you find that there is a measurable
> overhead), let's design a new approach!  I wouldn't bother unless you can
> actually measure an impact.  But if you can demonstrate a measurable cost,
> let's do it.
>
> - LVM was chosen as the default approach for new devices for a few
> reasons:
>- It allows you to attach arbitrary metadata to each device, like which
> cluster uuid it belongs to, which osd uuid it belongs to, which type of
> device it is (primary, db, wal, journal), any secrets needed to fetch its
> decryption key from a keyserver (the mon by default), and so on.
>- One of the goals was to enable lvm-based block layer modules beneath
> OSDs (dm-cache).  All of the other devicemapper-based tools we are
> aware of work with LVM.  It was a hammer that hit all nails.
>
> - The 'simple' mode is the current 'out' that avoids using LVM if it's not
> an option for you.  We only implemented scan and activate because that was
> all that we saw a current need for.  It should be quite easy to add the
> ability to create new OSDs.
>
> I would caution you, though, that simple relies on a file in /etc/ceph
> that has the metadata about the devices.  If you lose that file you need
> to have some way to rebuild it or we won't know what to do with your
> devices.  That means you should make the devices self-describing in some
> way... not, say, a raw device with dm-crypt layered directly on top, or
> some other option that makes it impossible to tell what it is.  As long as
> you can implement 'scan' and get any other info you need (e.g., whatever
> is necessary to fetch decryption keys) then great.


Thanks, I got what I wanted. This is the form in which deprecations should 
be submitted to the community: "why we are doing this, and what it will 
give us," rather than how it was actually presented: "we kill the tool 
along with its functionality; you should use the new one as is, even if 
you do not know what it does."

Thanks again, Sage. I think this post should be on the Ceph blog.



k




Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Konstantin Shalygin

- ceph-disk was replaced for two reasons: (1) Its design was
centered around udev, and it was terrible.  We have been plagued for years
with bugs due to race conditions in the udev-driven activation of OSDs,
mostly variations of "I rebooted and not all of my OSDs started."  It's
horrible to observe and horrible to debug. (2) It was based on GPT
partitions, lots of people had block layer tools they wanted to use
that were LVM-based, and the two didn't mix (no GPT partitions on top of
LVs).

- We designed ceph-volume to be *modular* because we anticipate that there
are going to be lots of ways that people provision the hardware devices
that we need to consider.  There are already two: legacy ceph-disk devices
that are still in use and have GPT partitions (handled by 'simple'), and
lvm.  SPDK devices where we manage NVMe devices directly from userspace
are on the immediate horizon--obviously LVM won't work there since the
kernel isn't involved at all.  We can add any other schemes we like.

- If you don't like LVM (e.g., because you find that there is a measurable
overhead), let's design a new approach!  I wouldn't bother unless you can
actually measure an impact.  But if you can demonstrate a measurable cost,
let's do it.

- LVM was chosen as the default approach for new devices for a few
reasons:
   - It allows you to attach arbitrary metadata to each device, like which
cluster uuid it belongs to, which osd uuid it belongs to, which type of
device it is (primary, db, wal, journal), any secrets needed to fetch its
decryption key from a keyserver (the mon by default), and so on.
   - One of the goals was to enable lvm-based block layer modules beneath
OSDs (dm-cache).  All of the other devicemapper-based tools we are
aware of work with LVM.  It was a hammer that hit all nails.

- The 'simple' mode is the current 'out' that avoids using LVM if it's not
an option for you.  We only implemented scan and activate because that was
all that we saw a current need for.  It should be quite easy to add the
ability to create new OSDs.

I would caution you, though, that simple relies on a file in /etc/ceph
that has the metadata about the devices.  If you lose that file you need
to have some way to rebuild it or we won't know what to do with your
devices.  That means you should make the devices self-describing in some
way... not, say, a raw device with dm-crypt layered directly on top, or
some other option that makes it impossible to tell what it is.  As long as
you can implement 'scan' and get any other info you need (e.g., whatever
is necessary to fetch decryption keys) then great.



Thanks, I got what I wanted. This is the form in which deprecations should 
be submitted to the community: "why we are doing this, and what it will 
give us," rather than how it was actually presented: "we kill the tool 
along with its functionality; you should use the new one as is, even if 
you do not know what it does."


Thanks again, Sage. I think this post should be on the Ceph blog.



k



Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Sage Weil
On Fri, 8 Jun 2018, Alfredo Deza wrote:
> On Fri, Jun 8, 2018 at 8:13 AM, Sage Weil  wrote:
> > I'm going to jump in here with a few points.
> >
> > - ceph-disk was replaced for two reasons: (1) Its design was
> > centered around udev, and it was terrible.  We have been plagued for years
> > with bugs due to race conditions in the udev-driven activation of OSDs,
> > mostly variations of "I rebooted and not all of my OSDs started."  It's
> > horrible to observe and horrible to debug. (2) It was based on GPT
> > partitions, lots of people had block layer tools they wanted to use
> > that were LVM-based, and the two didn't mix (no GPT partitions on top of
> > LVs).
> >
> > - We designed ceph-volume to be *modular* because we anticipate that there
> > are going to be lots of ways that people provision the hardware devices
> > that we need to consider.  There are already two: legacy ceph-disk devices
> > that are still in use and have GPT partitions (handled by 'simple'), and
> > lvm.  SPDK devices where we manage NVMe devices directly from userspace
> > are on the immediate horizon--obviously LVM won't work there since the
> > kernel isn't involved at all.  We can add any other schemes we like.
> >
> > - If you don't like LVM (e.g., because you find that there is a measurable
> > overhead), let's design a new approach!  I wouldn't bother unless you can
> > actually measure an impact.  But if you can demonstrate a measurable cost,
> > let's do it.
> >
> > - LVM was chosen as the default approach for new devices for a few
> > reasons:
> >   - It allows you to attach arbitrary metadata to each device, like which
> > cluster uuid it belongs to, which osd uuid it belongs to, which type of
> > device it is (primary, db, wal, journal), any secrets needed to fetch its
> > decryption key from a keyserver (the mon by default), and so on.
> >   - One of the goals was to enable lvm-based block layer modules beneath
> > OSDs (dm-cache).  All of the other devicemapper-based tools we are
> > aware of work with LVM.  It was a hammer that hit all nails.
> >
> > - The 'simple' mode is the current 'out' that avoids using LVM if it's not
> > an option for you.  We only implemented scan and activate because that was
> > all that we saw a current need for.  It should be quite easy to add the
> > ability to create new OSDs.
> >
> > I would caution you, though, that simple relies on a file in /etc/ceph
> > that has the metadata about the devices.  If you lose that file you need
> > to have some way to rebuild it or we won't know what to do with your
> > devices.
> > That means you should make the devices self-describing in some
> > way... not, say, a raw device with dm-crypt layered directly on top, or
> > some other option that makes it impossible to tell what it is.  As long as
> > you can implement 'scan' and get any other info you need (e.g., whatever
> > is necessary to fetch decryption keys) then great.
> 
> 'scan'  allows you to recreate that file from a data device or from
> an OSD directory (e.g. /var/lib/ceph/osd/ceph-0/)
> 
> So even in the case of disaster (or migrating) we can still get that
> file again. This includes the ability to detect both ceph-disk's
> encryption support as well as regular OSDs.
> 
> Do you mean that there might be situations where 'scan' wouldn't be
> able to recreate this file? I think the only 'out' would be if the OSD
> is already mounted/available.

Right, it works great for the GPT-style ceph-disk devices.

I'm just cautioning that if someone wants to implement a *new* mode that 
doesn't use lvm or the legacy ceph-disk scheme and "uses raw devices for 
lower overhead" (whatever that ends up meaning), it should be done in a 
way such that scan can be implemented.

sage


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Alfredo Deza
On Fri, Jun 8, 2018 at 8:13 AM, Sage Weil  wrote:
> I'm going to jump in here with a few points.
>
> - ceph-disk was replaced for two reasons: (1) Its design was
> centered around udev, and it was terrible.  We have been plagued for years
> with bugs due to race conditions in the udev-driven activation of OSDs,
> mostly variations of "I rebooted and not all of my OSDs started."  It's
> horrible to observe and horrible to debug. (2) It was based on GPT
> partitions, lots of people had block layer tools they wanted to use
> that were LVM-based, and the two didn't mix (no GPT partitions on top of
> LVs).
>
> - We designed ceph-volume to be *modular* because we anticipate that there
> are going to be lots of ways that people provision the hardware devices
> that we need to consider.  There are already two: legacy ceph-disk devices
> that are still in use and have GPT partitions (handled by 'simple'), and
> lvm.  SPDK devices where we manage NVMe devices directly from userspace
> are on the immediate horizon--obviously LVM won't work there since the
> kernel isn't involved at all.  We can add any other schemes we like.
>
> - If you don't like LVM (e.g., because you find that there is a measurable
> overhead), let's design a new approach!  I wouldn't bother unless you can
> actually measure an impact.  But if you can demonstrate a measurable cost,
> let's do it.
>
> - LVM was chosen as the default approach for new devices for a few
> reasons:
>   - It allows you to attach arbitrary metadata to each device, like which
> cluster uuid it belongs to, which osd uuid it belongs to, which type of
> device it is (primary, db, wal, journal), any secrets needed to fetch its
> decryption key from a keyserver (the mon by default), and so on.
>   - One of the goals was to enable lvm-based block layer modules beneath
> OSDs (dm-cache).  All of the other devicemapper-based tools we are
> aware of work with LVM.  It was a hammer that hit all nails.
>
> - The 'simple' mode is the current 'out' that avoids using LVM if it's not
> an option for you.  We only implemented scan and activate because that was
> all that we saw a current need for.  It should be quite easy to add the
> ability to create new OSDs.
>
> I would caution you, though, that simple relies on a file in /etc/ceph
> that has the metadata about the devices.  If you lose that file you need
> to have some way to rebuild it or we won't know what to do with your
> devices.
> That means you should make the devices self-describing in some
> way... not, say, a raw device with dm-crypt layered directly on top, or
> some other option that makes it impossible to tell what it is.  As long as
> you can implement 'scan' and get any other info you need (e.g., whatever
> is necessary to fetch decryption keys) then great.

'scan'  allows you to recreate that file from a data device or from
an OSD directory (e.g. /var/lib/ceph/osd/ceph-0/)

So even in the case of disaster (or migrating) we can still get that
file again. This includes the ability to detect both ceph-disk's
encryption support as well as regular OSDs.
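
A hedged illustration of those two forms (the paths here are the stock 
defaults; adjust for your cluster):

  # scan a mounted OSD directory
  ceph-volume simple scan /var/lib/ceph/osd/ceph-0

  # or scan the data partition of a stopped ceph-disk OSD
  ceph-volume simple scan /dev/sdb1

Either form should leave a JSON description of the OSD under /etc/ceph/osd/.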

Do you mean that there might be situations where 'scan' wouldn't be
able to recreate this file? I think the only 'out' would be if the OSD
is already mounted/available.

>
> sage


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Sage Weil
I'm going to jump in here with a few points.

- ceph-disk was replaced for two reasons: (1) Its design was 
centered around udev, and it was terrible.  We have been plagued for years 
with bugs due to race conditions in the udev-driven activation of OSDs, 
mostly variations of "I rebooted and not all of my OSDs started."  It's 
horrible to observe and horrible to debug. (2) It was based on GPT 
partitions, lots of people had block layer tools they wanted to use 
that were LVM-based, and the two didn't mix (no GPT partitions on top of 
LVs).

- We designed ceph-volume to be *modular* because we anticipate that there 
are going to be lots of ways that people provision the hardware devices 
that we need to consider.  There are already two: legacy ceph-disk devices 
that are still in use and have GPT partitions (handled by 'simple'), and 
lvm.  SPDK devices where we manage NVMe devices directly from userspace 
are on the immediate horizon--obviously LVM won't work there since the 
kernel isn't involved at all.  We can add any other schemes we like.

- If you don't like LVM (e.g., because you find that there is a measurable 
overhead), let's design a new approach!  I wouldn't bother unless you can 
actually measure an impact.  But if you can demonstrate a measurable cost, 
let's do it.
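
For anyone who wants to take Sage up on that, a rough sketch of a 
measurement that would carry weight: compare a raw partition against the 
same disk wrapped in a single LV (/dev/sdX is a placeholder, and the test 
destroys data on it):

  # raw-device baseline
  fio --name=raw --filename=/dev/sdX --direct=1 --ioengine=libaio \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based

  # same disk behind LVM
  pvcreate /dev/sdX && vgcreate vgtest /dev/sdX && \
      lvcreate -l 100%FREE -n lvtest vgtest
  fio --name=lvm --filename=/dev/vgtest/lvtest --direct=1 --ioengine=libaio \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based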

- LVM was chosen as the default approach for new devices for a few 
reasons:
  - It allows you to attach arbitrary metadata to each device, like which 
cluster uuid it belongs to, which osd uuid it belongs to, which type of 
device it is (primary, db, wal, journal), any secrets needed to fetch its 
decryption key from a keyserver (the mon by default), and so on.
  - One of the goals was to enable lvm-based block layer modules beneath 
OSDs (dm-cache).  All of the other devicemapper-based tools we are 
aware of work with LVM.  It was a hammer that hit all nails.
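
In practice that metadata lands as LVM tags on the OSD's logical volume, so 
it can be inspected with stock tooling; the exact tag names below are from 
memory and may vary between releases:

  lvs -o lv_name,lv_path,lv_tags
  # tags look roughly like:
  #   ceph.osd_id=0,ceph.osd_fsid=...,ceph.cluster_fsid=...,ceph.type=block

  # ceph-volume can report the same information in a friendlier form
  ceph-volume lvm list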

- The 'simple' mode is the current 'out' that avoids using LVM if it's not 
an option for you.  We only implemented scan and activate because that was 
all that we saw a current need for.  It should be quite easy to add the 
ability to create new OSDs.

I would caution you, though, that simple relies on a file in /etc/ceph 
that has the metadata about the devices.  If you lose that file you need 
to have some way to rebuild it or we won't know what to do with your 
devices.  That means you should make the devices self-describing in some 
way... not, say, a raw device with dm-crypt layered directly on top, or 
some other option that makes it impossible to tell what it is.  As long as 
you can implement 'scan' and get any other info you need (e.g., whatever 
is necessary to fetch decryption keys) then great.
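
In practical terms, assuming the stock layout where that metadata lives as 
one JSON file per OSD under /etc/ceph/osd/, the caution above boils down to 
something like:

  # keep a copy of the OSD descriptions somewhere safe
  cp -a /etc/ceph/osd/*.json /some/backup/location/

  # if they are lost they can usually be regenerated from the devices,
  # as long as the devices are still self-describing
  # (older releases want an explicit OSD directory or data partition)
  ceph-volume simple scan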

sage


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Konstantin Shalygin

http://docs.ceph.com/docs/master/ceph-volume/simple/

?



Only 'scan' & 'activate'. Not 'create'.



k



Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Janne Johansson
On Fri, 8 June 2018 at 12:35, Marc Roos  wrote:

>
> I am getting the impression that not everyone understands the subject
> that has been raised here.
>

Or they do and they do not agree with your vision of how things should be
done.

That is a distinct possibility one has to consider when using someone else's
code.

I seem to recall you getting told what would be required in order to get
ceph-disk back.
The programmers at Ceph are not willing to do the work for you, so if you
want it you would have to put in the effort.

I think the point where "more mails on the same subject" will have an effect
is past now, regardless of whether you are right or wrong in your stance.

Time to decide if you can bear the horror of LVM or not. If not, then ceph
might not be for you, and that is ok. No product or program will cover
everyone's needs, but can we stop rehashing it over and over?

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Marc Roos
 
> Answers:
> - unify setup, support for crypto & more
Unify setup by adding a dependency? There is (or should be) support for 
crypto already, no?

> - none
The costs of LVM can be argued. Something to go through is worse than 
nothing to go through.
https://www.researchgate.net/publication/284897601_LVM_in_the_Linux_environment_Performance_examination
https://hrcak.srce.hr/index.php?show=clanak_clanak_jezik=216661

If there is no cost, then there is no discussion. But I cannot believe 
there is no cost. And if there is a cost, then the reason for adding 
this cost should not be something like unifying setup or crypto. Ceph has 
been around for a long time already, and it does not look like its users 
had many problems. 
What about the clusters that have thousands of disks? Have them migrate 
to lvm just for fun? Direct disk access should stay.




-Original Message-
From: c...@jack.fr.eu.org [mailto:c...@jack.fr.eu.org] 
Sent: Friday, June 8, 2018 12:47
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

Beuh ...

I have other questions:
- why not use LVM, and stick with direct disk access ?
- what are the costs of LVM (performance, latency, etc.)?


Answers:
- unify setup, support for crypto & more
- none

Tldr: that technical choice is fine, nothing to argue about.


On 06/08/2018 07:15 AM, Marc Roos wrote:
> 
> I am getting the impression that not everyone understands the subject 
> that has been raised here.
> 
> Why do osd's need to be via lvm, and why not stick with direct disk 
> access as it is now?
> 
> - Bluestore was created to cut out some fs overhead,
> - everywhere 10Gb is recommended because of better latency. (I even 
> posted here something to make ceph better performing with 1Gb eth, 
> disregarded because it would add complexity, fine, I can understand)
> 
> And then because of some start-up/automation issues (because that is 
> the only thing being mentioned here for now), let's add the lvm tier? 
> Introducing a layer that is constantly there and adds some overhead 
> (maybe not that much) for every read and write operation?
> 
> 
> 
> 
> 
> -Original Message-
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: Friday, June 8, 2018 12:14
> To: 'Konstantin Shalygin'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 

> and lvm? (and just not stick with direct disk access)
> 
> http://docs.ceph.com/docs/master/ceph-volume/simple/
> 
> ?
> 
>  
> 
> From: ceph-users  On Behalf Of 
> Konstantin Shalygin
> Sent: 08 June 2018 11:11
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 

> and lvm? (and just not stick with direct disk access)
> 
>  
> 
>   What is the reasoning behind switching to lvm? Does it make sense to go 
>   through (yet) another layer to access the disk? Why create this 
>   dependency and added complexity? It is fine as it is, or not?
> 
> In fact, the question is why one tool is replaced by another without 
> preserving functionality.
> Why lvm, why not bcache?
> 
> It seems to me that someone has pushed the idea into the dev team's heads 
> that lvm solves all problems.
> But this also adds overhead, and since this is a kernel module, an update 
> can bring a performance drop, changes in module settings, etc.
> I understand that for Red Hat Storage this is a solution, but for a 
> community with different distributions and hardware it may be 
> superfluous.
> I would like the possibility of preparing OSDs with direct disk access 
> to be restored, even if it is not the default.
> Also, this would preserve existing configurations for ceph-ansible. 
> Actually, before this deprecation I did not even know whether my OSDs 
> were created by ceph-disk, ceph-volume, or something else.
> before this deprecation.
> 
> 
> 
> 
> 
> k
> 
> 
> 
> 


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread ceph
Beuh ...

I have other questions:
- why not use LVM, and stick with direct disk access ?
- what are the costs of LVM (performance, latency, etc.)?


Answers:
- unify setup, support for crypto & more
- none

Tldr: that technical choice is fine, nothing to argue about.


On 06/08/2018 07:15 AM, Marc Roos wrote:
> 
> I am getting the impression that not everyone understands the subject 
> that has been raised here.
> 
> Why do osd's need to be via lvm, and why not stick with direct disk 
> access as it is now?
> 
> - Bluestore was created to cut out some fs overhead, 
> - everywhere 10Gb is recommended because of better latency. (I even 
> posted here something to make ceph better performing with 1Gb eth, 
> disregarded because it would add complexity, fine, I can understand)
> 
> And then because of some start-up/automation issues (because that is the 
> only thing being mentioned here for now), let's add the lvm 
> tier? Introducing a layer that is constantly there and adds some 
> overhead (maybe not that much) for every read and write operation? 
> 
> 
> 
> 
> 
> -Original Message-
> From: Nick Fisk [mailto:n...@fisk.me.uk] 
> Sent: vrijdag 8 juni 2018 12:14
> To: 'Konstantin Shalygin'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
> and lvm? (and just not stick with direct disk access)
> 
> http://docs.ceph.com/docs/master/ceph-volume/simple/
> 
> ?
> 
>  
> 
> From: ceph-users  On Behalf Of 
> Konstantin Shalygin
> Sent: 08 June 2018 11:11
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
> and lvm? (and just not stick with direct disk access)
> 
>  
> 
>   What is the reasoning behind switching to lvm? Does it make sense 
> to go 
>   through (yet) another layer to access the disk? Why create this 
>   dependency and added complexity? It is fine as it is, or not?
> 
> In fact, the question is why one tool is replaced by another without 
> preserving functionality.
> Why lvm, why not bcache?
> 
> It seems to me that someone has pushed the idea into the dev team's heads 
> that lvm solves all problems.
> But this also adds overhead, and since this is a kernel module, an update 
> can bring a performance drop, changes in module settings, etc.
> I understand that for Red Hat Storage this is a solution, but for a 
> community with different distributions and hardware it may be superfluous.
> I would like the possibility of preparing OSDs with direct disk access to 
> be restored, even if it is not the default.
> Also, this would preserve existing configurations for ceph-ansible. 
> Actually, before this deprecation I did not even know whether my OSDs 
> were created by ceph-disk, ceph-volume, or something else.
> 
> 
> 
> 
> 
> k
> 
> 
> 
> 


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Marc Roos


I am getting the impression that not everyone understands the subject 
that has been raised here.

Why do osd's need to be via lvm, and why not stick with direct disk 
access as it is now?

- Bluestore was created to cut out some fs overhead, 
- everywhere 10Gb is recommended because of better latency. (I even 
posted here something to make ceph better performing with 1Gb eth, 
disregarded because it would add complexity, fine, I can understand)

And then because of some start-up/automation issues (because that is the 
only thing being mentioned here for now), let's add the lvm 
tier? Introducing a layer that is constantly there and adds some 
overhead (maybe not that much) for every read and write operation? 
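
For what it is worth, one way to see exactly what that extra layer consists 
of is to ask device-mapper for the table behind the volume; for a plain LV 
it is normally a single linear mapping (names below are placeholders):

  dmsetup table vgtest-lvtest
  # typical output: 0 209715200 linear 8:16 2048
  lsblk /dev/sdX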





-Original Message-
From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: Friday, June 8, 2018 12:14
To: 'Konstantin Shalygin'; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

http://docs.ceph.com/docs/master/ceph-volume/simple/

?

 

From: ceph-users  On Behalf Of 
Konstantin Shalygin
Sent: 08 June 2018 11:11
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

 

What is the reasoning behind switching to lvm? Does it make sense 
to go 
through (yet) another layer to access the disk? Why create this 
dependency and added complexity? It is fine as it is, or not?

In fact, the question is why one tool is replaced by another without 
preserving functionality.
Why lvm, why not bcache?

It seems to me that someone has pushed the idea into the dev team's heads 
that lvm solves all problems.
But this also adds overhead, and since this is a kernel module, an update 
can bring a performance drop, changes in module settings, etc.
I understand that for Red Hat Storage this is a solution, but for a 
community with different distributions and hardware it may be superfluous.
I would like the possibility of preparing OSDs with direct disk access to 
be restored, even if it is not the default.
Also, this would preserve existing configurations for ceph-ansible. 
Actually, before this deprecation I did not even know whether my OSDs were 
created by ceph-disk, ceph-volume, or something else.





k






Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Nick Fisk
http://docs.ceph.com/docs/master/ceph-volume/simple/

?

 

From: ceph-users  On Behalf Of Konstantin 
Shalygin
Sent: 08 June 2018 11:11
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? 
(and just not stick with direct disk access)

 

What is the reasoning behind switching to lvm? Does it make sense to go 
through (yet) another layer to access the disk? Why create this 
dependency and added complexity? It is fine as it is, or not?

In fact, the question is why one tool is replaced by another without 
preserving functionality.
Why lvm, why not bcache?

It seems to me that someone has pushed the idea into the dev team's heads 
that lvm solves all problems.
But this also adds overhead, and since this is a kernel module, an update 
can bring a performance drop, changes in module settings, etc.
I understand that for Red Hat Storage this is a solution, but for a 
community with different distributions and hardware it may be superfluous.
I would like the possibility of preparing OSDs with direct disk access to 
be restored, even if it is not the default.
Also, this would preserve existing configurations for ceph-ansible. 
Actually, before this deprecation I did not even know whether my OSDs were 
created by ceph-disk, ceph-volume, or something else.





k





Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Konstantin Shalygin

What is the reasoning behind switching to lvm? Does it make sense to go
through (yet) another layer to access the disk? Why create this
dependency and added complexity? It is fine as it is, or not?


In fact, the question is why one tool is replaced by another without 
preserving functionality.

Why lvm, why not bcache?

It seems to me that someone has pushed the idea into the dev team's heads 
that lvm solves all problems.
But this also adds overhead, and since this is a kernel module, an update 
can bring a performance drop, changes in module settings, etc.
I understand that for Red Hat Storage this is a solution, but for a 
community with different distributions and hardware it may be superfluous.
I would like the possibility of preparing OSDs with direct disk access to 
be restored, even if it is not the default.
Also, this would preserve existing configurations for ceph-ansible. 
Actually, before this deprecation I did not even know whether my OSDs were 
created by ceph-disk, ceph-volume, or something else.






k


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-01 Thread Marc Roos
 
Yes, it is indeed difficult to find a good balance between asking multiple 
things in one email, risking that not all are answered, and posting them 
as individual questions. 


-Original Message-
From: David Turner [mailto:drakonst...@gmail.com] 
Sent: Thursday, May 31, 2018 23:50
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

You are also making this entire conversation INCREDIBLY difficult to 
follow by creating so many new email threads instead of sticking with 
one.

On Thu, May 31, 2018 at 5:48 PM David Turner  
wrote:


Your question assumes that ceph-disk was a good piece of software.  
It had a bug list a mile long and nobody working on it.  A common 
example was how simple it was to mess up any part of the dozens of 
components that allowed an OSD to autostart on boot.  One of the biggest 
problems was when ceph-disk was doing its thing and an OSD would take 
longer than 3 minutes to start and ceph-disk would give up on it.

That is a little bit about why a new solution was sought after and 
why ceph-disk is being removed entirely.  LVM was a choice made to 
implement something other than partitions and udev magic while still 
incorporating the information still needed from all of that in a better 
solution.  There has been a lot of talk about this on the ML.

On Thu, May 31, 2018 at 5:23 PM Marc Roos 
 wrote:



What is the reasoning behind switching to lvm? Does it make 
sense to go 
through (yet) another layer to access the disk? Why create 
this 
dependency and added complexity? It is fine as it is, or not?






Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-01 Thread Alfredo Deza
On Thu, May 31, 2018 at 10:33 PM, Marc Roos  wrote:
>
> I actually tried to search the ML before bringing up this topic. Because
> I do not get the logic of choosing this direction.
>
> - Bluestore was created to cut out some fs overhead,
> - everywhere 10Gb is recommended because of better latency. (I even
> posted here something to make ceph better performing with 1Gb eth,
> disregarded because it would add complexity, fine, I can understand)
>
> And then because of some start-up/automation issues, let's add the lvm
> tier? Introducing a layer that is constantly there and adds some
> overhead (maybe not that much) for every read and write operation?
>
> I see ceph-disk as a tool to prepare the OSD and then do the rest
> myself, without ceph-deploy or ansible, because I trust what I see and
> type more than what someone else scripted. I don't have any startup problems.

You can certainly do that with ceph-volume. You can create the OSD
manually, and then add the information about your OSD (drives,
locations, fsid, uuids, etc.) in /etc/ceph/osd/.

This is how we are able to take over ceph-disk-deployed OSDs.

See: http://docs.ceph.com/docs/master/ceph-volume/simple/scan/#scan
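
As a hedged sketch of that takeover flow (the linked docs are authoritative; 
flags may differ on older releases):

  # write out /etc/ceph/osd/<id>-<fsid>.json for the existing OSDs
  ceph-volume simple scan

  # hand startup over to ceph-volume instead of the old udev/ceph-disk path
  ceph-volume simple activate --all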

>
> Do assume I am not an expert in any field. But it is understandable that
> having something (lvm) between you and the disk, rather than nothing,
> should carry a performance penalty.
> I know you can hack around nicely with disks and lvm, but those pros
> fall into the same category as the suggestions people make about
> putting disks in raid.
>
> Let alone the risk that you are taking if there is going to be a
> significant performance penalty:
> https://www.researchgate.net/publication/284897601_LVM_in_the_Linux_environment_Performance_examination
> https://hrcak.srce.hr/index.php?show=clanak_clanak_jezik=216661
>
>
>
> -Original Message-
> From: David Turner [mailto:drakonst...@gmail.com]
> Sent: Thursday, May 31, 2018 23:48
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume
> and lvm? (and just not stick with direct disk access)
>
> Your question assumes that ceph-disk was a good piece of software.  It
> had a bug list a mile long and nobody working on it.  A common example
> was how simple it was to mess up any part of the dozens of components
> that allowed an OSD to autostart on boot.  One of the biggest problems
> was when ceph-disk was doing its thing and an OSD would take longer
> than 3 minutes to start and ceph-disk would give up on it.
>
> That is a little bit about why a new solution was sought after and why
> ceph-disk is being removed entirely.  LVM was a choice made to implement
> something other than partitions and udev magic while still incorporating
> the information still needed from all of that in a better solution.
> There has been a lot of talk about this on the ML.
>
> On Thu, May 31, 2018 at 5:23 PM Marc Roos 
> wrote:
>
>
>
> What is the reasoning behind switching to lvm? Does it make sense
> to go
> through (yet) another layer to access the disk? Why create this
> dependency and added complexity? It is fine as it is, or not?
>
>
>
>


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-05-31 Thread David Turner
You are also making this entire conversation INCREDIBLY difficult to follow
by creating so many new email threads instead of sticking with one.

On Thu, May 31, 2018 at 5:48 PM David Turner  wrote:

> Your question assumes that ceph-disk was a good piece of software.  It had
> a bug list a mile long and nobody working on it.  A common example was how
> simple it was to mess up any part of the dozens of components that allowed
> an OSD to autostart on boot.  One of the biggest problems was when
> ceph-disk was doing its thing and an OSD would take longer than 3 minutes
> to start and ceph-disk would give up on it.
>
> That is a little bit about why a new solution was sought after and why
> ceph-disk is being removed entirely.  LVM was a choice made to implement
> something other than partitions and udev magic while still incorporating
> the information still needed from all of that in a better solution.  There
> has been a lot of talk about this on the ML.
>
> On Thu, May 31, 2018 at 5:23 PM Marc Roos 
> wrote:
>
>>
>> What is the reasoning behind switching to lvm? Does it make sense to go
>> through (yet) another layer to access the disk? Why create this
>> dependency and added complexity? It is fine as it is, or not?
>>
>>
>>
>>


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-05-31 Thread David Turner
Your question assumes that ceph-disk was a good piece of software.  It had
a bug list a mile long and nobody working on it.  A common example was how
simple it was to mess up any part of the dozens of components that allowed
an OSD to autostart on boot.  One of the biggest problems was when
ceph-disk was doing its thing and an OSD would take longer than 3 minutes
to start and ceph-disk would give up on it.

That is a little bit about why a new solution was sought after and why
ceph-disk is being removed entirely.  LVM was a choice made to implement
something other than partitions and udev magic while still incorporating
the information still needed from all of that in a better solution.  There
has been a lot of talk about this on the ML.

On Thu, May 31, 2018 at 5:23 PM Marc Roos  wrote:

>
> What is the reasoning behind switching to lvm? Does it make sense to go
> through (yet) another layer to access the disk? Why create this
> dependency and added complexity? It is fine as it is, or not?
>
>
>
>