Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
I still wanted to thank you for the nicely detailed arguments regarding this; it is much appreciated. It really gives me the broader perspective I was lacking.

-----Original Message-----
From: Warren Wang [mailto:warren.w...@walmart.com]
Sent: Monday, 11 June 2018 17:30
To: Konstantin Shalygin; ceph-users@lists.ceph.com; Marc Roos
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

I'll chime in as a large-scale operator, and a strong proponent of ceph-volume. Ceph-disk wasn't accomplishing what was needed with anything other than vanilla use cases (even then, still kind of broken). I'm not going to re-hash Sage's valid points too much, but manipulating the old ceph-disk to work with your own LVM (or other block manager) was a struggle. As far as the pain of doing something new goes, yes, sometimes moving to newer, more flexible methods results in a large amount of work. Trust me, I feel that pain when we're talking about things like ceph-volume, bluestore, etc., but these changes are not made without reason.

As far as LVM performance goes, I think that's well understood in the larger Linux community. We accept that minimal overhead to accomplish some of the setups we're interested in, such as encrypted, lvm-cached OSDs. The above is not a trivial thing to do using ceph-disk. We know; we run that in production, at large scale. It's plagued with problems, and since it's done outside Ceph itself, it is difficult to tie the two together. Having it managed directly by Ceph, via ceph-volume, makes much more sense. We're not alone in this, so I know it will benefit others as well, at the cost of some technical expertise. There are maintainers now for ceph-volume, so if there's something you don't like, I suggest proposing a change.
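The "encrypted, lvm-cached OSDs" Warren mentions take a fair stack of manual steps when Ceph is not managing them. As a rough dry-run sketch of what the by-hand setup involves: the code below only assembles and prints one plausible command sequence, executing nothing. The device paths, VG/LV names, and the crypt-on-top-of-cache ordering are illustrative assumptions, not what ceph-volume itself runs.

```python
# Dry-run sketch: the shell commands one might run to hand-build an
# encrypted, dm-cache-backed logical volume for an OSD. Nothing is
# executed; we only assemble and print the commands. The device paths
# (/dev/sdb = slow HDD, /dev/nvme0n1 = fast cache device) and the
# VG/LV names are made up for illustration.

def cached_encrypted_lv_commands(slow_dev, fast_dev, vg="ceph-vg", lv="osd-block-0"):
    return [
        f"pvcreate {slow_dev} {fast_dev}",
        f"vgcreate {vg} {slow_dev} {fast_dev}",
        # data LV on the slow device, cache pool on the fast one
        f"lvcreate -n {lv} -l 100%PVS {vg} {slow_dev}",
        f"lvcreate --type cache-pool -n {lv}-cache -l 90%PVS {vg} {fast_dev}",
        # attach the cache pool to the data LV (dm-cache)
        f"lvconvert --type cache --cachepool {vg}/{lv}-cache {vg}/{lv}",
        # layer dm-crypt on top of the cached LV
        f"cryptsetup luksFormat /dev/{vg}/{lv}",
        f"cryptsetup open /dev/{vg}/{lv} {lv}-crypt",
    ]

for cmd in cached_encrypted_lv_commands("/dev/sdb", "/dev/nvme0n1"):
    print(cmd)
```

Every one of those steps has failure modes and ordering constraints at boot, which is exactly the kind of bookkeeping that gets easier when the tool creating the OSD also records what it built.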
Warren Wang

On 6/8/18, 11:05 AM, "ceph-users on behalf of Konstantin Shalygin" wrote:

> - ceph-disk was replaced for two reasons: (1) Its design was centered around udev, and it was terrible. We have been plagued for years with bugs due to race conditions in the udev-driven activation of OSDs, mostly variations of "I rebooted and not all of my OSDs started." It's horrible to observe and horrible to debug. (2) It was based on GPT partitions; lots of people had block-layer tools they wanted to use that were LVM-based, and the two didn't mix (no GPT partitions on top of LVs).
>
> - We designed ceph-volume to be *modular* because we anticipate that there are going to be lots of ways that people provision the hardware devices that we need to consider. There are already two: legacy ceph-disk devices that are still in use and have GPT partitions (handled by 'simple'), and lvm. SPDK devices, where we manage NVMe devices directly from userspace, are on the immediate horizon--obviously LVM won't work there since the kernel isn't involved at all. We can add any other schemes we like.
>
> - If you don't like LVM (e.g., because you find that there is a measurable overhead), let's design a new approach! I wouldn't bother unless you can actually measure an impact. But if you can demonstrate a measurable cost, let's do it.
>
> - LVM was chosen as the default approach for new devices for a few reasons:
>   - It allows you to attach arbitrary metadata to each device, like which cluster uuid it belongs to, which osd uuid it belongs to, which type of device it is (primary, db, wal, journal), any secrets needed to fetch its decryption key from a keyserver (the mon by default), and so on.
>   - One of the goals was to enable lvm-based block-layer modules beneath OSDs (dm-cache). All of the other devicemapper-based tools we are aware of work with LVM. It was a hammer that hit all nails.
> - The 'simple' mode is the current 'out' that avoids using LVM if it's not an option for you. We only implemented scan and activate because that was all that we saw a current need for. It should be quite easy to add the ability to create new OSDs.
>
> I would caution you, though, that simple relies on a file in /etc/ceph that has the metadata about the devices. If you lose that file, you need to have some way to rebuild it, or we won't know what to do with your devices. That means you should make the devices self-describing in some way... not, say, a raw device with dm-crypt layered directly on top, or some other option that makes it impossible to tell what it is. As long as you can implement 'scan' and get any other info you need (e.g., whatever is necessary to fetch decryption keys), then great.

Thanks, I got what I wanted. This is the form in which deprecations should be presented to the community: "why we are doing this, and what it will give us."
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
I'll chime in as a large-scale operator, and a strong proponent of ceph-volume. Ceph-disk wasn't accomplishing what was needed with anything other than vanilla use cases (even then, still kind of broken). I'm not going to re-hash Sage's valid points too much, but manipulating the old ceph-disk to work with your own LVM (or other block manager) was a struggle. As far as the pain of doing something new goes, yes, sometimes moving to newer, more flexible methods results in a large amount of work. Trust me, I feel that pain when we're talking about things like ceph-volume, bluestore, etc., but these changes are not made without reason.

As far as LVM performance goes, I think that's well understood in the larger Linux community. We accept that minimal overhead to accomplish some of the setups we're interested in, such as encrypted, lvm-cached OSDs. The above is not a trivial thing to do using ceph-disk. We know; we run that in production, at large scale. It's plagued with problems, and since it's done outside Ceph itself, it is difficult to tie the two together. Having it managed directly by Ceph, via ceph-volume, makes much more sense. We're not alone in this, so I know it will benefit others as well, at the cost of some technical expertise. There are maintainers now for ceph-volume, so if there's something you don't like, I suggest proposing a change.

Warren Wang

On 6/8/18, 11:05 AM, "ceph-users on behalf of Konstantin Shalygin" wrote:

> - ceph-disk was replaced for two reasons: (1) Its design was centered around udev, and it was terrible. We have been plagued for years with bugs due to race conditions in the udev-driven activation of OSDs, mostly variations of "I rebooted and not all of my OSDs started." It's horrible to observe and horrible to debug. (2) It was based on GPT partitions; lots of people had block-layer tools they wanted to use that were LVM-based, and the two didn't mix (no GPT partitions on top of LVs).
> - We designed ceph-volume to be *modular* because we anticipate that there are going to be lots of ways that people provision the hardware devices that we need to consider. There are already two: legacy ceph-disk devices that are still in use and have GPT partitions (handled by 'simple'), and lvm. SPDK devices, where we manage NVMe devices directly from userspace, are on the immediate horizon--obviously LVM won't work there since the kernel isn't involved at all. We can add any other schemes we like.
>
> - If you don't like LVM (e.g., because you find that there is a measurable overhead), let's design a new approach! I wouldn't bother unless you can actually measure an impact. But if you can demonstrate a measurable cost, let's do it.
>
> - LVM was chosen as the default approach for new devices for a few reasons:
>   - It allows you to attach arbitrary metadata to each device, like which cluster uuid it belongs to, which osd uuid it belongs to, which type of device it is (primary, db, wal, journal), any secrets needed to fetch its decryption key from a keyserver (the mon by default), and so on.
>   - One of the goals was to enable lvm-based block-layer modules beneath OSDs (dm-cache). All of the other devicemapper-based tools we are aware of work with LVM. It was a hammer that hit all nails.
>
> - The 'simple' mode is the current 'out' that avoids using LVM if it's not an option for you. We only implemented scan and activate because that was all that we saw a current need for. It should be quite easy to add the ability to create new OSDs.
>
> I would caution you, though, that simple relies on a file in /etc/ceph that has the metadata about the devices. If you lose that file, you need to have some way to rebuild it, or we won't know what to do with your devices. That means you should make the devices self-describing in some way...
> not, say, a raw device with dm-crypt layered directly on top, or some other option that makes it impossible to tell what it is. As long as you can implement 'scan' and get any other info you need (e.g., whatever is necessary to fetch decryption keys), then great.

Thanks, I got what I wanted. This is the form in which deprecations should be presented to the community: "why we are doing this, and what it will give us." Instead, it was presented as: "We are killing the tool along with its functionality; you should use the new one as-is, even if you do not know what it does."

Thanks again, Sage. I think this post should be on the Ceph blog.

k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
- ceph-disk was replaced for two reasons: (1) Its design was centered around udev, and it was terrible. We have been plagued for years with bugs due to race conditions in the udev-driven activation of OSDs, mostly variations of "I rebooted and not all of my OSDs started." It's horrible to observe and horrible to debug. (2) It was based on GPT partitions; lots of people had block-layer tools they wanted to use that were LVM-based, and the two didn't mix (no GPT partitions on top of LVs).

- We designed ceph-volume to be *modular* because we anticipate that there are going to be lots of ways that people provision the hardware devices that we need to consider. There are already two: legacy ceph-disk devices that are still in use and have GPT partitions (handled by 'simple'), and lvm. SPDK devices, where we manage NVMe devices directly from userspace, are on the immediate horizon--obviously LVM won't work there since the kernel isn't involved at all. We can add any other schemes we like.

- If you don't like LVM (e.g., because you find that there is a measurable overhead), let's design a new approach! I wouldn't bother unless you can actually measure an impact. But if you can demonstrate a measurable cost, let's do it.

- LVM was chosen as the default approach for new devices for a few reasons:
  - It allows you to attach arbitrary metadata to each device, like which cluster uuid it belongs to, which osd uuid it belongs to, which type of device it is (primary, db, wal, journal), any secrets needed to fetch its decryption key from a keyserver (the mon by default), and so on.
  - One of the goals was to enable lvm-based block-layer modules beneath OSDs (dm-cache). All of the other devicemapper-based tools we are aware of work with LVM. It was a hammer that hit all nails.

- The 'simple' mode is the current 'out' that avoids using LVM if it's not an option for you. We only implemented scan and activate because that was all that we saw a current need for.
It should be quite easy to add the ability to create new OSDs.

I would caution you, though, that simple relies on a file in /etc/ceph that has the metadata about the devices. If you lose that file, you need to have some way to rebuild it, or we won't know what to do with your devices. That means you should make the devices self-describing in some way... not, say, a raw device with dm-crypt layered directly on top, or some other option that makes it impossible to tell what it is. As long as you can implement 'scan' and get any other info you need (e.g., whatever is necessary to fetch decryption keys), then great.

Thanks, I got what I wanted. This is the form in which deprecations should be presented to the community: "why we are doing this, and what it will give us." Instead, it was presented as: "We are killing the tool along with its functionality; you should use the new one as-is, even if you do not know what it does."

Thanks again, Sage. I think this post should be on the Ceph blog.

k
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
On Fri, 8 Jun 2018, Alfredo Deza wrote:
> On Fri, Jun 8, 2018 at 8:13 AM, Sage Weil wrote:
> > I'm going to jump in here with a few points.
> >
> > - ceph-disk was replaced for two reasons: (1) Its design was centered around udev, and it was terrible. We have been plagued for years with bugs due to race conditions in the udev-driven activation of OSDs, mostly variations of "I rebooted and not all of my OSDs started." It's horrible to observe and horrible to debug. (2) It was based on GPT partitions; lots of people had block-layer tools they wanted to use that were LVM-based, and the two didn't mix (no GPT partitions on top of LVs).
> >
> > - We designed ceph-volume to be *modular* because we anticipate that there are going to be lots of ways that people provision the hardware devices that we need to consider. There are already two: legacy ceph-disk devices that are still in use and have GPT partitions (handled by 'simple'), and lvm. SPDK devices, where we manage NVMe devices directly from userspace, are on the immediate horizon--obviously LVM won't work there since the kernel isn't involved at all. We can add any other schemes we like.
> >
> > - If you don't like LVM (e.g., because you find that there is a measurable overhead), let's design a new approach! I wouldn't bother unless you can actually measure an impact. But if you can demonstrate a measurable cost, let's do it.
> >
> > - LVM was chosen as the default approach for new devices for a few reasons:
> >   - It allows you to attach arbitrary metadata to each device, like which cluster uuid it belongs to, which osd uuid it belongs to, which type of device it is (primary, db, wal, journal), any secrets needed to fetch its decryption key from a keyserver (the mon by default), and so on.
> >   - One of the goals was to enable lvm-based block-layer modules beneath OSDs (dm-cache).
> > All of the other devicemapper-based tools we are aware of work with LVM. It was a hammer that hit all nails.
> >
> > - The 'simple' mode is the current 'out' that avoids using LVM if it's not an option for you. We only implemented scan and activate because that was all that we saw a current need for. It should be quite easy to add the ability to create new OSDs.
> >
> > I would caution you, though, that simple relies on a file in /etc/ceph that has the metadata about the devices. If you lose that file, you need to have some way to rebuild it, or we won't know what to do with your devices. That means you should make the devices self-describing in some way... not, say, a raw device with dm-crypt layered directly on top, or some other option that makes it impossible to tell what it is. As long as you can implement 'scan' and get any other info you need (e.g., whatever is necessary to fetch decryption keys), then great.
>
> 'scan' allows you to recreate that file from a data device or from an OSD directory (e.g. /var/lib/ceph/osd/ceph-0/)
>
> So even in the case of disaster (or migrating) we can still get that file again. This includes the ability to detect both ceph-disk's encryption support as well as regular OSDs.
>
> Do you mean that there might be situations where 'scan' wouldn't be able to recreate this file? I think the out would be if the OSD is mounted/available already.

Right, it works great for the GPT-style ceph-disk devices. I'm just cautioning that if someone wants to implement a *new* mode that doesn't use lvm or the legacy ceph-disk scheme and "uses raw devices for lower overhead" (whatever that ends up meaning), it should be done in a way such that scan can be implemented.

sage
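The per-device metadata discussed above lives as tags on the LVs themselves, which is part of what makes LVM-backed OSDs self-describing. A small sketch of reading such tags back, assuming output in the shape of `lvs --reportformat json -o lv_name,lv_tags` and the `ceph.*` tag naming convention; the exact tag set varies by ceph-volume version, so the sample is illustrative.

```python
import json

# Canned sample in the shape of `lvs --reportformat json -o lv_name,lv_tags`.
# The "ceph.*" tag names follow the convention ceph-volume uses; treat the
# exact fields and values as illustrative.
SAMPLE = """{"report": [{"lv": [
  {"lv_name": "osd-block-0",
   "lv_tags": "ceph.cluster_fsid=11111111-2222-3333-4444-555555555555,ceph.osd_id=0,ceph.type=block"}
]}]}"""

def parse_lv_tags(lvs_json):
    """Map each LV name to a dict parsed from its comma-separated tag string."""
    result = {}
    for report in json.loads(lvs_json)["report"]:
        for lv in report["lv"]:
            tags = dict(t.split("=", 1) for t in lv["lv_tags"].split(",") if t)
            result[lv["lv_name"]] = tags
    return result

tags = parse_lv_tags(SAMPLE)
print(tags["osd-block-0"]["ceph.osd_id"])   # prints: 0
```

In a real cluster the input would come from actually running `lvs` rather than a canned string; the point is that activation can recover which cluster and OSD a device belongs to from the device itself, with no external state to lose.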
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
On Fri, Jun 8, 2018 at 8:13 AM, Sage Weil wrote:
> I'm going to jump in here with a few points.
>
> - ceph-disk was replaced for two reasons: (1) Its design was centered around udev, and it was terrible. We have been plagued for years with bugs due to race conditions in the udev-driven activation of OSDs, mostly variations of "I rebooted and not all of my OSDs started." It's horrible to observe and horrible to debug. (2) It was based on GPT partitions; lots of people had block-layer tools they wanted to use that were LVM-based, and the two didn't mix (no GPT partitions on top of LVs).
>
> - We designed ceph-volume to be *modular* because we anticipate that there are going to be lots of ways that people provision the hardware devices that we need to consider. There are already two: legacy ceph-disk devices that are still in use and have GPT partitions (handled by 'simple'), and lvm. SPDK devices, where we manage NVMe devices directly from userspace, are on the immediate horizon--obviously LVM won't work there since the kernel isn't involved at all. We can add any other schemes we like.
>
> - If you don't like LVM (e.g., because you find that there is a measurable overhead), let's design a new approach! I wouldn't bother unless you can actually measure an impact. But if you can demonstrate a measurable cost, let's do it.
>
> - LVM was chosen as the default approach for new devices for a few reasons:
>   - It allows you to attach arbitrary metadata to each device, like which cluster uuid it belongs to, which osd uuid it belongs to, which type of device it is (primary, db, wal, journal), any secrets needed to fetch its decryption key from a keyserver (the mon by default), and so on.
>   - One of the goals was to enable lvm-based block-layer modules beneath OSDs (dm-cache). All of the other devicemapper-based tools we are aware of work with LVM. It was a hammer that hit all nails.
> - The 'simple' mode is the current 'out' that avoids using LVM if it's not an option for you. We only implemented scan and activate because that was all that we saw a current need for. It should be quite easy to add the ability to create new OSDs.
>
> I would caution you, though, that simple relies on a file in /etc/ceph that has the metadata about the devices. If you lose that file, you need to have some way to rebuild it, or we won't know what to do with your devices. That means you should make the devices self-describing in some way... not, say, a raw device with dm-crypt layered directly on top, or some other option that makes it impossible to tell what it is. As long as you can implement 'scan' and get any other info you need (e.g., whatever is necessary to fetch decryption keys), then great.

'scan' allows you to recreate that file from a data device or from an OSD directory (e.g. /var/lib/ceph/osd/ceph-0/)

So even in the case of disaster (or migrating) we can still get that file again. This includes the ability to detect both ceph-disk's encryption support as well as regular OSDs.

Do you mean that there might be situations where 'scan' wouldn't be able to recreate this file? I think the out would be if the OSD is mounted/available already.

> sage
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
I'm going to jump in here with a few points.

- ceph-disk was replaced for two reasons: (1) Its design was centered around udev, and it was terrible. We have been plagued for years with bugs due to race conditions in the udev-driven activation of OSDs, mostly variations of "I rebooted and not all of my OSDs started." It's horrible to observe and horrible to debug. (2) It was based on GPT partitions; lots of people had block-layer tools they wanted to use that were LVM-based, and the two didn't mix (no GPT partitions on top of LVs).

- We designed ceph-volume to be *modular* because we anticipate that there are going to be lots of ways that people provision the hardware devices that we need to consider. There are already two: legacy ceph-disk devices that are still in use and have GPT partitions (handled by 'simple'), and lvm. SPDK devices, where we manage NVMe devices directly from userspace, are on the immediate horizon--obviously LVM won't work there since the kernel isn't involved at all. We can add any other schemes we like.

- If you don't like LVM (e.g., because you find that there is a measurable overhead), let's design a new approach! I wouldn't bother unless you can actually measure an impact. But if you can demonstrate a measurable cost, let's do it.

- LVM was chosen as the default approach for new devices for a few reasons:
  - It allows you to attach arbitrary metadata to each device, like which cluster uuid it belongs to, which osd uuid it belongs to, which type of device it is (primary, db, wal, journal), any secrets needed to fetch its decryption key from a keyserver (the mon by default), and so on.
  - One of the goals was to enable lvm-based block-layer modules beneath OSDs (dm-cache). All of the other devicemapper-based tools we are aware of work with LVM. It was a hammer that hit all nails.

- The 'simple' mode is the current 'out' that avoids using LVM if it's not an option for you.
We only implemented scan and activate because that was all that we saw a current need for. It should be quite easy to add the ability to create new OSDs.

I would caution you, though, that simple relies on a file in /etc/ceph that has the metadata about the devices. If you lose that file, you need to have some way to rebuild it, or we won't know what to do with your devices. That means you should make the devices self-describing in some way... not, say, a raw device with dm-crypt layered directly on top, or some other option that makes it impossible to tell what it is. As long as you can implement 'scan' and get any other info you need (e.g., whatever is necessary to fetch decryption keys), then great.

sage
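The /etc/ceph file Sage warns about is a small per-OSD metadata descriptor written by 'ceph-volume simple scan'. A minimal sketch of the idea, where the file name pattern and the fields are a simplified guess rather than the real schema (the real file records considerably more):

```python
import json
import os
import tempfile

# Sketch: 'simple' mode keeps one JSON descriptor per OSD under
# /etc/ceph/osd/ (named roughly "<id>-<fsid>.json"); activation depends
# entirely on it, which is why it must be reconstructible via 'scan'.
# Field names here are a simplified guess, not the real schema.
osd_id, osd_fsid = 0, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
meta = {
    "type": "bluestore",
    "fsid": osd_fsid,
    "data": {"path": "/dev/sdb1"},
}

etc_dir = tempfile.mkdtemp()  # stands in for /etc/ceph/osd in this sketch
path = os.path.join(etc_dir, f"{osd_id}-{osd_fsid}.json")
with open(path, "w") as f:
    json.dump(meta, f, indent=2)

# A minimal "activate" step just looks the descriptor up and reads it back:
with open(path) as f:
    loaded = json.load(f)
print(loaded["data"]["path"])   # prints: /dev/sdb1
```

Lose that file with no way to regenerate it from the device, and nothing knows what /dev/sdb1 is; that is Sage's self-describing requirement in concrete form.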
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
http://docs.ceph.com/docs/master/ceph-volume/simple/ ?

Only 'scan' & 'activate'. Not 'create'.

k
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
On Fri, 8 Jun 2018 at 12:35, Marc Roos wrote:
>
> I am getting the impression that not everyone understands the subject
> that has been raised here.
>

Or they do, and they do not agree with your vision of how things should be done. That is a distinct possibility one has to consider when using someone else's code. I seem to recall you being told what would be required in order to get ceph-disk back. The Ceph programmers are not willing to do the work for you, so if you want it, you will have to put in the effort.

I think the point where "more mails on the same subject" will have an effect is past now, regardless of whether you are right or wrong in your stance. Time to decide if you can bear the horror of LVM or not. If not, then ceph might not be for you, and that is ok. No product or program will cover everyone's needs, but can we stop rehashing it over and over?

--
May the most significant bit of your life be positive.
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
> Answers:
> - unify setup, support for crypto & more

Unify setup by adding a dependency? There is / should already be support for crypto now, no?

> - none

The costs of lvm can be argued. Something to go through is worse than nothing to go through.

https://www.researchgate.net/publication/284897601_LVM_in_the_Linux_environment_Performance_examination
https://hrcak.srce.hr/index.php?show=clanak_clanak_jezik=216661

If there is no cost, then there is no discussion. But I cannot believe there is no cost. And if there is a cost, then the reason for adding it should not be something like unifying setup or crypto. Ceph has been around a long time already, and it does not look like its users had many problems. What about the clusters that have thousands of disks? Have them migrate to lvm just for fun? Direct disk access should stay.

-----Original Message-----
From: c...@jack.fr.eu.org [mailto:c...@jack.fr.eu.org]
Sent: Friday, 8 June 2018 12:47
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

Beuh ... I have other questions:
- why not use LVM, and stick with direct disk access?
- what are the costs of LVM (performance, latency, etc.)?

Answers:
- unify setup, support for crypto & more
- none

Tldr: that technical choice is fine, nothing to argue about.

On 06/08/2018 07:15 AM, Marc Roos wrote:
>
> I am getting the impression that not everyone understands the subject
> that has been raised here.
>
> Why do osd's need to be via lvm, and why not stick with direct disk
> access as it is now?
>
> - Bluestore was created to cut out some fs overhead,
> - everywhere 10Gb is recommended because of better latency.
> (I even posted here something to make ceph perform better with 1Gb eth, disregarded because it would add complexity; fine, I can understand.)
>
> And then, because of some start-up/automation issues (because that is the only thing being mentioned here for now), let's add the lvm tier? Introducing a layer that is constantly there and adds some overhead (maybe not that much) for every read and write operation?
>
> -----Original Message-----
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: Friday, 8 June 2018 12:14
> To: 'Konstantin Shalygin'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
>
> http://docs.ceph.com/docs/master/ceph-volume/simple/
>
> ?
>
> From: ceph-users On Behalf Of Konstantin Shalygin
> Sent: 08 June 2018 11:11
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
>
> What is the reasoning behind switching to lvm? Does it make sense to go through (yet) another layer to access the disk? Why create this dependency and added complexity? It is fine as it is, or not?
>
> In fact, the question is why one tool is replaced by another without preserving its functionality. Why lvm, why not bcache?
>
> It seems to me that someone on the dev team has pushed the idea that lvm solves all problems. But this also adds overhead, and since it is a kernel module, an update can bring a performance drop, changes in module settings, etc. I understand that for Red Hat Storage this is a solution, but for a community with different distributions and hardware this may be superfluous. I would like the possibility of preparing osd's with direct access to be restored, even if it is not the default. Also, this would keep configurations for ceph-ansible working.
> Actually, I didn't even know whether ceph-disk, ceph-volume, or something else created my osd's before this deprecation.
>
> k
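Marc's underlying point, that a claimed LVM cost should be measured rather than asserted, matches Sage's "measure an impact" challenge earlier in the thread. A rough harness for such a comparison: time random 4 KiB reads on two paths and compare the averages. The demo below runs against a temp file for portability; a real test would point it at, say, a raw device and an LV on the same disk, and would use O_DIRECT with aligned buffers to bypass the page cache, which this sketch omits.

```python
import os
import random
import tempfile
import time

# Average latency of random 4 KiB pread()s on a path. Run it once on a
# raw device and once on an LV over the same disk to estimate the LVM
# overhead. (For a serious test, open with O_DIRECT and use aligned
# buffers; this portable sketch goes through the page cache.)
def avg_read_latency_us(path, reads=1000, block=4096):
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            off = random.randrange(0, max(1, size - block))
            os.pread(fd, block, off)
        return (time.perf_counter() - start) / reads * 1e6
    finally:
        os.close(fd)

# Demo on a throwaway 1 MiB file standing in for a block device.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(1 << 20))
    demo_path = f.name
print(f"{avg_read_latency_us(demo_path):.1f} us/read")
```

If the two averages differ by less than the run-to-run noise, the overhead question is settled in practice for that workload; if not, there is a concrete number to bring to the list.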
Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
Beuh ... I have other questions:
- why not use LVM, and stick with direct disk access?
- what are the costs of LVM (performance, latency, etc.)?

Answers:
- unify setup, support for crypto & more
- none

Tldr: that technical choice is fine, nothing to argue about.

On 06/08/2018 07:15 AM, Marc Roos wrote:
>
> I am getting the impression that not everyone understands the subject
> that has been raised here.
>
> Why do osd's need to be via lvm, and why not stick with direct disk
> access as it is now?
>
> - Bluestore was created to cut out some fs overhead,
> - everywhere 10Gb is recommended because of better latency. (I even posted here something to make ceph perform better with 1Gb eth, disregarded because it would add complexity; fine, I can understand.)
>
> And then, because of some start-up/automation issues (because that is the only thing being mentioned here for now), let's add the lvm tier? Introducing a layer that is constantly there and adds some overhead (maybe not that much) for every read and write operation?
>
> -----Original Message-----
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: Friday, 8 June 2018 12:14
> To: 'Konstantin Shalygin'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
>
> http://docs.ceph.com/docs/master/ceph-volume/simple/
>
> ?
>
> From: ceph-users On Behalf Of Konstantin Shalygin
> Sent: 08 June 2018 11:11
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)
>
> What is the reasoning behind switching to lvm? Does it make sense to go through (yet) another layer to access the disk? Why create this dependency and added complexity? It is fine as it is, or not?
>
> In fact, the question is why one tool is replaced by another without preserving its functionality. Why lvm, why not bcache?
> It seems to me that someone on the dev team has pushed the idea that lvm solves all problems. But this also adds overhead, and since it is a kernel module, an update can bring a performance drop, changes in module settings, etc. I understand that for Red Hat Storage this is a solution, but for a community with different distributions and hardware this may be superfluous. I would like the possibility of preparing osd's with direct access to be restored, even if it is not the default. Also, this would keep configurations for ceph-ansible working. Actually, I didn't even know whether ceph-disk, ceph-volume, or something else created my osd's before this deprecation.
>
> k
Yes, it is indeed difficult to find a good balance between asking multiple things in one email, with the risk that not all are answered, and posting them as individual questions.

-Original Message-
From: David Turner [mailto:drakonst...@gmail.com]
Sent: donderdag 31 mei 2018 23:50
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

You are also making this entire conversation INCREDIBLY difficult to follow by creating so many new email threads instead of sticking with one.

On Thu, May 31, 2018 at 5:48 PM David Turner wrote:

Your question assumes that ceph-disk was a good piece of software. It had a bug list a mile long and nobody working on it. A common example was how simple it was to mess up any part of the dozens of components that allowed an OSD to autostart on boot. One of the biggest problems was when ceph-disk was doing its thing and an OSD would take longer than 3 minutes to start, and ceph-disk would give up on it.

That is a little bit about why a new solution was sought and why ceph-disk is being removed entirely. LVM was a choice made to implement something other than partitions and udev magic while still incorporating the information needed from all of that in a better solution. There has been a lot of talk about this on the ML.

On Thu, May 31, 2018 at 5:23 PM Marc Roos wrote:

What is the reasoning behind switching to LVM? Does it make sense to go through (yet) another layer to access the disk? Why create this dependency and added complexity? It is fine as it is, or not?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
On Thu, May 31, 2018 at 10:33 PM, Marc Roos wrote:
>
> I actually tried to search the ML before bringing up this topic,
> because I do not get the logic of choosing this direction.
>
> - Bluestore was created to cut out some fs overhead,
> - everywhere 10Gb is recommended because of better latency. (I even
> posted here something to make Ceph perform better with 1Gb eth,
> disregarded because it would add complexity; fine, I can understand)
>
> And then, because of some start-up/automation issues, let's add the
> LVM tier? Introducing a layer that is constantly there and adds some
> overhead (maybe not that much) for every read and write operation?
>
> I see ceph-disk as a tool to prepare the OSD and then do the rest
> myself, without ceph-deploy or ansible, because I trust what I type
> more than what someone else scripted. I don't have any startup
> problems.

You can certainly do that with ceph-volume. You can create the OSD manually, and then add the information about your OSD (drives, locations, fsid, uuids, etc.) in /etc/ceph/osd/

This is how we are able to take over ceph-disk deployed OSDs.

See: http://docs.ceph.com/docs/master/ceph-volume/simple/scan/#scan

> Do assume I am not an expert in any field. But it is understandable
> that putting something (LVM) between you and the disk, instead of
> nothing, should have a performance penalty.
> I know you can hack around nicely with disks and LVM, but those pros
> fall into the same category of questions people are asking about
> putting disks in RAID.
>
> Let alone the risk that you are taking if there turns out to be a
> significant performance penalty:
> https://www.researchgate.net/publication/284897601_LVM_in_the_Linux_environment_Performance_examination
> https://hrcak.srce.hr/index.php?show=clanak_clanak_jezik=216661
>
> -Original Message-
> From: David Turner [mailto:drakonst...@gmail.com]
> Sent: donderdag 31 mei 2018 23:48
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume
> and lvm? (and just not stick with direct disk access)
>
> Your question assumes that ceph-disk was a good piece of software. It
> had a bug list a mile long and nobody working on it. A common example
> was how simple it was to mess up any part of the dozens of components
> that allowed an OSD to autostart on boot. One of the biggest problems
> was when ceph-disk was doing its thing and an OSD would take longer
> than 3 minutes to start, and ceph-disk would give up on it.
>
> That is a little bit about why a new solution was sought and why
> ceph-disk is being removed entirely. LVM was a choice made to
> implement something other than partitions and udev magic while still
> incorporating the information needed from all of that in a better
> solution. There has been a lot of talk about this on the ML.
>
> On Thu, May 31, 2018 at 5:23 PM Marc Roos wrote:
>
> What is the reasoning behind switching to LVM? Does it make sense
> to go through (yet) another layer to access the disk? Why create this
> dependency and added complexity? It is fine as it is, or not?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
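The `simple scan` approach linked above works by capturing an existing OSD's layout into a JSON file under /etc/ceph/osd/, which `ceph-volume simple activate` later reads instead of relying on udev. A minimal sketch of the round trip; the field names and values below are a simplified, illustrative subset, not the full schema ceph-volume actually writes, and a temp directory stands in for /etc/ceph/osd/:

```python
import json
import tempfile
from pathlib import Path

# Illustrative subset of what `ceph-volume simple scan` records about an
# OSD. The real file is named "<osd id>-<osd fsid>.json" and contains
# considerably more fields; paths and the fsid here are placeholders.
osd_meta = {
    "osd_id": 0,
    "osd_fsid": "00000000-0000-0000-0000-000000000000",
    "type": "bluestore",
    "data": {"path": "/dev/sda1"},
    "block": {"path": "/dev/sda2"},
}

# "scan": persist the metadata (temp dir stands in for /etc/ceph/osd/).
osd_dir = Path(tempfile.mkdtemp())
meta_path = osd_dir / f"{osd_meta['osd_id']}-{osd_meta['osd_fsid']}.json"
meta_path.write_text(json.dumps(osd_meta, indent=2))

# "activate": read the file back; a real activation would then mount the
# data path and start the OSD daemon from this information alone.
loaded = json.loads(meta_path.read_text())
print(loaded["type"], loaded["data"]["path"])  # -> bluestore /dev/sda1
```

The point of the design is that activation needs nothing but this file: no GPT partition type GUIDs, no udev rules, and it is how manually created or legacy ceph-disk OSDs can be taken over without redeploying them.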