Re: [ceph-users] Ceph QoS user stories
Hi Sage,

Could we refactor the IO priority strategy at the same time, based on the considerations below?

2016-12-03 17:21 GMT+08:00 Ning Yao:
> Hi, all
>
> Currently, we can modify osd_client_op_priority to assign different
> clients' ops different priorities; for example, we can assign a high
> priority to OLTP and a low priority to OLAP. However, there are some
> considerations:
>
> 1) It seems an OLTP client op can still be blocked by OLAP sub_ops,
> since sub_ops use CEPH_MSG_PRIO_DEFAULT. Should sub_ops inherit the
> message priority from the client op (falling back to
> CEPH_MSG_PRIO_DEFAULT when the client op does not set a priority
> explicitly)? Does this make sense?
>
> 2) Secondly, reply messages are assigned priority CEPH_MSG_PRIO_HIGH,
> but there is no restriction on a client op's priority (a user can set
> 210), which can block reply messages. Should we change those messages
> to the highest priority (CEPH_MSG_PRIO_HIGHEST)? Currently, no ops
> seem to use CEPH_MSG_PRIO_HIGHEST.
>
> 3) I think kicked recovery ops should inherit the client op's priority.
>
> 4) Is it possible to add test cases to ceph-qa-suite to verify that
> this works as expected, as Sam mentioned before? Any guidelines?

Regards,
Ning Yao

2016-12-03 3:01 GMT+08:00 Sage Weil:
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees. The work is based on the
> mClock paper published at OSDI '10:
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
> - We can use mClock simply as a better way to prioritize background
>   activity (scrub, snap trimming, recovery, rebalancing) against
>   client IO.
> - We can use dmClock to set QoS parameters (e.g., min IOPS or
>   proportional priority/weight) on RADOS pools.
> - We can use dmClock to set QoS parameters (e.g., min IOPS) for
>   individual clients.
>
> Once the RADOS capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure
> and set policy. In order to make sure we build something that makes
> sense, I'd like to collect a set of user stories that we'd like to
> support, so that we can make sure we capture everything (or at least
> the important things).
>
> Please add any use cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth;
> they are sort of reduced into a single unit of work), a maximum (i.e.,
> a simple cap), and a proportional weighting (to allocate any
> additional capacity after the minimum allocations are satisfied). It's
> somewhat flexible in terms of how we apply it to specific clients,
> classes of clients, or types of work (e.g., recovery). How we put it
> all together really depends on what kinds of things we need to
> accomplish (e.g., do we need to support a guaranteed level of service
> shared across a specific set of N different clients, or only
> individual clients?).
>
> Thanks!
> sage
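To make points 1) through 3) concrete, here is a minimal, self-contained sketch of the proposed rules. The CEPH_MSG_PRIO_* values mirror include/msgr.h in the Ceph tree, but the surrounding type and helpers are simplified stand-ins, not the real Message/sub_op code:

#include <algorithm>

// Values mirror CEPH_MSG_PRIO_* in include/msgr.h of the Ceph tree.
static const int CEPH_MSG_PRIO_DEFAULT = 127;
static const int CEPH_MSG_PRIO_HIGH    = 196;
static const int CEPH_MSG_PRIO_HIGHEST = 255;

struct Op {
  int priority = CEPH_MSG_PRIO_DEFAULT;
};

// 1) and 3): a replicated sub_op (or a kicked recovery op) inherits the
// originating client op's priority instead of always getting
// CEPH_MSG_PRIO_DEFAULT.
int derived_op_priority(const Op& client_op) {
  return client_op.priority;
}

// 2) Reply messages move from CEPH_MSG_PRIO_HIGH to CEPH_MSG_PRIO_HIGHEST,
// and client-supplied priorities are clamped below that, so a client op
// set to 210 can no longer starve replies.
int reply_priority() { return CEPH_MSG_PRIO_HIGHEST; }

int clamp_client_priority(int requested) {
  return std::min(requested, CEPH_MSG_PRIO_HIGHEST - 1);
}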
Re: [ceph-users] Ceph QoS user stories
Hi Sage,

The primary QoS issue we see with OpenStack users is the desire to guarantee minimum IOPS to each Cinder-mounted RBD volume, as a way to protect well-mannered workloads from badly-behaving ones.

As an OpenStack administrator, I want to guarantee a minimum number of IOPS to each Cinder volume to prevent any tenant from interfering with another. The number of IOPS may vary per volume, but in many cases a "standard" and a "high" tier would probably suffice; the guarantee is more important than the granularity. This is something impacting users at today's Ceph performance level.

Looking to the future, once BlueStore becomes the default there will also be latency requirements from the crowd that wants to run databases on RBD backends: both low latency and low jitter in that latency. Rather than applying this to all volumes, it would apply only to select ones backing RDBMSs, at least in the case of a general-purpose cluster.

My hunch is that enterprise users who want hard QoS guarantees will accept that a capacity-planning exercise is necessary: software can only allocate existing capacity, not create more. Community users may place more value on "fairness" in distributing the existing resources. Just a hunch at this point.

Best -F

--
"You must try until your brain hurts." -- Elon Musk
(Federico L. Lucifredi) - federico at redhat.com - GnuPG 0x4A73884C

On Fri, Dec 2, 2016 at 2:01 PM, Sage Weil wrote:
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees. The work is based on the
> mClock paper published at OSDI '10:
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
> - We can use mClock simply as a better way to prioritize background
>   activity (scrub, snap trimming, recovery, rebalancing) against
>   client IO.
> - We can use dmClock to set QoS parameters (e.g., min IOPS or
>   proportional priority/weight) on RADOS pools.
> - We can use dmClock to set QoS parameters (e.g., min IOPS) for
>   individual clients.
>
> Once the RADOS capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure
> and set policy. In order to make sure we build something that makes
> sense, I'd like to collect a set of user stories that we'd like to
> support, so that we can make sure we capture everything (or at least
> the important things).
>
> Please add any use cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth;
> they are sort of reduced into a single unit of work), a maximum (i.e.,
> a simple cap), and a proportional weighting (to allocate any
> additional capacity after the minimum allocations are satisfied). It's
> somewhat flexible in terms of how we apply it to specific clients,
> classes of clients, or types of work (e.g., recovery). How we put it
> all together really depends on what kinds of things we need to
> accomplish (e.g., do we need to support a guaranteed level of service
> shared across a specific set of N different clients, or only
> individual clients?).
>
> Thanks!
> sage
Re: [ceph-users] Ceph QoS user stories
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage Weil
> Sent: 02 December 2016 19:02
> To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
> Subject: [ceph-users] Ceph QoS user stories
>
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees. The work is based on the
> mClock paper published at OSDI '10:
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
> - We can use mClock simply as a better way to prioritize background
>   activity (scrub, snap trimming, recovery, rebalancing) against
>   client IO.
> - We can use dmClock to set QoS parameters (e.g., min IOPS or
>   proportional priority/weight) on RADOS pools.
> - We can use dmClock to set QoS parameters (e.g., min IOPS) for
>   individual clients.
>
> Once the RADOS capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure
> and set policy. In order to make sure we build something that makes
> sense, I'd like to collect a set of user stories that we'd like to
> support, so that we can make sure we capture everything (or at least
> the important things).
>
> Please add any use cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth;
> they are sort of reduced into a single unit of work), a maximum (i.e.,
> a simple cap), and a proportional weighting (to allocate any
> additional capacity after the minimum allocations are satisfied). It's
> somewhat flexible in terms of how we apply it to specific clients,
> classes of clients, or types of work (e.g., recovery). How we put it
> all together really depends on what kinds of things we need to
> accomplish (e.g., do we need to support a guaranteed level of service
> shared across a specific set of N different clients, or only
> individual clients?).
>
> Thanks!
> sage

Hi Sage,

You mention IOPS and bandwidth, but would this be applicable to latency as well? Some client operations (buffered IO) can hit several hundred IOPS with terrible latency if the queue depth is high enough, when the intended requirement might have been a more responsive application.

Would it be possible to apply some sort of shares system to the minimum allocation? I.e., in the event that not all allocations can be met, will it gracefully try to balance the available resources, or will it completely starve some clients? Maybe partial loss of the cluster has caused a performance drop, or a user has set a 1 ms read latency target on a disk-based cluster. Is this a tunable parameter, deadline vs. shares, etc.?

I can think of a number of scenarios where QoS may help and how it might be applied. Hope they are of some use.

1. Min IOPS/bandwidth/latency for an important VM, probably settable on a per-RBD basis. Could have an inheritable default from the RADOS pool, or be customised to offer bronze/silver/gold service levels (see the sketch after this message).

2. Max IOPS/bandwidth to limit noisy clients, but with the option of over-allocation if free resources are available.

3. Min bandwidth for streaming to tape, again set per RBD or RBD snapshot. This would help filter out the impact of clients emptying their buffered writes, as small drops in performance massively affect continuous streaming to tape.

4. Ability to apply QoS to either reads or writes. E.g., SQL DBs benefit from fast, consistent sync-write latency, but their actual write throughput is fairly small and coalesces well. Making sure all such writes jump to the front of the queue would ensure good performance.

5. If size < min_size, I want recovery to take very high priority, as ops might be blocked.

6. There probably needs to be some sort of reporting to go along with this, to be able to see which targets are being missed/met. I guess this needs some sort of "ceph top" or "rbd top" before it can be implemented?

7. Currently an RBD with a snapshot can overload a cluster if you do lots of small random writes to the parent; COW causes massive write amplification. If QoS is set on the parent, how are these COW writes taken into account?
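To illustrate scenario 1, here is a sketch of the shape per-volume QoS profiles might take. It is purely hypothetical: no such librbd or rbd CLI interface exists, and the names and tier numbers are invented for illustration only:

#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-volume QoS profile: a reservation (floor), a limit
// (cap), and a weight for dividing spare capacity, per the mClock model.
struct QosProfile {
  uint64_t min_iops;   // guaranteed floor
  uint64_t max_iops;   // hard cap; 0 here means "uncapped"
  uint64_t weight;     // relative share of any spare capacity
};

// Service tiers: a pool could carry one of these as an inheritable
// default, with individual RBD images overriding it.
const std::map<std::string, QosProfile> kTiers = {
  {"bronze", { 100,  500, 1}},
  {"silver", { 500, 2000, 2}},
  {"gold",   {2000,    0, 4}},  // uncapped, but still weighted
};

A pool-level default plus a per-image override would cover both the "inheritable default" and the bronze/silver/gold service levels in one mechanism.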
[ceph-users] Ceph QoS user stories
Hi all,

We're working on getting infrastructure into RADOS to allow for proper distributed quality-of-service guarantees. The work is based on the mClock paper published at OSDI '10:

https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf

There are a few ways this can be applied:

- We can use mClock simply as a better way to prioritize background activity (scrub, snap trimming, recovery, rebalancing) against client IO.
- We can use dmClock to set QoS parameters (e.g., min IOPS or proportional priority/weight) on RADOS pools.
- We can use dmClock to set QoS parameters (e.g., min IOPS) for individual clients.

Once the RADOS capabilities are in place, there will be a significant amount of effort needed to get all of the APIs in place to configure and set policy. In order to make sure we build something that makes sense, I'd like to collect a set of user stories that we'd like to support, so that we can make sure we capture everything (or at least the important things).

Please add any use cases that are important to you to this pad:

http://pad.ceph.com/p/qos-user-stories

or as a follow-up to this email.

mClock works in terms of a minimum allocation (of IOPS or bandwidth; they are sort of reduced into a single unit of work), a maximum (i.e., a simple cap), and a proportional weighting (to allocate any additional capacity after the minimum allocations are satisfied). It's somewhat flexible in terms of how we apply it to specific clients, classes of clients, or types of work (e.g., recovery). How we put it all together really depends on what kinds of things we need to accomplish (e.g., do we need to support a guaranteed level of service shared across a specific set of N different clients, or only individual clients?).

Thanks!
sage
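For readers who skip the paper, here is a minimal single-server sketch of the mClock tagging scheme described above (after Gulati et al., OSDI '10). It is deliberately simplified: it omits the idle-client tag adjustments and the distributed dmClock extensions, and assumes reservation, limit, and weight are all positive:

#include <algorithm>
#include <vector>

struct Client {
  double reservation;   // minimum IOPS (floor)
  double limit;         // maximum IOPS (cap), >= reservation
  double weight;        // share of spare capacity
  double r_tag = 0.0, l_tag = 0.0, p_tag = 0.0;  // tags of queued request
  bool   has_request = false;
};

// Tag a newly arrived request at time `now` (in seconds).
void tag_request(Client& c, double now) {
  c.r_tag = std::max(c.r_tag + 1.0 / c.reservation, now);
  c.l_tag = std::max(c.l_tag + 1.0 / c.limit, now);
  c.p_tag = std::max(c.p_tag + 1.0 / c.weight, now);
  c.has_request = true;
}

// Pick the index of the next client to serve, or -1 if none is eligible.
int schedule(const std::vector<Client>& cs, double now) {
  int pick = -1;
  // Constraint-based phase: if any reservation tag is due, serve the
  // smallest one, so minimum allocations are satisfied first.
  for (int i = 0; i < (int)cs.size(); ++i)
    if (cs[i].has_request && cs[i].r_tag <= now &&
        (pick < 0 || cs[i].r_tag < cs[pick].r_tag))
      pick = i;
  if (pick >= 0) return pick;
  // Weight-based phase: among clients still under their cap (limit tag
  // due), serve the smallest proportional tag to divide spare capacity.
  for (int i = 0; i < (int)cs.size(); ++i)
    if (cs[i].has_request && cs[i].l_tag <= now &&
        (pick < 0 || cs[i].p_tag < cs[pick].p_tag))
      pick = i;
  return pick;
}

The reservation tags enforce the floor, the limit tags enforce the cap, and the proportional tags split whatever capacity remains, matching the three knobs described above.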