Re: [ceph-users] Ceph QoS user stories

2016-12-03 Thread Ning Yao
Hi Sage,

I think we can refactor the I/O priority strategy at the same time,
based on the considerations below.

2016-12-03 17:21 GMT+08:00 Ning Yao :
> Hi, all
>
> Currently, we can modify osd_client_op_priority to assign different
> clients' ops different priorities; for example, we can assign a high
> priority to OLTP and a low priority to OLAP. However, there are some
> considerations:
>
> 1) It seems OLTP client ops can still be blocked by OLAP sub-ops,
> since sub-ops use CEPH_MSG_PRIO_DEFAULT. So should a sub-op inherit the
> message priority from its client op (falling back to
> CEPH_MSG_PRIO_DEFAULT when the client op does not set one explicitly)?
> Does this make sense?
>
> 2) Secondly, reply messages are assigned CEPH_MSG_PRIO_HIGH, but there
> is no upper bound on client ops' priority (a user can set 210), which
> can leave reply messages blocked behind them. So should we consider
> changing that kind of message to the highest priority
> (CEPH_MSG_PRIO_HIGHEST)? Currently, it seems no ops use
> CEPH_MSG_PRIO_HIGHEST.
>
> 3) I think kicked recovery ops should inherit the client op's priority.
>
> 4) Would it be possible to add test cases to ceph-qa-suite to verify
> this works properly, as expected, as Sam mentioned before? Any
> guidelines?
Regards
Ning Yao
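The inheritance rule in points (1) and (3) amounts to a strict-priority dispatch queue in which a sub-op carries its parent op's priority. A toy sketch follows; the priority constants match the values in Ceph's msgr headers, but the queue itself is a simplified illustration, not the OSD's actual dispatch code:

```python
import heapq
import itertools

# Message priority constants as defined in Ceph's include/msgr.h.
CEPH_MSG_PRIO_LOW = 64
CEPH_MSG_PRIO_DEFAULT = 127
CEPH_MSG_PRIO_HIGH = 196
CEPH_MSG_PRIO_HIGHEST = 255

class DispatchQueue:
    """Toy strict-priority queue: higher priority dequeues first,
    FIFO within a single priority level."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves FIFO order

    def enqueue(self, prio, msg):
        # negate prio so the min-heap pops the highest priority first
        heapq.heappush(self._heap, (-prio, next(self._seq), msg))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

def subop_prio(client_op_prio=None):
    # the proposed rule: a sub-op inherits its client op's priority,
    # falling back to CEPH_MSG_PRIO_DEFAULT when none was set explicitly
    return client_op_prio if client_op_prio is not None else CEPH_MSG_PRIO_DEFAULT
```

With inheritance, an OLAP sub-op tagged at its parent's low priority no longer jumps ahead of a high-priority OLTP client op sitting in the same queue.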


2016-12-03 3:01 GMT+08:00 Sage Weil :
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees.  The work is based on the
> mclock paper published in OSDI'10
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
>  - We can use mclock simply as a better way to prioritize background
> activity (scrub, snap trimming, recovery, rebalancing) against client IO.
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS or
> proportional priority/weight) on RADOS pools
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS) for
> individual clients.
>
> Once the rados capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure and
> set policy.  In order to make sure we build something that makes sense,
> I'd like to collect a set of user stories that we'd like to support so
> that we can make sure we capture everything (or at least the important
> things).
>
> Please add any use-cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth; they
> are sort of reduced into a single unit of work), a maximum (i.e. simple
> cap), and a proportional weighting (to allocate any additional capacity
> after the minimum allocations are satisfied).  It's somewhat flexible in
> terms of how we apply it to specific clients, classes of clients, or types
> of work (e.g., recovery).  How we put it all together really depends on
> what kinds of things we need to accomplish (e.g., do we need to support a
> guaranteed level of service shared across a specific set of N different
> clients, or only individual clients?).
>
> Thanks!
> sage
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph QoS user stories

2016-12-02 Thread Federico Lucifredi
Hi Sage,

 The primary QoS issue we see with OpenStack users is wanting to
guarantee minimum IOPS to each Cinder-mounted RBD volume, as a way to
protect well-mannered workloads from badly-behaving ones.

 As an OpenStack Administrator, I want to guarantee a minimum number
of IOPS to each Cinder volume to prevent any tenant from interfering
with another.

  The number of IOPS may vary per volume, but in many cases a
"standard" and "high" number would probably suffice. The guarantee is
more important than the granularity.

  This is something impacting users at today's Ceph performance level.

  Looking to the future, once BlueStore becomes the default, there will
also be latency requirements from the crowd that wants to run databases
with RBD backends: both low latency and low jitter in the latency.
Rather than applying to all volumes, though, that will apply only to
select ones backing RDBMSs. Well, at least in the case of a
general-purpose cluster.


 My hunch is that Enterprise users who want hard QoS guarantees will
accept that a capacity-planning exercise is necessary: software can only
allocate existing capacity, not create more. Community users may instead
place more value on "fairness" in distributing existing resources. Just
a hunch at this point.

 Best -F

_
-- "You must try until your brain hurts." —Elon Musk
(Federico L. Lucifredi) - federico at redhat.com - GnuPG 0x4A73884C

On Fri, Dec 2, 2016 at 2:01 PM, Sage Weil  wrote:
>
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees.  The work is based on the
> mclock paper published in OSDI'10
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
>  - We can use mclock simply as a better way to prioritize background
> activity (scrub, snap trimming, recovery, rebalancing) against client IO.
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS or
> proportional priority/weight) on RADOS pools
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS) for
> individual clients.
>
> Once the rados capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure and
> set policy.  In order to make sure we build something that makes sense,
> I'd like to collect a set of user stories that we'd like to support so
> that we can make sure we capture everything (or at least the important
> things).
>
> Please add any use-cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth; they
> are sort of reduced into a single unit of work), a maximum (i.e. simple
> cap), and a proportional weighting (to allocate any additional capacity
> after the minimum allocations are satisfied).  It's somewhat flexible in
> terms of how we apply it to specific clients, classes of clients, or types
> of work (e.g., recovery).  How we put it all together really depends on
> what kinds of things we need to accomplish (e.g., do we need to support a
> guaranteed level of service shared across a specific set of N different
> clients, or only individual clients?).
>
> Thanks!
> sage
>


Re: [ceph-users] Ceph QoS user stories

2016-12-02 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage 
> Weil
> Sent: 02 December 2016 19:02
> To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
> Subject: [ceph-users] Ceph QoS user stories
> 
> Hi all,
> 
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees.  The work is based on the
> mclock paper published in OSDI'10
> 
>   https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
> 
> There are a few ways this can be applied:
> 
>  - We can use mclock simply as a better way to prioritize background
> activity (scrub, snap trimming, recovery, rebalancing) against client IO.
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS or
> proportional priority/weight) on RADOS pools
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS) for
> individual clients.
> 
> Once the rados capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure and
> set policy.  In order to make sure we build something that makes sense,
> I'd like to collect a set of user stories that we'd like to support so
> that we can make sure we capture everything (or at least the important
> things).
> 
> Please add any use-cases that are important to you to this pad:
> 
>   http://pad.ceph.com/p/qos-user-stories
> 
> or as a follow-up to this email.
> 
> mClock works in terms of a minimum allocation (of IOPS or bandwidth; they
> are sort of reduced into a single unit of work), a maximum (i.e. simple
> cap), and a proportional weighting (to allocate any additional capacity
> after the minimum allocations are satisfied).  It's somewhat flexible in
> terms of how we apply it to specific clients, classes of clients, or types
> of work (e.g., recovery).  How we put it all together really depends on
> what kinds of things we need to accomplish (e.g., do we need to support a
> guaranteed level of service shared across a specific set of N different
> clients, or only individual clients?).
> 
> Thanks!
> sage
> 

Hi Sage,

You mention IOPS and bandwidth, but would this be applicable to latency as
well? Some client operations (buffered IO) can hit several hundred IOPS
with terrible latency if the queue depth is high enough, when the intended
requirement might have been a more responsive application.

Would it be possible to apply some sort of shares system to the minimum
allocation? I.e., in the event that not all allocations can be met, will
it gracefully try to balance the available resources, or will it
completely starve some clients? Maybe partial loss of the cluster has
caused a performance drop, or the user has set read latency to 1 ms on a
disk-based cluster. Is this a tunable parameter, deadline vs. shares,
etc.?
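One possible shares-style fallback, sketched under the assumption that the scheduler can estimate its current deliverable capacity (mClock itself does not prescribe such a policy):

```python
def scale_reservations(mins, capacity):
    """Hypothetical graceful-degradation policy: if the sum of configured
    minimum allocations exceeds what the cluster can currently deliver,
    shrink every minimum by the same factor rather than starving a
    subset of clients outright.

    mins: {client_name: min_iops}, capacity: total deliverable IOPS.
    """
    total = sum(mins.values())
    if total <= capacity:
        return dict(mins)  # everything fits; honour the minimums as-is
    factor = capacity / total
    return {name: iops * factor for name, iops in mins.items()}
```

This trades hard guarantees for fairness under degradation, which is exactly the deadline-vs-shares choice the question raises.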

I can think of a number of scenarios where QoS may help and how it might
be applied. Hope they are of some use.

1. Min IOPS/bandwidth/latency for an important VM, probably settable on a
per-RBD basis. Could have an inheritable default from the RADOS pool, or
be customised to offer bronze/silver/gold service levels.

2. Max IOPS/bandwidth to limit noisy clients, but with the option for
over-allocation if free resources are available.

3. Min bandwidth for streaming to tape, again set per RBD or RBD
snapshot. Would help filter out the impact of clients emptying their
buffered writes, as small drops in performance massively affect
continuous streaming to tape.

4. The ability to apply QoS to either reads or writes. E.g. SQL databases
benefit from fast, consistent sync-write latency, but actual write
throughput is fairly small and coalesces well. Being able to make sure
all writes jump to the front of the queue would ensure good performance.

5. If size < min_size, I want recovery to take very high priority, as ops
might be blocked.

6. There probably needs to be some sort of reporting to go along with
this, to be able to see which targets are being missed/met. I guess this
needs some sort of "ceph top" or "rbd top" before it can be implemented?

7. Currently an RBD image with a snapshot can overload a cluster if you
do lots of small random writes to the parent; COW causes massive write
amplification. If QoS was set on the parent, how are these COW writes
taken into account?



[ceph-users] Ceph QoS user stories

2016-12-02 Thread Sage Weil
Hi all,

We're working on getting infrastructure into RADOS to allow for proper 
distributed quality-of-service guarantees.  The work is based on the 
mclock paper published in OSDI'10

https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf

There are a few ways this can be applied:

 - We can use mclock simply as a better way to prioritize background 
activity (scrub, snap trimming, recovery, rebalancing) against client IO.
 - We can use d-mclock to set QoS parameters (e.g., min IOPS or 
proportional priority/weight) on RADOS pools
 - We can use d-mclock to set QoS parameters (e.g., min IOPS) for 
individual clients.

Once the rados capabilities are in place, there will be a significant 
amount of effort needed to get all of the APIs in place to configure and 
set policy.  In order to make sure we build something that makes sense, 
I'd like to collect a set of user stories that we'd like to support so 
that we can make sure we capture everything (or at least the important 
things).

Please add any use-cases that are important to you to this pad:

http://pad.ceph.com/p/qos-user-stories

or as a follow-up to this email.

mClock works in terms of a minimum allocation (of IOPS or bandwidth; they 
are sort of reduced into a single unit of work), a maximum (i.e. simple 
cap), and a proportional weighting (to allocate any additional capacity 
after the minimum allocations are satisfied).  It's somewhat flexible in 
terms of how we apply it to specific clients, classes of clients, or types 
of work (e.g., recovery).  How we put it all together really depends on 
what kinds of things we need to accomplish (e.g., do we need to support a 
guaranteed level of service shared across a specific set of N different 
clients, or only individual clients?).
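As a rough illustration of the mechanism, here is a simplified single-server version of the tag-based scheduling from the mClock paper (Gulati et al., OSDI'10). It is a sketch of the algorithm only, not Ceph code; the two-phase pick (reservation first, then limit-capped weights) is the part that matters:

```python
class Client:
    """Per-client mClock state: reservation (min IOPS), weight
    (proportional share), and limit (max IOPS)."""
    def __init__(self, name, reservation, weight, limit):
        self.name = name
        self.res, self.wgt, self.lim = reservation, weight, limit
        self.r_tag = self.p_tag = self.l_tag = 0.0

    def stamp(self, now):
        # advance tags as each request is dispatched; successive requests
        # are spaced 1/reservation, 1/limit, and 1/weight apart in tag time
        self.r_tag = max(self.r_tag + 1.0 / self.res, now)
        self.l_tag = max(self.l_tag + 1.0 / self.lim, now)
        self.p_tag = max(self.p_tag + 1.0 / self.wgt, now)

def pick(clients, now):
    # constraint-based phase: any client behind on its reservation goes first
    eligible = [c for c in clients if c.r_tag <= now]
    if eligible:
        return min(eligible, key=lambda c: c.r_tag)
    # weight-based phase: among clients under their cap, smallest p-tag wins
    under = [c for c in clients if c.l_tag <= now]
    return min(under, key=lambda c: c.p_tag) if under else None
```

Dispatching against two clients with different reservations and weights shows the reservation being met first, with the leftover capacity split by weight.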

Thanks!
sage
