Re: [PATCH] SUNRPC: restore fair scheduling to priority queues.

2016-02-09 Thread NeilBrown
On Wed, Feb 10 2016, NeilBrown wrote:

> On Sun, Dec 27 2015, Trond Myklebust wrote:
>
>> On Tue, Dec 15, 2015 at 10:10 PM, NeilBrown  wrote:
>>> If you treated all reads and writed the same, then I can't see value in
>>> restoring fair scheduling.   If there is any difference, then I suspect
>>> we do need the fairness.
>>
>> I disagree. Reclaiming memory should always be able to pre-empt
>> "interactive" features such as read. Everything goes down the toilet
>> when we force the kernel into situations where it needs to swap.
>
> That's your call I guess.  I certainly agree that memory-reclaim writes
> should get some priority (e.g. two writes serviced for every read).
> Whether they should be allowed to completely block reads I'm less sure
> of.  But it is probably purely academic as if the system is busy
> reclaiming you are unlikely to have any reads to want to send.
>
> My problem would be solved (I think) by treating reads and non-reclaim
> writes as equals.  I'll make a patch, see if I can test it, and let you
> know.

ahh I just discovered
Commit: b0ac1bd2bbfd ("NFS: Background flush should not be low priority")

I didn't notice that before.  I suspect that will fix the problem -
thanks.

I'll try testing and let you know if there is a problem.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCH] SUNRPC: restore fair scheduling to priority queues.

2016-02-09 Thread NeilBrown
On Sun, Dec 27 2015, Trond Myklebust wrote:

> On Tue, Dec 15, 2015 at 10:10 PM, NeilBrown  wrote:
>> On Wed, Dec 16 2015, Trond Myklebust wrote:
>>
>>> On Tue, Dec 15, 2015 at 6:44 PM, NeilBrown  wrote:

 Commit: c05eecf63610 ("SUNRPC: Don't allow low priority tasks to pre-empt 
 higher priority ones")

 removed the 'fair scheduling' feature from SUNRPC priority queues.
 This feature caused problems for some queues (send queue and session slot 
 queue)
 but is still needed for others, particularly the tcp slot queue.

 Without fairness, reads (priority 1) can starve background writes
 (priority 0) so a streaming read can cause writeback to block
 indefinitely.  This is not easy to measure with default settings as
 the current slot table size is much larger than the read-ahead size.
 However if the slot-table size is reduced (seen when backporting to
 older kernels with a limited size) the problem is easily demonstrated.

 This patch conditionally restores fair scheduling.  It is now the
 default unless rpc_sleep_on_priority() is called directly.  Then the
 queue switches to strict priority observance.

 As that function is called for both the send queue and the session
 slot queue and not for any others, this has exactly the desired
 effect.

 The "count" field that was removed by the previous patch is restored.
 A value for '255' means "strict priority queuing, no fair queuing".
 Any other value is a could of owners to be processed before switching
 to a different priority level, just like before.
> 
>>> Are we sure there is value in keeping FLUSH_LOWPRI for background writes?
>>
>> There is currently also FLUSH_HIGHPRI for "for_reclaim" writes.
>> Should they be allowed to starve reads?
>>
>> If you treated all reads and writed the same, then I can't see value in
>> restoring fair scheduling.   If there is any difference, then I suspect
>> we do need the fairness.
>
> I disagree. Reclaiming memory should always be able to pre-empt
> "interactive" features such as read. Everything goes down the toilet
> when we force the kernel into situations where it needs to swap.

That's your call I guess.  I certainly agree that memory-reclaim writes
should get some priority (e.g. two writes serviced for every read).
Whether they should be allowed to completely block reads I'm less sure
of.  But it is probably purely academic as if the system is busy
reclaiming you are unlikely to have any reads to want to send.

My problem would be solved (I think) by treating reads and non-reclaim
writes as equals.  I'll make a patch, see if I can test it, and let you
know.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCH] SUNRPC: restore fair scheduling to priority queues.

2015-12-26 Thread Trond Myklebust
On Tue, Dec 15, 2015 at 10:10 PM, NeilBrown  wrote:
> On Wed, Dec 16 2015, Trond Myklebust wrote:
>
>> On Tue, Dec 15, 2015 at 6:44 PM, NeilBrown  wrote:
>>>
>>> Commit: c05eecf63610 ("SUNRPC: Don't allow low priority tasks to pre-empt 
>>> higher priority ones")
>>>
>>> removed the 'fair scheduling' feature from SUNRPC priority queues.
>>> This feature caused problems for some queues (send queue and session slot 
>>> queue)
>>> but is still needed for others, particularly the tcp slot queue.
>>>
>>> Without fairness, reads (priority 1) can starve background writes
>>> (priority 0) so a streaming read can cause writeback to block
>>> indefinitely.  This is not easy to measure with default settings as
>>> the current slot table size is much larger than the read-ahead size.
>>> However if the slot-table size is reduced (seen when backporting to
>>> older kernels with a limited size) the problem is easily demonstrated.
>>>
>>> This patch conditionally restores fair scheduling.  It is now the
>>> default unless rpc_sleep_on_priority() is called directly.  Then the
>>> queue switches to strict priority observance.
>>>
>>> As that function is called for both the send queue and the session
>>> slot queue and not for any others, this has exactly the desired
>>> effect.
>>>
>>> The "count" field that was removed by the previous patch is restored.
>>> A value for '255' means "strict priority queuing, no fair queuing".
>>> Any other value is a could of owners to be processed before switching
>>> to a different priority level, just like before.

>> Are we sure there is value in keeping FLUSH_LOWPRI for background writes?
>
> There is currently also FLUSH_HIGHPRI for "for_reclaim" writes.
> Should they be allowed to starve reads?
>
> If you treated all reads and writed the same, then I can't see value in
> restoring fair scheduling.   If there is any difference, then I suspect
> we do need the fairness.

I disagree. Reclaiming memory should always be able to pre-empt
"interactive" features such as read. Everything goes down the toilet
when we force the kernel into situations where it needs to swap.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] SUNRPC: restore fair scheduling to priority queues.

2015-12-15 Thread NeilBrown
On Wed, Dec 16 2015, Trond Myklebust wrote:

> On Tue, Dec 15, 2015 at 6:44 PM, NeilBrown  wrote:
>>
>> Commit: c05eecf63610 ("SUNRPC: Don't allow low priority tasks to pre-empt 
>> higher priority ones")
>>
>> removed the 'fair scheduling' feature from SUNRPC priority queues.
>> This feature caused problems for some queues (send queue and session slot 
>> queue)
>> but is still needed for others, particularly the tcp slot queue.
>>
>> Without fairness, reads (priority 1) can starve background writes
>> (priority 0) so a streaming read can cause writeback to block
>> indefinitely.  This is not easy to measure with default settings as
>> the current slot table size is much larger than the read-ahead size.
>> However if the slot-table size is reduced (seen when backporting to
>> older kernels with a limited size) the problem is easily demonstrated.
>>
>> This patch conditionally restores fair scheduling.  It is now the
>> default unless rpc_sleep_on_priority() is called directly.  Then the
>> queue switches to strict priority observance.
>>
>> As that function is called for both the send queue and the session
>> slot queue and not for any others, this has exactly the desired
>> effect.
>>
>> The "count" field that was removed by the previous patch is restored.
>> A value for '255' means "strict priority queuing, no fair queuing".
>> Any other value is a could of owners to be processed before switching
>> to a different priority level, just like before.
>>
>> Signed-off-by: NeilBrown 
>> ---
>>
>> It is quite possible that you won't like the overloading of
>> rpc_sleep_on_priority() to disable fair-scheduling and would prefer an
>> extra arg to rpc_init_priority_wait_queue().  I can do it that way if
>> you like.
>> NeilBrown
>>
>>
>>  include/linux/sunrpc/sched.h |  1 +
>>  net/sunrpc/sched.c   | 12 +---
>>  2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>> index d703f0ef37d8..985efe8d7e26 100644
>> --- a/include/linux/sunrpc/sched.h
>> +++ b/include/linux/sunrpc/sched.h
>> @@ -184,6 +184,7 @@ struct rpc_wait_queue {
>> pid_t   owner;  /* process id of 
>> last task serviced */
>> unsigned char   maxpriority;/* maximum priority 
>> (0 if queue is not a priority queue) */
>> unsigned char   priority;   /* current priority 
>> */
>> +   unsigned char   count;  /* # task groups 
>> remaining to be serviced */
>> unsigned char   nr; /* # tasks remaining 
>> for cookie */
>> unsigned short  qlen;   /* total # tasks 
>> waiting in queue */
>> struct rpc_timertimer_list;
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 73ad57a59989..e8fcd4f098bb 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -117,6 +117,8 @@ static void rpc_set_waitqueue_priority(struct 
>> rpc_wait_queue *queue, int priorit
>> rpc_rotate_queue_owner(queue);
>> queue->priority = priority;
>> }
>> +   if (queue->count != 255)
>> +   queue->count = 1 << (priority * 2);
>>  }
>>
>>  static void rpc_set_waitqueue_owner(struct rpc_wait_queue *queue, pid_t pid)
>> @@ -144,8 +146,10 @@ static void __rpc_add_wait_queue_priority(struct 
>> rpc_wait_queue *queue,
>> INIT_LIST_HEAD(&task->u.tk_wait.links);
>> if (unlikely(queue_priority > queue->maxpriority))
>> queue_priority = queue->maxpriority;
>> -   if (queue_priority > queue->priority)
>> -   rpc_set_waitqueue_priority(queue, queue_priority);
>> +   if (queue->count == 255) {
>> +   if (queue_priority > queue->priority)
>> +   rpc_set_waitqueue_priority(queue, queue_priority);
>> +   }
>> q = &queue->tasks[queue_priority];
>> list_for_each_entry(t, q, u.tk_wait.list) {
>> if (t->tk_owner == task->tk_owner) {
>> @@ -401,6 +405,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *q, 
>> struct rpc_task *task,
>>  * Protect the queue operations.
>>  */
>> spin_lock_bh(&q->lock);
>> +   q->count = 255;
>> __rpc_sleep_on_priority(q, task, action, priority - 
>> RPC_PRIORITY_LOW);
>> spin_unlock_bh(&q->lock);
>>  }
>> @@ -478,7 +483,8 @@ static struct rpc_task 
>> *__rpc_find_next_queued_priority(struct rpc_wait_queue *q
>> /*
>>  * Check if we need to switch queues.
>>  */
>> -   goto new_owner;
>> +   if (queue->count == 255 || --queue->count)
>> +   goto new_owner;
>> }
>>
>> /*
>>
>
> Are we sure there is value in keeping FLUSH_LOWPRI for background writes?

There is currently also FLUSH_HIGHPRI for "for_reclaim" writes.
Should they be allowed

Re: [PATCH] SUNRPC: restore fair scheduling to priority queues.

2015-12-15 Thread Trond Myklebust
On Tue, Dec 15, 2015 at 6:44 PM, NeilBrown  wrote:
>
> Commit: c05eecf63610 ("SUNRPC: Don't allow low priority tasks to pre-empt 
> higher priority ones")
>
> removed the 'fair scheduling' feature from SUNRPC priority queues.
> This feature caused problems for some queues (send queue and session slot 
> queue)
> but is still needed for others, particularly the tcp slot queue.
>
> Without fairness, reads (priority 1) can starve background writes
> (priority 0) so a streaming read can cause writeback to block
> indefinitely.  This is not easy to measure with default settings as
> the current slot table size is much larger than the read-ahead size.
> However if the slot-table size is reduced (seen when backporting to
> older kernels with a limited size) the problem is easily demonstrated.
>
> This patch conditionally restores fair scheduling.  It is now the
> default unless rpc_sleep_on_priority() is called directly.  Then the
> queue switches to strict priority observance.
>
> As that function is called for both the send queue and the session
> slot queue and not for any others, this has exactly the desired
> effect.
>
> The "count" field that was removed by the previous patch is restored.
> A value for '255' means "strict priority queuing, no fair queuing".
> Any other value is a could of owners to be processed before switching
> to a different priority level, just like before.
>
> Signed-off-by: NeilBrown 
> ---
>
> It is quite possible that you won't like the overloading of
> rpc_sleep_on_priority() to disable fair-scheduling and would prefer an
> extra arg to rpc_init_priority_wait_queue().  I can do it that way if
> you like.
> NeilBrown
>
>
>  include/linux/sunrpc/sched.h |  1 +
>  net/sunrpc/sched.c   | 12 +---
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index d703f0ef37d8..985efe8d7e26 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -184,6 +184,7 @@ struct rpc_wait_queue {
> pid_t   owner;  /* process id of last 
> task serviced */
> unsigned char   maxpriority;/* maximum priority 
> (0 if queue is not a priority queue) */
> unsigned char   priority;   /* current priority */
> +   unsigned char   count;  /* # task groups 
> remaining to be serviced */
> unsigned char   nr; /* # tasks remaining 
> for cookie */
> unsigned short  qlen;   /* total # tasks 
> waiting in queue */
> struct rpc_timertimer_list;
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 73ad57a59989..e8fcd4f098bb 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -117,6 +117,8 @@ static void rpc_set_waitqueue_priority(struct 
> rpc_wait_queue *queue, int priorit
> rpc_rotate_queue_owner(queue);
> queue->priority = priority;
> }
> +   if (queue->count != 255)
> +   queue->count = 1 << (priority * 2);
>  }
>
>  static void rpc_set_waitqueue_owner(struct rpc_wait_queue *queue, pid_t pid)
> @@ -144,8 +146,10 @@ static void __rpc_add_wait_queue_priority(struct 
> rpc_wait_queue *queue,
> INIT_LIST_HEAD(&task->u.tk_wait.links);
> if (unlikely(queue_priority > queue->maxpriority))
> queue_priority = queue->maxpriority;
> -   if (queue_priority > queue->priority)
> -   rpc_set_waitqueue_priority(queue, queue_priority);
> +   if (queue->count == 255) {
> +   if (queue_priority > queue->priority)
> +   rpc_set_waitqueue_priority(queue, queue_priority);
> +   }
> q = &queue->tasks[queue_priority];
> list_for_each_entry(t, q, u.tk_wait.list) {
> if (t->tk_owner == task->tk_owner) {
> @@ -401,6 +405,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *q, 
> struct rpc_task *task,
>  * Protect the queue operations.
>  */
> spin_lock_bh(&q->lock);
> +   q->count = 255;
> __rpc_sleep_on_priority(q, task, action, priority - RPC_PRIORITY_LOW);
> spin_unlock_bh(&q->lock);
>  }
> @@ -478,7 +483,8 @@ static struct rpc_task 
> *__rpc_find_next_queued_priority(struct rpc_wait_queue *q
> /*
>  * Check if we need to switch queues.
>  */
> -   goto new_owner;
> +   if (queue->count == 255 || --queue->count)
> +   goto new_owner;
> }
>
> /*
>

Are we sure there is value in keeping FLUSH_LOWPRI for background writes?

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org

[PATCH] SUNRPC: restore fair scheduling to priority queues.

2015-12-15 Thread NeilBrown

Commit: c05eecf63610 ("SUNRPC: Don't allow low priority tasks to pre-empt 
higher priority ones")

removed the 'fair scheduling' feature from SUNRPC priority queues.
This feature caused problems for some queues (send queue and session slot queue)
but is still needed for others, particularly the tcp slot queue.

Without fairness, reads (priority 1) can starve background writes
(priority 0) so a streaming read can cause writeback to block
indefinitely.  This is not easy to measure with default settings as
the current slot table size is much larger than the read-ahead size.
However if the slot-table size is reduced (seen when backporting to
older kernels with a limited size) the problem is easily demonstrated.

This patch conditionally restores fair scheduling.  It is now the
default unless rpc_sleep_on_priority() is called directly.  Then the
queue switches to strict priority observance.

As that function is called for both the send queue and the session
slot queue and not for any others, this has exactly the desired
effect.

The "count" field that was removed by the previous patch is restored.
A value for '255' means "strict priority queuing, no fair queuing".
Any other value is a could of owners to be processed before switching
to a different priority level, just like before.

Signed-off-by: NeilBrown 
---

It is quite possible that you won't like the overloading of
rpc_sleep_on_priority() to disable fair-scheduling and would prefer an
extra arg to rpc_init_priority_wait_queue().  I can do it that way if
you like.
NeilBrown


 include/linux/sunrpc/sched.h |  1 +
 net/sunrpc/sched.c   | 12 +---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index d703f0ef37d8..985efe8d7e26 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -184,6 +184,7 @@ struct rpc_wait_queue {
pid_t   owner;  /* process id of last 
task serviced */
unsigned char   maxpriority;/* maximum priority (0 
if queue is not a priority queue) */
unsigned char   priority;   /* current priority */
+   unsigned char   count;  /* # task groups 
remaining to be serviced */
unsigned char   nr; /* # tasks remaining 
for cookie */
unsigned short  qlen;   /* total # tasks 
waiting in queue */
struct rpc_timertimer_list;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 73ad57a59989..e8fcd4f098bb 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -117,6 +117,8 @@ static void rpc_set_waitqueue_priority(struct 
rpc_wait_queue *queue, int priorit
rpc_rotate_queue_owner(queue);
queue->priority = priority;
}
+   if (queue->count != 255)
+   queue->count = 1 << (priority * 2);
 }
 
 static void rpc_set_waitqueue_owner(struct rpc_wait_queue *queue, pid_t pid)
@@ -144,8 +146,10 @@ static void __rpc_add_wait_queue_priority(struct 
rpc_wait_queue *queue,
INIT_LIST_HEAD(&task->u.tk_wait.links);
if (unlikely(queue_priority > queue->maxpriority))
queue_priority = queue->maxpriority;
-   if (queue_priority > queue->priority)
-   rpc_set_waitqueue_priority(queue, queue_priority);
+   if (queue->count == 255) {
+   if (queue_priority > queue->priority)
+   rpc_set_waitqueue_priority(queue, queue_priority);
+   }
q = &queue->tasks[queue_priority];
list_for_each_entry(t, q, u.tk_wait.list) {
if (t->tk_owner == task->tk_owner) {
@@ -401,6 +405,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *q, struct 
rpc_task *task,
 * Protect the queue operations.
 */
spin_lock_bh(&q->lock);
+   q->count = 255;
__rpc_sleep_on_priority(q, task, action, priority - RPC_PRIORITY_LOW);
spin_unlock_bh(&q->lock);
 }
@@ -478,7 +483,8 @@ static struct rpc_task 
*__rpc_find_next_queued_priority(struct rpc_wait_queue *q
/*
 * Check if we need to switch queues.
 */
-   goto new_owner;
+   if (queue->count == 255 || --queue->count)
+   goto new_owner;
}
 
/*
-- 
2.6.3



signature.asc
Description: PGP signature