Re: [PATCH v2 0/2] Add epoll round robin wakeup mode

2015-02-17 Thread Jason Baron
On 02/17/2015 04:09 PM, Andy Lutomirski wrote:
> On Tue, Feb 17, 2015 at 12:33 PM, Jason Baron wrote:
>> On 02/17/2015 02:46 PM, Andy Lutomirski wrote:
>>> On Tue, Feb 17, 2015 at 11:33 AM, Jason Baron wrote:
>>>> When we are sharing a wakeup source among multiple epoll fds, we end up with
>>>> thundering herd wakeups, since there is currently no way to add to the
>>>> wakeup source exclusively. This series introduces 2 new epoll flags:
>>>> EPOLLEXCLUSIVE, for adding to a wakeup source exclusively, and
>>>> EPOLLROUNDROBIN, which is to be used in conjunction with EPOLLEXCLUSIVE
>>>> to evenly distribute the wakeups. This patch was originally motivated by
>>>> a desire to improve wakeup balance and cpu usage for a listen socket()
>>>> shared amongst multiple epoll fd sets.
>>>>
>>>> See: http://lwn.net/Articles/632590/ for previous test program and testing
>>>> results.
>>>>
>>>> Epoll manpage text:
>>>>
>>>> EPOLLEXCLUSIVE
>>>> Provides exclusive wakeups when attaching multiple epoll fds to a
>>>> shared wakeup source. Must be specified with an EPOLL_CTL_ADD operation.
>>>>
>>>> EPOLLROUNDROBIN
>>>> Provides balancing for exclusive wakeups when attaching multiple epoll
>>>> fds to a shared wakeup source. Depends on EPOLLEXCLUSIVE being set and
>>>> must be specified with an EPOLL_CTL_ADD operation.
>>>>
>>>> Thanks,
>>> What permissions do you need on the file descriptor to do this?  This
>>> will be the first case where a poll-like operation has side effects,
>>> and that's rather weird IMO.
>>>
>> So in the case where you have both non-exclusive and exclusive
>> waiters, all of the non-exclusive waiters will continue to get woken
>> up. However, I think you're getting at having multiple exclusive
>> waiters and potentially 'starving' out other exclusive waiters.
>>
>> In general, I think wait queues are associated with a 'struct file',
>> so I think unless you are sharing your fd table, this isn't an issue.
>> However, there may be cases where this is not true? In which
>> case, perhaps, we could limit this to CAP_SYS_ADMIN...
> There's also SCM_RIGHTS, which can be used in conjunction with file
> sealing and such.
>
> In general, I feel like this patch series solves a problem that isn't
> well understood and does it by adding a rather strange new mechanism.
> Is there really a problem that can't be addressed by more normal epoll
> features?
>
> --Andy

Hmm... so I dug through some of the Linux archives a bit, and this
problem seems to crop up every so often without resolution.
So I do believe that it's an issue that people are more generally
interested in.

See:

http://lkml.iu.edu/hypermail/linux/kernel/1201.1/02620.html
http://marc.info/?l=linux-kernel&m=128638781921073&w=2

In the latter thread, Linus suggests adding it to the "requested events"
field to poll: http://marc.info/?l=linux-kernel&m=128639416832335&w=2

So, I think that this series at least moves in that suggested direction.

Thanks,

-Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 0/2] Add epoll round robin wakeup mode

2015-02-17 Thread Andy Lutomirski
On Tue, Feb 17, 2015 at 12:33 PM, Jason Baron wrote:
> On 02/17/2015 02:46 PM, Andy Lutomirski wrote:
>> On Tue, Feb 17, 2015 at 11:33 AM, Jason Baron wrote:
>>> When we are sharing a wakeup source among multiple epoll fds, we end up with
>>> thundering herd wakeups, since there is currently no way to add to the
>>> wakeup source exclusively. This series introduces 2 new epoll flags:
>>> EPOLLEXCLUSIVE, for adding to a wakeup source exclusively, and
>>> EPOLLROUNDROBIN, which is to be used in conjunction with EPOLLEXCLUSIVE
>>> to evenly distribute the wakeups. This patch was originally motivated by
>>> a desire to improve wakeup balance and cpu usage for a listen socket()
>>> shared amongst multiple epoll fd sets.
>>>
>>> See: http://lwn.net/Articles/632590/ for previous test program and testing
>>> results.
>>>
>>> Epoll manpage text:
>>>
>>> EPOLLEXCLUSIVE
>>> Provides exclusive wakeups when attaching multiple epoll fds to a
>>> shared wakeup source. Must be specified with an EPOLL_CTL_ADD operation.
>>>
>>> EPOLLROUNDROBIN
>>> Provides balancing for exclusive wakeups when attaching multiple epoll
>>> fds to a shared wakeup source. Depends on EPOLLEXCLUSIVE being set and
>>> must be specified with an EPOLL_CTL_ADD operation.
>>>
>>> Thanks,
>> What permissions do you need on the file descriptor to do this?  This
>> will be the first case where a poll-like operation has side effects,
>> and that's rather weird IMO.
>>
>
> So in the case where you have both non-exclusive and exclusive
> waiters, all of the non-exclusive waiters will continue to get woken
> up. However, I think you're getting at having multiple exclusive
> waiters and potentially 'starving' out other exclusive waiters.
>
> In general, I think wait queues are associated with a 'struct file',
> so I think unless you are sharing your fd table, this isn't an issue.
> However, there may be cases where this is not true? In which
> case, perhaps, we could limit this to CAP_SYS_ADMIN...

There's also SCM_RIGHTS, which can be used in conjunction with file
sealing and such.

In general, I feel like this patch series solves a problem that isn't
well understood and does it by adding a rather strange new mechanism.
Is there really a problem that can't be addressed by more normal epoll
features?

--Andy

>
> Thanks,
>
> -Jason
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v2 0/2] Add epoll round robin wakeup mode

2015-02-17 Thread Jason Baron
On 02/17/2015 02:46 PM, Andy Lutomirski wrote:
> On Tue, Feb 17, 2015 at 11:33 AM, Jason Baron wrote:
>> When we are sharing a wakeup source among multiple epoll fds, we end up with
>> thundering herd wakeups, since there is currently no way to add to the
>> wakeup source exclusively. This series introduces 2 new epoll flags:
>> EPOLLEXCLUSIVE, for adding to a wakeup source exclusively, and
>> EPOLLROUNDROBIN, which is to be used in conjunction with EPOLLEXCLUSIVE
>> to evenly distribute the wakeups. This patch was originally motivated by
>> a desire to improve wakeup balance and cpu usage for a listen socket()
>> shared amongst multiple epoll fd sets.
>>
>> See: http://lwn.net/Articles/632590/ for previous test program and testing
>> results.
>>
>> Epoll manpage text:
>>
>> EPOLLEXCLUSIVE
>> Provides exclusive wakeups when attaching multiple epoll fds to a
>> shared wakeup source. Must be specified with an EPOLL_CTL_ADD operation.
>>
>> EPOLLROUNDROBIN
>> Provides balancing for exclusive wakeups when attaching multiple epoll
>> fds to a shared wakeup source. Depends on EPOLLEXCLUSIVE being set and
>> must be specified with an EPOLL_CTL_ADD operation.
>>
>> Thanks,
> What permissions do you need on the file descriptor to do this?  This
> will be the first case where a poll-like operation has side effects,
> and that's rather weird IMO.
>

So in the case where you have both non-exclusive and exclusive
waiters, all of the non-exclusive waiters will continue to get woken
up. However, I think you're getting at having multiple exclusive
waiters and potentially 'starving' out other exclusive waiters.

In general, I think wait queues are associated with a 'struct file',
so I think unless you are sharing your fd table, this isn't an issue.
However, there may be cases where this is not true? In which
case, perhaps, we could limit this to CAP_SYS_ADMIN...

Thanks,

-Jason
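
The wakeup behaviour debated in this thread can be illustrated with a small
user-space model. The sketch below is not the patch and not kernel code: it
is a toy Python simulation in which non-exclusive waiters are always woken,
exactly one exclusive waiter is woken per event, and (as an assumption about
what "round robin" means here) the woken exclusive waiter is then moved to
the tail of the queue. The names WaitQueue, count_wakeups, "monitor" and
"w0".."w2" are hypothetical and exist only for this illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WaitQueue:
    """Toy model of a wakeup source shared by several waiters (not kernel code)."""
    shared: list = field(default_factory=list)       # non-exclusive waiters
    exclusive: deque = field(default_factory=deque)  # exclusive waiters, in queue order

    def add(self, name, exclusive=False):
        (self.exclusive if exclusive else self.shared).append(name)

    def wake(self, round_robin=False):
        """Return the names of the waiters woken by a single event."""
        woken = list(self.shared)            # every non-exclusive waiter wakes
        if self.exclusive:
            woken.append(self.exclusive[0])  # only one exclusive waiter wakes
            if round_robin:
                # Assumed policy: the woken waiter moves to the tail.
                self.exclusive.rotate(-1)
        return woken


def count_wakeups(n_events, round_robin):
    """Deliver n_events events and count how often each waiter is woken."""
    q = WaitQueue()
    q.add("monitor")                         # one plain, non-exclusive waiter
    for name in ("w0", "w1", "w2"):
        q.add(name, exclusive=True)
    counts = {}
    for _ in range(n_events):
        for name in q.wake(round_robin):
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_wakeups(6, round_robin=False))
    print(count_wakeups(6, round_robin=True))
```

In this model, six events with plain exclusive wakeups always wake w0, while
the round-robin variant wakes each of w0, w1 and w2 twice. The non-exclusive
"monitor" waiter is woken on all six events in both cases.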



Re: [PATCH v2 0/2] Add epoll round robin wakeup mode

2015-02-17 Thread Andy Lutomirski
On Tue, Feb 17, 2015 at 11:33 AM, Jason Baron wrote:
> When we are sharing a wakeup source among multiple epoll fds, we end up with
> thundering herd wakeups, since there is currently no way to add to the
> wakeup source exclusively. This series introduces 2 new epoll flags:
> EPOLLEXCLUSIVE, for adding to a wakeup source exclusively, and
> EPOLLROUNDROBIN, which is to be used in conjunction with EPOLLEXCLUSIVE
> to evenly distribute the wakeups. This patch was originally motivated by
> a desire to improve wakeup balance and cpu usage for a listen socket()
> shared amongst multiple epoll fd sets.
>
> See: http://lwn.net/Articles/632590/ for previous test program and testing
> results.
>
> Epoll manpage text:
>
> EPOLLEXCLUSIVE
> Provides exclusive wakeups when attaching multiple epoll fds to a
> shared wakeup source. Must be specified with an EPOLL_CTL_ADD operation.
>
> EPOLLROUNDROBIN
> Provides balancing for exclusive wakeups when attaching multiple epoll
> fds to a shared wakeup source. Depends on EPOLLEXCLUSIVE being set and
> must be specified with an EPOLL_CTL_ADD operation.
>
> Thanks,

What permissions do you need on the file descriptor to do this?  This
will be the first case where a poll-like operation has side effects,
and that's rather weird IMO.

--Andy

>
> -Jason
>
>
> Jason Baron (2):
>   sched/wait: add round robin wakeup mode
>   epoll: introduce EPOLLEXCLUSIVE and EPOLLROUNDROBIN
>
>  fs/eventpoll.c                 | 25 -
>  include/linux/wait.h           | 11 +++
>  include/uapi/linux/eventpoll.h |  6 ++
>  kernel/sched/wait.c            | 10 --
>  4 files changed, 45 insertions(+), 7 deletions(-)
>
> --
> 1.8.2.rc2
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

