Re: timerfd redux

2007-09-13 Thread Andrew Morton
On Thu, 13 Sep 2007 10:13:59 +0200 "Michael Kerrisk" <[EMAIL PROTECTED]> wrote:

> Andrew,
> 
> > > 3. possible solutions
> > 
> > I don't think we'll have this settled and coded in time for 2.6.23.  So I
> > think the prudent thing to do is to push this back to 2.6.24 and not offer
> > sys_timerfd() in 2.6.23.
> 
> Did you want a patch to remove the syscall number for now,
> or will you do that?
> 

Please send one over sometime.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: timerfd redux

2007-09-13 Thread Michael Kerrisk
Andrew,

> > 3. possible solutions
> 
> I don't think we'll have this settled and coded in time for 2.6.23.  So I
> think the prudent thing to do is to push this back to 2.6.24 and not offer
> sys_timerfd() in 2.6.23.

Did you want a patch to remove the syscall number for now,
or will you do that?

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.



Re: timerfd redux

2007-09-13 Thread Michael Kerrisk
> > [Was: Re: [PATCH] Revised timerfd() interface]
> > 
> > > Michael, could you please refresh our memories with a brief,
> > > from-scratch summary of what the current interface is, followed
> > > by a summary of what you believe the shortcomings to be?
> > 
> > Andrew,
> > 
> > I'll break this up into parts:
> > 
> > 1. the existing timerfd interface
> > 2. timerfd limitations
> > 3. possible solutions
> >  a) Add an argument
> >  b) Create an interface similar to POSIX timers
> >  c) Integrate timerfd with POSIX timers
> > 
> > Cheers,
> > 
> > Michael
> > 
> > 
> > 1: the existing timerfd interface
> > =================================
> > 
> > In 2.6.22, Davide added timerfd() with the following interface:
> > 
> > returned_fd = timerfd(int fd, int clockid, int flags,
> >   struct itimerspec *utimer);
> > 
> > If fd is -1, a new timer is created and started.  The syscall
> > returns a file descriptor for the timer. 'utimer' specifies
> > the initial expiration and interval of the timer.
> > 'clockid' is CLOCK_MONOTONIC or CLOCK_REALTIME.  The 'utimer'
> > value is relative, unless TFD_TIMER_ABSTIME is specified in
> > 'flags', in which case the initial expiration is specified
> > absolutely.
> > 
> > If 'fd' is not -1, then the call modifies the existing timer
> > referred to by the file descriptor 'fd'.  The 'clockid', 'flags',
> > and 'utimer' can all be modified.  The return value is 'fd'.
> > 
> > The key feature of timerfd() is that the caller can use
> > select/poll/epoll to wait on traditional file descriptors and
> > one or more timers.
> > 
> > read() from a timerfd file descriptor (should) return a 4-byte
> > integer that is the number of timer expirations since the last
> > read.  (If no expiration has so far occurred, read() will block.)
> > 
> > IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
> > only a single byte of info was returned by read().  I regard
> > this as a virtue: it gives us something closer to a blank slate
> > for fixing the problems described below; furthermore,
> > arguably at this point we could buy ourselves time by
> > pulling timerfd() from 2.6.23, and taking more time to get
> > things right in 2.6.24.
> > 
> > (More details on timerfd() can be found here: 
> > http://lwn.net/Articles/245533/)
> 
> OK.
> 
> > 2. timerfd limitations
> > ======================
> > 
> > Unix has two older timer interfaces:
> > 
> > * setitimer/getitimer and
> > 
> > * POSIX timers (timer_create/timer_settime/timer_gettime).
> > 
> > timerfd() lacks two features that are present in the older
> > interfaces:
> > 
> > * Retrieve the previous setting of an existing timer when
> >   setting a new value for the timer.
> > 
> > * Non-destructively fetch the time remaining until the
> >   next expiration of the timer.
> > 
> > The fact that this functionality is present in both older APIs
> > strongly suggests that various applications really need both
> > functionalities.  
> 
> Yes, I can imagine applications wanting to do those things.
> 
> > (Davide has argued that timerfd() doesn't need the
> > get-while-setting functionality because we can create multiple
> > timerfd timers.  However, POSIX timers also allow multiple
> > timer instances, but nevertheless provide get-while-setting.
> > I would estimate that this functionality would be useful for
> > libraries that want to create and control a (single) timerfd
> > file descriptor that is returned to the caller.)
> 
> Sure.  If you're implementing a timeout and you want to reset it, you
> might indeed want to know how close the old one was to expiring.
> 
> Davide's proposal sounds like an awkward workaround for missing
> functionality.

In the other thread I commented that the userspace solution
starts to look pretty complex, and I doubt that it can
be made to work in all cases.

> Does Davide have a proposal for the non-destructive fetch?

I don't think so, since he disagrees about its necessity.

> > 3. possible solutions
> 
> I don't think we'll have this settled and coded in time for 2.6.23.  So I
> think the prudent thing to do is to push this back to 2.6.24 and not 
> offer sys_timerfd() in 2.6.23.

I think this would be wise.  I'd like to talk with Davide some
more about the possibilities.

Cheers,

Michael


Re: timerfd redux

2007-09-12 Thread Andrew Morton
On Wed, 05 Sep 2007 17:32:01 +0200 "Michael Kerrisk" <[EMAIL PROTECTED]> wrote:

> [Was: Re: [PATCH] Revised timerfd() interface]
> 
> > Michael, could you please refresh our memories with a brief,
> > from-scratch summary of what the current interface is, followed
> > by a summary of what you believe the shortcomings to be?
> 
> Andrew,
> 
> I'll break this up into parts:
> 
> 1. the existing timerfd interface
> 2. timerfd limitations
> 3. possible solutions
>  a) Add an argument
>  b) Create an interface similar to POSIX timers
>  c) Integrate timerfd with POSIX timers
> 
> Cheers,
> 
> Michael
> 
> 
> 1: the existing timerfd interface
> =================================
> 
> In 2.6.22, Davide added timerfd() with the following interface:
> 
> returned_fd = timerfd(int fd, int clockid, int flags,
>   struct itimerspec *utimer);
> 
> If fd is -1, a new timer is created and started.  The syscall
> returns a file descriptor for the timer. 'utimer' specifies
> the initial expiration and interval of the timer.
> 'clockid' is CLOCK_MONOTONIC or CLOCK_REALTIME.  The 'utimer'
> value is relative, unless TFD_TIMER_ABSTIME is specified in
> 'flags', in which case the initial expiration is specified
> absolutely.
> 
> If 'fd' is not -1, then the call modifies the existing timer
> referred to by the file descriptor 'fd'.  The 'clockid', 'flags',
> and 'utimer' can all be modified.  The return value is 'fd'.
> 
> The key feature of timerfd() is that the caller can use
> select/poll/epoll to wait on traditional file descriptors and
> one or more timers.
> 
> read() from a timerfd file descriptor (should) return a 4-byte
> integer that is the number of timer expirations since the last
> read.  (If no expiration has so far occurred, read() will block.)
> 
> IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
> only a single byte of info was returned by read().  I regard
> this as a virtue: it gives us something closer to a blank slate
> for fixing the problems described below; furthermore,
> arguably at this point we could buy ourselves time by
> pulling timerfd() from 2.6.23, and taking more time to get
> things right in 2.6.24.
> 
> (More details on timerfd() can be found here: 
> http://lwn.net/Articles/245533/)

OK.

> 2. timerfd limitations
> ======================
> 
> Unix has two older timer interfaces:
> 
> * setitimer/getitimer and
> 
> * POSIX timers (timer_create/timer_settime/timer_gettime).
> 
> timerfd() lacks two features that are present in the older
> interfaces:
> 
> * Retrieve the previous setting of an existing timer when
>   setting a new value for the timer.
> 
> * Non-destructively fetch the time remaining until the
>   next expiration of the timer.
> 
> The fact that this functionality is present in both older APIs
> strongly suggests that various applications really need both
> functionalities.  

Yes, I can imagine applications wanting to do those things.

> (Davide has argued that timerfd() doesn't need the
> get-while-setting functionality because we can create multiple
> timerfd timers.  However, POSIX timers also allow multiple
> timer instances, but nevertheless provide get-while-setting.
> I would estimate that this functionality would be useful for
> libraries that want to create and control a (single) timerfd
> file descriptor that is returned to the caller.)

Sure.  If you're implementing a timeout and you want to reset it, you might
indeed want to know how close the old one was to expiring.

Davide's proposal sounds like an awkward workaround for missing
functionality.


Does Davide have a proposal for the non-destructive fetch?


> 3. possible solutions

I don't think we'll have this settled and coded in time for 2.6.23.  So I
think the prudent thing to do is to push this back to 2.6.24 and not offer
sys_timerfd() in 2.6.23.



timerfd redux

2007-09-05 Thread Michael Kerrisk
[Was: Re: [PATCH] Revised timerfd() interface]

> Michael, could you please refresh our memories with a brief,
> from-scratch summary of what the current interface is, followed
> by a summary of what you believe the shortcomings to be?

Andrew,

I'll break this up into parts:

1. the existing timerfd interface
2. timerfd limitations
3. possible solutions
 a) Add an argument
 b) Create an interface similar to POSIX timers
 c) Integrate timerfd with POSIX timers

Cheers,

Michael


1: the existing timerfd interface
=================================

In 2.6.22, Davide added timerfd() with the following interface:

returned_fd = timerfd(int fd, int clockid, int flags,
  struct itimerspec *utimer);

If fd is -1, a new timer is created and started.  The syscall
returns a file descriptor for the timer. 'utimer' specifies
the initial expiration and interval of the timer.
'clockid' is CLOCK_MONOTONIC or CLOCK_REALTIME.  The 'utimer'
value is relative, unless TFD_TIMER_ABSTIME is specified in
'flags', in which case the initial expiration is specified
absolutely.

If 'fd' is not -1, then the call modifies the existing timer
referred to by the file descriptor 'fd'.  The 'clockid', 'flags',
and 'utimer' can all be modified.  The return value is 'fd'.

The key feature of timerfd() is that the caller can use
select/poll/epoll to wait on traditional file descriptors and
one or more timers.

read() from a timerfd file descriptor (should) return a 4-byte
integer that is the number of timer expirations since the last
read.  (If no expiration has so far occurred, read() will block.)

IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
only a single byte of info was returned by read().  I regard
this as a virtue: it gives us something closer to a blank slate
for fixing the problems described below; furthermore,
arguably at this point we could buy ourselves time by
pulling timerfd() from 2.6.23, and taking more time to get
things right in 2.6.24.

(More details on timerfd() can be found here: 
http://lwn.net/Articles/245533/)

2. timerfd limitations
======================

Unix has two older timer interfaces:

* setitimer/getitimer and

* POSIX timers (timer_create/timer_settime/timer_gettime).

timerfd() lacks two features that are present in the older
interfaces:

* Retrieve the previous setting of an existing timer when
  setting a new value for the timer.

* Non-destructively fetch the time remaining until the
  next expiration of the timer.

The fact that this functionality is present in both older APIs
strongly suggests that various applications really need both
functionalities.  
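
For reference, this is roughly how the two operations look with the
older setitimer()/getitimer() interface.  It is only a sketch: the
helper names are invented for this example, and the values are
truncated to whole seconds to keep it short.

```c
#include <stddef.h>
#include <sys/time.h>

/* Non-destructive fetch: whole seconds left on the ITIMER_REAL timer,
 * or -1 on error.  The timer itself is not modified. */
long remaining_seconds(void)
{
    struct itimerval cur;
    if (getitimer(ITIMER_REAL, &cur) == -1)
        return -1;
    return (long) cur.it_value.tv_sec;
}

/* Get-while-setting: re-arm ITIMER_REAL as a one-shot timer of
 * new_secs seconds (0 disarms it) and return the whole seconds that
 * were left on the previous setting, or -1 on error. */
long rearm_and_get_previous(long new_secs)
{
    struct itimerval new_value = { .it_value = { .tv_sec = new_secs } };
    struct itimerval old_value;
    if (setitimer(ITIMER_REAL, &new_value, &old_value) == -1)
        return -1;
    return (long) old_value.it_value.tv_sec;
}
```

A caller implementing a resettable timeout can arm the timer, check on
it later with remaining_seconds(), and learn how close the old setting
was to expiring when it calls rearm_and_get_previous() again.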

(Davide has argued that timerfd() doesn't need the
get-while-setting functionality because we can create multiple
timerfd timers.  However, POSIX timers also allow multiple
timer instances, but nevertheless provide get-while-setting.
I would estimate that this functionality would be useful for
libraries that want to create and control a (single) timerfd
file descriptor that is returned to the caller.)

3. possible solutions
=====================

> a) Add an argument

I proposed adding a further argument to timerfd(): old_utmr,
which could be used to return the time remaining until
expiry for an existing timer 
(http://marc.info/?l=linux-kernel&m=118669430305788&w=2 ).
I proposed semantics that would allow get and
get-while-setting functionality.

Jon Corbet pointed out that my suggestion was starting
to look like a multiplexing syscall.  I agree.  I now
favor one of the remaining solutions.

> b) Create an interface similar to POSIX timers

Create an interface analogous to POSIX timers:

fd = timerfd_create(clockid, flags);

timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);

timerfd_gettime(fd, &time_to_next_expire);

Advantage: this would be a clean, fully functional API, and well
understood by virtue of its analogy with the POSIX timers API.

Disadvantage: three new system calls, rather than 1.

This solution would be sufficient, IMO, but the
next solution might be better.

> c) Integrate timerfd with POSIX timers

Make a very simple timerfd call that is integrated with
the POSIX timers API.  A POSIX timer is created using:

int timer_create(clockid_t clockid, struct sigevent *evp,
                 timer_t *timerid);

We could then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':

fd = timerfd(timer_t timerid);

We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):

int timer_gettime(timer_t timerid, struct itimerspec *value);
int timer_settime(timer_t timerid, int flags,
                  const struct itimerspec *value,
                  struct itimerspec *ovalue);

And then read from 'fd' as before.

Advantages:
  1. Integration with an existing API.
  2. Adds just a single system call.
  3. This strikes me as the most beautiful solution,
     if we can do it properly.

Disadvantage: I'm not yet completely clear 
