from:"Nicholas Miell"

Re: Announce: modutils 2.3.23 is available

2000-12-20 Thread Nicholas Miell


Christian Gennerat wrote:
 
 About Standard aliases:
  modprobe -c
 ...
 alias ppp-compress-21 bsd_comp
 ...
 
 Why bsd_comp is the standard alias?
 /src/linux/Configure.help says that
 
 The PPP Deflate compression method ("PPP Deflate compression",
   above) is preferable to BSD-Compress, because it compresses better
   and is patent-free.
 

ppp-compress-21 refers to PPP compression method 21, which happens to
be BSD Compress. Deflate is 26 (and also 24, because it was assigned
that value in the draft RFC).

Aliasing ppp-compress-21 to anything other than bsd_comp would break
PPP.
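
To make the point concrete, here is a rough sketch in C of the lookup the
alias stands for. The table, the struct, and both function names are
invented for this illustration and are not modutils or kernel code. Only
the numbers 21, 24 and 26 and the module name bsd_comp come from the
discussion above; ppp_deflate as the Deflate module name is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative alias table; not the real modutils data structure. */
struct ppp_alias {
    int method;          /* PPP compression method number */
    const char *module;  /* module that implements that method */
};

static const struct ppp_alias ppp_aliases[] = {
    { 21, "bsd_comp" },     /* BSD Compress */
    { 24, "ppp_deflate" },  /* Deflate, draft-RFC number (module name assumed) */
    { 26, "ppp_deflate" },  /* Deflate (module name assumed) */
};

/* Return the module for a method number, or NULL if none is known. */
const char *ppp_compress_module(int method)
{
    size_t i;

    for (i = 0; i < sizeof(ppp_aliases) / sizeof(ppp_aliases[0]); i++)
        if (ppp_aliases[i].method == method)
            return ppp_aliases[i].module;
    return NULL;
}

/* Format the alias name for a method, e.g. "ppp-compress-21". */
int ppp_compress_alias(int method, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "ppp-compress-%d", method);
}
```

The alias name encodes the method number, so pointing ppp-compress-21 at a
Deflate module would hand a peer that negotiated method 21 the wrong
compressor.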
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-09 Thread Nicholas Miell

On Fri, 2007-03-09 at 15:41 -0800, Davide Libenzi wrote:
 This patch introduces a new system call for timer events delivered
 through file descriptors. This allows timer events to be used with
 standard POSIX poll(2), select(2) and read(2). As a consequence of
 supporting the Linux f_op->poll subsystem, they can be used with
 epoll(2) too.
 The system call is defined as:
 
 int timerfd(int ufd, int tmrtype, const struct timespec *utmr);
 
 The ufd parameter allows for re-use (re-programming) of an existing
 timerfd w/out going through the close/open cycle (same as signalfd).
 If ufd is -1, a new file descriptor will be created, otherwise the
 existing ufd will be re-programmed.
 The tmrtype parameter allows to specify the timer type. The following
 values are supported:
 
 TFD_TIMER_REL
 The time specified in the utmr parameter is a relative time
   from NOW.
 
 TFD_TIMER_ABS
 The timer specified in the utmr parameter is an absolute time.
 
 TFD_TIMER_SEQ
 The time specified in the utmr parameter is an interval at
   which a continuous clock rate will be generated.
 
 The function returns the new (or same, in case ufd is a valid timerfd
 descriptor) file, or -1 in case of error.
 As stated before, the timerfd file descriptor supports poll(2), select(2)
 and epoll(2). When a timer event happens on the timerfd, a POLLIN mask
 will be returned.
 The read(2) call can be used, and it will return a u32 variable holding
 the number of ticks that happened on the interface since the last call
 to read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN
 will be returned if no ticks happened.
 A quick test program shows timerfd working correctly on my amd64 box:
 
 http://www.xmailserver.org/timerfd-test.c
 

Why did you ignore the existing POSIX timer API?

-- 
Nicholas Miell [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-09 Thread Nicholas Miell

On Fri, 2007-03-09 at 22:38 -0800, Davide Libenzi wrote:
 On Fri, 9 Mar 2007, Nicholas Miell wrote:
 
  Why did you ignore the existing POSIX timer API?
 
 The existing POSIX API is a standard and a very good one. Too bad it does 
 not deliver to files. The timerfd code is, as you can probably read from 
 the code, a really thin wrapper around the existing hrtimer.c Linux code.

So extend the existing POSIX timer API to deliver expiry events via a
fd.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-09 Thread Nicholas Miell

On Fri, 2007-03-09 at 22:53 -0800, Davide Libenzi wrote:
 On Fri, 9 Mar 2007, Nicholas Miell wrote:
 
  On Fri, 2007-03-09 at 22:38 -0800, Davide Libenzi wrote:
   On Fri, 9 Mar 2007, Nicholas Miell wrote:
   
Why did you ignore the existing POSIX timer API?
   
   The existing POSIX API is a standard and a very good one. Too bad it does 
   not deliver to files. The timerfd code is, as you can probably read from 
   the code, a really thin wrapper around the existing hrtimer.c Linux code.
  
  So extend the existing POSIX timer API to deliver expiry events via a
  fd.
 
 It'll be out of standard as timerfd is, w/out code savings. Look at the 
 code and tell me what could be saved. Prolly the ten lines of the timer 
 callback. Lines that you'll have to drop inside the current posix timer 
 layer. Better leave standards alone, especially like in this case, when 
 the savings are not there.
 

OK, here's a more formal listing of my objections to the introduction of
timerfd in this form:

A) It is a new general-purpose ABI intended for wide-scale usage, and
thus must be maintained forever.

B) It is less functional than the existing ABIs -- modulo their
delivery via signals only limitation, which can be corrected (and has
been already in other operating systems).

C) Being an entirely new creation that completely ignores past work in
this area, it has no hope of ever getting into POSIX.

which means

D) At some point in time, Linux is going to get the POSIX version (in
whatever form it takes), making this new ABI useless dead weight (see
point A).


-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Fri, 2007-03-09 at 23:36 -0800, Davide Libenzi wrote:
 On Fri, 9 Mar 2007, Nicholas Miell wrote:
 
  On Fri, 2007-03-09 at 22:53 -0800, Davide Libenzi wrote:
   On Fri, 9 Mar 2007, Nicholas Miell wrote:

So extend the existing POSIX timer API to deliver expiry events via a
fd.
   
   It'll be out of standard as timerfd is, w/out code savings. Look at the 
   code and tell me what could be saved. Prolly the ten lines of the timer 
   callback. Lines that you'll have to drop inside the current posix timer 
   layer. Better leave standards alone, especially like in this case, when 
   the savings are not there.
   
  
  OK, here's a more formal listing of my objections to the introduction of
  timerfd in this form:
  
  A) It is a new general-purpose ABI intended for wide-scale usage, and
  thus must be maintained forever.
 
 Yup
 
 
  B) It is less functional than the existing ABIs -- modulo their
  delivery via signals only limitation, which can be corrected (and has
  been already in other operating systems).
 
 Less functional? Please, do tell me ...
 

Try reading the timer_create man page.

In short, you're limited to a single clock, so you can't set timers
based on wall-clock time (subject to NTP correction), monotonic time
(not subject to NTP, will not ever go backwards or skip ticks), the
high-res versions of the previous two clocks, per-thread or per-process
CPU usage time, or any other clocks that may get introduced in the
future.
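
For reference, the clocks listed above correspond to POSIX clock IDs of
the kind timer_create() takes as its first argument. A small sketch,
assuming a POSIX system that defines all four IDs. The names read_clock
and count_readable_clocks are made up for this example, and the code only
reads the clocks rather than arming timers on them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Clock IDs matching the clocks named in the message. */
static const clockid_t example_clocks[] = {
    CLOCK_REALTIME,           /* wall-clock time */
    CLOCK_MONOTONIC,          /* never goes backwards */
    CLOCK_PROCESS_CPUTIME_ID, /* CPU time used by this process */
    CLOCK_THREAD_CPUTIME_ID,  /* CPU time used by this thread */
};

/* Read one clock; returns 0 on success, -1 on error (errno is set). */
int read_clock(clockid_t id, struct timespec *out)
{
    assert(out != NULL);
    return clock_gettime(id, out);
}

/* Return how many of the example clocks this system can read. */
int count_readable_clocks(void)
{
    struct timespec ts;
    size_t i;
    int n = 0;

    for (i = 0; i < sizeof(example_clocks) / sizeof(example_clocks[0]); i++)
        if (read_clock(example_clocks[i], &ts) == 0)
            n++;
    return n;
}
```

A timer API with a fixed clock gives the caller no way to pick between
these.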

In addition, you've introduced an entirely new incompatible API that
probably doesn't fit easily into existing software that already uses
POSIX timers.


 
  C) Being an entirely new creation that completely ignores past work in
  this area, it has no hope of ever getting into POSIX.
  
  which means
  
  D) At some point in time, Linux is going to get the POSIX version (in
  whatever form it takes), making this new ABI useless dead weight (see
  point A).
 
 Adding parameters/fields to a standard is going to create even more 
 confusion than a new *single* function. And the code to cross-link the 
 timerfd and the current posix timers is going to end up in being more 
 complex than the current one.
 

Yes, but the standard explicitly allows you to do this. Furthermore, if
you work within the existing framework, you can lobby for the inclusion
of your API in the next version of POSIX.

Simplicity of the code is only a virtue if you don't have to do the
exact same thing again with a different interface later while keeping
the maintenance burden of the existing proprietary (and, thus,
unpopular) interface.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 12:41 -0800, Davide Libenzi wrote:
 On Sat, 10 Mar 2007, Nicholas Miell wrote:
 
  Try reading the timer_create man page.
  
  In short, you're limited to a single clock, so you can't set timers
  based on wall-clock time (subject to NTP correction), monotonic time
  (not subject to NTP, will not ever go backwards or skip ticks), the
  high-res versions of the previous two clocks, per-thread or per-process
  CPU usage time, or any other clocks that may get introduced in the
  future.
 
 One timer per fd yes. So?

I never complained about one timer per fd (although, now that you
mention it, that would get a bit excessive if you have thousands of
outstanding timers).

 The real-time and monotonic selection can be added. 

IOW, the timerfd patch is not suitable for inclusion as-is. (While
you're at it, you should probably add a flags argument for future
expansion.)

 If you look at the posix timers code, that's a bunch of code over the real 
 meat of it, that is hrtimer.c. The timerfd interface goes straight to 
 that, without adding yet another meaning to the sigevent structure,

That's what the sigevent structure is for -- to describe how events
should be signaled to userspace, whether by signal delivery, thread
creation, or queuing to event completion ports. If you think
extending it would be bad, I can show you the line in POSIX where it
encourages the contrary.

  and 
 yet another case inside the posix timers trigger functions. That will be 
 as unstandard as timerfd is, and even more, since you cannot use that 
 interface and hope to be portable in any case.

If Linux were to do a wholesale theft of the Solaris interface (warts
and all), you'd be portable (and, now that I think of it, more
efficient).

Two major unixes using the same interface would probably make it a
shoo-in for the next POSIX, too. (cf. openat(2) and friends)

 On top of that, handing over files to the posix timers will create 
 problems with references kept around.
 The timerfd code is just a *really* thin layer (if you exclude the 
 includes, the structure definitions and the fd setup code, there's 
 basically *nothing*) over hrtimer.c and does not mess up with other kernel 
 code in any way, and offers the same functionalities. I'd like to keep it 
 that way.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 13:44 -0800, Linus Torvalds wrote:
 
 On Sat, 10 Mar 2007, Nicholas Miell wrote:
  
  That's what the sigevent structure is for -- to describe how events
  should be signaled to userspace, whether by signal delivery, thread
  creation, or queuing to event completion ports. If you think
  extending it would be bad, I can show you the line in POSIX where it
  encourages the contrary.
 
 I'm sorry, but by pointing to the POSIX timer stuff, you're just making 
 your argument weaker.
 
 POSIX timers are a horrible crock and over-designed to be a union of 
 everything that has ever been done. Nasty. We had tons of bugs in the 
 original setup because they were so damn nasty.
 

Care to elaborate on why they're a horrible crock?

And are the bugs fixed? If so, why replace them? They work now.

 I'd rather look at just about *anything* else for good design than from 
 some of the abortions that are posix-timers.
 
   Linus

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 14:42 -0800, Linus Torvalds wrote:
 
 On Sat, 10 Mar 2007, Nicholas Miell wrote:
  
  Care to elaborate on why they're a horrible crock?
 
 It's a *classic* case of an interface that tries to do everything under 
 the sun.
 
 Here's a clue: look at any system call that takes a union as part of its 
 arguments. Count them. I think we have two:
  - struct siginfo

No argument here -- just about everything related to signals is stupidly
complex.

  - struct sigevent

However, this I take issue with.

Conceptually (and what the user ends up actually using), struct sigevent
is just:

struct sigevent
{
    int sigev_notify;                        /* delivery method */
    sigval_t sigev_value;                    /* user cookie */
    int sigev_signo;                         /* signal number */
    void (*sigev_notify_function)(sigval_t); /* thread fn */
    pthread_attr_t *sigev_notify_attributes; /* thread attr */
};

You could complain about sigval_t being a union, but that's probably
just because it predates uintptr_t. (Plus, no ugly casting.)

You also could complain that the above isn't what you actually see when
you look at /usr/include/bits/siginfo.h -- there's a union involved and
some macros to hide the fact, but that's just internal implementation
details related to how threads are created and padding out the struct
for any future expansion. 

The actual complexity for understanding and using struct sigevent isn't
all that much, and once you've figured that out, you know how to
configure event delivery for AIO completion, DNS resolution, and
message queues, not just timers.
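
As a sketch of how little there is to fill in, the two helpers below build
a struct sigevent for signal delivery and for thread delivery. The helper
names and the count_expiry callback are made up for this example; the
field names are the standard POSIX ones. Nothing here arms a timer, it
only prepares the description of how an event should be delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Describe delivery by signal: signo is raised and cookie rides along. */
struct sigevent sigevent_by_signal(int signo, void *cookie)
{
    struct sigevent sev;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = signo;
    sev.sigev_value.sival_ptr = cookie;
    return sev;
}

/* Describe delivery by thread: fn(cookie) is to run in a new thread. */
struct sigevent sigevent_by_thread(void (*fn)(union sigval), void *cookie)
{
    struct sigevent sev;

    assert(fn != NULL);
    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = fn;
    sev.sigev_notify_attributes = NULL;  /* default thread attributes */
    sev.sigev_value.sival_ptr = cookie;
    return sev;
}

/* Example callback: treat the cookie as an int counter and bump it. */
void count_expiry(union sigval v)
{
    int *hits = v.sival_ptr;

    (*hits)++;
}
```

The same struct is what timer_create(), mq_notify() and an aiocb's
aio_sigevent member accept, which is the amortization argument.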

 and they are both broken horrible interfaces where the data structures 
 depend on various flags.
 
 It's just not the UNIX system call way. And none of it really makes sense 
 if you already have a file descriptor, since at that point you know what 
 the notification mechanism is.
 
 I'd actually much rather do POSIX timers the other way around: associate a 
 generic notification mechanism with the file descriptor, and then 
 implement posix_timer_create() on top of timerfd. Now THAT sounds like a 
 clean unix-like interface (everything is a file) and would imply that 
 you'd be able to do the same kind of notification for any file descriptor, 
 not just timers.
 

But timers aren't files or even remotely file-like -- if they were
real files, you could just
open /dev/timers/realtime/2007/June/3rd/half-past-teatime and get a
timer. (Or, more realistically, open /dev/timer and use ioctl().)

timerfd() had to be created to coerce them into some semblance of
filehood just to make them work with existing (and new) polling/queuing
interfaces just because those interfaces can only deal with file
descriptors.

Making non-file things look like files just because that's what poll()
and friends can deal with isn't much different from holding a hammer in
your hand and looking for what you have to do in order to turn every
problem into a nail.

Sometimes you need to go back to your toolbox for a screwdriver or a
saw.


 But posix timers as they are done now are just an abomination. They are 
 not unix-like at all.
 
  And are the bugs fixed? If so, why replace them? They work now.
 
 .. but the reason for the bugs was largely a very baroque interface, which 
 didn't get fixed (because it's specified by the standard).


But the API isn't baroque.

There's a veritable boutique of clock sources to choose from, but they
all serve specific needs, it's just one parameter to timer_create, and
you probably want CLOCK_MONOTONIC anyway.

struct sigevent might be a bit complex, but the difficulty in learning
that is amortized across all the other APIs that also use it to specify
how their events are delivered.

Delivering via signals and dealing with struct siginfo is painful, but
everything related to signals is painful. This is what you get when you
take an interface designed essentially for exception handling and start
abusing it for general information delivery. But, hey!, that's what
SIGEV_THREAD and SIGEV_PORT are for.[1]

About the worst that can be said of it is that using timer_settime to
both arm and disarm the timer and set the interval is awkward.






[1] A SIGEV_FUNCTION which skips all the signal baggage and just passes
a supplied cookie and a purpose-specific struct pointer to an
object-specific user-supplied function pointer might be interesting, but
then you run into all of the reentrancy/masking/choosing which thread to
deliver to and other issues that signals already have without the
benefit of the existing signal infrastructure for all that stuff. Gah, I
don't want to think about this anymore.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 16:35 -0800, Linus Torvalds wrote:
 
 On Sat, 10 Mar 2007, Nicholas Miell wrote:
   
   I'd actually much rather do POSIX timers the other way around: associate
   a generic notification mechanism with the file descriptor, and then
   implement posix_timer_create() on top of timerfd. Now THAT sounds like a
   clean unix-like interface (everything is a file) and would imply that
   you'd be able to do the same kind of notification for any file
   descriptor, not just timers.
   
  
  But timers aren't files or even remotely file-like
 
 What do you think a file is?
 
 In UNIX, a file descriptor is pretty much anything. You could say that 
 sockets aren't remotely file-like, and you'd be right. What's your point? 
 If you can read on it, it's a file.

Ah, I see. You're just interested in fds as a generic handle concept,
and not a more Plan 9 type thing.

If that's the goal, somebody should start thinking about reducing the
contents of struct file to the bare minimum (i.e. not much more than a
file_operations pointer).

 
 And the real point of the whole signalfd() is that there really *are* a 
 lot of UNIX interfaces that basically only work with file descriptors. Not 
 just read, but select/poll/epoll.

It'd be useful if the polling interfaces could return small datums
beyond just the POLL* flags -- having to do a read on timerfd just to
get the overrun count has a lot of overhead for just an integer, and I
imagine other things would like to pass back stuff too.


 They currently have just one timeout, but the thing is, if UNIX had just 
 had timer file descriptors, they'd not need even that one. And even with 
 the timeout, Davide's patch actually makes for a *better* timeout than the 
 ones provided by select/poll/epoll, exactly because you can do things like 
 repeating timers and absolute time etc.
 
 Much more naturally than the timer interface we currently have for those 
 system calls.
 

You still want timeouts; creating/setting/destroying a timer just for
a single call to select/poll/epoll is probably too heavy weight.

timerfd() still leaves out the basic clock selection functionality
provided by both setitimer() and timer_create().

 The same goes for signals. The whole pselect() thing shows that signals 
 really *should* have been file descriptors, and suddenly you don't need 
 pselect() at all.
 
 So the not remotely file-like is not actually a real argument. One of 
 the big *points* of UNIX was that it unified a lot under the general 
 umbrella of a file descriptor. Davide just unifies even more.

   Linus
-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 17:57 -0800, Davide Libenzi wrote:
 On Sat, 10 Mar 2007, Nicholas Miell wrote:
 
  If that's the goal, somebody should start thinking about reducing the
  contents of struct file to the bare minimum (i.e. not much more than a
  file_operations pointer).
 
 That's already pretty small, and the single inode (and maybe dentry) will 
 make it even smaller. Unless you want to create brazillions of signalfds,
 timerfds or asyncfds.
 

Timers don't need dentry or inode pointers or readahead state, etc., do
they? (Beyond the existing VFS expectation, that is.)

   And the real point of the whole signalfd() is that there really *are* a
   lot of UNIX interfaces that basically only work with file descriptors.
   Not just read, but select/poll/epoll.
  
  It'd be useful if the polling interfaces could return small datums
  beyond just the POLL* flags -- having to do a read on timerfd just to
  get the overrun count has a lot of overhead for just an integer, and I
  imagine other things would like to pass back stuff too.
 ...
 
  You still want timeouts; creating/setting/destroying a timer just for
  a single call to select/poll/epoll is probably too heavy weight.
 
 Take a look at what timerfd does and what posix timers has to do to 
 implement the interface. You'll prolly stop trolling with things like a 
 lot of overhead or too heavy weight.

That wasn't a troll. I was talking about the timerfd()/close() overhead
and the corresponding bookkeeping necessary to keep that fd around
compared to just passing a struct timespec to poll or a millisecond
count to epoll_wait.
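
To show what I mean, here is a rough side-by-side in C. It assumes Linux
and uses the timerfd_create(2)/timerfd_settime(2) calls that Linux
provides today, not the timerfd() syscall from this patch; both function
names are made up for the comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Built-in timeout: one system call, no fd to create or close.
 * Returns 0 when the timeout expires, -1 on error. */
int wait_with_poll_timeout(int timeout_ms)
{
    assert(timeout_ms >= 0);
    return poll(NULL, 0, timeout_ms);
}

/* Same wait through a timer fd: create, arm, poll, close.
 * Returns 1 when the timer fired, -1 on error. */
int wait_with_timerfd(int timeout_ms)
{
    struct itimerspec its = {0};
    struct pollfd pfd;
    int fd, ret;

    assert(timeout_ms > 0);  /* a zero it_value would disarm the timer */
    its.it_value.tv_sec = timeout_ms / 1000;
    its.it_value.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;

    fd = timerfd_create(CLOCK_MONOTONIC, 0);       /* extra call 1 */
    if (fd < 0)
        return -1;
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {  /* extra call 2 */
        close(fd);
        return -1;
    }

    pfd.fd = fd;
    pfd.events = POLLIN;
    ret = poll(&pfd, 1, -1);                       /* the actual wait */
    close(fd);                                     /* extra call 3 */
    return ret;
}
```

For a one-off timeout the second version costs three more system calls
plus the fd bookkeeping; it only pays off when the timer is reused or
needs to repeat.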

  timerfd() still leaves out the basic clock selection functionality
  provided by both setitimer() and timer_create().
 
 That is coming as soon as I fixed my send-serie script ...

Nice.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 21:31 -0800, Linus Torvalds wrote:
 
 On Sat, 10 Mar 2007, Nicholas Miell wrote:
  
  Ah, I see. You're just interested in fds as a generic handle concept,
  and not a more Plan 9 type thing.
 
 Indeed. It's a handle.
 
 UNIX has pid's for process handles, and file descriptors for just 
 about everything else.

And I imagine that somebody will come up with a way of getting a fd for a
process sooner or later. 

  If that's the goal, somebody should start thinking about reducing the
  contents of struct file to the bare minimum (i.e. not much more than a
  file_operations pointer).
 
 Well, there's more there, but it really is fairly close. If you look at 
 it, a struct file ends up not having a lot more than the minimal stuff 
 required to use it as a a handle: it really isn't a very big structure. 
 
 The biggest part is actually the read-ahead state, which is arguably a 
 generic thing for a file handle, even though not all kinds will be able to 
 use it. We *could* make that be behind a pointer (along with the f_pos 
 thing, that really logically goes along with the read-ahead thing), of 
 course, but since most files probably do end up being traditional file 
 structures, it's probably not wrong to just have it in the file.
 

Actually, I was thinking of reducing struct file to the bare minimum, and
then using that as the common header shared by object-specific
structures. I don't know how unpleasant that would be from a memory
allocation perspective, though.
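
Roughly the layout I have in mind, as a userspace sketch in C. Every name
here (struct handle, handle_ops, timer_handle, handle_container) is
hypothetical and none of it is kernel code; it only illustrates a minimal
header embedded in an object-specific structure.

```c
#include <assert.h>
#include <stddef.h>

struct handle;

/* Hypothetical operations table, standing in for file_operations. */
struct handle_ops {
    long (*read_count)(struct handle *h);
};

/* Hypothetical minimal handle: nothing but the operations pointer. */
struct handle {
    const struct handle_ops *ops;
};

/* Recover the enclosing object from a pointer to its embedded header. */
#define handle_container(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Object-specific structure that embeds the common header. */
struct timer_handle {
    struct handle base;
    long ticks;  /* expirations not yet reported */
};

/* Report and clear the pending tick count of a timer handle. */
static long timer_read_count(struct handle *h)
{
    struct timer_handle *t = handle_container(h, struct timer_handle, base);
    long n = t->ticks;

    t->ticks = 0;
    return n;
}

static const struct handle_ops timer_ops = { timer_read_count };

void timer_handle_init(struct timer_handle *t)
{
    assert(t != NULL);
    t->base.ops = &timer_ops;
    t->ticks = 0;
}

/* Generic code sees only struct handle and dispatches through ops. */
long handle_read_count(struct handle *h)
{
    return h->ops->read_count(h);
}
```

Each object type would then be allocated at its own size, which is where
the memory allocation question comes from.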

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v3 - timerfd core ...

2007-03-11 Thread Nicholas Miell

On Sun, 2007-03-11 at 16:13 -0700, Davide Libenzi wrote:
 On Sun, 11 Mar 2007, Davide Libenzi wrote:
 
  This patch introduces a new system call for timer events delivered
  through file descriptors. This allows timer events to be used with
  standard POSIX poll(2), select(2) and read(2). As a consequence of
  supporting the Linux f_op->poll subsystem, they can be used with
  epoll(2) too.
  The system call is defined as:
  
  int timerfd(int ufd, int clockid, int tmrtype, const struct timespec *utmr);
  
  The ufd parameter allows for re-use (re-programming) of an existing
  timerfd w/out going through the close/open cycle (same as signalfd).
  If ufd is -1, a new file descriptor will be created, otherwise the
  existing ufd will be re-programmed.
  The clockid parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.
  The tmrtype parameter allows to specify the timer type. The following
  values are supported:
  
  TFD_TIMER_REL
  The time specified in the utmr parameter is a relative time
  from NOW.
  
  TFD_TIMER_ABS
  The timer specified in the utmr parameter is an absolute time.
  
  TFD_TIMER_SEQ
  The time specified in the utmr parameter is an interval at
  which a continuous clock rate will be generated.
  
 
 Duh! Forgot to update the documentation. Now timerfd() gets an itimerspec.
 For TFD_TIMER_REL only the it_interval is valid, and it's the relative 
 time. For TFD_TIMER_ABS, only the it_value is valid, and that's the expiry 
 absolute time. For TFD_TIMER_SEQ, it_value tells when the first tick 
 should be generated, and it_interval tells the period of the following 
 ticks.
 

You should probably make it behave like the other things that use
itimerspec, just to avoid confusion -- i.e. timers are relative by
default, there's a flag that makes them absolute, they expire when
it_value specifies, and repeat every it_interval nanoseconds if
it_interval is non-zero.

i.e.

int timerfd(int ufd, int clockid, int flags, const struct timespec *utmr);

with TFD_TIMER_ABS in flags making the timer absolute instead of
relative (and no TFD_TIMER_REL or TFD_TIMER_SEQ at all).

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 6/9] signalfd/timerfd v3 - timerfd core ...

2007-03-11 Thread Nicholas Miell

On Sun, 2007-03-11 at 16:50 -0700, Nicholas Miell wrote:
 You should probably make it behave like the other things that use
 itimerspec, just to avoid confusion -- i.e. timers are relative by
 default, there's a flag that makes them absolute, they expire when
 it_value specifies, and repeat every it_interval nanoseconds if
 it_interval is non-zero.
 
 i.e.
 
 int timerfd(int ufd, int clockid, int flags, const struct timespec
 *utmr);
 
 with TFD_TIMER_ABS in flags making the timer absolute instead of
 relative (and no TFD_TIMER_REL or TFD_TIMER_SEQ at all).
 

Sorry, that should be

int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

and TFD_TIMER_ABSTIME.
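
For comparison, Linux's timerfd_create(2) and timerfd_settime(2) calls,
which postdate this thread, follow these semantics: relative by default,
TFD_TIMER_ABSTIME for absolute, first expiry at it_value, repeat at
it_interval if it is non-zero. A minimal sketch assuming Linux;
timerfd_arm and timerfd_wait are helper names invented here, not part of
any API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create and arm a timer fd. flags is 0 (relative) or TFD_TIMER_ABSTIME.
 * The timer first expires at spec->it_value and, if spec->it_interval is
 * non-zero, keeps repeating at that interval. Returns the fd or -1. */
int timerfd_arm(int clockid, int flags, const struct itimerspec *spec)
{
    int fd;

    assert(spec != NULL);
    fd = timerfd_create(clockid, 0);
    if (fd < 0)
        return -1;
    if (timerfd_settime(fd, flags, spec, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the timer has expired at least once. Returns the number of
 * expirations since the last read, or 0 on error. */
uint64_t timerfd_wait(int fd)
{
    uint64_t expirations = 0;

    if (read(fd, &expirations, sizeof(expirations))
        != (ssize_t)sizeof(expirations))
        return 0;
    return expirations;
}
```

A one-shot relative timer is an itimerspec with only it_value set and
flags of 0; the absolute variant fills it_value from clock_gettime() plus
an offset and passes TFD_TIMER_ABSTIME.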

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Style Question

2007-03-11 Thread Nicholas Miell

On Mon, 2007-03-12 at 06:40 +0100, Jan Engelhardt wrote:
 On Mar 12 2007 13:37, Cong WANG wrote:
 
  The following code is picked from drivers/kvm/kvm_main.c:
 
  static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
  {
          struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
 
          mutex_lock(&vcpu->mutex);
          if (unlikely(!vcpu->vmcs)) {
                  mutex_unlock(&vcpu->mutex);
                  return 0;
          }
          return kvm_arch_ops->vcpu_load(vcpu);
  }
 
  Obviously, it used 0 rather than NULL when returning a pointer to
  indicate an error. Should we fix such issue?
 
 Indeed. If it was for me, something like that should throw a compile error.
 
 [...]
  I think it's more clear to indicate we are using a pointer rather than
  an integer when we use NULL in kernel. But in userspace, using NULL is
  for portability of the program, although most (*just* most, NOT all) of
  NULL's definition is ((void*)0). ;-)
 
 NULL has the same bit pattern as the number zero. (I'm not saying the bit
 pattern is all zeroes. And I am not even sure if NULL ought to have the same
 pattern as zero.) So C++ could use (void *)0, if it would let itself :p

Not necessarily. You can use 0 at the source level, but the compiler has
to convert it to the actual NULL pointer bit pattern, whatever it may
be.

In C++, NULL is typically defined to 0 (with no void* cast) by most
compilers because 0 (and only 0) can be implicitly converted to a null
pointer of any pointer type without a cast.

GCC introduced the __null extension so that NULL still works correctly
in C++ when passed to a varargs function on 64-bit platforms.

(This just works in C because C makes NULL ((void*)0), which is thus the
right size. In C++, the 0 ends up being an int instead of a pointer when
passed to a varargs function, and things tend to blow up when they read
the garbage high bits. Of course, nobody else does this, so you still
have to use (void*)NULL to be portable.)
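
A short C illustration of the varargs case. The function name
count_strings is made up for the example. The callee reads every variadic
argument back as a pointer, so the terminating sentinel has to be passed
as a pointer too.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Count the strings passed before the terminating null pointer.
 * Each variadic argument is read back as a char *, so the caller must
 * pass a pointer-sized sentinel, not a bare int 0. */
int count_strings(char *first, ...)
{
    va_list ap;
    char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

Called as count_strings("a", "b", (void *)NULL), the sentinel is a real
pointer no matter how NULL happens to be defined. A bare 0 in that
position is passed as an int, which is the case that breaks on 64-bit
platforms.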

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell

On Fri, 2007-03-16 at 23:30 +0100, Mike Galbraith wrote:
 On Sat, 2007-03-17 at 08:13 +1100, Con Kolivas wrote:
  On Saturday 17 March 2007 02:34, Mike Galbraith wrote:
   On Sat, 2007-03-17 at 00:40 +1100, Con Kolivas wrote:
Here are full patches for rsdl 0.31 for various base kernels. A full
announce with a fresh -mm series will follow...
   
http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.31.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.31.patch
  
   It still has trouble with the x/gforce vs two niced encoders scenario.
   The previously reported choppiness is still present.
  
   I suspect that x/gforce landing in the expired array is the trouble, and
   that this will never be smooth without some kind of exemption.  I added
   some targeted unfairness to .30, and it didn't help much at all.
  
   Priorities going all the way to 1 were a surprise.
  
  It wasn't going to change that case without renicing X.
 
 Con.  You are trying to wedge a fair scheduler into an environment where
 totally fair simply can not possibly function.
 
 If this is your final answer to the problem space, I am done testing,
 and as far as _I_ am concerned, your scheduler is an utter failure.
 

Sorry, I haven't really been following this thread and now I'm confused.

You're saying that it's somehow the scheduler's fault that X isn't
running with a high enough priority?

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell

On Sat, 2007-03-17 at 06:56 +0100, Mike Galbraith wrote:
 On Fri, 2007-03-16 at 21:24 -0700, Nicholas Miell wrote:
 
  Sorry, I haven't really been following this thread and now I'm confused.
  
  You're saying that it's somehow the scheduler's fault that X isn't
  running with a high enough priority?
 
 I'm saying that the current scheduler adjusts for interactive loads,
 this new one doesn't.  I'm seeing interactivity regressions, and they
 are not fixed with nice unless nice is used to maximum effect.  I'm
 saying yes, I can lower my expectations, but no I don't want to.
 
 A four line summary is as short as I can make it.
 
   -Mike

Uh, no. Essentially, the current scheduler works around X's brokenness,
in an often unpredictable manner.

RSDL appears to be completely deterministic, which is a very strong
virtue.

The X people have plans for how to go about fixing this, but until then,
there's no reason to hold up kernel development.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: RSDL v0.31

2007-03-17 Thread Nicholas Miell

On Sat, 2007-03-17 at 00:25 -0700, William Lee Irwin III wrote:
 On Sat, Mar 17, 2007 at 08:11:57AM +0100, Mike Galbraith wrote:
  On a side note, I wonder how long it's going to take to fix all the
  X/client combinations out there.
 
 AIUI X's clients largely access it via libraries X ships, so the X
 update will sweep the vast majority of them in one shot. You'll have
 to either run the clients from remote hosts with downrev libraries or
 have downrev libraries around (e.g. in chroots) for clients to link to
 for the clients not to cooperate.
 

The changes will probably be entirely server-side anyway, so stray
ancient libraries won't be a problem.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: RSDL v0.31

2007-03-17 Thread Nicholas Miell

(sorry for the duplicate, Ingo, this time I managed to Reply to All)

On Sat, 2007-03-17 at 08:45 +0100, Ingo Molnar wrote:
 * Nicholas Miell [EMAIL PROTECTED] wrote:
 
  The X people have plans for how to go about fixing this, [...]
 
 then we'll first have to wait for those X changes to at least be done in a 
 minimal manner so that they can be tested for real with RSDL. (is it 
 _really_ due to that? Or will X regress forever once we switch to RSDL?)

Yes, it's an X problem.

There's two issues, really -- smooth pointer movement or the lack
thereof and the servicing of clients at varying priorities. There's
vague plans floating around about moving all input processing off into a
separate high-priority thread and pretty much no ideas how to deal with
mixed priority clients.

So, the current scheduler works around this brain damage using
heuristics that sort of do the job and sometimes screw things up.

 We cannot regress the scheduling of a workload as important as X mixed 
 with CPU-intense tasks. And in theory this should be fixed if X is 
 fixed does not cut it. X is pretty much _the_ most important thing to 
 optimize the interactive behavior of a Linux scheduler for. Also, 
 paradoxically, it is precisely the improvement of _X_ workloads that 
 RSDL argues with.
 
 this regression has to be fixed before RSDL can be merged, simply 
 because it is a pretty negative effect that goes beyond any of the 
 visible positive improvements that RSDL brings over the current 
 scheduler. If it is better to fix X, then X has to be fixed _first_, at 
 least in form of a prototype patch that can be _tested_, and then the 
 result has to be validated against RSDL.
 

RSDL is, above all else, fair. Predictably so.
Hacking around X's stupidity makes it no longer *be* RSDL.

Until they catch up to the early-90s technology-wise, we can just nice
-19 X.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Nicholas Miell

On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote:
 On Fri, 2 Mar 2007, Ingo Molnar wrote:
 
  
  * Davide Libenzi davidel@xmailserver.org wrote:
  
   I think that the dirty FPU context must, at least, follow the new 
   head. That's what the userspace sees, and you don't want an async_exec 
   to re-emerge with a different FPU context.
  
  well. I think there's some confusion about terminology, so please let me 
  describe everything in detail. This is how execution goes:
  
outer loop() {
call_threadlet();
}
  
  this all runs in the 'head' context. call_threadlet() always switches to 
  the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
  while executing the threadlet function, we block, then the 
  threadlet-thread gets to keep the task (the threadlet stack and also the 
  FPU), and blocks - and we pick a 'new head' from the thread pool and 
  continue executing in that context - right after the call_threadlet() 
  function, in the 'old' head's stack. I.e. it's as if we returned 
  immediately from call_threadlet(), with a return code that signals that 
  the 'threadlet went async'.
  
  now, the FPU state that was when the threadlet blocked is totally 
  meaningless to the 'new head' - that FPU state is from the middle of the 
  threadlet execution.
 
 For threadlets, it might be. Now think about a task wanting to dispatch N 
 parallel AIO requests as N independent syslets.
 Think about this task having USEDFPU set, so the FPU context is dirty.
 When it returns from async_exec, with one of the requests being become 
 sleepy, it needs to have the same FPU context it had when it entered, 
 otherwise it won't prolly be happy.
 For the same reason a schedule() must preserve/sync the prev FPU 
 context, to be reloaded at the next FPU fault.

The point Ingo was making is that the x86 ABI already requires the FPU
context to be saved before *all* function calls.

Unfortunately, this isn't true of other ABIs -- looking over the psABI
specs I have lying around, AMD64, PPC64, and MIPS require at least part
of the FPU state to be preserved across function calls, and I'm sure
this is also true of others.

Then there's the other nasty details of new thread creation --
thankfully, the contents of the TLS isn't inherited from the parent
thread, but it still needs to be initialized; not to mention all the
other details involved in pthread creation and destruction.

I don't see any way around the pthread issues other than making a libc
upcall on return from the first system call that blocked.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 00/13] Syslets, Threadlets, generic AIO support, v3

2007-03-02 Thread Nicholas Miell

On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
 On Fri, 2 Mar 2007, Nicholas Miell wrote:
 
  The point Ingo was making is that the x86 ABI already requires the FPU
  context to be saved before *all* function calls.
 
 I've not seen that among Ingo's points, but yeah some status is caller 
 saved. But, aren't things like status word and control bits callee saved? 
 If that's the case, it might require proper handling.
 

Ingo mentioned it in one of the parts you cut out of your reply:

 and here is where thinking about threadlets as a function call and not 
 as an asynchronous context helps a lot: the classic gcc convention for 
 FPU use & function calls should apply: gcc does not call an external 
 function with an in-use FPU stack/register, it always neatly unuses it, 
 as no FPU register is callee-saved, all are caller-saved.

The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
XMM or MXCSR registers) and a bit vague (no mention at all of the FP
status word), but I'm fairly certain that Ingo is right.


-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-05 Thread Nicholas Miell

On Tue, 2007-03-06 at 05:41 +0100, Willy Tarreau wrote:
 On Tue, Mar 06, 2007 at 11:18:44AM +1100, Con Kolivas wrote:
  On Tuesday 06 March 2007 10:05, Bill Davidsen wrote:
   jos poortvliet wrote:
Well, imho his current staircase scheduler already does a better job
compared to mainline, but it won't make it in (or at least, it's not
likely). So we can hope this WILL make it into mainline, but I wouldn't
count on it.
  
   Wrong problem, what is really needed is to get CPU scheduler choice into
   mainline, just as i/o scheduler finally did. Con has noted that for some
   loads this will present suboptimal performance, as will his -ck patches,
   as will the default scheduler. Instead of trying to make ANY one size
   fit all, we should have a means to select, at runtime, between any of
   the schedulers, and preferably to define an interface by which a user
   can insert a new scheduler in the kernel (compile in, I don't mean
   plugable) with clear and well defined rules for how that can be done.
  
  Been there, done that. Wli wrote the infrastructure for plugsched; I took 
  his 
  code and got it booting and ported 3 or so different scheduler designs. It 
  allowed you to build as few or as many different schedulers into the kernel 
  and either boot the only one you built into your kernel, or choose a 
  scheduler at boot time. That code got permavetoed by both Ingo and Linus. 
  After that I gave up on that code and handed it over to Peter Williams who 
  still maintains it. So please note that I pushed the plugsched barrow 
  previously and still don't think it's a bad idea, but the maintainers think 
  it's the wrong approach.
 
 In a way, I think they are right. Let me explain. Pluggable schedulers are
 useful when you want to switch away from the default one. This is very useful
 during development of a new scheduler, as well as when you're not satisfied
 with the default scheduler. Having this feature will incitate many people to
 develop their own scheduler for their very specific workload, and nothing
 generic. It's a bit what happened after all : you, Peter, Nick, and Mike
 have worked a lot trying to provide alternative solutions.
 
 But when you think about it, there are other OSes which have only one 
 scheduler
 and which behave very well with tens of thousands of tasks and scale very well
 with lots of CPUs (eg: solaris). So there is a real challenge here to try to
 provide something at least as good and universal because we know that it can
 exist. And this is what you finally did : work on a scheduler which ought to 
 be
 good with any workload.

Solaris has a pluggable scheduler framework (each policy --
OTHER/FIFO/RR/etc. -- is its own separate component).

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Kernel Development Objective-C

2007-11-30 Thread Nicholas Miell


On Sat, 2007-12-01 at 00:19 +0100, J.A. Magallón wrote:

 An vtable in C++ takes exactly the same space that the function
 table pointer present in every driver nowadays... and probably
 the virtual method call that C++ does itself with
 
   thing->do_something(with,this)
 
 like
   push thing
   push with
   push this
   call THING_vtable+indexof(do_something) // constants at compile time
 
 is much more efficient that what gcc can mangle to do with
 
   thing->do_something(with,this,thing)
 
   push with
   push this
   push thing
   get thing+offsetof(do_something) // not constant at compile time
   dereference it
   call it
 
 (that is, get a generic field on a structure and use it as jump address)
 
 In short, the kernel is object oriented, implements OO programming by
 hand, but the compiler lacks the knowledge that it is object oriented
 programming so it could do some optimizations.

struct test;
struct testVtbl
{
    int (*fn1)(struct test *t, int x, int y);
    int (*fn2)(struct test *t, int x, int y);
};
struct test
{
    struct testVtbl *vtbl;
    int x, y;
};
void testCall(struct test *t, int x, int y)
{
    t->vtbl->fn1(t, x, y);
    t->vtbl->fn2(t, x, y);
}

and

struct test
{
    virtual int fn1(int x, int y);
    virtual int fn2(int x, int y);

    int x, y;
};

void testCall(struct test *t, int x, int y)
{
    t->fn1(x, y);
    t->fn2(x, y);
}

generate instruction-for-instruction identical code.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [PATCH 2/2] Markers Implementation for Preempt RCU Boost Tracing

2008-01-02 Thread Nicholas Miell


On Wed, 2008-01-02 at 11:33 -0500, Frank Ch. Eigler wrote:
 Hi -
 
 On Wed, Jan 02, 2008 at 01:47:34PM +0100, Ingo Molnar wrote:
  [...]
   FWIW, I'm not keen about the format strings either, but they don't 
   constitute a performance hit beyond an additional parameter.  It does 
   not need to actually get parsed at run time.
  
  only an additional parameter. The whole _point_ behind these markers 
  is for them to have minimal effect!
 
 Agreed.  The only alternative I recall seeing proposed was my own
 cartesian-product macro suite that encodes parameter types into the
 marker function/macro name itself.  (Maybe some of that could be
 hidden with gcc typeof() magic.)  There appeared to be a consensus
 that this was more undesirable.  Do you agree?
 
 

C++ name mangling would be extremely useful here.


Actually, why isn't the DWARF information for the functions sufficient?

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Should parent's WIFSIGNALED(siginfo->si_status) be true EVEN IF the SIGNAL was caught by the child?

2007-09-22 Thread Nicholas Miell

On Sat, 2007-09-22 at 11:22 -0700, John Z. Bohach wrote:
 Hello,
 
 It is unclear from the various documentation in the kernel and glibc what 
 the proper behaviour should be for the case when a child process 
 catches a SIGNAL (say for instance, SIGTERM), and then calls exit() 
 from within its caught SIGNAL handler.
 
 Since the exit() will cause a SIGCHLD to the parent, and the parent 
 (let's say) has a SIGCHLD sigaction (SA_SIGINFO sa_flags set), should 
 the parent's WIFSIGNALED(siginfo->si_status) be true?
 
 To recap, the WIFSIGNALED section of the waitpid() manpage says:
 
 WIFSIGNALED(status)
 returns true if the child process was terminated by a signal.
 
 So the dilemma:  the child caught the signal, so it wasn't terminated by 
 a signal, but rather its signal handler (let's say) called exit.

POSIX says

WIFSIGNALED(stat_val)
Evaluates to a non-zero value if status was returned for a child
process that terminated due to the receipt of a signal that was
not caught (see <signal.h>).

So there's no dilemma at all and Linux is non-conformant.
 
-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Out of memory management in embedded systems

2007-09-28 Thread Nicholas Miell

On Fri, 2007-09-28 at 11:15 -0400, Rik van Riel wrote:
 On Fri, 28 Sep 2007 16:36:34 +0200
 Eric Dumazet [EMAIL PROTECTED] wrote:
 
  On Fri, 28 Sep 2007 10:17:11 -0400
  Rik van Riel [EMAIL PROTECTED] wrote:
  
   On Fri, 28 Sep 2007 10:04:23 -0400
   linux-os \(Dick Johnson\) [EMAIL PROTECTED] wrote:
On Fri, 28 Sep 2007, [iso-8859-1] Daniel Spång wrote:

 On 9/28/07, linux-os (Dick Johnson) [EMAIL PROTECTED] wrote:

 On Fri, 28 Sep 2007, [iso-8859-1] Daniel Spång wrote:
   
 Some kind of notification to the application that the available 
 memory
 is scarce and let the application free up some memory (e.g., by
 flushing caches), could be used to improve the situation 
   
Any networked appliance can (will) throw data away if there are
no resources available.
   
   That is exactly what Daniel proposed in his first email.
   
   I think his idea makes sense.
  
  IBM AIX uses SIGDANGER, that kernel can raise in OOM conditions to warn
  processes that are willing to handle this signal (default action for the
   SIGDANGER signal is to ignore the signal)
 
 I suspect that SIGDANGER is not the right approach, because glibc
 memory arenas cannot be manipulated from inside a signal handler.
 
 Also, nearly OOM is not the only such signal we would want to
 send to userspace programs. It would also be useful to inform
 userspace programs when we are about to start swapping something
 out, so userspace can discard cached data instead of having to
 wait for disk IO in the future.
 
 A unix signal cannot encapsulate two different messages, while
 something like a /dev/lowmem device can simply be added into
 the program's main poll() loop and give many different messages.

SIGDANGER could stick useful information in siginfo_t's si_code field
and be delivered via a signalfd.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Syba 8-Port Serial Card Unidentified By Kernel

2007-10-05 Thread Nicholas Miell

On Fri, 2007-10-05 at 17:31 -0400, Chris Bergeron wrote:
 Hello all,
 
 I've just installed a multiport serial card released by an outfit called 
 Syba.  This is an 8 port serial-only card with an Octopus style breakout 
 cable.  The main chipset on it is an ITE IT8871F.
 
 I've got two questions on this: Is there a driver I should try force 
 loading on it (and if so, what options to use)?  and is there any useful 
 testing I can do to report back to the LKML on this card?
 
 Thanks for your time, diagnostic output follows.
 -- Chris
 
 The following comes up from an lspci -vv
 
 01:06.0 Serial controller: PLX Technology, Inc. Unknown device 9016 (rev 
 01) (prog-if 02 [16550])
 Subsystem: Unknown device 544e:0008
 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- 
 ParErr- Stepping- SERR- FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- 
 TAbort- MAbort- SERR- PERR-
 Interrupt: pin A routed to IRQ 16
 Region 0: I/O ports at a000 [size=64]
 Region 1: I/O ports at a400 [size=16]
 Region 2: I/O ports at a800 [size=16]
 Region 3: Memory at f500 (32-bit, non-prefetchable) [size=4K]
 Region 4: Memory at f5001000 (32-bit, non-prefetchable) [size=4K]
 Region 5: Memory at f5002000 (32-bit, non-prefetchable) [size=4K]

Try echo -n "10b5 9016" > /sys/bus/pci/drivers/serial/new_id and let
Russell King ([EMAIL PROTECTED]) know if it works (or if it
doesn't, for that matter).

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [PATCH 7/8] i386: bitops: Kill needless usage of asm volatile

2007-07-23 Thread Nicholas Miell

On Mon, 2007-07-23 at 23:30 +0200, Andi Kleen wrote:
  gcc also tries to count the number of instructions, to guess how large in
  bytes the asm block is, as it could make a difference for near vs short
  jumps, etc.
 
 Are you sure? I doubt it. It would need a full asm parser to do this
 properly and then even it could be wrong 
 (e.g. when the sections are switched like Linux does extensively)  
 
 gcc doesn't have such a parser.
 
 Also on x86 gcc doesn't need to care about long/short anyways because
 the assembler takes care of this and the other instructions who cared
 about this (loop) isn't generated anymore.
 
 You're probably confusing it with some other compiler, who sometimes
 do this. e.g. the Microsoft inline asm syntax requires assembler parsing
 in the compiler.
 
  I wonder if it also affects the instruction count the inline heuristics
  use?
 
 AFAIK it counts like one operand.
 
 -Andi

GCC counts newlines and semicolons and uses that number as the likely
instruction count.

See asm_insn_count() in gcc/gcc/final.c

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [RFC] FUSE: mnotify (was: [RFC] VFS: mnotify)

2007-08-12 Thread Nicholas Miell

On Sun, 2007-08-12 at 13:24 +0200, Jan Engelhardt wrote:
 On Aug 12 2007 06:32, Al Boldi wrote:
 Al Boldi wrote:
  Jakob Oestergaard wrote:
   Why on earth would you cripple the kernel defaults for ext3 (which is a
   fine FS for boot/root filesystems), when the *fundamental* problem you
   really want to solve lie much deeper in the implementation of the
   filesystem?  Noatime doesn't solve the problem, it just makes it less
   horrible.
 
  inotify could easily solve the atime problem, but it's got the drawback of
  forcing the user to register each and every file/dir of interest, which
  isn't really reasonable on TB-filesystems.
 
 What inotify needs is some kind of SUBDIR flag on a watch so that one does not
 run out of fds, then the TB issue becomes a bit lighter I think.
 

There's no risk of running out of fds; inotify only requires one. You
still have to register every directory you're interested in, though, but
that's a limitation caused by the Unix VFS philosophy and the resulting
filesystem design it inspired rather than of inotify itself.

Come up with a filesystem where given an inode you can find every
directory that has links to that inode with very little effort, convince
everybody to switch from ext3 to this new filesystem, and then maybe
inotify could start doing recursive subtree watches. Otherwise, it's
just not feasible.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Linux 2.6.23

2007-10-10 Thread Nicholas Miell

On Tue, 2007-10-09 at 13:54 -0700, Linus Torvalds wrote:
 Finally.
 
 Yeah, it got delayed, not because of any huge issues, but because of 
 various bugfixes trickling in and causing me to reset my release clock 
 all the time. But it's out there now, and hopefully better for the wait.
 
 Not a whole lot of changes since -rc9, although there's a few updates to 
 mips, sparc64 and blackfin in there.  Ignoring those arch updates, there's 
 basically a number of mostly one-liners (mostly in drivers, but there's 
 some networking fixes and some VFS/VM fixes there too).
 
 Shortlog and diffstat appended (both relative to -rc9, of course - the 
 full log from 2.6.22 is on kernel.org as usual).
 
 I want this to be what people look at for a few days, but expect the x86 
 merge to go ahead after that. So far, all indications are still that it's 
 going to be all smooth sailing, but hey, those indicators seem to always 
 say that, and only after the fact do people notice any problems ;)
 
   Linus

Does CFS still generate the following sysbench graphs with 2.6.23, or
did that get fixed?

http://people.freebsd.org/~kris/scaling/linux-pgsql.png
http://people.freebsd.org/~kris/scaling/linux-mysql.png

(There's also some interesting FreeBSD vs. Linux graphs in
http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf , but
AFAIK those comparisons are more indicative of glibc malloc performance
than Linux performance.)

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Linux 2.6.23

2007-10-10 Thread Nicholas Miell

On Wed, 2007-10-10 at 12:14 +0200, Ingo Molnar wrote:
 * Nicholas Miell [EMAIL PROTECTED] wrote:
 
  Does CFS still generate the following sysbench graphs with 2.6.23, or 
  did that get fixed?
 
  http://people.freebsd.org/~kris/scaling/linux-pgsql.png 
  http://people.freebsd.org/~kris/scaling/linux-mysql.png
 
 as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:
 
 http://redhat.com/~mingo/misc/sysbench.jpg

That's nice to know. Note that I'm not actually involved in any of these
tests, just a somewhat interested bystander.

 
 As you can see it in the graph, v2.6.23 schedules much more consistently 
 too. [ v2.6.22 has a small (but potentially statistically insignificant) 
 edge at 4-6 clients, and CFS has a slightly better peak (which is 
 statistically insignificant). ]
 
 ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
   1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
   nor in the setup - everything is pretty close to the defaults. )
 
 i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it 
 apparently got resolved after various changes to the test environment:
 
http://jeffr-tech.livejournal.com/10103.html
 
   [CFS] has virtually no dropoff and performs better under load than
the default 2.6.21 scheduler.  (paraphrased)
 
 (The new link you posted, just a few hours after the release of v2.6.23, 
 has not been reported to lkml before AFAICS - when did you become aware 
 of it? If you learned about it before v2.6.23 it might have been useful 
 to report it to the v2.6.23 regression list.)

According to my IRC logs, Jeffr pasted the URL at Oct 09 22:53:56 PDT.
He says he tried to contact you early in CFS's development, but got no
reply.

 At a quick glance there are no .configs or other testing details at or 
 around that URL that i could use to reproduce their result precisely, so 
 at least a minimal bugreport would be nice.
 

AFAICT, the configuration is described in
http://people.freebsd.org/~kris/scaling/mysql.html


-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Nicholas Miell

On Fri, 2007-04-27 at 12:55 -0400, Theodore Tso wrote:
 On Thu, Apr 26, 2007 at 10:15:28PM -0700, Andrew Morton wrote:
  And hardware gets better.  If Intel & AMD come out with a 16k pagesize
  option in a couple of years we'll look pretty dumb.  If the problems which
  you're presently having with that controller get sorted out in the next
  generation of the hardware, we'll also look pretty dumb.
 
 Unfortunately, this isn't a problem with hardware getting better, but
 a willingness to break backwards compatibility.
 
 x86_64 uses a 4k page size to avoid breaking 32-bit applications.  And
 unfortunately, iirc, even 64-bit applications are continuing to depend
 on 4k page alignments for things like the text and bss segments.  If
 the userspace ELF and other compiler/linker specifications were
 appropriately written so they could handle 16k pagesizes, maybe 5 years
 from now we could move to a 16k pagesize.  But this is going to
 require some coordination between the userspace binutils folks and
 AMD/Intel in order to plan such a migration.
 
   - Ted

The AMD64 psABI requires binaries to work with any page size up to 64k.

Whether that's true in practice is another matter entirely, of course.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [PATCH x86_64] Live Patching Function on 2.6.11.7

2005-04-17 Thread Nicholas Miell

On Mon, 2005-04-18 at 00:42 -0400, Daniel Jacobowitz wrote:
 On Mon, Apr 18, 2005 at 01:19:57PM +0900, Takashi Ikebe wrote:
  GDB based approach seems not fit to our requirements. GDB(ptrace) based 
  functions are basically need to be done when target process is stopping.
  In addition to that current PTRACE_PEEK/POKE* allows us to copy only a 
  *word* size...
 
 While true, this is easily fixable.  There is even an interface
 precedent on OpenBSD (and possibly other platforms as well).
 

If we're going to be stealing ideas for debugging interfaces from other
operating systems, could we steal from Solaris instead of anything
ptrace-based?

-- 
Nicholas Miell [EMAIL PROTECTED]


[PATCH] Disable the debug.exception-trace sysctl by default

2005-07-28 Thread Nicholas Miell

debug.exception-trace causes a large amount of log spew when on, and
it's on by default, which is an irritation.

Here's a patch to turn it off.

--- linux-2.6.12/arch/x86_64/mm/fault.c.~1~ 2005-06-28 21:33:27.0 -0700
+++ linux-2.6.12/arch/x86_64/mm/fault.c 2005-07-27 23:46:10.0 -0700
@@ -284,7 +284,7 @@
 }
 
 int page_fault_trace = 0;
-int exception_trace = 1;
+int exception_trace = 0;
 
 /*
  * This routine handles page faults.  It determines the address,


-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [KORG] Re: kernel.org lies about latest -mm kernel

2007-01-06 Thread Nicholas Miell

On Sat, 2007-01-06 at 11:18 -0800, H. Peter Anvin wrote:
 Randy Dunlap wrote:
 
  BTW, yesterday my 2.4 patches were not published, but I noticed that
  they were not even signed not bziped on hera. At first I simply thought
  it was related, but right now I have a doubt. Maybe the automatic script
  has been temporarily been disabled on hera too ?
  The script that deals with the uploads also deals with the packaging -
  so yes the problem is related.
  
  and with the finger_banner and version info on www.kernel.org page?
 
 Yes, they're all connected.
 
 The load on *both* machines were up above the 300s yesterday, probably 
 due to the release of a new Knoppix DVD.
 
 The most fundamental problem seems to be that I can't tell current Linux 
 kernels that the dcache/icache is precious, and that it's way too eager 
 to dump dcache and icache in favour of data blocks.  If I could do that, 
 this problem would be much, much smaller.
 
   -hpa

Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do
this?
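
For reference, a minimal sketch of setting such a tunable from C. The
helper names write_int_setting() and read_int_setting() are made up for
illustration; the only thing taken from the discussion is the sysctl
itself, which is exposed as /proc/sys/vm/vfs_cache_pressure and needs
root to write.

```c
#include <stdio.h>

/* Hypothetical helper: write an integer to a sysctl-style file.
 * Returns 0 on success, -1 on failure. */
int write_int_setting(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the integer back.
 * Returns 0 on success, -1 on failure. */
int read_int_setting(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling write_int_setting("/proc/sys/vm/vfs_cache_pressure", 50) as root
would ask the VM to prefer keeping dentries and inodes; whether that is
enough for the load described above is a separate question.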

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH] Disable the debug.exception-trace sysctl by default

2005-08-03 Thread Nicholas Miell

On Wed, 2005-08-03 at 11:03 +0200, Andi Kleen wrote:
 On Wed, Jul 27, 2005 at 11:53:30PM -0700, Nicholas Miell wrote:
  debug.exception-trace causes a large amount of log spew when on, and
  it's on by default, which is an irritation.
 
  Here's a patch to turn it off.
 Rejected. 

Why?

Getting 5000 lines of
inkscape[13137] trap int3 rip:425051 rsp:7fa26158 error:0
in my logs every time I ltrace something is vastly irritating and serves
no useful purpose.

Admittedly, I can (and have) turned this off, but disabling it by
default will probably save somebody else the trouble of figuring out
where this crap is coming from and how to kill it.

-- 
Nicholas Miell [EMAIL PROTECTED]

overcommit versus MAP_NORESERVE

2005-08-06 Thread Nicholas Miell

Why does overcommit in mode 2 (OVERCOMMIT_NEVER) explicitly force
MAP_NORESERVE mappings to reserve memory?

My understanding is that MAP_NORESERVE is a way for apps to state that
they are aware that the memory allocated may not exist and that they
might get a SIGSEGV and that's OK with them.

Failing to do this makes certain well-known apps (*cough* Sun Java
*cough*) fail to run, which seems to be rather unhelpful.

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: overcommit versus MAP_NORESERVE

2005-08-08 Thread Nicholas Miell

On Sun, 2005-08-07 at 12:49 +0100, Alan Cox wrote:
 On Sad, 2005-08-06 at 20:52 -0700, Nicholas Miell wrote:
  Why does overcommit in mode 2 (OVERCOMMIT_NEVER) explicitly force
  MAP_NORESERVE mappings to reserve memory?
  
  My understanding is that MAP_NORESERVE is a way for apps to state that
  they are aware that the memory allocated may not exist and that they
  might get a SIGSEGV and that's OK with them.
 
 Because a MAP_NORESERVE space that is filled with pages might cause
 insufficient memory to be left available for another object that is not
 MAP_NORESERVE.
 
 You are right it could be improved but that would require someone
 writing code that forcibly reclaimed MAP_NORESERVE objects when we were
 close to out of memory.  At the moment nobody has done this, but nothing
 is stopping someone having a go.

I don't think you can forcibly reclaim MAP_NORESERVE objects (I'm
assuming you mean completely throwing away dirty pages).

MAP_NORESERVE isn't standardized, so all we can go by is what everybody
else does (and what makes the most sense).

Based on the Linux and Solaris man pages (none of FreeBSD, Irix, HP-UX,
or AIX implement anything similar), I think calls to mmap() with
MAP_NORESERVE should always succeed (regardless of memory conditions,
not other errors) and individual writes to unallocated pages in a
MAP_NORESERVE region should either allocate a new page if possible or
send a SIGSEGV without triggering the OOM killer.
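
As a concrete illustration of the kind of mapping in question, here is a
minimal sketch. The wrapper name map_noreserve() is hypothetical; the
mmap() flags are the real ones. Under overcommit mode 2 the kernel may
still refuse a large request, which is exactly the behaviour being
questioned.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper: map `len` bytes of private anonymous memory
 * and ask the kernel not to reserve backing store for it.
 * Returns NULL on failure. */
void *map_noreserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Pages are only allocated when they are first written, so a caller that
touches the region is the one who finds out whether memory was really
available.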

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-21 Thread Nicholas Miell

On Sun, 2007-01-21 at 05:03 -0500, Robert P. J. Day wrote:
   Introduce the TRUE and FALSE boolean macros so that everyone can
 stop re-inventing them, and remove the one occurrence in the source
 tree that clashes with that change.
 

If you're going to introduce true and false macros, you should probably
use the official all-lowercase C99 version.
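
For anyone who hasn't used it, this is all the C99 version amounts to.
The function is an arbitrary example, not anything from the patch.

```c
#include <stdbool.h>

/* C99 <stdbool.h> provides bool, true and false, all lowercase. */
bool is_power_of_two(unsigned int x)
{
    if (x == 0)
        return false;
    return (x & (x - 1)) == 0;
}
```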

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: sigaction's ucontext_t with incorrect stack reference when SA_SIGINFO is being used ?

2007-01-22 Thread Nicholas Miell

On Mon, 2007-01-22 at 09:57 +0100, Xavier Roche wrote:
 Hi folks,
 
 I have a probably lousy question regarding sigaction() behaviour when an
 alternate signal stack is used: it seems that I can not get the user
 stack reference in the ucontext_t stack context ; ie. the uc_stack
 member contains reference of the alternate signal stack, not the stack
 that was used before the crash.
 
 Is this normal behaviour? Is there a way to retrieve the original
 user's stack inside the signal callback?
 
 The example given below demonstrates the issue:
 top of stack==0x7f3d7000, alternative_stack==0x501010
 SEGV==0x7f3d6ff8; sp==0x501010; current stack is the alternate stack
 
 It is obvious that the SEGV was a stack overflow: the si_addr address is
 just on the page below the stack limit.

POSIX says:
the third argument can be cast to a pointer to an object of type
ucontext_t to refer to the receiving thread's context that was
interrupted when the signal was delivered.

so if uc_stack doesn't point to the stack in use immediately prior to
signal generation, this is a bug.

(In theory I should be able to pass the ucontext_t supplied to the
signal handler to setcontext() and resume execution exactly where I left
off -- glibc's refusal to support kernel-generated ucontexts gets in the
way of this, but the point still stands.)

I have no idea who to bother about i386 signal delivery, though. (And I
suspect this bug has probably been copied to other architectures as
well.)
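
A small probe along these lines makes the question easy to check on any
given kernel. This is only a sketch: probe_uc_stack(), on_signal() and
the 64 KiB stack size are my own choices, and it uses raise(SIGUSR1)
rather than a real stack overflow. It deliberately reports what
uc_stack contains without asserting what it should contain.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)   /* arbitrary, comfortably large */

static volatile sig_atomic_t handler_ran;
static void *volatile reported_sp;   /* uc_stack.ss_sp seen by the handler */

static void on_signal(int sig, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;

    (void)sig;
    (void)info;
    reported_sp = uc->uc_stack.ss_sp;
    handler_ran = 1;
}

/* Install an alternate stack, take SIGUSR1 on it, and report both the
 * alternate stack base and the ss_sp found in the handler's ucontext.
 * Returns 0 on success, -1 on failure. The alternate stack stays
 * installed (and allocated) afterwards. */
int probe_uc_stack(void **alt_base, void **uc_sp)
{
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = malloc(ALT_STACK_SIZE);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handler_ran = 0;
    if (raise(SIGUSR1) != 0)   /* handler runs before raise() returns */
        return -1;
    if (!handler_ran)
        return -1;

    *alt_base = ss.ss_sp;
    *uc_sp = reported_sp;
    return 0;
}
```

Comparing the two returned pointers shows whether uc_stack describes the
alternate stack or the interrupted one.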

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Nicholas Miell

On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
 Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
 wait_hpet_tick is optimized away to a never ending loop and the kernel
 hangs on boot in timer setup.
 
 001a <wait_hpet_tick>:
   1a:   55      push   %ebp
   1b:   89 e5   mov    %esp,%ebp
   1d:   eb fe   jmp    1d <wait_hpet_tick+0x3>
 
 This is not a problem with gcc 3.3.5.  Adding barrier() calls to
 wait_hpet_tick does not help, making the variables volatile does.
 
 Signed-off-by: Keith Owens kaos@ocs.com.au
 
 ---
  arch/i386/kernel/time_hpet.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 Index: linux-2.6/arch/i386/kernel/time_hpet.c
 ===
 --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
 +++ linux-2.6/arch/i386/kernel/time_hpet.c
 @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
   */
  static void __devinit wait_hpet_tick(void)
  {
 - unsigned int start_cmp_val, end_cmp_val;
 + unsigned volatile int start_cmp_val, end_cmp_val;
  
   start_cmp_val = hpet_readl(HPET_T0_CMP);
   do {

When you examine the inlined functions involved, this looks an awful lot
like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278

Perhaps SUSE should fix their gcc instead of working around compiler
problems in the kernel?

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Nicholas Miell

On Wed, 2006-11-29 at 15:30 +1100, Keith Owens wrote:
 David Miller (on Tue, 28 Nov 2006 20:04:53 -0800 (PST)) wrote:
 From: Keith Owens kaos@ocs.com.au
 Date: Wed, 29 Nov 2006 14:56:20 +1100

  Secondly, I believe that this is a separate problem from bug 22278.
  hpet_readl() is correctly using volatile internally, but its result is
  being assigned to a pair of normal integers (not declared as volatile).
  In the context of wait_hpet_tick, all the variables are unqualified so
  gcc is allowed to optimize the comparison away.

  The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
  where the return value from hpet_readl() is assigned to a normal
  variable.  Nothing in the C standard says that those unqualified
  variables should be magically treated as volatile, just because the
  original code that extracted the value used volatile.  IOW, time_hpet.c
  needs to declare any variables that hold the result of hpet_readl() as
  being volatile variables.

 I disagree with this.

 readl() returns values from an opaque source, and it is declared
 as such to show this to GCC.  It's like a function that GCC
 cannot see the implementation of, which it cannot determine
 anything about wrt. return values.

 The volatile'ness does not simply disappear the moment you
 assign the result to some local variable which is not volatile.

 Half of our drivers would break if this were true.

 This is definitely a gcc bug, 4.1.0 is doing something weird.  Compile
 with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and the bug appears,
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y has no problem.

 Compile with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and _either_ of the patches
 below and the problem disappears.

My theory: gcc is inlining readl into hpet_readl (readl is an inline
function, so it should be doing this no matter what), and inlining
hpet_readl into wait_hpet_tick (otherwise, it can't possibly make any
assumptions about the return values of hpet_readl -- this looks to be a
SUSE-specific over-aggressive optimization), and somewhere along the way
the volatile qualifier is getting lost.
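
To make the expected behaviour concrete, here is a user-space sketch of
the same shape of code. reg_read() and wait_for_change() are stand-ins I
made up, not the kernel's readl() or wait_hpet_tick(), and the spin
limit exists only so the example terminates. The point is that the load
goes through a volatile-qualified pointer, so the compiler has to
perform it on every iteration even though the locals are plain ints.

```c
#include <stdbool.h>

/* Every call performs a real load because the access is volatile. */
static inline unsigned int reg_read(const volatile unsigned int *reg)
{
    return *reg;
}

/* Spin until *reg differs from its initial value, or give up after
 * max_spins reads. Returns true if a change was observed. */
bool wait_for_change(const volatile unsigned int *reg,
                     unsigned long max_spins)
{
    unsigned int start = reg_read(reg);   /* plain local, not volatile */
    unsigned long i;

    for (i = 0; i < max_spins; i++) {
        if (reg_read(reg) != start)
            return true;
    }
    return false;
}
```

If inlining reg_read() ever turned the loop into an unconditional jump,
that would be the compiler dropping the qualifier, not the caller's
fault for using an unqualified local.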

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH] x86: add the debugfs interface for the sysprof tool

2008-02-23 Thread Nicholas Miell


On Sun, 2008-02-24 at 04:49 +0200, Pekka Enberg wrote:
 Hi Andrew,
 
 Andrew Morton wrote:
  I didn't need to write a new kernel module to enable that
  thirteen-character shell script, and I don't believe one needs to write a
  new kernel module to put a nice easy-to-use GUI around oprofile either.
  
  This is one of those i-cant-believe-im-having-this-discussion discussions.
 
 Sysprof tracks the full stack frame so it can provide meaningful call 
 tree (who called what) which is invaluable for spotting hot _paths_. I 
 don't see how oprofile can do that as it tracks instruction pointers only.
 
   Pekka

You could try passing the --callgraph option to opcontrol.

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [patch] PID namespace design bug, workaround

2007-11-02 Thread Nicholas Miell

On Fri, 2007-11-02 at 10:39 -0700, Linus Torvalds wrote:
 
 On Fri, 2 Nov 2007, Dave Hansen wrote:
  
  There are certainly more of these, but here is one In the futex
  userspace address, we install the current pid's vnr into a userspace
  address.  
 
 Now, realistically, why not just say you can't use these things across 
 namespaces? Does anybody really care? After all, somebody who screws this 
 up only screws himself, not anybody else.
 
   Linus

Accessing the same robust futex from different PID namespaces on the
same machine via a shared file mapping is logically equivalent to
accessing the same robust futex from different machines via a shared
filesystem and there's no reason to expect either operation to work
correctly.

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: JIT emulator needs

2007-06-08 Thread Nicholas Miell

On Fri, 2007-06-08 at 12:10 +0100, Alan Cox wrote:
  e. mremap() flag to get a read/write mapping of a read/exec one
  f. mremap() flag to get a read/exec mapping of a read/write one
  g. mremap() flag to make the 5th arg (new addr) be the upper limit
 
 This is all mprotect and munmap.

I think he's asking for a way to copy an existing mapping, which does
sound genuinely useful. (i.e. mremap(ptr, size, size, MREMAP_COPY), with
no need to mess with files to get multiple mappings of the same region)
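
For comparison, this is roughly the "mess with files" route that exists
today. MREMAP_COPY above is hypothetical; the sketch below uses only
tmpfile(), ftruncate() and mmap(), and the struct and function names are
mine. The second view is read-only rather than read/exec so it also
works where the temporary directory is mounted noexec; a JIT would ask
for PROT_READ | PROT_EXEC there.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two views of the same pages. */
struct dual_map {
    void *rw;     /* read/write view */
    void *ro;     /* read-only view of the same memory */
    size_t len;
};

/* Back both mappings with one unlinked temporary file.
 * Returns 0 on success, -1 on failure. */
int dual_map_create(struct dual_map *m, size_t len)
{
    FILE *f = tmpfile();
    int fd;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }

    m->rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    m->ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    fclose(f);   /* the mappings keep the file contents alive */

    if (m->rw == MAP_FAILED || m->ro == MAP_FAILED) {
        if (m->rw != MAP_FAILED)
            munmap(m->rw, len);
        if (m->ro != MAP_FAILED)
            munmap(m->ro, len);
        return -1;
    }
    m->len = len;
    return 0;
}
```

A store through m.rw is immediately visible through m.ro, which is the
property the emulator wants without the file juggling.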

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: And now for something _totally_ different: Linux v2.6.22-rc5

2007-06-17 Thread Nicholas Miell

On Sat, 2007-06-16 at 20:33 -0700, Linus Torvalds wrote:
 In a stunning turn of events, I've actually been able to make another -rc 
 release despite all the discussion (*cough*flaming*cough*) about other 
 issues, and we now have a brand-spanking-new Linux 2.6.22-rc5 release 
 out there!
 

signalfd still has the broken behavior w.r.t. signal delivery to
threads.

Is this going to get fixed before 2.6.22 proper is released, or should
it just be disabled entirely so no userspace apps grow to depend on
current wrong behavior?

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: And now for something _totally_ different: Linux v2.6.22-rc5

2007-06-17 Thread Nicholas Miell

On Sun, 2007-06-17 at 10:01 -0700, Davide Libenzi wrote:
 On Sun, 17 Jun 2007, Nicholas Miell wrote:
 
  On Sat, 2007-06-16 at 20:33 -0700, Linus Torvalds wrote:
   In a stunning turn of events, I've actually been able to make another -rc 
   release despite all the discussion (*cough*flaming*cough*) about other 
   issues, and we now have a brand-spanking-new Linux 2.6.22-rc5 release 
   out there!
   
  
  signalfd still has the broken behavior w.r.t. signal delivery to
  threads.
  
  Is this going to get fixed before 2.6.22 proper is released, or should
  it just be disabled entirely so no userspace apps grow to depend on
  current wrong behavior?
 
 At the moment, with Ben's patch applied, signalfd can see all group-sent 
 signals, and locally-directed thread signals.

But there's still no way for multiple threads to read from a single
signalfd and get their own thread-specific signals in addition to
process-wide signals, right? I think this was agreed to be the least
surprising behavior.

 Linus, we can leave this as is, or we can use the queued-signalfd that was
 implemented in the first versions of signalfd. In such case, since 
 signalfd hooks to the sighand, all signals will be visible to signalfd and 
 they will not compete against dequeue_signal with the tasks. So there will 
 be no races in the queue retrieval. The issue that remained to be solved 
 was a simple way to limit memory allocated by the queue.
 What do you prefer?
 
 
 
 - Davide

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: And now for something _totally_ different: Linux v2.6.22-rc5

2007-06-17 Thread Nicholas Miell

On Sun, 2007-06-17 at 16:49 -0700, Davide Libenzi wrote:
 On Sun, 17 Jun 2007, Nicholas Miell wrote:
 
  On Sun, 2007-06-17 at 10:01 -0700, Davide Libenzi wrote:
   On Sun, 17 Jun 2007, Nicholas Miell wrote:
   
On Sat, 2007-06-16 at 20:33 -0700, Linus Torvalds wrote:
 In a stunning turn of events, I've actually been able to make another 
 -rc 
 release despite all the discussion (*cough*flaming*cough*) about 
 other 
 issues, and we now have a brand-spanking-new Linux 2.6.22-rc5 release 
 out there!
 

signalfd still has the broken behavior w.r.t. signal delivery to
threads.

Is this going to get fixed before 2.6.22 proper is released, or should
it just be disabled entirely so no userspace apps grow to depend on
current wrong behavior?
   
   At the moment, with Ben's patch applied, signalfd can see all group-sent 
   signals, and locally-directed thread signals.
  
  But there's still no way for multiple threads to read from a single
  signalfd and get their own thread-specific signals in addition to
  process-wide signals, right? I think this was agreed to be the least
  surprising behavior.
 
 Multiple threads can wait on the signalfd. Each one will dequeue either 
 its own private signals (tsk->pending) or the process shared ones 
 (tsk->signal->shared_pending). This will be the behaviour once Ben's patch 
 is applied.
 

Ah, ok, that's great.

I didn't see anything like that in linux.git, missed Ben's patch to the
list, and mixed up your description with the original TIF_SIGPENDING
work.

Sorry for the confusion.

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: Fix signalfd interaction with thread-private signals

2007-06-22 Thread Nicholas Miell

On Sat, 2007-06-23 at 09:19 +1000, Benjamin Herrenschmidt wrote:
 On Sat, 2007-06-23 at 09:16 +1000, Benjamin Herrenschmidt wrote:
  On Fri, 2007-06-22 at 15:47 -0700, Linus Torvalds wrote:
   Quite frankly, it strikes me that if we want to do this, then we 
   shouldn't 
   save the _process_ information at all, we should save the sighand 
   instead.
   
   So either we save the process info, or we save the sighand, but saving 
   the 
   group_leader seems totally bogus. Especially as the group leader can 
   change (by execve()).
   
   One thing that strikes me as I look at that function is that the whole 
   signalfd thing doesn't seem to do any reference counting. Ie it looks 
   totally buggy wrt passing the resulting fd off to somebody else, and then 
   exiting in the original process.
   
   What did I miss? 
  
  Probably nothing... doesn't look good. What are the lifetime rules of a
  struct sighand tho ?
 
 Ah got it, signalfd_detach() in include/linux/signalfd.h from
 exit_signal plus some rcu bits in signalfd lock/unlock.

You could just get rid of the process/sighand/whatever reference
entirely and just make reads on a signalfd always dequeue signals for
the current thread.

You'd lose the ability to pass signalfds around to other processes, but
I'm not convinced that is even useful. (But I'm sure somebody smarter
than me has a valid use case and would love to share :-)

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: Fix signalfd interaction with thread-private signals

2007-06-22 Thread Nicholas Miell

On Fri, 2007-06-22 at 17:12 -0700, Davide Libenzi wrote:
 On Fri, 22 Jun 2007, Nicholas Miell wrote:
 
  You could just get rid of the process/sighand/whatever reference
  entirely and just make reads on a signalfd always dequeue signals for
  the current thread.
 
 Duh?! ...
 
  You'd lose the ability to pass signalfds around to other processes, but
  I'm not convinced that is even useful. (But I'm sure somebody smarter
  than me has a valid use case and would love to share :-)
 
 Wasn't it you that bitched (just a few days ago) because multiple threads 
 could not use the same signalfd and they (by your initial thought) had to 
 create one per thread?

Nevermind, I wasn't entirely clear on the reason why signalfd_ctx had a
tsk pointer. (I wrongly thought it was a vestige of the mechanism for
the original delivery semantics.)

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: Fix signalfd interaction with thread-private signals

2007-06-23 Thread Nicholas Miell

On Sat, 2007-06-23 at 16:05 +1000, Benjamin Herrenschmidt wrote:
 On Fri, 2007-06-22 at 17:12 -0700, Davide Libenzi wrote:
  Wasn't it you that bitched (just a few days ago) because multiple
  threads 
  could not use the same signalfd and they (by your initial thought) had
  to 
  create one per thread?
 
 He said multiple process and you say multiple threads...
 
 If signalfd isn't attached to any context, it would then be useable by
 all threads in a process, delivering them their private signals and the
 process shared signals. Makes sense to me.
 
 By removing that context thing, you lose the ability to listen to some
 other -process- signals, which is probably a bad idea in the first place
 anyway... if you're going to do that, use ptrace (yuck) :-)
 
 Now, you -might- have valid uses for that later ability, but if not, it
 then makes some sense to only attach when an actual read or poll is
 done and only for the duration of that read/poll and only for that
 reader/poller (not the whole signalfd instance).
 
 I think that's what Nicholas means... and it may even simplify the code.
  

That is what I was suggesting, but I don't understand the internals of
Linux signal delivery enough to know if it is possible without
unpleasant contortions to make it work.

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH] Introduce O_CLOEXEC (take 2)

2007-05-31 Thread Nicholas Miell

On Thu, 2007-05-31 at 14:09 -0400, Ulrich Drepper wrote:
 diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
 index c154b9d..b847741 100644
 --- a/include/asm-generic/fcntl.h
 +++ b/include/asm-generic/fcntl.h
 @@ -48,6 +48,9 @@
  #ifndef O_NOATIME
  #define O_NOATIME0100
  #endif
 +#ifndef O_CLOEXEC
 +#define O_CLOEXEC0200/* set close_on_exec */
 +#endif
  #ifndef O_NDELAY
  #define O_NDELAY O_NONBLOCK
  #endif

O_CLOSEONEXEC, perhaps?

We don't want to create another creat here... :)

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 13:22 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2007-06-04 at 19:38 -0700, Davide Libenzi wrote:
- I still think there's something wrong with dequeue_signal() being
   potentially called with a task different than current by signalfd, since
   __dequeue_signal() (among others) mucks around with current regardless.
   I'd love to just make signalfd's read() only do anything if current ==
   ctx->tsk and remove the task argument from dequeue_signal... that would
   fix it nicely too no ?
  
  There's got to be a clean solution that does not limit signalfd, no? I 
  have no time to look at it immediately, but I can look into it in the 
  next few days, if someone else does not do it before...
 
 Is there a real use for dequeuing somebody else's signals with signalfd?
 If yes, then we can do something along the lines of passing task down
 to __dequeue_signal, though I'm not too sure what this notifier is about
 and whether it might rely on being called from within the affected task
 context...
 
 Ben.
 

signalfd() doesn't deliver thread-targeted signals to the wrong threads,
does it?

Hmm.

It looks like reading from a signalfd will give you either
process-global signals or the thread-specific signals that are targeted
towards the thread that originally created the signalfd (regardless of
which thread actually calls read()).

Which is weird, to say the least. Definitely needs to be noted in the
man page, which doesn't seem to exist yet.

Is there a reason why signalfd() doesn't behave like regular signals in
this regard?

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 17:27 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2007-06-04 at 23:09 -0700, Nicholas Miell wrote:
  signalfd() doesn't deliver thread-targeted signals to the wrong
  threads,
  does it?
  
  Hmm.
  
  It looks like reading from a signalfd will give you either
  process-global signals or the thread-specific signals that are
  targeted
  towards the thread that originally created the signalfd (regardless of
  which thread actually calls read()).
  
  Which is weird, to say the least. Definitely needs to be noted in the
  man page, which doesn't seem to exist yet.
  
  Is there a reason why signalfd() doesn't behave like regular signals
  in
  this regard? 
 
 It's worse than that ... by being able to call dequeue_signal from the
 context of another thread than the one dequeuing from.
 
 Ben.

Yes, that's certainly wrong, but that's an implementation issue. I was
more concerned about the design of the API.

Naively, I would expect reads on a signalfd to return either process
signals or thread signals targeted towards the thread doing the read.

What it actually does (delivering process signals or thread signals
targeted towards the thread that created the signalfd) is weird.

For one, it means you can't create a single signalfd, stick it in an
epoll set, and then wait on that set from multiple threads.

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
 On Tue, 5 Jun 2007, Nicholas Miell wrote:
 
  Yes, that's certainly wrong, but that's an implementation issue. I was
  more concerned about the design of the API.
  
  Naively, I would expect reads on a signalfd to return either process
  signals or thread signals targeted towards the thread doing the read.
  
  What it actually does (delivering process signals or thread signals
  targeted towards the thread that created the signalfd) is weird.
  
  For one, it means you can't create a single signalfd, stick it in an
  epoll set, and then wait on that set from multiple threads.
 
 In your box threads do share the sighand, don't they? :)
 

I have no idea what you're trying to say, but it doesn't appear to
address the issue I raised.

-- 
Nicholas Miell [EMAIL PROTECTED]

signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 17:37 -0700, Davide Libenzi wrote:
 On Tue, 5 Jun 2007, Nicholas Miell wrote:
 
  On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
   On Tue, 5 Jun 2007, Nicholas Miell wrote:
   
Yes, that's certainly wrong, but that's an implementation issue. I was
more concerned about the design of the API.

Naively, I would expect reads on a signalfd to return either process
signals or thread signals targeted towards the thread doing the read.

What it actually does (delivering process signals or thread signals
targeted towards the thread that created the signalfd) is weird.

For one, it means you can't create a single signalfd, stick it in an
epoll set, and then wait on that set from multiple threads.
   
   In your box threads do share the sighand, don't they? :)
   
  
  I have no idea what you're trying to say, but it doesn't appear to
  address the issue I raised.
 
 For one, it means you can't create a single signalfd, stick it in an
  epoll set, and then wait on that set from multiple threads.
 
 Why not?
 A signalfd, like I said, is attached to the sighand, that is shared by the 
 threads.
 
 

POSIX requires the following:

At the time of generation, a determination shall be made whether the
signal has been generated for the process or for a specific thread
within the process. Signals which are generated by some action
attributable to a particular thread, such as a hardware fault, shall be
generated for the thread that caused the signal to be generated. Signals
that are generated in association with a process ID or process group ID
or an asynchronous event, such as terminal activity, shall be generated
for the process.

In practice, this means that signals like SIGSEGV/SIGFPE/SIGILL/etc. and
signals generated by pthread_kill() (i.e. tkill() or tgkill()) are
directed to a specific thread, while other signals are directed to the
process as a whole and serviced by any thread that isn't blocking that
specific signal.

Linux accomplishes this by having two lists of pending signals --
current->pending is the per-thread list and
current->signal->shared_pending is the process-wide list.

dequeue_signal(tsk, ...) looks for signals first in tsk->pending and
then in tsk->signal->shared_pending.

sys_signalfd() stores current in signalfd_ctx. signalfd_read() passes
that context to signalfd_dequeue, which passes that saved
task_struct pointer to dequeue_signal.

This means that a signalfd will deliver signals targeted towards either
the original thread that created that signalfd, or signals targeted
towards the process as a whole.

This means that a single signalfd is not adequate to handle signal
delivery for all threads in a process, because signals targeted towards
threads other than the thread that originally created the signalfd will
never be queued to that signalfd.

Is my analysis wrong?

-- 
Nicholas Miell [EMAIL PROTECTED]

Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 20:37 -0700, Linus Torvalds wrote:
 
> On Tue, 5 Jun 2007, Davide Libenzi wrote:
> > On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:
> > > 
> > > Yeah, synchronous signals should probably never be delivered to another
> > > process, even via signalfd. There's no point delivering a SEGV to
> > > somebody else :-)
> > 
> > That'd be a limitation. Like you can choose to not handle SEGV, you can
> > choose to have a signalfd listening to it. Of course, not with the
> > intention to *handle* the signal, but with a notification intent.
> 
> I agree that it would be a limitation, but it would be a sane one.
> 
> How about we try to live with that limitation, if only to avoid the issue
> of having the private signals being stolen by anybody else. If we actually
> find a real-live use-case where that is bad in the future, we can re-visit
> the issue - it's always easier to _expand_ semantics later than it is to
> restrict them, so I think this thread is a good argument for starting it
> out in a more restricted form before people start depending on semantics
> that can be nasty..
> 
> 		Linus

Proposed semantics:

a) Process-global signals can be read by any thread (inside or outside
of the process receiving the signal).

Rationale:
This should always work, so there's no reason to limit it.

b) Thread-specific signals can only be read by their target thread.

Rationale:
This behavior is required by POSIX, and if an application is using
pthread_kill()/tkill()/tgkill()/etc. to specifically direct a signal, it
damn well better get to where the app wants it to go.

c) Synchronous signals (Naturally generated SIGILL, SIGFPE, SIGSEGV,
SIGBUS, and SIGTRAP. Did I miss any?) are not delivered via signalfd()
at all. (And by naturally generated, I mean signals that would have
the SI_KERNEL flag set.)

Rationale: 
These are a subset of thread-specific signals, so they can only be read
from a signalfd by their target thread.

However, there's no way for the target thread to get the signal because
it is either:

a) not blocked in a syscall waiting for signal delivery and thus further
execution beyond the instruction causing the signal is impossible
 OR
b) it is blocked in a syscall waiting for signal delivery and the error
is caused by the signal delivery mechanism itself (i.e. a bad pointer
passed to read/select/poll/epoll_wait/etc.) and thus the signal can't be
delivered
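
The three rules can be written down as a small decision function. This
is a sketch of the proposal only, not existing kernel behaviour; the
enum, the function name and the use of plain integer thread IDs are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a pending signal. */
enum toy_sig_kind {
    TOY_SIG_PROCESS,      /* directed at the process as a whole */
    TOY_SIG_THREAD,       /* directed at one thread, e.g. via pthread_kill() */
    TOY_SIG_SYNCHRONOUS   /* kernel-generated fault such as SIGSEGV or SIGFPE */
};

/* May a signalfd read performed by reader_tid return this signal? */
static bool toy_signalfd_may_read(enum toy_sig_kind kind,
                                  int reader_tid, int target_tid)
{
    switch (kind) {
    case TOY_SIG_PROCESS:
        return true;                       /* rule (a): any thread */
    case TOY_SIG_THREAD:
        return reader_tid == target_tid;   /* rule (b): target thread only */
    case TOY_SIG_SYNCHRONOUS:
        return false;                      /* rule (c): never via signalfd */
    }
    assert(!"unknown signal kind");
    return false;
}
```

For process-global signals target_tid is ignored, which matches rule
(a) allowing readers inside or outside the receiving process.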

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-27 Thread Nicholas Miell

On Wed, 2007-06-27 at 11:45 -0700, Davide Libenzi wrote:
> On Wed, 27 Jun 2007, Hugh Dickins wrote:
> 
> > On Wed, 27 Jun 2007, Davide Libenzi wrote:
> > > On Wed, 27 Jun 2007, Hugh Dickins wrote:
> > > 
> > > > In honesty, I should add that I dislike and distrust Davide's
> > > > MAP_NOZERO very much indeed!  Would much rather leave my cpus
> > > > spending a little time in clear_page().  A uid in struct page
> > > > (though I'm sure we could find somewhere to tuck it away) -
> > > > the horror, the horror!  But I've so far failed to find a killer
> > > > argument against it, and am hoping for someone else to do so.
> > > 
> > > Little time? Please, do not trust me. Start oprofile and run a kernel
> > > build. Look, I'm not even talking about some micro benchmark explicitly
> > > built to exploit the thing. A kernel build.
> > > You will find clear_page to be the *1st* kernel entry after cc1 and as.
> > > That is bad for two reasons. The time it spends in there, and the cache
> > > it blows.
> > 
> > I don't doubt that it shows real benefits; but dangerously cutting
> > corners usually shows benefits too.  Relying on a uid at this level
> > feels very wrong to me - but as I said, I've not found a killer
> > argument against it.
> 
> The reason why I posted is exactly so other ppl can look at it and find
> possible flaws in the way pages are retired. If an effective UID was able
> to see (or it generated) the data on that page, it should be able to get
> that page back uncleared (when VM_NOZERO is set).
> From a performance POV, a 2-3% boost on a non-micro-bench test like a
> kernel build is not exactly peanuts. And for more heavy malloc/anon-mmap
> applications, the boost goes up to 10-15%. That is not exactly what I
> call little time ;)
> 
> - Davide
 

I don't think the security issues with this will ever make it
worthwhile.

Consider:

1) euid is not sufficient, you need to store away arbitrary LSM
information and call LSM hooks to decide security equivalence. The same
applies to VServer or whatever other container system you use.

2) Two processes, A and B, are in separate VFS namespaces but have
equivalent security identity according to LSM. Process A reads data from
file F which is not visible in process B's namespace. You have to
prevent process B from ever getting a page that once contained data from
file F.

3) mlock() is often used by programs like GPG to prevent decrypted
secret keys from ever getting swapped out. You need to zero all
once-mlocked pages before they get reused to prevent that page from
getting swapped to disk or application bugs from leaking the key.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: *at syscalls for xattrs?

2007-07-15 Thread Nicholas Miell

On Sun, 2007-07-15 at 21:53 +0100, Al Viro wrote:
> On Sun, Jul 15, 2007 at 09:46:27PM +0200, Jan Engelhardt wrote:
> > Hi,
> > 
> > 
> > recently, the family of *at() syscalls and functions (openat, fstatat,
> > etc.) have been added to Linux and Glibc, respectively.
> > In short: I am missing xattr at functions :)
> 
> No.  They are not fscking forks.  They are almost as revolting, but
> not quite on the same level.

I suspect he was asking for 

int getxattrat(int fd, const char *path, const char *name, void *value, 
size_t size, int flags)
int setxattrat(int fd, const char *path, const char *name, void *value,
size_t size, int xattrflags, int atflags)

rather than the ability to access xattrs as files.

> > BTW, why is fstatat called fstatat and not statat? (Same goes for
> > futimesat.) It does not take a file descriptor for the file argument.
> > Otherwise we'd also need fopenat/funlinkat, etc. Any reasons?
> 
> Ulrich having an odd taste?

Solaris compatibility.

-- 
Nicholas Miell [EMAIL PROTECTED]


Re: Announce: modutils 2.3.23 is available

2000-12-20 Thread Nicholas Miell

Christian Gennerat wrote:
> 
> About Standard aliases:
> > modprobe -c
> ...
> alias ppp-compress-21 bsd_comp
> ...
> 
> Why bsd_comp is the standard alias?
> /src/linux/Configure.help says that
> 
> The PPP Deflate compression method ("PPP Deflate compression",
>   above) is preferable to BSD-Compress, because it compresses better
>   and is patent-free.
> 

ppp-compress-21 refers to PPP compression method 21, which happens to
be BSD Compress. Deflate is 26 (and also 24, because it was assigned
that value in the draft RFC).

Aliasing ppp-compress-21 to anything other than bsd_comp would break
PPP.

Re: [PATCH x86_64] Live Patching Function on 2.6.11.7

2005-04-17 Thread Nicholas Miell

On Mon, 2005-04-18 at 00:42 -0400, Daniel Jacobowitz wrote:
> On Mon, Apr 18, 2005 at 01:19:57PM +0900, Takashi Ikebe wrote:
> > GDB based approach seems not fit to our requirements. GDB(ptrace) based 
> > functions are basically need to be done when target process is stopping.
> > In addition to that current PTRACE_PEEK/POKE* allows us to copy only a 
> > *word* size...
> 
> While true, this is easily fixable.  There is even an interface
> precedent on OpenBSD (and possibly other platforms as well).
> 

If we're going to be stealing ideas for debugging interfaces from other
operating systems, could we steal from Solaris instead of anything
ptrace-based?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [PATCH] Disable the debug.exception-trace sysctl by default

2005-08-03 Thread Nicholas Miell

On Wed, 2005-08-03 at 11:03 +0200, Andi Kleen wrote:
> On Wed, Jul 27, 2005 at 11:53:30PM -0700, Nicholas Miell wrote:
> > debug.exception-trace causes a large amount of log spew when on, and
> > it's on by default, which is an irritation.
> 
> > Here's a patch to turn it off.
> Rejected. 

Why?

Getting 5000 lines of
"inkscape[13137] trap int3 rip:425051 rsp:7fa26158 error:0"
in my logs every time I ltrace something is vastly irritating and serves
no useful purpose.

Admittedly, I can (and have) turned this off, but disabling it by
default will probably save somebody else the trouble of figuring out
where this crap is coming from and how to kill it.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


overcommit verses MAP_NORESERVE

2005-08-06 Thread Nicholas Miell

Why does overcommit in mode 2 (OVERCOMMIT_NEVER) explicitly force
MAP_NORESERVE mappings to reserve memory?

My understanding is that MAP_NORESERVE is a way for apps to state that
they are aware that the memory allocated may not exist and that they
might get a SIGSEGV and that's OK with them.

Failing to do this makes certain well-known apps (*cough* Sun Java
*cough*) fail to run, which seems to be rather unhelpful.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: overcommit verses MAP_NORESERVE

2005-08-08 Thread Nicholas Miell

On Sun, 2005-08-07 at 12:49 +0100, Alan Cox wrote:
> On Sad, 2005-08-06 at 20:52 -0700, Nicholas Miell wrote:
> > Why does overcommit in mode 2 (OVERCOMMIT_NEVER) explicitly force
> > MAP_NORESERVE mappings to reserve memory?
> > 
> > My understanding is that MAP_NORESERVE is a way for apps to state that
> > they are aware that the memory allocated may not exist and that they
> > might get a SIGSEGV and that's OK with them.
> 
> Because a MAP_NORESERVE space that is filled with pages might cause
> insufficient memory to be left available for another object that is not
> MAP_NORESERVE.
> 
> You are right it could be improved but that would require someone
> writing code that forcibly reclaimed MAP_NORESERVE objects when we were
> close to out of memory.  At the moment nobody has done this, but nothing
> is stopping someone having a go.

I don't think you can forcibly reclaim MAP_NORESERVE objects (I'm
assuming you mean completely throwing away dirty pages).

MAP_NORESERVE isn't standardized, so all we can go by is what everybody
else does (and what makes the most sense).

Based on the Linux and Solaris man pages (none of FreeBSD, Irix, HP-UX,
or AIX implement anything similar), I think calls to mmap() with
MAP_NORESERVE should never fail for lack of memory (they can still fail
for other reasons), and individual writes to unallocated pages in a
MAP_NORESERVE region should either allocate a new page if possible or
send a SIGSEGV without triggering the OOM killer.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


[PATCH] Disable the debug.exception-trace sysctl by default

2005-07-28 Thread Nicholas Miell

debug.exception-trace causes a large amount of log spew when on, and
it's on by default, which is an irritation.

Here's a patch to turn it off.

--- linux-2.6.12/arch/x86_64/mm/fault.c.~1~ 2005-06-28 21:33:27.0 -0700
+++ linux-2.6.12/arch/x86_64/mm/fault.c 2005-07-27 23:46:10.0 -0700
@@ -284,7 +284,7 @@
 }
 
 int page_fault_trace = 0;
-int exception_trace = 1;
+int exception_trace = 0;
 
 /*
  * This routine handles page faults.  It determines the address,


-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [KORG] Re: kernel.org lies about latest -mm kernel

2007-01-06 Thread Nicholas Miell

On Sat, 2007-01-06 at 11:18 -0800, H. Peter Anvin wrote:
> Randy Dunlap wrote:
> >
> >>> BTW, yesterday my 2.4 patches were not published, but I noticed that
> >>> they were not even signed nor bzipped on hera. At first I simply thought
> >>> it was related, but right now I have a doubt. Maybe the automatic script
> >>> has been temporarily disabled on hera too ?
> >> The script that deals with the uploads also deals with the packaging -
> >> so yes the problem is related.
> > 
> > and with the finger_banner and version info on www.kernel.org page?
> 
> Yes, they're all connected.
> 
> The load on *both* machines were up above the 300s yesterday, probably 
> due to the release of a new Knoppix DVD.
> 
> The most fundamental problem seems to be that I can't tell current Linux 
> kernels that the dcache/icache is precious, and that it's way too eager 
> to dump dcache and icache in favour of data blocks.  If I could do that, 
> this problem would be much, much smaller.
> 
> 	-hpa

Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do
this?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Nicholas Miell

On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> wait_hpet_tick is optimized away to a never ending loop and the kernel
> hangs on boot in timer setup.
> 
> 001a :
>   1a:   55                      push   %ebp
>   1b:   89 e5                   mov    %esp,%ebp
>   1d:   eb fe                   jmp    1d
> 
> This is not a problem with gcc 3.3.5.  Adding barrier() calls to
> wait_hpet_tick does not help, making the variables volatile does.
> 
> Signed-off-by: Keith Owens 
> 
> ---
>  arch/i386/kernel/time_hpet.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> ===
> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
>   */
>  static void __devinit wait_hpet_tick(void)
>  {
> - unsigned int start_cmp_val, end_cmp_val;
> + unsigned volatile int start_cmp_val, end_cmp_val;
>  
>   start_cmp_val = hpet_readl(HPET_T0_CMP);
>   do {

When you examine the inlined functions involved, this looks an awful lot
like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278

Perhaps SUSE should fix their gcc instead of working around compiler
problems in the kernel?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Nicholas Miell

On Wed, 2006-11-29 at 15:30 +1100, Keith Owens wrote:
> David Miller (on Tue, 28 Nov 2006 20:04:53 -0800 (PST)) wrote:
> >From: Keith Owens 
> >Date: Wed, 29 Nov 2006 14:56:20 +1100
> >
> >> Secondly, I believe that this is a separate problem from bug 22278.
> >> hpet_readl() is correctly using volatile internally, but its result is
> >> being assigned to a pair of normal integers (not declared as volatile).
> >> In the context of wait_hpet_tick, all the variables are unqualified so
> >> gcc is allowed to optimize the comparison away.
> >> 
> >> The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
> >> where the return value from hpet_readl() is assigned to a normal
> >> variable.  Nothing in the C standard says that those unqualified
> >> variables should be magically treated as volatile, just because the
> >> original code that extracted the value used volatile.  IOW, time_hpet.c
> >> needs to declare any variables that hold the result of hpet_readl() as
> >> being volatile variables.
> >
> >I disagree with this.
> >
> >readl() returns values from an opaque source, and it is declared
> >as such to show this to GCC.  It's like a function that GCC
> >cannot see the implementation of, which it cannot determine
> >anything about wrt. return values.
> >
> >The volatile'ness does not simply disappear the moment you
> >assign the result to some local variable which is not volatile.
> >
> >Half of our drivers would break if this were true.
> 
> This is definitely a gcc bug, 4.1.0 is doing something weird.  Compile
> with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and the bug appears,
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y has no problem.
> 
> Compile with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and _either_ of the patches
> below and the problem disappears.
> 

My theory: gcc is inlining readl into hpet_readl (readl is an inline
function, so it should be doing this no matter what), and inlining
hpet_readl into wait_hpet_tick (otherwise, it can't possibly make any
assumptions about the return values of hpet_readl -- this looks to be a
SUSE-specific over-aggressive optimization), and somewhere along the way
the volatile qualifier is getting lost.
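
To show what the correct behaviour looks like, here is a small
user-space sketch of the pattern under discussion. toy_readl and
toy_wait_for_change are hypothetical names, an ordinary variable stands
in for the HPET register, and the spin limit is added only so the
example terminates. The point is that the load goes through a
volatile-qualified lvalue on every call, so the compiler has to perform
it each time around the loop even after inlining and even though the
local holding the first value is a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for readl(): the volatile access forces a real load per call. */
static inline uint32_t toy_readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/*
 * Spin until the value at `reg` differs from its initial value, or until
 * max_spins reads have been made. Returns 0 on change, -1 on timeout.
 */
static int toy_wait_for_change(const volatile void *reg,
                               unsigned long max_spins)
{
    assert(reg != NULL);

    uint32_t start = toy_readl(reg);   /* plain local, deliberately not volatile */

    for (unsigned long i = 0; i < max_spins; i++) {
        if (toy_readl(reg) != start)
            return 0;
    }
    return -1;
}
```

A compiler that turns the loop into an unconditional jump has dropped
the volatile access somewhere, which is what the theory above suggests
is happening.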

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 21:31 -0800, Linus Torvalds wrote:
> 
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> > 
> > Ah, I see. You're just interested in fds as a generic handle concept,
> > and not a more Plan 9 type thing.
> 
> Indeed. It's a "handle".
> 
> UNIX has pid's for "process" handles, and "file descriptors" for just 
> about everything else.

And I imagine that somebody will come up with a way of getting a fd for a
process sooner or later.

> > If that's the goal, somebody should start thinking about reducing the
> > contents of struct file to the bare minimum (i.e. not much more than a
> > file_operations pointer).
> 
> Well, there's more there, but it really is fairly close. If you look at 
> it, a "struct file" ends up not having a lot more than the minimal stuff 
> required to use it as a handle: it really isn't a very big structure. 
> 
> The biggest part is actually the read-ahead state, which is arguably a 
> generic thing for a file handle, even though not all kinds will be able to 
> use it. We *could* make that be behind a pointer (along with the "f_pos" 
> thing, that really logically goes along with the read-ahead thing), of 
> course, but since most files probably do end up being "traditional file" 
> structures, it's probably not wrong to just have it in the file.
> 

Actually, I was thinking of reducing struct file to the bare minimum, and
then using that as the common header shared by object-specific
structures. I don't know how unpleasant that would be from a memory
allocation perspective, though.
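
Something along these lines, as a user-space sketch. None of these
names exist in the kernel: toy_handle plays the part of a minimal
struct file, toy_timer is one object-specific structure, and
toy_container_of mimics the usual container_of() idiom.

```c
#include <assert.h>
#include <stddef.h>

struct toy_handle;

struct toy_handle_ops {
    long (*read)(struct toy_handle *h);
};

/* The bare-minimum header: little more than an operations pointer. */
struct toy_handle {
    const struct toy_handle_ops *ops;
};

/* One object-specific structure embedding the common header. */
struct toy_timer {
    struct toy_handle handle;
    long expirations;
};

/* Recover the enclosing structure from a pointer to its embedded header. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Timer-specific read: report and reset the expiration count. */
static long toy_timer_read(struct toy_handle *h)
{
    struct toy_timer *t = toy_container_of(h, struct toy_timer, handle);
    long n = t->expirations;

    t->expirations = 0;
    return n;
}

static const struct toy_handle_ops toy_timer_ops = {
    .read = toy_timer_read,
};

/* Generic entry point: dispatch through the header, knowing nothing else. */
static long toy_handle_read(struct toy_handle *h)
{
    assert(h != NULL && h->ops != NULL && h->ops->read != NULL);
    return h->ops->read(h);
}
```

The allocation question is visible here too: each object type has its
own size, so the header can no longer come from a single fixed-size
cache.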

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v3 - timerfd core ...

2007-03-11 Thread Nicholas Miell

On Sun, 2007-03-11 at 16:13 -0700, Davide Libenzi wrote:
> On Sun, 11 Mar 2007, Davide Libenzi wrote:
> 
> > This patch introduces a new system call for timer events delivered
> > through file descriptors. This allows timer events to be used with
> > standard POSIX poll(2), select(2) and read(2). As a consequence of
> > supporting the Linux f_op->poll subsystem, they can be used with
> > epoll(2) too.
> > The system call is defined as:
> > 
> > int timerfd(int ufd, int clockid, int tmrtype, const struct timespec *utmr);
> > 
> > The "ufd" parameter allows for re-use (re-programming) of an existing
> > timerfd w/out going through the close/open cycle (same as signalfd).
> > If "ufd" is -1, a new file descriptor will be created, otherwise the
> > existing "ufd" will be re-programmed.
> > The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.
> > The "tmrtype" parameter allows to specify the timer type. The following
> > values are supported:
> > 
> > TFD_TIMER_REL
> > The time specified in the "utmr" parameter is a relative time
> > from NOW.
> > 
> > TFD_TIMER_ABS
> > The timer specified in the "utmr" parameter is an absolute time.
> > 
> > TFD_TIMER_SEQ
> > The time specified in the "utmr" parameter is an interval at
> > which a continuous clock rate will be generated.
> > 
> 
> Duh! Forgot to update the documentation. Now timerfd() gets an itimerspec.
> For TFD_TIMER_REL only the it_interval is valid, and it's the relative 
> time. For TFD_TIMER_ABS, only the it_value is valid, and that's the expiry 
> absolute time. For TFD_TIMER_SEQ, it_value tells when the first tick 
> should be generated, and it_interval tells the period of the following 
> ticks.
> 
> 

You should probably make it behave like the other things that use
itimerspec, just to avoid confusion -- i.e. timers are relative by
default, there's a flag that makes them absolute, they expire when
it_value specifies, and repeat every it_interval nanoseconds if
it_interval is non-zero.

i.e.

int timerfd(int ufd, int clockid, int flags, const struct timespec
*utmr);

with TFD_TIMER_ABS in flags making the timer absolute instead of
relative (and no TFD_TIMER_REL or TFD_TIMER_SEQ at all).

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v3 - timerfd core ...

2007-03-11 Thread Nicholas Miell

On Sun, 2007-03-11 at 16:50 -0700, Nicholas Miell wrote:
> You should probably make it behave like the other things that use
> itimerspec, just to avoid confusion -- i.e. timers are relative by
> default, there's a flag that makes them absolute, they expire when
> it_value specifies, and repeat every it_interval nanoseconds if
> it_interval is non-zero.
> 
> i.e.
> 
> int timerfd(int ufd, int clockid, int flags, const struct timespec
> *utmr);
> 
> with TFD_TIMER_ABS in flags making the timer absolute instead of
> relative (and no TFD_TIMER_REL or TFD_TIMER_SEQ at all).
> 

Sorry, that should be

int timerfd(int ufd, int clockid, int flags, const struct itimerspec
*utmr);

and TFD_TIMER_ABSTIME.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: Style Question

2007-03-11 Thread Nicholas Miell

On Mon, 2007-03-12 at 06:40 +0100, Jan Engelhardt wrote:
> On Mar 12 2007 13:37, Cong WANG wrote:
> >
> > The following code is picked from drivers/kvm/kvm_main.c:
> >
> > static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
> > {
> >     struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
> >
> >     mutex_lock(&vcpu->mutex);
> >     if (unlikely(!vcpu->vmcs)) {
> >         mutex_unlock(&vcpu->mutex);
> >         return 0;
> >     }
> >     return kvm_arch_ops->vcpu_load(vcpu);
> > }
> >
> > Obviously, it used 0 rather than NULL when returning a pointer to
> > indicate an error. Should we fix such issue?
> 
> Indeed. If it was for me, something like that should throw a compile error.
> 
> >>[...]
> > I think it's more clear to indicate we are using a pointer rather than
> > an integer when we use NULL in kernel. But in userspace, using NULL is
> > for portability of the program, although most (*just* most, NOT all) of
> > NULL's definition is ((void*)0). ;-)
> 
> NULL has the same bit pattern as the number zero. (I'm not saying the bit
> pattern is all zeroes. And I am not even sure if NULL ought to have the same
> pattern as zero.) So C++ could use (void *)0, if it would let itself :p

Not necessarily. You can use 0 at the source level, but the compiler has
to convert it to the actual NULL pointer bit pattern, whatever it may
be.

In C++, NULL is typically defined to 0 (with no void* cast) by most
compilers because 0 (and only 0) can be implicitly converted to a null
pointer of any pointer type without a cast.

GCC introduced the __null extension so that NULL still works correctly
in C++ when passed to a varargs function on 64-bit platforms.

(This just works in C because C defines NULL as ((void*)0), which is
thus the right size. In C++, the 0 ends up being an int instead of a
pointer when passed to a varargs function, and things tend to blow up
when they read the garbage high bits. Of course, nobody else does this,
so you still have to use (void*)NULL to be portable.)
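
A small C sketch of the varargs case. count_strings is a made-up
function for the example. The callee reads each argument as a pointer
with va_arg, so the terminating sentinel has to be passed as a
pointer-sized value, which is what the explicit cast guarantees
regardless of how NULL happens to be defined.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the strings in an argument list terminated by a null pointer. */
static size_t count_strings(const char *first, ...)
{
    size_t n = 0;
    va_list ap;

    va_start(ap, first);
    /* Each va_arg reads a full pointer; a bare int 0 would be too small. */
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);

    return n;
}
```

A call then looks like count_strings("a", "b", "c", (void *)NULL), with
the cast on the sentinel being the portable part.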

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell

On Fri, 2007-03-16 at 23:30 +0100, Mike Galbraith wrote:
> On Sat, 2007-03-17 at 08:13 +1100, Con Kolivas wrote:
> > On Saturday 17 March 2007 02:34, Mike Galbraith wrote:
> > > On Sat, 2007-03-17 at 00:40 +1100, Con Kolivas wrote:
> > > > Here are full patches for rsdl 0.31 for various base kernels. A full
> > > > announce with a fresh -mm series will follow...
> > > >
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.20.3-rsdl-0.31.patch
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.31.patch
> > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.31.patch
> > >
> > > It still has trouble with the x/gforce vs two niced encoders scenario.
> > > The previously reported choppiness is still present.
> > >
> > > I suspect that x/gforce landing in the expired array is the trouble, and
> > > that this will never be smooth without some kind of exemption.  I added
> > > some targeted unfairness to .30, and it didn't help much at all.
> > >
> > > Priorities going all the way to 1 were a surprise.
> > 
> > It wasn't going to change that case without renicing X.
> 
> Con.  You are trying to wedge a fair scheduler into an environment where
> totally fair simply can not possibly function.
> 
> If this is your final answer to the problem space, I am done testing,
> and as far as _I_ am concerned, your scheduler is an utter failure.
> 

Sorry, I haven't really been following this thread and now I'm confused.

You're saying that it's somehow the scheduler's fault that X isn't
running with a high enough priority?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: RSDL v0.31

2007-03-16 Thread Nicholas Miell

On Sat, 2007-03-17 at 06:56 +0100, Mike Galbraith wrote:
> On Fri, 2007-03-16 at 21:24 -0700, Nicholas Miell wrote:
> 
> > Sorry, I haven't really been following this thread and now I'm confused.
> > 
> > You're saying that it's somehow the scheduler's fault that X isn't
> > running with a high enough priority?
> 
> I'm saying that the current scheduler adjusts for interactive loads,
> this new one doesn't.  I'm seeing interactivity regressions, and they
> are not fixed with nice unless nice is used to maximum effect.  I'm
> saying yes, I can lower my expectations, but no I don't want to.
> 
> A four line summary is as short as I can make it.
> 
>   -Mike

Uh, no. Essentially, the current scheduler works around X's brokenness,
in an often unpredictable manner.

RSDL appears to be completely deterministic, which is a very strong
virtue.

The X people have plans for how to go about fixing this, but until then,
there's no reason to hold up kernel development.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: RSDL v0.31

2007-03-17 Thread Nicholas Miell

On Sat, 2007-03-17 at 00:25 -0700, William Lee Irwin III wrote:
> On Sat, Mar 17, 2007 at 08:11:57AM +0100, Mike Galbraith wrote:
> > On a side note, I wonder how long it's going to take to fix all the
> > X/client combinations out there.
> 
> AIUI X's clients largely access it via libraries X ships, so the X
> update will sweep the vast majority of them in one shot. You'll have
> to either run the clients from remote hosts with downrev libraries or
> have downrev libraries around (e.g. in chroots) for clients to link to
> for the clients not to cooperate.
> 

The changes will probably be entirely server-side anyway, so stray
ancient libraries won't be a problem.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: RSDL v0.31

2007-03-17 Thread Nicholas Miell

(sorry for the duplicate Ingo, this time I managed to Reply to All)

On Sat, 2007-03-17 at 08:45 +0100, Ingo Molnar wrote:
> * Nicholas Miell <[EMAIL PROTECTED]> wrote:
> 
> > The X people have plans for how to go about fixing this, [...]
> 
> then we'll first have wait for those X changes to at least be done in a 
> minimal manner so that they can be tested for real with RSDL. (is it 
> _really_ due to that? Or will X regress forever once we switch to RSDL?)

Yes, it's an X problem.

There are two issues, really -- smooth pointer movement or the lack
thereof and the servicing of clients at varying priorities. There are
vague plans floating around about moving all input processing off into a
separate high-priority thread and pretty much no ideas about how to deal
with mixed priority clients.

So, the current scheduler works around this brain damage using
heuristics that sort of do the job and sometimes screw things up.

> We cannot regress the scheduling of a workload as important as "X mixed 
> with CPU-intense tasks". And "in theory this should be fixed if X is 
> fixed" does not cut it. X is pretty much _the_ most important thing to 
> optimize the interactive behavior of a Linux scheduler for. Also, 
> paradoxically, it is precisely the improvement of _X_ workloads that 
> RSDL argues with.
> 
> this regression has to be fixed before RSDL can be merged, simply 
> because it is a pretty negative effect that goes beyond any of the 
> visible positive improvements that RSDL brings over the current 
> scheduler. If it is better to fix X, then X has to be fixed _first_, at 
> least in form of a prototype patch that can be _tested_, and then the 
> result has to be validated against RSDL.
> 

RSDL is, above all else, fair. Predictably so.
Hacking around X's stupidity makes it no longer *be* RSDL.

Until they catch up to the early-90s technology-wise, we can just nice
-19 X.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Nicholas Miell

On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Ingo Molnar wrote:
> 
> > 
> > * Davide Libenzi  wrote:
> > 
> > > I think that the "dirty" FPU context must, at least, follow the new 
> > > head. That's what the userspace sees, and you don't want an async_exec 
> > > to re-emerge with a different FPU context.
> > 
> > well. I think there's some confusion about terminology, so please let me 
> > describe everything in detail. This is how execution goes:
> > 
> >   outer loop() {
> >   call_threadlet();
> >   }
> > 
> > this all runs in the 'head' context. call_threadlet() always switches to 
> > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> > while executing the threadlet function, we block, then the 
> > threadlet-thread gets to keep the task (the threadlet stack and also the 
> > FPU), and blocks - and we pick a 'new head' from the thread pool and 
> > continue executing in that context - right after the call_threadlet() 
> > function, in the 'old' head's stack. I.e. it's as if we returned 
> > immediately from call_threadlet(), with a return code that signals that 
> > the 'threadlet went async'.
> > 
> > now, the FPU state that was when the threadlet blocked is totally 
> > meaningless to the 'new head' - that FPU state is from the middle of the 
> > threadlet execution.
> 
> For threadlets, it might be. Now think about a task wanting to dispatch N 
> parallel AIO requests as N independent syslets.
> Think about this task having USEDFPU set, so the FPU context is dirty.
> When it returns from async_exec, with one of the requests being become 
> sleepy, it needs to have the same FPU context it had when it entered, 
> otherwise it won't prolly be happy.
> For the same reason a schedule() must preserve/sync the "prev" FPU 
> context, to be reloaded at the next FPU fault.

The point Ingo was making is that the x86 ABI already requires the FPU
context to be saved before *all* function calls.

Unfortunately, this isn't true of other ABIs -- looking over the psABI
specs I have lying around, AMD64, PPC64, and MIPS require at least part
of the FPU state to be preserved across function calls, and I'm sure
this is also true of others.

Then there are the other nasty details of new thread creation --
thankfully, the contents of the TLS aren't inherited from the parent
thread, but it still needs to be initialized; not to mention all the
other details involved in pthread creation and destruction.

I don't see any way around the pthread issues other than making a libc
upcall on return from the first system call that blocked.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Nicholas Miell

On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Nicholas Miell wrote:
> 
> > The point Ingo was making is that the x86 ABI already requires the FPU
> > context to be saved before *all* function calls.
> 
> I've not seen that among Ingo's points, but yeah some status is caller 
> saved. But, aren't things like status word and control bits callee saved? 
> If that's the case, it might require proper handling.
> 

Ingo mentioned it in one of the parts you cut out of your reply:

> and here is where thinking about threadlets as a function call and not 
> as an asynchronous context helps alot: the classic gcc convention for 
> FPU use & function calls should apply: gcc does not call an external 
> function with an in-use FPU stack/register, it always neatly unuses it, 
> as no FPU register is callee-saved, all are caller-saved.

The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
XMM or MXCSR registers) and a bit vague (no mention at all of the FP
status word), but I'm fairly certain that Ingo is right.


-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-05 Thread Nicholas Miell

On Tue, 2007-03-06 at 05:41 +0100, Willy Tarreau wrote:
> On Tue, Mar 06, 2007 at 11:18:44AM +1100, Con Kolivas wrote:
> > On Tuesday 06 March 2007 10:05, Bill Davidsen wrote:
> > > jos poortvliet wrote:
> > > > Well, imho his current staircase scheduler already does a better job
> > > > compared to mainline, but it won't make it in (or at least, it's not
> > > > likely). So we can hope this WILL make it into mainline, but I wouldn't
> > > > count on it.
> > >
> > > Wrong problem, what is really needed is to get CPU scheduler choice into
> > > mainline, just as i/o scheduler finally did. Con has noted that for some
> > > loads this will present suboptimal performance, as will his -ck patches,
> > > as will the default scheduler. Instead of trying to make ANY one size
> > > fit all, we should have a means to select, at runtime, between any of
> > > the schedulers, and preferably to define an interface by which a user
> > > can insert a new scheduler in the kernel (compile in, I don't mean
> > > plugable) with clear and well defined rules for how that can be done.
> > 
> > Been there, done that. Wli wrote the infrastructure for plugsched; I took
> > his code and got it booting and ported 3 or so different scheduler designs. It
> > allowed you to build as few or as many different schedulers into the kernel 
> > and either boot the only one you built into your kernel, or choose a 
> > scheduler at boot time. That code got permavetoed by both Ingo and Linus. 
> > After that I gave up on that code and handed it over to Peter Williams who 
> > still maintains it. So please note that I pushed the plugsched barrow 
> > previously and still don't think it's a bad idea, but the maintainers think 
> > it's the wrong approach.
> 
> In a way, I think they are right. Let me explain. Pluggable schedulers are
> useful when you want to switch away from the default one. This is very useful
> during development of a new scheduler, as well as when you're not satisfied
> with the default scheduler. Having this feature will incitate many people to
> develop their own scheduler for their very specific workload, and nothing
> generic. It's a bit what happened after all : you, Peter, Nick, and Mike
> have worked a lot trying to provide alternative solutions.
> 
> But when you think about it, there are other OSes which have only one
> scheduler and which behave very well with tens of thousands of tasks and
> scale very well with lots of CPUs (eg: solaris). So there is a real
> challenge here to try to provide something at least as good and universal
> because we know that it can exist. And this is what you finally did : work
> on a scheduler which ought to be good with any workload.

Solaris has a pluggable scheduler framework (each policy --
OTHER/FIFO/RR/etc. -- is its own separate component).

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-09 Thread Nicholas Miell

On Fri, 2007-03-09 at 15:41 -0800, Davide Libenzi wrote:
> This patch introduces a new system call for timer events delivered
> through file descriptors. This allows timer events to be used with
> standard POSIX poll(2), select(2) and read(2). As a consequence of
> supporting the Linux f_op->poll subsystem, they can be used with
> epoll(2) too.
> The system call is defined as:
> 
> int timerfd(int ufd, int tmrtype, const struct timespec *utmr);
> 
> The "ufd" parameter allows for re-use (re-programming) of an existing
> timerfd w/out going through the close/open cycle (same as signalfd).
> If "ufd" is -1, a new file descriptor will be created, otherwise the
> existing "ufd" will be re-programmed.
> The "tmrtype" parameter allows to specify the timer type. The following
> values are supported:
> 
> TFD_TIMER_REL
> The time specified in the "utmr" parameter is a relative time
>   from NOW.
> 
> TFD_TIMER_ABS
> The time specified in the "utmr" parameter is an absolute time.
> 
> TFD_TIMER_SEQ
> The time specified in the "utmr" parameter is an interval at
>   which a continuous clock rate will be generated.
> 
> The function returns the new (or same, in case "ufd" is a valid timerfd
> descriptor) file, or -1 in case of error.
> As stated before, the timerfd file descriptor supports poll(2), select(2)
> and epoll(2). When a timer event happened on the timerfd, a POLLIN mask
> will be returned.
> The read(2) call can be used, and it will return a u32 variable holding
> the number of "ticks" that happened on the interface since the last call
> to read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN
> will be returned if no ticks happened.
> A quick test program, shows timerfd working correctly on my amd64 box:
> 
> http://www.xmailserver.org/timerfd-test.c
> 

Why did you ignore the existing POSIX timer API?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-09 Thread Nicholas Miell

On Fri, 2007-03-09 at 22:38 -0800, Davide Libenzi wrote:
> On Fri, 9 Mar 2007, Nicholas Miell wrote:
> 
> > Why did you ignore the existing POSIX timer API?
> 
> The existing POSIX API is a standard and a very good one. Too bad it does 
> not deliver to files. The timerfd code is, as you can probably read from 
> the code, a really thin wrapper around the existing hrtimer.c Linux code.

So extend the existing POSIX timer API to deliver expiry events via a
fd.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-09 Thread Nicholas Miell

On Fri, 2007-03-09 at 22:53 -0800, Davide Libenzi wrote:
> On Fri, 9 Mar 2007, Nicholas Miell wrote:
> 
> > On Fri, 2007-03-09 at 22:38 -0800, Davide Libenzi wrote:
> > > On Fri, 9 Mar 2007, Nicholas Miell wrote:
> > > 
> > > > Why did you ignore the existing POSIX timer API?
> > > 
> > > The existing POSIX API is a standard and a very good one. Too bad it does 
> > > not deliver to files. The timerfd code is, as you can probably read from 
> > > the code, a really thin wrapper around the existing hrtimer.c Linux code.
> > 
> > So extend the existing POSIX timer API to deliver expiry events via a
> > fd.
> 
> It'll be out of standard as timerfd is, w/out code savings. Look at the 
> code and tell me what could be saved. Prolly the ten lines of the timer 
> callback. Lines that you'll have to drop inside the current posix timer 
> layer. Better leave standards alone, especially like in this case, when 
> the savings are not there.
> 

OK, here's a more formal listing of my objections to the introduction of
timerfd in this form:

A) It is a new general-purpose ABI intended for wide-scale usage, and
thus must be maintained forever.

B) It is less functional than the existing ABIs -- modulo their
"delivery via signals only" limitation, which can be corrected (and has
been already in other operating systems).

C) Being an entirely new creation that completely ignores past work in
this area, it has no hope of ever getting into POSIX.

which means

D) At some point in time, Linux is going to get the POSIX version (in
whatever form it takes), making this new ABI useless dead weight (see
point A).

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Fri, 2007-03-09 at 23:36 -0800, Davide Libenzi wrote:
> On Fri, 9 Mar 2007, Nicholas Miell wrote:
> 
> > On Fri, 2007-03-09 at 22:53 -0800, Davide Libenzi wrote:
> > > On Fri, 9 Mar 2007, Nicholas Miell wrote:
> > > > 
> > > > So extend the existing POSIX timer API to deliver expiry events via a
> > > > fd.
> > > 
> > > It'll be out of standard as timerfd is, w/out code savings. Look at the 
> > > code and tell me what could be saved. Prolly the ten lines of the timer 
> > > callback. Lines that you'll have to drop inside the current posix timer 
> > > layer. Better leave standards alone, especially like in this case, when 
> > > the savings are not there.
> > > 
> > 
> > OK, here's a more formal listing of my objections to the introduction of
> > timerfd in this form:
> > 
> > A) It is a new general-purpose ABI intended for wide-scale usage, and
> > thus must be maintained forever.
> 
> Yup
> 
> 
> > B) It is less functional than the existing ABIs -- modulo their
> > "delivery via signals only" limitation, which can be corrected (and has
> > been already in other operating systems).
> 
> Less functional? Please, do tell me ...
> 

Try reading the timer_create man page.

In short, you're limited to a single clock, so you can't set timers
based on wall-clock time (subject to NTP correction), monotonic time
(not subject to NTP, will not ever go backwards or skip ticks), the
high-res versions of the previous two clocks, per-thread or per-process
CPU usage time, or any other clocks that may get introduced in the
future.

In addition, you've introduced an entirely new incompatible API that
probably doesn't fit easily into existing software that already uses
POSIX timers.
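
To make the clock-selection point concrete, here is a minimal sketch of
creating a POSIX timer on a caller-chosen clock. The helper names
make_oneshot() and timer_is_armed() are made up for illustration; only
timer_create(), timer_settime(), timer_gettime() and timer_delete() are the
real POSIX calls. SIGEV_NONE is used so that nothing is delivered and the
timer can simply be inspected. On older glibc this needs -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: create a one-shot timer on `clock` that expires
 * `ms` milliseconds from now. SIGEV_NONE means no notification is
 * delivered; the timer can only be inspected with timer_gettime().
 * Returns 0 on success, -1 on failure. */
static int make_oneshot(clockid_t clock, long ms, timer_t *out)
{
    struct sigevent sev;
    struct itimerspec its;

    assert(ms > 0);                 /* an all-zero it_value would disarm */

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_NONE;
    if (timer_create(clock, &sev, out) != 0)
        return -1;

    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timer_settime(*out, 0, &its, NULL) != 0) {
        timer_delete(*out);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: 1 if the timer is still armed, 0 if it has
 * expired or was disarmed, -1 on error. */
static int timer_is_armed(timer_t t)
{
    struct itimerspec cur;

    if (timer_gettime(t, &cur) != 0)
        return -1;
    return cur.it_value.tv_sec != 0 || cur.it_value.tv_nsec != 0;
}
```

The only thing that changes between a wall-clock timer, a monotonic timer
and a CPU-time timer is the first argument: CLOCK_REALTIME, CLOCK_MONOTONIC
or CLOCK_PROCESS_CPUTIME_ID.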

> 
> > C) Being an entirely new creation that completely ignores past work in
> > this area, it has no hope of ever getting into POSIX.
> > 
> > which means
> > 
> > D) At some point in time, Linux is going to get the POSIX version (in
> > whatever form it takes), making this new ABI useless dead weight (see
> > point A).
> 
> Adding parameters/fields to a standard is going to create even more 
> confusion than a new *single* function. And the code to cross-link the 
> timerfd and the current posix timers is going to end up in being more 
> complex than the current one.
> 

Yes, but the standard explicitly allows you to do this. Furthermore, if
you work within the existing framework, you can lobby for the inclusion
of your API in the next version of POSIX.

Simplicity of the code is only a virtue if you don't have to do the
exact same thing again with a different interface later while keeping
the maintenance burden of the existing proprietary (and, thus,
unpopular) interface.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 12:41 -0800, Davide Libenzi wrote:
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> 
> > Try reading the timer_create man page.
> > 
> > In short, you're limited to a single clock, so you can't set timers
> > based on wall-clock time (subject to NTP correction), monotonic time
> > (not subject to NTP, will not ever go backwards or skip ticks), the
> > high-res versions of the previous two clocks, per-thread or per-process
> > CPU usage time, or any other clocks that may get introduced in the
> > future.
> 
> One timer per fd yes. So?

I never complained about one timer per fd (although, now that you
mention it, that would get a bit excessive if you have thousands of
outstanding timers).

> The real-time and monotonic selection can be added. 

IOW, the timerfd patch is not suitable for inclusion as-is. (While
you're at it, you should probably add a flags argument for future
expansion.)

> If you look at the posix timers code, that's a bunch of code over the real 
> meat of it, that is hrtimer.c. The timerfd interface goes straight to 
> that, without adding yet another meaning to the sigevent structure,

That's what the sigevent structure is for -- to describe how events
should be signaled to userspace, whether by signal delivery, thread
creation, or queuing to event completion ports. If you think
extending it would be bad, I can show you the line in POSIX where it
encourages the contrary.

>  and 
> yet another case inside the posix timers trigger functions. That will be 
> as unstandard as timerfd is, and even more, since you cannot use that 
> interface and hope to be portable in any case.

If Linux were to do a wholesale theft of the Solaris interface (warts
and all), you'd be portable (and, now that I think of it, more
efficient).

Two major unixes using the same interface would probably make it a
shoo-in for the next POSIX, too. (cf. openat(2) and friends)

> On top of that, handing over files to the posix timers will create 
> problems with references kept around.
> The timerfd code is just a *really* thin layer (if you exclude the 
> includes, the structure definitions and the fd setup code, there's 
> basically *nothing*) over hrtimer.c and does not mess up with other kernel 
> code in any way, and offers the same functionalities. I'd like to keep it 
> that way.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 13:44 -0800, Linus Torvalds wrote:
> 
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> > 
> > That's what the sigevent structure is for -- to describe how events
> > should be signaled to userspace, whether by signal delivery, thread
> > creation, or queuing to event completion ports. If you think
> > extending it would be bad, I can show you the line in POSIX where it
> > encourages the contrary.
> 
> I'm sorry, but by pointing to the POSIX timer stuff, you're just making 
> your argument weaker.
> 
> POSIX timers are a horrible crock and over-designed to be a union of 
> everything that has ever been done. Nasty. We had tons of bugs in the 
> original setup because they were so damn nasty.
> 

Care to elaborate on why they're a horrible crock?

And are the bugs fixed? If so, why replace them? They work now.

> I'd rather look at just about *anything* else for good design than from 
> some of the abortions that are posix-timers.
> 
>   Linus

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 14:42 -0800, Linus Torvalds wrote:
> 
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> > 
> > Care to elaborate on why they're a horrible crock?
> 
> It's a *classic* case of an interface that tries to do everything under 
> the sun.
> 
> Here's a clue: look at any system call that takes a union as part of its 
> arguments. Count them. I think we have two:
>  - struct siginfo

No argument here -- just about everything related to signals is stupidly
complex.

>  - struct sigevent

However, this I take issue with.

Conceptually (and what the user ends up actually using), struct sigevent
is just:

struct sigevent
{
    int sigev_notify;                         /* delivery method */
    sigval_t sigev_value;                     /* user cookie */
    int sigev_signo;                          /* signal number */
    void (*sigev_notify_function)(sigval_t);  /* thread fn */
    pthread_attr_t *sigev_notify_attributes;  /* thread attr */
};

You could complain about sigval_t being a union, but that's probably
just because it predates uintptr_t. (Plus, no ugly casting.)

You also could complain that the above isn't what you actually see when
you look at /usr/include/bits/siginfo.h -- there's a union involved and
some macros to hide the fact, but those are just internal implementation
details related to how threads are created and padding out the struct
for any future expansion.

The actual complexity for understanding and using struct sigevent isn't
all that much, and once you've figured that out, you know how to
configure event delivery for AIO completion, DNS resolution, and
message queues, not just timers.
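
As a rough illustration of how little of struct sigevent a caller has to
touch, the sketch below fills in the delivery method, signal number and
cookie, arms a timer, and collects the cookie synchronously with
sigtimedwait(). The function name wait_for_timer_cookie() and the 5 second
give-up limit are assumptions made for this example, not anything from the
patch under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: arm a one-shot CLOCK_MONOTONIC timer that
 * delivers `signo` carrying `cookie`, then wait for it. Returns the
 * cookie found in si_value, or -1 on failure (so cookies should be
 * non-negative). The signal is blocked first and stays blocked, so no
 * asynchronous handler runs; sigtimedwait() dequeues it instead. */
static int wait_for_timer_cookie(int signo, int cookie, long delay_ms)
{
    struct sigevent sev;
    struct itimerspec its;
    struct timespec limit = { 5, 0 };   /* give up after 5 seconds */
    sigset_t set;
    siginfo_t info;
    timer_t timer;
    int result = -1;

    assert(delay_ms > 0);               /* zero would leave it disarmed */

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;     /* delivery method */
    sev.sigev_signo = signo;             /* signal number */
    sev.sigev_value.sival_int = cookie;  /* user cookie */
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = delay_ms / 1000;
    its.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;
    if (timer_settime(timer, 0, &its, NULL) == 0 &&
        sigtimedwait(&set, &info, &limit) == signo &&
        info.si_code == SI_TIMER)
        result = info.si_value.sival_int;

    timer_delete(timer);
    return result;
}
```

Calling wait_for_timer_cookie(SIGRTMIN, 42, 10) should hand back 42 once the
timer fires. A threaded program would use pthread_sigmask() rather than
sigprocmask().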

> and they are both broken horrible interfaces where the data structures 
> depend on various flags.
> 
> It's just not the UNIX system call way. And none of it really makes sense 
> if you already have a file descriptor, since at that point you know what 
> the notification mechanism is.
> 
> I'd actually much rather do POSIX timers the other way around: associate a 
> generic notification mechanism with the file descriptor, and then 
> implement posix_timer_create() on top of timerfd. Now THAT sounds like a 
> clean unix-like interface ("everything is a file") and would imply that 
> you'd be able to do the same kind of notification for any file descriptor, 
> not just timers.
> 

But timers aren't files or even remotely file-like -- if they were
real files, you could just
open /dev/timers/realtime/2007/June/3rd/half-past-teatime and get a
timer. (Or, more realistically, open /dev/timer and use ioctl().)

timerfd() had to be created to coerce them into some semblance of
filehood just to make them work with existing (and new) polling/queuing
interfaces just because those interfaces can only deal with file
descriptors.

Making non-file things look like files just because that's what poll()
and friends can deal with isn't much different from holding a hammer in
your hand and looking for what you have to do in order to turn every
problem into a nail.

Sometimes you need to go back to your toolbox for a screwdriver or a
saw.

> But posix timers as they are done now are just an abomination. They are 
> not unix-like at all.
> 
> > And are the bugs fixed? If so, why replace them? They work now.
> 
> .. but the reason for the bugs was largely a very baroque interface, which 
> didn't get fixed (because it's specified by the standard).
>

But the API isn't baroque.

There's a veritable boutique of clock sources to choose from, but they
all serve specific needs, it's just one parameter to timer_create, and
you probably want CLOCK_MONOTONIC anyway.

struct sigevent might be a bit complex, but the difficulty in learning
that is amortized across all the other APIs that also use it to specify
how their events are delivered.

Delivering via signals and dealing with struct siginfo is painful, but
everything related to signals is painful. This is what you get when you
take an interface designed essentially for exception handling and start
abusing it for general information delivery. But, hey!, that's what
SIGEV_THREAD and SIGEV_PORT are for.[1]

About the worst that can be said of it is that using timer_settime to
both arm and disarm the timer and set the interval is awkward.

[1] A SIGEV_FUNCTION which skips all the signal baggage and just passes
a supplied cookie and a purpose-specific struct pointer to an
object-specific user-supplied function pointer might be interesting, but
then you run into all of the reentrancy/masking/choosing which thread to
deliver to and other issues that signals already have without the
benefit of the existing signal infrastructure for all that stuff. Gah, I
don't want to think about this anymore.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 16:35 -0800, Linus Torvalds wrote:
> 
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> > > 
> > > I'd actually much rather do POSIX timers the other way around: associate
> > > a generic notification mechanism with the file descriptor, and then
> > > implement posix_timer_create() on top of timerfd. Now THAT sounds like a
> > > clean unix-like interface ("everything is a file") and would imply that
> > > you'd be able to do the same kind of notification for any file
> > > descriptor, not just timers.
> > > 
> > 
> > But timers aren't files or even remotely file-like
> 
> What do you think "a file" is?
> 
> In UNIX, a file descriptor is pretty much anything. You could say that 
> sockets aren't remotely file-like, and you'd be right. What's your point? 
> If you can read on it, it's a file.

Ah, I see. You're just interested in fds as a generic handle concept,
and not a more Plan 9 type thing.

If that's the goal, somebody should start thinking about reducing the
contents of struct file to the bare minimum (i.e. not much more than a
file_operations pointer).

> 
> And the real point of the whole signalfd() is that there really *are* a 
> lot of UNIX interfaces that basically only work with file descriptors. Not 
> just read, but select/poll/epoll.

It'd be useful if the polling interfaces could return small datums
beyond just the POLL* flags -- having to do a read on timerfd just to
get the overrun count has a lot of overhead for just an integer, and I
imagine other things would like to pass back stuff too.
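
The poll-then-read pattern being described looks roughly like the sketch
below. Note the assumption: it uses the timerfd_create()/timerfd_settime()
interface that Linux eventually shipped, not the single timerfd(ufd,
tmrtype, utmr) call from this patch, and the shipped interface reports the
expiration count as a 64-bit value rather than the u32 described above. The
helper names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create a periodic timerfd on CLOCK_MONOTONIC
 * that fires every period_ms. Returns the fd, or -1 on error. */
static int periodic_timerfd(long period_ms)
{
    struct itimerspec its;
    int fd;

    assert(period_ms > 0);
    fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    its.it_value.tv_sec = period_ms / 1000;
    its.it_value.tv_nsec = (period_ms % 1000) * 1000000L;
    its.it_interval = its.it_value;     /* repeat at the same period */
    if (timerfd_settime(fd, 0, &its, NULL) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* poll() only says the timer is readable; learning how many
 * expirations happened since the last read costs a separate read().
 * Returns that count, or 0 on timeout or error. */
static uint64_t wait_for_ticks(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    uint64_t ticks = 0;

    if (poll(&pfd, 1, timeout_ms) != 1 || !(pfd.revents & POLLIN))
        return 0;
    if (read(fd, &ticks, sizeof ticks) != (ssize_t)sizeof ticks)
        return 0;
    return ticks;
}
```

Every wakeup is two system calls, poll() and read(), which is the overhead
the paragraph above is grumbling about.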


> They currently have just one timeout, but the thing is, if UNIX had just 
> had "timer file descriptors", they'd not need even that one. And even with 
> the timeout, Davide's patch actually makes for a *better* timeout than the 
> ones provided by select/poll/epoll, exactly because you can do things like 
> repeating timers and absolute time etc.
> 
> Much more naturally than the timer interface we currently have for those 
> system calls.
> 

You still want timeouts; creating/setting/destroying a timer just for
a single call to select/poll/epoll is probably too heavyweight.

timerfd() still leaves out the basic clock selection functionality
provided by both setitimer() and timer_create().

> The same goes for signals. The whole "pselect()" thing shows that signals 
> really *should* have been file descriptors, and suddenly you don't need 
> "pselect()" at all.
> 
> So the "not remotely file-like" is not actually a real argument. One of 
> the big *points* of UNIX was that it unified a lot under the general 
> umbrella of a "file descriptor". Davide just unifies even more.
>
>   Linus
-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [patch 6/9] signalfd/timerfd v1 - timerfd core ...

2007-03-10 Thread Nicholas Miell

On Sat, 2007-03-10 at 17:57 -0800, Davide Libenzi wrote:
> On Sat, 10 Mar 2007, Nicholas Miell wrote:
> 
> > If that's the goal, somebody should start thinking about reducing the
> > contents of struct file to the bare minimum (i.e. not much more than a
> > file_operations pointer).
> 
> That's already pretty small, and the single inode (and maybe dentry) will 
> make it even smaller. Unless you want to create brazillions of signalfds,
> timerfds or asyncfds.
> 

Timers don't need dentry or inode pointers or readahead state, etc., do
they? (Beyond the existing VFS expectation, that is.)

> > > And the real point of the whole signalfd() is that there really *are* a
> > > lot of UNIX interfaces that basically only work with file descriptors.
> > > Not just read, but select/poll/epoll.
> > 
> > It'd be useful if the polling interfaces could return small datums
> > beyond just the POLL* flags -- having to do a read on timerfd just to
> > get the overrun count has a lot of overhead for just an integer, and I
> > imagine other things would like to pass back stuff too.
> ...
> 
> > You still want timeouts; creating/setting/destroying a timer just for
> > a single call to select/poll/epoll is probably too heavyweight.
> 
> Take a look at what timerfd does and what posix timers has to do to 
> implement the interface. You'll prolly stop trolling with things like "a 
> lot of overhead" or "too heavy weight".

That wasn't a troll. I was talking about the timerfd()/close() overhead
and the corresponding bookkeeping necessary to keep that fd around
compared to just passing a struct timespec to poll or a millisecond
count to epoll_wait.
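
A side-by-side sketch of the two ways to wait for a single timeout may make
the bookkeeping difference clearer. Both functions are hypothetical, and the
second one assumes the timerfd_create()/timerfd_settime() interface that
Linux later shipped rather than the call proposed in this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Variant 1: hand the timeout straight to poll(). One system call.
 * Returns 0 when the timeout elapses, -1 on error. */
static int wait_plain(int ms)
{
    return poll(NULL, 0, ms) == 0 ? 0 : -1;
}

/* Variant 2: express the same timeout as a timerfd. Four system calls
 * (create, settime, poll, close) plus an fd to keep track of.
 * Returns 0 when the timer fires, -1 on error. */
static int wait_timerfd(int ms)
{
    struct itimerspec its = { 0 };
    struct pollfd pfd;
    int fd, rc;

    assert(ms > 0);     /* zero would disarm and poll() would block */
    fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &its, NULL) != 0) {
        close(fd);
        return -1;
    }

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, -1);     /* no poll timeout; the fd is the timeout */
    close(fd);
    return rc == 1 ? 0 : -1;
}
```

The timerfd variant pays off when the timer is long-lived or shared between
several waits; for a one-off timeout the plain argument is less work.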

> > timerfd() still leaves out the basic clock selection functionality
> > provided by both setitimer() and timer_create().
> 
> That is coming as soon as I fixed my send-serie script ...

Nice.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-21 Thread Nicholas Miell

On Sun, 2007-01-21 at 05:03 -0500, Robert P. J. Day wrote:
>   Introduce the TRUE and FALSE boolean macros so that everyone can
> stop re-inventing them, and remove the one occurrence in the source
> tree that clashes with that change.
> 

If you're going to introduce true and false macros, you should probably
use the official all-lowercase C99 version.
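
For reference, the C99 spelling comes from <stdbool.h>, which provides bool,
true and false. A minimal sketch (the function list_contains() is just an
illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `wanted` occurs in the first `n` elements of `items`. */
static bool list_contains(const int *items, size_t n, int wanted)
{
    for (size_t i = 0; i < n; i++) {
        if (items[i] == wanted)
            return true;
    }
    return false;
}
```

Code written against these names needs no project-local TRUE/FALSE
definitions at all.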

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: sigaction's ucontext_t with incorrect stack reference when SA_SIGINFO is being used ?

2007-01-22 Thread Nicholas Miell

On Mon, 2007-01-22 at 09:57 +0100, Xavier Roche wrote:
> Hi folks,
> 
> I have a probably lousy question regarding sigaction() behaviour when an
> alternate signal stack is used: it seems that I can not get the user
> stack reference in the ucontext_t stack context ; ie. the uc_stack
> member contains reference of the alternate signal stack, not the stack
> that was used before the crash.
> 
> Is this normal behaviour? Is there a way to retrieve the original
> user's stack inside the signal callback ?
> 
> The example given below demonstrates the issue:
> top of stack==0x7f3d7000, alternative_stack==0x501010
> SEGV==0x7f3d6ff8; sp==0x501010; current stack is the alternate stack
> 
> It is obvious that the SEGV was a stack overflow: the si_addr address is
> just on the page below the stack limit.

POSIX says:
"the third argument can be cast to a pointer to an object of type
ucontext_t to refer to the receiving thread's context that was
interrupted when the signal was delivered."

so if uc_stack doesn't point to the stack in use immediately prior to
signal generation, this is a bug.

(In theory I should be able to pass the ucontext_t supplied to the
signal handler to setcontext() and resume execution exactly where I left
off -- glibc's refusal to support kernel-generated ucontexts gets in the
way of this, but the point still stands.)

I have no idea who to bother about i386 signal delivery, though. (And I
suspect this bug has probably been copied to other architectures as
well.)

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: And now for something _totally_ different: Linux v2.6.22-rc5

2007-06-17 Thread Nicholas Miell

On Sat, 2007-06-16 at 20:33 -0700, Linus Torvalds wrote:
> In a stunning turn of events, I've actually been able to make another -rc 
> release despite all the discussion (*cough*flaming*cough*) about other 
> issues, and we now have a brand-spanking-new Linux 2.6.22-rc5 release 
> out there!
> 

signalfd still has the broken behavior w.r.t. signal delivery to
threads.

Is this going to get fixed before 2.6.22 proper is released, or should
it just be disabled entirely so no userspace apps grow to depend on
current wrong behavior?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: And now for something _totally_ different: Linux v2.6.22-rc5

2007-06-17 Thread Nicholas Miell

On Sun, 2007-06-17 at 10:01 -0700, Davide Libenzi wrote:
> On Sun, 17 Jun 2007, Nicholas Miell wrote:
> 
> > On Sat, 2007-06-16 at 20:33 -0700, Linus Torvalds wrote:
> > > In a stunning turn of events, I've actually been able to make another -rc 
> > > release despite all the discussion (*cough*flaming*cough*) about other 
> > > issues, and we now have a brand-spanking-new Linux 2.6.22-rc5 release 
> > > out there!
> > > 
> > 
> > signalfd still has the broken behavior w.r.t. signal delivery to
> > threads.
> > 
> > Is this going to get fixed before 2.6.22 proper is released, or should
> > it just be disabled entirely so no userspace apps grow to depend on
> > current wrong behavior?
> 
> At the moment, with Ben's patch applied, signalfd can see all group-sent 
> signals, and locally-directed thread signals.

But there's still no way for multiple threads to read from a single
signalfd and get their own thread-specific signals in addition to
process-wide signals, right? I think this was agreed to be the least
surprising behavior.

> Linus, we can leave this as is, or we can use the queued-signalfd that was
> implemented in the first versions of signalfd. In such case, since 
> signalfd hooks to the sighand, all signals will be visible to signalfd and 
> they will not compete against dequeue_signal with the tasks. So there will 
> be no races in the queue retrieval. The issue that remained to be solved 
> was a simple way to limit memory allocated by the queue.
> What do you prefer?
> 
> 
> 
> - Davide

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: And now for something _totally_ different: Linux v2.6.22-rc5

2007-06-17 Thread Nicholas Miell

On Sun, 2007-06-17 at 16:49 -0700, Davide Libenzi wrote:
> On Sun, 17 Jun 2007, Nicholas Miell wrote:
> 
> > On Sun, 2007-06-17 at 10:01 -0700, Davide Libenzi wrote:
> > > On Sun, 17 Jun 2007, Nicholas Miell wrote:
> > > 
> > > > On Sat, 2007-06-16 at 20:33 -0700, Linus Torvalds wrote:
> > > > > > In a stunning turn of events, I've actually been able to make another -rc
> > > > > > release despite all the discussion (*cough*flaming*cough*) about other
> > > > > > issues, and we now have a brand-spanking-new Linux 2.6.22-rc5 release
> > > > > > out there!
> > > > > 
> > > > 
> > > > signalfd still has the broken behavior w.r.t. signal delivery to
> > > > threads.
> > > > 
> > > > Is this going to get fixed before 2.6.22 proper is released, or should
> > > > it just be disabled entirely so no userspace apps grow to depend on
> > > > current wrong behavior?
> > > 
> > > At the moment, with Ben's patch applied, signalfd can see all group-sent 
> > > signals, and locally-directed thread signals.
> > 
> > But there's still no way for multiple threads to read from a single
> > signalfd and get their own thread-specific signals in addition to
> > process-wide signals, right? I think this was agreed to be the least
> > surprising behavior.
> 
> Multiple threads can wait on the signalfd. Each one will dequeue either 
> its own private signals (tsk->pending) or the process shared ones 
> (tsk->signal->shared_pending). This will be the behaviour once Ben's patch 
> is applied.
> 

Ah, ok, that's great.

I didn't see anything like that in linux.git, missed Ben's patch to the
list, and mixed up your description with the original TIF_SIGPENDING
work.

Sorry for the confusion.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 13:22 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2007-06-04 at 19:38 -0700, Davide Libenzi wrote:
> > >  - I still think there's something wrong with dequeue_signal() being
> > > potentially called with a task different than current by signalfd, since
> > > __dequeue_signal() (among others) mucks around with current regardless.
> > > I'd love to just make signalfd's read() only do anything if current ==
> > > ctx->tsk and remove the task argument from dequeue_signal... that would
> > > fix it nicely too no ?
> > 
> > There's got to be a clean solution that does not limit signalfd, no? I 
> > have no time to look at it immediately, but I can look into it in the 
> > next few days, if someone else does not do it before...
> 
> Is there a real use for dequeuing somebody else's signals with signalfd?
> If yes, then we can do something along the lines of passing task down
> to __dequeue_signal, though I'm not too sure what this notifier is about
> and whether it might rely on being called from within the affected task
> context...
> 
> Ben.
> 

signalfd() doesn't deliver thread-targeted signals to the wrong threads,
does it?

Hmm.

It looks like reading from a signalfd will give you either
process-global signals or the thread-specific signals that are targeted
towards the thread that originally created the signalfd (regardless of
which thread actually calls read()).

Which is weird, to say the least. Definitely needs to be noted in the
man page, which doesn't seem to exist yet.

Is there a reason why signalfd() doesn't behave like regular signals in
this regard?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 17:27 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2007-06-04 at 23:09 -0700, Nicholas Miell wrote:
> > signalfd() doesn't deliver thread-targeted signals to the wrong threads,
> > does it?
> > 
> > Hmm.
> > 
> > It looks like reading from a signalfd will give you either
> > process-global signals or the thread-specific signals that are targeted
> > towards the thread that originally created the signalfd (regardless of
> > which thread actually calls read()).
> > 
> > Which is weird, to say the least. Definitely needs to be noted in the
> > man page, which doesn't seem to exist yet.
> > 
> > Is there a reason why signalfd() doesn't behave like regular signals in
> > this regard?
> 
> It's worse than that ... by being able to call dequeue_signal from the
> context of a thread other than the one being dequeued from.
> 
> Ben.

Yes, that's certainly wrong, but that's an implementation issue. I was
more concerned about the design of the API.

Naively, I would expect reads on a signalfd to return either process
signals or thread signals targeted towards the thread doing the read.

What it actually does (delivering process signals or thread signals
targeted towards the thread that created the signalfd) is weird.

For one, it means you can't create a single signalfd, stick it in an
epoll set, and then wait on that set from multiple threads.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
> On Tue, 5 Jun 2007, Nicholas Miell wrote:
> 
> > Yes, that's certainly wrong, but that's an implementation issue. I was
> > more concerned about the design of the API.
> > 
> > Naively, I would expect reads on a signalfd to return either process
> > signals or thread signals targeted towards the thread doing the read.
> > 
> > What it actually does (delivering process signals or thread signals
> > targeted towards the thread that created the signalfd) is weird.
> > 
> > For one, it means you can't create a single signalfd, stick it in an
> > epoll set, and then wait on that set from multiple threads.
> 
> In your box threads do share the sighand, don't they? :)
> 

I have no idea what you're trying to say, but it doesn't appear to
address the issue I raise.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 17:37 -0700, Davide Libenzi wrote:
> On Tue, 5 Jun 2007, Nicholas Miell wrote:
> 
> > On Tue, 2007-06-05 at 17:11 -0700, Davide Libenzi wrote:
> > > On Tue, 5 Jun 2007, Nicholas Miell wrote:
> > > 
> > > > Yes, that's certainly wrong, but that's an implementation issue. I was
> > > > more concerned about the design of the API.
> > > > 
> > > > Naively, I would expect reads on a signalfd to return either process
> > > > signals or thread signals targeted towards the thread doing the read.
> > > > 
> > > > What it actually does (delivering process signals or thread signals
> > > > targeted towards the thread that created the signalfd) is weird.
> > > > 
> > > > For one, it means you can't create a single signalfd, stick it in an
> > > > epoll set, and then wait on that set from multiple threads.
> > > 
> > > In your box threads do share the sighand, don't they? :)
> > > 
> > 
> > I have no idea what you're trying to say, but it doesn't appear to
> > address the issue I raise.
> 
> "For one, it means you can't create a single signalfd, stick it in an
>  epoll set, and then wait on that set from multiple threads."
> 
> Why not?
> A signalfd, like I said, is attached to the sighand, that is shared by the 
> threads.
> 
> 

POSIX requires the following:

"At the time of generation, a determination shall be made whether the
signal has been generated for the process or for a specific thread
within the process. Signals which are generated by some action
attributable to a particular thread, such as a hardware fault, shall be
generated for the thread that caused the signal to be generated. Signals
that are generated in association with a process ID or process group ID
or an asynchronous event, such as terminal activity, shall be generated
for the process."

In practice, this means that signals like SIGSEGV/SIGFPE/SIGILL/etc. and
signals generated by pthread_kill() (i.e. tkill() or tgkill()) are
directed to a specific threads, while other signals are directed to the
process as a whole and serviced by any thread that isn't blocking that
specific signal.

Linux accomplishes this by having two lists of pending signals --
current->pending is the per-thread list and
current->signal->shared_pending is the process-wide list.

dequeue_signal(tsk, ...) looks for signals first in tsk->pending and
then in tsk->signal->shared_pending.

sys_signalfd() stores current in signalfd_ctx. signalfd_read() passes
that context to signalfd_dequeue, which passes that saved
task_struct pointer to dequeue_signal.

This means that a signalfd will deliver signals targeted towards either
the original thread that created that signalfd, or signals targeted
towards the process as a whole.

This means that a single signalfd is not adequate to handle signal
delivery for all threads in a process, because signals targeted towards
threads other than the thread that originally created the signalfd will
never be queued to that signalfd.

Is my analysis wrong?

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-05 Thread Nicholas Miell

On Tue, 2007-06-05 at 20:37 -0700, Linus Torvalds wrote:
> 
> On Tue, 5 Jun 2007, Davide Libenzi wrote:
> > On Wed, 6 Jun 2007, Benjamin Herrenschmidt wrote:
> > > 
> > > Yeah, synchronous signals should probably never be delivered to another
> > > process, even via signalfd. There's no point delivering a SEGV to
> > > somebody else :-)
> > 
> > That'd be a limitation. Like you can choose to not handle SEGV, you can 
> > choose to have a signalfd listening to it. Of course, not with the 
> > intention to *handle* the signal, but with a notification intent.
> 
> I agree that it would be a limitation, but it would be a sane one.
> 
> How about we try to live with that limitation, if only to avoid the issue 
> of having the private signals being stolen by anybody else. If we actually 
> find a real-live use-case where that is bad in the future, we can re-visit 
> the issue - it's always easier to _expand_ semantics later than it is to 
> restrict them, so I think this thread is a good argument for starting it 
> out in a more restricted form before people start depending on semantics 
> that can be nasty..
> 
>   Linus

Proposed semantics:

a) Process-global signals can be read by any thread (inside or outside
of the process receiving the signal).

Rationale:
This should always work, so there's no reason to limit it.

b) Thread-specific signals can only be read by their target thread.

Rationale:
This behavior is required by POSIX, and if an application is using
pthread_kill()/tkill()/tgkill()/etc. to specifically direct a signal, it
damn well better get to where the app wants it to go.

c) Synchronous signals ("Naturally" generated SIGILL, SIGFPE, SIGSEGV,
SIGBUS, and SIGTRAP. Did I miss any?) are not delivered via signalfd()
at all. (And by "naturally" generated, I mean signals that would have
the SI_KERNEL flag set.)

Rationale: 
These are a subset of thread-specific signals, so they can only be read
from a signalfd by their target thread.

However, there's no way for the target thread to get the signal because
it is either:

a) not blocked in a syscall waiting for signal delivery and thus further
execution beyond the instruction causing the signal is impossible
 OR
b) it is blocked in a syscall waiting for signal delivery and the error
is caused by the signal delivery mechanism itself (i.e. a bad pointer
passed to read/select/poll/epoll_wait/etc.) and thus the signal can't be
delivered

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: JIT emulator needs

2007-06-08 Thread Nicholas Miell

On Fri, 2007-06-08 at 12:10 +0100, Alan Cox wrote:
> > e. mremap() flag to get a read/write mapping of a read/exec one
> > f. mremap() flag to get a read/exec mapping of a read/write one
> > g. mremap() flag to make the 5th arg (new addr) be the upper limit
> 
> This is all mprotect and munmap.

I think he's asking for a way to copy an existing mapping, which does
sound genuinely useful. (i.e. mremap(ptr, size, size, MREMAP_COPY), with
no need to mess with files to get multiple mappings of the same region)
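
MREMAP_COPY above is a hypothetical flag, not an existing one. For contrast, here is a sketch of the file-backed workaround: two MAP_SHARED mappings of the same unlinked temporary file. The names map_twice and demo_map_twice and the /tmp path template are assumptions for this example. The second mapping is read-only here; a JIT would presumably ask for PROT_READ | PROT_EXEC instead, which can fail if the backing filesystem is mounted noexec.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create two views of the same len bytes: *rw is writable, *ro is read-only.
 * Returns 0 on success, -1 on failure. */
int map_twice(size_t len, void **rw, void **ro)
{
    char path[] = "/tmp/dualmapXXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);  /* the file only exists to back the mappings */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);     /* the mappings keep the file alive */
    if (*rw == MAP_FAILED || *ro == MAP_FAILED)
        return -1;
    return 0;
}

/* Demo: bytes written through one view are visible through the other.
 * Returns 0 if that holds. */
int demo_map_twice(void)
{
    void *rw, *ro;

    if (map_twice(4096, &rw, &ro) < 0)
        return -1;
    strcpy(rw, "jit");
    return (rw != ro && strcmp(ro, "jit") == 0) ? 0 : -1;
}
```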

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [PATCH] Introduce O_CLOEXEC (take >2)

2007-05-31 Thread Nicholas Miell

On Thu, 2007-05-31 at 14:09 -0400, Ulrich Drepper wrote:
> diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
> index c154b9d..b847741 100644
> --- a/include/asm-generic/fcntl.h
> +++ b/include/asm-generic/fcntl.h
> @@ -48,6 +48,9 @@
>  #ifndef O_NOATIME
>  #define O_NOATIME 0100
>  #endif
> +#ifndef O_CLOEXEC
> +#define O_CLOEXEC 0200 /* set close_on_exec */
> +#endif
>  #ifndef O_NDELAY
>  #define O_NDELAY O_NONBLOCK
>  #endif

O_CLOSEONEXEC, perhaps?

We don't want to create another "creat" here... :)

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: [00/17] Large Blocksize Support V3

2007-04-27 Thread Nicholas Miell

On Fri, 2007-04-27 at 12:55 -0400, Theodore Tso wrote:
> On Thu, Apr 26, 2007 at 10:15:28PM -0700, Andrew Morton wrote:
> > And hardware gets better.  If Intel & AMD come out with a 16k pagesize
> > option in a couple of years we'll look pretty dumb.  If the problems which
> > you're presently having with that controller get sorted out in the next
> > generation of the hardware, we'll also look pretty dumb.
> 
> Unfortunately, this isn't a problem with hardware getting better, but
> a willingness to break backwards compatibility.
> 
> x86_64 uses a 4k page size to avoid breaking 32-bit applications.  And
> unfortunately, iirc, even 64-bit applications are continuing to depend
> on 4k page alignments for things like the text and bss segments.  If
> the userspace ELF and other compiler/linker specifications were
> appropriately written so they could handle 16k pagesizes, maybe 5 years
> from now we could move to a 16k pagesize.  But this is going to
> require some coordination between the userspace binutils folks and
> AMD/Intel in order to plan such a migration.
> 
>   - Ted

The AMD64 psABI requires binaries to work with any page size up to 64k.

Whether that's true in practice is another matter entirely, of course.
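
On the application side, the usual way to avoid baking in 4k is to ask at run time. A small sketch, with helper names (runtime_page_size, is_page_aligned) made up for the example:

```c
#include <stdint.h>
#include <unistd.h>

/* Page size as reported by the running system, or -1 on error. */
long runtime_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Returns 1 if p sits on a page boundary for the running system. */
int is_page_aligned(const void *p)
{
    long ps = runtime_page_size();

    return ps > 0 && ((uintptr_t)p % (uintptr_t)ps) == 0;
}
```

This covers only a program's own run-time assumptions; segment alignment baked in at link time is a separate matter.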

-- 
Nicholas Miell <[EMAIL PROTECTED]>

