Re: [BUG] scheduler: first timeslice of the exiting thread

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 23:09 -0700, Andrew Morton wrote:
> On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi <[EMAIL PROTECTED]> wrote:
> 
> > When I was examining the following program ...
> > 
> >   1. There are a large number of small jobs, each taking several msecs,
> >  and the number of jobs increases constantly.
> >   2. The process creates a thread or a process per job (I examined both
> >  the thread model and the process model).
> >   3. Each child process/thread does the assigned job and exits immediately.
> > 
> > ... I found that the thread model's latency is longer than the process
> > model's, contrary to my expectation. It's because of the current
> > sched_fork()/sched_exit() implementation as follows:
> > 
> >   a) On sched_fork, the creator shares its timeslice with the new process.
> >   b) On sched_exit, if the exiting process didn't exhaust its first
> >  timeslice yet, it gives its timeslice to the parent.
> > 
> > This is not a problem in the process model, since the creator is the parent.
> > However, in the thread model the creator is not the parent; the new thread's
> > parent is the same as the creator's parent. Hence, in this kind of program,
> > the creator can't get back the shared timeslice and exhausts its timeslice
> > at a rate of knots. In addition, the parent (typically a shell?) somehow
> > gets extra timeslice.
> > 
> > I believe it's a bug and the exiting process should give its timeslice
> > to the creator. I have some patch plans to fix this problem, as follows:
> > 
> >  a) Add a field for the creator to task_struct. It needs extra memory.
> >  b) Don't add an extra field; instead make the thread's parent the creator,
> > which is the same as process creation. However, it has many side effects;
> > for example, we would also need to change the sys_getppid() implementation.
> > 
> > What do you think? Any comments are welcome.
> 
> This comes at an awkward time, because we might well merge the
> staircase/deadline work into 2.6.22, and I think it rewrites the part of
> the scheduler which is causing the problems you're observing.
> 
> Has anyone verified that SD fixes this problem and the one at
> http://lkml.org/lkml/2007/4/7/21 ?

Not verified either way in testing, but I believe this should be a
problem for SD as well, because timeslice fork/exit handling is identical
to mainline.  Individual slices are much smaller than in mainline, so
priority should drop rapidly, consuming the bandwidth allotted for the
current rotation and sending the creator off to the expired array
prematurely.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SD scheduler testing hitch

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 21:34 +0300, Al Boldi wrote:
> Mike Galbraith wrote:
> > On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
> > > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> > > here every time.  (ran as taskset -c 1 nice -n -10 ./fairtest)  The
> > > starving 10% duty cycle task has trouble getting 1% CPU.
> >
> > Hmm.  Playing with it some more today, it still happens, but it's not
> > very repeatable.  Something is odd.  I wonder if any SD using readers
> > will try it.
> 
> Tried it on mainline 2.6.20.3.
> It's not easily repeatable, but it's got the same problem.
> 
> top - 21:21:45 up 27 min,  0 users,  load average: 0.80, 0.43, 0.20
> Tasks:  45 total,   3 running,  42 sleeping,   0 stopped,   0 zombie
> Cpu(s):  24.3% user,   0.5% system,   0.0% nice,  75.0% idle,   0.2% IO-wait
> Mem:    499488k total,    27352k used,   472136k free,     1996k buffers
> Swap:  1020088k total,        0k used,  1020088k free,     9160k cached
> 
>   PID  PR  NI  VIRT  RES  SHR SWAP nFLT nDRT WCHAN     S %CPU     TIME+ Command
>   688  25   0  1804  412  352 1392    0    0 rest_init R 94.7   2:37.01 fairtest
>   689  15   0  1804  264  204 1540    0    0 rest_init R  0.0   0:00.79 fairtest

Aha!  Thanks a bunch for testing it.  (thing was irritating me greatly)

-Mike



Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Mon, 2007-04-09 at 01:23 -0400, Gene Heskett wrote:

> This may not be so informative, it's almost behaving ATM.
> 
> 29252 amanda    22   0  1856  572  220 R 76.4  0.1   1:07.24 gzip
> 29235 amanda    15   0  2992 1224  888 S  5.6  0.1   0:02.80 chunker
> 29500 root      18   0  2996 1164  788 S  4.0  0.1   0:02.40 tar
> 10459 amanda    15   0  3340 1052  832 S  3.0  0.1   0:49.04 amandad
> 10536 amanda    15   0  3276 1308 1004 S  2.3  0.1   0:40.92 dumper
> 29496 amanda    18   0  2808  472  280 S  2.0  0.0   0:01.73 sendbackup
>  4057 gkrellmd  15   0 11568 1172  896 S  1.3  0.1   7:45.82 gkrellmd
> 29498 amanda    18   0  2396  780  656 S  1.0  0.1   0:00.60 tar
> 19183 root      15   0     0    0    0 S  0.7  0.0   0:01.92 pdflush
> 

Yeah, this is showing the scheduler behaving properly.

-Mike



Re: [BUG] scheduler: first timeslice of the exiting thread

2007-04-08 Thread Andrew Morton
On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi <[EMAIL PROTECTED]> wrote:

> When I was examining the following program ...
> 
>   1. There are a large number of small jobs, each taking several msecs,
>  and the number of jobs increases constantly.
>   2. The process creates a thread or a process per job (I examined both
>  the thread model and the process model).
>   3. Each child process/thread does the assigned job and exits immediately.
> 
> ... I found that the thread model's latency is longer than the process
> model's, contrary to my expectation. It's because of the current
> sched_fork()/sched_exit() implementation as follows:
> 
>   a) On sched_fork, the creator shares its timeslice with the new process.
>   b) On sched_exit, if the exiting process didn't exhaust its first
>  timeslice yet, it gives its timeslice to the parent.
> 
> This is not a problem in the process model, since the creator is the parent.
> However, in the thread model the creator is not the parent; the new thread's
> parent is the same as the creator's parent. Hence, in this kind of program,
> the creator can't get back the shared timeslice and exhausts its timeslice
> at a rate of knots. In addition, the parent (typically a shell?) somehow
> gets extra timeslice.
> 
> I believe it's a bug and the exiting process should give its timeslice
> to the creator. I have some patch plans to fix this problem, as follows:
> 
>  a) Add a field for the creator to task_struct. It needs extra memory.
>  b) Don't add an extra field; instead make the thread's parent the creator,
> which is the same as process creation. However, it has many side effects;
> for example, we would also need to change the sys_getppid() implementation.
> 
> What do you think? Any comments are welcome.

This comes at an awkward time, because we might well merge the
staircase/deadline work into 2.6.22, and I think it rewrites the part of
the scheduler which is causing the problems you're observing.

Has anyone verified that SD fixes this problem and the one at
http://lkml.org/lkml/2007/4/7/21 ?


Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Mon, 2007-04-09 at 01:16 -0400, Gene Heskett wrote:
> On Monday 09 April 2007, Mike Galbraith wrote:
> >So tar -cvf - / | gzip --best | tar -tvzf - should reproduce the
> >problem?
> >
> > -Mike
> 
> That looks as if it should demo it pretty well if I understand correctly 
> everything you're doing there.

Well, I let it process my ~250GB of data with my current tree, and it
looked utterly harmless (and since I'm running SMP, was of course).
I'll try building UP to make sure, and check mainline as well.

-Mike



Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Mon, 2007-04-09 at 00:08 -0400, Gene Heskett wrote:
> On Monday 09 April 2007, Mike Galbraith wrote:
> >
> >
> >Actually, there was practically nil interest in testing.  We made a
> >couple of minor adjustments to the interactivity logic, and all went
> >quiet, so I didn't think it was enough of a problem to require more
> >intrusive countermeasures.
> >
> > -Mike
> 
> Does one of these messages have a url so I can test the latest of your 
> patches for -rc6?  Or was the one Ingo sent the most recent?

No, my tree has a bugfix and some other adjustments that try to move the
balance closer to fair without sacrificing interactivity.

> Putting that url in your sig would be nice, and might result in its
> getting a lot more exercise, which should = more feedback.

When I get it cleaned up and better tested, I'll post again.  If you
want, I'll CC you... willing victims are a highly valued commodity :)

-Mike



Re: SD scheduler testing hitch

2007-04-08 Thread Mike Galbraith
On Mon, 2007-04-09 at 02:23 +0200, Dmitry Adamushko wrote:
> > [...]
> > Well, it's a late hour, so maybe I'm missing something... but it does
> > look to be HZ and "will run" time interval related issue. Like
> > described in (*). Or maybe we both observe similar situations but have
> > different reasons behind them.
> 
> I meant that account_user_time() is also called from timer_ISR ->
> update_process_times() like scheduler_tick(). So if task's running
> intervals are shorter than 1/HZ, it's not always accounted --> so cpu%
> may be wrong for such a task...

I think you're right wrt percentages, and that's making accurate
measurement of SD fairness difficult.  However, total runtime for user
tasks should be pretty accurate for kernels that use nanoseconds,
because it's accumulated every time a task passes through schedule().

BTW, the aberration I noticed with my unverified "testcase" does _seem_
to be repeatable here.  Once behavior changes, after a reboot the
repeatability returns.  I have no idea what's going on, but something is
sure fishy.

-Mike



Fw: Re: + add-locking-to-evdev.patch added to -mm tree

2007-04-08 Thread Paul E. McKenney
On Fri, Mar 30, 2007 at 02:06:05PM -0700, [EMAIL PROTECTED] wrote:
> 
> The patch titled
>  Add locking to evdev
> has been added to the -mm tree.  Its filename is
>  add-locking-to-evdev.patch
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
> 
> --
> Subject: Add locking to evdev
> From: Dmitry Torokhov <[EMAIL PROTECTED]>
> 
> Input: evdev - implement proper locking

OK, so I have to ask -- this is protecting multiple clients of a given
mouse or keyboard, right?  Doesn't look like it has much to do with 
connecting multiple mice/keyboards/joysticks/whatever to a given system,
but thought I should ask.

Excellent start, but some concerns marked with "!!!".  If these are
fixed, either by educating me or by appropriate changes, I will ack.

A signal-related question for Oleg marked with "Oleg".

Thanx, Paul

> Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
> Cc: "Paul E. McKenney" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  drivers/input/evdev.c |  351 
>  1 files changed, 254 insertions(+), 97 deletions(-)
> 
> diff -puN drivers/input/evdev.c~add-locking-to-evdev drivers/input/evdev.c
> --- a/drivers/input/evdev.c~add-locking-to-evdev
> +++ a/drivers/input/evdev.c
> @@ -31,6 +31,8 @@ struct evdev {
>   wait_queue_head_t wait;
>   struct evdev_client *grab;
>   struct list_head client_list;
> + spinlock_t client_lock;

OK, what does this one protect?

o   evdev_attach_client(): client_list field (permitting RCU readers).
Adds element.

o   evdev_detach_client(): ditto, but deletes element.

o   evdev_hangup(): scans the list hanging off of the client_list 
field, invoking kill_fasync() on each.  Looks to be delivering
a POLL_HUP to all parties receiving events.

Apparently the lock is preventing an entry from being
deleted out from under evdev_hangup().  Need to check races
with close(), I guess...  (For example, it would be bad
to have the process torn down to the point that it could
not tolerate receiving (or ignoring) a signal before
removing itself from the list.)

o   Readers of the evdev->client_list can use RCU.

> + struct mutex mutex;

And what does this one protect?

o   evdev_flush(): evdev->exist flag (which handles race with RCU removal?)
Also invokes input_flush_device(), which invokes some flush-handler
function.  There may be more issues here, but they would be with
users of evdev rather than with evdev itself, I am guessing.

o   evdev_release(): invokes evdev_ungrab().  This NULLs the
evdev->grab field using rcu_assign_pointer().

o   evdev_write(): invokes evdev_event_from_user() and
input_inject_event().  The former copies from user space, so
->mutex indeed cannot be a spinlock.  Not sure what we are
protecting here -- perhaps event traffic?  @@@

o   evdev_ioctl_handler(): protecting ioctl.  Consistent with
the thought of protecting event traffic.

o   evdev_mark_dead(): protect setting evdev->exist to zero, adding
weight to the speculation under evdev_flush() above that 
->exist handles the race with RCU removal.

o   Readers of evdev->grab can use RCU.  RCU readers caring about
concurrent deletion should check for evdev->exist under evdev->mutex.

Lock order:

o   evdev->client_lock => fown_struct->lock

o   fown_struct->lock => tasklist_lock

o   tasklist_lock => sighand_struct->siglock

o   evdev_table_mutex => evdev->client_lock.

>   struct device dev;
>  };
> 
> @@ -38,39 +40,48 @@ struct evdev_client {
>   struct input_event buffer[EVDEV_BUFFER_SIZE];
>   int head;
>   int tail;
> + spinlock_t buffer_lock;

And what does this one protect?  Presumably a buffer!  ;-)

o   evdev_pass_event(): adding an event to evdev_client->buffer.
This includes the evdev_client->head field.

!!!  Why doesn't this function need to check the
evdev_client->tail field???  How do we know we won't overflow
the buffer???

o   evdev_new_client() [was evdev_open()]: evdev_client->client
field (attaching the evdev to its client, apparently).
Invokes evdev_attach_client() to do the list manipulation
(protected in turn by evdev->client_lock).

Argh...  Strike that -- spin_lock_init() rather than spin_lock().

o   evdev_fetch_next_event(): removing an event from
evdev_client->buffer.  This includes evdev_client->head and
evdev_client->tail.

>   struct fasync_struct *fasync;
>   struct evdev *evdev;
>   struct list_head node;
>  };

Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 09:08 -0400, Ed Tomlinson wrote:
> Hi,
> 
> I am one of those who have been happily testing Con's patches.  
> 
> They work better than mainline here.

(I tried a UP kernel yesterday, and even a single kernel build would
make noticeable hitches if I move a window around. YMMV etc.)

> If one really needs some sort of interactivity booster (I do not with SD), why
> not move it into user space?  With SD it would be simple enough to export
> some info on estimated latency.  With this user space could make a good
> attempt to keep latency within bounds for a set of tasks just by renicing 

I don't think you can have very much effect on latency using nice with
SD once the CPU is fully utilized.  See below.

/*
 * This contains a bitmap for each dynamic priority level with empty slots
 * for the valid priorities each different nice level can have. It allows
 * us to stagger the slots where differing priorities run in a way that
 * keeps latency differences between different nice levels at a minimum.
 * ie, where 0 means a slot for that priority, priority running from left to
 * right:
 * nice -20 
 * nice -10 1001000100100010001001000100010010001000
 * nice   0 0101010101010101010101010101010101010101
 * nice   5 1101011010110101101011010110101101011011
 * nice  10 0110111011011101110110111011101101110111
 * nice  15 0101101101011011
 * nice  19 1110
 */

Nice allocates bandwidth, but as long as the CPU is busy, tasks always
proceed downward in priority until they hit the expired array.  That's
the design.  If X gets busy and expires, and a nice 20 CPU hog wakes up
after its previous rotation has ended, but before the current rotation
has ended (i.e. there is one task running at wakeup time), X will take a
guaranteed minimum 160ms latency hit (quite noticeable) independent of
nice level.  The only way to avoid it is to use a realtime class.

A nice -20 task has maximum bandwidth allocated, but that also makes it
a bigger target for preemption from tasks at all nice levels as it
proceeds downward toward expiration.  AFAICT, low latency scheduling
just isn't possible once the CPU becomes 100% utilized, but latency is
bounded by runqueue length.  In mainline OTOH, a nice -20 task will
always preempt a nice 0 task, giving it instant gratification, and
latency of lower priority tasks is bounded by the EXPIRED_STARVING(rq)
safety net.

-Mike



Re: [patch 2.6.21-rc5-git] make /proc/acpi/wakeup more useful

2007-04-08 Thread David Brownell
> On Sat, 2007-04-07 at 13:08 -0700, David Brownell wrote:
> > On Friday 06 April 2007 10:01 pm, Greg KH wrote:
> > 
> > > Are you _sure_ you have a 1-to-1 relationship here?  No multiple devices
> > > pointing to the same acpi node?  Or the other way around?  If so, you
> > > are going to have to change the name to be something more unique.
> > 
> > I've wondered that too.  The short answer:  ACPI only supports 1-1
> > here.
>
> Right.
>
> >   It will emit warnings if it tries to bind more than one ACPI
> > device to a given "real" device ... but errors the other way are
> > silently ignored.
>
> My understanding is different.
> First, one "real" device can only have one device.archdata.acpi_handle,
> which means it can only be bound to one ACPI device.
> Second, AE_ALREADY_EXISTS will be returned when ACPI tries to bind more
> than one "real" devices to the same ACPI device.

Exactly.  The "first" case emits a warning, the "second" case doesn't;
no matter what it is (though I only saw ALREADY_EXISTS).


When I added a warning to that case:

> > By adding a warning over this create-links patch, I found that the
> > system in the $SUBJECT patch (and likely every ACPI system) has
> > two different nodes that correspond to one ACPI node:
> > 
> > /sys/devices/pci:00 ... pci root node
> > /sys/devices/pnp0/00:00 ... id PNP0a03
> > /sys/devices/acpi_system:00/device:00/PNP0A03:00 ... ditto
> > 
> > Arguably that's too many sysfs nodes for one device...

Presumably you've noticed this same thing (not necessarily pnp0/00:00)
on other systems ...


> > Plus, there's the issue of flakey ACPI tables; in the $SUBJECT patch
> > both MDM and AUD nodes exist in the ACPI namespace, but they could
> > only refer to one PCI device (with MDM as the wakeup source, not AUD
> > as listed in the table).  Or maybe that's another case where the ACPI
> > code isn't handling the tables as sensibly as it might...
>
> Could you attach this acpidump please? :)

Off-list; yes.

- Dave



Re: Ten percent test

2007-04-08 Thread Gene Heskett
On Monday 09 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:57 -0400, Gene Heskett wrote:
>> On Sunday 08 April 2007, Mike Galbraith wrote:
>> >On Sun, 2007-04-08 at 13:40 +0200, Mike Galbraith wrote:
>> >> On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
>> >> > That seems to be the killer loading here, building a kernel (make
>> >> > -j3) doesn't seem to lag it all that bad.  One session of gzip
>> >> > -best makes it fall plumb over though, which was a
>> >> > disappointment.
>> >>
>> >> Can you make a testcase that doesn't require amanda?
>> >
>> >Or at least send me a couple of 5 or 10 second top snapshots (which
>> > also show CPU usage of sleeping tasks) while the system is
>> > misbehaving?
>> >
>> >-Mike
>>
>> With what monitor utility?
>
>Top.
>
>   -Mike

This may not be so informative, it's almost behaving ATM.

29252 amanda    22   0  1856  572  220 R 76.4  0.1   1:07.24 gzip
29235 amanda    15   0  2992 1224  888 S  5.6  0.1   0:02.80 chunker
29500 root      18   0  2996 1164  788 S  4.0  0.1   0:02.40 tar
10459 amanda    15   0  3340 1052  832 S  3.0  0.1   0:49.04 amandad
10536 amanda    15   0  3276 1308 1004 S  2.3  0.1   0:40.92 dumper
29496 amanda    18   0  2808  472  280 S  2.0  0.0   0:01.73 sendbackup
 4057 gkrellmd  15   0 11568 1172  896 S  1.3  0.1   7:45.82 gkrellmd
29498 amanda    18   0  2396  780  656 S  1.0  0.1   0:00.60 tar
19183 root      15   0     0    0    0 S  0.7  0.0   0:01.92 pdflush

I also note with some disdain that I'm half a megabyte into swap, but I've 
had FF-2.0.0.3 busy for the last hour while amanda was trying to find a 
few cycles at the same time.  Looking at a bunch of pdf's of circuit 
boards to see if I wanna build them for my milling machine.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Fatal Error: Found MS-Windows System -> Repartitioning Disk for Linux...


Re: Ten percent test

2007-04-08 Thread Gene Heskett
On Monday 09 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:56 -0400, Gene Heskett wrote:
>> On Sunday 08 April 2007, Mike Galbraith wrote:
>> >On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
>> >> That seems to be the killer loading here, building a kernel (make
>> >> -j3) doesn't seem to lag it all that bad.  One session of gzip
>> >> -best makes it fall plumb over though, which was a disappointment.
>> >
>> >Can you make a testcase that doesn't require amanda?
>> >
>> >-Mike
>>
>> Sure.  Try 'tar czf nameofarchive.tar.gz /path/to-dir-to-be-backed-up'
>>
>> Or, from the runtar log from this morning, and this is all one line:
>>
>> runtar.20070408022016.debug:running: /bin/tar: 'gtar' '--create'
>> '--file' '-' '--directory' '/usr/dlds-rpms' '--one-file-system'
>> '--listed-incremental'
>> '/usr/local/var/amanda/gnutar-lists/coyote_usr_dlds-rpms_1.new'
>> '--sparse' '--ignore-failed-read' '--totals' '--exclude-from'
>> '/tmp/amanda/sendbackup._usr_dlds-rpms.20070408022016.exclude' '.'
>>
>> and amanda will if requested, pipe that output through a |gzip -best,
>> and it's this process that brings the machine to the table begging for
>> scraps like a puppy.  Tar by itself can be felt but isn't bad.
>
>So tar -cvf - / | gzip --best | tar -tvzf - should reproduce the
>problem?
>
>   -Mike

That looks as if it should demo it pretty well if I understand correctly 
everything you're doing there.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
In /users3 did Kubla Kahn
A stately pleasure dome decree,
Where /bin, the sacred river ran
Through Test Suites measureless to Man
Down to a sunless C.


Re: Reiser4. BEST FILESYSTEM EVER - Christer Weinigel

2007-04-08 Thread johnrobertbanks
On Mon, 09 Apr 2007 00:58:53 +0200, "Richard Knutsson"
<[EMAIL PROTECTED]> said:
> Wow, I'm impressed. Think you got the record on how many mails you 
> referenced to in a reply... 

TWO actually. I guess you are easily impressed.

A simple cut and paste error.

> You have got some rude answers and you have called them back on it 

Yeah, I (fairly closely) mimicked their behavior to make a point.

> + you have repeated the same statement several times, that is 
> not the best way of convincing people.

I know you DON'T believe that, as you are about the tenth person to
repeat that "repeating stuff has no effect."

> I believe you picked up the "anti-Reiser religion"-phrase from previous 
> rant-wars (otherwise, why does that "religion"-phrase always come up, 
> and (almost) only when dealing with Reiser-fs), and yes, there have been
> some clashes caused by both sides, so please be careful when dealing
> with this matter.

NO. You people simply come across as zealots who work together, against
Reiser4.

Hence the term "anti-Reiser religion."

> Would you be willing to benchmark Reiser4 with some compressed 
> binary-blob and show the time as well as the CPU-usage? 

I might be. I don't really know how to set it all up.

Perhaps if you guided me through it.

> >
> > You deliberately ignored the fact that bad blocks are NOT dealt with by
> > the filesystem,... but by the operating system. Like I said: If your
> > filesystem is writing to bad blocks, then throw away your operating
> > system.
> >   

> I may have missed something, but if my room-mate took my harddrive, 
> screwed it open, wrote a love-letter on the disk with a pencil and then 
> returned it (ok, there may be some more plausible reasons for 
> corruption), is the OS really supposed to handle it?

Yeah, I can't see how the OS could read the love-letter either.

But one thing is for sure. The FS ain't responsible for reading it.

> Yes, it should not 
> assign any new data to those blocks but should it not also fall into the 
> file-systems domain to be able to restore some/all data?

It's a tough ask of any FS. 

Microsoft's filesystem checker totally roasted all my data on an XP-box
last night. 

I had used ntfsresize to reduce the partition size and had a power
outage. 

Later, Windows booted, ran the filesystem checker, seemed OK. 

Next time I boot, all I get is Input/Output error.

> 
> Just my 2c to the pond
> Richard Knutsson
> 
Adding my 2c
John.
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - A no graphics, no pop-ups email service



Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.

2007-04-08 Thread Gene Heskett
On Monday 09 April 2007, [EMAIL PROTECTED] wrote:
>
>I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY.
>
Many of us have, and recall the pain.  We'll pass thank you.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
It is indeed desirable to be well descended, but the glory belongs to
our ancestors.
-- Plutarch


Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.

2007-04-08 Thread Jeff Garzik

[EMAIL PROTECTED] wrote:

> YOU GUYS WILL LAUGH ABOUT THIS:


Yes, we are laughing at you.

You keep using bonnie++ after being told it's a poor benchmark.

Jeff





Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.

2007-04-08 Thread Jeff Garzik

[EMAIL PROTECTED] wrote:

> REISER4 FOR INCLUSION IN THE LINUX KERNEL.
>
> Dave Lynch takes a reasoned approach to REISER4.
>
> Dave Lynch wrote:
>> Jeff Garzik wrote:
>>> If the compelling reason is that it needs a test, I'd say its not ready.
>>
>> Can you please elaborate ? I am not sure I understand what you are
>> arguing ?
>
> Jeff Garzik is "saying" that he wants REISER4 to stay out of the main
> kernel, for reasons he is not willing to tell you.

False.  I have told you the reasons.

>> I for one would at least play with it if it were in the distribution
>> tree.
>
> I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY.

You can download it now.  Nobody is stopping you, or anyone else.

>> As far as I could tell Hans pretty much everything else that
>> was demanded. Hans eventually caved and provided - albeit with much
>> pissing and moaning, and holy than thou rhetoric.
>
> It was not his pissing and moaning, etc,... these were just excuses to
> keep REISER4 from succeeding. The truth is, that any excuse would do.
>
> The real reasons are financial and backed by big money (sometimes, big
> egos).


Put down the conspiracy crackpipe.

Jeff




Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.

2007-04-08 Thread johnrobertbanks
YOU GUYS WILL LAUGH ABOUT THIS:

I forgot all the statistics that might support the case for REISER4
inclusion.

Well, here it all is:

http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

.-------------------------.
| FILESYSTEM | TIME |DISK |
| TYPE       |(secs)|USAGE|
|            |      | (MB)|
.-------------------------.
|REISER4 lzo | 1938 | 278 |
|REISER4 gzip| 2295 | 213 |
|REISER4     | 3462 | 692 |
|EXT2        | 4092 | 816 |
|JFS         | 4225 | 806 |
|EXT4        | 4408 | 816 |
|EXT3        | 4421 | 816 |
|XFS         | 4625 | 779 |
|REISER3     | 6178 | 793 |
|FAT32       |12342 | 988 |
|NTFS-3g     |10414 | 772 |
.-------------------------.


The TIME column measures the time taken to complete the bonnie++
benchmarking test (run with the parameters bonnie++ -n128:128k:0).

The DISK USAGE column measures the amount of disk used to store 655MB
of raw data (which was 3 different copies of the Linux kernel sources).

OR LOOK AT THE FULL RESULTS:

.-.
|File |Disk |Copy |Copy |Tar  |Unzip| Del |
|System   |Usage|655MB|655MB|Gzip |UnTar| 2.5 |
|Type | (MB)| (1) | (2) |655MB|655MB| Gig |
.-.
|REISER4 gzip | 213 | 148 |  68 |  83 |  48 |  70 |
|REISER4 lzo  | 278 | 138 |  56 |  80 |  34 |  84 |
|REISER4 tails| 673 | 148 |  63 |  78 |  33 |  65 |
|REISER4  | 692 | 148 |  55 |  67 |  25 |  56 |
|NTFS3g   | 772 |1333 |1426 | 585 | 767 | 194 |
|NTFS | 779 | 781 | 173 |   X |   X |   X |
|REISER3  | 793 | 184 |  98 |  85 |  63 |  22 |
|XFS  | 799 | 220 | 173 | 119 |  90 | 106 |
|JFS  | 806 | 228 | 202 |  95 |  97 | 127 |
|EXT4 extents | 806 | 162 |  55 |  69 |  36 |  32 |
|EXT4 default | 816 | 174 |  70 |  74 |  42 |  50 |
|EXT3 | 816 | 182 |  74 |  73 |  43 |  51 |
|EXT2 | 816 | 201 |  82 |  73 |  39 |  67 |
|FAT32| 988 | 253 | 158 | 118 |  81 |  95 |
.-.


Each test was performed 5 times and the average value recorded.
Disk Usage: The amount of disk used to store the data (which was 3
different copies of the Linux kernel sources).
The raw data (without filesystem meta-data, block alignment wastage,
etc) was 655MB.
Copy 655MB (1): Copy the data over a partition boundary.
Copy 655MB (2): Copy the data within a partition.
Tar Gzip 655MB: Tar and Gzip the data.
Unzip UnTar 655MB: UnGzip and UnTar the data.
Del 2.5 Gig: Delete everything just written (about 2.5 Gig).


To get a feel for the performance increases that can be achieved by
using compression, we look at the total time (in seconds) to run the
test:

bonnie++ -n128:128k:0 (bonnie++ is Version 1.93c)

.-----------------------.
| FILESYSTEM   | TIME   |
.-----------------------.
| REISER4 lzo  |   1938 |
| REISER4 gzip |   2295 |
| REISER4      |   3462 |
| EXT4         |   4408 |
| EXT2         |   4092 |
| JFS          |   4225 |
| EXT3         |   4421 |
| XFS          |   4625 |
| REISER3      |   6178 |
| FAT32        |  12342 |
| NTFS-3g      | >10414 |
.-----------------------.
-- 
  
  [EMAIL PROTECTED]



Re: Add a norecovery option to ext3/4?

2007-04-08 Thread Brad Campbell

Eric Sandeen wrote:
> Samuel Thibault wrote:
> > Hi,
> >
> > Distribution installers usually try to probe OSes for building a suited
> > grub menu.  Unfortunately, mounting an ext3 partition, even in read-only
> > mode, does perform some operations on the filesystem (log recovery).
> > This is not a good idea since it may silently garbage data.
>
> Can you elaborate?  Under what circumstances is log replay going to harm
> data?  Do you mean that the installer mounts partitions, looking for
> what OS is installed?  How is that harmful?

It'll wreak havoc on my hibernated system when I've suspended it to do a
test OS install on one of my spare partitions. The log replay will go
fine, but then when the system resumes, its idea of what's on the disk
won't match what is really there, and ugly, ugly things happen.



Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams


[PATCH 02/14] sysfs: fix error handling in binattr write()

2007-04-08 Thread Tejun Heo
Error handling in fs/sysfs/bin.c:write() was wrong because the size_t
variable count is used to receive the return value of flush_write(),
which is negative on failure.

This patch updates write() to use an int variable instead.  read() is
updated the same way for consistency.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c |   21 -
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index d3b9f5f..8273dd6 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -33,16 +33,13 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 }
 
 static ssize_t
-read(struct file * file, char __user * userbuf, size_t count, loff_t * off)
+read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 {
char *buffer = file->private_data;
struct dentry *dentry = file->f_path.dentry;
int size = dentry->d_inode->i_size;
loff_t offs = *off;
-   int ret;
-
-   if (count > PAGE_SIZE)
-   count = PAGE_SIZE;
+   int count = min_t(size_t, bytes, PAGE_SIZE);
 
if (size) {
if (offs > size)
@@ -51,10 +48,9 @@ read(struct file * file, char __user * userbuf, size_t count, loff_t * off)
count = size - offs;
}
 
-   ret = fill_read(dentry, buffer, offs, count);
-   if (ret < 0) 
-   return ret;
-   count = ret;
+   count = fill_read(dentry, buffer, offs, count);
+   if (count < 0)
+   return count;
 
if (copy_to_user(userbuf, buffer, count))
return -EFAULT;
@@ -78,16 +74,15 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
return attr->write(kobj, buffer, offset, count);
 }
 
-static ssize_t write(struct file * file, const char __user * userbuf,
-size_t count, loff_t * off)
+static ssize_t write(struct file *file, const char __user *userbuf,
+size_t bytes, loff_t *off)
 {
char *buffer = file->private_data;
struct dentry *dentry = file->f_path.dentry;
int size = dentry->d_inode->i_size;
loff_t offs = *off;
+   int count = min_t(size_t, bytes, PAGE_SIZE);
 
-   if (count > PAGE_SIZE)
-   count = PAGE_SIZE;
if (size) {
if (offs > size)
return 0;
-- 
1.5.0.3
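A minimal userspace sketch of the bug this patch fixes, assuming a hypothetical fake_flush_write() that stands in for flush_write() (bytes written on success, a negative errno on failure). With a size_t count, the negative error wraps to a huge unsigned value, so the count > 0 check passes and the file offset is advanced by a bogus amount; with a signed int, as in the patch, the error is returned and the offset is left alone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for flush_write(): bytes written, or -errno. */
static int fake_flush_write(int fail)
{
    return fail ? -EIO : 42;
}

/* Old pattern: the result is stored in an unsigned size_t. */
static ssize_t write_old(int fail, long long *off)
{
    size_t count = fake_flush_write(fail);  /* -EIO wraps to a huge value */

    if (count > 0)        /* also true for the wrapped error */
        *off += count;    /* the offset gets corrupted */
    return count;
}

/* Patched pattern: a signed int keeps the error negative. */
static ssize_t write_new(int fail, long long *off)
{
    int count = fake_flush_write(fail);

    if (count > 0)
        *off += count;    /* only advance on success */
    return count;
}
```

The kernel code assigns *off = offs + count rather than using +=, but the effect on the offset is the same.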




[PATCH 06/14] sysfs: add sysfs_dirent->s_parent

2007-04-08 Thread Tejun Heo
Add sysfs_dirent->s_parent.  With this patch, each sd points to and
holds a reference to its parent.  This allows walking sysfs tree
without referencing sd->s_dentry which can go away anytime if the user
doesn't control when it's deleted.

sd->s_parent is initialized and parent is referenced in
sysfs_attach_dirent().  Reference to parent is released when the sd is
released, so as long as reference to a sd is held, s_parent can be
followed.

The dentry walk in sysfs_readdir() is converted to an s_parent walk.

This will be used to reimplement symlink such that it uses only
sysfs_dirent tree.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |   27 ---
 fs/sysfs/mount.c |1 +
 fs/sysfs/sysfs.h |1 +
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 3e460f7..8c35a60 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -16,6 +16,11 @@ DECLARE_RWSEM(sysfs_rename_sem);
 
 void release_sysfs_dirent(struct sysfs_dirent * sd)
 {
+   struct sysfs_dirent *parent_sd;
+
+ repeat:
+   parent_sd = sd->s_parent;
+
if (sd->s_type & SYSFS_KOBJ_LINK) {
struct sysfs_symlink * sl = sd->s_element;
kfree(sl->link_name);
@@ -24,6 +29,10 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
}
kfree(sd->s_iattr);
kmem_cache_free(sysfs_dir_cachep, sd);
+
+   sd = parent_sd;
+   if (sd && atomic_dec_and_test(&sd->s_count))
+   goto repeat;
 }
 
 static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
@@ -71,8 +80,10 @@ void sysfs_attach_dirent(struct sysfs_dirent *sd,
dentry->d_op = &sysfs_dentry_ops;
}
 
-   if (parent_sd)
+   if (parent_sd) {
+   sd->s_parent = sysfs_get(parent_sd);
list_add(&sd->s_sibling, &parent_sd->s_children);
+   }
 }
 
 /*
@@ -508,7 +519,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
i++;
/* fallthrough */
case 1:
-   ino = (unsigned long)dentry->d_parent->d_fsdata;
+   ino = (unsigned long)parent_sd->s_parent;
if (filldir(dirent, "..", 2, i, ino, DT_DIR) < 0)
break;
filp->f_pos++;
@@ -625,13 +636,13 @@ int sysfs_make_shadowed_dir(struct kobject *kobj,
 
 struct dentry *sysfs_create_shadow_dir(struct kobject *kobj)
 {
+   struct dentry *dir = kobj->dentry;
+   struct inode *inode = dir->d_inode;
+   struct dentry *parent = dir->d_parent;
+   struct sysfs_dirent *parent_sd = parent->d_fsdata;
+   struct dentry *shadow;
struct sysfs_dirent *sd;
-   struct dentry *parent, *dir, *shadow;
-   struct inode *inode;
 
-   dir = kobj->dentry;
-   inode = dir->d_inode;
-   parent = dir->d_parent;
shadow = ERR_PTR(-EINVAL);
if (!sysfs_is_shadowed_inode(inode))
goto out;
@@ -643,6 +654,8 @@ struct dentry *sysfs_create_shadow_dir(struct kobject *kobj)
sd = sysfs_new_dirent(kobj, inode->i_mode, SYSFS_DIR);
if (!sd)
goto nomem;
+   /* point to parent_sd but don't attach to it */
+   sd->s_parent = sysfs_get(parent_sd);
sysfs_attach_dirent(sd, NULL, shadow);
 
d_instantiate(shadow, igrab(inode));
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 23a48a3..141f7b1 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -28,6 +28,7 @@ static const struct super_operations sysfs_ops = {
 };
 
 static struct sysfs_dirent sysfs_root = {
+   .s_count    = ATOMIC_INIT(1),
.s_sibling  = LIST_HEAD_INIT(sysfs_root.s_sibling),
.s_children = LIST_HEAD_INIT(sysfs_root.s_children),
.s_element  = NULL,
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 0be1d94..f95ab31 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -1,5 +1,6 @@
 struct sysfs_dirent {
atomic_t        s_count;
+   struct sysfs_dirent * s_parent;
struct list_head    s_sibling;
struct list_head    s_children;
void            * s_element;
-- 
1.5.0.3
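The release walk added to release_sysfs_dirent() can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the sysfs code: struct node, node_new(), node_put() and node_release() are hypothetical names, a plain int replaces atomic_t, and there is no locking. Each node holds a reference on its parent, and freeing a node drops that reference, continuing up the chain in a loop (the patch uses goto repeat) instead of recursing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct sysfs_dirent. */
struct node {
    int count;            /* reference count (atomic_t in the kernel) */
    struct node *parent;  /* each node pins its parent */
};

static int freed;         /* number of nodes freed, for demonstration */

static struct node *node_new(struct node *parent)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    n->count = 1;
    n->parent = parent;
    if (parent)
        parent->count++;  /* like sd->s_parent = sysfs_get(parent_sd) */
    return n;
}

/* Free a node whose count reached zero, then drop the reference it held
 * on its parent, walking upward while that was the last reference. */
static void node_release(struct node *n)
{
    while (n) {
        struct node *parent = n->parent;

        free(n);
        freed++;
        n = (parent && --parent->count == 0) ? parent : NULL;
    }
}

static void node_put(struct node *n)
{
    if (n && --n->count == 0)
        node_release(n);
}
```

Because the patch also initializes sysfs_root.s_count to 1, the upward walk in the real code should stop at the root rather than freeing it.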




Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 20:51 +0200, Rene Herman wrote:
> On 04/08/2007 12:41 PM, Ingo Molnar wrote:
> 
> > this is pretty hard to get right, and the most objective way to change 
> > it is to do it testcase-driven. FYI, interactivity tweaking has been 
> > gradual, the last bigger round of interactivity changes were done a year 
> > ago:
> > 
> >  commit 5ce74abe788a26698876e66b9c9ce7e7acc25413
> >  Author: Mike Galbraith <[EMAIL PROTECTED]>
> >  Date:   Mon Apr 10 22:52:44 2006 -0700
> > 
> >  [PATCH] sched: fix interactive task starvation
> > 
> > (and a few smaller tweaks since then too.)
> > 
> > and that change from Mike responded to a testcase. Mike's latest changes 
> > (the ones you just tested) were mostly driven by actual testcases too, 
> > which measured long-term timeslice distribution fairness.
> 
> Ah yes, that one. Here's the next one in that series:
> 
> commit f1adad78dd2fc8edaa513e0bde92b4c64340245c
> Author: Linus Torvalds <[EMAIL PROTECTED]>
> Date:   Sun May 21 18:54:09 2006 -0700
> 
>  Revert "[PATCH] sched: fix interactive task starvation"
> 
> It personally had me wonder if _anyone_ was testing this stuff...

Well of course not.  Making random untested changes, and reverting them
later is half the fun of kernel development.

-Mike



[PATCH 05/14] sysfs: consolidate sysfs_dirent creation functions

2007-04-08 Thread Tejun Heo
Currently there are four functions to create a sysfs_dirent:
__sysfs_new_dirent(), sysfs_new_dirent(), __sysfs_make_dirent() and
sysfs_make_dirent().  Other than sysfs_make_dirent(), none of them has
more than one user once the calls used to implement the other functions
are excluded.

This patch consolidates sysfs_dirent creation functions into the
following two.

* sysfs_new_dirent() : allocate and initialize
* sysfs_attach_dirent() : attach to sysfs_dirent hierarchy and/or
  associate with dentry

This simplifies the interface and gives callers more flexibility.  This
is in preparation for object reference simplification.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c |   82 
 fs/sysfs/file.c|   21 ++---
 fs/sysfs/symlink.c |7 ++--
 fs/sysfs/sysfs.h   |7 +++-
 4 files changed, 50 insertions(+), 67 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 0005117..3e460f7 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -42,10 +42,7 @@ static struct dentry_operations sysfs_dentry_ops = {
.d_iput = sysfs_d_iput,
 };
 
-/*
- * Allocates a new sysfs_dirent and links it to the parent sysfs_dirent
- */
-static struct sysfs_dirent * __sysfs_new_dirent(void * element)
+struct sysfs_dirent *sysfs_new_dirent(void *element, umode_t mode, int type)
 {
struct sysfs_dirent * sd;
 
@@ -57,25 +54,25 @@ static struct sysfs_dirent * __sysfs_new_dirent(void * element)
atomic_set(&sd->s_event, 1);
INIT_LIST_HEAD(&sd->s_children);
INIT_LIST_HEAD(&sd->s_sibling);
+
sd->s_element = element;
+   sd->s_mode = mode;
+   sd->s_type = type;
 
return sd;
 }
 
-static void __sysfs_list_dirent(struct sysfs_dirent *parent_sd,
- struct sysfs_dirent *sd)
+void sysfs_attach_dirent(struct sysfs_dirent *sd,
+struct sysfs_dirent *parent_sd, struct dentry *dentry)
 {
-   if (sd)
-   list_add(&sd->s_sibling, &parent_sd->s_children);
-}
+   if (dentry) {
+   sd->s_dentry = dentry;
+   dentry->d_fsdata = sysfs_get(sd);
+   dentry->d_op = &sysfs_dentry_ops;
+   }
 
-static struct sysfs_dirent * sysfs_new_dirent(struct sysfs_dirent *parent_sd,
-   void * element)
-{
-   struct sysfs_dirent *sd;
-   sd = __sysfs_new_dirent(element);
-   __sysfs_list_dirent(parent_sd, sd);
-   return sd;
+   if (parent_sd)
+   list_add(&sd->s_sibling, &parent_sd->s_children);
 }
 
 /*
@@ -103,39 +100,6 @@ int sysfs_dirent_exist(struct sysfs_dirent *parent_sd,
return 0;
 }
 
-
-static struct sysfs_dirent *
-__sysfs_make_dirent(struct dentry *dentry, void *element, mode_t mode, int type)
-{
-   struct sysfs_dirent * sd;
-
-   sd = __sysfs_new_dirent(element);
-   if (!sd)
-   goto out;
-
-   sd->s_mode = mode;
-   sd->s_type = type;
-   sd->s_dentry = dentry;
-   if (dentry) {
-   dentry->d_fsdata = sysfs_get(sd);
-   dentry->d_op = &sysfs_dentry_ops;
-   }
-
-out:
-   return sd;
-}
-
-int sysfs_make_dirent(struct sysfs_dirent * parent_sd, struct dentry * dentry,
-   void * element, umode_t mode, int type)
-{
-   struct sysfs_dirent *sd;
-
-   sd = __sysfs_make_dirent(dentry, element, mode, type);
-   __sysfs_list_dirent(parent_sd, sd);
-
-   return sd ? 0 : -ENOMEM;
-}
-
 static int init_dir(struct inode * inode)
 {
inode->i_op = &sysfs_dir_inode_operations;
@@ -179,10 +143,11 @@ static int create_dir(struct kobject *kobj, struct dentry *parent,
if (sysfs_dirent_exist(parent->d_fsdata, name))
goto out_dput;
 
-   error = sysfs_make_dirent(parent->d_fsdata, dentry, kobj, mode,
- SYSFS_DIR);
-   if (error)
+   error = -ENOMEM;
+   sd = sysfs_new_dirent(kobj, mode, SYSFS_DIR);
+   if (!sd)
goto out_drop;
+   sysfs_attach_dirent(sd, parent->d_fsdata, dentry);
 
error = sysfs_create(dentry, mode, init_dir);
if (error)
@@ -197,7 +162,6 @@ static int create_dir(struct kobject *kobj, struct dentry *parent,
goto out_dput;
 
  out_sput:
-   sd = dentry->d_fsdata;
list_del_init(&sd->s_sibling);
sysfs_put(sd);
  out_drop:
@@ -494,13 +458,16 @@ static int sysfs_dir_open(struct inode *inode, struct file *file)
 {
struct dentry * dentry = file->f_path.dentry;
struct sysfs_dirent * parent_sd = dentry->d_fsdata;
+   struct sysfs_dirent * sd;
 
mutex_lock(&dentry->d_inode->i_mutex);
-   file->private_data = sysfs_new_dirent(parent_sd, NULL);
+   sd = sysfs_new_dirent(NULL, 0, 0);
+   if (sd)
+   sysfs_attach_dirent(sd, parent_sd, NULL);
mutex_unlock(&dentry->d_inode->i_mutex);
 
-   return file->private_da

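The split between allocation and attachment described in this patch can be sketched in userspace as below. The names (struct dirent_model, new_dirent(), attach_dirent()) are hypothetical, a singly linked child list stands in for list_head, the dentry is an opaque pointer, and the dentry reference taken by the real sysfs_attach_dirent() is left out. As in the patch, a directory entry is attached with both a parent and a dentry, while a readdir cursor (as in sysfs_dir_open()) is attached with no dentry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down model of a sysfs_dirent. */
struct dirent_model {
    void *element;
    int mode;
    int type;
    void *dentry;                    /* NULL until associated */
    struct dirent_model *children;   /* first child */
    struct dirent_model *sibling;    /* next entry under the same parent */
};

/* Step 1: allocate and initialize only; nothing is linked yet, so the
 * caller can handle allocation failure before touching the tree. */
static struct dirent_model *new_dirent(void *element, int mode, int type)
{
    struct dirent_model *sd = calloc(1, sizeof(*sd));

    if (!sd)
        return NULL;
    sd->element = element;
    sd->mode = mode;
    sd->type = type;
    return sd;
}

/* Step 2: optionally associate a dentry and/or link under a parent. */
static void attach_dirent(struct dirent_model *sd,
                          struct dirent_model *parent, void *dentry)
{
    if (dentry)
        sd->dentry = dentry;
    if (parent) {
        sd->sibling = parent->children;   /* push front, like list_add() */
        parent->children = sd;
    }
}
```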
[PATCH 01/14] sysfs: fix i_ino handling in sysfs

2007-04-08 Thread Tejun Heo
Inode number handling was incorrect in two ways.

1. sysfs uses the inode number allocated by new_inode() and never
   hashes it.  When reporting the inode number, it uses iunique() if
   the inode is inaccessible.  This is incorrect because iunique()
   assumes the inodes are hashed.  This can cause duplicate inode
   numbers, and the condition is likely to happen because new_inode()
   and iunique() use separate increasing static counters to scan for
   an empty slot.

2. sysfs_dirent->s_dentry can go away anytime and can't be referenced
   unless the caller knows the dentry is not deleted and is not going
   to be deleted.

This patch makes sysfs report the pointer to sysfs_dirent as ino.
ino_t is always at least as large as unsigned long, and the
sysfs_dirent hierarchy is the internal representation of the sysfs
tree, so this makes sense and is simple to implement.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |   11 ---
 fs/sysfs/inode.c |1 +
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 85a6686..5112f88 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -504,19 +504,19 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
struct sysfs_dirent * parent_sd = dentry->d_fsdata;
struct sysfs_dirent *cursor = filp->private_data;
struct list_head *p, *q = &cursor->s_sibling;
-   ino_t ino;
+   unsigned long ino;
int i = filp->f_pos;
 
switch (i) {
case 0:
-   ino = dentry->d_inode->i_ino;
+   ino = (unsigned long)parent_sd;
if (filldir(dirent, ".", 1, i, ino, DT_DIR) < 0)
break;
filp->f_pos++;
i++;
/* fallthrough */
case 1:
-   ino = parent_ino(dentry);
+   ino = (unsigned long)dentry->d_parent->d_fsdata;
if (filldir(dirent, "..", 2, i, ino, DT_DIR) < 0)
break;
filp->f_pos++;
@@ -538,10 +538,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 
name = sysfs_get_name(next);
len = strlen(name);
-   if (next->s_dentry)
-   ino = next->s_dentry->d_inode->i_ino;
-   else
-   ino = iunique(sysfs_sb, 2);
+   ino = (unsigned long)next;
 
if (filldir(dirent, name, len, filp->f_pos, ino,
 dt_type(next)) < 0)
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 4de5c6b..b8b010c 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -140,6 +140,7 @@ struct inode * sysfs_new_inode(mode_t mode, struct sysfs_dirent * sd)
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
+   inode->i_ino = (unsigned long)sd;
lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);
 
if (sd->s_iattr) {
-- 
1.5.0.3
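A toy model of the duplicate-number problem described in point 1, with hypothetical names throughout: model_new_inode() plays new_inode() (its own counter, never hashed) and model_iunique() approximates iunique() (a separate counter that skips only reserved and hashed numbers). Because sysfs never hashes its inodes, the two counters can hand out the same number. ino_from_object() shows the patch's alternative of using the sysfs_dirent address, assuming, as the patch states, that the inode number type is at least as wide as unsigned long.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_INO 64

/* Numbers iunique() can "see"; stays all-false because sysfs never
 * hashes its inodes. */
static bool hashed[NR_INO];
static unsigned long last_ino;      /* new_inode()'s counter */
static unsigned long iunique_ctr;   /* iunique()'s separate counter */

/* Like new_inode(): hands out a number but never hashes it. */
static unsigned long model_new_inode(void)
{
    return ++last_ino;
}

/* Approximation of iunique(): skips reserved and hashed numbers only. */
static unsigned long model_iunique(unsigned long max_reserved)
{
    unsigned long res;

    do {
        if (iunique_ctr <= max_reserved)
            iunique_ctr = max_reserved + 1;
        res = iunique_ctr++;
    } while (res < NR_INO && hashed[res]);
    return res;
}

/* The patch's approach: a live object's address is already unique. */
static unsigned long ino_from_object(const void *sd)
{
    return (unsigned long)sd;
}
```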




Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 13:57 -0400, Gene Heskett wrote:
> On Sunday 08 April 2007, Mike Galbraith wrote:
> >On Sun, 2007-04-08 at 13:40 +0200, Mike Galbraith wrote:
> >> On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
> >> > That seems to be the killer loading here, building a kernel (make
> >> > -j3) doesn't seem to lag it all that bad.  One session of gzip -best
> >> > makes it fall plumb over though, which was a disappointment.
> >>
> >> Can you make a testcase that doesn't require amanda?
> >
> >Or at least send me a couple of 5 or 10 second top snapshots (which also
> >show CPU usage of sleeping tasks) while the system is misbehaving?
> >
> > -Mike
> 
> With what monitor utility?

Top.

-Mike



[PATCH 09/14] sysfs: implement kobj_sysfs_assoc_lock

2007-04-08 Thread Tejun Heo
kobj->dentry can go away anytime unless the user controls when the
associated sysfs node is deleted.  This patch implements
kobj_sysfs_assoc_lock which protects kobj->dentry.  This will be used
to maintain kobj based API when converting sysfs to use sysfs_dirent
tree instead of dentry/kobject.

Note that this lock belongs to kobject/driver-model, not sysfs.  Once
sysfs is converted to not use kobject in its interface, this can be
removed from sysfs.

This is in preparation for object reference simplification.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |8 +++-
 fs/sysfs/sysfs.h |1 +
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 4070dc4..707eba9 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -13,6 +13,7 @@
 #include "sysfs.h"
 
 DECLARE_RWSEM(sysfs_rename_sem);
+spinlock_t kobj_sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
 
 void release_sysfs_dirent(struct sysfs_dirent * sd)
 {
@@ -371,8 +372,13 @@ static void __sysfs_remove_dir(struct dentry *dentry)
 
 void sysfs_remove_dir(struct kobject * kobj)
 {
-   __sysfs_remove_dir(kobj->dentry);
+   struct dentry *d = kobj->dentry;
+
+   spin_lock(&kobj_sysfs_assoc_lock);
kobj->dentry = NULL;
+   spin_unlock(&kobj_sysfs_assoc_lock);
+
+   __sysfs_remove_dir(d);
 }
 
 int sysfs_rename_dir(struct kobject * kobj, struct dentry *new_parent,
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index b1a8a7e..5c41fc5 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -60,6 +60,7 @@ extern void sysfs_remove_subdir(struct dentry *);
 extern void sysfs_drop_dentry(struct sysfs_dirent *sd, struct dentry *parent);
 extern int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
 
+extern spinlock_t kobj_sysfs_assoc_lock;
 extern struct rw_semaphore sysfs_rename_sem;
 extern struct super_block * sysfs_sb;
 extern const struct file_operations sysfs_dir_operations;
-- 
1.5.0.3
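A rough userspace analogue of the pattern, assuming a pthread mutex in place of the kernel spinlock and hypothetical names (struct obj, obj_get_dentry(), obj_remove_dir()). The patch only adds the lock and the remover side; the reader shown here is an assumption about how it "will be used". As in sysfs_remove_dir() above, the pointer is cleared under the lock and the teardown runs after the lock is dropped, presumably because the real removal may sleep and so cannot run under a spinlock. One difference: the patch reads kobj->dentry before taking the lock, while this sketch snapshots it inside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject that points at its sysfs dentry. */
struct obj {
    void *dentry;
};

/* Plays the role of kobj_sysfs_assoc_lock (a spinlock in the patch). */
static pthread_mutex_t assoc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reader side: take a consistent snapshot of o->dentry. */
static void *obj_get_dentry(struct obj *o)
{
    void *d;

    pthread_mutex_lock(&assoc_lock);
    d = o->dentry;
    pthread_mutex_unlock(&assoc_lock);
    return d;
}

/* Remover side: break the association under the lock, then tear the
 * node down after the lock has been released. */
static void obj_remove_dir(struct obj *o, void (*teardown)(void *))
{
    void *d;

    pthread_mutex_lock(&assoc_lock);
    d = o->dentry;
    o->dentry = NULL;
    pthread_mutex_unlock(&assoc_lock);

    if (d)
        teardown(d);
}
```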




[PATCH 08/14] sysfs: make sysfs_dirent->s_element a union

2007-04-08 Thread Tejun Heo
Make sd->s_element a union of sysfs_elem_{dir|symlink|attr|bin_attr}
and rename it to s_elem.  This is to achieve...

* some level of type checking : changing symlink to point to
  sysfs_dirent instead of kobject is much safer and less painful now.
* easier / standardized dereferencing
* allow sysfs_elem_* to contain more than one entry

Where possible, the pointer is obtained by dereferencing directly from
sd instead of going through other entities.  This reduces dependencies
on dentry, inode and kobject.  to_attr() and to_bin_attr() are now
unused and have been removed.

This is in preparation for object reference simplification.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c |   18 ++--
 fs/sysfs/dir.c |   31 +---
 fs/sysfs/file.c|   19 +
 fs/sysfs/inode.c   |2 +-
 fs/sysfs/mount.c   |1 -
 fs/sysfs/symlink.c |   23 +++-
 fs/sysfs/sysfs.h   |   56 ---
 7 files changed, 71 insertions(+), 79 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 8273dd6..0f0027b 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -23,7 +23,8 @@
 static int
 fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 {
-   struct bin_attribute * attr = to_bin_attr(dentry);
+   struct sysfs_dirent *attr_sd = dentry->d_fsdata;
+   struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
struct kobject * kobj = to_kobj(dentry->d_parent);
 
if (!attr->read)
@@ -65,7 +66,8 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 static int
 flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 {
-   struct bin_attribute *attr = to_bin_attr(dentry);
+   struct sysfs_dirent *attr_sd = dentry->d_fsdata;
+   struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
struct kobject *kobj = to_kobj(dentry->d_parent);
 
if (!attr->write)
@@ -101,9 +103,9 @@ static ssize_t write(struct file *file, const char __user *userbuf,
 
 static int mmap(struct file *file, struct vm_area_struct *vma)
 {
-   struct dentry *dentry = file->f_path.dentry;
-   struct bin_attribute *attr = to_bin_attr(dentry);
-   struct kobject *kobj = to_kobj(dentry->d_parent);
+   struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
+   struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
+   struct kobject *kobj = to_kobj(file->f_path.dentry->d_parent);
 
if (!attr->mmap)
return -EINVAL;
@@ -114,7 +116,8 @@ static int mmap(struct file *file, struct vm_area_struct *vma)
 static int open(struct inode * inode, struct file * file)
 {
struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
-   struct bin_attribute * attr = to_bin_attr(file->f_path.dentry);
+   struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
+   struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
int error = -EINVAL;
 
if (!kobj || !attr)
@@ -150,7 +153,8 @@ static int open(struct inode * inode, struct file * file)
 static int release(struct inode * inode, struct file * file)
 {
struct kobject * kobj = to_kobj(file->f_path.dentry->d_parent);
-   struct bin_attribute * attr = to_bin_attr(file->f_path.dentry);
+   struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
+   struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
u8 * buffer = file->private_data;
 
kobject_put(kobj);
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 525c0e7..4070dc4 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -21,11 +21,8 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
  repeat:
parent_sd = sd->s_parent;
 
-   if (sd->s_type & SYSFS_KOBJ_LINK) {
-   struct sysfs_symlink * sl = sd->s_element;
-   kobject_put(sl->target_kobj);
-   kfree(sl);
-   }
+   if (sd->s_type & SYSFS_KOBJ_LINK)
+   kobject_put(sd->s_elem.symlink.target_kobj);
if (sd->s_type & SYSFS_COPY_NAME)
kfree(sd->s_name);
kfree(sd->s_iattr);
@@ -52,8 +49,7 @@ static struct dentry_operations sysfs_dentry_ops = {
.d_iput = sysfs_d_iput,
 };
 
-struct sysfs_dirent *sysfs_new_dirent(const char *name, void *element,
- umode_t mode, int type)
+struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 {
char *dup_name = NULL;
struct sysfs_dirent * sd;
@@ -76,7 +72,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, void *element,
INIT_LIST_HEAD(&sd->s_sibling);
 
sd->s_name = name;
-   sd->s_element = element;
sd->s_mode = mode;
sd->s_type = type;
 
@@ -111,7 +106,7 @@ int sysfs_dirent_exist(struct sysfs_dirent *parent_sd,
struct sysfs_dirent * sd;
 

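The archived diff above is cut off, so the following is only a sketch based on the accessors visible in the hunks (attr_sd->s_elem.bin_attr.bin_attr and sd->s_elem.symlink.target_kobj). It shows a type-tagged union in which a flag in s_type says which member is valid; struct sd_model, the MODEL_* flags and sd_link_target() are hypothetical names, not the sysfs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type flags; the real SYSFS_* values are not shown here. */
enum { MODEL_DIR = 1, MODEL_LINK = 2, MODEL_ATTR = 4, MODEL_BIN_ATTR = 8 };

struct kobj_model { const char *name; };
struct attr_model { const char *name; };

struct sd_model {
    int s_type;                       /* says which s_elem member is valid */
    union {
        struct { struct kobj_model *kobj; }        dir;
        struct { struct kobj_model *target_kobj; } symlink;
        struct { struct attr_model *attr; }        attr;
        struct { struct attr_model *bin_attr; }    bin_attr;
    } s_elem;
};

/* Check the tag before reading the union member. */
static struct kobj_model *sd_link_target(const struct sd_model *sd)
{
    return (sd->s_type & MODEL_LINK) ? sd->s_elem.symlink.target_kobj : NULL;
}
```

Compared with a single void * s_element, each member has a concrete pointer type, so the compiler warns about, for example, assigning the symlink target to a struct attr_model * without a cast, which a void * would allow silently. Keeping the tag and the member in agreement is still up to the code.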
[PATCH 11/14] sysfs: implement bin_buffer

2007-04-08 Thread Tejun Heo
Implement bin_buffer, which contains a mutex and a pointer to a
PAGE_SIZE buffer, to properly synchronize accesses to the per-open-file
buffer and to prepare for immediate-kobj-disconnect.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c |   64 ++-
 1 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 0f0027b..1dd1bf1 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -20,6 +20,11 @@
 
 #include "sysfs.h"
 
+struct bin_buffer {
+   struct mutex    mutex;
+   void            *buffer;
+};
+
 static int
 fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 {
@@ -36,7 +41,7 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 static ssize_t
 read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
 {
-   char *buffer = file->private_data;
+   struct bin_buffer *bb = file->private_data;
struct dentry *dentry = file->f_path.dentry;
int size = dentry->d_inode->i_size;
loff_t offs = *off;
@@ -49,17 +54,23 @@ read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
count = size - offs;
}
 
-   count = fill_read(dentry, buffer, offs, count);
+   mutex_lock(&bb->mutex);
+
+   count = fill_read(dentry, bb->buffer, offs, count);
if (count < 0)
-   return count;
+   goto out_unlock;
 
-   if (copy_to_user(userbuf, buffer, count))
-   return -EFAULT;
+   if (copy_to_user(userbuf, bb->buffer, count)) {
+   count = -EFAULT;
+   goto out_unlock;
+   }
 
pr_debug("offs = %lld, *off = %lld, count = %zd\n", offs, *off, count);
 
*off = offs + count;
 
+ out_unlock:
+   mutex_unlock(&bb->mutex);
return count;
 }
 
@@ -79,7 +90,7 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 static ssize_t write(struct file *file, const char __user *userbuf,
 size_t bytes, loff_t *off)
 {
-   char *buffer = file->private_data;
+   struct bin_buffer *bb = file->private_data;
struct dentry *dentry = file->f_path.dentry;
int size = dentry->d_inode->i_size;
loff_t offs = *off;
@@ -92,25 +103,38 @@ static ssize_t write(struct file *file, const char __user *userbuf,
count = size - offs;
}
 
-   if (copy_from_user(buffer, userbuf, count))
-   return -EFAULT;
+   mutex_lock(&bb->mutex);
+
+   if (copy_from_user(bb->buffer, userbuf, count)) {
+   count = -EFAULT;
+   goto out_unlock;
+   }
 
-   count = flush_write(dentry, buffer, offs, count);
+   count = flush_write(dentry, bb->buffer, offs, count);
if (count > 0)
*off = offs + count;
+
+ out_unlock:
+   mutex_unlock(&bb->mutex);
return count;
 }
 
 static int mmap(struct file *file, struct vm_area_struct *vma)
 {
+   struct bin_buffer *bb = file->private_data;
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
struct kobject *kobj = to_kobj(file->f_path.dentry->d_parent);
+   int rc;
 
if (!attr->mmap)
return -EINVAL;
 
-   return attr->mmap(kobj, attr, vma);
+   mutex_lock(&bb->mutex);
+   rc = attr->mmap(kobj, attr, vma);
+   mutex_unlock(&bb->mutex);
+
+   return rc;
 }
 
 static int open(struct inode * inode, struct file * file)
@@ -118,6 +142,7 @@ static int open(struct inode * inode, struct file * file)
struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
+   struct bin_buffer *bb = NULL;
int error = -EINVAL;
 
if (!kobj || !attr)
@@ -135,14 +160,22 @@ static int open(struct inode * inode, struct file * file)
goto Error;
 
error = -ENOMEM;
-   file->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
-   if (!file->private_data)
+   bb = kzalloc(sizeof(*bb), GFP_KERNEL);
+   if (!bb)
goto Error;
 
+   bb->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
+   if (!bb->buffer)
+   goto Error;
+
+   mutex_init(&bb->mutex);
+   file->private_data = bb;
+
error = 0;
-goto Done;
+   goto Done;
 
  Error:
+   kfree(bb);
module_put(attr->attr.owner);
  Done:
if (error)
@@ -155,11 +188,12 @@ static int release(struct inode * inode, struct file * file)
struct kobject * kobj = to_kobj(file->f_path.dentry->d_parent);
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
-   u8 * buffer 

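A userspace sketch of the bin_buffer pattern, assuming a pthread mutex in place of the kernel mutex, calloc/malloc in place of kzalloc/kmalloc, and memcpy in place of copy_to_user(); bb_open(), bb_read(), bb_release(), BB_SIZE and the fill callbacks are hypothetical names. The diff above is cut off inside release() in this archive, so the release step here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define BB_SIZE 4096   /* hypothetical stand-in for PAGE_SIZE */

struct bin_buffer {
    pthread_mutex_t mutex;   /* serializes users of the per-file buffer */
    char *buffer;            /* BB_SIZE bytes */
};

/* Like open(): allocate the wrapper, then the buffer; undo on failure. */
static struct bin_buffer *bb_open(void)
{
    struct bin_buffer *bb = calloc(1, sizeof(*bb));

    if (!bb)
        return NULL;
    bb->buffer = malloc(BB_SIZE);
    if (!bb->buffer) {
        free(bb);
        return NULL;
    }
    pthread_mutex_init(&bb->mutex, NULL);
    return bb;
}

/* Assumed release step: free the buffer and the wrapper. */
static void bb_release(struct bin_buffer *bb)
{
    pthread_mutex_destroy(&bb->mutex);
    free(bb->buffer);
    free(bb);
}

/* Hypothetical fill callbacks standing in for fill_read(). */
static int fill_hello(char *buf, size_t count)
{
    static const char msg[] = "hello";
    size_t n = count < sizeof(msg) - 1 ? count : sizeof(msg) - 1;

    memcpy(buf, msg, n);
    return (int)n;
}

static int fill_fail(char *buf, size_t count)
{
    (void)buf;
    (void)count;
    return -EIO;
}

/* Like the patched read(): every exit path goes through out_unlock. */
static long bb_read(struct bin_buffer *bb, int (*fill)(char *, size_t),
                    char *dst, size_t bytes)
{
    size_t count = bytes < BB_SIZE ? bytes : BB_SIZE;
    long ret;

    pthread_mutex_lock(&bb->mutex);
    ret = fill(bb->buffer, count);
    if (ret < 0)
        goto out_unlock;
    memcpy(dst, bb->buffer, (size_t)ret);  /* copy_to_user() stand-in */
out_unlock:
    pthread_mutex_unlock(&bb->mutex);
    return ret;
}
```

The single out_unlock label is what guarantees the mutex is dropped on the error path as well as on success, which is the shape the patched read() and write() take.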
[PATCH 14/14] sysfs: kill unnecessary attribute->owner

2007-04-08 Thread Tejun Heo
sysfs is now completely out of the driver/module lifetime game.  After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners.  Note that
the wrong modules were often accounted as owners, leading to
accesses to already-removed modules.

This patch kills now unnecessary attribute->owner.  Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

  http://article.gmane.org/gmane.linux.kernel/510293

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 drivers/base/class.c|2 --
 drivers/base/core.c |4 
 drivers/base/firmware_class.c   |2 +-
 drivers/block/pktcdvd.c |3 +--
 drivers/char/ipmi/ipmi_msghandler.c |   10 --
 drivers/cpufreq/cpufreq_stats.c |3 +--
 drivers/cpufreq/cpufreq_userspace.c |2 +-
 drivers/cpufreq/freq_table.c|1 -
 drivers/firmware/dcdbas.h   |3 +--
 drivers/firmware/dell_rbu.c |6 +++---
 drivers/firmware/edd.c  |2 +-
 drivers/firmware/efivars.c  |6 +++---
 drivers/i2c/chips/eeprom.c  |1 -
 drivers/i2c/chips/max6875.c |1 -
 drivers/infiniband/core/sysfs.c |1 -
 drivers/input/mouse/psmouse.h   |1 -
 drivers/media/video/pvrusb2/pvrusb2-sysfs.c |   13 -
 drivers/misc/asus-laptop.c  |3 +--
 drivers/pci/hotplug/acpiphp_ibm.c   |1 -
 drivers/pci/pci-sysfs.c |4 
 drivers/pcmcia/socket_sysfs.c   |2 +-
 drivers/rtc/rtc-ds1553.c|1 -
 drivers/rtc/rtc-ds1742.c|1 -
 drivers/scsi/arcmsr/arcmsr_attr.c   |3 ---
 drivers/scsi/lpfc/lpfc_attr.c   |2 --
 drivers/scsi/qla2xxx/qla_attr.c |6 --
 drivers/spi/at25.c  |1 -
 drivers/video/aty/radeon_base.c |2 --
 drivers/video/backlight/backlight.c |2 +-
 drivers/video/backlight/lcd.c   |2 +-
 drivers/w1/slaves/w1_ds2433.c   |1 -
 drivers/w1/slaves/w1_therm.c|1 -
 drivers/w1/w1.c |2 --
 fs/ecryptfs/main.c  |2 --
 fs/ocfs2/cluster/masklog.c  |1 -
 fs/partitions/check.c   |1 -
 fs/sysfs/bin.c  |   19 +--
 fs/sysfs/file.c |   21 +
 include/linux/sysdev.h  |3 +--
 include/linux/sysfs.h   |7 +++
 kernel/module.c |9 +++--
 kernel/params.c |1 -
 net/bridge/br_sysfs_br.c|3 +--
 net/bridge/br_sysfs_if.c|3 +--
 44 files changed, 35 insertions(+), 130 deletions(-)

diff --git a/drivers/base/class.c b/drivers/base/class.c
index d596812..064c1de 100644
--- a/drivers/base/class.c
+++ b/drivers/base/class.c
@@ -624,7 +624,6 @@ int class_device_add(struct class_device *class_dev)
goto out3;
class_dev->uevent_attr.attr.name = "uevent";
class_dev->uevent_attr.attr.mode = S_IWUSR;
-   class_dev->uevent_attr.attr.owner = parent_class->owner;
class_dev->uevent_attr.store = store_uevent;
error = class_device_create_file(class_dev, &class_dev->uevent_attr);
if (error)
@@ -639,7 +638,6 @@ int class_device_add(struct class_device *class_dev)
}
attr->attr.name = "dev";
attr->attr.mode = S_IRUGO;
-   attr->attr.owner = parent_class->owner;
attr->show = show_dev;
error = class_device_create_file(class_dev, attr);
if (error) {
diff --git a/drivers/base/core.c b/drivers/base/core.c
index d7fcf82..37930d0 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -563,8 +563,6 @@ int device_add(struct device *dev)
 
dev->uevent_attr.attr.name = "uevent";
dev->uevent_attr.attr.mode = S_IWUSR;
-   if (dev->driver)
-   dev->uevent_attr.attr.owner = dev->driver->owner;
dev->uevent_attr.store = store_uevent;
error = device_create_file(dev, &dev->uevent_attr);
if (error)
@@ -579,8 +577,6 @@ int device_add(struct device *dev)
}
attr->attr.name = "dev";
attr->attr.mode = S_IRUGO;
-   if (dev->driver)
-   attr->attr.owner = dev->driver->owner;
attr->show = show_dev;
error = device_create_file(dev, attr);
if (

[PATCH 12/14] sysfs: implement sysfs_dirent active reference and immediate disconnect

2007-04-08 Thread Tejun Heo
Opening a sysfs node references its associated kobject, so userland
can arbitrarily prolong the lifetime of a kobject, which complicates
lifetime rules in drivers.  This patch implements active references and
makes the association between kobject and sysfs immediately breakable.

Now each sysfs_dirent has two reference counts - s_count and s_active.
s_count is a regular reference count which guarantees that the
containing sysfs_dirent is accessible.  As long as s_count reference
is held, all sysfs internal fields in sysfs_dirent are accessible
including s_parent and s_name.

The newly added s_active is the active reference count.  It is acquired
by invoking sysfs_get_active(), and it's the caller's responsibility to
ensure that the sysfs_dirent itself is accessible (it should be holding
s_count one way or the other).  Dereferencing a sysfs_dirent to access
objects outside sysfs proper requires an active reference.  This
includes access to the associated kobjects, attributes and ops.

The active references can be drained and denied by calling
sysfs_deactivate().  All sysfs_dirents must be deactivated after
deletion but before the default reference is dropped.  This enables
immediate disconnect of sysfs nodes.  Once a sysfs_dirent is deleted,
it won't access any entity external to sysfs proper.

Because attr/bin_attr ops access both the node itself and its parent
(for the kobject), they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grab both
references.  The parent's reference is acquired first and released last.
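A rough, single-threaded sketch of the two-counter scheme described
above follows.  The names (struct node, node_get_active(),
node_put_active(), node_deactivate(), node_get_active_two()) are
hypothetical stand-ins, not the patch's sysfs_dirent code.  In
particular, deactivation here only marks the node so that new active
references fail; the real sysfs_deactivate() must also wait for
already-held active references to drain, which needs locking that this
sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a sysfs_dirent with two counters. */
struct node {
	int count;          /* like s_count: keeps the node itself accessible */
	int active;         /* like s_active: active references currently held */
	bool deactivated;   /* set once the node has been deleted */
	struct node *parent;
};

/* Try to take an active reference; fails once the node is deactivated.
 * The caller must already hold a regular (count) reference. */
static bool node_get_active(struct node *n)
{
	assert(n->count > 0);
	if (n->deactivated)
		return false;
	n->active++;
	return true;
}

static void node_put_active(struct node *n)
{
	assert(n->active > 0);
	n->active--;
}

/* Deny new active references.  Single-threaded model: nothing to drain. */
static void node_deactivate(struct node *n)
{
	n->deactivated = true;
}

/* Parent first, then the node; roll back the parent's reference
 * if the node's own reference cannot be taken. */
static bool node_get_active_two(struct node *n)
{
	if (n->parent && !node_get_active(n->parent))
		return false;
	if (!node_get_active(n)) {
		if (n->parent)
			node_put_active(n->parent);
		return false;
	}
	return true;
}

/* Release in reverse order: the node first, the parent last. */
static void node_put_active_two(struct node *n)
{
	node_put_active(n);
	if (n->parent)
		node_put_active(n->parent);
}
```

fill_read() and flush_write() in the diff below follow the same shape:
take both references, call the op, drop both, and return -ENODEV if
the references can't be taken.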

Unlike other operations, an mmapped area lingers after mmap() returns,
and the module implementing it and the kobj need to stay referenced
until all the mapped pages are gone.  This is accomplished by holding
one set of active references to the bin_attr and its parent if there
has been any mmap during the lifetime of an open file.  The references
are dropped when the open file is released.
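The "hold once per open file" rule can be sketched with a per-open-file
flag, similar in spirit to the mmapped field this patch adds to struct
bin_buffer.  This is a simplified model: struct refs, struct open_file,
file_mmapped() and file_release() are hypothetical, and a single
counter stands in for the pair of active references on the bin_attr
and its parent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the active references on a bin_attr node
 * and its parent; one counter is enough for this sketch. */
struct refs {
	int active;
};

/* Per-open-file state, loosely modelled on struct bin_buffer. */
struct open_file {
	struct refs *refs;
	bool mmapped;	/* set once this open file has been mmapped */
};

/* Called after each successful mmap(): take references only once. */
static void file_mmapped(struct open_file *f)
{
	if (!f->mmapped) {
		f->refs->active++;
		f->mmapped = true;
	}
}

/* Called from release(): drop the references if any mmap happened. */
static void file_release(struct open_file *f)
{
	if (f->mmapped) {
		assert(f->refs->active > 0);
		f->refs->active--;
		f->mmapped = false;
	}
}
```

Repeated mmaps of the same open file therefore pin the node only once,
and the pin lasts until release rather than until munmap.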

This change makes sysfs lifetime rules independent from both kobject's
and module's.  It not only fixes several race conditions caused by
sysfs not holding onto the proper module when referencing a kobject,
but also helps fix and simplify lifetime management in the driver
model and drivers by taking sysfs out of the equation.

Please read the following message for more info.

  http://article.gmane.org/gmane.linux.kernel/510293

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c   |   95 ++--
 fs/sysfs/dir.c   |   18 ++-
 fs/sysfs/file.c  |  130 +++--
 fs/sysfs/inode.c |8 +++-
 fs/sysfs/sysfs.h |  107 +++-
 5 files changed, 245 insertions(+), 113 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 1dd1bf1..69bb8da 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -23,6 +23,7 @@
 struct bin_buffer {
struct mutex mutex;
void *buffer;
+   int mmapped;
 };
 
 static int
@@ -30,12 +31,20 @@ fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count)
 {
struct sysfs_dirent *attr_sd = dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
-   struct kobject * kobj = to_kobj(dentry->d_parent);
+   struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj;
+   int rc;
+
+   /* need attr_sd for attr, its parent for kobj */
+   if (!sysfs_get_active_two(attr_sd))
+   return -ENODEV;
 
-   if (!attr->read)
-   return -EIO;
+   rc = -EIO;
+   if (attr->read)
+   rc = attr->read(kobj, buffer, off, count);
 
-   return attr->read(kobj, buffer, off, count);
+   sysfs_put_active_two(attr_sd);
+
+   return rc;
 }
 
 static ssize_t
@@ -79,12 +88,20 @@ flush_write(struct dentry *dentry, char *buffer, loff_t offset, size_t count)
 {
struct sysfs_dirent *attr_sd = dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
-   struct kobject *kobj = to_kobj(dentry->d_parent);
+   struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj;
+   int rc;
+
+   /* need attr_sd for attr, its parent for kobj */
+   if (!sysfs_get_active_two(attr_sd))
+   return -ENODEV;
 
-   if (!attr->write)
-   return -EIO;
+   rc = -EIO;
+   if (attr->write)
+   rc = attr->write(kobj, buffer, offset, count);
 
-   return attr->write(kobj, buffer, offset, count);
+   sysfs_put_active_two(attr_sd);
+
+   return rc;
 }
 
 static ssize_t write(struct file *file, const char __user *userbuf,
@@ -124,14 +141,24 @@ static int mmap(struct file *file, struct vm_area_struct *vma)
struct bin_buffer *bb = file->private_data;
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_elem.bin_attr.bin_attr;
-   struct kobject *kobj =

[PATCH 13/14] sysfs: kill attribute file orphaning

2007-04-08 Thread Tejun Heo
Now that sysfs_dirent can be disconnected from kobject on deletion,
there is no need to orphan each attribute file.  All [bin_]attribute
nodes are automatically orphaned when the parent node is deleted.
Kill attribute file orphaning.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/file.c  |   65 ++---
 fs/sysfs/inode.c |   25 
 fs/sysfs/mount.c |8 --
 fs/sysfs/sysfs.h |   16 -
 4 files changed, 13 insertions(+), 101 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 6dd11ca..37b5ee5 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -51,29 +51,15 @@ static struct sysfs_ops subsys_sysfs_ops = {
.store  = subsys_attr_store,
 };
 
-/**
- * add_to_collection - add buffer to a collection
- * @buffer:buffer to be added
- * @node:  inode of set to add to
- */
-
-static inline void
-add_to_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-   struct sysfs_buffer_collection *set = node->i_private;
-
-   mutex_lock(&node->i_mutex);
-   list_add(&buffer->associates, &set->associates);
-   mutex_unlock(&node->i_mutex);
-}
-
-static inline void
-remove_from_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-   mutex_lock(&node->i_mutex);
-   list_del(&buffer->associates);
-   mutex_unlock(&node->i_mutex);
-}
+struct sysfs_buffer {
+   size_t  count;
+   loff_t  pos;
+   char* page;
+   struct sysfs_ops* ops;
+   struct semaphore sem;
+   int needs_read_fill;
+   int event;
+};
 
 /**
  * fill_read_buffer - allocate and fill buffer from object.
@@ -175,10 +161,7 @@ sysfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 
down(&buffer->sem);
if (buffer->needs_read_fill) {
-   if (buffer->orphaned)
-   retval = -ENODEV;
-   else
-   retval = fill_read_buffer(file->f_path.dentry,buffer);
+   retval = fill_read_buffer(file->f_path.dentry,buffer);
if (retval)
goto out;
}
@@ -276,16 +259,11 @@ sysfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t
ssize_t len;
 
down(&buffer->sem);
-   if (buffer->orphaned) {
-   len = -ENODEV;
-   goto out;
-   }
len = fill_write_buffer(buffer, buf, count);
if (len > 0)
len = flush_write_buffer(file->f_path.dentry, buffer, len);
if (len > 0)
*ppos += len;
-out:
up(&buffer->sem);
return len;
 }
@@ -295,7 +273,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct attribute *attr = attr_sd->s_elem.attr.attr;
struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj;
-   struct sysfs_buffer_collection *set;
struct sysfs_buffer * buffer;
struct sysfs_ops * ops = NULL;
int error;
@@ -319,26 +296,14 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
else
ops = &subsys_sysfs_ops;
 
+   error = -EACCES;
+
/* No sysfs operations, either from having no subsystem,
 * or the subsystem have no operations.
 */
-   error = -EACCES;
if (!ops)
goto err_mput;
 
-   /* make sure we have a collection to add our buffers to */
-   mutex_lock(&inode->i_mutex);
-   if (!(set = inode->i_private)) {
-   error = -ENOMEM;
-   if (!(set = inode->i_private = kmalloc(sizeof(struct sysfs_buffer_collection), GFP_KERNEL)))
-   goto err_mput;
-   else
-   INIT_LIST_HEAD(&set->associates);
-   }
-   mutex_unlock(&inode->i_mutex);
-
-   error = -EACCES;
-
/* File needs write support.
 * The inode's perms must say it's ok, 
 * and we must have a store method.
@@ -365,11 +330,9 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
if (!buffer)
goto err_mput;
 
-   INIT_LIST_HEAD(&buffer->associates);
init_MUTEX(&buffer->sem);
buffer->needs_read_fill = 1;
buffer->ops = ops;
-   add_to_collection(buffer, inode);
file->private_data = buffer;
 
/* open succeeded, put active references and pin attr_sd */
@@ -388,10 +351,8 @@ static int sysfs_release(struct inode * inode, struct file * filp)
 {
struct sysfs_dirent *attr_sd = filp->f_path.dentry->d_fsdata;
struct attribute *attr = attr_sd->s_elem.attr.attr;
-   struct sysfs_buffer * buffer = filp->private_data;
+   struct sysfs_buffer *buffer = filp->private_data;
 
-

[PATCH 07/14] sysfs: add sysfs_dirent->s_name

2007-04-08 Thread Tejun Heo
Add s_name to sysfs_dirent.  This further reduces the dependency on
the associated dentry.  The name is copied for directories and
symlinks but not for attributes.

Where possible, name dereferences are converted to use sd->s_name.
sysfs_symlink->link_name and sysfs_get_name() are now unused and are
removed.

This change allows symlinks to be implemented using the sysfs_dirent
tree proper; symlinks are the last remaining dentry-dependent sysfs walk.
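A minimal user-space sketch of the conditional name copy, assuming a
hypothetical COPY_NAME flag in place of SYSFS_COPY_NAME and
malloc()/free() in place of kstrdup()/kfree(); the node type and
function names are illustrative only.  It mirrors the pattern of
sysfs_new_dirent() and release_sysfs_dirent() in the diff below: copy
when the flag is set, free the copy if the node allocation fails, and
free it again on release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COPY_NAME 0x1	/* hypothetical stand-in for SYSFS_COPY_NAME */

struct dirent_model {
	const char *name;
	int type;
};

/* Portable replacement for strdup()/kstrdup(). */
static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/* Duplicate the name only when COPY_NAME is set; otherwise borrow the
 * caller's string, as sysfs does for attribute names. */
static struct dirent_model *new_dirent(const char *name, int type)
{
	char *dup_name = NULL;
	struct dirent_model *sd;

	assert(name != NULL);
	if (type & COPY_NAME) {
		name = dup_name = copy_string(name);
		if (!name)
			return NULL;
	}

	sd = calloc(1, sizeof(*sd));
	if (!sd) {
		free(dup_name);	/* don't leak the copy on failure */
		return NULL;
	}
	sd->name = name;
	sd->type = type;
	return sd;
}

/* Free the name only if this node owns a private copy. */
static void release_dirent(struct dirent_model *sd)
{
	if (sd->type & COPY_NAME)
		free((char *)sd->name);
	free(sd);
}
```

A borrowed name stays valid only as long as its owner does, which is
why directories and symlinks, whose names come from transient callers,
get a private copy.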

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c|   33 +
 fs/sysfs/file.c   |2 +-
 fs/sysfs/inode.c  |   33 +
 fs/sysfs/symlink.c|8 +---
 fs/sysfs/sysfs.h  |7 +++
 include/linux/sysfs.h |1 +
 6 files changed, 28 insertions(+), 56 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 8c35a60..525c0e7 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -23,10 +23,11 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 
if (sd->s_type & SYSFS_KOBJ_LINK) {
struct sysfs_symlink * sl = sd->s_element;
-   kfree(sl->link_name);
kobject_put(sl->target_kobj);
kfree(sl);
}
+   if (sd->s_type & SYSFS_COPY_NAME)
+   kfree(sd->s_name);
kfree(sd->s_iattr);
kmem_cache_free(sysfs_dir_cachep, sd);
 
@@ -51,19 +52,30 @@ static struct dentry_operations sysfs_dentry_ops = {
.d_iput = sysfs_d_iput,
 };
 
-struct sysfs_dirent *sysfs_new_dirent(void *element, umode_t mode, int type)
+struct sysfs_dirent *sysfs_new_dirent(const char *name, void *element,
+ umode_t mode, int type)
 {
+   char *dup_name = NULL;
struct sysfs_dirent * sd;
 
+   if (type & SYSFS_COPY_NAME) {
+   name = dup_name = kstrdup(name, GFP_KERNEL);
+   if (!name)
+   return NULL;
+   }
+
sd = kmem_cache_zalloc(sysfs_dir_cachep, GFP_KERNEL);
-   if (!sd)
+   if (!sd) {
+   kfree(dup_name);
return NULL;
+   }
 
atomic_set(&sd->s_count, 1);
atomic_set(&sd->s_event, 1);
INIT_LIST_HEAD(&sd->s_children);
INIT_LIST_HEAD(&sd->s_sibling);
 
+   sd->s_name = name;
sd->s_element = element;
sd->s_mode = mode;
sd->s_type = type;
@@ -100,8 +112,7 @@ int sysfs_dirent_exist(struct sysfs_dirent *parent_sd,
 
list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
if (sd->s_element) {
-   const unsigned char *existing = sysfs_get_name(sd);
-   if (strcmp(existing, new))
+   if (strcmp(sd->s_name, new))
continue;
else
return -EEXIST;
@@ -155,7 +166,7 @@ static int create_dir(struct kobject *kobj, struct dentry *parent,
goto out_dput;
 
error = -ENOMEM;
-   sd = sysfs_new_dirent(kobj, mode, SYSFS_DIR);
+   sd = sysfs_new_dirent(name, kobj, mode, SYSFS_DIR);
if (!sd)
goto out_drop;
sysfs_attach_dirent(sd, parent->d_fsdata, dentry);
@@ -280,9 +291,7 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
 
list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
if (sd->s_type & SYSFS_NOT_PINNED) {
-   const unsigned char * name = sysfs_get_name(sd);
-
-   if (strcmp(name, dentry->d_name.name))
+   if (strcmp(sd->s_name, dentry->d_name.name))
continue;
 
if (sd->s_type & SYSFS_KOBJ_LINK)
@@ -472,7 +481,7 @@ static int sysfs_dir_open(struct inode *inode, struct file *file)
struct sysfs_dirent * sd;
 
mutex_lock(&dentry->d_inode->i_mutex);
-   sd = sysfs_new_dirent(NULL, 0, 0);
+   sd = sysfs_new_dirent("_DIR_", NULL, 0, 0);
if (sd)
sysfs_attach_dirent(sd, parent_sd, NULL);
mutex_unlock(&dentry->d_inode->i_mutex);
@@ -539,7 +548,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
if (!next->s_element)
continue;
 
-   name = sysfs_get_name(next);
+   name = next->s_name;
len = strlen(name);
ino = (unsigned long)next;
 
@@ -651,7 +660,7 @@ struct dentry *sysfs_create_shadow_dir(struct kobject *kobj)
if (!shadow)
goto nomem;
 
-   sd = sysfs_new_dirent(kobj, inode->i_mode, SYSFS_DIR);
+   sd = sysfs_new_dirent("_SHADOW_", kobj, inode->i_mode, SYSFS_DIR);
if (!sd)
goto nomem;
/* point to parent_sd but don't attach to it */
diff --git a/fs/sysfs/file.

[PATCH 10/14] sysfs: reimplement symlink using sysfs_dirent tree

2007-04-08 Thread Tejun Heo
A sysfs symlink is implemented by referencing a dentry and a kobject
from sysfs_dirent: the symlink entry references the kobject, and the
dentry is used to walk the tree.  This complicates object lifetime
rules and is dangerous.  For example, there is no way to tell which
module the target of a symlink belongs to, and referencing that
kobject can make it linger after the module is gone.

This patch reimplements symlinks using only the sysfs_dirent tree.  The
sd for a symlink points to and holds a reference to the target
sysfs_dirent, and all walking is done using the sysfs_dirent tree.
Simpler and safer.
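The path helpers in the diff below (object_depth(),
object_path_length(), fill_object_path()) now simply follow s_parent
pointers.  A self-contained sketch of the same right-to-left fill,
assuming a hypothetical node type with only a name and a parent
pointer, where the root has no parent and contributes no component.
Unlike the kernel helper, this sketch writes the terminating NUL itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: only a name and a parent pointer.  The root has
 * parent == NULL and contributes no path component. */
struct pnode {
	const char *name;
	struct pnode *parent;
};

/* Bytes needed for "/a/b/c" including the terminating NUL. */
static size_t path_length(const struct pnode *n)
{
	size_t length = 1;

	for (; n->parent; n = n->parent)
		length += strlen(n->name) + 1;
	return length;
}

/* Fill buf right to left while walking up the parents; length must be
 * path_length(n). */
static void fill_path(const struct pnode *n, char *buf, size_t length)
{
	assert(length >= 1);
	buf[--length] = '\0';
	for (; n->parent; n = n->parent) {
		size_t cur = strlen(n->name);

		/* back up enough for this component plus its '/' */
		length -= cur;
		memcpy(buf + length, n->name, cur);
		buf[--length] = '/';
	}
}

/* Convenience wrapper: returns a malloc'd path, or NULL. */
static char *node_path(const struct pnode *n)
{
	size_t len = path_length(n);
	char *buf = malloc(len);

	if (buf)
		fill_path(n, buf, len);
	return buf;
}
```

Because the length is computed by the same walk, the fill ends exactly
at offset 0 and never needs the path reversed afterwards.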

Please read the following message for more info.

  http://article.gmane.org/gmane.linux.kernel/510293

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c |2 +-
 fs/sysfs/symlink.c |   88 +++
 fs/sysfs/sysfs.h   |9 +++--
 3 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 707eba9..5b337c7 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -23,7 +23,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
parent_sd = sd->s_parent;
 
if (sd->s_type & SYSFS_KOBJ_LINK)
-   kobject_put(sd->s_elem.symlink.target_kobj);
+   sysfs_put(sd->s_elem.symlink.target_sd);
if (sd->s_type & SYSFS_COPY_NAME)
kfree(sd->s_name);
kfree(sd->s_iattr);
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 27df635..ff605d3 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -11,50 +11,49 @@
 
 #include "sysfs.h"
 
-static int object_depth(struct kobject * kobj)
+static int object_depth(struct sysfs_dirent *sd)
 {
-   struct kobject * p = kobj;
int depth = 0;
-   do { depth++; } while ((p = p->parent));
+
+   for (; sd->s_parent; sd = sd->s_parent)
+   depth++;
+
return depth;
 }
 
-static int object_path_length(struct kobject * kobj)
+static int object_path_length(struct sysfs_dirent * sd)
 {
-   struct kobject * p = kobj;
int length = 1;
-   do {
-   length += strlen(kobject_name(p)) + 1;
-   p = p->parent;
-   } while (p);
+
+   for (; sd->s_parent; sd = sd->s_parent)
+   length += strlen(sd->s_name) + 1;
+
return length;
 }
 
-static void fill_object_path(struct kobject * kobj, char * buffer, int length)
+static void fill_object_path(struct sysfs_dirent *sd, char *buffer, int length)
 {
-   struct kobject * p;
-
--length;
-   for (p = kobj; p; p = p->parent) {
-   int cur = strlen(kobject_name(p));
+   for (; sd->s_parent; sd = sd->s_parent) {
+   int cur = strlen(sd->s_name);
 
/* back up enough to print this bus id with '/' */
length -= cur;
-   strncpy(buffer + length,kobject_name(p),cur);
+   strncpy(buffer + length, sd->s_name, cur);
*(buffer + --length) = '/';
}
 }
 
-static int sysfs_add_link(struct dentry * parent, const char * name, struct kobject * target)
+static int sysfs_add_link(struct sysfs_dirent * parent_sd, const char * name,
+ struct sysfs_dirent * target_sd)
 {
-   struct sysfs_dirent * parent_sd = parent->d_fsdata;
struct sysfs_dirent * sd;
 
sd = sysfs_new_dirent(name, S_IFLNK|S_IRWXUGO, SYSFS_KOBJ_LINK);
if (!sd)
return -ENOMEM;
 
-   sd->s_elem.symlink.target_kobj = kobject_get(target);
+   sd->s_elem.symlink.target_sd = target_sd;
sysfs_attach_dirent(sd, parent_sd, NULL);
return 0;
 }
@@ -68,6 +67,8 @@ static int sysfs_add_link(struct dentry * parent, const char * name, struct kobj
 int sysfs_create_link(struct kobject * kobj, struct kobject * target, const char * name)
 {
struct dentry *dentry = NULL;
+   struct sysfs_dirent *parent_sd = NULL;
+   struct sysfs_dirent *target_sd = NULL;
int error = -EEXIST;
 
BUG_ON(!name);
@@ -80,11 +81,27 @@ int sysfs_create_link(struct kobject * kobj, struct kobject * target, const char
 
if (!dentry)
return -EFAULT;
+   parent_sd = dentry->d_fsdata;
+
+   /* target->dentry can go away beneath us but is protected with
+* kobj_sysfs_assoc_lock.  Fetch target_sd from it.
+*/
+   spin_lock(&kobj_sysfs_assoc_lock);
+   if (target->dentry)
+   target_sd = sysfs_get(target->dentry->d_fsdata);
+   spin_unlock(&kobj_sysfs_assoc_lock);
+
+   if (!target_sd)
+   return -ENOENT;
 
mutex_lock(&dentry->d_inode->i_mutex);
if (!sysfs_dirent_exist(dentry->d_fsdata, name))
-   error = sysfs_add_link(dentry, name, target);
+   error = sysfs_add_link(parent_sd, name, target_sd);
mutex_unlock(&dentry->d_inode->i_mutex);
+
+   if (error)
+   sysfs_put(target_sd);
+
return error;
 }
 
@@ -100,14 +117,14 @@ v

[PATCH 03/14] sysfs: move release_sysfs_dirent() to dir.c

2007-04-08 Thread Tejun Heo
There is no reason for this function to be inlined, and the upcoming
sysfs object reference simplification will make it heavier.  Move it
to dir.c.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c   |   12 
 fs/sysfs/sysfs.h |   13 +
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5112f88..2d630bf 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -14,6 +14,18 @@
 
 DECLARE_RWSEM(sysfs_rename_sem);
 
+void release_sysfs_dirent(struct sysfs_dirent * sd)
+{
+   if (sd->s_type & SYSFS_KOBJ_LINK) {
+   struct sysfs_symlink * sl = sd->s_element;
+   kfree(sl->link_name);
+   kobject_put(sl->target_kobj);
+   kfree(sl);
+   }
+   kfree(sd->s_iattr);
+   kmem_cache_free(sysfs_dir_cachep, sd);
+}
+
 static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
 {
struct sysfs_dirent * sd = dentry->d_fsdata;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a77c57e..3b8aae0 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -17,6 +17,7 @@ extern void sysfs_delete_inode(struct inode *inode);
 extern struct inode * sysfs_new_inode(mode_t mode, struct sysfs_dirent *);
 extern int sysfs_create(struct dentry *, int mode, int (*init)(struct inode *));
 
+extern void release_sysfs_dirent(struct sysfs_dirent * sd);
 extern int sysfs_dirent_exist(struct sysfs_dirent *, const unsigned char *);
 extern int sysfs_make_dirent(struct sysfs_dirent *, struct dentry *, void *,
umode_t, int);
@@ -97,18 +98,6 @@ static inline struct kobject *sysfs_get_kobject(struct dentry *dentry)
return kobj;
 }
 
-static inline void release_sysfs_dirent(struct sysfs_dirent * sd)
-{
-   if (sd->s_type & SYSFS_KOBJ_LINK) {
-   struct sysfs_symlink * sl = sd->s_element;
-   kfree(sl->link_name);
-   kobject_put(sl->target_kobj);
-   kfree(sl);
-   }
-   kfree(sd->s_iattr);
-   kmem_cache_free(sysfs_dir_cachep, sd);
-}
-
 static inline struct sysfs_dirent * sysfs_get(struct sysfs_dirent * sd)
 {
if (sd) {
-- 
1.5.0.3


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHSET #master] sysfs: make sysfs disconnect immediately on deletion, take 2

2007-04-08 Thread Tejun Heo
Hello, all.

This is the second take of the sysfs-immediate-disconnect patchset.

In the last take, an rwsem was added to s_elem.dir to protect only the
kobj.  This wasn't enough because attr and bin_attr need to hold onto
not only their parent's kobject but also the module backing them and
their ops, so the first take still needed a separate, duplicate
attribute file orphaning mechanism.

In this take, the rwsem is generalized into an active reference
count.  Now each sysfs_dirent has two reference counts - s_count and
s_active.  s_count is a regular reference count which guarantees that
the containing sysfs_dirent is accessible.  As long as s_count
reference is held, all sysfs internal fields in sysfs_dirent are
accessible including s_parent and s_name.

The newly added s_active is the active reference count.  It is acquired
by invoking sysfs_get_active(), and it's the caller's responsibility to
ensure that the sysfs_dirent itself is accessible (it should be holding
s_count one way or the other).  Dereferencing a sysfs_dirent to access
objects outside sysfs proper requires an active reference.  This
includes access to the associated kobjects, attributes and ops.

Because attr/bin_attr ops access both the node itself and its parent
(for the kobject), they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grab both
references.  The parent's reference is acquired first and released last.

Basically, s_count provides the reference counted objects to the upper
layer while s_active guards low level access such that low level
objects can just go away when they want to, and the same mechanism is
applied to all types of sysfs nodes.  I think it's conceptually
cleaner and thus easier to understand this way.

With all the patches applied, the same test used in the last take ran
9+hrs without any problem.

Changes from the last take are...

* Patch 3 now doesn't move sysfs_get_kobject(), as it's replaced by
  active references later.

* Patch 12 is updated such that sdir->rwsem is generalized into an
  active reference count.

Please read the original lifetime rules discussion[1] and description
of the last take[2] for more info.

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/510293
[2] http://thread.gmane.org/gmane.linux.kernel/513334




[PATCH 04/14] sysfs: flatten cleanup paths in sysfs_add_link() and create_dir()

2007-04-08 Thread Tejun Heo
Flatten cleanup paths in sysfs_add_link() and create_dir() to improve
readability and ease further changes to these functions.  This is in
preparation for object reference simplification.
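The flattened style uses one label per acquired resource, placed in
reverse order of acquisition, so each failure jumps to exactly the
cleanup it needs.  A standalone sketch of the idiom, with a
hypothetical create_thing() and a fail_at parameter that exists only
to inject failures:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource pair, standing in for the dentry and
 * sysfs_dirent that create_dir() acquires in sequence. */
struct thing {
	int *a;
	int *b;
};

/* fail_at: 0 = succeed, 1 = fail first allocation,
 * 2 = fail second allocation, 3 = fail a later check. */
static int create_thing(struct thing *out, int fail_at)
{
	int error;
	int *a, *b;

	assert(out != NULL);

	error = -ENOMEM;
	a = (fail_at == 1) ? NULL : malloc(sizeof(*a));
	if (!a)
		goto out;

	b = (fail_at == 2) ? NULL : malloc(sizeof(*b));
	if (!b)
		goto out_free_a;

	error = -EINVAL;
	if (fail_at == 3)
		goto out_free_b;

	out->a = a;
	out->b = b;
	return 0;

	/* labels unwind in reverse order of acquisition */
 out_free_b:
	free(b);
 out_free_a:
	free(a);
 out:
	return error;
}
```

create_dir() in the diff below differs in one detail: on success it
still jumps to out_dput, because the reference obtained from
lookup_one_len() is dropped on every path.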

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c |   73 ++-
 fs/sysfs/symlink.c |   27 ++
 2 files changed, 58 insertions(+), 42 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 2d630bf..0005117 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -159,40 +159,53 @@ static int init_symlink(struct inode * inode)
return 0;
 }
 
-static int create_dir(struct kobject * k, struct dentry * p,
- const char * n, struct dentry ** d)
+static int create_dir(struct kobject *kobj, struct dentry *parent,
+ const char *name, struct dentry **p_dentry)
 {
int error;
umode_t mode = S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO;
+   struct dentry *dentry;
+   struct sysfs_dirent *sd;
 
-   mutex_lock(&p->d_inode->i_mutex);
-   *d = lookup_one_len(n, p, strlen(n));
-   if (!IS_ERR(*d)) {
-   if (sysfs_dirent_exist(p->d_fsdata, n))
-   error = -EEXIST;
-   else
-   error = sysfs_make_dirent(p->d_fsdata, *d, k, mode,
-   SYSFS_DIR);
-   if (!error) {
-   error = sysfs_create(*d, mode, init_dir);
-   if (!error) {
-   inc_nlink(p->d_inode);
-   (*d)->d_op = &sysfs_dentry_ops;
-   d_rehash(*d);
-   }
-   }
-   if (error && (error != -EEXIST)) {
-   struct sysfs_dirent *sd = (*d)->d_fsdata;
-   if (sd) {
-   list_del_init(&sd->s_sibling);
-   sysfs_put(sd);
-   }
-   d_drop(*d);
-   }
-   dput(*d);
-   } else
-   error = PTR_ERR(*d);
-   mutex_unlock(&p->d_inode->i_mutex);
+   mutex_lock(&parent->d_inode->i_mutex);
+
+   dentry = lookup_one_len(name, parent, strlen(name));
+   if (IS_ERR(dentry)) {
+   error = PTR_ERR(dentry);
+   goto out_unlock;
+   }
+
+   error = -EEXIST;
+   if (sysfs_dirent_exist(parent->d_fsdata, name))
+   goto out_dput;
+
+   error = sysfs_make_dirent(parent->d_fsdata, dentry, kobj, mode,
+ SYSFS_DIR);
+   if (error)
+   goto out_drop;
+
+   error = sysfs_create(dentry, mode, init_dir);
+   if (error)
+   goto out_sput;
+
+   inc_nlink(parent->d_inode);
+   dentry->d_op = &sysfs_dentry_ops;
+   d_rehash(dentry);
+
+   *p_dentry = dentry;
+   error = 0;
+   goto out_dput;
+
+ out_sput:
+   sd = dentry->d_fsdata;
+   list_del_init(&sd->s_sibling);
+   sysfs_put(sd);
+ out_drop:
+   d_drop(dentry);
+ out_dput:
+   dput(dentry);
+ out_unlock:
+   mutex_unlock(&parent->d_inode->i_mutex);
return error;
 }
 
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 7b9c5bf..b463f17 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -49,30 +49,33 @@ static int sysfs_add_link(struct dentry * parent, const char * name, struct kobj
 {
struct sysfs_dirent * parent_sd = parent->d_fsdata;
struct sysfs_symlink * sl;
-   int error = 0;
+   int error;
 
error = -ENOMEM;
-   sl = kmalloc(sizeof(*sl), GFP_KERNEL);
+   sl = kzalloc(sizeof(*sl), GFP_KERNEL);
if (!sl)
-   goto exit1;
+   goto err_out;
 
sl->link_name = kmalloc(strlen(name) + 1, GFP_KERNEL);
if (!sl->link_name)
-   goto exit2;
+   goto err_out;
 
strcpy(sl->link_name, name);
sl->target_kobj = kobject_get(target);
 
error = sysfs_make_dirent(parent_sd, NULL, sl, S_IFLNK|S_IRWXUGO,
SYSFS_KOBJ_LINK);
-   if (!error)
-   return 0;
-
-   kobject_put(target);
-   kfree(sl->link_name);
-exit2:
-   kfree(sl);
-exit1:
+   if (error)
+   goto err_out;
+
+   return 0;
+
+ err_out:
+   if (sl) {
+   kobject_put(sl->target_kobj);
+   kfree(sl->link_name);
+   kfree(sl);
+   }
return error;
 }
 
-- 
1.5.0.3




Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 13:56 -0400, Gene Heskett wrote:
> On Sunday 08 April 2007, Mike Galbraith wrote:
> >On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
> >> That seems to be the killer loading here, building a kernel (make -j3)
> >> doesn't seem to lag it all that bad.  One session of gzip -best makes
> >> it fall plumb over though, which was a disappointment.
> >
> >Can you make a testcase that doesn't require amanda?
> >
> > -Mike
> 
> Sure.  Try 'tar czf nameofarchive.tar.gz /path/to-dir-to-be-backed-up'
> 
> Or, from the runtar log from this morning, and this is all one line:
> 
> runtar.20070408022016.debug:running: /bin/tar: 'gtar' '--create' '--file' '-' 
> '--directory' '/usr/dlds-rpms' '--one-file-system' '--listed-incremental' 
> '/usr/local/var/amanda/gnutar-lists/coyote_usr_dlds-rpms_1.new' '--sparse' 
> '--ignore-failed-read' '--totals' '--exclude-from' 
> '/tmp/amanda/sendbackup._usr_dlds-rpms.20070408022016.exclude' '.'
> 
> and amanda will if requested, pipe that output through a |gzip -best, and 
> its this process that brings the machine to the table begging for scraps 
> like a puppy.  Tar by itself can be felt but isn't bad.

So tar -cvf - / | gzip --best | tar -tvzf - should reproduce the
problem?

-Mike



REISER4 FOR INCLUSION IN THE LINUX KERNEL.

2007-04-08 Thread johnrobertbanks
REISER4 FOR INCLUSION IN THE LINUX KERNEL.

Dave Lynch takes a reasoned approach to REISER4.

Dave Lynch wrote:
> 
> Jeff Garzik wrote:
> > 
> > If the compelling reason is that it needs a test, I'd say its not ready.
> > 
> 
> Can you please elaborate ? I am not sure I understand what you are
> arguing ?

Jeff Garzik is "saying" that he wants REISER4 to stay out of the main
kernel, for reasons he is not willing to tell you.

> Despite his substantially less than polite rhetoric, I have read
> Hans's post from months if not years ago.
> 
> Aside from the pissing contests - which where not entirely one
> sided, 

On the basis of what I have seen here,... Hans Reiser was probably an
angel.

> I actually beleive that Hans made a reasonable case that 
> Reiser4 had gone about as far as it could reasonably go with regard 
> to testing, robustness, ... without the broader base of use that 
> even an experimental filesystem in distribution tree would get.

Of course, this is an entirely reasonable request of Reiser's.
It was met with an array of unreasonable actions, mainly STALLING,
which has led to REISER4 never becoming part of the main kernel.

It has also led to many false claims about REISER4, claims that are
never backed up with solid statistics but are used to keep REISER4 out
of the kernel and tarnish its reputation.

> I for one would at least play with it if it were in the distribution
> tree.

I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY.
 
> As far as I could tell Hans pretty much everything else that 
> was demanded. Hans eventually caved and provided - albeit with much 
> pissing and moaning, and holy than thou rhetoric.

It was not his pissing and moaning, etc,... these were just excuses to
keep REISER4 from succeeding. The truth is, that any excuse would do.

The real reasons are financial and backed by big money (sometimes, big
egos).

> The argument that anything that needs testing can't get into the
> distribution tree's is specious. There is alot of poorly tested crap in
> the distribution trees.

Yes, the argument that anything that needs testing can't get in is
indeed stupid.

But stupid things often work. Is REISER4 in the kernel? Is REISER4 a
success?
 
> But separately, there is the issue of scale. Namesys claims that
> they have no currently known bugs, faults ... - with their base of
> internal and external users.
> 
> I would fully expect new failures to crop up with any filesystem,
> driver, ... moving up an order of magnitude in users.
>  
> Are you going to subject all filesystems and drivers to the same
> high standards you are placing on Reiser4 ? If so then we need to strip
> the distribution tree now.

No. Only those things that threaten big money.
   
> I am not looking to defend Hans - he is likely to be in jail and no
> longer a factor for a long time. Nor am I looking to make or support
> claims for Reiser4.

Why not defend Hans? He is in jail on what appear to be trumped-up
charges, just like the trumped-up complainants about his filesystem.

> But I am asking - why we can not get past the bad blood, rhetoric,
> and zealotry -which to my eyes has not been all one sided.

Money talks, BS walks. Reiser4 is a little guy. You should play in my
league.

> I am NOT looking for a technical explanation of all the relative
> merits and demerits of Reiser4.
> 
> I do not care for arguments about whether it compresses 0's well, or
> that tail combining is a bad thing. They may have merit, but there is
> not a filesystem that is going to be all things to all people.

Yeap.
 
> Whether Reiser4 is a small niche filesystem or a significant general 
> use one, is a decision that should be reached by its performance 
> in practice, not its rhetoric. Regardless, even as a niche filesystem, 
> I believe at this point it merits inclusion.

Yeap. REISER4 merits inclusion.

John.
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - The professional email service

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] Kprobes: The ON/OFF knob thru debugfs

2007-04-08 Thread Ananth N Mavinakayanahalli
On Sun, Apr 08, 2007 at 11:22:31AM +0100, Christoph Hellwig wrote:
> On Wed, Apr 04, 2007 at 05:43:49PM +0530, Ananth N Mavinakayanahalli wrote:
> > This patch provides a debugfs knob to turn kprobes on/off
> > 
> > o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
> >   not (default enabled)
> > o Echoing 0 to this file will disarm all installed probes
> > o Any new probe registration when disabled will register the probe but
> >   not arm it. A message will be printed out in such a case.
> > o When a value 1 is echoed to the file, all probes (including ones
> >   registered in the intervening period) will be enabled
> > o Unregistration will happen irrespective of whether probes are globally
> >   enabled or not.
> > o Update Documentation/kprobes.txt to reflect these changes. While there
> >   also update the doc to make it current.
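The global on/off semantics listed in the patch description can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct probe_sketch`, `register_probe()`, `unregister_probe()` and `set_probes_enabled()` are hypothetical names, not the kprobes API, and the fixed-size array stands in for the kernel's table of registered probes.

```c
#include <stdbool.h>

#define MAX_PROBES_SKETCH 8  /* hypothetical fixed capacity for the model */

/* Hypothetical per-probe state: known to the core (registered) and
 * actually installed (armed). */
struct probe_sketch {
    bool registered;
    bool armed;
};

static bool probes_enabled = true;  /* default: enabled */
static struct probe_sketch probes[MAX_PROBES_SKETCH];

/* Registration always succeeds; the probe is armed only if probes are
 * globally enabled at that moment. */
void register_probe(int i)
{
    probes[i].registered = true;
    probes[i].armed = probes_enabled;
}

/* Unregistration happens whether or not probes are globally enabled. */
void unregister_probe(int i)
{
    probes[i].registered = false;
    probes[i].armed = false;
}

/* Models writing 0 or 1 to the "enabled" file: disarm everything, or arm
 * every registered probe, including ones registered while disabled. */
void set_probes_enabled(bool on)
{
    probes_enabled = on;
    for (int i = 0; i < MAX_PROBES_SKETCH; i++)
        probes[i].armed = on && probes[i].registered;
}
```

Per-probe control, as discussed below, would add a per-probe "wanted" bit that `set_probes_enabled()` also consults.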
> 
> Looks good.
> 
> When I suggested a user interface to enable/disable probes was nice to
> have I was more thinking about an interface to enable/disable individual
> probes.  Any chance you could try to implement that as well as see if
> any code can be shared with this feature?

That's on the TODO list - any preferences on what the debugfs control
should look like? One file per kprobe seems simplest, but it'd be
unwieldy if there are hundreds of active probes.

> > -   arch_arm_kprobe(p);
> > +   arch_arm_kprobe(p);
> > +   } else
> > +   printk("Kprobes are globally disabled. This kprobe [@ %p] "
> > +   "will be enabled with all other probes\n", p->addr);
> 
> This printk seems far too verbose.  Just remove it and make sure
> the debugfs interface has an indicator of whether probes are en- or
> disabled.

Agreed... and the "enabled" file is the indicator.

Andrew, please include this incremental patch against 2.6.21-rc6-mm1
that removes the verbose printk.

o Remove verbose printk during registration with kprobes globally
  disabled
o Print out a message when kprobes are enabled/disabled globally

Signed-off-by: Ananth N Mavinakyanahalli <[EMAIL PROTECTED]>

---
 kernel/kprobes.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

Index: linux-2.6.21-rc6/kernel/kprobes.c
===
--- linux-2.6.21-rc6.orig/kernel/kprobes.c
+++ linux-2.6.21-rc6/kernel/kprobes.c
@@ -574,10 +574,7 @@ static int __kprobes __register_kprobe(s
register_page_fault_notifier(&kprobe_page_fault_nb);
 
arch_arm_kprobe(p);
-   } else
-   printk("Kprobes are globally disabled. This kprobe [@ %p] "
-   "will be enabled with all other probes\n", p->addr);
-
+   }
 out:
mutex_unlock(&kprobe_mutex);
 
@@ -928,6 +925,7 @@ static void __kprobes enable_all_kprobes
}
 
kprobe_enabled = true;
+   printk("Kprobes globally enabled\n");
 
 already_enabled:
mutex_unlock(&kprobe_mutex);
@@ -948,6 +946,7 @@ static void __kprobes disable_all_kprobe
goto already_disabled;
 
kprobe_enabled = false;
+   printk("Kprobes globally disabled\n");
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {


Re: Ten percent test

2007-04-08 Thread Gene Heskett
On Monday 09 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:04 -0400, Gene Heskett wrote:
>> On Sunday 08 April 2007, Ingo Molnar wrote:
>> >and note that a year ago Mike did a larger patch too, not unlike his
>> >current patch - but we hoped that his smaller change would be
>> > sufficient - and nobody came along and said "i tested Mike's and the
>> > difference is significant on my system".
>>
>> May I suggest that while it may have been noticeable, it was
>> not 'significant', so we didn't sing praises and bow to mecca at the
>> time.
>
>Actually, there was practically nil interest in testing.  We made a
>couple of minor adjustments to the interactivity logic, and all went
>quiet, so I didn't think it was enough of a problem to require more
>intrusive countermeasures.
>
>   -Mike

Does one of these messages have a url so I can test the latest of your 
patches for -rc6?  Or was the one Ingo sent the most recent?

Putting that url in your sig would be nice, and might result in its 
getting a lot more exercise which should = more feedback.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Got a complaint about the Internal Revenue Service?  
Call the convenient toll-free "IRS Taxpayer Complaint Hot Line Number":

1-800-AUDITME


Re: Ten percent test

2007-04-08 Thread Mike Galbraith
On Sun, 2007-04-08 at 13:04 -0400, Gene Heskett wrote:
> On Sunday 08 April 2007, Ingo Molnar wrote:

> >and note that a year ago Mike did a larger patch too, not unlike his
> >current patch - but we hoped that his smaller change would be sufficient
> >- and nobody came along and said "i tested Mike's and the difference is
> >significant on my system".
> 
> May I suggest that while it may have been noticeable, it was 
> not 'significant', so we didn't sing praises and bow to mecca at the 
> time.

Actually, there was practically nil interest in testing.  We made a
couple of minor adjustments to the interactivity logic, and all went
quiet, so I didn't think it was enough of a problem to require more
intrusive countermeasures.

-Mike



Re: Add a norecovery option to ext3/4?

2007-04-08 Thread Eric Sandeen

Samuel Thibault wrote:
>> Hm, so the root cause there seems to be that the installer found 2 legs of a 
>> mirror and mounted them independently, recovering them independently... 
>> But why did that cause problems?
> 
> Because that trashed his data (or at least it didn't help to keep data
> safe).
> 
>> Other options you may have in the installer, though, is to check for
>> md superblocks before mounting bare partitions, or maybe use the
>> BLKROSET ioctl to set the block device to read-only prior to mount,
>> for added insurance...
> 
> That's one of the things proposed in the bugreport, yes.

The reason I suggest other options is because intentionally mounting a 
corrupted FS may not really be the way you want to go... norecovery on 
xfs at least is an option of last resort, not something to use by default.
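The BLKROSET alternative could look roughly like the sketch below. It assumes Linux, where `BLKROSET`/`BLKROGET` come from `<linux/fs.h>` and take a pointer to an `int`, and setting the flag needs CAP_SYS_ADMIN. `set_blockdev_readonly()` is a hypothetical helper name; the installer would still have to decide which devices to call it on.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKROSET, BLKROGET */

/* Hypothetical helper: mark the block device at `path` read-only at the
 * block layer before anything tries to mount it, so that a journal
 * replay cannot write to it. Returns 0 on success, -1 on any failure. */
int set_blockdev_readonly(const char *path)
{
    int ro = 1, check = 0, ret = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* Fails on non-block files and for unprivileged callers. */
    if (ioctl(fd, BLKROSET, &ro) == 0 &&
        ioctl(fd, BLKROGET, &check) == 0 && check == 1)
        ret = 0;            /* the kernel now reports the device as read-only */
    close(fd);
    return ret;
}
```

With the flag set, a mount of an ext3 filesystem that needs recovery would be expected to fail rather than replay the journal; that is worth verifying on the target kernel.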


-Eric


Re: Add a norecovery option to ext3/4?

2007-04-08 Thread Samuel Thibault
On Sun 08 Apr 2007 22:24:50 -0500, Eric Sandeen wrote:
> Samuel Thibault wrote:
> >Distribution installers usually try to probe OSes for building a suitable
> >grub menu.  Unfortunately, mounting an ext3 partition, even in read-only
> >mode, does perform some operations on the filesystem (log recovery).
> >This is not a good idea since it may silently corrupt data.  
> 
> Can you elaborate?  Under what circumstances is log replay going to harm 
> data?  Do you mean that the installer mounts partitions, looking for 
> what OS is installed?  How is that harmful?
> 
> Ohhh... this is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=417407 
> isn't it?

Yes.

> Hm, so the root cause there seems to be that the installer found 2 legs of a 
> mirror and mounted them independently, recovering them independently... 
> But why did that cause problems?

Because that trashed his data (or at least it didn't help to keep data
safe).

> Other options you may have in the installer, though, is to check for
> md superblocks before mounting bare partitions, or maybe use the
> BLKROSET ioctl to set the block device to read-only prior to mount,
> for added insurance...

That's one of the things proposed in the bugreport, yes.

Samuel


Re: Add a norecovery option to ext3/4?

2007-04-08 Thread Eric Sandeen

Samuel Thibault wrote:
> Hi,
> 
> Distribution installers usually try to probe OSes for building a suitable
> grub menu.  Unfortunately, mounting an ext3 partition, even in read-only
> mode, does perform some operations on the filesystem (log recovery).
> This is not a good idea since it may silently corrupt data.  

Can you elaborate?  Under what circumstances is log replay going to harm 
data?  Do you mean that the installer mounts partitions, looking for 
what OS is installed?  How is that harmful?

Ohhh... this is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=417407 
isn't it?

Hm, so the root cause there seems to be that the installer found 2 legs of a 
mirror and mounted them independently, recovering them independently... 
But why did that cause problems?

> XFS has a norecovery option that allows disabling that; I'd say ext3/4
> should have it too.

The xfs mount option is useful on a purely read-only device, or if the 
log is corrupted to the point where it can't be replayed... It was put 
in place 9+ years ago.  :)  I'd have to ask the sgi guys to dig & see 
what the original use was...


It'd be easy enough to add to ext3/4, I suppose.  Other options you may 
have in the installer, though, is to check for md superblocks before 
mounting bare partitions, or maybe use the BLKROSET ioctl to set the 
block device to read-only prior to mount, for added insurance...
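Checking for md superblocks before mounting bare partitions could start from something like the sketch below. It assumes the 0.90 metadata format, which keeps the filesystem at offset 0 of each mirror leg (so a leg mounts as plain ext3) and places the superblock in the last 64 KiB of the device, rounded down to a 64 KiB boundary, with magic 0xa92b4efc stored in host byte order. The function names are hypothetical. It works on an in-memory image so it can be tested; a real installer would pread() those four bytes from the partition instead. The 1.x formats use other offsets and are not handled.

```c
#include <stdint.h>
#include <string.h>

#define MD_SB_MAGIC_SKETCH  0xa92b4efcU   /* md superblock magic */
#define MD_RESERVED_BYTES   (64 * 1024)   /* 0.90 reserves the last 64 KiB */

/* Byte offset of a 0.90 md superblock on a device of dev_bytes bytes:
 * round the size down to 64 KiB, then step back 64 KiB.
 * Returns -1 if the device is too small to hold one. */
int64_t md090_sb_offset(uint64_t dev_bytes)
{
    const uint64_t res = MD_RESERVED_BYTES;

    if (dev_bytes < res)
        return -1;
    return (int64_t)((dev_bytes & ~(res - 1)) - res);
}

/* Returns 1 if the in-memory image `dev` (dev_bytes long) carries a 0.90
 * md superblock magic, 0 otherwise. 0.90 stores the magic in host byte
 * order, so a plain memcpy into a uint32_t is the right comparison. */
int has_md090_superblock(const unsigned char *dev, uint64_t dev_bytes)
{
    int64_t off = md090_sb_offset(dev_bytes);
    uint32_t magic;

    if (off < 0)
        return 0;
    memcpy(&magic, dev + off, sizeof magic);
    return magic == MD_SB_MAGIC_SKETCH;
}
```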


-Eric


Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend

2007-04-08 Thread Gautham R Shenoy
On Mon, Apr 02, 2007 at 10:51:27PM +0200, Pavel Machek wrote:
> > > 
> > > Should we create CONFIG_FREEZER?
> > 
> > Why do you think so?  I think the freezer should be compiled automatically
> > if any of the above is set, which is what this directive really means.
> 
> Kconfig can do that ("select" statement). If we have one such ifdef,
> it is okay, but not if there are more of them.
> 

Ok.

> > > Eh? Why does kprobes code depend on config_pm?
> > 
> > Because it uses the freezer? ;-)
> 
> That is no longer true after this patch... Ugly ifdef above makes sure
> freezer is there for kprobes. I'm trying to say that #if above is
> now broken. Actually it was probably always broken, but it just became
> more so.

I have already removed it in my version 3.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


[PATCH] CONFIG_PACKET_MMAP should depend on MMU

2007-04-08 Thread Aubrey Li

The option CONFIG_PACKET_MMAP should depend on MMU.

Signed-off-by: Aubrey.Li <[EMAIL PROTECTED]>
---
net/packet/Kconfig |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/packet/Kconfig b/net/packet/Kconfig
index 34ff93f..959c272 100644
--- a/net/packet/Kconfig
+++ b/net/packet/Kconfig
@@ -17,7 +17,7 @@ config PACKET

config PACKET_MMAP
bool "Packet socket: mmapped IO"
-   depends on PACKET
+   depends on PACKET && MMU
help
  If you say Y here, the Packet protocol driver will use an IO
  mechanism that results in faster communication.
--
1.5.1


Re: Avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined

2007-04-08 Thread Gautham R Shenoy
On Fri, Apr 06, 2007 at 11:25:00PM -0700, Andrew Morton wrote:
> On Fri, 6 Apr 2007 14:41:50 -0700 "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> 
> wrote:
> 
> > Subject: Avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined
> > 
> > Avoid checking for cpu gone in mm hot path when
> > CONFIG_HOTPLUG_CPU is not defined.
> > 
> > Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
> > 
> > ---
> >  arch/i386/kernel/smp.c |4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > Index: work/arch/i386/kernel/smp.c
> > ===
> > --- work.orig/arch/i386/kernel/smp.c
> > +++ work/arch/i386/kernel/smp.c
> > @@ -365,10 +365,12 @@ static void flush_tlb_others(cpumask_t c
> > BUG_ON(cpu_isset(smp_processor_id(), cpumask));
> > BUG_ON(!mm);
> >  
> > +#ifdef CONFIG_HOTPLUG_CPU
> > /* If a CPU which we ran on has gone down, OK. */
> > cpus_and(cpumask, cpumask, cpu_online_map);
> > -   if (cpus_empty(cpumask))
> > +   if (unlikely(cpus_empty(cpumask)))
> > return;
> > +#endif
> >  
> > /*
> >  * i'm not happy about this global shared spinlock in the
> 
> Fair enough.
> 
> The code you're touching came in with the original CPU-hotplug-for-i386 patches.
> 
> x86_64 doesn't do it.  It handles tlb flushing differently anyway.  But I
> suspect that x86_64 is just buggy, unless all callers of flush_tlb_others()
> have taken care to disable preemption prior to their calculation of the
> passed-in cpumask.
> 
> Shudder.  Gautham, this is code which we can cheerfully delete when we get
> the freezer stuff done.  Fortunately, Anil's patch will make it nice and
> easy to find again.

Ok, I will make a note of this one.

If the IO-test results are good, I hope to post the patchset sometime
this week.
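The effect of the quoted smp.c hunk can be seen in a small user-space model where a CPU mask is a 64-bit word (bit n = CPU n). `cpus_to_flush()` and the `HOTPLUG_CPU_SKETCH` switch are hypothetical stand-ins for `flush_tlb_others()` and `CONFIG_HOTPLUG_CPU`; the point is only that with the option off, the AND with the online map and the empty check vanish from the hot path.

```c
#include <stdint.h>

#define HOTPLUG_CPU_SKETCH 1   /* hypothetical stand-in for CONFIG_HOTPLUG_CPU */

#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)   /* GCC/Clang branch hint */
#endif

/* Returns the mask of CPUs that still need a flush IPI (0 = nothing to do). */
uint64_t cpus_to_flush(uint64_t cpumask, uint64_t online_map)
{
#if HOTPLUG_CPU_SKETCH
    cpumask &= online_map;          /* like cpus_and(cpumask, cpumask, cpu_online_map) */
    if (unlikely(cpumask == 0))     /* like cpus_empty(cpumask): every target went away */
        return 0;
#else
    (void)online_map;               /* without hotplug, CPUs never go offline */
#endif
    return cpumask;
}
```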


Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


Re: [patch 2.6.21-rc5-git] make /proc/acpi/wakeup more useful

2007-04-08 Thread Zhang Rui
On Sat, 2007-04-07 at 13:08 -0700, David Brownell wrote:
> On Friday 06 April 2007 10:01 pm, Greg KH wrote:
> 
> > Are you _sure_ you have a 1-to-1 relationship here?  No multiple devices
> > pointing to the same acpi node?  Or the other way around?  If so, you
> > are going to have to change the name to be something more unique.
> 
> I've wondered that too.  The short answer:  ACPI only supports 1-1
> here.
Right.
>   It will emit warnings if it tries to bind more than one ACPI
> device to a given "real" device ... but errors the other way are
> silently ignored.
> 
My understanding is different.
First, one "real" device can only have one device.archdata.acpi_handle,
which means it can only be bound to one ACPI device.
Second, AE_ALREADY_EXISTS will be returned when ACPI tries to bind more
than one "real" device to the same ACPI device.
> By adding a warning over this create-links patch, I found that the
> system in the $SUBJECT patch (and likely every ACPI system) has
> two different nodes that correspond to one ACPI node:
> 
>   /sys/devices/pci:00 ... pci root node
>   /sys/devices/pnp0/00:00 ... id PNP0a03
>   /sys/devices/acpi_system:00/device:00/PNP0A03:00 ... ditto
> 
> Arguably that's too many sysfs nodes for one device...
> 
> Plus, there's the issue of flakey ACPI tables; in the $SUBJECT patch
> both MDM and AUD nodes exist in the ACPI namespace, but they could
> only refer to one PCI device (with MDM as the wakeup source, not AUD
> as listed in the table).  Or maybe that's another case where the ACPI
> code isn't handling the tables as sensibly as it might...
> 
Could you attach this acpidump please? :)

Thanks,
Rui


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-08 Thread William Lee Irwin III
On Fri, 06 Apr 2007 17:02:58 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:
>> You're too kind.  wli's comment on the first version of this patch was
>> something along the lines of "this patch causes a surprising amount of
>> damage for what little it achieves".

On Fri, Apr 06, 2007 at 05:28:44PM -0700, Andrew Morton wrote:
> Damn, I wish I'd said that.

ISTR it went:

On Fri, Feb 16, 2007 at 02:21:07PM -0800, William Lee Irwin III wrote:
> The amount of violence this patch manages to commit is phenomenal for
> what little it actually does. There are also a number of oddities

Cheers.


-- wli


Re: [PATCH] Scheduler: Improving the scheduler performance.

2007-04-08 Thread Rik van Riel

[EMAIL PROTECTED] wrote:
> On Sat, 07 Apr 2007 23:42:20 +0600, root said:
>> As we know, the Linux scheduler uses a separate runqueue for every CPU of
>> a multiprocessor system, each having an active and an expired array. If
>> we use only one expired array, then the CPUs of a multiprocessor system
>> will be able to share their expired tasks via the accumulated expired
>> array,
> 
> I got this far, and the first thought that popped into my head was:
> 
> "Wow.  This might actually win on a UP or small MP (2-15 CPU).  But the
> lock contention on a big 512-CPU machoflops box is likely going to *suck*".
> 
> For that matter, my quick eyeballing of the code, although it doesn't *find*
> any race conditions, doesn't convince me there's any protection taken to make
> sure there aren't any.  Is there some subtle algorithmic trick I'm missing
> to ensure Nothing Bad Can Happen?

Lock contention is going to be the least of your worries.

Destroying CPU affinity is the big one I suspect.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Roland McGrath
I concur with Eric's assessment.  Adding new magic bits to the generic
clone path seems like a poor way to cope with kernel threads.  I think
it's better if kernel thread setup gets less like normal user process
setup.  I also agree with Eric that PPID of 0 is a very natural way for
kernel threads to be displayed.  We need to know more about the nature
of the compatibility issue in procps to judge whether there is good
reason to avoid changing it.


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] zap_other_threads: remove unneeded ->exit_signal change

2007-04-08 Thread Roland McGrath
I think that's correct.


Thanks,
Roland


Re: [linux-usb-devel] HID bus prototype - 20070408

2007-04-08 Thread Li Yu
Hi,
   
It seems the hid-pidff driver should also be sticky.

Good luck

- Li Yu



Re: If not readdir() then what?

2007-04-08 Thread Theodore Tso
On Sun, Apr 08, 2007 at 12:28:32PM -0700, H. Peter Anvin wrote:
> Theodore Tso wrote:
> >It doesn't state explicitly that you can use the telldir() cookie
> >after closing the directory stream using closedir() and then reopening
> >it using opendir(), but given that it states that results are
> >undefined after a rewinddir() --- which is much less violent than a
> >closedir()/opendir(), I would definitely argue that an application
> >programmer would be very ill-advised to rely on this working.
> >
> >(Of course, I'd argue that an application programmer shouldn't use
> >telldir/seekdir at all.)
> >
> >Ulrich, is it too late to insert a clarification that the telldir()
> >cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?
> 
> More fundamentally, the telldir cookie should never be valid when 
> applied to a different DIR * (even one that refers to the same directory.)

Well, Joern thought that rm -rf might be relying on the telldir cookie
being valid in precisely that circumstance.  If that is true, I'd
argue that this is a BUG in GNU coreutils that should be fixed...
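For contrast, the only pattern the thread treats as safe is reusing a telldir() cookie on the same DIR * stream that produced it, before closedir(). A minimal sketch follows; `telldir_roundtrip()` is a hypothetical helper, and it assumes a POSIX/XSI system where telldir() and seekdir() are available.

```c
#define _XOPEN_SOURCE 700   /* expose telldir()/seekdir() under strict compilers */
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: remember a position with telldir(), read the entry
 * there, seekdir() back on the SAME stream and read it again.
 * Returns 1 if both reads give the same name, 0 if not, -1 on error. */
int telldir_roundtrip(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    char first[sizeof e->d_name];   /* sizeof does not evaluate e */
    long cookie;
    int same;

    if (!d)
        return -1;
    if (!readdir(d) || (cookie = telldir(d)) == -1 || !(e = readdir(d))) {
        closedir(d);
        return -1;
    }
    strcpy(first, e->d_name);   /* readdir() may reuse its buffer */
    seekdir(d, cookie);         /* cookie used only on the stream that made it */
    e = readdir(d);
    same = e != NULL && strcmp(first, e->d_name) == 0;
    closedir(d);                /* after this the cookie must not be reused */
    return same;
}
```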

- Ted



Re: [PATCH] Re: kernel oops with badly formatted module option

2007-04-08 Thread Rusty Russell
On Sat, 2007-04-07 at 19:47 -0700, Randy Dunlap wrote:
> On Sat, 07 Apr 2007 19:21:01 -0500 Larry Finger wrote:
> 
> > With the following line in /etc/modprobe.conf.local:
> > 
> > options bcm43xx fwpostfix = ".fw3" locale=8
> > 
> > the kernel oops below is generated. I realize that the line should have no 
> > whitespace around the 
> > "=", but I do not feel that an oops is the best way to report the syntax 
> > error. Could there be a more gentle failure?
> 
> 
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Catch malformed kernel parameter usage of "param = value".
> Spaces are not supported, but don't cause a kernel fault on
> such usage, just report an error.
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Thanks Randy,

I even read your patch before I wrote my own, for a change!

Acked-by: Rusty Russell <[EMAIL PROTECTED]>
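The malformed `name = value` usage from Larry's report can also be caught by a simple pre-check on the option string. This is a hypothetical user-space sketch: `check_no_spaced_equals()` is not a kernel function and not necessarily how Randy's patch does it. It ignores quoting, so a quoted value that itself contains ` = ` would be rejected too.

```c
#include <ctype.h>
#include <errno.h>

/* Hypothetical pre-check: return -EINVAL if any '=' in the option string
 * starts a token or touches whitespace (as in "fwpostfix = x"), else 0. */
int check_no_spaced_equals(const char *s)
{
    for (const char *p = s; *p; p++) {
        if (*p != '=')
            continue;
        if (p == s || isspace((unsigned char)p[-1]) ||
            isspace((unsigned char)p[1]))
            return -EINVAL;
    }
    return 0;
}
```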




Re: [PATCH] Define EFLAGS_IF

2007-04-08 Thread Rusty Russell
On Fri, 2007-04-06 at 08:39 -0700, H. Peter Anvin wrote:
> I will, unless Rusty does, first.  No desire to step on each other.

Oh no, please, after you!

Rusty.




Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/08, Eric W. Biederman wrote:
>
>> If we are going to have kernel only flags please use an additional
>> argument to do_fork and copy_process.
>
> Yes, we can do this. But we have a number of architectures which use
> sys_clone() to implement kernel_thread(). It would be nice to have an
> architecture neutral kernel_thread() implementation as you proposed.
> We should change all of them if we want to add a new parameter to
> do_fork().
>
> Perhaps it is better to add reparent_kthread() (next patch) to kthread()
> and forget about CLONE_KERNEL_THREAD.

Please. 

> Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
> threads. And if ->parent == /sbin/init, we can't remove us from ->children
> (unless we forbid sub-thread-of-init exec). So the only safe change is
> set ->exit_state = -1.

Yes.  We certainly need ->exit_state = -1.
Earlier I had forgotten about the second use of ->children to update
the parent pointer of processes when their parent exits.

There is a practical question how much we care about pstree being
confused (I assume it doesn't crash).  If this is just a confusion
issue then I say go for it.  PPID == 0 is a very legitimate way to say
the kernel is the parent process.

There are a few more cases where we are likely to get PPID == 0 in the
future and /sbin/init already has that now.  Plus there is a lot of
historic precedent.  The odd part is PPID = 0 having multiple
children.

If we decide maintaining a tree is important I would much rather put
init_task on the task_list so we can see it in /proc than go the other
way around.

I would like a confirmation that PPID == 0 is what is confusing
pstree just to make certain we haven't half filled in some field
in init_task and are thus giving incorrect /proc output.  But that is
all the double checking I would do.

>> Your current scheme also has the bad side that if user space supplied
>> a kernel flag it is hard to detect it and return -EINVAL.  Which
>> limits future expansion.  Silently dropping clone flags is a real
>> pain, if you are trying to detect if a new flag has been implemented.
>
> Yes. But that is what we are doing now. copy_process() just ignores
> unknown flags.

Agreed.  I fixed that in sys_unshare but I should really submit a
patch to do the same for sys_clone at some point.
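Rejecting unknown flags, rather than silently dropping them, amounts to masking against the set of flags user space is allowed to pass. A minimal sketch, assuming an illustrative three-flag subset: the values match Linux's CLONE_VM, CLONE_FILES and CLONE_THREAD, but the `SK_` names and `check_clone_flags()` are hypothetical and are not the sys_unshare code.

```c
#include <errno.h>

/* Illustrative subset of clone flags (values as in Linux; names hypothetical). */
#define SK_CLONE_VM      0x00000100UL
#define SK_CLONE_FILES   0x00000400UL
#define SK_CLONE_THREAD  0x00010000UL
#define SK_KNOWN_FLAGS   (SK_CLONE_VM | SK_CLONE_FILES | SK_CLONE_THREAD)

/* Reject any bit outside the known set instead of silently ignoring it,
 * so user space can tell whether a flag is implemented. */
int check_clone_flags(unsigned long flags)
{
    if (flags & ~SK_KNOWN_FLAGS)
        return -EINVAL;
    return 0;
}
```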

When known flags aren't implemented we certainly return -EINVAL.

Given that this line of work looks to fix the race that allows
a threaded init to generate unkillable zombies I can probably find
some time in the next while to work on it.

Eric


Re: [RFC] pata_icside driver

2007-04-08 Thread Jeff Garzik

Alan Cox wrote:
>> The second FIXME area is ata_irq_ack - it is unconditionally coded
>> for SFF-type interfaces.  I believe that using this function in
>> non-BMDMA interfaces is wrong - it attempts to read from the BMDMA
>> registers irrespective of whether ap->ioaddr.bmdma_addr is set or
>> not.  The question this poses is: what should non-BMDMA implementations
>> use for this method?  Note that pata_platform also uses this
>> function despite not supporting BMDMA which seems even more suspicious.
> 
> That's a bug that has arrived again. The older code was corrected to
> handle this properly but the fix appears to have become lost. The
> ioread/iowrite code actually made quite a mess (all the address reporting
> is also broken) and we do some iffy things like compare the iomap result
> with zero and assume that's the same as checking for true bus zero
> addresses.
> 
> ata_irq_ack is part of the SFF layer so it's fine that it assumes SFF but
> it's wrong that it is used unconditionally and it shouldn't be used this
> way. It just needs a (!ap->ioaddr.bmdma_addr) test adding (assuming that's
> valid for iomap)

No.  It does not need such a test, as it requires BMDMA, not just an 
SFF-style Status register.  It is up to the driver to decide whether or 
not ata_irq_ack() is appropriate for your hardware.

pata_icside needs its own ata_irq_ack -- which may just be as simple as 
reading the Status register to clear the interrupt condition.

If others need this as well, ata_sff_irq_ack() would be a good generic 
function to create.


Jeff





Re: Linux 2.6.21-rc6

2007-04-08 Thread Jeff Garzik

Andrew Morton wrote:
> netdev:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/forcedeth-work-around-null-skb-dereference-crash.patch

It sounded like this was specific to Ingo.  I haven't heard anybody else 
complain, and AFAIK Ayaz and Ingo were still going back and forth.

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/depcac-fix-handling-of-platorm_device_add-failure.patch

ACK this one.  Need to send this up, but I'm intentionally avoiding work 
as we are having a big Easter bash here in Raleigh.  Silly bunny-related 
traditions that have nothing to do with Jesus take priority ;-)

I have a couple of other bug fixes to push, but that will wait until Tuesday.

Jeff




[PATCH] intel_agp: PCI id update for Intel 965GM

2007-04-08 Thread Wang Zhenyu

[AGPGART] intel: Add 965GM chipset support

Update PCI id info for Intel 965GM chipset.

Signed-off-by: Wang Zhenyu <[EMAIL PROTECTED]>

---
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index e542a62..a9fdbf9 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -18,11 +18,14 @@
 #define PCI_DEVICE_ID_INTEL_82965Q_IG   0x2992
 #define PCI_DEVICE_ID_INTEL_82965G_HB   0x29A0
 #define PCI_DEVICE_ID_INTEL_82965G_IG   0x29A2
+#define PCI_DEVICE_ID_INTEL_82965GM_HB  0x2A00
+#define PCI_DEVICE_ID_INTEL_82965GM_IG  0x2A02
 
 #define IS_I965 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82946GZ_HB || \
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_1_HB || \
  agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965Q_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_HB)
+ agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965G_HB || \
+ agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82965GM_HB)
 
 
 extern int agp_memory_reserved;
@@ -1921,7 +1924,13 @@ static int __devinit agp_intel_probe(struct pci_dev *pdev,
bridge->driver = &intel_845_driver;
name = "965G";
break;
-
+   case PCI_DEVICE_ID_INTEL_82965GM_HB:
+   if (find_i830(PCI_DEVICE_ID_INTEL_82965GM_IG))
+   bridge->driver = &intel_i965_driver;
+   else
+   bridge->driver = &intel_845_driver;
+   name = "965GM";
+   break;
case PCI_DEVICE_ID_INTEL_7505_0:
bridge->driver = &intel_7505_driver;
name = "E7505";
@@ -2080,6 +2089,7 @@ static struct pci_device_id agp_intel_pci_table[] = {
ID(PCI_DEVICE_ID_INTEL_82965G_1_HB),
ID(PCI_DEVICE_ID_INTEL_82965Q_HB),
ID(PCI_DEVICE_ID_INTEL_82965G_HB),
+   ID(PCI_DEVICE_ID_INTEL_82965GM_HB),
{ }
 };
 


Re: Linux 2.6.21-rc6

2007-04-08 Thread Greg KH
On Sun, Apr 08, 2007 at 04:09:54PM -0700, Andrew Morton wrote:
> 
> I'm sitting on five patches which look like 2.6.21 material, but which
> would normally go through subsystem maintainers:
> 
> driver core:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/update-documentation-driver-model-platformtxt.patch

Feel free to forward it on with:
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

As it was just a documentation update, I figured it was safe to wait for
2.6.22, but I have no objection to it going in now.

thanks,

greg k-h


Re: SD scheduler testing hitch

2007-04-08 Thread Dmitry Adamushko

[...]
Well, it's a late hour, so maybe I'm missing something... but it does
look like an issue related to HZ and the "will run" time interval, as
described in (*). Or maybe we both observe similar situations but with
different reasons behind them.


I meant that account_user_time() is also called from timer_ISR ->
update_process_times(), like scheduler_tick(). So if a task's running
intervals are shorter than 1/HZ, its time is not always accounted, and
cpu% may be wrong for such a task...
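The effect described above can be illustrated with a small simulation of tick-sampled accounting. This is a simplified model, not the kernel's code. It assumes every timer tick is charged entirely to whichever task is running at that instant and ignores everything else that happens at a tick. The 2 ms run / 18 ms sleep pattern (a 10% duty cycle) and the phase offsets are illustrative values, not measurements from this thread.

```c
#include <assert.h>

/* CPU% that tick-based sampling would report for a task that repeatedly
 * runs for run_us and then sleeps for sleep_us, with its first run
 * starting at phase_us. Simplified model: at every tick, the whole tick
 * is charged to the task if it happens to be running at that instant. */
double sampled_cpu_percent(unsigned hz, unsigned long run_us,
                           unsigned long sleep_us, unsigned long phase_us,
                           unsigned long duration_us)
{
    assert(hz > 0 && hz <= 1000000 && run_us + sleep_us > 0);
    unsigned long tick_us = 1000000UL / hz;   /* 4000 us at HZ=250 */
    unsigned long period_us = run_us + sleep_us;
    unsigned long ticks = 0, charged = 0;

    for (unsigned long t = 0; t < duration_us; t += tick_us) {
        ticks++;
        /* Running iff t is inside [phase + k*period, phase + k*period + run). */
        if (t >= phase_us && (t - phase_us) % period_us < run_us)
            charged++;
    }
    return ticks ? 100.0 * charged / ticks : 0.0;
}

/* CPU% the task actually consumed. */
double true_cpu_percent(unsigned long run_us, unsigned long sleep_us)
{
    return 100.0 * run_us / (run_us + sleep_us);
}
```

With HZ=250, a task whose 2 ms bursts always start 1 ms after a tick is charged 0%, and one whose bursts start exactly on a tick is charged 20%. At HZ=1000, the same pattern is charged its true 10%.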


--
Best regards,
Dmitry Adamushko


Re: SD scheduler testing hitch

2007-04-08 Thread Dmitry Adamushko

On 08/04/07, Mike Galbraith <[EMAIL PROTECTED]> wrote:

On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:

> I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> here every time.  (ran as taskset -c 1 nice -n -10 ./fairtest)  The
> starving 10% duty cycle task has trouble getting 1% CPU.



Something is odd, very odd indeed. But surprise-surprise, it does not
seem to be something merely SD-related.

In short, the question is - can we always believe statistics being
provided by "top" (i.e. the way it's being collected by the kernel)?

The tests are below. Somewhere in the middle are thoughts on how HZ
and an interval of cpu usage by a given task may be connected to such
a behaviour.

The system: Pentium 3 Coppermine 750 MHz (ThinkPad T21), 256 MB RAM.

I tested 3 configurations:

(1)  2.6.13-15 (default in SuSE 10)
(2)  2.6.19
(3)  2.6.21-rc5 + sd-0.39

TEST: plain tenp.c, i.e. without Mike's "steal" thingy (either as xx.c or
as part of a modified fairtest.c), run in these variants:

tenp   -- tenp.c with a single running copy
tenp2  -- tenp.c with 2 running copies (1 additionally forked)
tenp15 -- 15 copies (only for SD)


(1)  2.6.13-15

Tasks:  74 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.6% us,  0.7% sy,  0.0% ni, 90.4% id,  0.0% wa,  0.3% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5582 dimm      15   0  1460  428  348 S  6.0  0.2   0:02.03 tenp
 4047 messageb  17   0  3520 1584 1324 S  1.3  0.6   0:00.28 dbus-daemon


Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.9% us,  0.3% sy,  0.0% ni, 84.8% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5598 dimm      15   0  1460  428  348 S  7.2  0.2   0:01.42 tenp2
 5599 dimm      15   0  1460  432  352 S  6.9  0.2   0:00.87 tenp2
 5591 dimm      16   0  2108  988  764 R  0.3  0.4   0:00.47 top
    1 root      16   0   688  260  224 S  0.0  0.1   0:01.78 init

I repeated each of the tests (tenp and tenp2) 7 times. All were ok.


Now an interesting part starts.

(2)  2.6.19

[ 2.1 ]

Tasks:  73 total,   1 running,  72 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3% us,  0.7% sy,  0.0% ni, 98.0% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8312 root      15   0 27168  14m 2128 S  0.7  5.6   0:08.29 X
 8640 dimm      15   0 28656  13m   9m S  0.7  5.4   0:03.44 konsole
 8813 dimm      15   0  1460  432  352 S  0.3  0.2   0:00.32 tenp
    1 root      15   0   696  268  228 S  0.0  0.1   0:01.12 init

[ 2.2 ]

Tasks:  73 total,   3 running,  70 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.6% us,  0.7% sy,  0.0% ni, 92.7% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8816 dimm      15   0  1464  432  352 S  5.0  0.2   0:01.49 tenp
 8312 root      15   0 27168  14m 2128 R  1.7  5.6   0:09.08 X

See the difference between [ 2.1 ] and [ 2.2 ]?  [ 2.2 ] (which is ok)
happened only 3 out of 10 times.

Now for tenp2

[ 2.3 ]

Tasks:  74 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.6% us,  0.3% sy,  0.0% ni, 85.1% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8850 dimm      15   0  1460  432  352 S  6.6  0.2   0:01.32 tenp2
 8851 dimm      15   0  1460  112   32 S  6.3  0.0   0:00.77 tenp2
 8312 root      15   0 27168  14m 2128 S  0.7  5.6   0:11.73 X

[ 2.4 ]

Tasks:  74 total,   2 running,  72 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.3% us,  0.3% sy,  0.0% ni, 96.3% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8312 root      15   0 27168  14m 2128 S  2.0  5.6   0:12.97 X
 8640 dimm      15   0 28748  13m   9m R  0.7  5.4   0:07.22 konsole
 8532 dimm      18   0  2476  416  268 S  0.3  0.2   0:00.04 gpg-agent
 8852 dimm      15   0  2116  996  772 R  0.3  0.4   0:00.27 top
 8859 dimm      15   0  1460  432  352 S  0.3  0.2   0:00.44 tenp2
 8860 dimm      15   0  1460  112   32 S  0.3  0.0   0:00.02 tenp2
    1 root      15   0   696  268  228 S  0.0  0.1   0:01.12 init

Again, [ 2.3 ] took place only 3 times.

Some observations:

/1/  for the "ok" ( [ 2.2 ] and [ 2.3 ] ) cases, the "will run" and
"will sleep" times from tenp's calibration output look /higher/ than
average:

e.g.
Each fork will run for 5863 usecs and sleep for 52767 usecs

vs. something in between

Each fork will run for 2392 usecs and sleep for 21528 usecs
Each fork will run for 3880 usecs and sleep for 34920 usecs

in most cases (when tenp's cpu% is ~0.3).


/2/  HZ = 250 for 2.6.19 and I think it was still 1000 for 2.6.13
(arghh.. forgot to check and would like to avoid a reboot in this
already late hour... but I believe it was still the time of 1000 by
default).

=

(*)

HZ == 250 ==> the timer tick fires once every 4 ms. So a "will run for"
interval < 4 ms may well go unaccounted? :o)
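Here is a back-of-the-envelope model of (*). It assumes each tick is charged wholly to whatever task is running when the tick fires, and that the task's wakeups have a uniformly random phase relative to the tick; both are assumptions, not something measured here. A burst of length r shorter than the tick period T gets either zero ticks or one. Under random phase the accounting is right on average but very noisy per burst. If the wakeups are instead phase-locked to the tick, as can happen when the sleep itself is timer-driven, short bursts can consistently fall between ticks and be charged almost nothing. The run times quoted above are used as the example values.

```latex
\begin{aligned}
T &= \frac{1}{HZ} = \frac{1}{250\ \mathrm{s^{-1}}} = 4\ \mathrm{ms} \\
\Pr[\text{a burst of length } r < T \text{ is charged one tick}] &= \frac{r}{T} \\
\mathbb{E}[\text{time charged per burst}] &= \frac{r}{T}\cdot T = r \\
r = 2392\ \mu\mathrm{s}:\quad \frac{r}{T} &\approx 0.60,
\qquad r = 5863\ \mu\mathrm{s} > T
\end{aligned}
```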

The funny thing i

Re: Security computation within Linux kernel

2007-04-08 Thread Lee Revell

On 4/8/07, JanuGerman <[EMAIL PROTECTED]> wrote:

Hi everyone,

I have a question regarding the security libraries shipped with the Linux
kernel. Are all the PKI and RSA libraries provided by OpenSSL already
integrated into the Linux kernel source code, or does one have to use
OpenSSL separately for this?

I can see that the linux/crypto.h and linux/hash.h files ship with the 2.6
kernel, and I know that SHA1 "hashing" can be done using linux/hash.h, but
beyond that, is there any possibility for RSA or PKI?


What do you expect the kernel to use PKI for?  That's userspace stuff.

What problem are you trying to solve?

Lee


Add a norecovery option to ext3/4?

2007-04-08 Thread Samuel Thibault
Hi,

Distribution installers usually try to probe OSes for building a suited
grub menu.  Unfortunately, mounting an ext3 partition, even in read-only
mode, does perform some operations on the filesystem (log recovery).
This is not a good idea since it may silently corrupt data.  XFS has a
norecovery option that allows disabling that; I'd say ext3/4 should
have it too.

Samuel


Re: Linux 2.6.21-rc6

2007-04-08 Thread Andrew Morton
On Thu, 5 Apr 2007 19:50:11 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> Ok,
>  I don't think there really is anything very interesting here, but we're 
> hopefully whittling down the list of regressions, and fixing various 
> random other small issues while at it.
> 
> Some smallish MIPS updates, networking (and network driver) fixes, removal 
> of a long obsolete framebuffer driver, etc etc. The shortlog really tells 
> the story.
> 
> We should be getting close to a 2.6.21 release, so please update any 
> regression reports you've done,
> 

I'm sitting on five patches which look like 2.6.21 material, but which
would normally go through subsystem maintainers:

pcmcia:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/fix-hotplug-for-legacy-platform-drivers.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/fix-hotplug-for-legacy-platform-drivers-update.patch

driver core:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/update-documentation-driver-model-platformtxt.patch

netdev:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/forcedeth-work-around-null-skb-dereference-crash.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/depcac-fix-handling-of-platorm_device_add-failure.patch

net:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/pktgen-add-try_to_freeze.patch

please send acks, nacks or smacks asap, thanks.


Re: Reiser4. BEST FILESYSTEM EVER - Christer Weinigel

2007-04-08 Thread Richard Knutsson
Wow, I'm impressed. I think you hold the record for how many mails
referenced in a single reply... But dude, please calm down; caps-lock is
not the answer. You have gotten some rude answers and you have called them
on it, and you have repeated the same statement several times; that is
not the best way of convincing people.


I believe you picked up the "anti-Reiser religion"-phrase from previous 
rant-wars (otherwise, why does that "religion"-phrase always come up, 
and (almost) only when dealing with Reiser-fs), and yes, there have been
some clashes caused by both sides, so please be careful when dealing
with this matter.


Would you be willing to benchmark Reiser4 with some compressed
binary blob and show the time as well as the CPU usage? And document how
it is set up so it can be reproduced? After all, Windows is supposed to
be more stable, better maintained and more cost-efficient than Linux, but
they don't tell us how ;)



since it can't benefit as much from similarity between
files. So if that is the case and you really want to save diskspace you
almost have to look at read-only compressed filesystems such as cramfs,
squashfs, zisofs, cloop and various other variants in combination with
a unionfs overlay to get read/write functionality.

But in the end everything is a tradeoff. You can save diskspace, but
increase the cost of corruption. 



You deliberately ignored the fact that bad blocks are NOT dealt with by
the filesystem,... but by the operating system. Like I said: If your
filesystem is writing to bad blocks, then throw away your operating
system.
  
I may have missed something, but if my room-mate took my harddrive,
screwed it open, wrote a love-letter on the disk with a pencil and then
returned it (ok, there may be some more plausible reasons for
corruption), is the OS really supposed to handle it? Yes, it should not
assign any new data to those blocks, but should it not also fall into the
filesystem's domain to be able to restore some/all data?



Just my 2c to the pond
Richard Knutsson



Re: Reiser4. BEST FILESYSTEM EVER - Christer Weinigel

2007-04-08 Thread johnrobertbanks

Christer Weinigel: Until YOU, have actually used the REISER4 filesystem
yourself, I think YOU OWE IT to the people on the linux-kernel mailing
list, to, AS YOU SAY, shut the fuck up. 

Even reading up on the REISER4 filesystem would help. 

Applying a little intelligence would undoubtedly help too.

> [EMAIL PROTECTED] writes:
> 
> > Lennart. Tell me again that these results from 
> > 
> > http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
> > http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm
> > 
> > are not of interest to you. I still don't understand why you
> > have your head in the sand.
> 
> Oh, for fucks sake, stop sounding like a broken record.  

Oh, for fucks sake, would you, and your religious anti-REISER cohorts,
stop sounding like a broken record.

> You have repeated the same totally meaningless statistics more 
> times than I care to count.  Please shut the fuck up.

You, and your religious anti-REISER cohorts, have indeed repeated the
same broken arguments (if you can call them such) more times than I care
to count.

NO statistics, NO real facts, just selective MANIPULATION of facts.

> Please shut the fuck up.

Yes, why don't you politely, shut the fuck up.

Until YOU, have actually used the REISER4 filesystem yourself, I think
YOU OWE IT to the people on the linux-kernel mailing list, to shut the
fuck up, as YOU say.

I guess, the fact that you are a TOTAL HYPOCRITE, has completely escaped
you.

By the way: Did I thank you "delightful" people for the "pleasant"
welcome to the linux-kernel mailing list?

-

> So the two bonnie benchmarks with lzo and gzip are
> totally meaningless for any real life usages.

YOU (yes, the one with no experience and next to NO knowledge on the
subject) claim that because bonnie++ writes files that are mostly zeros,
the results are meaningless. It should be mentioned that bonnie++ writes
files that are mostly zero for all the filesystems compared. So the
results are meaningful, contrary to what you claim.

And hopefully all will notice that you just ignore these tests:

+---------------+-------+-------+-------+-------+-------+-------+
| File          | Disk  | Copy  | Copy  | Tar   | Unzip | Del   |
| System        | Usage | 655MB | 655MB | Gzip  | UnTar | 2.5   |
| Type          | (MB)  | (1)   | (2)   | 655MB | 655MB | Gig   |
+---------------+-------+-------+-------+-------+-------+-------+
| REISER4 gzip  |   213 |   148 |    68 |    83 |    48 |    70 |
| REISER4 lzo   |   278 |   138 |    56 |    80 |    34 |    84 |
| REISER4 tails |   673 |   148 |    63 |    78 |    33 |    65 |
| REISER4       |   692 |   148 |    55 |    67 |    25 |    56 |
| NTFS3g        |   772 |  1333 |  1426 |   585 |   767 |   194 |
| NTFS          |   779 |   781 |   173 |     X |     X |     X |
| REISER3       |   793 |   184 |    98 |    85 |    63 |    22 |
| XFS           |   799 |   220 |   173 |   119 |    90 |   106 |
| JFS           |   806 |   228 |   202 |    95 |    97 |   127 |
| EXT4 extents  |   806 |   162 |    55 |    69 |    36 |    32 |
| EXT4 default  |   816 |   174 |    70 |    74 |    42 |    50 |
| EXT3          |   816 |   182 |    74 |    73 |    43 |    51 |
| EXT2          |   816 |   201 |    82 |    73 |    39 |    67 |
| FAT32         |   988 |   253 |   158 |   118 |    81 |    95 |
+---------------+-------+-------+-------+-------+-------+-------+


where the files are definitely NOT mostly zeros. 

Your negligence has to be deliberate,... but why?

Are you manipulating the facts just to try and win an argument?

Most sane people will realize, that what you say is simply wrong.

ALSO YOU IGNORE examples offered by others, on lkml, which contradict
your assertion: FOR EXAMPLE:

> I see the same thing with my nightly scripts that do syslog analysis, last year
> I trimmed 2 hours from the nightly run by processing compressed files instead of
> uncompressed ones (after I did this I configured it to compress the files as
> they are rolled, but rolling every 5 min the compression takes <20 seconds, so
> the compression is < 30 min)

From David Lang: http://lkml.org/lkml/2007/4/7/196

Willy Tarreau also mentions this situation in a couple of articles.

Let me spoon feed you:

David has said that compressing the logs takes

24 x 12 x 20 secs = 5,760 secs = 1.6 hours of CPU time (over the day)

but he saves 2 hours of CPU time on the daily syslog analysis.

For a total (minimum) saving of 24 minutes.
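The arithmetic above can be checked in a few lines of C. The inputs are the figures quoted from David's mail: a roll every 5 minutes, under 20 seconds per compression, and 2 hours saved on the nightly run. Because 20 s is an upper bound, the result is a lower bound on the saving. The function name is hypothetical.

```c
#include <assert.h>

/* Net time saved per day (seconds) by compressing logs as they are
 * rolled: analysis_saving_s saved on the nightly run, minus one
 * compression of compress_s seconds every roll_interval_min minutes. */
long net_daily_saving_s(long roll_interval_min, long compress_s,
                        long analysis_saving_s)
{
    assert(roll_interval_min > 0);
    long rolls_per_day = 24 * 60 / roll_interval_min;  /* 288 at 5 min */
    long compress_cost_s = rolls_per_day * compress_s; /* 5760 s = 1.6 h */
    return analysis_saving_s - compress_cost_s;
}
```

`net_daily_saving_s(5, 20, 7200)` returns 1440 seconds, the 24 minutes stated above.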

The actual saving is probably much greater. It depends on the CPU
utilization when not compressing, ie, whether you are using idle CPU
cycles or not. I guess it also depends on whether you can go home one
and a half hours earlier by using compression, or if your boss makes you
stick around anyway.

NOTE THAT THE FILES IN THIS EXAMPLE ARE ALSO NOT MAINLY ZEROS.

MAYBE you just lacked the knowledge to understand what David was saying,
or maybe your desire to denigrate REISER4 is so strong, that you simply
don't care what other people say about similar circumstances.

I am not sure why you have to be spoon-fed on these matters, or why you
adamantly refuse to find the facts of the matter for yourself.

--

2.6.21-rc6-mm1

2007-04-08 Thread Andrew Morton

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/


- Lots of x86 updates

- This is a 25MB diff against mainline, which is rather large.



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.


Changes since 2.6.21-rc5-mm4:

 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-agpgart.patch
 git-arm.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 git-ieee1394.patch
 git-infiniband.patch
 git-input.patch
 git-jfs.patch
 git-kbuild.patch
 git-kvm.patch
 git-leds.patch
 git-libata-all.patch
 git-md-accel.patch
 git-mips.patch
 git-mmc.patch
 git-mtd.patch
 git-ubi.patch
 git-netdev-all.patch
 git-e1000.patch
 git-net.patch
 git-ioat.patch
 git-ocfs2.patch
 git-parisc.patch
 git-r8169.patch
 git-selinux.patch
 git-pciseg.patch
 git-s390.patch
 git-scsi-misc.patch
 git-block.patch
 git-unionfs.patch
 git-watchdog.patch
 git-wireless.patch
 git-ipwireless_cs.patch
 git-cryptodev.patch
 git-gccbug.patch

 git trees.

-md-avoid-a-deadlock-when-removing-a-device-from-an-md-array-via-sysfs.patch
-md-avoid-a-deadlock-when-removing-a-device-from-an-md-array-via-sysfs-fix.patch
-revert-driver-core-do-not-wait-unnecessarily-in-driver_unregister.patch
-net-sunrpc-svcsockc-fix-a-check.patch
-agp-prevent-probe-collision-of-sis-agp-and-amd64_agp.patch
-cifs-remove-unneeded-checks.patch
-git-libata-all-ipr-fix.patch
-pcmcia-spot-slave-decode-flaws-for-testing.patch
-sata_nv-dont-read-shadow-registers-when-in-adma-mode.patch
-pata_ali-remove-all-the-crap-again-and-switch-to.patch
-pata_amd-remove-all-the-crud-and-restore-the-cable-detect.patch
-pata_netcell-re-remove-all-the-crud.patch
-pata_qdi-restore-cable-detect.patch
-pata_sl82c105-restore-cable-detect-method.patch
-pata_winbond-restore-cable-method.patch
-pata_optidma-rework-for-cable-detect-and-to-remove.patch
-ide-sl82c105-rework-pio-support.patch
-ide-sl82c105-dma-support-code-cleanup-take3.patch
-mtd-pmc-msp71xx-flash-rootfs-mappings.patch
-jffs2-delete-everything-related-to-obsolete-jffs2_proc.patch
-mtd-support-for-auto-locking-flash-on-power-up.patch
-make-drivers-net-qla3xxxcphy_devices-static.patch
-git-wireless-debug-build-fixes.patch
-cxgb3-safeguard-tcam-size-usage.patch
-cxgb3-detect-nic-only-adapters.patch
-cxgb3-tighten-xgmac-workaround.patch
-cxgb3-firwmare-update.patch
-fix-scsi_send_eh_cmnd-scatterlist-handling.patch
-slab-mention-slab-name-when-listing-corrupt-objects.patch
-turn-do_sync_file_range-into-do_sync_mapping_range.patch

 Merged into mainline or a subsystem tree.

+fuse-validate-rootmode-mount-option.patch
+proper-fix-for-highmem-kmap_atomic-functions-for-vmi-for-2621.patch
+omap_cf-oops-on-suspend-fix.patch
+x86_64-early-quirks-fix-early_qrk-section-tag.patch
+i386-irqbalance_disable-section-fix.patch

 2.6.21 queue.

-vmi-paravirt-ops-bugfix-for-2621.patch

 Dropped.

+make-proc-acpi-wakeup-more-useful.patch
+sony-laptop-remove-acpi-references-from-variable-and-function-names.patch
+sony-laptop-prepare-the-platform-driver-for-multiple-users.patch
+sony-laptop-add-debug-macros-also-used-by-the-sonypi-reimplementation.patch
+sony-laptop-add-sny6001-device-handling-sonypi-reimplementation.patch
+sony-laptop-unify-the-input-subsystem-event-forwarding.patch
+sony-laptop-additional-platform-attributes-coming-from-sny6001.patch
+sony-laptop-sanitize-printks.patch
+sony-laptop-update-documentation-and-kconfig-help.patch
+sony-laptop-add-sonypi-compat-code.patch

 sony-laptop work.

+arm-fix-section-mismatch-warning-in-board-sam9260.patch

 A

Re: [RFC][PATCH -mm] swsusp: Use rbtree for tracking allocated swap

2007-04-08 Thread Nigel Cunningham
Hi.

On Sun, 2007-04-08 at 18:47 +0200, Rafael J. Wysocki wrote:
> On Sunday, 8 April 2007 01:42, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Sun, 2007-04-08 at 01:13 +0200, Rafael J. Wysocki wrote:
> > > On Sunday, 8 April 2007 00:31, Nigel Cunningham wrote:
> > > > Hi.
> > > > 
> > > > On Sat, 2007-04-07 at 15:06 -0700, Andrew Morton wrote:
> > > > > On Sat, 7 Apr 2007 23:20:39 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > This should allow us to reduce the memory usage, practically always, and
> > > > > > improve performance.
> > > > > 
> > > > > And does it?
> > > 
> > > Yes.  There are theoretical corner cases in which it may be less efficient
> > > than the current approach, but in the usual situation it is _much_ better.
> > > 
> > > > It will. I've been using extents for ages, for the same reasons. I don't
> > > > put them in an rb_tree because I view it as less than most efficient,
> > > 
> > > Actually, I don't agree with that.  In the normal situation (ie. one extent is
> > > needed) there is no difference as far as the memory usage or performance
> > > are concerned, but if there are more extents, the rbtree should be more
> > > efficient.
> > 
> > I don't think it's worth having a big discussion over, but let me give
> > you the details, which you can then feel free to ignore :)
> > 
> > The rb_node struct adds an unsigned long and two struct rb_node *
> > pointers. My extents use one struct extent * pointer. The difference is
> > thus 12/24 bytes per extent (32/64 bits) vs 20/40.
> 
> Well, you use open-coded lists.  If you used list.h lists, the numbers 
> would be different. :-)

Yes, but I don't need doubly linked lists.
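Nigel's byte counts can be reproduced by mirroring the two layouts in user space. This is a sketch under assumptions: each extent is taken to store its start and end as `unsigned long`, and `struct rb_node_mirror` copies the shape of rb_node as described in the thread (one unsigned long plus two child pointers). None of these type names come from swsusp or Suspend2 code.

```c
#include <assert.h>
#include <stddef.h>

/* The byte counts quoted above assume pointers are long-sized, which
 * holds for both the 32-bit and 64-bit Linux ABIs. */
static_assert(sizeof(void *) == sizeof(unsigned long),
              "pointer and unsigned long sizes differ");

/* Singly linked, sorted extent list: one next pointer per extent. */
struct list_extent {
    unsigned long start, end;
    struct list_extent *next;
};

/* Same shape as rb_node as described in the thread: an unsigned long
 * (parent pointer plus colour) and two child pointers. */
struct rb_node_mirror {
    unsigned long parent_color;
    struct rb_node_mirror *right;
    struct rb_node_mirror *left;
};

/* The same extent embedded in an rbtree node. */
struct rb_extent {
    unsigned long start, end;
    struct rb_node_mirror node;
};

size_t list_extent_size(void) { return sizeof(struct list_extent); }
size_t rb_extent_size(void)   { return sizeof(struct rb_extent); }
```

On a 32-bit build these return 12 and 20 bytes, and on a 64-bit build 24 and 40, matching the 12/24 vs 20/40 figures.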

> > In the normal situation, not worth worrying about, but I'm also using these for
> > recording the sectors we write too, and thinking about swap files and
> > multiple swap devices. Nearly double the memory use bites more as you
> > get more extents.
> >
> > Insertion cost for rb_node includes keeping the tree balanced. For
> > extents, I start with the location of the last insertion to minimise the
> > cost, so insertion time is usually virtually zero (inc max of last
> > extent or append a new one).
> 
> Isn't the appending one actually linear worst-case?

Worst case would be the swap allocator returning swap pages in reverse
order. As you and I both know, that doesn't happen. I first implemented
this in 2003. If the worst case actually happened, I would have seen the
effect by now :)
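The insertion scheme described in this exchange can be sketched as a sorted singly linked list with a hint pointer: start from the previous insertion point, so that the common case just extends the last extent. This is a hypothetical illustration, not the actual extent.c. It also bears out Rafael's question: a value that lands far from the hint still walks a long stretch of the list, so the worst case is linear in the number of extents.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sorted, singly linked extent list with an insertion hint. */
struct extent {
    unsigned long start, end;          /* inclusive range */
    struct extent *next;
};

struct extent_chain {
    struct extent *first;
    struct extent *last_touched;       /* where the previous insert landed */
    int count;                         /* number of extents */
};

/* Record one value (e.g. a swap page offset). Returns 0, or -1 if out
 * of memory. The chain must start zero-initialised. */
int extent_add(struct extent_chain *c, unsigned long value)
{
    assert(c != NULL);
    struct extent *prev = NULL, *cur = c->first, *e;

    /* Fast path: resume from the hint if it starts at or before value. */
    if (c->last_touched && c->last_touched->start <= value) {
        prev = c->last_touched;
        cur = prev->next;
    }
    /* Walk forward until cur is the first extent starting after value. */
    while (cur && cur->start <= value) {
        prev = cur;
        cur = cur->next;
    }
    if (prev && value <= prev->end)
        return 0;                      /* already recorded */

    if (prev && prev->end + 1 == value) {
        prev->end = value;             /* common case: extend the tail */
        e = prev;
    } else if (cur && value + 1 == cur->start) {
        cur->start = value;            /* grow the following extent down */
        e = cur;
    } else {
        e = malloc(sizeof(*e));
        if (!e)
            return -1;
        e->start = e->end = value;
        e->next = cur;
        if (prev)
            prev->next = e;
        else
            c->first = e;
        c->count++;
    }
    /* Merge with the next extent if the gap between them just closed. */
    if (e->next && e->end + 1 == e->next->start) {
        struct extent *gone = e->next;
        e->end = gone->end;
        e->next = gone->next;
        free(gone);
        c->count--;
    }
    c->last_touched = e;
    return 0;
}
```

Adding 1..5 and then 10 gives two extents, [1,5] and [10,10]. Adding 6..9 then extends the first extent until it merges with the second.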

> > If for some reason swap was allocated out of order, I might need to traverse
> > the whole chain from the start. 
> 
> Exactly.
> 
> > Normal usage in both cases is simply iterating through the list, so I
> > guess the cost would be approximately the same.
> > 
> > Deletion cost would include rebalancing for the rb_nodes.
> 
> In swsusp the deletions are needed only if there's an error.

When freeing swap at the end of the cycle?

> > Code cost is a gain for you - you're leveraging existing code, I'm
> > adding a bit more. extent.c is 300 lines including code for serialising
> > the chains in an image header and iterating through a group of chains
> > (multiple swap devices support).
> > 
> > rb_nodes seem to be the wrong solution to me because we generally don't
> > care about searching. We care about minimising memory usage and
> > maximising the speed of iteration, insertion and deletion. I believe
> > I've managed to do that with a singly linked, sorted list.
> 
> The insertion also uses searching and in fact I don't really care for anything
> else.

Ok :)

Nigel



Re: [PATCH 0/4] Fix MTRR suspend support for specific machines (some AMD64, maybe others also)

2007-04-08 Thread Andrew Morton
On Tue, 3 Apr 2007 15:55:32 +0200 (CEST) Bernhard Kaindl <[EMAIL PROTECTED]> wrote:

> With at least 3 of the following 4 patches, s2ram and s2disk are
> fixed on at least the Acer Ferrari 1000 notebooks and at least
> s2disk on the Acer Ferrari 5000 notebooks.

These patches cause my Vaio to oops during suspend-to-disk.

oops: http://userweb.kernel.org/~akpm/s5000499.jpg
config: http://userweb.kernel.org/~akpm/config-sony.txt



Re: If not readdir() then what?

2007-04-08 Thread J. Bruce Fields
On Sat, Apr 07, 2007 at 04:36:33PM -0400, Theodore Tso wrote:
> 1) Deprecate telldir/seekdir() altogether.  Relatively few programs use
> this functionality, and it is highly questionable how useful it is,
> anyway.  If you use telldir/seekdir and keep the cookie for a long
> time, even the POSIX-provided guarantees about files that are created
> and deleted between the telldir() and seekdir() points in time make
> its utility highly dubious.

How will nfsd implement readdir?

> 2) If application programs must have telldir/seekdir, then expand the
> size of the cookie from 32-bits to a minimum of 128 bits, and
> preferably larger --- say 512 bits, to accommodate systems that might
> be using a 512-bit variant of SHA-2.

NFS readdir cookies are currently 64 bits.

It'd be interesting to think about how to modify the protocol to make
this all easier, but any network filesystem protocol will need to give
clients some way to read through big directories one piece at a time.
It might also be nice if it worked even if the server rebooted partway
through.
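For concreteness, the cookie under discussion is the value telldir(3) returns and seekdir(3) later consumes. Below is a minimal user-space sketch of that round trip on a local directory. `telldir_roundtrip()` is a hypothetical helper, and the demo only exercises the POSIX interface, not how nfsd or any filesystem generates cookies. Note that telldir() returns a `long`, so the cookie width is fixed by the ABI (32 bits on 32-bit Linux), which is the limit option 2 above would have to widen.

```c
#define _XOPEN_SOURCE 700   /* expose telldir()/seekdir() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if seekdir() to a saved telldir() cookie makes readdir()
 * return the same entry again, 0 if not, -1 on error. */
int telldir_roundtrip(const char *path)
{
    assert(path != NULL);
    DIR *d = opendir(path);
    if (!d)
        return -1;

    /* Consume one entry, then save the cookie for the next position. */
    if (!readdir(d)) {
        closedir(d);
        return -1;
    }
    long cookie = telldir(d);

    struct dirent *e = readdir(d);
    if (!e) {
        closedir(d);
        return -1;
    }
    char expected[sizeof e->d_name];
    snprintf(expected, sizeof expected, "%s", e->d_name);

    /* Read to the end of the directory, then jump back to the cookie. */
    while (readdir(d) != NULL)
        ;
    seekdir(d, cookie);

    e = readdir(d);
    int ok = e != NULL && strcmp(e->d_name, expected) == 0;
    closedir(d);
    return ok;
}
```

Any directory works as input, since "." and ".." guarantee at least two entries.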

--b.


Re: kernel BUG at net/core/skbuff.c in linux-2.6.21-rc6

2007-04-08 Thread Bartek

Please reproduce and provide a new crash dump without the nvidia
binary-only module loaded.


Hi again,

Here is a new crash dump (I also removed the vmnet and vmmon proprietary
modules); this time I also included lspci output:

Apr  8 21:47:21 localhost pppd[2114]: rcvd [proto=0xfc3b] bc d4 80 eb
43 62 d0 7e 6d 27 0a e0 22 e4 8d e6 3e f1 a3 10 39 c8 fd cb e7 23 db
f1 cf a8 e0 4d ...
Apr  8 21:47:21 localhost pppd[2114]: Unsupported protocol 0xfc3b received
Apr  8 21:47:21 localhost pppd[2114]: sent [LCP ProtRej id=0x75 fc 3b
bc d4 80 eb 43 62 d0 7e 6d 27 0a e0 22 e4 8d e6 3e f1 a3 10 39 c8 fd
cb e7 23 db f1 cf a8 ...]
Apr  8 21:47:22 localhost pppd[2114]: rcvd [proto=0xcd] f7 4e 69 54 c1
d2 82 d3 bf 1c 33 46 a1 ee 90 97 14 7c a7 23 9d 84 c3 d4 ff 6c ec 25
a7 65 a3 bd ...
Apr  8 21:47:22 localhost pppd[2114]: Unsupported protocol 0xcd received
Apr  8 21:47:22 localhost pppd[2114]: sent [LCP ProtRej id=0x76 00 cd
f7 4e 69 54 c1 d2 82 d3 bf 1c 33 46 a1 ee 90 97 14 7c a7 23 9d 84 c3
d4 ff 6c ec 25 a7 65 ...]
Apr  8 21:47:22 localhost kernel: skb_under_panic: text:f8c3cc0e len:1268 put:1 head:c399f800 data:c399f7ff tail:c399fcf3 end:c399fe00 dev:
Apr  8 21:47:22 localhost kernel: [ cut here ]
Apr  8 21:47:22 localhost kernel: kernel BUG at net/core/skbuff.c:111!
Apr  8 21:47:22 localhost kernel: invalid opcode:  [#1]
Apr  8 21:47:22 localhost kernel: Modules linked in: nfs nfsd exportfs
lockd nfs_acl sunrpc button xt_TCPMSS xt_limit xt_tcpudp nf_nat_irc
nf_nat_ftp iptable_nat iptable_mangle ipt_LOG ipt_MASQUERADE nf_nat
ipt_TOS ipt_REJECT nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink iptable_filter ip_tables x_tables
ppp_async ipv6 ppp_generic slhc xfs eeprom w83781d w83627hf hwmon_vid
i2c_isa ide_generic snd_via82xx snd_ac97_codec ac97_bus snd_pcm
snd_timer snd_page_alloc i2c_viapro snd_mpu401_uart i2c_core via_ircc
snd_rawmidi snd_seq_device floppy snd serio_raw soundcore irda rtc
psmouse via_agp agpgart crc_ccitt pcspkr evdev ext3 jbd mbcache usbhid
ide_cd cdrom ide_disk generic uhci_hcd usbcore via82cxxx ide_core e100
mii thermal processor fan
Apr  8 21:47:22 localhost kernel: CPU:0
Apr  8 21:47:22 localhost kernel: EIP:0060:[]
Tainted: P   VLI
Apr  8 21:47:22 localhost kernel: EFLAGS: 00010096   (2.6.21-rc6 #3)
Apr  8 21:47:22 localhost kernel: EIP is at skb_under_panic+0x59/0x5d
Apr  8 21:47:22 localhost kernel: eax: 0073   ebx: c399f800   ecx:
   edx: 
Apr  8 21:47:22 localhost kernel: esi:    edi: c399fcf5   ebp:
c399fcf1   esp: c1ce5ed8
Apr  8 21:47:22 localhost kernel: ds: 007b   es: 007b   fs: 00d8  gs:
  ss: 0068
Apr  8 21:47:22 localhost kernel: Process events/0 (pid: 3,
ti=c1ce4000 task=dfd02030 task.ti=c1ce4000)
Apr  8 21:47:22 localhost kernel: Stack: c02c47d0 f8c3cc0e 04f4
0001 c399f800 c399f7ff c399fcf3 c399fe00
Apr  8 21:47:22 localhost kernel:c02b7ed8 f7ef5600 00ff
f8c3cc13  dfff5c20 c1fee800 0208
Apr  8 21:47:22 localhost kernel:c1fee5ad c1fee4ad c1fee800
0202 dfd7ce00 0004 c1fee400 c1fee80c
Apr  8 21:47:22 localhost kernel: Call Trace:
Apr  8 21:47:22 localhost kernel:  []
ppp_asynctty_receive+0x3b0/0x584 [ppp_async]
Apr  8 21:47:22 localhost kernel:  []
ppp_asynctty_receive+0x3b5/0x584 [ppp_async]
Apr  8 21:47:22 localhost kernel:  [] flush_to_ldisc+0xe6/0x124
Apr  8 21:47:22 localhost kernel:  [] flush_to_ldisc+0x0/0x124
Apr  8 21:47:22 localhost kernel:  [] run_workqueue+0x70/0x101
Apr  8 21:47:22 localhost kernel:  [] worker_thread+0x105/0x12e
Apr  8 21:47:22 localhost kernel:  [] default_wake_function+0x0/0xc
Apr  8 21:47:22 localhost kernel:  [] worker_thread+0x0/0x12e
Apr  8 21:47:23 localhost kernel:  [] kthread+0xa0/0xc8
Apr  8 21:47:23 localhost kernel:  [] kthread+0x0/0xc8
Apr  8 21:47:23 localhost kernel:  [] kernel_thread_helper+0x7/0x10
Apr  8 21:47:23 localhost kernel:  ===
Apr  8 21:47:23 localhost kernel: Code: 00 00 89 5c 24 14 8b 98 a0 00
00 00 89 54 24 0c 89 5c 24 10 8b 40 60 89 4c 24 04 c7 04 24 d0 47 2c
c0 89 44 24 08 e8 af c5 ef ff <0f> 0b eb fe 56 53 bb d8 7e 2b c0 83 ec
24 8b 70 14 85 f6 0f 45
Apr  8 21:47:23 localhost kernel: EIP: []
skb_under_panic+0x59/0x5d SS:ESP 0068:c1ce5ed8
Apr  8 21:48:01 localhost /USR/SBIN/CRON[6287]: (root) CMD
(/usr/local/bin/pppd_test.sh)
Apr  8 21:48:09 localhost pppd[2114]: No response to 5 echo-requests
Apr  8 21:48:09 localhost pppd[2114]: Serial link appears to be disconnected.
Apr  8 21:48:09 localhost pppd[2114]: Connect time 522.7 minutes.
Apr  8 21:48:09 localhost pppd[2114]: Sent 57811374 bytes, received
186299345 bytes.
Apr  8 21:48:09 localhost pppd[2114]: Script /etc/ppp/ip-down started (pid 6289)
Apr  8 21:48:09 localhost pppd[2114]: sent [LCP TermReq id=0x77 "Peer
not responding"]
Apr  8 21:48:09 localhost pppd[2114]: Script /etc/ppp/ip-down finished
(pid 6289), status = 0x0
Apr  8 21:48:12 localhost pppd[2114]: sent [LCP TermReq id=0x78 "Peer
no
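In kernels of this vintage, skb_push() moves skb->data back by the requested length and calls skb_under_panic() if data ends up below skb->head. The oops above fits that pattern: put:1, head:c399f800, data:c399f7ff, i.e. a one-byte push into a buffer with no headroom left. A minimal userspace model of the check follows; struct fake_skb and fake_skb_push() are hypothetical names, and the sketch says nothing about why ppp_async had no headroom.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the headroom check in skb_push().
 * Addresses are plain integers; no real sk_buff is involved.
 */
struct fake_skb {
	uintptr_t head;   /* start of the buffer */
	uintptr_t data;   /* start of the valid payload */
	unsigned int len; /* bytes of payload */
};

/*
 * Prepend len bytes. Returns 0 on success, or -1 where the kernel's
 * skb_push() would move data below head and call skb_under_panic().
 * Unlike the kernel, this model checks first and leaves skb untouched.
 */
static int fake_skb_push(struct fake_skb *skb, unsigned int len)
{
	assert(skb->data >= skb->head);
	if (skb->data - skb->head < len)
		return -1; /* not enough headroom */
	skb->data -= len;
	skb->len += len;
	return 0;
}
```

With the logged head and a pre-push data pointer equal to it (len 1267), the model refuses the one-byte push; the kernel instead moves data to c399f7ff, bumps len to 1268, and BUGs.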

Re: [PATCH 2/4] Save the MTRRs of the BSP before booting an AP

2007-04-08 Thread Andrew Morton
On Tue, 3 Apr 2007 16:00:36 +0200 (CEST) Bernhard Kaindl <[EMAIL PROTECTED]> 
wrote:

> --- linux-2.6.20.orig/arch/i386/kernel/smpboot.c
> +++ linux-2.6.20/arch/i386/kernel/smpboot.c
> @@ -59,6 +59,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 

This inclusion breaks `make headers_check'.

Please always at least test allmodconfig builds before releasing a
patchset.  Additional hints are in Documentation/SubmitChecklist.

+static __inline__ void mtrr_save_state (void)
+{
+	if (smp_processor_id() == 0)
+		mtrr_save_fixed_ranges(NULL);
+	else
+		smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1);
+}

- Please use inline, not __inline__

- No space before the (

- We should uninline this function

  - It is not performance critical

  - It is probably too large to be inlined anyway

  - It uses a lot of tricky stuff which requires a lot of header files.


From: Andrew Morton <[EMAIL PROTECTED]>

Fix `make headers_check'.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Bernhard Kaindl <[EMAIL PROTECTED]>
Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/cpu/mtrr/main.c |   11 +++
 include/asm-i386/mtrr.h  |   12 +---
 include/asm-x86_64/proto.h   |   12 +---
 3 files changed, 13 insertions(+), 22 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix arch/i386/kernel/smpboot.c
diff -puN arch/x86_64/kernel/smpboot.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix arch/x86_64/kernel/smpboot.c
diff -puN include/asm-i386/mtrr.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix include/asm-i386/mtrr.h
--- a/include/asm-i386/mtrr.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix
+++ a/include/asm-i386/mtrr.h
@@ -25,7 +25,6 @@
 
 #include 
 #include 
-#include 
 
 #define	MTRR_IOCTL_BASE	'M'
 
@@ -71,16 +70,7 @@ struct mtrr_gentry
 /*  The following functions are for use by other drivers  */
 # ifdef CONFIG_MTRR
 extern void mtrr_save_fixed_ranges(void *);
-/**
- * Save current fixed-range MTRR state of the BSP
- */
-static __inline__ void mtrr_save_state (void)
-{
-	if (smp_processor_id() == 0)
-		mtrr_save_fixed_ranges(NULL);
-	else
-		smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1);
-}
+extern void mtrr_save_state(void);
 extern int mtrr_add (unsigned long base, unsigned long size,
 unsigned int type, char increment);
 extern int mtrr_add_page (unsigned long base, unsigned long size,
diff -puN include/asm-x86_64/proto.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix include/asm-x86_64/proto.h
--- a/include/asm-x86_64/proto.h~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix
+++ a/include/asm-x86_64/proto.h
@@ -2,7 +2,6 @@
 #define _ASM_X8664_PROTO_H 1
 
 #include 
-#include 
 
 /* misc architecture specific prototypes */
 
@@ -19,16 +18,7 @@ extern void mcheck_init(struct cpuinfo_x
 extern void mtrr_ap_init(void);
 extern void mtrr_bp_init(void);
 extern void mtrr_save_fixed_ranges(void *);
-static __inline__ void mtrr_save_state (void)
-{
-	/*
-	 * Save current fixed-range MTRR state of the BSP:
-	 */
-	if (smp_processor_id() == 0)
-		mtrr_save_fixed_ranges(NULL);
-	else
-		smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1);
-}
+extern void mtrr_save_state(void);
 #else
 #define mtrr_ap_init() do {} while (0)
 #define mtrr_bp_init() do {} while (0)
diff -puN arch/i386/kernel/cpu/mtrr/main.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix arch/i386/kernel/cpu/mtrr/main.c
--- a/arch/i386/kernel/cpu/mtrr/main.c~mtrr-save-the-mtrrs-of-the-bsp-before-booting-an-ap-fix
+++ a/arch/i386/kernel/cpu/mtrr/main.c
@@ -729,6 +729,17 @@ void mtrr_ap_init(void)
 	local_irq_restore(flags);
 }
 
+/**
+ * Save current fixed-range MTRR state of the BSP
+ */
+void mtrr_save_state(void)
+{
+	if (smp_processor_id() == 0)
+		mtrr_save_fixed_ranges(NULL);
+	else
+		smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1, 1);
+}
+
 static int __init mtrr_init_finialize(void)
 {
 	if (!mtrr_if)
_



Re: [RFC] pata_icside driver

2007-04-08 Thread Alan Cox
> + /*
> +  * DMA is based on a 16MHz clock
> +  */
> + if (ata_timing_compute(adev, adev->dma_mode, &t, 1000, 1))
> + return;

This seems strange for a 16MHz clock.

> +
> + /*
> +  * Now, properly adjust the timings.  If we have a 62.5ns clock
> +  * period and we ask for MWDMA2, it calculates the following
> +  * timings: active 125ns, recovery 62.5ns, cycle 125ns.
> +  * Quite obviously bogus. 

NAK. 

At this point you need to work out why you are getting bogus results and
fix it or demonstrate a bug in the core code and fix that.
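The quoted driver comment's figures can be reproduced by hand. A 16MHz clock has a 62.5ns (62500ps) period; if, as assumed here, the T argument of ata_timing_compute() is a clock period in picoseconds, then 1000 corresponds to 1ns rather than 62.5ns, which may be what looks strange. The sketch below rounds the ATA minimum MWDMA mode 2 timings (active 70ns, recovery 25ns, cycle 120ns) up to whole clocks, in the spirit of libata's quantisation but not its actual code; struct dma_clocks and quantise_mwdma2() are hypothetical names.

```c
#include <assert.h>

/* Round a duration up to a whole number of clock periods (both in ps). */
static unsigned int clocks_needed(unsigned int ps, unsigned int period_ps)
{
	return (ps + period_ps - 1) / period_ps;
}

/* Hypothetical container for quantised MWDMA timings, in clocks. */
struct dma_clocks {
	unsigned int active;
	unsigned int recover;
	unsigned int cycle;
};

static struct dma_clocks quantise_mwdma2(unsigned int period_ps)
{
	/* ATA minimums for MWDMA mode 2: tD = 70ns, tK = 25ns, t0 = 120ns. */
	struct dma_clocks c = {
		.active  = clocks_needed(70000, period_ps),
		.recover = clocks_needed(25000, period_ps),
		.cycle   = clocks_needed(120000, period_ps),
	};
	return c;
}
```

At a 62500ps period this gives 2, 1 and 2 clocks (125ns, 62.5ns, 125ns), matching the comment: rounding each value independently leaves active plus recovery (3 clocks) longer than the cycle (2 clocks), which is the inconsistency being called bogus.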



Re: [PATCH] timekeeping: drop irq-context clocksource polling

2007-04-08 Thread Daniel Walker
On Sun, 2007-04-08 at 10:33 +0200, Thomas Gleixner wrote:
> 
> Oh well, this is a leftover from the days where we tried to use TSC
> despite frequency changes. It still modifies the scale factor of the
> tsc clocksource. 
> 
> I agree that it can be removed as we switch off TSC anyway in that case.

That's what I was thinking. However, I wanted to wait for John to
comment on it as well.

Daniel



Re: Weird MMC errors: 2 of 2 - inconsistent state after data crc error

2007-04-08 Thread Pierre Ossman
Alex Dubov wrote:
> Problem 2: After a data CRC error, all subsequent commands fail. Could it be
> caused by the stop command leaving the card in some bad state (something
> clearable by SEND_STATUS)? On the other hand, is there a real need to issue
> a stop command if the main command failed?
> 

It might be, depending on what the problem is. E.g. a timeout might still mean the
card processed the command and will start sending data.

Anyway, CRC errors should be extremely rare so I'd guess that either the card or
the controller has gotten confused. In many cases the card will shut down when
it gets annoyed, so that might be what you're seeing here.

Other than that, I'm not sure I can help that much. The stop commands should
never wedge the card, so that isn't the issue (unless the card is buggy).

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org


Re: If not readdir() then what?

2007-04-08 Thread Ulrich Drepper

On 4/8/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:

More fundamentally, the telldir cookie should never be valid when
applied to a different DIR * (even one that refers to the same directory.)


Don't worry about this.  This is clearly the semantics which was
always wanted. I've filed a defect report and it'll be handled.


Re: Weird MMC errors: 1 of 2 - bad ocr value

2007-04-08 Thread Pierre Ossman
Alex Dubov wrote:
> Recently, I've obtained a bug report concerning an MMC card. Two problems are
> described, both sporadic.
> Problem 1: an illegal ocr value is returned. You may notice that in the
> non-working case an obviously incorrect ocr value (0x) is returned. The card
> won't work after this unless it is reinserted.
> In your opinion, what should we do about it?
> 

I got something similar when there was a problem with the power supply. The card
was booting on power it drained from the other pins, but it didn't work
correctly.

Try adding some more delay after you power up in case the controller needs some
time to stabilize.

-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org


Re: [PATCH] merge compat_ioctl.h into compat_ioctl.c

2007-04-08 Thread David Miller
From: Arnd Bergmann <[EMAIL PROTECTED]>
Date: Sun, 8 Apr 2007 16:51:20 +0200

> On Sunday 08 April 2007, Christoph Hellwig wrote:
> > 
> > Now that there is no arch-specific compat ioctl handling left there
> > is no point in having a separate compat_ioctl.h, so merge it into
> > compat_ioctl.c
> 
> Yes, definitely a good idea.
> 
> > Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>
>  
> Acked-by: Arnd Bergmann <[EMAIL PROTECTED]>

Acked-by: David S. Miller <[EMAIL PROTECTED]>


Re: [PATCH] MMC: Fix handling of low-voltage cards (take 2)

2007-04-08 Thread Pierre Ossman
Philip Langdale wrote:
> Fix handling of low voltage MMC cards.
>
>   

Sorry, my fifo filled up and you got stuck at the far end.

I've applied this and will push to Andrew in a bit.

Rgds

-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org



Re: If not readdir() then what?

2007-04-08 Thread H. Peter Anvin

Theodore Tso wrote:

It doesn't state explicitly that you can use the telldir() cookie
after closing the directory stream using closedir() and then reopening
it using opendir(), but given that it states that results are
undefined after a rewinddir() --- which is much less violent than a
closedir()/opendir(), I would definitely argue that an application
programmer would be very ill-advised to rely on this working.

(Of course, I'd argue that an application programmer shouldn't use
telldir/seekdir at all.)

Ulrich, is it too late to insert a clarification that the telldir()
cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?


More fundamentally, the telldir cookie should never be valid when 
applied to a different DIR * (even one that refers to the same directory.)


-hpa


Re: If not readdir() then what?

2007-04-08 Thread Ulrich Drepper

On 4/8/07, Theodore Tso <[EMAIL PROTECTED]> wrote:

Ulrich, is it too late to insert a clarification that the telldir()
cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?


It's never too late.


Re: Linux 2.6.20.5

2007-04-08 Thread Andre Tomt

Chris Wright wrote:


Arnaldo Carvalho de Melo (1):
  DCCP: Fix exploitable hole in DCCP socket options



Does this fix cure CVE-2007-1730 and CVE-2007-1734, or just one of them? 
They both seem to be in the exact same code path the patch touches.



Re: If not readdir() then what?

2007-04-08 Thread Theodore Tso
On Sun, Apr 08, 2007 at 08:41:30PM +0200, Jörn Engel wrote:
> 
> Garbage-collecting them on closedir() does not work.  It surprised me as
> well, but there seem to be applications that keep the telldir() cookie
> around after closedir().  Iirc, "rm -r" was one of them.
> 
> Neil, is this correct?

Well, according to the Single Unix Specification:

If the value of loc was not obtained from an earlier call to
telldir(), or if a call to rewinddir() occurred between the call
to telldir() and the call to seekdir(), the results of subsequent
calls to readdir() are unspecified.

It doesn't state explicitly that you can use the telldir() cookie
after closing the directory stream using closedir() and then reopening
it using opendir(), but given that it states that results are
undefined after a rewinddir() --- which is much less violent than a
closedir()/opendir(), I would definitely argue that an application
programmer would be very ill-advised to rely on this working.
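What the quoted SUS text does support is narrow: a cookie from telldir() handed back to seekdir() on the same open DIR *, with no rewinddir() in between. A minimal sketch of that pattern using the standard <dirent.h> calls; telldir_roundtrip() is a hypothetical helper, and the cookie is deliberately never reused after closedir().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Read one entry, remember its position with telldir(), read on, then
 * seekdir() back on the SAME stream and re-read. Returns 1 if the
 * re-read name matches, 0 if not, -1 on error.
 */
static int telldir_roundtrip(const char *path)
{
	DIR *d = opendir(path);
	if (!d)
		return -1;

	long pos = telldir(d);           /* position before the first entry */
	struct dirent *e = readdir(d);
	if (!e) {
		closedir(d);
		return -1;
	}
	char first[256];
	strncpy(first, e->d_name, sizeof(first) - 1);
	first[sizeof(first) - 1] = '\0';

	while (readdir(d) != NULL)       /* move well past the saved point */
		;

	seekdir(d, pos);                 /* same DIR *, no rewinddir() between */
	e = readdir(d);
	int ok = e != NULL && strcmp(first, e->d_name) == 0;
	closedir(d);                     /* pos must not be used after this */
	return ok;
}
```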

(Of course, I'd argue that an application programmer shouldn't use
telldir/seekdir at all.)

Ulrich, is it too late to insert a clarification that the telldir()
cookie isn't guaranteed to be valid after closedir() *or* rewinddir()?

- Ted


Re: If not readdir() then what?

2007-04-08 Thread H. Peter Anvin

Theodore Tso wrote:


You could, but then you're susceptible to a memory allocation attack.
If you have an arbitrarily large directory (say, one with multiple
millions of entries), and the attacker program calls seekdir() after
every single readdir() call, you would then force the kernel to
allocate and then pin arbitrarily large amounts of memory, which as
you point out, as currently specified by the POSIX specification, you
are not allowed to release until closedir().

This could be done in userspace, by forcing glibc to readdir() the
entire directory into memory, at which point seekdir()/telldir() will
work just fine.  But for a really big directory, this could consume a
huge amount of space. 


If we had the 64-byte telldir cookie that I had proposed, then in
userspace we could simply associate that 64-byte telldir cookie with a
small 32-bit integer, either in memory, or in some berkdb or tdb
interface, at least until the use of telldir/seekdir had actually
disappeared.  (Which probably wouldn't take that long; I really doubt
there are that many users of it out there, so it's probably OK if they
suffer a performance penalty if they use this really wretched and
horrible interface.)



If you want to have large cookies, you could have glibc allocate a
memory block to store them, and make glibc responsible for keeping track
of it.  As far as I know, off_t can hold a pointer on all our 
implementations (only 32-bit machines have 32-bit off_t as an option; 
Alpha *might* be an exception but I don't think so.)
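The suggestion depends on off_t being at least as wide as a pointer. That assumption can be checked at compile time; cookie_ptr_to_off() and cookie_off_to_ptr() are hypothetical helpers, not glibc's implementation, and the _Static_assert simply fails the build on any ABI where the claim does not hold.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* The claim under test: an off_t is wide enough to carry a pointer. */
_Static_assert(sizeof(off_t) >= sizeof(void *),
	       "off_t cannot hold a pointer on this ABI");

/* Hide a pointer to a heap-allocated cookie inside an off_t. */
static off_t cookie_ptr_to_off(void *p)
{
	return (off_t)(intptr_t)p;
}

/* Recover the pointer; only valid for values made by cookie_ptr_to_off(). */
static void *cookie_off_to_ptr(off_t o)
{
	return (void *)(intptr_t)o;
}
```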



I'll also note, by the way, that there are those who have been much
more cavalier with breaking the wireless interface or the udev/sys
interface after one year.  Not that I would agree with that, but over
some deprecation period measured in years, I think it is possible to
nuke what was a horribly misguided interface that should have never
existed.  Whoever invented it really should receive the brown paper
award for one of the worst design decisions of all time.


Readdir/telldir are much, much more fundamental than that.  We're
talking interfaces which have been standardized for longer than Linux 
itself has existed.


-hpa


Re: 2.6.21-rc5: Thinkpad X60 gets critical thermal shutdowns

2007-04-08 Thread Valdis . Kletnieks
On Mon, 02 Apr 2007 10:35:40 +0200, Rene Rebe said:

(Sorry for the late reply..)

> IIRC a MSI Megabook S270 (I formerly owned) BIOS notifies this
> "Critical temperature reached (128C)" when the battery run empty
> when the OS did no action due to battery low indications. I guess
> the BIOS people thought this is a good last resort to let the OS
> really shutdown before the box just turns off.

It's not just MSI - I recently managed to put a Dell Latitude D820 into its bag
while still running, where it babbled to itself running on the warm side for
several hours.  When I finally did get it out, it *was* quite hot to the touch,
but I was amazed that it managed to run the battery down to somewhere under 4%
(which took some 4 or 5 hours) and then throw the thermal check that made it
shut down - quite the coincidence indeed.

However, "ran warm but tolerable and then used the thermal to shut down when
the battery failed" matches the symptoms much better.





Re: [RFC] pata_icside driver

2007-04-08 Thread Alan Cox
> The second FIXME area is ata_irq_ack - it is unconditionally coded
> for SFF-type interfaces.  I believe that using this function in
> non-BMDMA interfaces is wrong - it attempts to read from the BMDMA
> registers irrespective of whether ap->ioaddr.bmdma_addr is set or
> not.  The question this poses is: what should non-BMDMA implementations
> use for this method?  Note that pata_platform also uses this
> function despite not supporting BMDMA which seems even more suspicious.

That's a bug that has reappeared. The older code was corrected to
handle this properly but the fix appears to have been lost. The
ioread/iowrite code actually made quite a mess (all the address reporting
is also broken) and we do some iffy things like compare the iomap result
with zero and assume that's the same as checking for true bus zero
addresses.

ata_irq_ack is part of the SFF layer, so it's fine that it assumes SFF, but
it's wrong that it is used unconditionally and it shouldn't be used this
way. It just needs a (!ap->ioaddr.bmdma_addr) test adding (assuming that's
valid for iomap).
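The suggested fix amounts to skipping the BMDMA register accesses when the port has no BMDMA block. A hypothetical userspace model of that guard, using plain memory in place of I/O registers; struct fake_port, fake_irq_ack() and the FAKE_* constants are illustrative names (offset 2 and the INTR/ERR bits follow the conventional BMDMA status layout), and the real ata_irq_ack() does more, for instance busy-waiting on the device status first. Whether a NULL test is meaningful for iomap'd addresses is the caveat raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_DMA_STATUS 2    /* BMDMA status register offset */
#define FAKE_DMA_ERR    0x02 /* error bit, write 1 to clear */
#define FAKE_DMA_INTR   0x04 /* interrupt bit, write 1 to clear */

/* Hypothetical stand-ins for the libata port I/O addresses. */
struct fake_ioaddr {
	volatile uint8_t *status_addr; /* taskfile status register */
	volatile uint8_t *bmdma_addr;  /* NULL when the port has no BMDMA */
};

struct fake_port {
	struct fake_ioaddr ioaddr;
};

/*
 * Returns the device status; *host_stat receives the BMDMA status,
 * or is left unchanged on a port without BMDMA.
 */
static uint8_t fake_irq_ack(struct fake_port *ap, uint8_t *host_stat)
{
	uint8_t status = *ap->ioaddr.status_addr;

	if (!ap->ioaddr.bmdma_addr)
		return status; /* e.g. pata_platform: no BMDMA block to ack */

	uint8_t bm = ap->ioaddr.bmdma_addr[FAKE_DMA_STATUS];
	/* On real hardware writing these bits back clears them. */
	ap->ioaddr.bmdma_addr[FAKE_DMA_STATUS] = bm | FAKE_DMA_INTR | FAKE_DMA_ERR;
	*host_stat = bm;
	return status;
}
```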


Alan


Re: Ten percent test

2007-04-08 Thread Rene Herman

On 04/08/2007 12:41 PM, Ingo Molnar wrote:

this is pretty hard to get right, and the most objective way to change 
it is to do it testcase-driven. FYI, interactivity tweaking has been 
gradual, the last bigger round of interactivity changes were done a year 
ago:


 commit 5ce74abe788a26698876e66b9c9ce7e7acc25413
 Author: Mike Galbraith <[EMAIL PROTECTED]>
 Date:   Mon Apr 10 22:52:44 2006 -0700

 [PATCH] sched: fix interactive task starvation

(and a few smaller tweaks since then too.)

and that change from Mike responded to a testcase. Mike's latest changes 
(the ones you just tested) were mostly driven by actual testcases too, 
which measured long-term timeslice distribution fairness.


Ah yes, that one. Here's the next one in that series:

commit f1adad78dd2fc8edaa513e0bde92b4c64340245c
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Sun May 21 18:54:09 2006 -0700

Revert "[PATCH] sched: fix interactive task starvation"

It personally had me wonder if _anyone_ was testing this stuff...

Rene.



Re: If not readdir() then what?

2007-04-08 Thread Ulrich Drepper

On 4/7/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:

It's not going to solve anything at all.  We can't stop supporting
functionality that has been there forever.


Not necessarily.

One problem here is that the interface for using readdir() with and
without telldir()/seekdir() is the same.  A second problem is that the
functionality is universally required.  Both of these problems can be
addressed.

For the second problem, I could certainly imagine making the
functionality to use seekdir()/telldir() optional.  It might be
hard in POSIX but this does not mean anything about implementations.
Implementations just have to provide a way to allow these functions to
be used.  It does not mean it always and everywhere has to work.  What
this means is that if, for instance, a filesystem were (for now)
able to have a mount option to not allow seekdir()/telldir(), the
system could still conform to POSIX.  At the same time we can gather
information as to whether seekdir()/telldir() are really needed.  I
personally think the number of apps which depend on this functionality
is minuscule.

Using a mount option isn't the nicest solution, though.  If a
filesystem can support seekdir()/telldir() the better solution from
the userlevel API POV would be to provide a better, alternative
interface.  Maybe an alternative opendir() call (opendir2?) which
takes a second parameter as to whether seeking is needed or not.  Then
this opendir2() function can use a new getdents() syscall and return
the entries.  The difference would be that if the user wants to use
seekdir()/telldir() the userlevel code would cache the old results and
the seekdir()/telldir() handling would be entirely at userlevel.
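A minimal sketch of the user-level caching idea: when the caller says it will seek, every name is read up front and tell/seek become plain indices into that cache. It is built on the existing readdir() rather than the new getdents() call proposed here, and cached_opendir() and its companions are hypothetical names standing in for the opendir2() idea, not real glibc API. The whole directory is held in memory, which is exactly the cost raised elsewhere in this thread for very large directories.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

struct cached_dir {
	char **names;  /* every entry name, in readdir() order */
	size_t count;
	size_t pos;    /* index of the next entry to return */
};

static void cached_closedir(struct cached_dir *cd)
{
	for (size_t i = 0; i < cd->count; i++)
		free(cd->names[i]);
	free(cd->names);
	free(cd);
}

static struct cached_dir *cached_opendir(const char *path)
{
	DIR *d = opendir(path);
	if (!d)
		return NULL;

	struct cached_dir *cd = calloc(1, sizeof(*cd));
	size_t cap = 0;
	struct dirent *e;

	while (cd != NULL && (e = readdir(d)) != NULL) {
		if (cd->count == cap) {
			size_t ncap = cap ? cap * 2 : 16;
			char **p = realloc(cd->names, ncap * sizeof(*p));
			if (p == NULL)
				goto fail;
			cd->names = p;
			cap = ncap;
		}
		cd->names[cd->count] = strdup(e->d_name);
		if (cd->names[cd->count] == NULL)
			goto fail;
		cd->count++;
	}
	closedir(d);
	return cd;

fail:
	cached_closedir(cd);
	closedir(d);
	return NULL;
}

/* Next cached name, or NULL at the end. */
static const char *cached_readdir(struct cached_dir *cd)
{
	return cd->pos < cd->count ? cd->names[cd->pos++] : NULL;
}

/* The "cookie" is just an index, so it can never be stale for this handle. */
static size_t cached_telldir(const struct cached_dir *cd)
{
	return cd->pos;
}

static void cached_seekdir(struct cached_dir *cd, size_t pos)
{
	cd->pos = pos <= cd->count ? pos : cd->count;
}
```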

It's not a good idea to make this the default behavior for the old
opendir() since the vast majority of the current users don't want to
seek and therefore the caching would significantly impact the
performance.  With the extra argument saying when caching is needed
this is no problem anymore.  Over time people would migrate off of
opendir() and towards opendir2() (with some "careful" encouragement)
and the whole problem will go away.

And the best: this is certainly a path I can see being viable for
POSIX.  But it requires that we have
a) established existing practice
b) shown the impact is really low

So, I think it would be great to get started writing this new getdents
call.  Yes, for now it means maintaining two separate versions.  If
all goes well those filesystems which feel a high burden can simply
stop supporting the old syscall or at least the seek functionality.


Re: If not readdir() then what?

2007-04-08 Thread Theodore Tso
On Sun, Apr 08, 2007 at 11:11:20AM -0700, H. Peter Anvin wrote:
> Christoph Hellwig wrote:
> >On Sat, Apr 07, 2007 at 04:36:33PM -0400, Theodore Tso wrote:
> >>this functionality, and it is highly questionable how useful it is,
> >>anyway.  If you use telldir/seekdir and keep the cookie for a long
> >>time, even the POSIX-provided guarantees about files that are created
> >>and deleted between the telldir() and seekdir() points in time makes
> >>its utility highly dubious.
> >
> >It's not going to solve anything at all.  We can't stop supporting
> >functionality that has been there forever. 
> 
> Well, the question is if you can keep the seekdir/telldir cookie around 
> as a pointer -- preferably in userspace, of course.  You would 
> presumably garbage-collect them on closedir() -- there is no other point 
> at which you could.
> 
> I personally suspect that hch is right -- this stuff has been there 
> since time immemorial and it'll be hard or impossible to deprecate it.

You could, but then you're susceptible to a memory allocation attack.
If you have an arbitrarily large directory (say, one with multiple
millions of entries), and the attacker program calls seekdir() after
every single readdir() call, you would then force the kernel to
allocate and then pin arbitrarily large amounts of memory, which as
you point out, as currently specified by the POSIX specification, you
are not allowed to release until closedir().

This could be done in userspace, by forcing glibc to readdir() the
entire directory into memory, at which point seekdir()/telldir() will
work just fine.  But for a really big directory, this could consume a
huge amount of space. 

If we had the 64-byte telldir cookie that I had proposed, then in
userspace we could simply associate that 64-byte telldir cookie with a
small 32-bit integer, either in memory, or in some berkdb or tdb
interface, at least until the use of telldir/seekdir had actually
disappeared.  (Which probably wouldn't take that long; I really doubt
there are that many users of it out there, so it's probably OK if they
suffer a performance penalty if they use this really wretched and
horrible interface.)
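A minimal in-memory sketch of mapping a large cookie onto a small integer, assuming a 64-byte cookie as in the proposal above; struct cookie_table and its helpers are hypothetical. The linear search is a deliberate simplification, and the table grows with every distinct cookie handed out, which is the same memory-pinning concern described earlier in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COOKIE_BYTES 64 /* assumed size of the proposed large cookie */

struct cookie_table {
	unsigned char (*cookies)[COOKIE_BYTES]; /* growable array of cookies */
	uint32_t count;
	uint32_t cap;
};

/* Return a small handle for cookie, or UINT32_MAX if allocation fails. */
static uint32_t cookie_to_handle(struct cookie_table *t,
				 const unsigned char *cookie)
{
	for (uint32_t i = 0; i < t->count; i++)
		if (memcmp(t->cookies[i], cookie, COOKIE_BYTES) == 0)
			return i; /* already known: reuse its handle */

	if (t->count == t->cap) {
		uint32_t ncap = t->cap ? t->cap * 2 : 16;
		void *p = realloc(t->cookies, (size_t)ncap * COOKIE_BYTES);
		if (p == NULL)
			return UINT32_MAX;
		t->cookies = p;
		t->cap = ncap;
	}
	memcpy(t->cookies[t->count], cookie, COOKIE_BYTES);
	return t->count++;
}

/* Map a handle back to its cookie, or NULL if it was never issued. */
static const unsigned char *handle_to_cookie(const struct cookie_table *t,
					     uint32_t handle)
{
	return handle < t->count ? t->cookies[handle] : NULL;
}

static void cookie_table_free(struct cookie_table *t)
{
	free(t->cookies);
	t->cookies = NULL;
	t->count = t->cap = 0;
}
```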

I'll also note, by the way, that there are those who have been much
more cavalier with breaking the wireless interface or the udev/sys
interface after one year.  Not that I would agree with that, but over
some deprecation period measured in years, I think it is possible to
nuke what was a horribly misguided interface that should have never
existed.  Whoever invented it really should receive the brown paper
award for one of the worst design decisions of all time.

- Ted


Re: If not readdir() then what?

2007-04-08 Thread Jörn Engel
On Sun, 8 April 2007 11:11:20 -0700, H. Peter Anvin wrote:
> 
> Well, the question is if you can keep the seekdir/telldir cookie around 
> as a pointer -- preferably in userspace, of course.  You would 
> presumably garbage-collect them on closedir() -- there is no other point 
> at which you could.

Garbage-collecting them on closedir() does not work.  It surprised me as
well, but there seem to be applications that keep the telldir() cookie
around after closedir().  Iirc, "rm -r" was one of them.

Neil, is this correct?

Jörn

-- 
Data dominates. If you've chosen the right data structures and organized
things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming.
-- Rob Pike


man-pages-2.44 is released

2007-04-08 Thread Michael Kerrisk
Gidday,

After a long hiatus, a new man-pages release...

And a happy announcement!  My work on man-pages is now partially supported
by my employer, Google.  Henceforth, something up to 20% [*] of my working
week (depending on other time pressures...) will be spent on man-pages
maintenance.  Thanks, Google!

So...

I've released man-pages-2.44.

This release is now available for download at:

ftp://ftp.kernel.org/pub/linux/docs/manpages
or mirrors: ftp://ftp.XX.kernel.org/pub/linux/docs/manpages

and soon at:

ftp://ftp.win.tue.nl/pub/linux-local/manpages

Cheers,

Michael

[*] http://www.google.com/support/jobs/bin/static.py?page=about.html
http://en.wikipedia.org/wiki/Google#.22Twenty_percent.22_time

===

This release contains a very large number of changes.  Among the changes
that may be of interest to readers of this list are the following:

New pages
-

termio.7
mtk, after a bit of prodding by Reuben Thomas
A brief discussion of the old System V termio interface,
with pointers to pages that will contain the information
that the reader probably wants.

Changes to individual pages
---

access.2
mtk
Since 2.6.20, access() honours the MS_NOEXEC mount flag.

mincore.2
Nick Piggin
Kernel 2.6.21 fixes several earlier bugs in mincore().
mtk
Rewrote various parts to make the page clearer.

mmap.2
mtk
Rewrote and reorganised various parts to be clearer.

mount.2
mtk / Val Henson
Document MS_RELATIME, new in Linux 2.6.20.

semop.2
mtk
If sops contains multiple operations, then these are performed
in array order.  All Unix systems that I know of do this,
and some Linux applications depend on this behaviour.  SUSv3
made no explicit statement here, but SUSv4 will explicitly
require this behaviour.
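To make the ordering rule concrete, here is a small sketch (hypothetical helper names, assuming System V IPC is available). One semop() call carries "+1 then -1" on a semaphore whose value is 0. Processed in array order, the pair succeeds. With the array reversed, the -1 comes first, and because IPC_NOWAIT is set the whole call fails with EAGAIN and leaves the value untouched.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Linux/glibc leave it to the caller to define union semun. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create a private set holding one semaphore
   with value 'initial'.  Returns the set id, or -1 on error. */
int make_sem(int initial)
{
    union semun arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id == -1)
        return -1;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Hypothetical helper: apply two operations to semaphore 0 in a single
   semop() call, "+1 then -1" if up_first is nonzero, otherwise
   "-1 then +1".  Returns 0 on success, otherwise the errno value. */
int up_down(int id, int up_first)
{
    struct sembuf ops[2];
    short first = up_first ? 1 : -1;

    assert(id >= 0);
    ops[0].sem_num = 0;
    ops[0].sem_op = first;
    ops[0].sem_flg = IPC_NOWAIT;
    ops[1].sem_num = 0;
    ops[1].sem_op = (short)-first;
    ops[1].sem_flg = IPC_NOWAIT;
    return semop(id, ops, 2) == 0 ? 0 : errno;
}
```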

ptrace.2
Chuck Ebbert
When the parent receives an event with PTRACE_EVENT_* set,
the child is not in the normal signal delivery path.  This
means the parent cannot do ptrace(PTRACE_CONT) with a signal
or ptrace(PTRACE_KILL).  kill() with a SIGKILL signal can be
used instead to kill the child process after receiving one
of these messages.

time.7
mtk
Since kernel 2.6.20, the software clock can also be 300 HZ.
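Worked out, 300 HZ means a tick of 1/300 s, roughly 3.33 ms (against 4 ms at 250 HZ and 1 ms at 1000 HZ). On much later kernels (2.6.32 onward, so not the ones this release describes) the tick can be estimated from userspace, on the assumption that clock_getres() for CLOCK_MONOTONIC_COARSE reports the tick period, as current Linux kernels do. estimate_hz() is a hypothetical helper.

```c
#define _GNU_SOURCE              /* CLOCK_MONOTONIC_COARSE (Linux-specific) */
#include <assert.h>
#include <time.h>

/* Hypothetical helper: estimate the kernel's HZ from the resolution of
   the coarse monotonic clock, which only advances once per tick.
   Returns the rounded estimate, or -1 on error. */
long estimate_hz(void)
{
    struct timespec res;
    double tick_ns;

    if (clock_getres(CLOCK_MONOTONIC_COARSE, &res) == -1)
        return -1;
    tick_ns = (double)res.tv_sec * 1e9 + (double)res.tv_nsec;
    if (tick_ns <= 0)
        return -1;
    /* e.g. about 3333333 ns -> 300, 4000000 ns -> 250, 1000000 ns -> 1000 */
    return (long)(1e9 / tick_ns + 0.5);
}
```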


-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?  Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.



Re: APIC error on 32-bit kernel

2007-04-08 Thread Jay Cliburn
[Adding linux-kernel to the cc list, hoping for wider exposure.]

On Fri, 23 Mar 2007 20:08:17 -0500
Jay Cliburn <[EMAIL PROTECTED]> wrote:

> We're trying to track down the source of a problem that occurs
> whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4

and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.

> We can load the driver just fine, but whenever we activate the
> network, we see APIC errors (a sample of them are shown here,
> captured from a serial console):
> 
> [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk
> [EMAIL PROTECTED] ~]# [   93.942012] process `sysctl' is using deprecated
> sysctl (sysc.
> [   94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> [   94.498887] APIC error on CPU0: 00(08)
> [   94.498534] APIC error on CPU1: 00(08)
> [   94.550079] APIC error on CPU0: 08(08)
> [   94.549725] APIC error on CPU1: 08(08)
> [   94.600915] APIC error on CPU1: 08(08)
> [   94.601276] APIC error on CPU0: 08(08)
> [   94.652108] APIC error on CPU1: 08(08)
> [   94.652470] APIC error on CPU0: 08(08)
> [   94.703659] APIC error on CPU0: 08(08)
> [   94.703305] APIC error on CPU1: 08(08)
> [   94.754852] APIC error on CPU0: 08(40)
> [   94.806045] APIC error on CPU0: 40(08)
> [   94.805692] APIC error on CPU1: 08(08)
> [   94.857238] APIC error on CPU0: 08(08)
> [   94.856884] APIC error on CPU1: 08(08)
> [   94.908432] APIC error on CPU0: 08(08)
> [   94.908078] APIC error on CPU1: 08(08)
> [snip, more of the same]
> [   98.901156] APIC error on CPU1: 08(08)
> [   98.952702] APIC error on CPU0: 08(08)
> [   98.952349] APIC error on CPU1: 08(08)
> [   99.003895] APIC error on CPU0: 08(08)
> [   99.003542] APIC error on CPU1: 08(08)
> 
> The machine hangs for about 5-10 seconds, then spontaneously reboots
> without further console output.

I can prompt an oops by pinging my router while the apic errors are
scrolling by.

> 
> This is an Asus M2V (Via K8T890) motherboard.
> 
> The problem does not occur on a 32-bit kernel if we boot with
> pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> motherboard.
> 
> We also do not see this problem on Intel-based motherboards, with
> either 32- or 64-bit kernels.
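For reading the log above, a small decoder may help. This is a sketch under assumptions: it takes the message to be the one printed by the 2.6-era i386 smp_error_interrupt(), "APIC error on CPU%d: %02lx(%02lx)", where the first value is the Error Status Register as read on entry and the one in parentheses is read back after the register is rewritten, and it uses the bit meanings listed in that function's comment. Under that reading, 08 is a receive accept error and 40 a received illegal vector. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ESR bit meanings as listed in the 2.6-era smp_error_interrupt()
   comment; bit 4 is reserved. */
static const char *const esr_bit_names[8] = {
    "send checksum error",     "receive checksum error",
    "send accept error",       "receive accept error",
    "reserved",                "send illegal vector",
    "received illegal vector", "illegal register address",
};

/* Hypothetical helper: parse "... APIC error on CPU<n>: <hex>(<hex>)".
   Returns 0 and fills the outputs on success, -1 otherwise. */
int parse_apic_error(const char *line, int *cpu,
                     unsigned *before, unsigned *after)
{
    const char *p = strstr(line, "APIC error on CPU");

    if (p == NULL)
        return -1;
    return sscanf(p, "APIC error on CPU%d: %x(%x)",
                  cpu, before, after) == 3 ? 0 : -1;
}

/* Hypothetical helper: write the names of the bits set in 'esr' into
   'out', separated by ", "; stops early if 'out' fills up. */
void describe_esr(unsigned esr, char *out, size_t size)
{
    size_t used = 0;

    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (int bit = 0; bit < 8; bit++) {
        int n;

        if (!(esr & (1u << bit)))
            continue;
        n = snprintf(out + used, size - used, "%s%s",
                     used ? ", " : "", esr_bit_names[bit]);
        if (n < 0 || (size_t)n >= size - used)
            break;
        used += (size_t)n;
    }
}
```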

A full raft of documentation -- including acpidump and
linux-firmware-kit output, console capture, kernel config, lspci -vvxxx
(with apic=debug boot option), dmesg, and /proc/interrupts -- is
available at http://www.hogchain.net/m2v/apic-problem/

If this is a motherboard problem, that's fine; I'd just like to know
the details so I can tell users something more than "it's a motherboard
problem."

Thanks,
Jay


Re: SD scheduler testing hitch

2007-04-08 Thread Al Boldi
Mike Galbraith wrote:
> On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
> > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> > here every time.  (ran as taskset -c 1 nice -n -10 ./fairtest)  The
> > starving 10% duty cycle task has trouble getting 1% CPU.
>
> Hmm.  Playing with it some more today, it still happens, but it's not
> very repeatable.  Something is odd.  I wonder if any SD using readers
> will try it.

Tried it on mainline 2.6.20.3.
It's not easily repeatable, but it's got the same problem.

top - 21:21:45 up 27 min,  0 users,  load average: 0.80, 0.43, 0.20
Tasks:  45 total,   3 running,  42 sleeping,   0 stopped,   0 zombie
Cpu(s):  24.3% user,   0.5% system,   0.0% nice,  75.0% idle,   0.2% IO-wait
Mem:    499488k total,    27352k used,   472136k free,     1996k buffers
Swap:  1020088k total,        0k used,  1020088k free,     9160k cached

  PID  PR  NI  VIRT  RES  SHR  SWAP nFLT nDRT WCHAN     S %CPU    TIME+  Command
  688  25   0  1804  412  352  1392    0    0 rest_init R 94.7  2:37.01  fairtest
  689  15   0  1804  264  204  1540    0    0 rest_init R  0.0  0:00.79  fairtest
    1  15   0  1440  500  444   940   15    0 rest_init S  0.0  0:00.73  init
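fairtest's source is not part of this thread, so the following is only a hypothetical illustration of the kind of load being discussed: a task that computes for a short burst (500us in Mike's run) and then sleeps for nine times as long, a nominal 10% duty cycle, and reports the CPU share it actually obtained. The function names are made up for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read clock 'clk' in seconds. */
static double seconds(clockid_t clk)
{
    struct timespec ts;

    clock_gettime(clk, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Hypothetical load generator: 'cycles' periods of busy_us of CPU work
   followed by 9 * busy_us of sleep, i.e. a nominal 10% duty cycle.
   Returns the CPU share actually obtained (CPU time / wall time). */
double run_duty_cycle(int cycles, long busy_us)
{
    assert(cycles > 0 && busy_us > 0);

    long sleep_ns = 9L * busy_us * 1000L;
    struct timespec nap = { sleep_ns / 1000000000L, sleep_ns % 1000000000L };
    double wall0 = seconds(CLOCK_MONOTONIC);
    double cpu0 = seconds(CLOCK_PROCESS_CPUTIME_ID);

    for (int i = 0; i < cycles; i++) {
        /* Spin until this thread has consumed busy_us of CPU time, so a
           starved task takes longer per period instead of doing less. */
        double until = seconds(CLOCK_THREAD_CPUTIME_ID) + busy_us / 1e6;

        while (seconds(CLOCK_THREAD_CPUTIME_ID) < until)
            ;
        nanosleep(&nap, NULL);
    }
    return (seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0) /
           (seconds(CLOCK_MONOTONIC) - wall0);
}
```

Running it next to a CPU hog pinned to the same CPU (for example with taskset -c 1, as in Mike's command line), at different nice levels, and comparing the returned share with the nominal 0.1 is one way to look for the starvation described above.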


Thanks!

--
Al



[PATCH] worker_thread: fix racy try_to_freeze() usage

2007-04-08 Thread Oleg Nesterov
worker_thread() can miss freeze_process()->signal_wake_up() if it happens
between try_to_freeze() and prepare_to_wait(). We should check freezing()
before entering schedule().
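The race is the classic lost-wakeup pattern: a condition tested before the task registers as a waiter can change, and be signalled, in the gap before it sleeps. As a userspace analogue only (pthreads, not kernel code; every name below is hypothetical), the fix corresponds to re-testing every wake-up condition, the freeze request included, after becoming a waiter and immediately before sleeping. Here that happens under the mutex that pthread_cond_wait() releases atomically, so a freeze request made at any earlier point cannot be missed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a freezable worker thread:
   'freeze_req' stands in for freezing(current), 'work' for a
   non-empty cwq->worklist, 'wake' for the cwq->more_work queue. */
struct worker {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool freeze_req, frozen, stop;
    int work, done;
};

void *worker_loop(void *arg)
{
    struct worker *w = arg;

    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Test every reason to wake, the freeze request included,
           immediately before sleeping and under the lock, much as the
           patch tests freezing() after prepare_to_wait(). */
        while (!w->freeze_req && !w->stop && w->work == 0)
            pthread_cond_wait(&w->wake, &w->lock);

        if (w->freeze_req) {        /* stands in for try_to_freeze() */
            w->frozen = true;
            pthread_cond_broadcast(&w->wake);
            while (w->freeze_req)
                pthread_cond_wait(&w->wake, &w->lock);
            w->frozen = false;
        }
        while (w->work > 0) {       /* stands in for run_workqueue() */
            w->work--;
            w->done++;
        }
        if (w->stop)
            break;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Request a freeze and return once the worker has parked. */
void freeze_worker(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->freeze_req = true;
    pthread_cond_broadcast(&w->wake);
    while (!w->frozen)
        pthread_cond_wait(&w->wake, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

void thaw_worker(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->freeze_req = false;
    pthread_cond_broadcast(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

void add_work(struct worker *w, int n)
{
    pthread_mutex_lock(&w->lock);
    w->work += n;
    pthread_cond_broadcast(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

int done_count(struct worker *w)
{
    int n;

    pthread_mutex_lock(&w->lock);
    n = w->done;
    pthread_mutex_unlock(&w->lock);
    return n;
}

/* Ask the worker to exit after draining its queue, then join it. */
void stop_worker(struct worker *w, pthread_t t)
{
    pthread_mutex_lock(&w->lock);
    w->stop = true;
    pthread_cond_broadcast(&w->wake);
    pthread_mutex_unlock(&w->lock);
    pthread_join(t, NULL);
}
```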

This race was introduced by me in

[PATCH 1/1] workqueue: don't migrate pending works from the dead CPU

Looks like mm/vmscan.c:kswapd() has the same race.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5-mm4/kernel/workqueue.c~wq_freeze	2007-04-07 20:11:14.0 +0400
+++ 2.6.21-rc5-mm4/kernel/workqueue.c	2007-04-08 21:37:43.0 +0400
@@ -307,14 +307,14 @@ static int worker_thread(void *__cwq)
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
 	for (;;) {
-		if (cwq->wq->freezeable)
-			try_to_freeze();
-
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
-		if (!cwq->should_stop && list_empty(&cwq->worklist))
+		if (!freezing(current) && !cwq->should_stop
+				&& list_empty(&cwq->worklist))
 			schedule();
 		finish_wait(&cwq->more_work, &wait);
 
+		try_to_freeze();
+
 		if (cwq_should_stop(cwq))
 			break;
 



[PATCH] zap_other_threads: remove unneeded ->exit_signal change

2007-04-08 Thread Oleg Nesterov
We already depend on the fact that all sub-threads have ->exit_signal == -1,
so there is no need to set it in zap_other_threads().

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5-mm4/kernel/signal.c~zat	2007-04-07 20:11:14.0 +0400
+++ 2.6.21-rc5-mm4/kernel/signal.c	2007-04-08 22:09:20.0 +0400
@@ -1163,17 +1163,6 @@ void zap_other_threads(struct task_struc
 		if (t->exit_state)
 			continue;
 
-		/*
-		 * We don't want to notify the parent, since we are
-		 * killed as part of a thread group due to another
-		 * thread doing an execve() or similar. So set the
-		 * exit signal to -1 to allow immediate reaping of
-		 * the process.  But don't detach the thread group
-		 * leader.
-		 */
-		if (t != p->group_leader)
-			t->exit_signal = -1;
-
 		/* SIGKILL will be handled before any pending SIGSTOP */
 		sigaddset(&t->pending.signal, SIGKILL);
 		signal_wake_up(t, 1);



Re: Reiser4. BEST FILESYSTEM EVER.

2007-04-08 Thread Jeff Mahoney
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theodore Tso wrote:
> The reason why I ignore the tar+gzip tests is that in the past Hans
> has rigged the test by using a tar ball which was generated by
> unpacking a set of kernel sources on a reiser4 filesystem, and then
> repacking them using tar+gzip.  The result was a tar file whose files
> were optimally laid out so that reiser4 could insert them into the
> filesystem b-tree without doing any extra work.
> 
> I can't say for sure whether or not this set of benchmarks has done
> this (there's not enough information describing the benchmark setup),
> but the sad fact of the matter is that people trying to pitch Reiser4
> have generated for themselves a reputation for using rigged
> benchmarks.  Hans's use of a carefully stacked and ordered tar file
> (which is the same as stacking a deck of cards), and your repeated use
> of the bonnie++ benchmarks despite being told that the result is
> meaningless, given the fact that, well, zeros compress very well and
> most people are not interested in storing files of all zeros, have
> caused me to look at any benchmarks cited by Reiser4 partisans with a
> very jaundiced and skeptical eye.
> 
> Fortunately for you, it's not up to me whether or not Reiser4 makes it
> into the kernel.  And if it works for you, hey, go wild.  You can
> always patch it into your own kernel and encourage others to do the
> same with respect to getting it tested and adopted.  My personal take
> on it is that Reiser3, Reiser4 and JFS suffer the same problems, which
> is to say they have a very small and limited development community,
> and this was referenced in Novell's decision to drop Reiser3:
> 
> http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-default-fs/
> 
> SuSE has deprecated Reiser3 *and* JFS, and I believe quite strongly that
> the failure of the organizations to attract a diverse development
> community is ultimately what doomed them in the long term, both in
> terms of support as the kernel migrated and new feature support.  It
> is for that reason that Hans' personality traits, which tend to drive
> away those developers who would help them beyond those that he hires,
> have been so self-destructive to Reiser4.  Read the announcement by
> Jeff Mahoney from SUSE Labs again; what he pointed out was
> that reiser3 was getting dropped even though it performs better than
> ext3 in some scenarios.  There are many other considerations, such as
> a filesystem's robustness in case of on-disk corruption, long term
> maintenance as the kernel evolves, availability of developers to
> provide bug fixes, how well the system performs on systems with
> multiple cores/CPUs, etc.

Those are all arguments I've made and still stand by, but I should
address one point that has been repeated fairly often. Novell _isn't_
dropping support for Reiser3 in any of our products. The change only
refers to the choice of a default file system. Most users don't care
about which file system they use, and those that do are still free to
choose reiser3 if they want it. We'll support it and I still have
patches under development to improve it.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFGGTHYLPWxlyuTD7IRAj0SAJ4txD5NoStOA4GFgkzcXDdE/Xf9ngCZATNL
QtyNTGbi6YFbNF71T5C9hTA=
=Emwr
-----END PGP SIGNATURE-----


Re: If not readdir() then what?

2007-04-08 Thread H. Peter Anvin

Christoph Hellwig wrote:
> On Sat, Apr 07, 2007 at 04:36:33PM -0400, Theodore Tso wrote:
>> this functionality, and it is highly questionable how useful it is,
>> anyway.  If you use telldir/seekdir and keep the cookie for a long
>> time, even the POSIX-provided guarantees about files that are created
>> and deleted between the telldir() and seekdir() points in time make
>> its utility highly dubious.
>
> It's not going to solve anything at all.  We can't stop supporting
> functionality that has been there forever.

Well, the question is whether you can keep the seekdir/telldir cookie around
as a pointer -- preferably in userspace, of course.  You would
presumably garbage-collect them on closedir() -- there is no other point
at which you could.


I personally suspect that hch is right -- this stuff has been there 
since time immemorial and it'll be hard or impossible to deprecate it.
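For reference, the usage POSIX clearly does support: a telldir() cookie taken on an open DIR stream and handed back to seekdir() on that same stream before closedir(). A minimal sketch, assuming glibc (the helper name is hypothetical); it checks that seeking back to the cookie re-reads the same entry. A cookie still held after closedir() is outside what this sketch relies on.

```c
#define _DEFAULT_SOURCE          /* telldir()/seekdir() on glibc */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: skip one entry of 'path', remember the position
   with telldir(), read the next entry, drain the stream, then seekdir()
   back and check that the same entry comes back.  Returns 1 if it does,
   0 if not, -1 if the directory cannot be read. */
int telldir_roundtrip(const char *path)
{
    struct dirent *e;
    char name[sizeof e->d_name];
    long cookie;
    int same;
    DIR *d;

    assert(path != NULL);
    d = opendir(path);
    if (d == NULL)
        return -1;
    if (readdir(d) == NULL) {            /* move past the first entry */
        closedir(d);
        return -1;
    }
    cookie = telldir(d);
    if ((e = readdir(d)) == NULL) {
        closedir(d);
        return -1;
    }
    snprintf(name, sizeof name, "%s", e->d_name);
    while (readdir(d) != NULL)           /* run to the end of the stream */
        ;
    seekdir(d, cookie);                  /* same stream, still open */
    e = readdir(d);
    same = e != NULL && strcmp(e->d_name, name) == 0;
    closedir(d);                         /* 'cookie' is not used past here */
    return same;
}
```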


-hpa


Re: need help

2007-04-08 Thread Bill Davidsen

vjn wrote:
> in my project i want to code the kernel such that when i plugged my usb it
> should ask for password and check it in the kernel space . can anyone help
> me

I think the correct solution is to use an encrypted mount, and issue the
mount command manually, with the password prompt in user space. There's no
kernel code to ask for input, nor any way to positively decide which connected
terminal is the terminal to ask.
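A minimal userspace sketch of the prompt side, assuming a POSIX terminal interface; read_password() is a hypothetical helper, and handing the result to whatever sets up the encrypted mount is left out. It turns echo off with tcsetattr(), reads one line, and restores the saved settings. On a descriptor that is not a terminal, tcgetattr() fails and so does the helper, which matches the point that there is no reliable "right terminal" from the kernel's side.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: read one line from terminal 'fd' with echo off.
   Stores at most size - 1 bytes plus a NUL in 'buf' (newline dropped).
   Returns the number of bytes stored, or -1 on error, including when
   'fd' is not a terminal (tcgetattr() then fails with ENOTTY). */
ssize_t read_password(int fd, char *buf, size_t size)
{
    struct termios saved, quiet;
    size_t n = 0;
    ssize_t r = 0;
    char c;

    assert(buf != NULL && size > 0);
    if (tcgetattr(fd, &saved) == -1)
        return -1;
    quiet = saved;
    quiet.c_lflag &= ~(tcflag_t)ECHO;
    if (tcsetattr(fd, TCSAFLUSH, &quiet) == -1)
        return -1;
    while (n + 1 < size && (r = read(fd, &c, 1)) == 1 && c != '\n')
        buf[n++] = c;
    buf[n] = '\0';
    tcsetattr(fd, TCSAFLUSH, &saved);    /* always restore echo */
    return r == -1 ? -1 : (ssize_t)n;
}
```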


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: Ten percent test

2007-04-08 Thread Gene Heskett
On Sunday 08 April 2007, Mike Galbraith wrote:
>On Sun, 2007-04-08 at 13:40 +0200, Mike Galbraith wrote:
>> On Sun, 2007-04-08 at 07:33 -0400, Gene Heskett wrote:
>> > That seems to be the killer loading here, building a kernel (make
>> > -j3) doesn't seem to lag it all that bad.  One session of gzip --best
>> > makes it fall plumb over though, which was a disappointment.
>>
>> Can you make a testcase that doesn't require amanda?
>
>Or at least send me a couple of 5 or 10 second top snapshots (which also
>show CPU usage of sleeping tasks) while the system is misbehaving?
>
>   -Mike

With what monitor utility?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"Microsoft technology" -- isn't that an oxymoron?
   -- Gareth Barnard

