date:20050222

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Guillaume Thouvenin

On Tue, 2005-02-22 at 12:11 -0800, Jay Lan wrote:
>  How does ELSA add per-process accounting data
> to your grouping (banks) when a process exits? How do you save
> accounting data you need in task_struct before it is disposed? BSD
> handles that through acct_process() hook at do_exit(). CSA also
> depends on a hook at do_exit() to merge per-process data to per-job
> data. How does ELSA handle this without a need of a do_exit() hook?

  There are three parts in ELSA. 

  There is a job daemon that does process aggregation. It needs a hook
in the do_fork() routine to be able to manage groups of processes. So
this part handles process aggregation by maintaining a complete picture
of the process/thread hierarchy.

  You can interact with the job daemon with classical IPC and message
operations. Thus we wrote a second part that is the interface between
the user and the job daemon. Through this interface you can add a
process to a group or remove it from one, stop the job daemon, and
dump information about the current groups of processes to a file.

  This file (which contains information about the groups of processes) is
used by ELSA, together with the accounting file produced by BSD accounting
(enabled with the accton(8) command), to provide per-group accounting. So
the third part of ELSA is a parser and also an analyzer.
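
  As a rough illustration of what that third part has to do (this is
not ELSA's code; the record layouts and the names proc_acct, membership
and group_cpu_ticks are invented for the example), combining the two
files amounts to a join on the PID:

```c
#include <stddef.h>

/* Hypothetical record shapes; the real BSD acct and ELSA dump formats differ. */
struct proc_acct {              /* one per exited process, from the accounting file */
    int pid;
    unsigned long cpu_ticks;
};

struct membership {             /* one per (process, group) pair, from the daemon's dump */
    int pid;
    int group_id;
};

/* Sum cpu_ticks over every accounted process that is a member of group_id.
 * A process listed in several groups is counted once in each of them. */
unsigned long group_cpu_ticks(const struct proc_acct *acct, size_t n_acct,
                              const struct membership *m, size_t n_m,
                              int group_id)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n_m; i++) {
        if (m[i].group_id != group_id)
            continue;
        for (size_t j = 0; j < n_acct; j++)
            if (acct[j].pid == m[i].pid)
                total += acct[j].cpu_ticks;
    }
    return total;
}
```

  The sketch ignores PID reuse, which a real analyzer would have to
handle, for instance by also matching on process start time.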

  The architecture of ELSA is as follows (I hope that the ASCII picture
will be readable):

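                  KERNEL | USER SPACE
                         |
 ---------------------   |             -----------------
 | 1. Fork connector |   |   Netlink   | 2. Job Daemon |
 |                   |---------------->|               |
 ---------------------   |             -----------------
                         |                    ^
                         |                    | IPC    ---------------------
                         |                    -------->| 3. Interface      |
                         |                             |   (webmin, ...)   |---
                         |                         --->|                   |   |
                         |                         |   ---------------------   |
                                                   |                     Per-group of
                                            Accounting File              processes
                                            (see accton(8))              accounting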

You can see how it works on the following web page:
http://elsa.sourceforge.net/sample_session.html
In the session we're using fork_history.ko, which will be replaced by
the fork hook connector.

Best regards,
Guillaume

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Andrew Morton

Kaigai Kohei <[EMAIL PROTECTED]> wrote:
>
>  The common agreement for the method of dealing with process aggregation
>  has not been constructed yet, I understood. And, we will not be able to
>  integrate each process aggregation model because of its diverseness.
> 
>  For example, a process which belongs to JOB-A must not belong to any other
>  'JOB-X' in the CSA model. But in the ELSA model, a process in BANK-B can
>  concurrently belong to BANK-B1, which is a child of BANK-B.
> 
>  And there are other differences:
>  Is a process permitted not to belong to any process aggregation?
>  Should a process aggregation be inherited by child processes or not?
>  (It might not be inherited in a rule-based process aggregation like CKRM.)
> 
>  Some process-aggregation models have their own philosophy and implementation,
>  so they are hard to integrate. Thus, I think a common 'fork/exec/exit' event
>  handling framework is necessary to implement any kind of process aggregation.

We really want to avoid doing such stuff in-kernel if at all possible, of
course.

Is it not possible to implement the fork/exec/exit notifications to
userspace so that a daemon can track the process relationships and perform
aggregation based upon individual tasks' accounting?  That's what one of
the accounting systems is proposing doing, I believe.

(In fact, why do we even need the notifications?  /bin/ps can work this
stuff out).
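
For illustration, the /bin/ps approach boils down to reading each
/proc/<pid>/stat and pulling out the parent PID. A minimal sketch of that
parsing step follows; the function name stat_line_ppid is made up here, and
the only assumption is the usual "pid (comm) state ppid ..." layout of that
file:

```c
#include <stdio.h>
#include <string.h>

/* Extract the parent PID from one line of /proc/<pid>/stat.
 * Returns the ppid, or -1 if the line cannot be parsed. */
int stat_line_ppid(const char *line)
{
    /* The command name is wrapped in parentheses and may itself
     * contain ')', so search for the last one. */
    const char *p = strrchr(line, ')');
    char state;
    int ppid;

    if (p == NULL)
        return -1;
    if (sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return ppid;
}
```

A scanner built on this only sees processes that are alive at the moment it
reads /proc.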


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Kaigai Kohei

Hi, Thanks for your comments.
>> I think there are two issues about system accounting framework.
>>
>> Issue: 1) How to define the appropriate unit for accounting ?
>> Current BSD-accounting makes a collection of per-process accounting
>> information.
>> CSA additionally makes a collection of per-process-aggregation accounting.
>
>
> The 'enhanced acct data collection' patches that were added to
> 2.6.11-rc* tree still do collection of per process data.
Hmm, I had not noticed this extension, but I have now checked it.
Your following two patches implement the enhanced data collection, don't they?

- ChangeLog for 2.6.11-rc1
  [PATCH] enhanced I/O accounting data patch
  [PATCH] enhanced Memory accounting data collection

Since per-process accounting data collection has been unified into the stock
kernel, I want to discuss the remaining half: "How to define the appropriate
unit for accounting ?"

We can agree that per-process accounting alone is too rigid, I think.
So process aggregation should be provided in one way or another.

[1] Is a 'fork/exec/exit' event handling framework necessary ?

The common agreement for the method of dealing with process aggregation
has not been constructed yet, I understood. And, we will not be able to
integrate each process aggregation model because of its diverseness.

For example, a process which belongs to JOB-A must not belong to any other
'JOB-X' in the CSA model. But in the ELSA model, a process in BANK-B can
concurrently belong to BANK-B1, which is a child of BANK-B.

And there are other differences:
Is a process permitted not to belong to any process aggregation?
Should a process aggregation be inherited by child processes or not?
(It might not be inherited in a rule-based process aggregation like CKRM.)

Some process-aggregation models have their own philosophy and implementation,
so they are hard to integrate. Thus, I think a common 'fork/exec/exit' event
handling framework is necessary to implement any kind of process aggregation.

[2] What implementation should be adopted ?

I think registerable hooks on fork/execve/exit are necessary, not only an
exit() hook, because a rule- or policy-based process-aggregation model
needs to catch the transitions of a process's status.
Hooking only the exit() event might be enough for process accounting,
but it is not kind to other users.
Thus, I recommend SGI's PAGG.

In my understanding, the reason such a framework has not been included is
the worry that unidentifiable (proprietary) modules would increase.
But an SI could, for example, divert LSM to implement process aggregation
if they ignored LSM's original purpose.
# I'm strongly opposed to such a movement as an SELinux user :-)
So, I think such fork/execve/exit hooks are harmless now.

Is this the time to unify it?

Thanks.

> CSA added those per-process data to per-aggregation ("job") data
> structure at do_exit() time when a process terminates.
>
>>
>> It is appropriate to make a fork-exit event handling framework, such
>> as PAGG, for the definition of the process-aggregation.
>>
>> This system-accounting per process-aggregation is quite useful,
>> though I only tried SGI's implementation named 'job' in past days.
>>
>>
>> Issue: 2) What items should be collected for accounting information ?
>> BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
>> minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
>> Integral-VM/RSS, and amount of block-I/O additionally.
>
>
> These data are now collected in 2.6.11-rc* code. Note that these data
> are still per-process.
>
>>
>> I think it's hard to implement the accounting-engine as a kernel loadable
>> module using any kind of framework, because we must put callback
>> functions all around the kernel for this purpose.
>>
>> Thus, I make a proposal as follows:
>> We should separate the process-aggregation functionality from the
>> collection of accounting information.
>
>
> I totally agree with this! Actually that was what we have done. The data
> collection part of code has been unified.
>
>> Something of framework to implement process-aggregation is necessary.
>> And, making a collection of accounting information should be merged
>> into BSD-accounting and implemented as a part of monolithic kernel
>> as Guillaume said.
>
>
> This sounds good. I am interested in learning how ELSA saves off
> the per-process accounting data before the data got disposed. If
> that scheme works for CSA, we would be very happy to adopt the
> scheme. The current BSD scheme is very insufficient. The code is
> very BSD centric and it provides no way to handle process-aggregation.
>
> Thanks,
>  - jay
>
>>
>> Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

[PATCH] Always send siginfo for synchronous signals

2005-02-22 Thread Jeremy Fitzhardinge

Valgrind is critically dependent on getting siginfo with its synchronous
(caused by an instruction fault) signals; if it gets, say, a SIGSEGV
which doesn't have siginfo, it must terminate ASAP because it really
can't make any more progress without knowing what caused the SIGSEGV.

The trouble is that if some other completely unrelated program the user
is running at the time builds up a large queue of pending signals for
some reason (as KDE seems to on SuSE 9.2), it will cause Valgrind to
fail for that user, apparently inexplicably.

It seems to me that the kernel should always deliver siginfo with
synchronous fault signals (SIGSEGV, SIGBUS, SIGFPE, SIGTRAP, SIGILL).
They can't ever be blocked (because they're unconditionally fatal if you
do block them), and therefore never queued.  By definition, the task
which causes the signal is running at the time, and so can be
immediately delivered a siginfo without having to allocate one (except
temporarily).
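
As a userspace illustration of the signal classification described above
(not kernel code; SIG_BIT, SYNC_FAULT_MASK and is_sync_fault_signal are
names invented for this sketch), the set of synchronous fault signals can
be tested with a simple bitmask:

```c
#include <signal.h>

/* One bit per classic signal number, counting from signal 1. */
#define SIG_BIT(sig)  (1UL << ((sig) - 1))

/* The five fault signals named in the text. */
#define SYNC_FAULT_MASK \
    (SIG_BIT(SIGSEGV) | SIG_BIT(SIGBUS) | SIG_BIT(SIGILL) | \
     SIG_BIT(SIGFPE)  | SIG_BIT(SIGTRAP))

/* Nonzero if sig is one of the synchronous fault signals. */
int is_sync_fault_signal(int sig)
{
    /* Guard the shift: only the classic signals 1..31 are candidates. */
    if (sig < 1 || sig > 31)
        return 0;
    return (SIG_BIT(sig) & SYNC_FAULT_MASK) != 0;
}
```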

Proposed patch against 2.6.11-rc4 attached.

J

If we're sending a signal relating to a faulting instruction, then
always generate siginfo for that signal.  

If the user has some unrelated process which has managed to consume
the user's entire allocation of siginfo, then signals will start being
delivered without siginfo.  Some programs absolutely depend on getting
siginfo for signals like SIGSEGV, and get very confused if they see a
SEGV without siginfo.

Such signals cannot be blocked (they're immediately fatal if they
are), and therefore cannot be queued.  There's therefore no risk of
resource starvation.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

Index: local-2.6/kernel/signal.c
===================================================================
--- local-2.6.orig/kernel/signal.c	2005-02-22 20:35:30.0 -0800
+++ local-2.6/kernel/signal.c	2005-02-22 20:43:16.0 -0800
@@ -136,6 +136,10 @@ static kmem_cache_t *sigqueue_cachep;
 #define SIG_KERNEL_IGNORE_MASK (\
 M(SIGCONT)   |  M(SIGCHLD)   |  M(SIGWINCH)  |  M(SIGURG))
 
+#define SIG_KERNEL_SYNC_MASK (\
+	M(SIGSEGV)   |  M(SIGBUS)| M(SIGILL) |  M(SIGFPE)| \
+	M(SIGTRAP) )
+
 #define sig_kernel_only(sig) \
 		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_ONLY_MASK))
 #define sig_kernel_coredump(sig) \
@@ -144,6 +148,8 @@ static kmem_cache_t *sigqueue_cachep;
 		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_IGNORE_MASK))
 #define sig_kernel_stop(sig) \
 		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_STOP_MASK))
+#define sig_kernel_sync(sig) \
+		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_SYNC_MASK))
 
 #define sig_user_defined(t, signr) \
 	(((t)->sighand->action[(signr)-1].sa.sa_handler != SIG_DFL) &&	\
@@ -260,11 +266,12 @@ next_signal(struct sigpending *pending, 
 	return sig;
 }
 
-static struct sigqueue *__sigqueue_alloc(struct task_struct *t, int flags)
+static struct sigqueue *__sigqueue_alloc(struct task_struct *t, int flags, int always)
 {
 	struct sigqueue *q = NULL;
 
-	if (atomic_read(&t->user->sigpending) <
+	if (always || 
+	atomic_read(&t->user->sigpending) <
 			t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
 		q = kmem_cache_alloc(sigqueue_cachep, flags);
 	if (q) {
@@ -777,6 +784,7 @@ static int send_signal(int sig, struct s
 {
 	struct sigqueue * q = NULL;
 	int ret = 0;
+	int always;
 
 	/*
 	 * fast-pathed signals for kernel-internal things like SIGSTOP
@@ -785,6 +793,13 @@ static int send_signal(int sig, struct s
 	if ((unsigned long)info == 2)
 		goto out_set;
 
+	/* Always attempt to send siginfo with an unblocked
+	   fault-generated signal. */
+	always = sig_kernel_sync(sig) &&
+		!sigismember(&t->blocked, sig) &&
+		(unsigned long)info > 2 &&
+		info->si_code > SI_USER;
+
 	/* Real-time signals must be queued if sent by sigqueue, or
 	   some other real-time mechanism.  It is implementation
 	   defined whether kill() does so.  We attempt to do so, on
@@ -793,7 +808,7 @@ static int send_signal(int sig, struct s
 	   make sure at least one signal gets delivered and don't
 	   pass on the info struct.  */
 
-	q = __sigqueue_alloc(t, GFP_ATOMIC);
+	q = __sigqueue_alloc(t, GFP_ATOMIC, always);
 	if (q) {
 		list_add_tail(&q->list, &signals->list);
 		switch ((unsigned long) info) {
@@ -1316,7 +1331,7 @@ struct sigqueue *sigqueue_alloc(void)
 {
 	struct sigqueue *q;
 
-	if ((q = __sigqueue_alloc(current, GFP_KERNEL)))
+	if ((q = __sigqueue_alloc(current, GFP_KERNEL, 0)))
 		q->flags |= SIGQUEUE_PREALLOC;
 	return(q);
 }

Re: reading the same entropy twice

2005-02-22 Thread Matt Mackall

On Tue, Feb 22, 2005 at 04:55:39PM -0500, Bob O'Neill wrote:
> Hello.
> 
> I have noticed that it is possible on an SMP box for two processes to
> simultaneously read the same entropy out of /dev/urandom.  This
> doesn't seem right to me.  I was using the entropy value to generate a
> random number to use as a session ID, so occasionally there would be a
> collision on session IDs, causing a login failure as session IDs are
> required to be unique.  This issue does not appear to be related to
> entropy depletion.
> 
> Could you provide me with some insight into why this is the case, if
> it is intentional?  It seems like it could be addressed with a
> spinlock.

This should be fixed in current kernels.
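
For readers wondering what the spinlock suggestion amounts to, here is a
toy userspace sketch. It is not the kernel's random driver and not the fix
that was merged; the pool is a plain linear congruential generator with no
cryptographic value, and pool_lock, pool_state and pool_extract are invented
names. The point is only that reading and advancing the shared state happen
inside one critical section, so two callers cannot be handed the same
output:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for a shared pool: NOT cryptographic. */
static atomic_flag pool_lock = ATOMIC_FLAG_INIT;
static uint64_t pool_state = 0x9E3779B97F4A7C15ULL;

uint64_t pool_extract(void)
{
    uint64_t out;

    /* Spin until we own the lock. */
    while (atomic_flag_test_and_set_explicit(&pool_lock, memory_order_acquire))
        ;

    /* Advance and read the state as one critical section. */
    pool_state = pool_state * 6364136223846793005ULL + 1442695040888963407ULL;
    out = pool_state;

    atomic_flag_clear_explicit(&pool_lock, memory_order_release);
    return out;
}
```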

-- 
Mathematics is the supreme nostalgia of our time.

Re: Warning of redefined NR_OPEN

2005-02-22 Thread Chris Wright

* Marcel Holtmann ([EMAIL PROTECTED]) wrote:
> when compiling the latest 2.6 tree from the Bitkeeper repository, I get
> a lot of these:
> 
>   CC  init/main.o
> In file included from include/linux/fs.h:202,
>  from include/linux/proc_fs.h:6,
>  from init/main.c:17:
> include/linux/limits.h:4:1: warning: "NR_OPEN" redefined

Yup, although it's been fixed in bk.

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin

On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote:

> I was also just reminded that we walk these damn pagetables completely
> twice on every exit: once to unmap the VMAs' pte mappings, and once again
> to zap the page tables.  It might be fruitful to explore combining
> those two steps, perhaps not.
> 

I'm going to have a look at refcounting page table pages, which
will hopefully allow us to get back (and more) the clear_page_range
overhead introduced by the aggressive page table freeing.

It may also allow nice things like dropping file backed page table
mappings if they get reclaimed, and also a single walk to do the
freeing. I haven't looked into details yet though, these are just
vague hopes.

> Anyways, comments and improvement suggestions welcome.  Particularly
> interesting would be if this thing helps a lot on other platforms
> too, such as x86_64, ia64, alpha and ppc64.
> 

I have a feeling it should provide nice benefits to all archs if
we get it into all the walkers. Downsides are few - the bitmap walk
probably only becomes more expensive when all but a handful of
cachelines are present in a page table page.

I'd like to look at ways to make this patch happen with you soon...
First, for 2.6.12 my main concern is to get pt walking consistent,
and try to claw back some of the clear_page_range regression.

Thanks,
Nick


2.6.10-as5

2005-02-22 Thread Andres Salomon

Hi,

Here's 2.6.10-as5.  2.6.10-as4 was never officially announced; it had
issues (note to self; test, *then* tag).  Distributors should note that
there is an ABI/API change in this release, due to
114-netfilter_private_queues.patch changing ipv4 related function args.
Modules that use these will most likely need to be rebuilt.

Lots of security fixes in here; it's probably a good idea to upgrade.
If I'm missing any security related stuff, please let me know.  I have
been travelling, so my apologies to anyone who hasn't gotten a quick
response from me.  I will also be without an internet connection between
Feb 25th and March 5, so don't expect responses between then.

The -as tree is intended to include only security and bugfixes, from
various sources.  I do not include hardware driver updates
(specifically, anything that changes how the hardware registers
themselves are probed/poked), large subsystem updates, cleanups, and so
on; only fixes that will not contain regressions.  The hope is that
vendors/distributors can use this tree as a base for their kernels.  It
is also what I'd want a 2.6.x.y tree to have.

The kernel patches can be grabbed from here:
http://www.acm.cs.rpi.edu/~dilinger/patches/2.6.10/as5/

4c44b02bb9fe6295bb683e364604d74f  ChangeLog
72421ac55f99af28e0bae87b948a241e  linux-2.6.10-as5.tar.gz
1a9c1a7ec584c67a91c307ce8169f164  patch-2.6.10-as5.gz

Changes from 2.6.10-as3:

2005-02-23 02:58:11 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-131

Summary:
  tag 2.6.10-as5
Revision:
  linux--dilinger--0--patch-131




modified files:
 000-extraversion.patch


2005-02-23 01:53:58 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-130

Summary:
  125-netfilter_private_queues_2.patch
Revision:
  linux--dilinger--0--patch-130

[SECURITY] Add missing bits needed to make
114-netfilter_private_queues.patch
compile.  Patch stolen from ubuntu (mainly to keep the same ABI).


new files:
 .arch-ids/125-netfilter_private_queues_2.patch.id
 125-netfilter_private_queues_2.patch


2005-02-22 13:55:01 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-129

Summary:
  124-setsid_tty_sem_missing_header.patch
Revision:
  linux--dilinger--0--patch-129

[SECURITY] 103-setsid_tty_sem_locking_races.patch was missing a
header file,
causing -as4 to not compile.



new files:
 .arch-ids/124-setsid_tty_sem_missing_header.patch.id
 124-setsid_tty_sem_missing_header.patch


2005-02-22 09:14:25 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-128

Summary:
  tag 2.6.10-as4
Revision:
  linux--dilinger--0--patch-128




modified files:
 000-extraversion.patch


2005-02-22 09:11:15 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-127

Summary:
  fix up 123-*.patch
Revision:
  linux--dilinger--0--patch-127

Argh, so late, and of course the last patch doesn't apply.


modified files:
 123-atm_get_addr_signedness_fix.patch


2005-02-22 09:07:49 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-126

Summary:
  123-atm_get_addr_signedness_fix.patch
Revision:
  linux--dilinger--0--patch-126

[SECURITY] Fix atm_get_addr()'s usage of its size arg, by making it
unsigned.  WDYBTGT3-3 on
http://www.guninski.com/where_do_you_want_billg_to_go_today_3.html



new files:
 .arch-ids/123-atm_get_addr_signedness_fix.patch.id
 123-atm_get_addr_signedness_fix.patch


2005-02-22 09:02:49 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-125

Summary:
  122-cpufreq_resume_readd_2.patch
Revision:
  linux--dilinger--0--patch-125

[CPUFREQ] Fix a problem w/ 121-cpufreq_resume_readd.patch, where a
return
value was not being checked correctly.


new files:
 .arch-ids/122-cpufreq_resume_readd_2.patch.id
 122-cpufreq_resume_readd_2.patch


2005-02-22 09:01:53 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-124

Summary:
  121-cpufreq_resume_readd.patch
Revision:
  linux--dilinger--0--patch-124

[CPUFREQ] Somewhere around 2.6.6, a call to cpufreq_driver->resume()
was
accidentally dropped.  Readd it.




new files:
 .arch-ids/121-cpufreq_resume_readd.patch.id
 121-cpufreq_resume_readd.patch


2005-02-22 09:00:49 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-123

Summary:
  120-openpromfs_property_read_fix.patch
Revision:
  linux--dilinger--0--patch-123

Fix an oopsable condition in Openpromfs's property_read().


new files:
 .arch-ids/120-openpromfs_property_read_fix.patch.id
 120-openpromfs_property_read_fix.patch


2005-02-22 08:59:49 GMT Andres Salomon <[EMAIL PROTECTED]>  patch-122

Summary:
  119-i2c_viapro_i2cdump_overflow.patch
Revision:
  linux--dilinger--0--patch-122

[SECURITY] Fix a very hard to exploit buffer overflow in the
i2c-viapro driver.



new

Warning of redefined NR_OPEN

2005-02-22 Thread Marcel Holtmann

Hi,

when compiling the latest 2.6 tree from the Bitkeeper repository, I get
a lot of these:

  CC  init/main.o
In file included from include/linux/fs.h:202,
 from include/linux/proc_fs.h:6,
 from init/main.c:17:
include/linux/limits.h:4:1: warning: "NR_OPEN" redefined
In file included from include/linux/proc_fs.h:6,
 from init/main.c:17:
include/linux/fs.h:24:1: warning: this is the location of the previous 
definition

Maybe the re-order of  to make userland happy was not a good
idea.
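
To illustrate why the include order matters (a sketch only: the two values
below are meant as illustrative stand-ins for the definitions in the two
headers, and this is not necessarily how the tree was fixed), the
preprocessor warns when a macro is defined a second time with a different
replacement list, unless the first definition is dropped first:

```c
/* Stand-in for the definition seen first (the limits.h role). */
#define NR_OPEN 1024

/* Stand-in for the later definition (the fs.h role). Without the
 * #undef, this second #define is what triggers the
 * '"NR_OPEN" redefined' warning, because the replacement lists differ. */
#undef NR_OPEN
#define NR_OPEN (1024 * 1024)

/* Whichever definition comes last is the one the code sees. */
long nr_open_limit(void)
{
    return NR_OPEN;
}
```

If the two definitions are seen in the opposite order, the #undef no longer
sits between them and the warning comes back.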

Regards

Marcel



Re: [PATCH 2/2] page table iterators

2005-02-22 Thread David S. Miller

On Wed, 23 Feb 2005 15:49:30 +1100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> > It's easy to toy with the sparc64 optimization on other platforms,
> > just add the necessary hacks to pmd_set and pgd_set, allocation
> > of pmd and pgd tables
> 
> David: just an implementation detail that I had meant to bring
> up earlier - would it feel like less of a hack to put these in
> pmd_populate and pgd_populate?

Sure, no problem.  They get defined to pmd_set/pgd_set calls
anyways.  But wouldn't that miss pgd_clear() and pmd_clear()?
Someone may find it worthwhile, on a *_clear(), to see if
a set bit can now be cleared because all the neighboring entries
are empty as well.

That might have been the reason I put it there, but I may be
giving myself too much credit :-)
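
A toy userspace model of the idea under discussion, as I read the thread
(this is not the sparc64 code; the sizes and the names toy_table, toy_set,
toy_clear and toy_walk are invented): one bitmap bit covers a group of
neighbouring entries, setting an entry sets the bit, clearing an entry
clears the bit only when the whole group is empty, and the walker skips
groups whose bit is clear.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES  1024                /* entries in one toy table */
#define GROUP    32                  /* entries covered by one bitmap bit */
#define NGROUPS  (ENTRIES / GROUP)   /* 32 groups, so a 32-bit bitmap */

struct toy_table {
    uintptr_t entry[ENTRIES];        /* 0 means "not present" */
    uint32_t  present;               /* bit g set => group g may be populated */
};

void toy_set(struct toy_table *t, size_t i, uintptr_t val)
{
    t->entry[i] = val;
    if (val != 0)                    /* storing 0 leaves the bit alone */
        t->present |= (uint32_t)1 << (i / GROUP);
}

void toy_clear(struct toy_table *t, size_t i)
{
    size_t g = i / GROUP;

    t->entry[i] = 0;
    for (size_t j = g * GROUP; j < (g + 1) * GROUP; j++)
        if (t->entry[j] != 0)
            return;                  /* a neighbour is still populated */
    t->present &= ~((uint32_t)1 << g);
}

/* Count populated entries, skipping groups whose bit is clear.
 * *scanned reports how many entries were actually examined. */
size_t toy_walk(const struct toy_table *t, size_t *scanned)
{
    size_t found = 0;

    *scanned = 0;
    for (size_t g = 0; g < NGROUPS; g++) {
        if (!(t->present & ((uint32_t)1 << g)))
            continue;
        for (size_t j = g * GROUP; j < (g + 1) * GROUP; j++) {
            (*scanned)++;
            if (t->entry[j] != 0)
                found++;
        }
    }
    return found;
}
```

With two populated entries in different groups, the walk examines 64
entries instead of 1024; the neighbour check in toy_clear() is the extra
cost paid to keep the bitmap tight.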

IA32 (2.6.11-rc4 - 2005-02-22.16.00) - 2 New warnings

2005-02-22 Thread John Cherry

include/linux/fs.h:24:1: warning: this is the location of the previous 
definition
include/linux/limits.h:4:1: warning: "NR_OPEN" redefined

Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin

On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote:
> On Wed, 23 Feb 2005 02:06:28 + (GMT)
> Hugh Dickins <[EMAIL PROTECTED]> wrote:
> 
> > I've not seen Dave's bitmap walking functions (for clearing?),
> > would they fit in better with my way?
> 

Hugh: I'll have more of a look through your patch when I get
some time... to be honest I'm not too worried either way, so
long as one or the other gets in.

Very trivial point, but I'm not sure that I like the name
p?d_limit... maybe p?d_span or _span_end... hmm, they're not
really pleasing either.

You _are_ repeating a bit of mindless loop accounting in every
page table walk, and it isn't completely clear to me that it is
giving you much more flexibility (than for_each_*). But my loops
_are_ a bit contorted.

> This is what Nick is referring to:
> 

[snip]

> It's easy to toy with the sparc64 optimization on other platforms,
> just add the necessary hacks to pmd_set and pgd_set, allocation
> of pmd and pgd tables

David: just an implementation detail that I had meant to bring
up earlier - would it feel like less of a hack to put these in
pmd_populate and pgd_populate?

Nick


Re: [PATCH 2/2] page table iterators

2005-02-22 Thread David S. Miller

On Wed, 23 Feb 2005 02:06:28 + (GMT)
Hugh Dickins <[EMAIL PROTECTED]> wrote:

> I've not seen Dave's bitmap walking functions (for clearing?),
> would they fit in better with my way?

This is what Nick is referring to:



I hacked up something slightly different today.  I only
have it being used by clear_page_range() but it is extremely
effective.

Things like fork+exit latencies on my 750MHz sparc64 box went
from ~490 microseconds to ~367 microseconds.  fork+execve
latency went down from ~1595 microseconds to ~1351 microseconds.

Two issues:

1) I'm not terribly satisfied with the interface.  I think
   with some improvements it can be applied to the two other
   routines this thing really makes sense for, namely copy_page_range
   and unmap_page_range

2) I don't think it will collapse well for 2-level page tables,
   someone take a look?

It's easy to toy with the sparc64 optimization on other platforms:
just add the necessary hacks to pmd_set and pgd_set and to the allocation
of pmd and pgd tables, use "PAGE_SHIFT - 5" instead of "PAGE_SHIFT - 6"
on 32-bit platforms, and then copy the asm-sparc64/pgwalk.h bits over
into your platform's asm-${ARCH}/pgwalk.h

I was also just reminded that we walk these damn pagetables completely
twice on every exit: once to unmap the VMAs' pte mappings, and once again
to zap the page tables.  It might be fruitful to explore combining
those two steps, perhaps not.

Anyways, comments and improvement suggestions welcome.  Particularly
interesting would be if this thing helps a lot on other platforms
too, such as x86_64, ia64, alpha and ppc64.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/08/10 23:44:24-07:00 [EMAIL PROTECTED] 
#   [MM]: Add arch-overridable page table walking machinery.
#   
#   Currently very rudimentary but is used fully for
#   clear_page_range().  An optimized implementation
#   is there for sparc64 and it is extremely effective
#   particularly for 64-bit processes.
#   
#   For things like lat_fork and friends clear_page_tables()
#   used to be 2nd or 3rd in the kernel profile, now it has
#   dropped to the 20th or so entry.
#   
#   Signed-off-by: David S. Miller 
# 
# mm/memory.c
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +10 -26
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc64/pgtable.h
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +28 -4
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc64/pgalloc.h
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +10 -2
#   [MM]: Add arch-overridable page table walking machinery.
# 
# arch/sparc64/mm/init.c
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +2 -2
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-x86_64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-v850/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-um/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +114 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sh64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sh/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-s390/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-ppc64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-ppc/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-parisc/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-mips/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-m68knommu/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-m68k/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-ia64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
#

Re: [PATCH] TCP-Hybla proposal

2005-02-22 Thread Stephen Hemminger

On Tue, 22 Feb 2005 10:14:47 -0800, David S. Miller <[EMAIL PROTECTED]> wrote:
On Tue, 22 Feb 2005 13:03:11 -0500 (EST)
John Heffner <[EMAIL PROTECTED]> wrote:

> An idea I've been toying with for a while now is completely abstracting
> congestion control.  Then you could have congestion control loadable
> modules, which would avoid this mess of experimental algorithms inside the
> main-line kernel.  If done right, they might be able to work seamlessly
> with SCTP, too.  The tricky part is making sure the interface is complete
> enough.
There might be a noticeable performance impact to making it truly 
modular. Calling a function in a module is slower. In some tests, I see 
a 5 to 10% drop in performance when the Ethernet driver is a module 
rather than built in.

You might want to look at how the I/O schedulers are configured as an 
example.
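
To make the idea concrete, a pluggable scheme along those lines could be
little more than a table of function pointers selected by name. The sketch
below is userspace C with an invented interface (cong_state, cong_ops and
cong_find are not existing kernel APIs), and it registers a single
Reno-style algorithm:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-connection state, counted in whole segments. */
struct cong_state {
    unsigned int cwnd;       /* congestion window */
    unsigned int ssthresh;   /* slow-start threshold */
    unsigned int cwnd_cnt;   /* ACKs seen since the last cwnd increase */
};

/* Hypothetical operations table: one entry per algorithm. */
struct cong_ops {
    const char *name;
    void (*on_ack)(struct cong_state *s);
};

static void reno_on_ack(struct cong_state *s)
{
    if (s->cwnd < s->ssthresh) {
        s->cwnd++;                         /* slow start: +1 per ACK */
    } else if (++s->cwnd_cnt >= s->cwnd) {
        s->cwnd++;                         /* avoidance: +1 per window of ACKs */
        s->cwnd_cnt = 0;
    }
}

static const struct cong_ops algorithms[] = {
    { "reno", reno_on_ack },
};

/* Look up an algorithm by name; returns NULL if it is not registered. */
const struct cong_ops *cong_find(const char *name)
{
    for (size_t i = 0; i < sizeof(algorithms) / sizeof(algorithms[0]); i++)
        if (strcmp(algorithms[i].name, name) == 0)
            return &algorithms[i];
    return NULL;
}
```

Every ACK then goes through an indirect call (ops->on_ack), which is where
the per-packet overhead mentioned above would show up.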

The symbols exported to allow this would need to be EXPORT_SYMBOL_GPL().

Why's that?
Because the kernel developers who hold the collective copyright on the 
existing GPL TCP code do not want some vendor producing a closed source 
binary module of "enhanced TCP".

Re: [Patch 0/6] Bind Mount Extensions 0.06

2005-02-22 Thread Matt Mackall

On Tue, Feb 22, 2005 at 07:51:02PM -0800, Andrew Morton wrote:
> Matt Mackall <[EMAIL PROTECTED]> wrote:
> >
> >  Please give each patch a unique, descriptive subject.
> 
> yup.
> 
> > Summarizing what
> >  each patch is doing in your 0/n so that reviewers can focus on the
> >  bits that are interesting is also helpful.
> 
> Actually, that's fairly irritating, because the 0/n contains useful info
> which someone has to go and massage and copy into 1/n.

Certainly, there should be nothing in the summary that isn't already
in the patch itself. What I'm suggesting is more a table of contents:

1/6 move foo to bar so that we can then remove baz
2/6 kill references to baz
...

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] TCP-Hybla proposal

2005-02-22 Thread Matt Mackall

On Tue, Feb 22, 2005 at 03:34:42PM +0100, Daniele Lacamera wrote:
> Hi
> This is the official patch to implement TCP Hybla congestion avoidance.
> 
> - "In heterogeneous networks, TCP connections that incorporate a 
> terrestrial or satellite radio link are greatly disadvantaged with 
> respect to entirely wired connections, because of their longer round 
> trip times (RTTs). To cope with this problem, a new TCP proposal, the 
> TCP Hybla, is presented and discussed in the paper[1]. It stems from an 
> analytical evaluation of the congestion window dynamics in the TCP 
> standard versions (Tahoe, Reno, NewReno), which suggests the necessary 
> modifications to remove the performance dependence on RTT.[...]"[1]
> 
> [1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for 
> heterogeneous networks", 
> International Journal of Satellite Communications and Networking
> Volume 22, Issue 5 , Pages 547 - 566. September 2004.

It's disappointing that this paper appears to be available only
through subscription sources. If I'm mistaken, please post a URL. 

By comparison, papers on Reno, Vegas, Westwood, BicTCP, not to mention
just about every other contribution to the field of open Internet
protocols has been readily available on the net since the birth of FTP
servers.

-- 
Mathematics is the supreme nostalgia of our time.

msgsnd in module

2005-02-22 Thread Vijayalakshmi Hadimani


Hi,
   I am inserting a module (device driver) using insmod.
I want to send a message from this module to a user process.
For this I call msgsnd with a local variable as the buffer.
The call fails with the error "EFAULT".
However, this did not happen when I built the driver code as
part of the kernel rather than as a module.  Any idea what could
be the problem and how to solve it?

TIA

Regards,
Vijayalakshmi


Re: [Patch 0/6] Bind Mount Extensions 0.06

2005-02-22 Thread Andrew Morton

Matt Mackall <[EMAIL PROTECTED]> wrote:
>
>  Please give each patch a unique, descriptive subject.

yup.

> Summarizing what
>  each patch is doing in your 0/n so that reviewers can focus on the
>  bits that are interesting is also helpful.

Actually, that's fairly irritating, because the 0/n contains useful info
which someone has to go and massage and copy into 1/n.


Re: uninterruptible sleep lockups

2005-02-22 Thread Horst von Brand

Chris Friesen <[EMAIL PROTECTED]> said:
> Horst von Brand wrote:
> > Anthony DiSante <[EMAIL PROTECTED]> said:

> >>That's one of the things I asked a few messages ago.  Some people on
> >>the list were saying that it'd be "really hard" and would "require a
> >>lot of bookkeeping" to "fix" permanently-D-stated processes... which is
> >>completely different than "impossible."

> > Most people here have little clue. It can't be done.

> I realize it would be extremely difficult if not impossible to do in the 
> current linux architecture, but I find it hard to believe that it is 
> technically impossible if one were allowed to design the system from 
> scratch.

It is hard (if not impossible) to find out /what/ is broken (and how) and
fix it automatically. As you were told, D means the process is waiting for
some event. That event /might/ happen sometime (waiting for slow hardware)
or never (kernel programming error, hardware forgot the operation in
progress, ...).  So you might fake it out by making believe the event did
happen. But what if it was just delayed, and /does/ then happen with nobody
waiting?

Any such scheme is just papering over the problems, and adds /massive/
complexity for no real gain.

> Maybe I'm on crack, but would it not be technically possible to have all 
> resource usage be tracked so that when a task tries to do something and 
> hangs, eventually it gets cleaned up?

Sure. But there is /no way/ to know if the task will ever do something
(Turing's undecidability sees to that, even with perfect hardware), so the
only chance is to wait and see if the task releases it by itself. If you
just want to axe the task, you'd have to know beforehand what it will do
(and do it for the task on killing it). But if the /task/ couldn't do it, what
guarantees that the cleanup can?

> We already handle cleaning up stuff for userspace (memory, file 
> descriptors, sockets, etc.).

On process end, i.e., when we know the stuff won't be used anymore. If the
program is stuck, kill it and go as before. If it doesn't go away cleanly,
something is /seriously/ wrong... and it is anybody's guess what.

>  Why not enforce a design that says "all 
> entities taking a lock must specify a maximum hold time".

It is hard enough to program without such restrictions. This would
incidentally also mean that the kernel has to be hard real time,
always. The usual PC hardware just isn't up to that, for starters.

And what would you do if you have nested locks, and the outer one times
out? Must kill the inner one beforehand... more complexity still.

>After that 
> time expires, they are assumed to be hung, and all their resources 
> (which were being tracked by some system) get cleaned up.

> It would probably be complicated, slow, and generally not worth the 
> effort.  But it seems at least technically possible.

If the system takes all extant resources for managing said resources, it is
somewhat pointless...
-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513

Duplicate definition on NR_OPEN!

2005-02-22 Thread Horst von Brand

It is defined in:

include/linux/limits.h:#define NR_OPEN  1024
include/linux/fs.h:#define NR_OPEN (1024*1024)  /* Absolute upper limit on fd num */

One is surely wrong?
-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513

[PATCH] ppc32: kernel mapping breakage

2005-02-22 Thread Benjamin Herrenschmidt

Hi !

Christoph Lameter's patch that changed the page allocators to use GFP_ZERO
broke ppc32 in a subtle way. Our allocator is designed to work before
mem_init_done, in which case it uses a ppc-specific early_get_page()
which doesn't return zeroed pages. However, he removed the call to
clear_page() unconditionally, thus causing the kernel's initial page
tables to have random data in them.

They are initialized with set_pte, which means it's _mostly_ harmless,
except that set_pte on ppc32 preserves the _PAGE_HASHPTE bit, thus we
end up with random bits there, which can cause issues with further
manipulation of the kernel page tables and will slow down all hash
faults on them by causing unnecessary searches.
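
[Editor's sketch] The failure mode described above can be modelled in user space. The names below (early_get_page_dirty, set_pte_preserving, clear_page_sketch) and the bit value chosen for PAGE_HASHPTE are stand-ins made up for illustration; they are not the ppc32 kernel's definitions. The point is only that a setter which preserves one bit of the old entry will leak stale data unless the page was zeroed first.

```c
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PTE 1024        /* illustrative table size */
#define PAGE_HASHPTE 0x002u      /* stand-in for _PAGE_HASHPTE */

typedef uint32_t pte_t;

static pte_t early_pool[PTRS_PER_PTE];

/* Stand-in for an early allocator that hands back memory WITHOUT zeroing it. */
pte_t *early_get_page_dirty(void)
{
    memset(early_pool, 0xFF, sizeof(early_pool));   /* simulate stale contents */
    return early_pool;
}

/* Stand-in for clear_page(): zero the whole table. */
void clear_page_sketch(pte_t *page)
{
    memset(page, 0, PTRS_PER_PTE * sizeof(pte_t));
}

/* Stand-in for a set_pte that keeps the old entry's hash-tracking bit. */
void set_pte_preserving(pte_t *ptep, pte_t val)
{
    *ptep = (*ptep & PAGE_HASHPTE) | (val & ~PAGE_HASHPTE);
}

/* Nonzero if the entry claims to have a hash-table slot. */
int pte_has_hashpte(pte_t pte)
{
    return (pte & PAGE_HASHPTE) != 0;
}
```

Without the clear, the first set_pte_preserving() on a fresh table inherits the stale bit; with the clear, the entry ends up exactly as written.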

Please apply in 2.6.11 if still possible ...

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>

Index: linux-work/arch/ppc/mm/pgtable.c
===
--- linux-work.orig/arch/ppc/mm/pgtable.c	2005-01-24 17:09:23.0 +1100
+++ linux-work/arch/ppc/mm/pgtable.c	2005-02-23 14:31:41.0 +1100
@@ -107,8 +107,11 @@
 			ptepage->mapping = (void *) mm;
 			ptepage->index = address & PMD_MASK;
 		}
-	} else
+	} else {
 		pte = (pte_t *)early_get_page();
+		if (pte)
+			clear_page(pte);
+	}
 	return pte;
 }
 



Re: [Patch 0/6] Bind Mount Extensions 0.06

2005-02-22 Thread Matt Mackall

On Tue, Feb 22, 2005 at 01:09:55PM +0100, Herbert Poetzl wrote:
> 
> Hi Andrew! Al! Folks!
> 
> The following set of patches extends the per device 
> 'noatime', 'nodiratime' and last but not least the 
> 'ro' (read only) mount option to the vfs --bind mounts, 
> allowing them to behave like any other mount, by 
> honoring those mount flags (which are silently ignored 
> by the current implementation in 2.4.x and 2.6.x) 

Please give each patch a unique, descriptive subject. Summarizing what
each patch is doing in your 0/n so that reviewers can focus on the
bits that are interesting is also helpful.

-- 
Mathematics is the supreme nostalgia of our time.

[PATCH] Re: AHCI oops

2005-02-22 Thread Jeff Garzik

Can you try this patch?
If it fixes the oops, I'll forward upstream ASAP.
Jeff
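
[Editor's sketch] The patch that follows does two things that can be shown in isolation: it reads the error byte out of bits 15:8 of the port's task-file data word (the status byte is bits 7:0), and it lets a driver supply an optional check_err hook with a generic fallback when the hook is absent. The user-space model below uses made-up types (struct port, struct port_ops) and plain fields in place of the MMIO reads; it is an illustration of the pattern, not libata code.

```c
#include <stddef.h>
#include <stdint.h>

/* Low byte of the task-file data word: ATA status register. */
uint8_t tfd_status(uint32_t tfd)
{
    return (uint8_t)(tfd & 0xFF);
}

/* Next byte up: ATA error register. */
uint8_t tfd_error(uint32_t tfd)
{
    return (uint8_t)((tfd >> 8) & 0xFF);
}

struct port;

/* Hypothetical ops table; a NULL hook means "use the generic path". */
struct port_ops {
    uint8_t (*check_err)(struct port *p);
};

struct port {
    const struct port_ops *ops;
    uint32_t tfd;         /* stand-in for readl(mmio + PORT_TFDATA) */
    uint8_t legacy_err;   /* stand-in for the legacy error register */
};

static uint8_t ahci_like_check_err(struct port *p)
{
    return tfd_error(p->tfd);
}

const struct port_ops ahci_like_ops = { ahci_like_check_err };
const struct port_ops legacy_ops = { NULL };

/* Generic accessor: prefer the driver hook, else fall back. */
uint8_t chk_err(struct port *p)
{
    if (p->ops != NULL && p->ops->check_err != NULL)
        return p->ops->check_err(p);
    return p->legacy_err;
}
```

The fallback keeps existing drivers working unchanged, while a controller without legacy taskfile registers can route the read to its own register layout.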

= drivers/scsi/ahci.c 1.14 vs edited =
--- 1.14/drivers/scsi/ahci.c2005-02-13 19:58:01 -05:00
+++ edited/drivers/scsi/ahci.c  2005-02-22 21:46:25 -05:00
@@ -179,6 +179,7 @@
 static void ahci_host_stop(struct ata_host_set *host_set);
 static void ahci_qc_prep(struct ata_queued_cmd *qc);
 static u8 ahci_check_status(struct ata_port *ap);
+static u8 ahci_check_err(struct ata_port *ap);
 static inline int ahci_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc);
 
 static Scsi_Host_Template ahci_sht = {
@@ -204,6 +205,8 @@
.port_disable   = ata_port_disable,
 
.check_status   = ahci_check_status,
+   .check_altstatus= ahci_check_status,
+   .check_err  = ahci_check_err,
.dev_select = ata_noop_dev_select,
 
.phy_reset  = ahci_phy_reset,
@@ -450,6 +453,13 @@
void *mmio = (void *) ap->ioaddr.cmd_addr;
 
return readl(mmio + PORT_TFDATA) & 0xFF;
+}
+
+static u8 ahci_check_err(struct ata_port *ap)
+{
+   void *mmio = (void *) ap->ioaddr.cmd_addr;
+
+   return (readl(mmio + PORT_TFDATA) >> 8) & 0xFF;
 }
 
 static void ahci_fill_sg(struct ata_queued_cmd *qc)
= drivers/scsi/libata-core.c 1.120 vs edited =
--- 1.120/drivers/scsi/libata-core.c2005-02-22 21:19:40 -05:00
+++ edited/drivers/scsi/libata-core.c   2005-02-22 21:45:16 -05:00
@@ -377,7 +377,7 @@
 }
 
 /**
- * ata_check_status - Read device status reg & clear interrupt
+ * ata_check_status_pio - Read device status reg & clear interrupt
  * @ap: port where the device is
  *
  * Reads ATA taskfile status register for currently-selected device
@@ -415,6 +415,27 @@
return ata_check_status_pio(ap);
 }
 
+u8 ata_altstatus(struct ata_port *ap)
+{
+   if (ap->ops->check_altstatus)
+   return ap->ops->check_altstatus(ap);
+
+   if (ap->flags & ATA_FLAG_MMIO)
+   return readb((void __iomem *)ap->ioaddr.altstatus_addr);
+   return inb(ap->ioaddr.altstatus_addr);
+}
+
+u8 ata_chk_err(struct ata_port *ap)
+{
+   if (ap->ops->check_err)
+   return ap->ops->check_err(ap);
+
+   if (ap->flags & ATA_FLAG_MMIO) {
+   return readb((void __iomem *) ap->ioaddr.error_addr);
+   }
+   return inb(ap->ioaddr.error_addr);
+}
+
 /**
  * ata_tf_to_fis - Convert ATA taskfile to SATA FIS structure
  * @tf: Taskfile to convert
@@ -1161,7 +1182,6 @@
printk(KERN_WARNING "ata%u: dev %u not supported, ignoring\n",
   ap->id, device);
 err_out:
-   ata_irq_on(ap); /* re-enable interrupts */
dev->class++;   /* converts ATA_DEV_xxx into ATA_DEV_xxx_UNSUP */
DPRINTK("EXIT, err\n");
 }
@@ -1669,7 +1689,8 @@
ata_dev_try_classify(ap, 1);
 
/* re-enable interrupts */
-   ata_irq_on(ap);
+   if (ap->ioaddr.ctl_addr)/* FIXME: hack. create a hook instead */
+   ata_irq_on(ap);
 
/* is double-select really necessary? */
if (ap->device[1].class != ATA_DEV_NONE)
@@ -3972,6 +3993,8 @@
 EXPORT_SYMBOL_GPL(ata_tf_to_fis);
 EXPORT_SYMBOL_GPL(ata_tf_from_fis);
 EXPORT_SYMBOL_GPL(ata_check_status);
+EXPORT_SYMBOL_GPL(ata_altstatus);
+EXPORT_SYMBOL_GPL(ata_chk_err);
 EXPORT_SYMBOL_GPL(ata_exec_command);
 EXPORT_SYMBOL_GPL(ata_port_start);
 EXPORT_SYMBOL_GPL(ata_port_stop);
= include/linux/libata.h 1.64 vs edited =
--- 1.64/include/linux/libata.h 2005-02-17 19:29:16 -05:00
+++ edited/include/linux/libata.h   2005-02-22 21:37:02 -05:00
@@ -334,6 +334,8 @@
 
void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
u8   (*check_status)(struct ata_port *ap);
+   u8   (*check_altstatus)(struct ata_port *ap);
+   u8   (*check_err)(struct ata_port *ap);
void (*dev_select)(struct ata_port *ap, unsigned int device);
 
void (*phy_reset) (struct ata_port *ap);
@@ -403,6 +405,8 @@
 extern void ata_noop_dev_select (struct ata_port *ap, unsigned int device);
 extern void ata_std_dev_select (struct ata_port *ap, unsigned int device);
 extern u8 ata_check_status(struct ata_port *ap);
+extern u8 ata_altstatus(struct ata_port *ap);
+extern u8 ata_chk_err(struct ata_port *ap);
 extern void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf);
 extern int ata_port_start (struct ata_port *ap);
 extern void ata_port_stop (struct ata_port *ap);
@@ -457,24 +461,9 @@
(dev->class == ATA_DEV_ATAPI));
 }
 
-static inline u8 ata_chk_err(struct ata_port *ap)
-{
-   if (ap->flags & ATA_FLAG_MMIO) {
-   return readb((void __iomem *) ap->ioaddr.error_addr);
-   }
-   return inb(ap->ioaddr.error_addr);
-}
-
 static inline u8 ata_chk_status(struct ata_port *ap)
 {
return ap->ops->check_status(ap);
-}
-
-static inline u8 ata_altstatus(struct ata_port *ap)
-{
-   if (ap->flags &

loading driver automatically & manually

2005-02-22 Thread Anil Kumar

Hi,

I am trying to install RHEL 4, 2.6.9-5.EL.  I have an Adaptec 39320
controller, and the install CD already has the aic79xx driver on it, but
the driver does NOT load for some reason.  If I take the same aic79xx
driver source, create an img and install RHEL4 using "linux dd", it
works fine.

Can you please let me know which files/lookup tables the OS consults
when loading a driver on 2.6?

Also, can you please point me to the steps for replacing an existing
driver (I suspect the default driver on the RHEL4 CD may be wrong) on the
RHEL4 install CD with my own build of the aic79xx driver?

with regards,
   Anil

Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

2005-02-22 Thread Lee Revell

On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> > 
> > could you send me the full trace?
> 

On my other machine this 333us trace is the longest latency reported in
the first few minutes with PREEMPT_DESKTOP.  It seems to be a regression
from earlier versions.  If I read the trace right copy_pte_range is the
problem.

Lee

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02

 latency: 333 µs, #63/63, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
-
| task: XFree86-2593 (uid:0 nice:0 policy:0 rt_prio:0)
-

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
(T1/#0) dpkg  4362 0 5 0006  [380181315825] 0.000ms (+3550398.796ms): <676b7064> (<00746500>)
(T1/#2) dpkg  4362 0 5 0006 0002 [380181316227] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x96/0xc0  (try_to_wake_up+0x81/0x150 )
(T1/#3) dpkg  4362 0 5 0004 0003 [380181316766] 0.001ms (+0.001ms): wake_up_state+0x1e/0x30  (signal_wake_up+0x2d/0x30 )
(T1/#4) dpkg  4362 0 5  0004 [380181317637] 0.003ms (+0.000ms): __wake_up+0xe/0x70  (mousedev_event+0xd8/0x140 )
(T1/#5) dpkg  4362 0 5 0001 0005 [380181318080] 0.003ms (+0.001ms): __wake_up_common+0xb/0x70  (__wake_up+0x3b/0x70 )
(T1/#6) dpkg  4362 0 5  0006 [380181318983] 0.005ms (+0.002ms): usb_submit_urb+0xe/0x2c0  (hid_irq_in+0x4e/0xe0 )
(T1/#7) dpkg  4362 0 5  0007 [380181320688] 0.008ms (+0.001ms): hcd_submit_urb+0xe/0x200  (usb_submit_urb+0x1c6/0x2c0 )
(T1/#8) dpkg  4362 0 5 0001 0008 [380181321463] 0.009ms (+0.000ms): usb_get_dev+0x9/0x30  (hcd_submit_urb+0x1a9/0x200 )
(T1/#9) dpkg  4362 0 5 0001 0009 [380181321943] 0.010ms (+0.000ms): get_device+0x8/0x30  (usb_get_dev+0x19/0x30 )
(T1/#10) dpkg  4362 0 5 0001 000a [380181322283] 0.010ms (+0.000ms): kobject_get+0x9/0x30  (get_device+0x1a/0x30 )
(T1/#11) dpkg  4362 0 5 0001 000b [380181322691] 0.011ms (+0.001ms): kref_get+0x9/0x60  (kobject_get+0x19/0x30 )
(T1/#12) dpkg  4362 0 5  000c [380181323295] 0.012ms (+0.000ms): usb_get_urb+0x9/0x20  (hcd_submit_urb+0xc6/0x200 )
(T1/#13) dpkg  4362 0 5  000d [380181323566] 0.012ms (+0.001ms): kref_get+0x9/0x60  (usb_get_urb+0x16/0x20 )
(T1/#14) dpkg  4362 0 5  000e [380181324216] 0.013ms (+0.000ms): uhci_urb_enqueue+0xe/0x290  (hcd_submit_urb+0x123/0x200 )
(T1/#15) dpkg  4362 0 5 0001 000f [380181324743] 0.014ms (+0.000ms): uhci_find_urb_ep+0xe/0xb0  (uhci_urb_enqueue+0x7a/0x290 )
(T1/#16) dpkg  4362 0 5 0001 0010 [380181325251] 0.015ms (+0.000ms): uhci_alloc_urb_priv+0xb/0x80  (uhci_urb_enqueue+0x87/0x290 )
(T1/#17) dpkg  4362 0 5 0001 0011 [380181325582] 0.016ms (+0.001ms): kmem_cache_alloc+0xb/0x70  (uhci_alloc_urb_priv+0x1c/0x80 )
(T1/#18) dpkg  4362 0 5 0001 0012 [380181326332] 0.017ms (+0.000ms): usb_check_bandwidth+0xc/0x140  (uhci_urb_enqueue+0x200/0x290 )
(T1/#19) dpkg  4362 0 5 0001 0013 [380181326926] 0.018ms (+0.001ms): usb_calc_bus_time+0x9/0x270  (usb_check_bandwidth+0x6b/0x140 )
(T1/#20) dpkg  4362 0 5 0001 0014 [380181327893] 0.020ms (+0.001ms): uhci_submit_common+0xe/0x380  (uhci_urb_enqueue+0x239/0x290 )
(T1/#21) dpkg  4362 0 5 0001 0015 [380181328984] 0.021ms (+0.001ms): uhci_alloc_td+0xb/0x80  (uhci_submit_common+0xf0/0x380 )
(T1/#22) dpkg  4362 0 5 0001 0016 [380181329685] 0.023ms (+0.002ms): dma_pool_alloc+0xe/0x1a0  (uhci_alloc_td+0x20/0x80 )
(T1/#23) dpkg  4362 0 5 0001 0017 [380181331207] 0.025ms (+0.000ms): usb_get_dev+0x9/0x30  (uhci_alloc_td+0x69/0x80 )
(T1/#24) dpkg  4362 0 5 0001 0018 [380181331544] 0.026ms (+0.000ms): get_device+0x8/0x30  (usb_get_dev+0x19/0x30 )
(T1/#25) dpkg  4362 0 5 0001 0019 [380181331882] 0.026ms (+0.000ms): kobject_get+0x9/0x30  (get_device+0x1a/0x30 )
(T1/#26) dpkg  4362 0 5 0001 001a [380181332215] 0.027ms (+0.000ms):

Re: how to detect the n/w driver name at user level.

2005-02-22 Thread Herbert Xu

Subbu <[EMAIL PROTECTED]> wrote:
> 
> Is there any way that I could get the driver name at user level other than
> polling for it?

ethtool -i eth1

should tell you the driver name for eth1.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [2.6.11-rc4 i386] Re-order includes to fix userland breakage

2005-02-22 Thread Linus Torvalds



On Tue, 22 Feb 2005, Chris Wright wrote:
> 
> This change is spewing warnings like:

Already fixed in BK.

Linus

RE: Problems with dma_mmap_writecombine on mach-pxa

2005-02-22 Thread Frank Buss

Russell King <[EMAIL PROTECTED]> wrote:

> Since we map the whole lot in one go, if you get one page, there's no
> reason why you shouldn't get the lot.  This is why I'm wondering if
> it has something to do with your other modifications.

My colleague has found the bug: in the function dma_mmap in
arch/arm/mm/consistent.c, the call to remap_pfn_range passes
user_size in PAGE_SIZE units, but it looks like it is expected
in bytes. When using (user_size << PAGE_SHIFT), it works.

I don't know where to fix it: should the lower-level calls
take the size in bytes (most function arguments in the Linux
kernel sources are not commented), which means fixing
dma_mmap, or should PAGE_SIZE units be used, in which case the
lower-level functions need to be fixed?
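
[Editor's sketch] The unit mix-up is easy to see with two small helpers. The sketch assumes 4 KiB pages (PAGE_SHIFT of 12), which is an assumption for illustration and not something stated in the report; the helper names are made up.

```c
#define PAGE_SHIFT 12                     /* assumes 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Number of whole pages needed to cover a byte length (rounds up). */
unsigned long bytes_to_pages(unsigned long bytes)
{
    return (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Byte length covered by a given number of pages. */
unsigned long pages_to_bytes(unsigned long pages)
{
    return pages << PAGE_SHIFT;
}
```

With these numbers, a 307200-byte buffer is 75 pages. Handing the value 75 to a function that expects bytes describes a region 4096 times smaller than intended, which is consistent with the (user_size << PAGE_SHIFT) workaround reported above.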

-- 
Frank Buß, [EMAIL PROTECTED]
http://www.frank-buss.de, http://www.it4-systems.de


Re: [2.6.11-rc4 i386] Re-order includes to fix userland breakage

2005-02-22 Thread Chris Wright

* Tom Rini ([EMAIL PROTECTED]) wrote:
> The following moves all includes  (except 
> and  down to below the existing __KERNEL__ test.  None
> of these includes are needed by the user-visible portions of the header,
> and in some cases can cause userland apps to break.  For example, LTP
> and sash with an empty  will fail thusly:
> cc -Wall  -I../../include -g -Wall -I../../../../include -Wall
> setrlimit02.c -L../../../../lib -lltp  -o setrlimit02
> In file included from /usr/include/asm/atomic.h:6,
>  from /usr/include/linux/fs.h:20,
>  from setrlimit02.c:46:
> /usr/include/asm/processor.h:68: error: `CONFIG_X86_L1_CACHE_SHIFT' 
> undeclared here (not in a function)
> /usr/include/asm/processor.h:68: error: requested alignment is not a constant

This change is spewing warnings like:

  CC  init/main.o
In file included from include/linux/fs.h:202,
 from include/linux/proc_fs.h:6,
 from init/main.c:17:
include/linux/limits.h:4:1: warning: "NR_OPEN" redefined
In file included from include/linux/proc_fs.h:6,
 from init/main.c:17:
include/linux/fs.h:24:1: warning: this is the location of the previous 
definition
  CC  init/do_mounts.o
In file included from include/linux/fs.h:202,
 from include/linux/tty.h:20,
 from init/do_mounts.c:5:
include/linux/limits.h:4:1: warning: "NR_OPEN" redefined
In file included from include/linux/tty.h:20,
 from init/do_mounts.c:5:
include/linux/fs.h:24:1: warning: this is the location of the previous 
definition

Move limits.h include back above __KERNEL__ to quiet things back down.

Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

= include/linux/fs.h 1.377 vs edited =
--- 1.377/include/linux/fs.h2005-02-22 13:44:27 -08:00
+++ edited/include/linux/fs.h   2005-02-22 17:26:03 -08:00
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -199,7 +200,6 @@ extern int dir_notify_enable;
 #ifdef __KERNEL__
 
 #include 
-#include 
 #include 
 #include 
 #include 
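
[Editor's sketch] The "redefined" warning quoted in this message is ordinary preprocessor behaviour: defining a macro a second time with a different body, with no #undef in between, draws a diagnostic, so the order in which the two headers are seen matters. The fragment below uses stand-ins for the two headers and assumes that the later definition is preceded by an #undef; the quoted hunk does not show that line, so treat it as an assumption.

```c
/* Stand-in for the definition in include/linux/limits.h. */
#define NR_OPEN 1024

/*
 * Stand-in for a later header that wants a larger value.
 * Undefining first keeps the preprocessor quiet; dropping the #undef
 * (or seeing the two definitions in the opposite order) is what
 * produces a '"NR_OPEN" redefined' warning.
 */
#undef NR_OPEN
#define NR_OPEN (1024*1024)

/* Whichever definition comes last is the one later code sees. */
long effective_nr_open(void)
{
    return NR_OPEN;
}
```

Seen this way, two definitions need not mean that one is wrong, provided the include order guarantees that the intended one comes last.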

[no subject]

2005-02-22 Thread Frank Buss

Russell King <[EMAIL PROTECTED]> wrote:

> Since we map the whole lot in one go, if you get one page, there's no
> reason why you shouldn't get the lot.  This is why I'm wondering if
> it has something to do with your other modifications.

My colleague has found the bug: in the function dma_mmap in
arch/arm/mm/consistent.c, the call to remap_pfn_range passes user_size in
PAGE_SIZE units, but it looks like it is expected in bytes. When using
(user_size << PAGE_SHIFT), it works.

I don't know where to fix it: should the lower-level calls take the size in
bytes (most function arguments in the Linux kernel sources are not commented),
which means fixing dma_mmap, or should PAGE_SIZE units be used, in which case
the lower-level functions need to be fixed?

-- 
Frank Buß, [EMAIL PROTECTED]
http://www.frank-buss.de, http://www.it4-systems.de


Re: [rft/update] r8169 changes in 2.6.x

2005-02-22 Thread Jeff Garzik

On Tue, Feb 22, 2005 at 05:29:35PM -0800, Andrew Morton wrote:
> Francois Romieu <[EMAIL PROTECTED]> wrote:
> >
> > Patch against 2.6.10-rc4:
> > - 
> > http://www.fr.zoreil.com/~romieu/misc/20050222-2.6.11-rc4-r8169.c-test.patch
> 
> There are already a bunch of r8169 patches in Jeff's tree.  The combination
> isn't pretty:

And ~5 more once I run through the 80+ patches left in my
pending-patches folder.

Jeff




Re: [rft/update] r8169 changes in 2.6.x

2005-02-22 Thread Andrew Morton

Francois Romieu <[EMAIL PROTECTED]> wrote:
>
> Patch against 2.6.10-rc4:
> - http://www.fr.zoreil.com/~romieu/misc/20050222-2.6.11-rc4-r8169.c-test.patch

There are already a bunch of r8169 patches in Jeff's tree.  The combination
isn't pretty:


patching file drivers/net/r8169.c
Hunk #1 FAILED at 41.
Hunk #2 FAILED at 69.
Hunk #3 FAILED at 79.
Hunk #4 succeeded at 114 (offset 7 lines).
Hunk #5 FAILED at 161.
Hunk #6 FAILED at 175.
Hunk #7 FAILED at 224.
Hunk #8 succeeded at 231 (offset 3 lines).
Hunk #9 succeeded at 248 (offset 7 lines).
Hunk #10 succeeded at 268 (offset 3 lines).
Hunk #11 succeeded at 294 (offset 7 lines).
Hunk #12 succeeded at 300 (offset 3 lines).
Hunk #13 succeeded at 391 (offset 7 lines).
Hunk #14 FAILED at 424.
Hunk #15 succeeded at 468 (offset 3 lines).
Hunk #16 succeeded at 487 (offset 7 lines).
Hunk #17 succeeded at 500 with fuzz 1 (offset 10 lines).
Hunk #18 FAILED at 732.
Hunk #19 FAILED at 767.
Hunk #20 FAILED at 1036.
Hunk #21 succeeded at 1053 with fuzz 2 (offset 19 lines).
Hunk #22 FAILED at 1093.
Hunk #23 FAILED at 1128.
Hunk #24 FAILED at 1140.
Hunk #25 succeeded at 1178 (offset 10 lines).
Hunk #26 succeeded at 1198 with fuzz 1 (offset 19 lines).
Hunk #27 succeeded at 1213 (offset 10 lines).
Hunk #28 succeeded at 1259 (offset 19 lines).
Hunk #29 succeeded at 1261 with fuzz 2 (offset 13 lines).
Hunk #30 succeeded at 1368 (offset 19 lines).
Hunk #31 FAILED at 1576.
Hunk #32 succeeded at 1600 (offset 13 lines).
Hunk #33 FAILED at 1615.
Hunk #34 succeeded at 1683 (offset 26 lines).
Hunk #35 succeeded at 1717 (offset 13 lines).
Hunk #36 FAILED at 1854.
Hunk #37 succeeded at 2042 (offset 24 lines).
Hunk #38 FAILED at 2152.
Hunk #39 succeeded at 2160 (offset 13 lines).
Hunk #40 succeeded at 2187 (offset 24 lines).
Hunk #41 succeeded at 2245 (offset 13 lines).
Hunk #42 FAILED at 2270.
Hunk #43 succeeded at 2293 with fuzz 2 (offset 24 lines).
Hunk #44 succeeded at 2314 (offset 13 lines).
Hunk #45 FAILED at 2349.
Hunk #46 succeeded at 2387 (offset 30 lines).
Hunk #47 succeeded at 2378 (offset 13 lines).
Hunk #48 FAILED at 2391.
Hunk #49 succeeded at 2445 with fuzz 2 (offset 31 lines).
20 out of 49 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej

Re: uninterruptible sleep lockups

2005-02-22 Thread Bodo Eggert

linux-os <[EMAIL PROTECTED]> wrote:

> You don't seem to understand. A process that's stuck in 'D' state
> shows a SEVERE error, usually with a hardware driver.

Or a network filesystem mount to a no longer existing server or share.

> For instance, 
> somebody may have coded something in a critical section that will
> wait forever for some bit to be set when, in fact, that bit may
> never be set because of a hardware glitch. Such problems must
> be found. One can't just suck some process out of the 'D' state.

But you can easily fall into one, e.g. by mounting a SMB share to ~/mnt,
working until after the windows box breaks down and trying to save the
work of the last hour (which involves enumerating and stat()ing all
entries in ~).

> The 'D' state usually stands for 'Down' where a task
> was 'down()' on a semaphore. To get out of that state,
> that task (and none other) needs to execute `up()`.
> This means that whatever that task was waiting for
> needs to happen or it won't call 'up()'.

Maybe the device/mountpoint causing the processes to hang can be declared
dead (This is the more important part to me) and/or the syscall can be
forced to fail. If it involves wasting some MB of RAM for copying all
possibly affected memory in order to avoid corrupting used RAM, that
will be the price to pay for not losing your data.

How to clean up the stuck processes: (This requires a MMU)
Add an error path to each syscall (or create some generic error paths) and
keep the original stack frame. On errors, you can "longjump" (not exactly,
but similar) to the error path after copying the memory. The semaphore will
not be taken, and the code depending on the semaphore will not be executed.
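
[Editor's sketch] A loose user-space analogy of the "generic error path" idea uses setjmp/longjmp: the entry point records where to resume, and a deep wait that finds its device declared dead jumps straight back there instead of blocking. All names below (struct device, wait_for_device, do_read) are hypothetical, and the sketch sidesteps the hard parts of the proposal, namely undoing partial work and copying affected memory; it only shows the control flow.

```c
#include <errno.h>
#include <setjmp.h>

/* Hypothetical device state; "dead" would be set by whoever declares
 * the device or mount point gone. */
struct device {
    int dead;
    int ready;
};

static jmp_buf error_path;   /* where the entry point wants to resume */
static int pending_error;    /* error code carried across the jump */

/* Stand-in for a wait buried deep inside a driver. */
static void wait_for_device(const struct device *dev)
{
    if (dev->dead) {
        pending_error = EIO;
        longjmp(error_path, 1);   /* unwind to the generic error path */
    }
    /* a real wait would block here until the device becomes ready */
}

/* Stand-in for a syscall entry point with a generic error path. */
int do_read(const struct device *dev)
{
    if (setjmp(error_path) != 0)
        return -pending_error;    /* reached only via longjmp */

    wait_for_device(dev);
    return dev->ready ? 0 : -EAGAIN;
}
```

In this model do_read() on a healthy device returns 0, while a device flagged dead makes the same call return -EIO without running any of the code after the wait.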

BTW: Your Reply-To: should be omitted if it's equal to the From:

Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Dave Hansen

On Tue, 2005-02-22 at 16:38 -0500, Jes Sorensen wrote:
> > "Dave" == Dave Hansen <[EMAIL PROTECTED]> writes:
> 
> Dave> I was talking with Nigel Cunningham about doing something a
> Dave> little different from the classic page flag bits when the number
> Dave> of users is restricted and performance isn't ultra-critical.
> Dave> Would something like this work for you, instead of using a real
> Dave> page->flags bit for PG_cached?
> 
> Just took a quick look at this and it looks a bit heavy for our
> use. We are only looking at a small number of pages. However I could
> imagine future cases where performance may be more critical.

If it's a quite small number (or range) of pages, perhaps a short
list_head list would suffice.  It would sure beat consuming a page flag.

-- Dave


Re: [PATCH] TCP-Hybla proposal

2005-02-22 Thread Tobias DiPasquale

On Tue, 22 Feb 2005 10:14:47 -0800, David S. Miller <[EMAIL PROTECTED]> wrote:
> On Tue, 22 Feb 2005 13:03:11 -0500 (EST)
> John Heffner <[EMAIL PROTECTED]> wrote:
> 
> > An idea I've been toying with for a while now is completely abstracting
> > congestion control.  Then you could have congestion control loadable
> > modules, which would avoid this mess of experimental algorithms inside the
> > main-line kernel.  If done right, they might be able to work seamlessly
> > with SCTP, too.  The tricky part is making sure the interface is complete
> > enough.
> 
> The symbols exported to allow this would need to be EXPORT_SYMBOL_GPL().

Why's that?

-- 
[ Tobias DiPasquale ]
0x636f6465736c696e67657240676d61696c2e636f6d

Re: [Patch 4/6] Bind Mount Extensions 0.06

2005-02-22 Thread Ingo Oeser

Hi,

Herbert Poetzl wrote:
> +static inline int mnt_may_unlink(struct vfsmount *mnt, struct inode *dir,
> +   struct dentry *child) {
> +   if (!child->d_inode)
> +   return -ENOENT;
> +   if (MNT_IS_RDONLY(mnt))
> +   return -EROFS;
> +   return 0;
> +}

The argument "dir" is not used. Please remove it and fix the callers.
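
For illustration, a minimal sketch of the helper with the unused argument
dropped. The struct definitions and the MNT_RDONLY flag value are stand-ins
so that the fragment compiles outside the kernel; they are assumptions, not
the kernel's definitions, and MNT_IS_RDONLY is only assumed to test a flag
on the mount.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in types: just enough to compile outside the kernel. */
struct inode { int unused; };
struct dentry { struct inode *d_inode; };
struct vfsmount { int mnt_flags; };

#define MNT_RDONLY 0x01	/* hypothetical flag value for this sketch */
#define MNT_IS_RDONLY(m) ((m) && ((m)->mnt_flags & MNT_RDONLY))

/* Same checks as the quoted helper, minus the unused "dir" argument. */
static inline int mnt_may_unlink(struct vfsmount *mnt, struct dentry *child)
{
	assert(child != NULL);
	if (!child->d_inode)
		return -ENOENT;	/* negative dentry: nothing to unlink */
	if (MNT_IS_RDONLY(mnt))
		return -EROFS;	/* mount is read-only */
	return 0;
}
```

Every caller would then pass only the mount and the child dentry.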


Regards

Ingo Oeser


Re: uninterruptible sleep lockups

2005-02-22 Thread Chris Friesen

linux-os wrote:

Before I get into the reply, I just want to make it clear that I'm not
arguing that we *should* do any of this, just that it is not technically
impossible.  It's a thought experiment, not a design suggestion.

> All wonderful. However, it doesn't fix the problem. You are,
> again, assuming that the problem is the symptom! The problem
> is that some piece of code is not handling an exception
> properly. It is waiting forever for something that will
> never happen. It's that CODE that needs to be fixed.

Absolutely. I'm just theorizing that it is possible to devise a system
that would be able to deal with such a situation, analogous to the way
the kernel can deal with bugs in userspace processes (segfaults, traps,
etc.).

> "Cleaning" up the immediate symptoms doesn't let
> the next thread that acquires the "cleaned up" lock
> use the hardware because it has jammed code between
> that thread and the hardware.

If the system is designed such that all resources are tracked, then you
could clean them up when the "hung" entity is killed (the way we do it
for userspace resources).  In this case there is no more jammed code.
The next guy to acquire the mutex knows the hardware is in an
undetermined state, and is responsible for reinitializing it to a known
state.  This would be horribly complicated, but I don't think it would
be impossible.

> The bad code needs to be fixed. If the bad code is
> fixed, you will __never__ have a process stuck
> in 'D' state unless you run for the 1000 years
> that could statistically result in a bit in
> the semaphore getting flipped.

I don't disagree with you on this.  I think that fixing the bad code is
absolutely the way to go.  I'm simply indulging in a thought experiment
as to whether or not it is theoretically possible to create a system
that would be able to clean up after this sort of thing once it has
happened.

Chris
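
As a userspace analogy only, the "next guy reinitializes the resource" idea
can be sketched with POSIX robust mutexes. This is not kernel code and not
what anyone in the thread proposed to merge; struct device, device_lock()
and the reset counter are hypothetical names. It models only the case where
the lock holder dies; it does not model killing a holder that has hung.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* A "device" guarded by a robust mutex. */
struct device {
	pthread_mutex_t lock;
	int needs_reset;	/* set when the state is undetermined */
	int resets;		/* how many times we reinitialized */
};

static int device_init(struct device *dev)
{
	pthread_mutexattr_t attr;
	int err;

	dev->needs_reset = 0;
	dev->resets = 0;
	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	if (!err)
		err = pthread_mutex_init(&dev->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Take the lock; if the previous owner died, reset before use. */
static int device_lock(struct device *dev)
{
	int err;

	assert(dev != NULL);
	err = pthread_mutex_lock(&dev->lock);
	if (err == EOWNERDEAD) {
		/* Previous holder died with the lock held: state is dirty. */
		dev->needs_reset = 1;
		err = pthread_mutex_consistent(&dev->lock);
	}
	if (err)
		return err;
	if (dev->needs_reset) {
		dev->resets++;	/* stand-in for real reinitialization */
		dev->needs_reset = 0;
	}
	return 0;
}

/* Thread that exits while still holding the lock. */
static void *dying_owner(void *arg)
{
	struct device *dev = arg;

	pthread_mutex_lock(&dev->lock);
	return NULL;
}

/* Returns the number of resets performed after an owner died, or -1. */
static int robust_demo(void)
{
	struct device dev;
	pthread_t t;

	if (device_init(&dev))
		return -1;
	if (pthread_create(&t, NULL, dying_owner, &dev))
		return -1;
	pthread_join(t, NULL);
	if (device_lock(&dev))
		return -1;
	pthread_mutex_unlock(&dev.lock);
	return dev.resets;
}
```

The hard part raised in the thread, getting a wedged driver off the
resource in the first place, is exactly what this sketch leaves out.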

Re: [PATCH/RFC] A method for clearing out page cache

2005-02-22 Thread Paul Jackson

Andrew asked:
> So...  Cannot the application remove all its pagecache with posix_fadvise()
> prior to exiting?

Hang on ...

The replies of Ray and Martin answer your immediate question.

But we (SGI) are still busy discussing the bigger picture behind the
scenes ...

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401

[rft/update] r8169 changes in 2.6.x

2005-02-22 Thread Francois Romieu

An update of the r8169 driver is available for the 2.6.11-rc4 kernel.

Noticeable changes:
- better handling of the PHY found on the Acer Aspire 1524WLMi (Richard Dawe);
- fix a bug triggered when the device is brought down then up again;
- avoid a few lost/screaming interrupts;
- close a race when a change of MTU is issued during network activity;
- fix VLAN on big-endian hosts (is anyone using it apart from me?);
- merge relevant changes from Realtek's 2.2 driver.

If it worked for you before, you should not notice anything.

Patch against 2.6.11-rc4:
- http://www.fr.zoreil.com/~romieu/misc/20050222-2.6.11-rc4-r8169.c-test.patch

Patch-script directory:
- http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.11-rc4/r8169/

Patch-script tarball:
- http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.11-rc4/r8169-blob.tar.bz2

The 2.4.x backport will be updated later this week.

As usual, success/regression reports will be welcome.

Thank you for your attention.

--
Ueimor

Re: uninterruptible sleep lockups

2005-02-22 Thread linux-os

On Tue, 22 Feb 2005, Chris Friesen wrote:

> linux-os wrote:
>
> > Now, somebody needs a resource. It executes down();
> > once it gets control again, it has that resource. It attempts
> > to use that resource through a driver. The driver waits forever.
> > The resource is now permanently dorked --forever because its
> > driver is waiting forever. The user code never returns from
> > the driver so it can never execute up().
>
> What about something like a "robust mutex" (in OSDL terminology)?  The guy
> holding it too long gets killed, and the mutex gets marked as dirty.  The
> next guy to acquire the mutex is responsible for re-initializing the resource
> (resetting the device to a known state, for instance).
>
> Chris

All wonderful. However, it doesn't fix the problem. You are,
again, assuming that the problem is the symptom! The problem
is that some piece of code is not handling an exception
properly. It is waiting forever for something that will
never happen. It's that CODE that needs to be fixed.

"Cleaning" up the immediate symptoms doesn't let
the next thread that acquires the "cleaned up" lock
use the hardware because it has jammed code between
that thread and the hardware.

The bad code needs to be fixed. If the bad code is
fixed, you will __never__ have a process stuck
in 'D' state unless you run for the 1000 years
that could statistically result in a bit in
the semaphore getting flipped.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
 Notice : All mail here is now cached for review by Dictator Bush.
 98.36% of all statistics are fiction.

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson

On Tue, Feb 22, 2005 at 03:20:27PM -0800, Andrew Morton wrote:
> [EMAIL PROTECTED] (Olof Johansson) wrote:
> >
> > +   inc_preempt_count();
> > +   ret = get_user(curval, (int __user *)uaddr1);
> > +   dec_preempt_count();
> 
> That _should_ generate a might_sleep() warning, except it looks like we
> forgot to add a check to get_user().
> 
> It would be better to use __copy_from_user_inatomic() here, I think.

Thanks for catching that. New rev below.

-

Some futex functions do get_user calls while holding mmap_sem for
reading. If get_user() faults, and another thread happens to be in mmap
(or somewhere else, waiting in down_write() on the same semaphore),
then do_page_fault will deadlock. Most architectures seem to be exposed
to this.

To avoid it, make sure the page is available. If not, release the
semaphore, fault it in and retry.

I also found another exposure by inspection; moving some of the code
around avoids the possible deadlock there.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>


Index: linux-2.5/kernel/futex.c
===
--- linux-2.5.orig/kernel/futex.c   2005-02-21 16:09:38.0 -0600
+++ linux-2.5/kernel/futex.c2005-02-22 17:20:22.0 -0600
@@ -329,6 +329,7 @@
int ret, drop_count = 0;
unsigned int nqueued;
 
+ retry:
down_read(&current->mm->mmap_sem);
 
ret = get_futex_key(uaddr1, &key1);
@@ -355,9 +356,19 @@
   before *uaddr1.  */
smp_mb();
 
-   if (get_user(curval, (int __user *)uaddr1) != 0) {
-   ret = -EFAULT;
-   goto out;
+   inc_preempt_count();
+   ret = __copy_from_user_inatomic(&curval, (int __user *)uaddr1, sizeof(int));
+   dec_preempt_count();
+
+   if (unlikely(ret)) {
+   up_read(&current->mm->mmap_sem);
+   /* Re-do the access outside the lock */
+   ret = get_user(curval, (int __user *)uaddr1);
+
+   if (!ret)
+   goto retry;
+
+   return ret;
}
if (curval != *valp) {
ret = -EAGAIN;
@@ -480,6 +491,7 @@
int ret, curval;
struct futex_q q;
 
+ retry:
down_read(&current->mm->mmap_sem);
 
ret = get_futex_key(uaddr, &q.key);
@@ -508,9 +520,21 @@
 * We hold the mmap semaphore, so the mapping cannot have changed
 * since we looked it up in get_futex_key.
 */
-   if (get_user(curval, (int __user *)uaddr) != 0) {
-   ret = -EFAULT;
-   goto out_unqueue;
+   inc_preempt_count();
+   ret = __copy_from_user_inatomic(&curval, (int __user *)uaddr, sizeof(int));
+   dec_preempt_count();
+   if (unlikely(ret)) {
+   up_read(&current->mm->mmap_sem);
+
+   if (!unqueue_me(&q)) /* There's a chance we got woken already */
+   return 0;
+
+   /* Re-do the access outside the lock */
+   ret = get_user(curval, (int __user *)uaddr);
+
+   if (!ret)
+   goto retry;
+   return ret;
}
if (curval != val) {
ret = -EWOULDBLOCK;
Index: linux-2.5/mm/mempolicy.c
===
--- linux-2.5.orig/mm/mempolicy.c   2005-02-04 00:27:40.0 -0600
+++ linux-2.5/mm/mempolicy.c2005-02-22 14:34:19.0 -0600
@@ -524,9 +524,13 @@
} else
pval = pol->policy;
 
-   err = -EFAULT;
+   if (vma) {
+   up_read(&current->mm->mmap_sem);
+   vma = NULL;
+   }
+
if (policy && put_user(pval, policy))
-   goto out;
+   return -EFAULT;
 
err = 0;
if (nmask) {

Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Andrew Morton

Jes Sorensen <[EMAIL PROTECTED]> wrote:
>
> After applying the clue 2x4 to my head a couple of times, I came up
> with this patch. Hopefully it will work a bit better ;-)
> 

I know it's repetitious, but it's nice to maintain a changelog entry along
with the patch.  Especially when seventy people have asked "wtf is this patch
for?".

Implementation-wise, do you really need to clone-and-own the mem.c
functions?  Would it not be sufficient to do

ptr = arch_translate_mem_ptr(page, ptr);

inside mem.c?
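
A rough sketch of what such a hook could look like, compiled as plain C for
illustration. Only the call `ptr = arch_translate_mem_ptr(page, ptr)` comes
from the message above; the identity fallback, the #ifndef override scheme
and read_mem_sketch() are assumptions made here, not existing kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * An architecture needing special treatment would define
 * arch_translate_mem_ptr before this point; everyone else gets the
 * identity fallback (assumed for this sketch).
 */
#ifndef arch_translate_mem_ptr
#define arch_translate_mem_ptr(page, ptr) (ptr)
#endif

/* Generic read path: one shared copy loop, one arch hook. */
static size_t read_mem_sketch(const char *mem, unsigned long page,
			      char *dst, size_t count)
{
	const char *ptr = mem;
	size_t i;

	assert(mem != NULL && dst != NULL);
	(void)page;	/* unused by the identity fallback */
	ptr = arch_translate_mem_ptr(page, ptr);
	for (i = 0; i < count; i++)
		dst[i] = ptr[i];
	return count;
}
```

With a hook of this shape, the ia64 code would supply only the translation
and the read/write loops would stay in one place.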

> + *  arch/ia64/kernel/mem.c
> ...
> +extern loff_t memory_lseek(struct file * file, loff_t offset, int orig);
> +extern int mmap_kmem(struct file * file, struct vm_area_struct * vma);
> +extern int open_port(struct inode * inode, struct file * filp);
> +

Please find a .h file for the function prototypes.

Re: [PATCH] Symlink /sys/class/block to /sys/block

2005-02-22 Thread Greg KH

On Tue, Feb 22, 2005 at 03:06:34PM -0800, Chris Wedgwood wrote:
> On Sat, Feb 19, 2005 at 11:29:13PM +, Malcolm Rowe wrote:
> 
> > Following the discussion in [1], the attached patch creates
> > /sys/class/block as a symlink to /sys/block. The patch applies to
> > 2.6.11-rc4-bk7.
> 
> Shouldn't we really move /sys/block to /sys/class/block and put the
> symlink from there to /sys/block with the hope of removing it one day?

When struct class_device can support children, we can do just that.  But
that support has not been added, yet...

thanks,

greg k-h

Re: Help tracking down problem --- endless loop in __find_get_block_slow

2005-02-22 Thread Andrew Morton

Jeff Mahoney <[EMAIL PROTECTED]> wrote:
>
> In my experience, the loop is actually outside of
> __find_get_block_slow(), in __getblk_slow(). I've been using xmon to
> interrupt the kernel, and the results vary but are all rooted in the
> for(;;) loop in __getblk_slow. It appears as though grow_buffers is
> finding/creating the page, but then __find_get_block can't locate the
> buffer it needs.

Yes, that'll happen.  Because there are still buffers attached to the page
which have the wrong blocksize.  Say, if someone is trying to read a 2k
buffer_head which is backed by a page which already has 1k buffer_heads
attached to it.

Does your kernel not have that big printk in __find_get_block_slow()?  If
it does, maybe some of the buffers are unmapped.  Try:

--- 25/fs/buffer.c~a Tue Feb 22 15:27:35 2005
+++ 25-akpm/fs/buffer.c Tue Feb 22 15:27:41 2005
@@ -456,7 +456,7 @@ __find_get_block_slow(struct block_devic
 * file io on the block device and getblk.  It gets dealt with
 * elsewhere, don't buffer_error if we had some unmapped buffers
 */
-   if (all_mapped) {
+   {
printk("__find_get_block_slow() failed. "
"block=%llu, b_blocknr=%llu\n",
(unsigned long long)block, (unsigned long long)bh->b_blocknr);
_



Re: CSMI questions

2005-02-22 Thread Greg KH

On Tue, Feb 22, 2005 at 11:16:56AM -0600, mikem wrote:
> All,
> I hate to dredge this up again, but, when Eric Moore submitted changes for MPT
> Fusion driver containing the CSMI ioctls it was rejected. There was talk on
> the linux-scsi list about it being a horrible interface, among other things.
> There were also comments about there being a Linux only approach. Personally,
> I like that idea but it's not good from a business perspective. Especially
> because HP, Dell, and others support more than one OS. Having a unique set of
> management apps for each OS would be very cumbersome.

Honestly, the kernel developers don't care about cross-OS platform
management utilities from a business perspective.  :)

> We've also been looking at how to use sysfs rather than ioctls.

Good.

> Some look reasonable, others seem to be restricted by sysfs itself. 
> 1. only ASCII files are allowed

With 1 value in that file.

> 2. if multiple attributes are contained in one file, who parses out the data?

multiple attributes are not allowed to be contained in a single file.

> 3. one buffer of size (PAGE_SIZE) may not hold all of the data required

You have a _single_ attribute that is bigger than PAGE_SIZE?  What is
it?
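
To make the "one ASCII value per file" rule concrete, here is a userspace
mock of two attribute "show" routines. The struct, the attribute names and
SKETCH_PAGE_SIZE are hypothetical; real sysfs callbacks have a different
signature, so treat this purely as an illustration of the convention.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096	/* stand-in for the kernel's PAGE_SIZE */

/* Hypothetical controller state exported to userspace. */
struct ctrl_info {
	unsigned int firmware_rev;
	unsigned int port_count;
};

/* One file, one value: .../firmware_rev would contain e.g. "42\n". */
static int firmware_rev_show(const struct ctrl_info *c, char *buf)
{
	assert(c != NULL && buf != NULL);
	return snprintf(buf, SKETCH_PAGE_SIZE, "%u\n", c->firmware_rev);
}

/* A second value gets its own file rather than a second field. */
static int port_count_show(const struct ctrl_info *c, char *buf)
{
	assert(c != NULL && buf != NULL);
	return snprintf(buf, SKETCH_PAGE_SIZE, "%u\n", c->port_count);
}
```

Because each file holds a single value, nobody has to parse a multi-field
record, which is the point of the answer to question 2.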

> I'd also like a (brief) explanation of why ioctls are so bad. I've seen the 
> reasons of them never going away, etc. But from the beginning of time (UNIX)
> ioctls have been the preferred method of user space/kernel communication.

That's because there was no other method.  See the lkml archives for why
ioctls are considered bad, I don't want to dredge it up again.

Hope this helps,

greg k-h

Re: uninterruptible sleep lockups

2005-02-22 Thread Chris Friesen

linux-os wrote:

> Now, somebody needs a resource. It executes down();
> once it gets control again, it has that resource. It attempts
> to use that resource through a driver. The driver waits forever.
> The resource is now permanently dorked --forever because its
> driver is waiting forever. The user code never returns from
> the driver so it can never execute up().

What about something like a "robust mutex" (in OSDL terminology)?  The
guy holding it too long gets killed, and the mutex gets marked as dirty.
The next guy to acquire the mutex is responsible for re-initializing
the resource (resetting the device to a known state, for instance).

Chris

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Andrew Morton

[EMAIL PROTECTED] (Olof Johansson) wrote:
>
> + inc_preempt_count();
> + ret = get_user(curval, (int __user *)uaddr1);
> + dec_preempt_count();

That _should_ generate a might_sleep() warning, except it looks like we
forgot to add a check to get_user().

It would be better to use __copy_from_user_inatomic() here, I think.
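
The shape being discussed (try the access without sleeping while the lock
is held; on failure drop the lock, let the access fault normally, then
retry) can be simulated in userspace. Everything below is a stand-in: a
pthread rwlock plays mmap_sem, a "resident" flag plays the page tables, and
the function names are hypothetical. It shows the control flow only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct mm_sim {
	pthread_rwlock_t mmap_sem;	/* stands in for mm->mmap_sem */
	bool mapped;			/* is the address valid at all? */
	bool resident;			/* is the page already faulted in? */
	int value;
	int faults;			/* slow-path fault-ins performed */
};

static void mm_sim_init(struct mm_sim *mm, int value, bool mapped)
{
	pthread_rwlock_init(&mm->mmap_sem, NULL);
	mm->mapped = mapped;
	mm->resident = false;
	mm->value = value;
	mm->faults = 0;
}

/* Non-sleeping read: fails instead of faulting (cf. the _inatomic copy). */
static int read_nofault(const struct mm_sim *mm, int *out)
{
	if (!mm->mapped || !mm->resident)
		return -EFAULT;
	*out = mm->value;
	return 0;
}

/* Sleeping read: takes the lock itself, as a fault handler would. */
static int read_may_fault(struct mm_sim *mm, int *out)
{
	int ret = -EFAULT;

	pthread_rwlock_rdlock(&mm->mmap_sem);
	if (mm->mapped) {
		if (!mm->resident) {
			mm->resident = true;
			mm->faults++;
		}
		*out = mm->value;
		ret = 0;
	}
	pthread_rwlock_unlock(&mm->mmap_sem);
	return ret;
}

static int read_value_retry(struct mm_sim *mm, int *out)
{
	int ret, scratch;

	assert(mm != NULL && out != NULL);
retry:
	pthread_rwlock_rdlock(&mm->mmap_sem);
	ret = read_nofault(mm, out);
	if (ret) {
		/* Never fault with the lock held: drop it first. */
		pthread_rwlock_unlock(&mm->mmap_sem);
		ret = read_may_fault(mm, &scratch);
		if (!ret)
			goto retry;	/* page is in; redo under the lock */
		return ret;		/* genuinely bad address */
	}
	/* ... work that relies on *out while the lock is held ... */
	pthread_rwlock_unlock(&mm->mmap_sem);
	return 0;
}
```

The value read on the slow path is thrown away on purpose: it only serves
to bring the page in, and the real read is repeated under the lock.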

Re: JFFS2 Extended attributes support & SELinux in handhelds

2005-02-22 Thread Lorenzo Hernández García-Hierro

On Tue, 22-02-2005 at 12:21 -0600, Josh Boyer wrote:
> You should send this to the JFFS2 development list.  The xattr support
> is probably a JFFS3 candidate.
> 
> The mtd tree is the most current.  Any development would probably get
> the most benefit from being done there.  Especially since JFFS3 doesn't
> exist anywhere else :).

As we discussed in #mtd, I've moved everything to the JFFS3 code and
re-worked the pretty basic stuff that was already done, as a really
dirty night-hack.

I haven't had time to fix the remaining errors (no, it's not a working
patch), nor the remaining "stolen" ReiserFS code that needs to be
modified in order to make JFFS3 happy with it (priv. directory handling
in xattr initialization, etc).
I'm going to have limited time for it: I have exams these two weeks and
also need to finish some other work in progress.

I've uploaded a patch that applies to 2.6.11-rc4 tree, with latest mtd
tree included.

http://pearls.tuxedo-es.org/patches/mtd-jffs3-xattr-20050222-2.6.11-rc4.patch
(998Kb)

I would appreciate any collaboration and help with it.

Cheers, thanks in advance and enjoy (not working) it.
:)
-- 
Lorenzo Hernández García-Hierro <[EMAIL PROTECTED]> 
[1024D/6F2B2DEC] & [2048g/9AE91A22][http://tuxedo-es.org]



Re: OT: Why is usb data many times the cpu hog that firewire is?

2005-02-22 Thread Matt Mackall

On Mon, Feb 21, 2005 at 05:08:27PM -0500, Gene Heskett wrote:
> On Monday 21 February 2005 13:29, Wichert Akkerman wrote:
> >Previously Gene Heskett wrote:
> >> That's what I was afraid of, which makes using it for a motion
> >> detected burglar alarm source considerably less than practical
> >> since the machine must be able to do other things too.
> >
> >Depending on the type of compression used you might be able to detect
> >motion by analyzing the compressed datastream.
> >
> It's jpg coming out of the camera, but I don't know how to capture the raw
> stream and do the comparisons.  One would have to first subtract the
> expected peak values of the sensor's noise (snow if you will), by a
> running average obtained by frame addition on a pixel by pixel
> basis.  Somehow, that seems to imply a decoded stream.  And that's
> obviously not going to be anything but cpu intensive too.  So I'm
> less than enthusiastic that it's a workable solution unless one is
> able to dedicate a machine to that job exclusively.  X10 FIR's like
> the EagleEye or HawkEye will need to be used to detect when the
> recording should be started (and stopped).

JPEG data is DCT of 8x8 pixel chunks. If you can get at that, you can
compare the DC terms of each chunk with minimal decoding. Various
thumbnailers do this for speed already.
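
A small sketch of the comparison step, assuming the per-block DC terms of
two frames have already been extracted into integer arrays (getting them
out of the JPEG stream still needs the entropy decoding, which is not shown
here). The function names, the threshold and the minimum block count are
hypothetical tuning knobs, not anything from an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count 8x8 blocks whose DC term moved by more than `threshold`. */
static size_t count_changed_blocks(const int *dc_prev, const int *dc_cur,
				   size_t nblocks, int threshold)
{
	size_t i, changed = 0;

	assert(dc_prev != NULL && dc_cur != NULL);
	for (i = 0; i < nblocks; i++)
		if (abs(dc_cur[i] - dc_prev[i]) > threshold)
			changed++;
	return changed;
}

/* Report motion when at least `min_blocks` blocks changed. */
static int motion_detected(const int *dc_prev, const int *dc_cur,
			   size_t nblocks, int threshold, size_t min_blocks)
{
	return count_changed_blocks(dc_prev, dc_cur, nblocks,
				    threshold) >= min_blocks;
}
```

The threshold is where sensor noise gets absorbed: a DC term is
proportional to the block's average brightness, so per-pixel snow largely
averages out before the comparison.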

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Greg KH

On Tue, Feb 22, 2005 at 02:10:58PM -0800, Linus Torvalds wrote:
> 
> Oh, well. The reason I hate the rwsem behaviour is exactly because it
> results in this very subtle class of deadlocks. This one case is certainly
> solvable several ways, but do we have other issues somewhere else? Things
> like kobject might be ripe with things like this. The mm semaphore tends
> to be pretty well-behaved - and I'm not sure the same is true of the
> kobject one.

I'm trying to get rid of the kobject (actually the subsystem) rwsem
right now, so it should be gone completely within a few kernel versions.

thanks,

greg k-h

Re: [PATCH] Symlink /sys/class/block to /sys/block

2005-02-22 Thread Chris Wedgwood

On Sat, Feb 19, 2005 at 11:29:13PM +, Malcolm Rowe wrote:

> Following the discussion in [1], the attached patch creates
> /sys/class/block as a symlink to /sys/block. The patch applies to
> 2.6.11-rc4-bk7.

Shouldn't we really move /sys/block to /sys/class/block and put the
symlink from there to /sys/block with the hope of removing it one day?

[PATCH] ALPS: do not activate on unsupported models

2005-02-22 Thread Dmitry Torokhov

Hi,

It feels like 2.6.11 is right around the corner. I would like to disable
ALPS support for some devices we don't know how to handle properly yet,
to cut down on the number of complaints that we broke mouse support.

Please consider applying the patch below.

-- 
Dmitry

===

Input: ALPS - do not activate native mode for devices whose data
   we can not handle yet.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>

 alps.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

Index: dtor/drivers/input/mouse/alps.c
===
--- dtor.orig/drivers/input/mouse/alps.c
+++ dtor/drivers/input/mouse/alps.c
@@ -34,7 +34,7 @@ static struct alps_model_info {
unsigned char signature[3];
unsigned char model;
 } alps_model_data[] = {
-   { { 0x33, 0x02, 0x0a }, ALPS_MODEL_GLIDEPOINT },
+/* { { 0x33, 0x02, 0x0a }, ALPS_MODEL_GLIDEPOINT },*/
{ { 0x53, 0x02, 0x0a }, ALPS_MODEL_GLIDEPOINT },
{ { 0x53, 0x02, 0x14 }, ALPS_MODEL_GLIDEPOINT },
{ { 0x63, 0x02, 0x0a }, ALPS_MODEL_GLIDEPOINT },
@@ -42,8 +42,8 @@ static struct alps_model_info {
{ { 0x73, 0x02, 0x0a }, ALPS_MODEL_GLIDEPOINT },
{ { 0x73, 0x02, 0x14 }, ALPS_MODEL_GLIDEPOINT },
{ { 0x63, 0x02, 0x28 }, ALPS_MODEL_GLIDEPOINT },
-   { { 0x63, 0x02, 0x3c }, ALPS_MODEL_GLIDEPOINT },
-   { { 0x63, 0x02, 0x50 }, ALPS_MODEL_GLIDEPOINT },
+/* { { 0x63, 0x02, 0x3c }, ALPS_MODEL_GLIDEPOINT },*/
+/* { { 0x63, 0x02, 0x50 }, ALPS_MODEL_GLIDEPOINT },*/
{ { 0x63, 0x02, 0x64 }, ALPS_MODEL_GLIDEPOINT },
{ { 0x20, 0x02, 0x0e }, ALPS_MODEL_DUALPOINT },
{ { 0x22, 0x02, 0x0a }, ALPS_MODEL_DUALPOINT },

Re: [linux-usb-devel] 2.6: USB Storage hangs machine on bootup for ~2 minutes

2005-02-22 Thread Parag Warudkar

On Tuesday 22 February 2005 03:41 pm, Alan Stern wrote:
> usb_device_read acquires a couple of locks, one for the USB bus list and
> one for the root hub of the bus it's looking at.  I don't know which one
> occurs at offset 229 on your system -- maybe you can tell.  Oddly enough,
> neither of those locks is for a USB device like the Maxtor drive.  So it's
> not at all clear why plugging in the drive should mess up kudzu.  Or why
> the blockage should clear up after a couple of minutes.
>
> Perhaps we can find out by looking at other entries in the stack trace.  
> Of particular interest are the khubd, usb-storage, and scsi_eh processes.

Alan,
See below for stack traces and also note that the stack traces are after I 
modified usb_device_read to do down_interruptible instead of down. (kudzu 
gets stuck regardless though.) Let me know if you want me to revert the 
down_interruptible change and repost the stack trace.

I wrongly related this to the 2 minute hang - this one is forever if I let 
kudzu start during boot. If I run kudzu after boot is complete, it gets stuck 
and everything else on that drive (mount, unmount ..) also gets stuck. Sorry 
about the confusion.

Attached is the disassembly of usb_device_read from my machine.

Parag

SysRQ + T for relevant processes
==
hald  D 0020e12be31a 0  2558  1  3272  2545 
(NOTLB)
81002c76fb48 0082 81002c76fb28 88515044
   00862c76fb08 81002eb10800 0249 810001c56030
   81002c76fc08 0002
Call Trace:
{:scsi_mod:scsi_request_fn+2356}
   {io_schedule+15} {sync_page+70}
   {__wait_on_bit_lock+69} 
{sync_page+0}
   {__lock_page+167} 
{wake_bit_function+0}
   {wake_bit_function+0} 
{do_generic_mapping_read+660}
   {file_read_actor+0} 
{__generic_file_aio_read+420}
   {generic_file_read+187} 
{autoremove_wake_function+0}
   {do_brk+720} {vfs_read+225}
   {sys_read+80} {system_call+126}

 scsi_eh_2 D  0  3581  1  3582  3277 
(L-TLB)
 81002bdc1d88 0046 01e3 81002bed8800
81002bdc1d48 81002c25a800 0812 803df080
81002bdc1ed0 81002c25a800
 Call Trace:
 {wait_for_completion+437} 
{default_wake_function+0}
{default_wake_function+0} 
{:usb_storage:usb_stor_stop_transport+35}
{:usb_storage:command_abort+256}
{:scsi_mod:scsi_error_handler+2172}
{child_rip+8} 
{:scsi_mod:scsi_error_handler+0}
{child_rip+0}
 usb-storage   D  0  3582  1  3627  3581 
(L-TLB)
 81002b8e1c08 0046 81002b9e1000 0010
00762b8e1c98 81002bed8800 03dd 81002eb10800
c0040280 001f
 Call Trace:
 {wait_for_completion+437} 
{thread_return+253}
{default_wake_function+0} 
{default_wake_function+0}
{:usb_storage:usb_stor_msg_common+550}
{dma_unmap_sg+134} 
{:usb_storage:usb_stor_bulk_transfer_buf+143}
{:usb_storage:usb_stor_Bulk_transport+203}
{:usb_storage:usb_stor_invoke_transport+59}
{:usb_storage:usb_stor_transparent_scsi_command+27}
{:usb_storage:usb_stor_control_thread+756}
{finish_task_switch+195} 
{child_rip+8}
{:usb_storage:usb_stor_control_thread+0}
{child_rip+0}

 scsi_eh_3 S  0  3627  1  3634  3582 
(L-TLB)
 81002bd47d68 0046 81002bd47d28 80219b32
00742bc387c0 81002bc387c0 0226 81002b9fc030
81002bd47d48 80147ab1
 Call Trace:
 {_atomic_dec_and_lock+290} {free_uid+33}
{__down_interruptible+486} 
{default_wake_function+0}
{__down_failed_interruptible+53}
{:scsi_mod:.text.lock.scsi_error+45}
{child_rip+8} 
{:scsi_mod:scsi_error_handler+0}
{child_rip+0}
 usb-storage   S 0020f04e3d81 0  3634  13627 
(L-TLB)
 81002bc53df8 0046 81002bc53d80 1000
00732b9eba7c 81002b9fc030 05ad 81002ebc9800
81002bc53de8 8853570f
 Call Trace:
 {:usb_storage:usb_stor_bulk_transfer_buf+143}
{__down_interruptible+486} 
{default_wake_function+0}
{__down_failed_interruptible+53}
{:usb_storage:.text.lock.usb+5} 
{finish_task_switch+195}
{child_rip+8} 
{:usb_storage:usb_stor_control_thread+0}
   {child_rip+0}

khubd S 001ddd381da1 0   125  1   182 9 
(L-TLB)
810001ecbe18 0046 0246 81002ba2c400
   007401ecbdd8 810001e9a800 3023 81002bed8070
   810001ecbe18 8015c4c9
Call Trace:
{prepare_to_wait+345} {hub_thread+4118}
   {free_pages_bulk+1007} 
{autoremove_wake_function+0}
   {autoremove_wake_function+0} 
{child_rip+8}

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson

On Tue, Feb 22, 2005 at 10:34:57PM +, Jamie Lokier wrote:
> There is one small but important error: the "return ret" mustn't just
> return.  It must call unqueue_me() just like the code at out_unqueue,
> _including_ the conditional "ret = 0", but _excluding_ the up_read().

Not only that, but someone might already have dequeued us, right? It's
probably a pathological case though, i.e. someone did a wake on the same
(bad) address.

How's this patch? It's closer to Linus' pseudo-code than Andrew's, to
avoid the extra get_user() at function entry and keep the common case
path short.

It also includes the feedback from Andrew on the sys_get_mempolicy(),
making the patch even simpler there.




Some futex functions do get_user calls while holding mmap_sem for
reading. If get_user() faults, and another thread happens to be in mmap
(or somewhere else, waiting in down_write() on the same semaphore),
then do_page_fault will deadlock. Most architectures seem to be exposed
to this.

To avoid it, make sure the page is available. If not, release the
semaphore, fault it in and retry.

I also found another exposure by inspection; moving some of the code
around avoids the possible deadlock there.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>


Index: linux-2.5/kernel/futex.c
===
--- linux-2.5.orig/kernel/futex.c   2005-02-21 16:09:38.0 -0600
+++ linux-2.5/kernel/futex.c2005-02-22 16:38:24.0 -0600
@@ -329,6 +329,7 @@
int ret, drop_count = 0;
unsigned int nqueued;
 
+ retry:
down_read(&current->mm->mmap_sem);
 
ret = get_futex_key(uaddr1, &key1);
@@ -355,9 +356,19 @@
   before *uaddr1.  */
smp_mb();
 
-   if (get_user(curval, (int __user *)uaddr1) != 0) {
-   ret = -EFAULT;
-   goto out;
+   inc_preempt_count();
+   ret = get_user(curval, (int __user *)uaddr1);
+   dec_preempt_count();
+
+   if (unlikely(ret)) {
+   up_read(&current->mm->mmap_sem);
+   /* Re-do the access outside the lock */
+   ret = get_user(curval, (int __user *)uaddr1);
+
+   if (!ret)
+   goto retry;
+
+   return ret;
}
if (curval != *valp) {
ret = -EAGAIN;
@@ -480,6 +491,7 @@
int ret, curval;
struct futex_q q;
 
+ retry:
down_read(&current->mm->mmap_sem);
 
ret = get_futex_key(uaddr, &q.key);
@@ -508,9 +520,21 @@
 * We hold the mmap semaphore, so the mapping cannot have changed
 * since we looked it up in get_futex_key.
 */
-   if (get_user(curval, (int __user *)uaddr) != 0) {
-   ret = -EFAULT;
-   goto out_unqueue;
+   inc_preempt_count();
+   ret = get_user(curval, (int __user *)uaddr);
+   dec_preempt_count();
+   if (unlikely(ret)) {
+   up_read(&current->mm->mmap_sem);
+
+   if (!unqueue_me()) /* There's a chance we got woken already */
+   return 0;
+
+   /* Re-do the access outside the lock */
+   ret = get_user(curval, (int __user *)uaddr);
+
+   if (!ret)
+   goto retry;
+   return ret;
}
if (curval != val) {
ret = -EWOULDBLOCK;
Index: linux-2.5/mm/mempolicy.c
===
--- linux-2.5.orig/mm/mempolicy.c   2005-02-04 00:27:40.0 -0600
+++ linux-2.5/mm/mempolicy.c2005-02-22 14:34:19.0 -0600
@@ -524,9 +524,13 @@
} else
pval = pol->policy;
 
-   err = -EFAULT;
+   if (vma) {
+   up_read(&current->mm->mmap_sem);
+   vma = NULL;
+   }
+
if (policy && put_user(pval, policy))
-   goto out;
+   return -EFAULT;
 
err = 0;
if (nmask) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Help tracking down problem --- endless loop in __find_get_block_slow

2005-02-22 Thread Jeff Mahoney


Andrew Morton wrote:
> "Thomas S. Iversen" <[EMAIL PROTECTED]> wrote:
> 
>>But if I do
>>
>> dd if=/dev/zero of=/mnt/testfile count=N, N>6
>>
>> I get into an endless loop in __find_get_block_slow. 
> 
> 
> The only way in which __find_get_block_slow() can loop is if something
> wrecked the buffer_head ring at page->private: something caused an internal
> loop via bh->b_this_page.
> 
> Are you sure that's where things are hanging?  That it's not stuck on a
> spinlock?
> 
> A sysrq-P trace might help.

I've observed similar effects without DM involved at all. I've been able
to reproduce using subfs (it brings out umount races nicely) on kernels
from 2.6.5 to 2.6.11-rc4; it's platform- and device-independent.

In my experience, the loop is actually outside of
__find_get_block_slow(), in __getblk_slow(). I've been using xmon to
interrupt the kernel, and the results vary but are all rooted in the
for(;;) loop in __getblk_slow. It appears as though grow_buffers is
finding/creating the page, but then __find_get_block can't locate the
buffer it needs.

-Jeff

--
Jeff Mahoney
SuSE Labs

Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Arjan van de Ven

On Tue, 2005-02-22 at 17:30 -0500, Jes Sorensen wrote:
> 
> For userspace it's used by some of the MPI type apps in userland.

you got to be kidding. Why are these MPI apps accessing memory that the
kernel has mapped cached (eg ram) via /dev/mem?


(e.g. my proposal is to make /dev/mem cover just device memory, not
kernel-accessible RAM; wouldn't that solve the entire issue cleanly?)


Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Jes Sorensen

> "Andrew" == Andrew Morton <[EMAIL PROTECTED]> writes:

Andrew> Matthew Wilcox <[EMAIL PROTECTED]> wrote:
>> 
>> On Tue, Feb 22, 2005 at 09:41:04AM -0500, Jes Sorensen wrote:
>> > >> + if (page->flags & PG_uncached)
>> > 
>> > Andrew> dude.  That ain't gonna work ;)
>> > 
>> > Pardon my lack of clue, but why not?

Andrew> if (page->flags & (1<<PG_uncached)) would have been correcter.
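
To spell out the difference: page flag names such as PG_uncached are bit numbers, not masks, so they have to be turned into a mask before being ANDed with the flags word. A small user-space sketch, assuming a made-up bit number of 19 purely for illustration (the real value is whatever the patch defines, and in the kernel one would normally go through test_bit() or an accessor such as PageUncached()):

```c
#include <stdbool.h>

/* Hypothetical bit number, chosen for illustration only. */
#define PG_uncached_example 19

/* Wrong: treats the bit number (19 == 0x13) as if it were a mask. */
static bool test_wrong(unsigned long flags)
{
	return (flags & PG_uncached_example) != 0;
}

/* Right: build the mask from the bit number first. */
static bool test_right(unsigned long flags)
{
	return (flags & (1UL << PG_uncached_example)) != 0;
}
```

With only bit 19 set, test_wrong() reports false; with only bit 0 set (some unrelated flag), it reports true. Both answers are wrong, which is why the unshifted test cannot work.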

Andrew,

After applying the clue 2x4 to my head a couple of times, I came up
with this patch. Hopefully it will work a bit better ;-)

Cheers,
Jes

Signed-off-by: Jes Sorensen <[EMAIL PROTECTED]>

diff -urN -X /usr/people/jes/exclude-linux 
linux-2.6.11-rc3-mm2-vanilla/arch/ia64/kernel/Makefile 
linux-2.6.11-rc3-mm2/arch/ia64/kernel/Makefile
--- linux-2.6.11-rc3-mm2-vanilla/arch/ia64/kernel/Makefile  2005-02-16 
11:20:19 -08:00
+++ linux-2.6.11-rc3-mm2/arch/ia64/kernel/Makefile  2005-02-16 11:58:35 
-08:00
@@ -7,7 +7,7 @@
 obj-y := acpi.o entry.o efi.o efi_stub.o gate-data.o fsys.o ia64_ksyms.o irq.o 
irq_ia64.o  \
 irq_lsapic.o ivt.o machvec.o pal.o patch.o process.o perfmon.o 
ptrace.o sal.o  \
 salinfo.o semaphore.o setup.o signal.o sys_ia64.o time.o traps.o 
unaligned.o \
-unwind.o mca.o mca_asm.o topology.o
+unwind.o mca.o mca_asm.o topology.o mem.o
 
 obj-$(CONFIG_IA64_BRL_EMU) += brl_emu.o
 obj-$(CONFIG_IA64_GENERIC) += acpi-ext.o
diff -urN -X /usr/people/jes/exclude-linux 
linux-2.6.11-rc3-mm2-vanilla/arch/ia64/kernel/mem.c 
linux-2.6.11-rc3-mm2/arch/ia64/kernel/mem.c
--- linux-2.6.11-rc3-mm2-vanilla/arch/ia64/kernel/mem.c 1969-12-31 16:00:00 
-08:00
+++ linux-2.6.11-rc3-mm2/arch/ia64/kernel/mem.c 2005-02-22 14:11:40 -08:00
@@ -0,0 +1,151 @@
+/*
+ *  arch/ia64/kernel/mem.c
+ *
+ *  IA64 specific  portions of /dev/mem access, notably handling
+ *  read/write from uncached memory
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ *  Copyright (C) 2005 Jes Sorensen <[EMAIL PROTECTED]>
+ */
+
+
+#include 
+
+#include 
+#include 
+
+
+extern loff_t memory_lseek(struct file * file, loff_t offset, int orig);
+extern int mmap_kmem(struct file * file, struct vm_area_struct * vma);
+extern int open_port(struct inode * inode, struct file * filp);
+
+
+static inline int range_is_allowed(unsigned long from, unsigned long to)
+{
+   unsigned long cursor;
+
+   cursor = from >> PAGE_SHIFT;
+   while ((cursor << PAGE_SHIFT) < to) {
+   if (!devmem_is_allowed(cursor))
+   return 0;
+   cursor++;
+   }
+   return 1;
+}
+
+
+/*
+ * This function reads the *physical* memory. The f_pos points directly
+ * to the memory location. 
+ */
+static ssize_t read_mem(struct file * file, char __user * buf,
+   size_t count, loff_t *ppos)
+{
+   unsigned long p = *ppos;
+   ssize_t read, sz;
+   struct page *page;
+   char *ptr;
+
+
+   if (!valid_phys_addr_range(p, ))
+   return -EFAULT;
+   read = 0;
+
+   while (count > 0) {
+   /*
+* Handle first page in case it's not aligned
+*/
+   if (-p & (PAGE_SIZE - 1))
+   sz = -p & (PAGE_SIZE - 1);
+   else
+   sz = min(PAGE_SIZE, count);
+
+   page = pfn_to_page(p >> PAGE_SHIFT);
+   /*
+* On ia64 if a page has been mapped somewhere as
+* uncached, then it must also be accessed uncached
+* by the kernel or data corruption may occur
+*/
+   if (PageUncached(page))
+   ptr = (char *)p + __IA64_UNCACHED_OFFSET;
+   else
+   ptr = __va(p);
+   if (copy_to_user(buf, ptr, sz))
+   return -EFAULT;
+   buf += sz;
+   p += sz;
+   count -= sz;
+   read += sz;
+   }
+
+   *ppos += read;
+   return read;
+}
+
+
+static ssize_t write_mem(struct file * file, const char __user * buf, 
+size_t count, loff_t *ppos)
+{
+   unsigned long p = *ppos;
+   unsigned long copied;
+   ssize_t written, sz;
+   struct page *page;
+   char *ptr;
+
+   if (!valid_phys_addr_range(p, ))
+   return -EFAULT;
+
+   written = 0;
+
+   if (!range_is_allowed(p, p + count))
+   return -EPERM;
+   /*
+* Need virtual p below here
+*/
+   while (count > 0) {
+   /*
+* Handle first page in case it's not aligned
+*/
+   if (-p & (PAGE_SIZE - 1))
+   sz = -p & (PAGE_SIZE - 1);
+   else
+   sz = min(PAGE_SIZE, count);
+
+   page = pfn_to_page(p >> PAGE_SHIFT);
+   /*
+* On ia64 if a page has been mapped somewhere as
+* uncached, then it must also be accessed

Re: Why does printk helps PCMCIA card to initialise?

2005-02-22 Thread David Hinds

On Mon, 21 Feb 2005, Linus Torvalds wrote:
> On Mon, 21 Feb 2005, Russell King wrote:
> >
> > In cs.c, alloc_io_space(), find the line:
> >
> > if (*base & ~(align-1)) {
> >
> > delete the ~ and rebuild. This may resolve your problem.
> 
> Unlikely. The code is too broken for words.

The original code is correct; you are misinterpreting the meaning of
the "align" variable here.  PCMCIA cards can request a specific base
IO address, and can also specify how many IO address lines they
decode.  The number of decoded lines determines a maximal alignment
restriction for a card; if it only decodes 3 lines, then it should not
reasonably ask for an IO region with more specificity than being on an
8 port boundary.  The "align" variable here holds this alignment.  The
"oddness" here is that the card is providing conflicting information,
that it needs IO ports at a specific address, but is only decoding 3
address lines (i.e. align=8).

The names of "base" and "align" have the expected meanings when a card
only specifies one or the other.  It's only for the case where both
are specified that the meaning is complicated.  Then, "base" is more
like an offset into a block that has "align" alignment.

Given an "odd" request for a base=0x260 and align=8, the allocator
promotes this to align=0x400, and would allow addresses 0x260, 0x660,
0xa60, 0xe60, etc, subject to restrictions in /etc/pcmcia/config.opts.
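
The arithmetic above can be checked with a short sketch. It assumes that the promoted alignment is the smallest power of two larger than the requested base, which matches the 0x260 -> 0x400 example but is an assumption about the allocator rather than a quote from it. The helper names are made up for illustration.

```c
/* Nonzero when base carries bits above the decoded address lines,
 * i.e. the "odd" request tested by (*base & ~(align-1)). */
static int is_odd_request(unsigned int base, unsigned int align)
{
	return (base & ~(align - 1)) != 0;
}

/* Assumed promotion rule: smallest power of two greater than base.
 * Only meant for small IO port numbers; no overflow handling. */
static unsigned int promote_align(unsigned int base)
{
	unsigned int align = 1;

	while (align <= base)
		align <<= 1;
	return align;
}

/* k-th candidate address: base acts as an offset into an aligned block. */
static unsigned int candidate(unsigned int base, unsigned int align,
			      unsigned int k)
{
	return k * align + (base & (align - 1));
}
```

For base=0x260 this gives align=0x400 and candidates 0x260, 0x660, 0xa60 and 0xe60 for k = 0..3.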

The real problem here is that all the IO address ranges the card
claims to support were unavailable.  I'd first try adding:

  include port 0x0600-0x07ff

to /etc/pcmcia/config.opts to give the allocator more flexibility in
choosing port ranges.  

-- Dave

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier

Linus Torvalds wrote:
> > queue_me(...) etc.
> > current->flags |= PF_MMAP_SEM; <- new
> > ret = get_user(...);
> > current->flags &= ~PF_MMAP_SEM; <- new
> > /* the rest */
> 
> That is uglee. 
> 
> We really have this already, and it's called "current->preempt". It 
> handles any lock at all, and doesn't add yet another special case to all 
> the architectures.

Ooh, I didn't know current->preempt did that (been away).

>   repeat:
>   down_read(&current->mm->mmap_sem);
>   get_futex_key(...) etc.
>   queue_me(...) etc.
>   inc_preempt_count();
>   ret = get_user(...);
>   dec_preempt_count();
>   if (unlikely(ret)) {
>   up_read(&current->mm->mmap_sem);
>   /* Re-do the access outside the lock */
>   ret = get_user(...);
>   if (!ret)
>   goto repeat;
>   return ret;
>   }

That would work.  I like it. :)

Page faults will enter the fault handler twice (i.e. slower), but
that's not really a disadvantage, because a program always references
the memory just before calling futex_wait anyway.  A fault is rare.

There is one small but important error: the "return ret" mustn't just
return.  It must call unqueue_me() just like the code at out_unqueue,
_including_ the conditional "ret = 0", but _excluding_ the up_read().

Alternatively, since it's a rare case, just shuffle the loop around:

        down_read(&current->mm->mmap_sem);
   repeat:
        get_futex_key(...) etc.
        queue_me(...) etc.
        inc_preempt_count();
        ret = get_user(...);
        dec_preempt_count();
        if (unlikely(ret)) {
                up_read(&current->mm->mmap_sem);
                /* Re-do the access outside the lock */
                ret = get_user(...);
                down_read(&current->mm->mmap_sem);
                if (!ret)
                        goto repeat;
                goto out_unqueue;
        }

-- Jamie

Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Jes Sorensen

> "Arjan" == Arjan van de Ven <[EMAIL PROTECTED]> writes:

Arjan> On Tue, 2005-02-22 at 04:52 -0500, Jes Sorensen wrote:
>> Hi,
>> 
>> This patch introduces ia64 specific read/write handlers for
>> /dev/mem access which is needed to avoid uncached pages to be
>> accessed through the cached kernel window which can lead to random
>> corruption. It also introduces a new page-flag PG_uncached which
>> will be used to mark the uncached pages. I assume this may be
>> useful to other architectures as well where the CPU may use
>> speculative reads which conflict with uncached access. In addition
>> I moved do_write_mem to be under ARCH_HAS_DEV_MEM as it's only ever
>> used if that is defined.
>> 
>> The patch is needed for the new ia64 special memory driver (mspec -
>> former fetchop).

Arjan> is there ANY valid reason to allow access to cached uses at
Arjan> all?  (eg kernel ram)

Arjan> why not just disable any such ram access entirely...

You mean uncached?

For userspace it's used by some of the MPI type apps in userland.
Presumably there's cases where it gives better performance. For the
SN2 hardware there's also a special mode known as fetchop mode which
requires uncached memory, it's used quite heavily by the same types of
apps.

The problem is if you then have apps such as lcrash which may read
through all kernel memory. If a page is mapped uncached to userland
you can hit the memory corruption case if you access the same page
cached from within a kernel cached mapping. I suspect the suspend code
could hit similar problems, but I don't know that code well enough to
say if it's the case or not.

Cheers,
Jes

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt

On Tue, 2005-02-22 at 14:10 -0800, Linus Torvalds wrote:

> Oh, well. The reason I hate the rwsem behaviour is exactly because it
> results in this very subtle class of deadlocks. This one case is certainly
> solvable several ways, but do we have other issues somewhere else? Things
> like kobject might be rife with things like this. The mm semaphore tends
> to be pretty well-behaved - and I'm not sure the same is true of the
> kobject one.

We could detect those tho. When the appropriate DEBUG option is set, by
storing a cpumask in the semaphore we could detect if it's already taken
on this cpu...

> Normal recursive deadlocks are wonderful - most of them show up
> immediately, so assuming you just have enough coverage, you're fine. This
> fairness-related deadlock requires a race to happen.

Unless you consider that taking the read semaphore twice on the same CPU
is always a bug, thus the above stuff would work for catching them at
least more often...
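
A debugging check along these lines could look roughly like the sketch below. It is only an illustration: it tracks holders by a small task index rather than by CPU (an assumption of this sketch, since a sleeping holder is not tied to one CPU), it does no real locking, and all names are hypothetical.

```c
#include <stdint.h>

/* Debug shadow state for one rwsem: bit n set means task n holds
 * the read side. */
struct dbg_rwsem {
	uint64_t readers;
};

/* Returns 0 on success, -1 if 'task' (0..63) already holds the read side. */
static int dbg_down_read(struct dbg_rwsem *sem, unsigned int task)
{
	uint64_t bit = UINT64_C(1) << task;

	if (sem->readers & bit)
		return -1;	/* recursive read acquisition detected */
	sem->readers |= bit;
	return 0;
}

/* Drop the shadow read hold for 'task'. */
static void dbg_up_read(struct dbg_rwsem *sem, unsigned int task)
{
	sem->readers &= ~(UINT64_C(1) << task);
}
```

A second dbg_down_read() by the same task is reported immediately, without needing a writer to race in between.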
 
> Maybe it would be sufficient to have a debugging version of rwsems that
> just notice recursion?
> 
>   Linus
-- 
Benjamin Herrenschmidt <[EMAIL PROTECTED]>


Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier

Olof Johansson wrote:
> > That won't work because the vma lock must be held between key
> > calculation and get_user() - otherwise futex is not reliable.  It
> > would work if the futex key calculation was inside the loop.
> 
> Sure, but that's still true: It's just that the get_user() is done twice
> instead. The semaphore is never released between the key calculation and
> the "real" get_user().

Ah, I didn't look at where the loop is used and didn't think there'd
be _two_ get_user() calls in the fast case.  Not my instinct.

> > A much simpler solution (and sorry for not offering it earlier,
> > because Andrew Morton pointed out this bug long ago, but I was busy), is:
> 
> Either way works for me. Andrew/Linus, got a preference? I'll either
> post my refresh based on Andrews comments, or code up Jamie's
> suggestion.

Yours has a couple of problems.

   1. It'll make futex waits somewhat slower.  One of the nicer features
  of 2.6 futexes is that we got rid of the explicit page table lookup.

   2. It's broken because a page can be paged out by another thread
  after you've forced it in and before the get_user().  We only
  take mmap_sem, not the page table lock.

-- Jamie

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds

On Wed, 23 Feb 2005, Benjamin Herrenschmidt wrote:
> 
> Yours is probably the most efficient too. Not sure what is best for
> rwsems tho, there seems to be some interest in preventing readers from
> starving writers forever, this has been debated endlessly iirc,
> though I have no personal opinion there.

Yes, the starvation issue is potentially real. And thinking about it,
we've even had that in real life, with /proc and lots of page faults. So I
guess that's a strong argument for the fairness thing.

Oh, well. The reason I hate the rwsem behaviour is exactly because it
results in this very subtle class of deadlocks. This one case is certainly
solvable several ways, but do we have other issues somewhere else? Things
like kobject might be rife with things like this. The mm semaphore tends
to be pretty well-behaved - and I'm not sure the same is true of the
kobject one.

Normal recursive deadlocks are wonderful - most of them show up
immediately, so assuming you just have enough coverage, you're fine. This
fairness-related deadlock requires a race to happen.

Maybe it would be sufficient to have a debugging version of rwsems that
just notice recursion?

Linus

Re: [ACPI] Call for help: list of machines with working S3

2005-02-22 Thread Karol Kozimor

Thus wrote Norbert Preining:
> - DRI must be disabled I guess?! Even with newer X server (x.org)?

You still didn't state which X server you are using. In short, XFree86 4.4,
X.Org 6.7 and 6.8.2 are fine, anything else (including X.Org 6.8.0 and .1)
is not.
Best regards,

-- 
Karol 'sziwan' Kozimor
[EMAIL PROTECTED]

Re: reading the same entropy twice

2005-02-22 Thread Lee Revell

On Tue, 2005-02-22 at 16:55 -0500, Bob O'Neill wrote:
> Hello.
> 
> I have noticed that it is possible on an SMP box for two processes to
> simultaneously read the same entropy out of /dev/urandom.  This
> doesn't seem right to me.  I was using the entropy value to generate a
> random number to use as a session ID, so occasionally there would be a
> collision on session IDs, causing a login failure as session IDs are
> required to be unique.  This issue does not appear to be related to
> entropy depletion.
> 
> Could you provide me with some insight into why this is the case, if
> it is intentional?  It seems like it could be addressed with a
> spinlock.

Please check the LKML archives, this was debated at length last month
IIRC.  I don't recall whether it ended conclusively.

Lee


Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Arjan van de Ven

On Tue, 2005-02-22 at 04:52 -0500, Jes Sorensen wrote:
> Hi,
> 
> This patch introduces ia64 specific read/write handlers for /dev/mem
> access which is needed to avoid uncached pages to be accessed through
> the cached kernel window which can lead to random corruption. It also
> introduces a new page-flag PG_uncached which will be used to mark the
> uncached pages. I assume this may be useful to other architectures as
> well where the CPU may use speculative reads which conflict with
> uncached access. In addition I moved do_write_mem to be under
> ARCH_HAS_DEV_MEM as it's only ever used if that is defined.
> 
> The patch is needed for the new ia64 special memory driver (mspec -
> former fetchop).


is there ANY valid reason to allow access to cached uses at all?
(eg kernel ram)

why not just disable any such ram access entirely...



Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-22 Thread Ray Bryant

Andi Kleen wrote:
>> OK, so what is the alternative?  Well, if we had a va_start and
>> va_end (or a va_start and length) we could move the shared object
>> once using a call of the form
>>
>>   migrate_pages(pid, va_start, va_end, count, old_node_list,
>>                 new_node_list);
>>
>> with old_node_list = 0 1 2 ... 31
>>      new_node_list = 2 3 4 ... 33
>>
>> for one of the pid's in the job.
>
> I still don't like it. It would be bad to make migrate_pages another
> ptrace() [and ptrace at least really enforces a stopped process]
> But I can see your point that migration DEFAULT pages with first touch
> aware applications pretty much needs the old_node, new_node lists.
> I just don't think an external process should mess with other processes
> VA. But I can see that it makes sense to do this on SHM that
> is mapped into a management process.
>
> How about you add the va_start, va_end but only accept them
> when pid is 0 (= current process). Otherwise enforce with EINVAL
> that they are both 0. This way you could map the
> shared object into the batch manager, migrate it there, then
> mark it somehow to not be migrated further, and then
> migrate the anonymous pages using migrate_pages(pid, ...)

There can be mapped files that can't be mapped into the migration task.
Here's an example (courtesy of Jack Steiner):

    sprintf(fname, "/tmp/tmp.%d", getpid());
    unlink(fname);
    /* O_CREAT needs an explicit mode argument */
    fd = open(fname, O_CREAT|O_RDWR, 0600);
    p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    unlink(fname);
    /* "p" remains valid until unmapped */

The file /tmp/tmp.pid is both mapped and deleted.  It can't be opened
by another process to mmap() it, so as far as I know there is no way
to map it into the migration task.  The file does show up in
/proc/pid/maps as shown below (pardon the line splitting):

2027-20278000 rw-p 0020 08:13 75498728  \ 
/lib/tls/libc.so.6.1
20278000-20284000 rw-p 20278000 00:00 0
2030-20c8c000 rw-s  08:13 100885287 \ 
/tmp/tmp.18259 (deleted)
4000-40008000 r-xp  00:2a 14688706  \ 
/home/tulip14/steiner/apps/bigmem/big

Jack says:
"This is a fairly common way to work with scratch map'ed files. Sites that
have very large disk farms but limited swap space frequently do this (or at
least they used to...)"
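
For reference, a self-contained version of that pattern might look like the sketch below. It uses mkstemp() instead of a pid-based name and adds an ftruncate() so the mapping is backed by real file space; both are choices of this sketch, and map_scratch() is a hypothetical helper name.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map 'bytes' of a scratch file that is deleted right away.
 * Returns the mapping, or NULL on failure. */
static char *map_scratch(size_t bytes)
{
	char fname[] = "/tmp/scratch.XXXXXX";
	int fd = mkstemp(fname);
	void *p;

	if (fd < 0)
		return NULL;
	unlink(fname);			/* name is gone, inode lives on */
	if (ftruncate(fd, (off_t)bytes) != 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping keeps the file alive */
	return p == MAP_FAILED ? NULL : p;
}
```

After map_scratch() returns there is no path name and no open descriptor left, so another process has nothing to open; the mapping is still listed in /proc/pid/maps with a "(deleted)" suffix, as in the listing above.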

So while I tend to agree with your concern about manipulating
one process's address space from another, I honestly think we
are stuck, and I don't see a good way around this.

> BTW it might be better to make va_end a size, just to be more
> symmetric with mlock,madvise,mmap et.al.

Yes, I agree.  Let's make that so.

> -Andi

--
Best Regards,
Ray
---
  Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
[EMAIL PROTECTED] [EMAIL PROTECTED]
The box said: "Requires Windows 98 or better",
   so I installed Linux.
---

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds

On Tue, 22 Feb 2005, Andrew Morton wrote:
> 
> However the pte can get unmapped by memory reclaim so we could still take a
> minor fault, and hit the same deadlock, yes?

You _could_ fix that by getting the pagetable spinlock, I guess. Which
check_user_page_readable() assumes you'd be holding anyway (not holding it
would appear to be a bug).

At which point you might as well just walk the tables by hand and just do 
the read that way. Of course, then you have virtual aliasing issues etc.

Insane, but possible.

Linus

reading the same entropy twice

2005-02-22 Thread Bob O'Neill

Hello.

I have noticed that it is possible on an SMP box for two processes to
simultaneously read the same entropy out of /dev/urandom.  This
doesn't seem right to me.  I was using the entropy value to generate a
random number to use as a session ID, so occasionally there would be a
collision on session IDs, causing a login failure as session IDs are
required to be unique.  This issue does not appear to be related to
entropy depletion.

Could you provide me with some insight into why this is the case, if
it is intentional?  It seems like it could be addressed with a
spinlock.
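
To illustrate what a spinlock would buy here, the toy below serializes "read the state, then advance it" so that two callers can never be handed the same value. It is a user-space sketch with C11 atomics, not the /dev/urandom code; the pool layout, the update step and all names are invented for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

struct toy_pool {
	atomic_flag lock;	/* simple spinlock */
	uint32_t state;
};

/* Hand out the current state and advance it as one atomic step. */
static uint32_t toy_extract(struct toy_pool *pool)
{
	uint32_t out;

	while (atomic_flag_test_and_set_explicit(&pool->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	out = pool->state;
	/* Plain linear congruential step, not a secure mixer. */
	pool->state = pool->state * 1664525u + 1013904223u;
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
	return out;
}
```

Without the lock, two CPUs could both read pool->state before either one updates it, which is the kind of duplicate described above.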

Thanks.
-Bob

[2.6.11-rc4 i386] Re-order includes to fix userland breakage

2005-02-22 Thread Tom Rini

The following moves all includes  (except 
and  down to below the existing __KERNEL__ test.  None
of these includes are needed by the user-visible portions of the header,
and in some cases can cause userland apps to break.  For example, LTP
and sash with an empty  will fail thusly:
cc -Wall  -I../../include -g -Wall -I../../../../include -Wallsetrlimit02.c 
-L../../../../lib -lltp  -o setrlimit02
In file included from /usr/include/asm/atomic.h:6,
 from /usr/include/linux/fs.h:20,
 from setrlimit02.c:46:
/usr/include/asm/processor.h:68: error: `CONFIG_X86_L1_CACHE_SHIFT' undeclared 
here (not in a function)
/usr/include/asm/processor.h:68: error: requested alignment is not a constant

Build/run tested with a glibc rebuild as well.

Signed-off-by: Tom Rini <[EMAIL PROTECTED]>

= include/linux/fs.h 1.376 vs edited =
--- 1.376/include/linux/fs.h2005-02-03 07:42:40 -07:00
+++ edited/include/linux/fs.h   2005-02-22 14:44:27 -07:00
@@ -7,25 +7,7 @@
  */
 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct iovec;
-struct nameidata;
-struct pipe_inode_info;
-struct poll_table_struct;
-struct kstatfs;
-struct vm_area_struct;
-struct vfsmount;
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -216,13 +198,32 @@ extern int dir_notify_enable;
 
 #ifdef __KERNEL__
 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+
+#include 
 #include 
 #include 
+
+struct iovec;
+struct nameidata;
+struct pipe_inode_info;
+struct poll_table_struct;
+struct kstatfs;
+struct vm_area_struct;
+struct vfsmount;
 
 /* Used to be a macro which just called the function, now just a function */
 extern void update_atime (struct inode *);

-- 
Tom Rini
http://gate.crashing.org/~trini/

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt

On Tue, 2005-02-22 at 13:31 -0800, Linus Torvalds wrote:
> 
> On Wed, 23 Feb 2005, Benjamin Herrenschmidt wrote:
> > 
> > Isn't Olof scheme racy ? Can't the stuff get swapped out between the
> > first get_user() and the "real" one ?
> 
> Yes. But see my suggested modification (which I still think is "the thing 
> that Olof does", except it's more efficient and avoids the race).
> 
> If rwsems acted like rwlocks, we wouldn't have this issue at all.

Yours is probably the most efficient too. Not sure what is best for
rwsems tho, there seems to be some interest in preventing readers from
starving writers forever, this has been debated endlessly iirc,
though I have no personal opinion there.

Ben.



Re: uninterruptible sleep lockups

2005-02-22 Thread linux-os

On Tue, 22 Feb 2005, Chris Friesen wrote:

> Horst von Brand wrote:
>> Anthony DiSante <[EMAIL PROTECTED]> said:
>
>>> That's one of the things I asked a few messages ago.  Some people on the
>>> list were saying that it'd be "really hard" and would "require a lot of
>>> bookkeeping" to "fix" permanently-D-stated processes... which is
>>> completely different than "impossible."
>
>> Most people here have little clue. It can't be done.
>
> I realize it would be extremely difficult if not impossible to do in the
> current linux architecture, but I find it hard to believe that it is
> technically impossible if one were allowed to design the system from scratch.
>

No. It has nothing to do with the architecture. These problems go
all the way back to the first multi-tasking systems.

> Maybe I'm on crack, but would it not be technically possible to have all
> resource usage be tracked so that when a task tries to do something and
> hangs, eventually it gets cleaned up?

It's not the "task" that's hung. That's just the symptom. It's
the SHARED RESOURCE that is hung!  Once some task attempts
to use a shared resource, it must (somehow) get in-line so
that it can use that resource without somebody else coming
along and mucking with it. To get "in-line" means to
execute some kind of MUTEX. There are many kinds. VAXen
had a "lock manager", there are simple sleeping-loops using
semaphores, etc. Linux uses such loops; the two most
commonly used are "down()" and "up()".

Now, somebody needs a resource. It executes down();
once it gets control again, it has that resource. It attempts
to use that resource through a driver. The driver waits forever.
The resource is now permanently dorked --forever because its
driver is waiting forever. The user code never returns from
the driver so it can never execute up().

If you, somehow, grab hold of the program-counter (like
a long-jump), and force a return so that up() gets
executed, the wrong thread unlocks the semaphore but it's
forever broken anyway because the resource it protects is
hung.

> We already handle cleaning up stuff for userspace (memory, file descriptors,
> sockets, etc.).  Why not enforce a design that says "all entities taking a
> lock must specify a maximum hold time".  After that time expires, they are
> assumed to be hung, and all their resources (which were being tracked by some
> system) get cleaned up.
>

Time won't do it. It's not a matter of "cleaning up", it's a matter
of not waiting forever in the first place. If you were able to
"clean up" by reinitializing the semaphores, etc., killing anything
that was attached, etc., the next instance of attempting to use
that resource will hang the exact same way.

We are not talking about some broken semaphore code that sometimes
gets hung. We are talking about the resource it protects. The
semaphore code is fine.

> It would probably be complicated, slow, and generally not worth the effort.
> But it seems at least technically possible.
>
> Chris

The problem is not "Waiting in D state". That's the symptom.
The problem is waiting forever after a lock has been taken.
That is the problem.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
 Notice : All mail here is now cached for review by Dictator Bush.
 98.36% of all statistics are fiction.
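
The point about "not waiting forever in the first place" can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `fake_dev`, `dev_wait()` and `use_device()` are hypothetical names, the `locked` field merely stands in for a semaphore taken with down(), and polling a counter stands in for real hardware. The only idea it shows is that a bounded wait lets the caller reach its unlock path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device: answers after 'ready_after' polls;
 * a negative value means it never answers. */
struct fake_dev {
    int  ready_after;
    int  polls;
    bool locked;    /* stands in for the semaphore taken with down() */
};

static bool dev_ready(struct fake_dev *d)
{
    d->polls++;
    return d->ready_after >= 0 && d->polls > d->ready_after;
}

/* Wait for at most max_polls polls instead of waiting forever. */
static int dev_wait(struct fake_dev *d, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (dev_ready(d))
            return 0;
    return -ETIMEDOUT;
}

/* Caller: take the lock, use the device, always reach the unlock. */
static int use_device(struct fake_dev *d, int max_polls)
{
    int ret;

    assert(!d->locked);
    d->locked = true;               /* down() */
    ret = dev_wait(d, max_polls);   /* bounded, so it always returns */
    d->locked = false;              /* up() */
    return ret;
}
```

Note that this does not repair the hung hardware; it only keeps the lock from being held forever, which is the distinction drawn above between the symptom and the problem.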
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Andrew Morton

Jamie Lokier <[EMAIL PROTECTED]> wrote:
>
> ...
> 
> > > One attempt to fix this is included below. It works, but I'm not entirely
> > > happy with the fact that it's a bit messy solution. If anyone has a
> > > better idea for how to solve it I'd be all ears.
> > 
> > It's fairly sane.  Style-wise I'd be inclined to turn this:
> > 
> >     down_read(&current->mm->mmap_sem);
> >     while (!check_user_page_readable(current->mm, uaddr1)) {
> >         up_read(&current->mm->mmap_sem);
> >         /* Fault in the page through get_user() but discard result */
> >         if (get_user(curval, (int __user *)uaddr1) != 0)
> >             return -EFAULT;
> >         down_read(&current->mm->mmap_sem);
> >     }
> 
> That won't work because the vma lock must be held between key
> calculation and get_user() - otherwise futex is not reliable.  It
> would work if the futex key calculation was inside the loop.

All the above is trying to do is to convert the initial down_read(mmap_sem)
into a function which, on exit, guarantees that

a) down_read(mmap_sem) is held and

b) the subsequent get_user() of that address will not generate a pagefault.

So it shouldn't affect the futex code's atomicity at all.

However the pte can get unmapped by memory reclaim so we could still take a
minor fault, and hit the same deadlock, yes?

> A much simpler solution (and sorry for not offering it earlier,
> because Andrew Morton pointed out this bug long ago, but I was busy), is:
> 
> In futex.c:
> 
>   down_read(&current->mm->mmap_sem);
>   get_futex_key(...) etc.
>   queue_me(...) etc.
>   current->flags |= PF_MMAP_SEM; <- new
>   ret = get_user(...);
>   current->flags &= PF_MMAP_SEM; <- new
>   /* the rest */
> 
> And in arch/*/mm/fault.c, replace every one of these:
> 
>   down_read(&mm->mmap_sem);
> 
>   up_read(&mm->mmap_sem);
> 
> with these:
> 
>   if (!(current->flags & PF_MMAP_SEM))
>       down_read(&mm->mmap_sem);
> 
>   if (!(current->flags & PF_MMAP_SEM))
>       up_read(&mm->mmap_sem);
> 

Yes, that will work.  However I do feel that it's cleaner to localise this
nastiness into a single function which the futex code calls, rather than
spreading it all around and adding overhead to every pagefault.  If we can
work out how.

wrt this down_read/down_write/down_read deadlock: iirc, the reason why
down_write() takes precedence over down_read() is to avoid the permanent
writer starvation which would occur if there is heavy down_read() traffic.

As Linus points out, an alternative would be to do an inc_preempt_count()
around the offending get_user(), then use __copy_from_user_inatomic(), then
take some sort of remedial action if __copy_from_user_inatomic() returns a
fault.  Something like:

retry:
    if (get_user(uaddr) == -EFAULT)
        return -EFAULT;
    down_read(mmap_sem);
    inc_preempt_count();
    if (__copy_from_user_inatomic(..., uaddr)) {
        up_read(mmap_sem);
        dec_preempt_count();
        goto retry;
    }

    dec_preempt_count();
    up_read(mmap_sem);

Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Jes Sorensen

> "Dave" == Dave Hansen <[EMAIL PROTECTED]> writes:

Dave> I was talking with Nigel Cunningham about doing something a
Dave> little different from the classic page flag bits when the number
Dave> of users is restricted and performance isn't ultra-critical.
Dave> Would something like this work for you, instead of using a real
Dave> page->flags bit for PG_cached?

Just took a quick look at this and it looks a bit heavy for our
use. We are only looking at a small number of pages. However I could
imagine future cases where performance may be more critical.

Cheers,
Jes

Re: [patch -mm series] ia64 specific /dev/mem handlers

2005-02-22 Thread Jes Sorensen

> "Andrew" == Andrew Morton <[EMAIL PROTECTED]> writes:

Andrew> Matthew Wilcox <[EMAIL PROTECTED]> wrote:
>> 
>> On Tue, Feb 22, 2005 at 09:41:04AM -0500, Jes Sorensen wrote:
>> > >> + if (page->flags & PG_uncached)
>> > 
>> > Andrew> dude.  That ain't gonna work ;)
>> > 
>> > Pardon my lack of clue, but why not?

Andrew> if (page->flags & (1 << PG_uncached)) would have been correcter.

DOH!

Desperately seeking a bulk supply of those brown paper bags!

>> I think you're supposed to always use test_bit() to check page
>> flags

Andrew> Yup.  Add PageUncached macros to page-flags.h.

Mmm, another butt ugly StUdLyCaPs macro coming soon.

Thanks,
Jes
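
For readers following the PG_uncached exchange, here is a minimal userspace sketch of why ANDing with a bit number is not the same as testing that bit. The value 3 for `PG_uncached`, the cut-down `struct page`, and both helper names are assumptions for illustration only; in the kernel the check would go through test_bit() or a PageUncached-style macro as suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number, chosen for illustration only. */
#define PG_uncached 3

struct page { unsigned long flags; };

/* Wrong: ANDs with the bit *number* (3 == binary 011),
 * so it really tests bits 0 and 1. */
static bool page_uncached_wrong(const struct page *p)
{
    return (p->flags & PG_uncached) != 0;
}

/* Right: turn the bit number into a mask first. */
static bool page_uncached(const struct page *p)
{
    return (p->flags & (1UL << PG_uncached)) != 0;
}
```

With bit 3 set and nothing else, the wrong version reports "not uncached"; with only bit 0 set, it reports "uncached".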

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds

On Tue, 22 Feb 2005, Jamie Lokier wrote:
> 
> A much simpler solution (and sorry for not offering it earlier,
> because Andrew Morton pointed out this bug long ago, but I was busy), is:
> 
> In futex.c:
> 
>   down_read(&current->mm->mmap_sem);
>   get_futex_key(...) etc.
>   queue_me(...) etc.
>   current->flags |= PF_MMAP_SEM; <- new
>   ret = get_user(...);
>   current->flags &= PF_MMAP_SEM; <- new
>   /* the rest */

That is uglee. 

We really have this already, and it's called "current->preempt". It 
handles any lock at all, and doesn't add yet another special case to all 
the architectures.

Just do

 repeat:
    down_read(&current->mm->mmap_sem);
    get_futex_key(...) etc.
    queue_me(...) etc.
    inc_preempt_count();
    ret = get_user(...);
    dec_preempt_count();
    if (unlikely(ret)) {
        up_read(&current->mm->mmap_sem);
        /* Re-do the access outside the lock */
        ret = get_user(...);
        if (!ret)
            goto repeat;
        return ret;
    }
    ...

and you should be ok.

No new special cases, no new abstractions. At most, we should probably 
create a "get_user_inatomic()", to 

 - make it damn obvious what we're doing, and match the explicit
   "inatomic" in the other place where we depend on this (fs/filemap.c)

 - allow the regular "get_user()" to continue to do the normal
   "might_sleep()" checks.

That's assuming we can't just make rwsem's nest nicely.

Linus
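
The control flow of that suggestion can be modelled in plain userspace C. This is only a sketch under stated assumptions: `fake_mm`, `get_user_nofault()`, `get_user_may_fault()` and `read_futex_value()` are hypothetical stand-ins, a counter models the read lock on mmap_sem, and a boolean models whether the page is resident. It shows the pattern of trying the access without faulting while the lock is held, and faulting the page in only after the lock has been dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for illustration only; none of these are kernel APIs. */
struct fake_mm {
    bool page_present;  /* would the access succeed without a fault? */
    bool mapped;        /* is the address valid at all? */
    int  lock_depth;    /* models down_read()/up_read() on mmap_sem */
    int  value;
};

/* Like an access in atomic context: fails instead of faulting. */
static int get_user_nofault(struct fake_mm *mm, int *out)
{
    if (!mm->page_present)
        return -EFAULT;
    *out = mm->value;
    return 0;
}

/* Like a normal access: may fault the page in, so no lock may be held. */
static int get_user_may_fault(struct fake_mm *mm, int *out)
{
    assert(mm->lock_depth == 0);
    if (!mm->mapped)
        return -EFAULT;
    mm->page_present = true;
    *out = mm->value;
    return 0;
}

/* Returns 0 with the lock still held, or an error with it released. */
static int read_futex_value(struct fake_mm *mm, int *out)
{
    int ret;

    for (;;) {
        mm->lock_depth++;                   /* down_read() */
        ret = get_user_nofault(mm, out);    /* preemption count raised here */
        if (ret == 0)
            return 0;
        mm->lock_depth--;                   /* up_read() */
        ret = get_user_may_fault(mm, out);  /* re-do outside the lock */
        if (ret)
            return ret;                     /* genuinely bad address */
    }
}
```

The model leaves out the key calculation and queueing, which in the real code sit inside the locked region and are redone on every pass through the loop.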

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds

On Wed, 23 Feb 2005, Benjamin Herrenschmidt wrote:
> 
> Isn't Olof scheme racy ? Can't the stuff get swapped out between the
> first get_user() and the "real" one ?

Yes. But see my suggested modification (which I still think is "the thing 
that Olof does", except it's more efficient and avoids the race).

If rwsems acted like rwlocks, we wouldn't have this issue at all.

Linus

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier

Chris Friesen wrote:
> > down_read(&current->mm->mmap_sem);
> > get_futex_key(...) etc.
> > queue_me(...) etc.
> > current->flags |= PF_MMAP_SEM; <- new
> > ret = get_user(...);
> > current->flags &= PF_MMAP_SEM; <- new
> > /* the rest */
> 
> Should the second new line be this (with the inverse)?
> 
>   current->flags &= ~PF_MMAP_SEM;

Quiet!  I was trying to sneak in a security hole! :)

-- Jamie

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson

On Tue, Feb 22, 2005 at 09:07:52PM +, Jamie Lokier wrote:
> 
> That won't work because the vma lock must be held between key
> calculation and get_user() - otherwise futex is not reliable.  It
> would work if the futex key calculation was inside the loop.

Sure, but that's still true: It's just that the get_user() is done twice
instead. The semaphore is never released between the key calculation and
the "real" get_user().

> A much simpler solution (and sorry for not offering it earlier,
> because Andrew Morton pointed out this bug long ago, but I was busy), is:

Either way works for me. Andrew/Linus, got a preference? I'll either
post my refresh based on Andrew's comments, or code up Jamie's
suggestion.

-Olof

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt

On Wed, 2005-02-23 at 08:16 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2005-02-22 at 11:36 -0800, Linus Torvalds wrote:
> 
> > DavidH - what's the word on nested read-semaphores like this? Are they 
> > supposed to work (like nested read-spinlocks), or do we need to do the 
> > things Olof does?
> 
> Isn't Olof scheme racy ? Can't the stuff get swapped out between the
> first get_user() and the "real" one ?

Forget it, I missed the check_user_page_readable() guy within the
semaphore protection. I didn't know that function ;)

Ben.



Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Chris Friesen

Jamie Lokier wrote:
> In futex.c:
> 
>   down_read(&current->mm->mmap_sem);
>   get_futex_key(...) etc.
>   queue_me(...) etc.
>   current->flags |= PF_MMAP_SEM; <- new
>   ret = get_user(...);
>   current->flags &= PF_MMAP_SEM; <- new
>   /* the rest */

Should the second new line be this (with the inverse)?

  current->flags &= ~PF_MMAP_SEM;

Chris
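
The difference between the two forms is easy to check in isolation. The sketch below is a userspace illustration with assumed flag values: `PF_OTHER` is a hypothetical second flag, and `PF_MMAP_SEM` itself was only a proposal in this thread, so neither value reflects a real kernel definition.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only. */
#define PF_OTHER    0x1UL
#define PF_MMAP_SEM 0x2UL

/* As first posted: keeps ONLY PF_MMAP_SEM and wipes every other flag. */
static unsigned long clear_as_posted(unsigned long flags)
{
    return flags & PF_MMAP_SEM;
}

/* Corrected: clears PF_MMAP_SEM and leaves the rest untouched. */
static unsigned long clear_corrected(unsigned long flags)
{
    return flags & ~PF_MMAP_SEM;
}
```

Starting from `PF_OTHER | PF_MMAP_SEM`, the posted form leaves `PF_MMAP_SEM` set and drops `PF_OTHER`, while the corrected form does the opposite.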

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt

On Tue, 2005-02-22 at 11:36 -0800, Linus Torvalds wrote:

> DavidH - what's the word on nested read-semaphores like this? Are they 
> supposed to work (like nested read-spinlocks), or do we need to do the 
> things Olof does?

Isn't Olof scheme racy ? Can't the stuff get swapped out between the
first get_user() and the "real" one ?

Ben.



Problems with 2.6.11-rc4, Opteron server and MPTBase : Round 2

2005-02-22 Thread Weathers, Norman R.

OK, some more information concerning the previous problems with
2.6.11-rc4.

2.6.11-rc3 does the exact same thing as 2.6.11-rc4: it crashes whenever
you try to boot our Opteron-based server, which has an LSI MPT Fusion
based SCSI card as the primary card.  Now comes the weird part...  It
only crashes if mptbase and mptscsih are built as modules.  If the
drivers are built into the kernel, the 2.6.11-rc3 kernel boots just
fine.  I am going to see if the 2.6.11-rc4 kernel boots as well when
the driver is built in.

Thanks again for any help anyone can give.

Norman Weathers

Re: cfq: depth 4 reached, tagging now on

2005-02-22 Thread Lee Revell

On Mon, 2005-02-21 at 09:20 +0100, Jens Axboe wrote:
> On Sat, Feb 19 2005, Lee Revell wrote:
> > Starting around 2.6.11-rc4 I get this printk during the boot process
> > after kjournald starts, and again if I stress the filesystem.
> > 
> > cfq: depth 4 reached, tagging now on
> > 
> > Is this printk intentional?  I am sure users will wonder about it,
> > especially because (presumably) cfq turns tagging off at some point in
> > between, and doesn't say anything about it.
> 
> It is intentional, but could be suppressed. But I'm wondering if the
> accounting change introduced a bug - what hardware are you using cfq on
> (ie does it actually do tagged command queueing, is it SCSI?)?
> 

Yes, this is an all SCSI system using the aic7xxx driver.

> It's a one-time message. CFQ starts out assuming the drive doesn't do
> TCQ, if the driver depth goes beyond a defined limit (4), it will assume
> that the hardware can do tagged queueing and change its internal
> accounting accordingly. The setting stays that way, it's not a
> transitional state.
> 


OK.  Then the multiple messages were CFQ enabling TCQ for the different
drives.

Lee
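
The one-time, per-drive behaviour described in this exchange amounts to a sticky flag. The following is a minimal sketch of that idea only: `drive_state` and `note_driver_depth()` are hypothetical names, not CFQ's, and treating the limit as "depth reaches 4" is an assumption taken from the wording of the printk.

```c
#include <assert.h>
#include <stdbool.h>

#define TAG_DEPTH_LIMIT 4   /* the limit named in the message */

/* Hypothetical per-drive state; the field names are not CFQ's. */
struct drive_state {
    bool tagging;    /* sticky: once set it is never cleared */
    int  messages;   /* how many times the message would be printed */
};

/* Called with the depth currently seen at the driver. */
static void note_driver_depth(struct drive_state *d, int depth)
{
    if (d->tagging || depth < TAG_DEPTH_LIMIT)
        return;
    d->tagging = true;
    d->messages++;   /* "cfq: depth 4 reached, tagging now on" */
}
```

Because the state is kept per drive, a system with several drives would print the message once for each, which matches the multiple messages seen here.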


Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier

Andrew Morton wrote:
> > This will quickly lock up, since the futex_wait code does a
> > down_read(mmap_sem), then a get_user().
> > 
> > The do_page_fault code on ppc64 (as well as other architectures) needs
> > to take the same semaphore for reading. This is all good until the
> > second thread comes into play: Its mmap call tries to take the same
> > semaphore for writing which causes in the do_page_fault down_read()
> > to get stuck. Classic deadlock.
> 
> Yup.  Jamie says that the futex code _has_ to hold mmap_sem across the
> get_user().  I forget (but could probably locate) the details.

It does - the "key" which identifies a futex depends on a vma
calculation, and the vma must not change between the calculation and
the get_user().

> > One attempt to fix this is included below. It works, but I'm not entirely
> > happy with the fact that it's a bit messy solution. If anyone has a
> > better idea for how to solve it I'd be all ears.
> 
> It's fairly sane.  Style-wise I'd be inclined to turn this:
> 
>   down_read(&current->mm->mmap_sem);
>   while (!check_user_page_readable(current->mm, uaddr1)) {
>       up_read(&current->mm->mmap_sem);
>       /* Fault in the page through get_user() but discard result */
>       if (get_user(curval, (int __user *)uaddr1) != 0)
>           return -EFAULT;
>       down_read(&current->mm->mmap_sem);
>   }

That won't work because the vma lock must be held between key
calculation and get_user() - otherwise futex is not reliable.  It
would work if the futex key calculation was inside the loop.

A much simpler solution (and sorry for not offering it earlier,
because Andrew Morton pointed out this bug long ago, but I was busy), is:

In futex.c:

  down_read(&current->mm->mmap_sem);
  get_futex_key(...) etc.
  queue_me(...) etc.
  current->flags |= PF_MMAP_SEM; <- new
  ret = get_user(...);
  current->flags &= PF_MMAP_SEM; <- new
  /* the rest */

And in arch/*/mm/fault.c, replace every one of these:

  down_read(&mm->mmap_sem);

  up_read(&mm->mmap_sem);

with these:

  if (!(current->flags & PF_MMAP_SEM))
      down_read(&mm->mmap_sem);

  if (!(current->flags & PF_MMAP_SEM))
      up_read(&mm->mmap_sem);

-- Jamie

Re: ide-scsi is deprecated for cd burning! Use ide-cd and give dev=/dev/hdX as device

2005-02-22 Thread Bill Davidsen

Bartlomiej Zolnierkiewicz wrote:
> Hi,
>
> On Mon, 21 Feb 2005 15:00:28 +, Alan Cox <[EMAIL PROTECTED]> wrote:
>> On Gwe, 2005-02-18 at 10:31, Kiniger, Karl (GE Healthcare) wrote:
>>> Not entirely true (at least for me). I actually tried to read the
>>> last iso9660 data sector with a small C program (reading 2 kb) and
>>> it failed to read the sector. Using ide-scsi I was able to read it.
>>
>> That's the bug that should now be fixed by the ide changes I did so that
>> ide-cd has the knowledge ide-scsi has for partial completions of I/O
>
> I haven't looked closely but I've noticed that these fixes are accessing
> rq->bio directly which is a layering violation.  Could you de-bio and
> submit them?
> [ AFAIR they are already split out in RHEL4 ]
>
> Speaking about ide-scsi, it will be undeprecated after I fix the locking.
> Rationale is that ide-scsi is _much_ simpler than ide-{cd,tape}.
> [ although it doesn't support all the hardware that ide-{cd,tape} do ]

Some time ago I offered the opinion that this was the correct way to go. 
 Linux presents real SCSI, PPA, USB, and firewire as SCSI, and with 
ide-scsi all ATAPI devices are covered as well.

I have not tried ide-floppy in some time, but my two machines which do 
ZIP drive media exchange both use the ide-scsi interface and 2.4 kernel 
to talk to the devices. They are unlikely to get or need an update, but 
I did try ide-floppy at one time and had some poorly-remembered 
"learning experience" doing so.

Thanks for your work on this!
--
   -bill davidsen ([EMAIL PROTECTED])
"The secret to procrastination is to put things off until the
 last possible moment - but no longer"  -me

Re: uninterruptible sleep lockups

2005-02-22 Thread Chris Friesen

Horst von Brand wrote:
> Anthony DiSante <[EMAIL PROTECTED]> said:
>
>> That's one of the things I asked a few messages ago.  Some people on the
>> list were saying that it'd be "really hard" and would "require a lot of
>> bookkeeping" to "fix" permanently-D-stated processes... which is completely
>> different than "impossible."
>
> Most people here have little clue. It can't be done.

I realize it would be extremely difficult if not impossible to do in the 
current linux architecture, but I find it hard to believe that it is 
technically impossible if one were allowed to design the system from 
scratch.

Maybe I'm on crack, but would it not be technically possible to have all 
resource usage be tracked so that when a task tries to do something and 
hangs, eventually it gets cleaned up?

We already handle cleaning up stuff for userspace (memory, file 
descriptors, sockets, etc.).  Why not enforce a design that says "all 
entities taking a lock must specify a maximum hold time".  After that 
time expires, they are assumed to be hung, and all their resources 
(which were being tracked by some system) get cleaned up.

It would probably be complicated, slow, and generally not worth the 
effort.  But it seems at least technically possible.

Chris

Re: Odd data corruption problem with LVM/ReiserFS

2005-02-22 Thread pcg

On Tue, Feb 22, 2005 at 02:46:44PM -0600, Alex Adriaanse <[EMAIL PROTECTED]> 
wrote:
> On Tue, 22 Feb 2005 20:49:00 +0100, Marc A. Lehmann <[EMAIL PROTECTED]> wrote:
> > Well, I do use reiserfs->aes-loop->lvm/dm->md5/raid5, and it never failed
> > for me, except once, and the error is likely to be outside reiserfs, and
> > possibly outside lvm.
> 
> Marc, what about you, were you using dm-snapshot when you experienced
> temporary corruption?

No snapshots either.

-- 
The choice of a
  -==- _GNU_
  ==-- _   generation Marc Lehmann
  ---==---(_)__  __   __  [EMAIL PROTECTED]
  --==---/ / _ \/ // /\ \/ /  http://schmorp.de/
  -=/_/_//_/\_,_/ /_/\_\  XX11-RIPE

Re: POSTing of video cards (WAS: Solo Xgl..)

2005-02-22 Thread Linus Torvalds

On Tue, 22 Feb 2005, Dmitry Torokhov wrote:
> 
> This sounds awfully like firmware loader that seems to be working just
> fine for a range of network cards and other devices.

Yes. HOWEVER - and note how firmware loading for this case is not validly
done at device discovery, but at "ifconfig" time.

Ie device discovery (probing) is a _separate_ phase entirely, and happens 
much earlier. We should initialize the hardware only when it actually gets 
"actively used" some way by user space.

Linus
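
The split between discovery and first use can be sketched in a few lines of userspace C. All names below (`fake_nic`, `nic_probe()`, `nic_open()`) are hypothetical and stand in for a driver's probe and open paths; the sketch only shows that probing records the device while the expensive initialisation is deferred to the first open, the "ifconfig" moment in the example above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device; names are illustrative, not a kernel interface. */
struct fake_nic {
    bool discovered;
    bool firmware_loaded;
};

/* Discovery: only record that the device exists. */
static void nic_probe(struct fake_nic *n)
{
    n->discovered = true;
}

/* First real use: now initialise the hardware, once. */
static int nic_open(struct fake_nic *n)
{
    if (!n->discovered)
        return -1;
    if (!n->firmware_loaded)
        n->firmware_loaded = true;   /* firmware load would happen here */
    return 0;
}
```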

Re: Odd data corruption problem with LVM/ReiserFS

2005-02-22 Thread Alex Adriaanse

On Tue, 22 Feb 2005 20:49:00 +0100, Marc A. Lehmann <[EMAIL PROTECTED]> wrote:
> > >A reboot fixes this for both ext3 and reiserfs (i.e. the error is gone).
> > >
> >
> > Well, it didn't fix it for me. The fs was trashed for good. The major
> > question for me is now usability of md/dm for any purpose with 2.6.x.
> > For me this is a showstopper for any kind of 2.6 production use.
> 
> Well, I do use reiserfs->aes-loop->lvm/dm->md5/raid5, and it never failed
> for me, except once, and the error is likely to be outside reiserfs, and
> possibly outside lvm.

Marc, what about you, were you using dm-snapshot when you experienced
temporary corruption?

Alex

Re: [uml-devel] [BUG: UML 2.6.11-rc4-bk-latest] sleeping function called from invalid context and segmentation fault

2005-02-22 Thread Blaisorblade

On Friday 18 February 2005 16:33, Anton Altaparmakov wrote:
> On Wed, 2005-02-16 at 19:35 +0100, Blaisorblade wrote:
> > On Monday 14 February 2005 12:48, Anton Altaparmakov wrote:
> > > Hi,
> > >
> > > I get a few Debug messages of the form from UML:
> > >
> > > Debug: sleeping function called from invalid context at
> > > include/asm/arch/semaphore.h:107
> > > in_atomic():0, irqs_disabled():1
> > > Call Trace:
> > > 087d77b0:  [<0809aaa5>] __might_sleep+0x135/0x180
> > > 087d77d8:  [<084d377f>] mcount+0xf/0x20
> > > 087d77e0:  [<0807cc13>] uml_console_write+0x33/0x80
> > >
> > > Most are coming via uml_console_write.
> >
> > The problem is that the UML tty drivers use a semaphore instead of a
> > spinlock for the locking, which also causes some other problems.
> >
> > The attached patch should fix this, but I've not yet made sure it is not
> > deadlock-prone (I didn't hit any during some very limited testing).
> >
> > So it's not yet ready for 2.6.11.
>
> Trying with the above patch I now only get two "sleeping function
> called from invalid context" warnings during boot and none during
> running.
I'll look at whether I can produce them... if it's no problem, post their 
traces anyway, please.
> However I get a lot of those errors: 
>
> arch/um/drivers/line.c:262: spin_lock(arch/um/drivers/line.c:085b5900)
> already locked by arch/um/drivers/line.c/262
Ok, I'll be looking into them ASAP (which unfortunately means not very soon, 
sorry).

At a quick look, I see that line 262 is a "spin_lock" called by 
line_write_interrupt. I used spin_lock_irqsave everywhere else... but 
actually the interrupt handler must explicitly disable interrupts (I forgot), 
so the simple answer seems to be that using spin_lock_irqsave() there would 
fix it. I cannot make a patch now.
> Also both before and after the patch I see a lot of messages like:
>
> kernel: line_write_room: tty2: no room left in buffer
I've never seen them... what is your test case?
Anyway, I'm not sure this is a needed warning: 

in include/linux/tty_driver.h, the .write_room member must tell the available 
space... it's reasonable that the TTY layer will call a flush_buffers 
function. However, looking at stdio_console, I'm seeing that, in fact, there 
is no flush_chars function(!!). I'll provide one ASAP.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade




Re: [linux-usb-devel] 2.6: USB Storage hangs machine on bootup for ~2 minutes

2005-02-22 Thread Alan Stern

On Tue, 22 Feb 2005, Parag Warudkar wrote:

> > You said that the system hangs during bootup.  Where in the log does that
> > hang occur?  The log itself looks perfectly normal.  The Maxtor drive is
> > scanned, the partitions detected, and then apparently one or two
> > partitions are mounted.  There's no indication of any problem.
> 
> I have tracked down the reason for this hang  - it seems that kudzu gets 
> stuck 
> in D state on usb_device_read - Below SysRQ+T from 2.6.11-rc4 - always 
> reproducible.
> 
> kudzu D  0  4424   4472 
> (NOTLB)
> 81002bebfd98 0086 81002c538150 81002f21d00e
>00078847ce40 81002b5977c0 fd38 803defc0
>81002bebfd88 80219b32
> Call Trace:
> {_atomic_dec_and_lock+290} {__down+421}
>{default_wake_function+0} 
> {__down_failed+53}
>{.text.lock.usb+5} 
> {usb_device_read+229}
>{vfs_read+225} {sys_read+80}
>{system_call+126}
> 
> Thereafter, if I try to mount the USB drive, even mount gets stuck.

usb_device_read acquires a couple of locks, one for the USB bus list and 
one for the root hub of the bus it's looking at.  I don't know which one 
occurs at offset 229 on your system -- maybe you can tell.  Oddly enough, 
neither of those locks is for a USB device like the Maxtor drive.  So it's 
not at all clear why plugging in the drive should mess up kudzu.  Or why 
the blockage should clear up after a couple of minutes.

Perhaps we can find out by looking at other entries in the stack trace.  
Of particular interest are the khubd, usb-storage, and scsi_eh processes.

Alan Stern


Re: uninterruptible sleep lockups

2005-02-22 Thread Anthony DiSante

Chris Friesen wrote:
>>> There has been some discussion that these hung
>>> states could be "fixed", but that's absolutely
>>> positively incorrect.
>>
>> That's one of the things I asked a few messages ago.  Some people on
>> the list were saying that it'd be "really hard" and would "require a
>> lot of bookkeeping" to "fix" permanently-D-stated processes... which
>> is completely different than "impossible."
>
> Nothing is "impossible".

Maybe where you live, but in my world some things are most certainly 
impossible.  Getting a 1MHz CPU to run at 1THz is impossible.  Having the 
kernel automatically install the latest firmware for a buggy device, without 
actually having the new firmware file, is impossible.  Turning your 17" LCD 
monitor into a 50" HDTV is impossible.

> Cracking SHA-256 isn't "impossible", it just
> takes more computing power than exists on the face of the planet.

Thanks for proving my point.  That's a perfect example of the difference 
between "hard" and "impossible."

> Call it "infeasible" if you like.  It's theoretically possible, but the
> amount of work and the overhead involved just are not realistic.

Again, that was one of my earlier questions, since some people here were 
saying "impossible" while others were saying "really hard."

-Anthony DiSante
http://nodivisions.com/

Re: ide-scsi is deprecated for cd burning! Use ide-cd and give dev=/dev/hdX as device

2005-02-22 Thread Bill Davidsen

David Lang wrote:
> I regularly burn tarballs to a CD without using a filesystem and as
> long as I use the -pad option when burning I've had no problems reading
> them (the -pad was necessary even when I was using ide-scsi)

That matches my experience, at least as far as the "no problem" part, I 
never tried without -pad because it just seemed as if cdrecord would 
have a better idea of what the drive wanted than I do.

I have burned tarballs, as well as cpio (I like the checking with -Hcrc 
and not overwriting newer versions of a file), and more recently I have 
been burning encrypted filesystem images onto DVDs directly, and that 
works as well.

--
   -bill davidsen ([EMAIL PROTECTED])
"The secret to procrastination is to put things off until the
 last possible moment - but no longer"  -me

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Jay Lan

Kaigai Kohei wrote:
Hello, everyone.
Andrew Morton wrote:
 > Jay Lan <[EMAIL PROTECTED]> wrote:
 >
 >>Since the need of Linux system accounting has gone beyond what BSD
 >>accounting provides, i think it is a good idea to create a thin layer
 >>of common code for various accounting packages, such as BSD accounting,
 >>CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
 >>a routine in the common code which would then invoke those accounting
 >>packages that register to the acct_common to handle do_exit situation.
 >
 >
 > This all seems to be heading in the wrong direction.  Do we really 
want to
 > have lots of different system accounting packages all hooking into a
 > generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
 > framework?
 >
 > Can't we get _one_ accounting system in there, get it right, avoid the
 > framework?

I think there are two issues about system accounting framework.
Issue: 1) How to define the appropriate unit for accounting ?
Current BSD-accountiong make a collection per process accounting 
information.
CSA make additionally a collection per process-aggregation accounting.
The 'enhanced acct data collection' patches that were added to
2-6-11-rc* tree still do collection of per process data.
CSA added those per-process data to per-aggregation ("job") data
structure at do_exit() time when a process termintes.
It is appropriate to make the fork-exit event handling framework for 
definition
of the process-aggregation, such as PAGG.

This system-accounting per process-aggregation is quite useful,
thought I tried the SGI's implementation named 'job' in past days.
Issue: 2) What items should be collected for accounting information ?
BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
Integral-VM/RSS, and amount of block-I/O additionally.
These data are now collected in 2.6.11-rc* code. Note that these data
are still per-process.
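
As a rough illustration of the exit-time merge discussed above, the sketch below folds one per-process record into a per-job total. It is not code from CSA, ELSA or the kernel: the struct layouts, field names and job_acct_merge() are hypothetical, and only the kinds of items (times, page faults, block I/O) come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record; names are illustrative only. */
struct proc_acct {
    int32_t  pid, uid, gid;
    uint64_t utime, stime, etime;  /* user/system/elapsed time, in ticks */
    uint64_t minflt, majflt;       /* minor/major page faults */
    uint64_t blk_io;               /* amount of block I/O */
};

/* Hypothetical per-job (process-aggregation) totals. */
struct job_acct {
    uint64_t utime, stime;
    uint64_t minflt, majflt;
    uint64_t blk_io;
    uint32_t nr_exited;            /* processes merged so far */
};

/*
 * Fold an exiting process into its job. This models what an exit-time
 * hook would have to do before the per-process data is thrown away.
 */
static void job_acct_merge(struct job_acct *job, const struct proc_acct *p)
{
    assert(job != NULL && p != NULL);
    job->utime  += p->utime;
    job->stime  += p->stime;
    job->minflt += p->minflt;
    job->majflt += p->majflt;
    job->blk_io += p->blk_io;
    job->nr_exited++;
}
```

Elapsed time is deliberately not summed here, since wall-clock intervals of processes in the same job overlap; how a real package treats it is outside this sketch.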
I think it's hard to implement the accounting engine as a loadable kernel
module on top of any kind of framework, because we would have to put
callback functions all around the kernel for this purpose.

Thus, I propose the following:
We should separate the process-aggregation functionality from the
collection of accounting information.
I totally agree with this! Actually that was what we have done. The data
collection part of code has been unified.
Some kind of framework for implementing process-aggregation is necessary.
And the collection of accounting information should be merged
into BSD-accounting and implemented as part of the monolithic kernel,
as Guillaume said.
This sounds good. I am interested in learning how ELSA saves off
the per-process accounting data before the data is disposed of. If
that scheme works for CSA, we would be very happy to adopt the
scheme. The current BSD scheme is insufficient: the code is
very BSD-centric and it provides no way to handle process-aggregation.
Thanks,
 - jay
Thanks.

Re: uninterruptible sleep lockups

2005-02-22 Thread Horst von Brand

Anthony DiSante <[EMAIL PROTECTED]> said:
> linux-os wrote:
> > There has been some discussion that these hung
> > states could be "fixed", but that's absolutely
> > positively incorrect.

> That's one of the things I asked a few messages ago.  Some people on the 
> list were saying that it'd be "really hard" and would "require a lot of 
> bookkeeping" to "fix" permanently-D-stated processes... which is completely 
> different than "impossible."

Most people here have little clue. It can't be done.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria             +56 32 654239
Casilla 110-V, Valparaiso, Chile               Fax:  +56 32 797513

Re: ide-scsi is deprecated for cd burning! Use ide-cd and give dev=/dev/hdX as device

2005-02-22 Thread Bill Davidsen

[EMAIL PROTECTED] wrote:
On Fri, 18 Feb 2005 15:23:44 EST, Bill Davidsen said:

I'll try to build a truth table for this, I'm now working with some 
non-iso data sets, so I'm a bit more interested. I would expect read() 
to only try to read one sector, so I'll just do a quick and dirty to get 
the size from the command line, seek and read.

I haven't had a problem using dd to date, as long as I know how long the 
data set was, but I'll try to have results tonight.

The problem is that often you don't know exactly how long the data set is
(think "backup burned to CD/RW") - there's a *lot* of code that does stuff
like
while ((actual = read(fd, buffer, 65536)) > 0) {
        ...
}
with the realistic expectation that the last read might return less than 64k,
in which case 'actual' will tell us how much was read.  Instead, we just get
an error on the read.
Note that 'dd' does this - that's why you get messages like '12343+1 records
in'.
We *really* want to get to a point where 'dd' will work *without* having to
tell it a 'bs=' and 'count=' to get the size right.
I think I already had a pretty good grasp on that, in my previous post 
on this I noted: "The last time I looked at this, the issue was that the 
user software did a large read and the ide-cd didn't properly return a 
small data block with no error, but rather returned an error with no 
data. If you get the size of the ISO image, you can read that with any 
program which doesn't try to read MORE than that."
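
For reference, a minimal sketch of the read loop being discussed, written so that a short final read is treated as ordinary data. It assumes the descriptor returns 0 at end of data, which is exactly the behavior the thread says ide-cd did not provide at the time; drain_fd() and its parameters are hypothetical names, not from any existing tool.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read fd until end of data without knowing the size in advance.
 * Returns the total number of bytes read, or -1 if read() fails
 * (errno is left as set by read()).
 */
static long long drain_fd(int fd, char *buf, size_t len)
{
    long long total = 0;

    assert(buf != NULL && len > 0);
    for (;;) {
        ssize_t actual = read(fd, buf, len);

        if (actual > 0) {
            total += actual;   /* may be less than len on the last block */
            continue;
        }
        if (actual == 0)
            return total;      /* clean end of data */
        if (errno == EINTR)
            continue;          /* interrupted by a signal, retry */
        return -1;             /* real error: the case reported for ide-cd */
    }
}
```

With a driver that returns an error instead of the short block, this loop ends in the -1 branch and the tail of the data is lost, which is why the size has to be supplied by hand.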

It sounds as if (a) the problem with ide-cd is going to get fixed, and 
(b) ide-scsi may not remain deprecated. A win-win if I ever saw one.

--
   -bill davidsen ([EMAIL PROTECTED])
"The secret to procrastination is to put things off until the
 last possible moment - but no longer"  -me

Re: [PATCH] TCP-Hybla proposal

2005-02-22 Thread Baruch Even

Stephen Hemminger wrote:
On Tue, 22 Feb 2005 15:34:42 +0100
Daniele Lacamera <[EMAIL PROTECTED]> wrote:
One last note: IMHO we really need a better way to select congestion 
avoidance scheme between those available, instead of switching each one 
on and off. I.e., we can't say how vegas and westwood perform when 
switched on together, can we?
The protocol choices are mutually exclusive; if you walk through the code
(or do experiments), you find that only one gets used.  As part of the
longer term plan, I would like to:
- have one sysctl
- choice by route and destination
- union for fields in control block
I'm currently working on a patch to make it a single sysctl, I've got it 
working (as in, the kernel doesn't crash). I still need to validate the 
actual implementation.

I'd say the next stage is to merge fields as much as possible.
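
To make the "one selector plus a union" idea concrete, here is a small user-space sketch. It is not the kernel's control block: the enum, the per-scheme structs and their field names are hypothetical, and only the overall shape (a single choice, with per-scheme state sharing storage) comes from the list above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical selector: one setting picks exactly one scheme. */
enum cong_algo { CONG_RENO, CONG_VEGAS, CONG_WESTWOOD };

/* Illustrative per-scheme state; field names are NOT the kernel's. */
struct vegas_state    { uint32_t base_rtt; uint32_t min_rtt; };
struct westwood_state { uint32_t bw_est;   uint32_t rtt_min; };

struct cong_ctl {
    enum cong_algo algo;
    union {                        /* schemes are mutually exclusive, */
        struct vegas_state    vegas;     /* so their private fields   */
        struct westwood_state westwood;  /* can share the same bytes  */
    } u;
};

/* Switch schemes; the old scheme's state is meaningless afterwards. */
static void cong_select(struct cong_ctl *cc, enum cong_algo algo)
{
    assert(cc != NULL);
    memset(&cc->u, 0, sizeof cc->u);
    cc->algo = algo;
}
```

The control block then grows by the size of the largest scheme's state rather than by the sum of all of them.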
I doubt the real usefulness of selection by route/dest; all of the
high-speed protocols (except possibly for TCP-Hybla) are intended for
sender-only servers that push lots of data, and they should work in all
cases and alongside Reno TCP traffic without undue unfairness.

I hope to finish the clean-up and preparation of H-TCP for inclusion in 
the kernel and can then help with the unionisation.

Baruch

Re: uninterruptible sleep lockups

2005-02-22 Thread Chris Friesen

Anthony DiSante wrote:
linux-os wrote:
There has been some discussion that these hung
states could be "fixed", but that's absolutely
positively incorrect.

That's one of the things I asked a few messages ago.  Some people on the 
list were saying that it'd be "really hard" and would "require a lot of 
bookkeeping" to "fix" permanently-D-stated processes... which is 
completely different than "impossible."
Nothing is "impossible".  Cracking SHA-256 isn't "impossible", it just 
takes more computing power than exists on the face of the planet.

Call it "infeasible" if you like.  It's theoretically possible, but the 
amount of work and the overhead involved just are not realistic.  And 
then you have the likelihood of a bug in the bookkeeping code leading to 
runtime corruption...  Better to take the hit now and fix the original 
problem.

Chris

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Jay Lan

Guillaume Thouvenin wrote:
On Fri, 2005-02-18 at 17:16 -0800, Andrew Morton wrote:
Jay Lan <[EMAIL PROTECTED]> wrote:
Since the need of Linux system accounting has gone beyond what BSD
accounting provides, I think it is a good idea to create a thin layer
of common code for various accounting packages, such as BSD accounting,
CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
a routine in the common code which would then invoke those accounting
packages that register to the acct_common to handle do_exit situation.
This all seems to be heading in the wrong direction.  Do we really want to
have lots of different system accounting packages all hooking into a
generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
framework?
Can't we get _one_ accounting system in there, get it right, avoid the
framework?
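
For readers unfamiliar with the proposal being criticized, the common layer quoted above amounts to a callback table behind a single call site in do_exit(). The user-space sketch below only illustrates that shape; acct_common is the name used in the proposal, but acct_common_register(), acct_common_exit(), the handler type and the two sample handlers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ACCT_MAX_HANDLERS 4

/* Hypothetical exit callback: receives the pid of the exiting process. */
typedef void (*acct_exit_fn)(int pid);

static acct_exit_fn acct_handlers[ACCT_MAX_HANDLERS];
static size_t acct_nr_handlers;

/* Register a package's exit handler; 0 on success, -1 if the table is full. */
static int acct_common_register(acct_exit_fn fn)
{
    assert(fn != NULL);
    if (acct_nr_handlers == ACCT_MAX_HANDLERS)
        return -1;
    acct_handlers[acct_nr_handlers++] = fn;
    return 0;
}

/* Stand-in for the single call site in do_exit(): fan out to every package. */
static void acct_common_exit(int pid)
{
    for (size_t i = 0; i < acct_nr_handlers; i++)
        acct_handlers[i](pid);
}

/* Two sample packages that just remember the last pid they were told about. */
static int bsd_last_pid, csa_last_pid;
static void bsd_on_exit(int pid) { bsd_last_pid = pid; }
static void csa_on_exit(int pid) { csa_last_pid = pid; }
```

The extra indirection on every exit is the overhead the objection above refers to.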

  Is it possible to just merge the BSD accounting and the CSA accounting
by adding in the current BSD per-process accounting structure some
missing fields like the mm integral provided by the CSA patch?
Hi Guillaume,
All the raw data CSA needs is already stored in the task_struct of the process.
ELSA is just a user of the accounting data. We need a hook in the
do_fork() routine to manage group of processes, not to do accounting.
I see at least three layers of functions in doing system accounting:
data collection, handling of the raw data, and presentation of the
data to users.
We have merged the data collection part. :)
Handling of the raw data seems to be done in ELSA by a user-space daemon,
and you are proposing to add a hook at fork time. I am interested
in learning your approach. How does ELSA add per-process accounting data
to your groupings (banks) when a process exits? How do you save
accounting data you need in task_struct before it is disposed? BSD
handles that through acct_process() hook at do_exit(). CSA also
depends on a hook at do_exit() to merge per-process data to per-job
data. How does ELSA handle this without a need of a do_exit() hook?
Thanks,
 - jay
Guillaume

