Re: [PATCH] i386 kernel instant reboot with older binutils fix

2007-01-02 Thread Eric W. Biederman
Vivek Goyal <[EMAIL PROTECTED]> writes:

> Hi Eric,
>
> This .text.head section is not part of vmlinux. This is part of uncompressed
> portion in bzImage. arch/i386/boot/compressed/head.S.
>
> Hence, arch/i386/boot/compressed/vmlinux.lds should take care of it which
> already has entry for linking .text.head section.

Yep.  Sorry never mind.

Thanks for the good tracking on this one.  

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] psmouse split [01/03]

2007-01-02 Thread Andres Salomon
Sam Ravnborg wrote:
> Hi Andres.
> 
[...]
> 
> The above code should be redone to use list based assignment.
> Something like this:
> 
> psmouse-y := psmouse-base.o
> psmouse-$(CONFIG_MOUSE_PS2_ALPS)   += alps.o
> psmouse-$(CONFIG_MOUSE_PS2_LOGIPS2PP)  += logips2pp.o
> psmouse-$(CONFIG_MOUSE_PS2_SYNAPTICS)  += synaptics.o
> psmouse-$(CONFIG_MOUSE_PS2_LIFEBOOK)   += lifebook.o
> psmouse-$(CONFIG_MOUSE_PS2_TRACKPOINT) += trackpoint.o
> 
>   Sam

Thanks; committed to my git repo.

http://dev.laptop.org/git?p=users/dilinger/psmouse-split;a=shortlog;h=psmouse-static



Re: ACPI: EC: evaluating _Q10

2007-01-02 Thread Thomas Meyer

Len Brown wrote:

The bigger question is why you get "tons of these" --
as EC  events are usually infrequent.
Do you have a big number next to "acpi" in /proc/interrupts?
If so, at what rate is it growing?
  
Maybe "tons" was a bit overstated... After a fresh reboot, I count 110
_q10 messages and one _q21 message now, with 8 min. uptime and around
10300 acpi interrupts.



480 sec/110 ec events = 4 seconds/event.  This doesn't worry me.
Could be battery updates, thermal updates etc.

480/10300 = an interrupt every 46 ms.
This is certainly not right.
Have you always seen runaway acpi interrupts on this box, no matter the kernel?
  
To be honest, I neither knew nor cared that this could be a problem
until now. But most of the acpi interrupts seem to happen during the
first minutes, maybe while booting, because with 22 min. uptime
I get these values:


           CPU0   CPU1
  0:     413784      0   IO-APIC-edge      timer
  9:      14544      0   IO-APIC-fasteoi   acpi

24 min. uptime:
           CPU0   CPU1
  0:     435875      0   IO-APIC-edge      timer
  9:      15247      0   IO-APIC-fasteoi   acpi

26 min. uptime:
           CPU0   CPU1
  0:     470428      0   IO-APIC-edge      timer
  9:      16251      0   IO-APIC-fasteoi   acpi

So let's say approximately 700 to 1000 acpi interrupts in 120 seconds. I
guess this sounds better, doesn't it?
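
The rate arithmetic in this exchange can be checked with a small helper. This is only a sketch: the struct and function names are made up for illustration, and treating the counter as zero at boot is an assumption.

```c
#include <stdio.h>

/* One /proc/interrupts reading: uptime in seconds and cumulative IRQ count. */
struct irq_sample {
    double uptime_s;
    unsigned long count;
};

/* Average interrupts per second between two readings (b taken after a). */
double irq_rate(struct irq_sample a, struct irq_sample b)
{
    return (double)(b.count - a.count) / (b.uptime_s - a.uptime_s);
}

/* Average milliseconds between interrupts over the same interval. */
double irq_period_ms(struct irq_sample a, struct irq_sample b)
{
    return 1000.0 * (b.uptime_s - a.uptime_s) / (double)(b.count - a.count);
}

/* Print both figures for one interval, labelled by the caller. */
void irq_report(const char *label, struct irq_sample a, struct irq_sample b)
{
    printf("%s: %.1f irq/s, one every %.1f ms\n",
           label, irq_rate(a, b), irq_period_ms(a, b));
}
```

With the acpi counts quoted above, the first 8 minutes average one interrupt roughly every 46.6 ms, while the 22 to 24 and 24 to 26 minute intervals come out at about 5.9 and 8.4 interrupts per second, which supports the guess that most of the interrupts happen early after boot.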




[PATCH] 2/4 qrcu: add rcutorture test

2007-01-02 Thread Jens Axboe
From: Oleg Nesterov <[EMAIL PROTECTED]>

Add rcutorture test for qrcu.

Works for me!

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>
Signed-off-by: Josh Triplett <[EMAIL PROTECTED]>
Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>
Acked-by: Jens Axboe <[EMAIL PROTECTED]>
---
 include/linux/srcu.h |4 +-
 kernel/rcutorture.c  |   71 -
 2 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index fcdb749..03a9010 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -64,8 +64,8 @@ struct qrcu_struct {
 };
 
 int init_qrcu_struct(struct qrcu_struct *qp);
-int qrcu_read_lock(struct qrcu_struct *qp);
-void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
+int qrcu_read_lock(struct qrcu_struct *qp) __acquires(qp);
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx) __releases(qp);
 void synchronize_qrcu(struct qrcu_struct *qp);
 
 /**
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 482b11f..bd7fd49 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -465,6 +465,73 @@ static struct rcu_torture_ops srcu_ops = {
 };
 
 /*
+ * Definitions for qrcu torture testing.
+ */
+
+static struct qrcu_struct qrcu_ctl;
+
+static void qrcu_torture_init(void)
+{
+   init_qrcu_struct(&qrcu_ctl);
+   rcu_sync_torture_init();
+}
+
+static void qrcu_torture_cleanup(void)
+{
+   synchronize_qrcu(&qrcu_ctl);
+   cleanup_qrcu_struct(&qrcu_ctl);
+}
+
+static int qrcu_torture_read_lock(void) __acquires(&qrcu_ctl)
+{
+   return qrcu_read_lock(&qrcu_ctl);
+}
+
+static void qrcu_torture_read_unlock(int idx) __releases(&qrcu_ctl)
+{
+   qrcu_read_unlock(&qrcu_ctl, idx);
+}
+
+static int qrcu_torture_completed(void)
+{
+   return qrcu_ctl.completed;
+}
+
+static void qrcu_torture_synchronize(void)
+{
+   synchronize_qrcu(&qrcu_ctl);
+}
+
+static int qrcu_torture_stats(char *page)
+{
+   int cnt = 0;
+   int idx = qrcu_ctl.completed & 0x1;
+
+   cnt += sprintf(&page[cnt], "%s%s per-CPU(idx=%d):",
+   torture_type, TORTURE_FLAG, idx);
+
+   cnt += sprintf(&page[cnt], " (%d,%d)",
+   atomic_read(qrcu_ctl.ctr + 0),
+   atomic_read(qrcu_ctl.ctr + 1));
+
+   cnt += sprintf(&page[cnt], "\n");
+   return cnt;
+}
+
+static struct rcu_torture_ops qrcu_ops = {
+   .init = qrcu_torture_init,
+   .cleanup = qrcu_torture_cleanup,
+   .readlock = qrcu_torture_read_lock,
+   .readdelay = srcu_read_delay,
+   .readunlock = qrcu_torture_read_unlock,
+   .completed = qrcu_torture_completed,
+   .deferredfree = rcu_sync_torture_deferred_free,
+   .sync = qrcu_torture_synchronize,
+   .stats = qrcu_torture_stats,
+   .name = "qrcu"
+};
+
+/*
  * Definitions for sched torture testing.
  */
 
@@ -503,8 +570,8 @@ static struct rcu_torture_ops sched_ops = {
 };
 
 static struct rcu_torture_ops *torture_ops[] =
-   { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops, &srcu_ops,
- &sched_ops, NULL };
+   { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
+ &srcu_ops, &qrcu_ops, &sched_ops, NULL };
 
 /*
  * RCU torture writer kthread.  Repeatedly substitutes a new structure
-- 
1.4.4.2.g02c9



[PATCH] 1/4 qrcu: "quick" srcu implementation

2007-01-02 Thread Jens Axboe
From: Oleg Nesterov <[EMAIL PROTECTED]>

Very much based on ideas, corrections, and patient explanations from
Alan and Paul.

The current srcu implementation is very good for readers: lock/unlock
are extremely cheap. But for that reason it is not possible to avoid
synchronize_sched() and polling in synchronize_srcu().

Jens Axboe wrote:
>
> It works for me, but the overhead is still large. Before it would take
> 8-12 jiffies for a synchronize_srcu() to complete without there actually
> being any reader locks active, now it takes 2-3 jiffies. So it's
> definitely faster, and as suspected the loss of two of three
> synchronize_sched() cut down the overhead to a third.

'qrcu' behaves the same as srcu but is optimized for writers. The fast path
for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock().
The slow path is __wait_event(), no polling. However, the reader does
atomic inc/dec on lock/unlock, and the counters are not per-cpu.

Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context,
and 'qrcu_struct' can be compile-time initialized.
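
The reader side described above can be modelled in user space with C11 atomics. This is a rough sketch, not the kernel code: the names are hypothetical, `completed` is made atomic here as an assumption (the patch uses a plain int), and the writer side with its mutex and wait queue is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model of the reader-visible fields of qrcu_struct. */
struct qrcu_model {
    atomic_int completed;   /* low bit selects the active counter */
    atomic_int ctr[2];      /* reader counts; the active one carries a bias of 1 */
};

void qrcu_model_init(struct qrcu_model *qp)
{
    atomic_init(&qp->completed, 0);
    atomic_init(&qp->ctr[0], 1);    /* bias keeps the active counter non-zero */
    atomic_init(&qp->ctr[1], 0);
}

/* Increment *v unless it is zero; returns true if the increment happened. */
bool qrcu_model_inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Register a reader; returns the index to pass to the matching unlock. */
int qrcu_model_read_lock(struct qrcu_model *qp)
{
    for (;;) {
        int idx = atomic_load(&qp->completed) & 0x1;

        if (qrcu_model_inc_not_zero(&qp->ctr[idx]))
            return idx;
    }
}

/*
 * Unregister a reader. Returns true when the counter dropped to zero,
 * which is the point where the kernel version wakes a waiting writer.
 */
bool qrcu_model_read_unlock(struct qrcu_model *qp, int idx)
{
    return atomic_fetch_sub(&qp->ctr[idx], 1) == 1;
}
```

The cost named in the description is visible here: every lock and unlock is an atomic read-modify-write on a counter shared by all CPUs, whereas srcu readers only touch per-cpu counters.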

See also (a long) discussion:
http://marc.theaimsgroup.com/?t=11637085763

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>
Acked-by: Jens Axboe <[EMAIL PROTECTED]>
---
 include/linux/srcu.h |   30 ++
 kernel/srcu.c|  105 ++
 2 files changed, 135 insertions(+), 0 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index aca0eee..fcdb749 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -27,6 +27,8 @@
 #ifndef _LINUX_SRCU_H
 #define _LINUX_SRCU_H
 
+#include 
+
 struct srcu_struct_array {
int c[2];
 };
@@ -50,4 +52,32 @@ void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
+/*
+ * fully compatible with srcu, but optimized for writers.
+ */
+
+struct qrcu_struct {
+   int completed;
+   atomic_t ctr[2];
+   wait_queue_head_t wq;
+   struct mutex mutex;
+};
+
+int init_qrcu_struct(struct qrcu_struct *qp);
+int qrcu_read_lock(struct qrcu_struct *qp);
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
+void synchronize_qrcu(struct qrcu_struct *qp);
+
+/**
+ * cleanup_qrcu_struct - deconstruct a quick-RCU structure
+ * @qp: structure to clean up.
+ *
+ * Must invoke this after you are finished using a given qrcu_struct that
+ * was initialized via init_qrcu_struct().  We reserve the right to
+ * leak memory should you fail to do this!
+ */
+static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
+{
+}
+
 #endif
diff --git a/kernel/srcu.c b/kernel/srcu.c
index 3507cab..53c6989 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -256,3 +256,108 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
 EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
 EXPORT_SYMBOL_GPL(srcu_readers_active);
+
+/**
+ * init_qrcu_struct - initialize a quick-RCU structure.
+ * @qp: structure to initialize.
+ *
+ * Must invoke this on a given qrcu_struct before passing that qrcu_struct
+ * to any other function.  Each qrcu_struct represents a separate domain
+ * of QRCU protection.
+ */
+int init_qrcu_struct(struct qrcu_struct *qp)
+{
+   qp->completed = 0;
+   atomic_set(qp->ctr + 0, 1);
+   atomic_set(qp->ctr + 1, 0);
+   init_waitqueue_head(&qp->wq);
+   mutex_init(&qp->mutex);
+
+   return 0;
+}
+
+/**
+ * qrcu_read_lock - register a new reader for an QRCU-protected structure.
+ * @qp: qrcu_struct in which to register the new reader.
+ *
+ * Counts the new reader in the appropriate element of the qrcu_struct.
+ * Returns an index that must be passed to the matching qrcu_read_unlock().
+ */
+int qrcu_read_lock(struct qrcu_struct *qp)
+{
+   for (;;) {
+   int idx = qp->completed & 0x1;
+   if (likely(atomic_inc_not_zero(qp->ctr + idx)))
+   return idx;
+   }
+}
+
+/**
+ * qrcu_read_unlock - unregister a old reader from an QRCU-protected structure.
+ * @qp: qrcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding qrcu_read_lock().
+ *
+ * Removes the count for the old reader from the appropriate element of
+ * the qrcu_struct.
+ */
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
+{
+   if (atomic_dec_and_test(qp->ctr + idx))
+   wake_up(&qp->wq);
+}
+
+/**
+ * synchronize_qrcu - wait for prior QRCU read-side critical-section completion
+ * @qp: qrcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_qrcu() from the corresponding
+ * QRCU read-side critical section; doing so will result in 

[BLOCK] 0/4 explicit io plugging

2007-01-02 Thread Jens Axboe
This series of 4 patches switches the block layer to use explicit
plugging instead of the implicit plugging that takes place now when io
is queued against an empty queue.

The first three patches update RCU to include a QRCU method similar to
SRCU. QRCU is a bit heavier on the reader side, but a _lot_ cheaper for
the synchronization part. The new plugging scheme needs to synchronize
queue plugs for barriers and queue quiescing, so it needs to be cheap.

The fourth patch is the actual meat of the series. It also has a longer
explanation of the benefits of the explicit plugging.

I'm sending this out to get some review of the code, and to ask people
to do some testing. I'm looking for both the "hey it works for me" as
well as benchmark runs. In the performance category, I'm interested in
both high end (lots of CPUs) testing to see whether this actually does
reduce lock contention and block layer cpu utilization as well as more
simplistic io performance results on "normal" boxes to make sure we are
not regressing anywhere.

This code is also available in the 'plug' branch of the block layer git
repo:

git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git/

 Documentation/RCU/checklist.txt |   13 +
 Documentation/RCU/rcu.txt   |6 
 Documentation/RCU/torture.txt   |   15 -
 Documentation/RCU/whatisRCU.txt |3 
 Documentation/block/biodoc.txt  |5 
 block/as-iosched.c  |   15 -
 block/cfq-iosched.c |8 
 block/deadline-iosched.c|9 
 block/elevator.c|   44 ---
 block/ll_rw_blk.c   |  483 
 block/noop-iosched.c|8 
 drivers/block/cciss.c   |6 
 drivers/block/cpqarray.c|3 
 drivers/block/floppy.c  |1 
 drivers/block/loop.c|   12 
 drivers/block/pktcdvd.c |5 
 drivers/block/rd.c  |2 
 drivers/block/umem.c|   16 -
 drivers/ide/ide-cd.c|9 
 drivers/ide/ide-io.c|   25 --
 drivers/md/bitmap.c |1 
 drivers/md/dm-emc.c |2 
 drivers/md/dm-table.c   |   14 -
 drivers/md/dm.c |   18 -
 drivers/md/dm.h |1 
 drivers/md/linear.c |   14 -
 drivers/md/md.c |3 
 drivers/md/multipath.c  |   32 --
 drivers/md/raid0.c  |   17 -
 drivers/md/raid1.c  |   70 -
 drivers/md/raid10.c |   73 --
 drivers/md/raid5.c  |   60 
 drivers/message/i2o/i2o_block.c |6 
 drivers/mmc/mmc_queue.c |3 
 drivers/s390/block/dasd.c   |3 
 drivers/s390/char/tape_block.c  |1 
 drivers/scsi/ide-scsi.c |2 
 drivers/scsi/scsi_lib.c |   47 +--
 fs/adfs/inode.c |1 
 fs/affs/file.c  |2 
 fs/befs/linuxvfs.c  |1 
 fs/bfs/file.c   |1 
 fs/block_dev.c  |2 
 fs/buffer.c |   25 --
 fs/cifs/file.c  |2 
 fs/direct-io.c  |7 
 fs/ecryptfs/mmap.c  |   23 -
 fs/efs/inode.c  |1 
 fs/ext2/inode.c |2 
 fs/ext3/inode.c |3 
 fs/ext4/inode.c |3 
 fs/fat/inode.c  |1 
 fs/freevxfs/vxfs_subr.c |1 
 fs/fuse/inode.c |1 
 fs/gfs2/ops_address.c   |1 
 fs/hfs/inode.c  |2 
 fs/hfsplus/inode.c  |2 
 fs/hpfs/file.c  |1 
 fs/isofs/inode.c|1 
 fs/jfs/inode.c  |1 
 fs/jfs/jfs_metapage.c   |1 
 fs/minix/inode.c|1 
 fs/ntfs/aops.c  |4 
 fs/ntfs/compress.c  |2 
 fs/ocfs2/aops.c |1 
 fs/ocfs2/cluster/heartbeat.c|4 
 fs/qnx4/inode.c |1 
 fs/reiserfs/inode.c |1 
 fs/sysv/itree.c |1 
 fs/udf/file.c   |1 
 fs/udf/inode.c  |1 
 fs/ufs/inode.c  |1 
 fs/ufs/truncate.c   |2 
 fs/xfs/linux-2.6/xfs_aops.c |1 
 fs/xfs/linux-2.6/xfs_buf.c  |   15 -
 include/linux/backing-dev.h |3 
 include/linux/blkdev.h  |   75 +++---
 include/linux/buffer_head.h |1 
 include/linux/elevator.h|8 
 include/linux/fs.h  |1 
 include/linux/pagemap.h |   12 
 include/linux/raid/md.h |1 
 include/linux/sched.h   |1 
 include/linux/srcu.h|   30 ++
 include/linux/swap.h|2 
 kernel/rcutorture.c |   71 +
 kernel/sched.c  |1 
 kernel/srcu.c   |  105 
 mm/filemap.c|   62 -
 mm/nommu.c  |4 
 

[PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-02 Thread hui
On Sat, Dec 30, 2006 at 12:19:40PM +0100, Ingo Molnar wrote:
> your patch looks pretty ok to me in principle. A couple of suggestions 
> to make it more mergable:
> 
>  - instead of BUG_ON()s please use DEBUG_LOCKS_WARN_ON() and make sure 
>the code is never entered again if one assertion has been triggered.
>Pass down a return result of '0' to signal failure. See
>kernel/lockdep.c about how to do this. One thing we dont need are
>bugs in instrumentation bringing down a machine.

I'm now using non-fatal error checking instead of BUG_ON. BUG_ON was a more
aggressive approach that I used to find problems initially.
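
The pattern Ingo asks for (warn once, switch the instrumentation off, and return 0 to signal failure) could look roughly like this in user space. It is a sketch with hypothetical names, not the lockdep code, and unlike the kernel it makes no attempt at being SMP-safe.

```c
#include <stdio.h>

/* Global switch: cleared the first time any check fails. */
static int stat_debug_enabled = 1;

/* Turn checking off; returns 1 only for the call that actually flipped it. */
int stat_debug_off(void)
{
    int was_enabled = stat_debug_enabled;

    stat_debug_enabled = 0;
    return was_enabled;
}

/* Non-fatal assertion: returns 1 if cond is true, i.e. a problem was seen. */
int stat_warn_on(int cond, const char *what)
{
    if (!cond)
        return 0;
    /* Only the first failure prints; later ones stay silent. */
    if (stat_debug_off())
        fprintf(stderr, "lock_stat: check failed: %s\n", what);
    return 1;
}

/* Example instrumentation hook: returns 1 on success, 0 on failure. */
int stat_record(int *counter)
{
    if (!stat_debug_enabled)
        return 0;   /* a previous failure disabled the instrumentation */
    if (stat_warn_on(counter == NULL, "counter == NULL"))
        return 0;
    (*counter)++;
    return 1;
}
```

After the first failed check every later call returns 0 without touching its argument, so a bug in the instrumentation degrades to missing statistics rather than a crashed machine.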

>  - remove dead (#if 0) code

Done.

>  - Documentation/CodingStyle compliance - the code is not ugly per se
>but still looks a bit 'alien' - please try to make it look Linuxish,
>if i apply this we'll probably stick with it forever. This is the
>major reason i havent applied it yet.

I reformatted most of the patch to fit the 80-column limit. I simplified a
number of names, but I'm open to suggestions and patches on how to go
about this. Much of this code was a style experiment, but now I have to
make it more mergeable.

>  - the xfs/wrap_lock change looks bogus - the lock is initialized
>already. What am i missing?

Correct. This has been removed.

I've applied Daniel Walker's changes as well.

Patch here:


http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.2.lock_stat.patch

bill



Re: [PATCH] psmouse split [01/03]

2007-01-02 Thread Sam Ravnborg
Hi Andres.

> diff --git a/drivers/input/mouse/Makefile b/drivers/input/mouse/Makefile
> index 21a1de6..e7c7fbb 100644
> --- a/drivers/input/mouse/Makefile
> +++ b/drivers/input/mouse/Makefile
> @@ -14,4 +14,24 @@ obj-$(CONFIG_MOUSE_SERIAL) += sermouse.o
>  obj-$(CONFIG_MOUSE_HIL)  += hil_ptr.o
>  obj-$(CONFIG_MOUSE_VSXXXAA)  += vsxxxaa.o
>  
> -psmouse-objs  := psmouse-base.o alps.o logips2pp.o synaptics.o lifebook.o trackpoint.o
> +psmouse-objs := psmouse-base.o
> +
> +ifeq ($(CONFIG_MOUSE_PS2_ALPS),y)
> +psmouse-objs += alps.o
> +endif
> +
> +ifeq ($(CONFIG_MOUSE_PS2_LOGIPS2PP),y)
> +psmouse-objs += logips2pp.o
> +endif
> +
> +ifeq ($(CONFIG_MOUSE_PS2_SYNAPTICS),y)
> +psmouse-objs += synaptics.o
> +endif
> +
> +ifeq ($(CONFIG_MOUSE_PS2_LIFEBOOK),y)
> +psmouse-objs += lifebook.o
> +endif
> +
> +ifeq ($(CONFIG_MOUSE_PS2_TRACKPOINT),y)
> +psmouse-objs += trackpoint.o
> +endif


The above code should be redone to use list based assignment.
Something like this:

psmouse-y := psmouse-base.o
psmouse-$(CONFIG_MOUSE_PS2_ALPS)   += alps.o
psmouse-$(CONFIG_MOUSE_PS2_LOGIPS2PP)  += logips2pp.o
psmouse-$(CONFIG_MOUSE_PS2_SYNAPTICS)  += synaptics.o
psmouse-$(CONFIG_MOUSE_PS2_LIFEBOOK)   += lifebook.o
psmouse-$(CONFIG_MOUSE_PS2_TRACKPOINT) += trackpoint.o

Sam


Re: WARNING: Absolute relocations present

2007-01-02 Thread Thomas Meyer

Vivek Goyal wrote:

What's your ld version? I don't remember exactly which, but some particular
versions of ld have this problem. These ld versions do some optimizations:
if a section's size is zero, the linker gets rid of that section, and any
symbol defined relative to the removed section is made absolute instead of
section-relative. That's why you see the above warnings.


I had raised this issue on binutils mailing list and they fixed it.

http://sourceware.org/ml/binutils/2006-09/msg00305.html

I am using following ld version and it works fine for me.

GNU ld version 2.17.50.0.6-2.el5 20061020
 
So you will have to move to the latest ld version and the problem should be
resolved.
  
Correct. I'm using binutils version 2.17. This is the current testing 
branch of gentoo for x86.



Re: [patch] block: remove BKL dependency from drivers/block/loop.c

2007-01-02 Thread Jens Axboe
On Wed, Dec 27 2006, Ingo Molnar wrote:
> Subject: [patch] block: remove BKL dependency from drivers/block/loop.c
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> the block loopback device is protected by lo->lo_ctl_mutex and it does 
> not need to hold the BKL anywhere. Convert its ioctl to unlocked_ioctl 
> and remove the BKL acquire/release from its compat_ioctl.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

Acked-by: Jens Axboe <[EMAIL PROTECTED]>

-- 
Jens Axboe



Re: [PATCH] psmouse split [03/03]

2007-01-02 Thread Andres Salomon
Andres Salomon wrote:
> Andres Salomon wrote:
>> Dmitry Torokhov wrote:
>>> On 12/13/06, Andres Salomon <[EMAIL PROTECTED]> wrote:
>>>> Alright, I guess we're down to a matter of taste then.  I'll change the
>>>> patch to still have a monolithic psmouse that allows protocols to be
>>>> enabled/disabled via Kconfig.
>>>>
>>> That'd be great. Thanks!
>>>
>> Yikes, almost forgot to send this.  Here you go; 3 patches total.
>> Please let me know if there are any other change.  The first is attached.

^
Er, let me know if you'd like any other changes.

>>
> 
> Here's the second; everything is split except for the synaptic stuff.
> 

And finally, the third splits out the synaptic stuff.

My initial tests show that compiling the psmouse driver for a specific
protocol extension can cut the driver's size by more than half.
From ba82c3e427cd9e319e5d8898c2f730589da698a6 Mon Sep 17 00:00:00 2001
From: Andres Salomon <[EMAIL PROTECTED]>
Date: Tue, 26 Dec 2006 17:13:42 -0500
Subject: [PATCH] Allow disabling of synaptic protocol extension

This allows disabling of synaptic; basically, it leaves synaptic_detect()
and synaptic_reset() (for synaptic hardware emulating other protocols), but
gets rid of synaptic_init.

Signed-off-by: Andres Salomon <[EMAIL PROTECTED]>
---
 drivers/input/mouse/Makefile   |6 +-
 drivers/input/mouse/psmouse-base.c |5 +
 drivers/input/mouse/synaptics.c|   34 ++
 3 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/input/mouse/Makefile b/drivers/input/mouse/Makefile
index e7c7fbb..76722ec 100644
--- a/drivers/input/mouse/Makefile
+++ b/drivers/input/mouse/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_MOUSE_SERIAL)+= sermouse.o
 obj-$(CONFIG_MOUSE_HIL)+= hil_ptr.o
 obj-$(CONFIG_MOUSE_VSXXXAA)+= vsxxxaa.o
 
-psmouse-objs := psmouse-base.o
+psmouse-objs := psmouse-base.o synaptics.o
 
 ifeq ($(CONFIG_MOUSE_PS2_ALPS),y)
 psmouse-objs += alps.o
@@ -24,10 +24,6 @@ ifeq ($(CONFIG_MOUSE_PS2_LOGIPS2PP),y)
 psmouse-objs += logips2pp.o
 endif
 
-ifeq ($(CONFIG_MOUSE_PS2_SYNAPTICS),y)
-psmouse-objs += synaptics.o
-endif
-
 ifeq ($(CONFIG_MOUSE_PS2_LIFEBOOK),y)
 psmouse-objs += lifebook.o
 endif
diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
index 6b3ac9d..bfb47e1 100644
--- a/drivers/input/mouse/psmouse-base.c
+++ b/drivers/input/mouse/psmouse-base.c
@@ -583,8 +583,11 @@ #endif
synaptics_hardware = 1;
 
if (max_proto > PSMOUSE_IMEX) {
+#ifdef CONFIG_MOUSE_PS2_SYNAPTICS
if (!set_properties || synaptics_init(psmouse) == 0)
return PSMOUSE_SYNAPTICS;
+#endif
+
 /*
  * Some Synaptics touchpads can emulate extended protocols (like IMPS/2).
  * Unfortunately Logitech/Genius probes confuse some firmware versions so
@@ -702,6 +705,7 @@ #endif
.maxproto   = 1,
.detect = im_explorer_detect,
},
+#ifdef CONFIG_MOUSE_PS2_SYNAPTICS
{
.type   = PSMOUSE_SYNAPTICS,
.name   = "SynPS/2",
@@ -709,6 +713,7 @@ #endif
.detect = synaptics_detect,
.init   = synaptics_init,
},
+#endif
 #ifdef CONFIG_MOUSE_PS2_ALPS
{
.type   = PSMOUSE_ALPS,
diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
index 49ac696..5d69f52 100644
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -45,28 +45,30 @@ #define YMAX_NOMINAL 4448
  /
 
 /*
- * Send a command to the synpatics touchpad by special commands
+ * Set the synaptics touchpad mode byte by special commands
  */
-static int synaptics_send_cmd(struct psmouse *psmouse, unsigned char c, unsigned char *param)
+static int synaptics_mode_cmd(struct psmouse *psmouse, unsigned char mode)
 {
-   if (psmouse_sliced_command(psmouse, c))
+   unsigned char param[1];
+
+   if (psmouse_sliced_command(psmouse, mode))
return -1;
-   if (ps2_command(&psmouse->ps2dev, param, PSMOUSE_CMD_GETINFO))
+   param[0] = SYN_PS_SET_MODE2;
+   if (ps2_command(&psmouse->ps2dev, param, PSMOUSE_CMD_SETRATE))
return -1;
return 0;
 }
 
+#ifdef CONFIG_MOUSE_PS2_SYNAPTICS
+
 /*
- * Set the synaptics touchpad mode byte by special commands
+ * Send a command to the synpatics touchpad by special commands
  */
-static int synaptics_mode_cmd(struct psmouse *psmouse, unsigned char mode)
+static int synaptics_send_cmd(struct psmouse *psmouse, unsigned char c, unsigned char *param)
 {
-   unsigned char param[1];
-
-   if (psmouse_sliced_command(psmouse, mode))
+   if (psmouse_sliced_command(psmouse, c))
return -1;
-   param[0] = SYN_PS_SET_MODE2;
-   if (ps2_command(&psmouse->ps2dev, param, PSMOUSE_CMD_SETRATE))
+   if 

Re: [PATCH] psmouse split [02/03]

2007-01-02 Thread Andres Salomon
Andres Salomon wrote:
> Dmitry Torokhov wrote:
>> On 12/13/06, Andres Salomon <[EMAIL PROTECTED]> wrote:
>>> Alright, I guess we're down to a matter of taste then.  I'll change the
>>> patch to still have a monolithic psmouse that allows protocols to be
>>> enabled/disabled via Kconfig.
>>>
>> That'd be great. Thanks!
>>
> 
> Yikes, almost forgot to send this.  Here you go; 3 patches total.
> Please let me know if there are any other change.  The first is attached.
> 

Here's the second; everything is split except for the synaptic stuff.
From 712d339038bb348c354f5e7472f1f156e485bda3 Mon Sep 17 00:00:00 2001
From: Andres Salomon <[EMAIL PROTECTED]>
Date: Tue, 26 Dec 2006 16:50:47 -0500
Subject: [PATCH] Wrap all protocols except for synaptics w/ ifdefs

This patch allows ALPS, LOGIPS2PP, LIFEBOOK, and TRACKPOINT protocol
extensions to be disabled during compilation.  The synaptics stuff is left
alone for now, since it needs special handling for synaptic pass-through
ports.
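
The effect of wrapping table entries in ifdefs can be seen in a small stand-alone model. This is a sketch only: the DEMO_ macros stand in for the Kconfig symbols, and the struct, the function names, and some of the alias strings are illustrative rather than taken from psmouse-base.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Kconfig symbols; a real build gets them from autoconf. */
#define DEMO_PS2_ALPS 1
/* DEMO_PS2_TRACKPOINT is deliberately left undefined. */

struct demo_protocol {
    const char *name;
    const char *alias;
};

/* Entries for disabled protocols are never compiled into the table. */
static const struct demo_protocol demo_protocols[] = {
    { "PS/2", "bare" },
#ifdef DEMO_PS2_ALPS
    { "AlpsPS/2", "alps" },
#endif
#ifdef DEMO_PS2_TRACKPOINT
    { "TPPS/2", "trackpoint" },
#endif
    { "auto", "any" },
};

/* Number of protocols that made it into this build. */
size_t demo_protocol_count(void)
{
    return sizeof(demo_protocols) / sizeof(demo_protocols[0]);
}

/* Look a protocol up by alias; returns NULL if it was compiled out or is unknown. */
const struct demo_protocol *demo_protocol_by_alias(const char *alias)
{
    size_t i;

    for (i = 0; i < demo_protocol_count(); i++)
        if (strcmp(demo_protocols[i].alias, alias) == 0)
            return &demo_protocols[i];
    return NULL;
}
```

A disabled protocol simply does not exist at run time, so a lookup for it fails the same way a lookup for an unknown name does, and neither its table entry nor its detect/init code takes up space in the driver.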

Signed-off-by: Andres Salomon <[EMAIL PROTECTED]>
---
 drivers/input/mouse/psmouse-base.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
index a0e4a03..6b3ac9d 100644
--- a/drivers/input/mouse/psmouse-base.c
+++ b/drivers/input/mouse/psmouse-base.c
@@ -555,6 +555,7 @@ static int psmouse_extensions(struct psm
 {
int synaptics_hardware = 0;
 
+#ifdef CONFIG_MOUSE_PS2_LIFEBOOK
 /*
  * We always check for lifebook because it does not disturb mouse
  * (it only checks DMI information).
@@ -565,6 +566,7 @@ static int psmouse_extensions(struct psm
return PSMOUSE_LIFEBOOK;
}
}
+#endif
 
 /*
  * Try Kensington ThinkingMouse (we try first, because synaptics probe
@@ -596,6 +598,7 @@ static int psmouse_extensions(struct psm
synaptics_reset(psmouse);
}
 
+#ifdef CONFIG_MOUSE_PS2_ALPS
 /*
  * Try ALPS TouchPad
  */
@@ -610,15 +613,20 @@ static int psmouse_extensions(struct psm
max_proto = PSMOUSE_IMEX;
}
}
+#endif
 
if (max_proto > PSMOUSE_IMEX && genius_detect(psmouse, set_properties) == 0)
return PSMOUSE_GENPS;
 
+#ifdef CONFIG_MOUSE_PS2_LOGIPS2PP
if (max_proto > PSMOUSE_IMEX && ps2pp_init(psmouse, set_properties) == 0)
return PSMOUSE_PS2PP;
+#endif
 
+#ifdef CONFIG_MOUSE_PS2_TRACKPOINT
if (max_proto > PSMOUSE_IMEX && trackpoint_detect(psmouse, set_properties) == 0)
return PSMOUSE_TRACKPOINT;
+#endif
 
 /*
  * Reset to defaults in case the device got confused by extended
@@ -660,12 +668,14 @@ static const struct psmouse_protocol psm
.maxproto   = 1,
.detect = ps2bare_detect,
},
+#ifdef CONFIG_MOUSE_PS2_LOGIPS2PP
{
.type   = PSMOUSE_PS2PP,
.name   = "PS2++",
.alias  = "logitech",
.detect = ps2pp_init,
},
+#endif
{
.type   = PSMOUSE_THINKPS,
.name   = "ThinkPS/2",
@@ -699,6 +709,7 @@ static const struct psmouse_protocol psm
.detect = synaptics_detect,
.init   = synaptics_init,
},
+#ifdef CONFIG_MOUSE_PS2_ALPS
{
.type   = PSMOUSE_ALPS,
.name   = "AlpsPS/2",
@@ -706,18 +717,23 @@ static const struct psmouse_protocol psm
.detect = alps_detect,
.init   = alps_init,
},
+#endif
+#ifdef CONFIG_MOUSE_PS2_LIFEBOOK
{
.type   = PSMOUSE_LIFEBOOK,
.name   = "LBPS/2",
.alias  = "lifebook",
.init   = lifebook_init,
},
+#endif
+#ifdef CONFIG_MOUSE_PS2_TRACKPOINT
{
.type   = PSMOUSE_TRACKPOINT,
.name   = "TPPS/2",
.alias  = "trackpoint",
.detect = trackpoint_detect,
},
+#endif
{
.type   = PSMOUSE_AUTO,
.name   = "auto",
-- 
1.4.1



[PATCHSET 2][PATCH 1/1] Combining epoll and disk file AIO

2007-01-02 Thread Suparna Bhattacharya
On Wed, Dec 27, 2006 at 09:08:56PM +0530, Suparna Bhattacharya wrote:
> (2) Most of these other applications need the ability to process both
> network events (epoll) and disk file AIO in the same loop. With POSIX AIO
> they could at least sort of do this using signals (yeah, and all associated
> issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with
> modifications from Jeff Moyer and me) addresses this problem for native
> linux aio in a simple manner. Tridge has written a test harness to
> try out the Samba4 event library modifications to use this. Jeff Moyer
> has a modified version of pipetest for comparison.
> 


Enable epoll wait to be unified with io_getevents

From: Zach Brown, Jeff Moyer, Suparna Bhattacharya

Previously there have been (complicated and scary) attempts to funnel
individual aio events down epoll or vice versa.  This instead lets one
issue an entire sys_epoll_wait() as an aio op.  You'd set up epoll as
usual and then issue epoll_wait aio ops which would complete once epoll
events had been copied. This will enable a single io_getevents() event
loop to process both disk AIO and epoll notifications.

From an application standpoint a typical flow works like this:
- Use epoll_ctl as usual to add/remove epoll registrations
- Instead of issuing sys_epoll_wait, set up an iocb using
  io_prep_epoll_wait (see examples below) specifying the epoll
  events buffer to fill up with epoll notifications. Submit the iocb
  using io_submit
- Now io_getevents can be used to wait for both epoll waits and
  disk aio completion. If the returned AIO event is of type
  IO_CMD_EPOLL_WAIT, then the corresponding result value indicates the
  number of epoll notifications in the iocb's event buffer, which
  can now be processed just like one would process results from a
  sys_epoll_wait()
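
For comparison, the conventional synchronous sequence that this flow builds on looks roughly like the following. It is a sketch of plain epoll usage only: the function name is made up, a pipe stands in for a network socket, and the proposed io_prep_epoll_wait path is not shown, since it exists only with this patch set applied.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with epoll, make it readable, and wait
 * once. Returns the number of ready events, or -1 on any error.
 */
int demo_epoll_wait_once(void)
{
    int pfd[2];
    int epfd;
    int n = -1;
    struct epoll_event ev = { 0 };
    struct epoll_event out[4];

    if (pipe(pfd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    /* Step 1 of the flow: epoll_ctl as usual. */
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) < 0)
        goto close_ep;

    /* Make the descriptor readable so the wait returns immediately. */
    if (write(pfd[1], "x", 1) != 1)
        goto close_ep;

    /* The blocking call that the patch proposes to submit as an iocb instead. */
    n = epoll_wait(epfd, out, 4, 1000);
    if (n == 1 && out[0].data.fd != pfd[0])
        n = -1;

close_ep:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

In the proposed scheme the `out` buffer would be handed to the kernel through the iocb, and the count that epoll_wait() returns here would arrive as the result field of the completion event from io_getevents.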

There are a couple of sample applications:
- Andrew Tridgell has implemented a little test harness using an aio events
  library implementation intended for samba4 
  http://samba.org/~tridge/etest
  (The -e aio option uses aio epoll wait and can issue disk aio as well)
- An updated version of pipetest from Jeff Moyer has a --aio-epoll option
  http://people.redhat.com/jmoyer/aio/epoll/pipetest.c

There is obviously a little overhead compared to using sys_epoll_wait(), due
to the extra step of submitting the epoll wait iocb, most noticeable when
there are very few events processed per loop. However, the goal here is not
to build an epoll alternative but merely to allow network and disk I/O to
be processed in the same event loop which is where the efficiencies really
come from. Picking up more epoll events in each loop can amortize the
overhead across many operations to mitigate the impact.
  
Thanks to Arjan van de Ven for helping figure out how to resolve the
lockdep complaints. Both ctx->lock and ep->lock can be held in certain 
wait queue callback routines, thus being nested inside q->lock. However, this
excludes ctx->wait or ep->wq wait queues, which can safely be nested
inside ctx->lock or ep->lock respectively. So we teach lockdep to recognize
these as distinct classes.

Signed-off-by: Zach Brown <[EMAIL PROTECTED]>
Signed-off-by: Jeff Moyer <[EMAIL PROTECTED]>
Signed-off-by: Suparna Bhattacharya <[EMAIL PROTECTED]>

---

 linux-2.6.20-rc1-root/fs/aio.c  |   54 +
 linux-2.6.20-rc1-root/fs/eventpoll.c|   95 +---
 linux-2.6.20-rc1-root/include/linux/aio.h   |2 
 linux-2.6.20-rc1-root/include/linux/aio_abi.h   |1 
 linux-2.6.20-rc1-root/include/linux/eventpoll.h |   31 +++
 linux-2.6.20-rc1-root/include/linux/sched.h |2 
 linux-2.6.20-rc1-root/kernel/timer.c|   21 +
 7 files changed, 196 insertions(+), 10 deletions(-)

diff -puN fs/aio.c~aio-epoll-wait fs/aio.c
--- linux-2.6.20-rc1/fs/aio.c~aio-epoll-wait  2006-12-28 14:22:52.0 +0530
+++ linux-2.6.20-rc1-root/fs/aio.c  2007-01-03 11:45:40.0 +0530
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -193,6 +194,8 @@ static int aio_setup_ring(struct kioctx 
kunmap_atomic((void *)((unsigned long)__event & PAGE_MASK), km); \
 } while(0)
 
+static struct lock_class_key ioctx_wait_queue_head_lock_key;
+
 /* ioctx_alloc
  * Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
  */
@@ -224,6 +227,8 @@ static struct kioctx *ioctx_alloc(unsign
    spin_lock_init(&ctx->ctx_lock);
    spin_lock_init(&ctx->ring_info.ring_lock);
    init_waitqueue_head(&ctx->wait);
+   /* Teach lockdep to recognize this lock as a different class */
+   lockdep_set_class(&ctx->wait.lock, &ioctx_wait_queue_head_lock_key);
 
    INIT_LIST_HEAD(&ctx->active_reqs);
    INIT_LIST_HEAD(&ctx->run_list);
@@ -1401,6 +1406,42 @@ static ssize_t aio_setup_single_vector(s
return 0;
 }
 
+/* Uses iocb->ki_private */
+void aio_free_iocb_timer(struct kiocb *iocb)
+{
+   struct timer_list 

Re: [PATCH] Update Documentation/pci.txt v7

2007-01-02 Thread Grant Grundler
On Tue, Jan 02, 2007 at 01:45:05PM -0800, Greg KH wrote:
> On Mon, Dec 25, 2006 at 01:08:31AM -0700, Grant Grundler wrote:
> > On Mon, Dec 25, 2006 at 01:06:35AM -0700, Grant Grundler wrote:
> > > On Sat, Dec 23, 2006 at 11:07:26PM -0700, Grant Grundler wrote:
> > > > "final" patch v7 and commit log entry appended below. :)
> > > 
> > > v8 adds 2cd round of feedback from Randy Dunlap.
> > 
> > Obviously the subject line is stale...it's really v8 now.
> > It's just way past my bedtime again.
> 
> Care to resend the latest version to me now?  I'm lost in a maze of
> different versions of this patch :)

Here is the archived version (click on "get diff" link):
http://lkml.org/lkml/2006/12/25/2

Andrew (akpm) already picked it up:

| The patch titled
|  Update Documentation/pci.txt
| has been added to the -mm tree.  Its filename is
|  update-documentation-pcitxt.patch

Maybe it's easier for you to grab the patch directly from akpm?

thanks for checking again!
grant


Re: [PATCH] psmouse split [01/03]

2007-01-02 Thread Andres Salomon
Dmitry Torokhov wrote:
> On 12/13/06, Andres Salomon <[EMAIL PROTECTED]> wrote:
>>
>> Alright, I guess we're down to a matter of taste then.  I'll change the
>> patch to still have a monolithic psmouse that allows protocols to be
>> enabled/disabled via Kconfig.
>>
> 
> That'd be great. Thanks!
> 

Yikes, almost forgot to send this.  Here you go; 3 patches total.
Please let me know if any other changes are needed.  The first is attached.

From 3238fbc61c7879c38d750b710dd009560b815ab4 Mon Sep 17 00:00:00 2001
From: Andres Salomon <[EMAIL PROTECTED]>
Date: Tue, 26 Dec 2006 16:24:57 -0500
Subject: [PATCH] Create PS/2 protocol options for Kconfig

Initial framework for disabling PS/2 protocol extensions.  The current
protocols can only be disabled if CONFIG_EMBEDDED is selected.  No source
files are changed w/ this patch, merely build stuff.

Signed-off-by: Andres Salomon <[EMAIL PROTECTED]>
---
 drivers/input/mouse/Kconfig  |   50 ++
 drivers/input/mouse/Makefile |   22 ++
 2 files changed, 71 insertions(+), 1 deletions(-)

diff --git a/drivers/input/mouse/Kconfig b/drivers/input/mouse/Kconfig
index 35d998c..498930d 100644
--- a/drivers/input/mouse/Kconfig
+++ b/drivers/input/mouse/Kconfig
@@ -37,6 +37,56 @@ config MOUSE_PS2
  To compile this driver as a module, choose M here: the
  module will be called psmouse.
 
+config MOUSE_PS2_ALPS
+   bool "ALPS PS/2 mouse protocol extension" if EMBEDDED
+   default y
+   depends on MOUSE_PS2
+   ---help---
+ Say Y here if you have an ALPS PS/2 touchpad connected to
+ your system.
+
+ If unsure, say Y.
+
+config MOUSE_PS2_LOGIPS2PP
+   bool "Logitech PS/2++ mouse protocol extension" if EMBEDDED
+   default y
+   depends on MOUSE_PS2
+   ---help---
+ Say Y here if you have a Logitech PS/2++ mouse connected to
+ your system.
+
+ If unsure, say Y.
+
+config MOUSE_PS2_SYNAPTICS
+   bool "Synaptics PS/2 mouse protocol extension" if EMBEDDED
+   default y
+   depends on MOUSE_PS2
+   ---help---
+ Say Y here if you have a Synaptics PS/2 TouchPad connected to
+ your system.
+
+ If unsure, say Y.
+
+config MOUSE_PS2_LIFEBOOK
+   bool "Fujitsu Lifebook PS/2 mouse protocol extension" if EMBEDDED
+   default y
+   depends on MOUSE_PS2
+   ---help---
+ Say Y here if you have a Fujitsu B-series Lifebook PS/2
+ TouchScreen connected to your system.
+
+ If unsure, say Y.
+
+config MOUSE_PS2_TRACKPOINT
+   bool "IBM Trackpoint PS/2 mouse protocol extension" if EMBEDDED
+   default y
+   depends on MOUSE_PS2
+   ---help---
+ Say Y here if you have an IBM Trackpoint PS/2 mouse connected
+ to your system.
+
+ If unsure, say Y.
+
 config MOUSE_SERIAL
tristate "Serial mouse"
select SERIO
diff --git a/drivers/input/mouse/Makefile b/drivers/input/mouse/Makefile
index 21a1de6..e7c7fbb 100644
--- a/drivers/input/mouse/Makefile
+++ b/drivers/input/mouse/Makefile
@@ -14,4 +14,24 @@ obj-$(CONFIG_MOUSE_SERIAL)   += sermouse.o
 obj-$(CONFIG_MOUSE_HIL)+= hil_ptr.o
 obj-$(CONFIG_MOUSE_VSXXXAA)+= vsxxxaa.o
 
-psmouse-objs  := psmouse-base.o alps.o logips2pp.o synaptics.o lifebook.o trackpoint.o
+psmouse-objs := psmouse-base.o
+
+ifeq ($(CONFIG_MOUSE_PS2_ALPS),y)
+psmouse-objs += alps.o
+endif
+
+ifeq ($(CONFIG_MOUSE_PS2_LOGIPS2PP),y)
+psmouse-objs += logips2pp.o
+endif
+
+ifeq ($(CONFIG_MOUSE_PS2_SYNAPTICS),y)
+psmouse-objs += synaptics.o
+endif
+
+ifeq ($(CONFIG_MOUSE_PS2_LIFEBOOK),y)
+psmouse-objs += lifebook.o
+endif
+
+ifeq ($(CONFIG_MOUSE_PS2_TRACKPOINT),y)
+psmouse-objs += trackpoint.o
+endif
-- 
1.4.1



Re: [PATCH] i386 kernel instant reboot with older binutils fix

2007-01-02 Thread Vivek Goyal
On Tue, Jan 02, 2007 at 11:44:34PM -0700, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> > o i386 kernel reboots instantly if compiled with binutils older than
> >   2.6.15.
> >
> > o Older binutils required explicit flags to mark a section allocatable
> >   and executable(AX). Newer binutils automatically mark a section AX if
> >   the name starts with .text.
> >
> > o While defining a new section using assembler "section" directive,
> >   explicitly mention section flags.
> 
> As such this patch looks fine, and is certainly harmless.  But don't we
> also need to address the issue that .text.head is not listed in the
> linker script?
> 
> i.e.  Don't we also need?
> 
>   .text : AT(ADDR(.text) - LOAD_OFFSET) {
>   _text = .;  /* Text and read-only data */
> + *(.text.head)
>   *(.text)
>   SCHED_TEXT
>   LOCK_TEXT
>   KPROBES_TEXT
>   *(.fixup)
>   *(.gnu.warning)
>   _etext = .; /* End of text section */
>   } :text = 0x9090
> 
> 
> I'm not even certain how the i386 kernel links properly without the above.

Hi Eric,

This .text.head section is not part of vmlinux. It is part of the uncompressed
portion of the bzImage (arch/i386/boot/compressed/head.S).

Hence, arch/i386/boot/compressed/vmlinux.lds should take care of it, and it
already has an entry for linking the .text.head section.

. =  0  ;
.text.head : {
_head = . ;
*(.text.head)
_ehead = . ;
}

Thanks
Vivek


Re: [PATCH] i386 kernel instant reboot with older binutils fix

2007-01-02 Thread Eric W. Biederman
Vivek Goyal <[EMAIL PROTECTED]> writes:

> o i386 kernel reboots instantly if compiled with binutils older than
>   2.6.15.
>
> o Older binutils required explicit flags to mark a section allocatable
>   and executable(AX). Newer binutils automatically mark a section AX if
>   the name starts with .text.
>
> o While defining a new section using assembler "section" directive,
>   explicitly mention section flags. 

As such this patch looks fine, and is certainly harmless.  But don't we
also need to address the issue that .text.head is not listed in the
linker script?

i.e.  Don't we also need?

  .text : AT(ADDR(.text) - LOAD_OFFSET) {
_text = .;  /* Text and read-only data */
+   *(.text.head)
*(.text)
SCHED_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
*(.gnu.warning)
_etext = .; /* End of text section */
  } :text = 0x9090


I'm not even certain how the i386 kernel links properly without the above.

Eric


RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 6:06 PM
> > In the example you
> > gave earlier, task with min_nr of 2 will be woken up after 4 completed
> > events.
> 
> I only gave 2 ios/events in that example.
> 
> Does that clear up the confusion?

It occurs to me that people might not be aware of how peculiar the
current io_getevents wakeup scheme is, to the extent of erratic
behavior.

In the blocking path of read_events(), we are essentially doing the
following loop (simplified for clarity):

	while (i < nr) {
		add_wait_queue_exclusive(&ctx->wait, &wait);
		do {
			ret = aio_read_evt(ctx, &ent);
			if (ret)
				break;		/* got an event */
			schedule();
		} while (1);
		remove_wait_queue(&ctx->wait, &wait);
		copy_to_user(event, &ent, sizeof(ent));
		i++;
	}

Notice that when a thread comes out of schedule(), it removes itself
from the wait queue and requeues itself at the end of the wait queue
for each and every event it reaps.  So if there are multiple threads
waiting in io_getevents, completed I/Os are handed out in round-robin
fashion to all waiting threads.

To illustrate it in an ASCII diagram, here is what happens:

                 thread 1                   thread 2

                 queue at head
                 schedule()

                                            queue at 2nd position
                                            schedule()

aio_complete
(event 1)
                 remove_wait_queue (now thread 2 is at head)
                 reap event 1
                 requeue at tail
                 schedule()

aio_complete
(event 2)
                                            remove_wait_queue (now thread 1 is at head)
                                            reap event 2
                                            requeue at tail
                                            schedule()

If thread 1 sleeps first with min_nr = 2, and thread 2 sleeps
second with min_nr = 3, then thread 1 wakes up on event _3_.
But if thread 2 sleeps first and thread 1 sleeps second, thread 1
wakes up on event _4_.  If someone asked me to describe the algorithm
of the io_getevents wake-up scheme in the presence of multiple
waiters, I would call it erratic and non-deterministic.
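
The ordering dependence can be reproduced with a toy userspace model. This is
a minimal sketch, assuming each completion wakes only the waiter at the head
of the exclusive wait queue, which reaps one event and requeues at the tail
until its min_nr is met. struct waiter and wake_event() are made-up names for
illustration, not kernel code.

```c
#define MAX_WAITERS 8

/* Hypothetical model of one io_getevents() sleeper. */
struct waiter {
	int id;		/* thread number */
	int min_nr;	/* events needed before returning to userspace */
	int reaped;	/* events collected so far */
};

/*
 * Deliver events one at a time to the waiter at the head of the queue.
 * A waiter that still needs more events goes back to the tail; one that
 * has reached min_nr leaves the queue.  Returns the 1-based number of the
 * event on which waiter `id` finally wakes up, or 0 if it never does
 * within max_events.
 */
int wake_event(const struct waiter *order, int n, int id, int max_events)
{
	struct waiter q[MAX_WAITERS];
	int len = n, ev, i;

	if (n > MAX_WAITERS)
		return 0;
	for (i = 0; i < n; i++)
		q[i] = order[i];

	for (ev = 1; ev <= max_events && len > 0; ev++) {
		struct waiter head = q[0];

		/* remove_wait_queue(): everyone else moves up one slot */
		for (i = 1; i < len; i++)
			q[i - 1] = q[i];
		len--;

		head.reaped++;
		if (head.reaped >= head.min_nr) {
			if (head.id == id)
				return ev;
		} else {
			q[len++] = head;	/* requeue at tail */
		}
	}
	return 0;
}
```

Queueing thread 1 (min_nr 2) ahead of thread 2 (min_nr 3) makes
wake_event() return 3 for thread 1; swapping the queueing order makes it
return 4, matching the two cases above.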

Looking back at the example Zach gave earlier, the current
implementation behaves just like what was described as an undesired
bug (modified and tortured):

issue 2 ops
first io_getevents sleeps with a min_nr of 2
second io_getevents sleeps with min_nr of 3
2 ops complete
first sleeper twiddles thumbs

So I can categorize my patchset as a bug fix instead of a
performance patch ;-)  Let's be serious, this ought to be fixed
one way or the other.


- Ken


Re: replace "memset(...,0,PAGE_SIZE)" calls with "clear_page()"?

2007-01-02 Thread dean gaudet
On Sat, 30 Dec 2006, Denis Vlasenko wrote:

> I was experimenting with SSE[2] clear_page() which uses
> non-temporal stores. That one requires 16 byte alignment.
> 
> BTW, it worked ~300% faster than memset. But Andi Kleen
> insists that cache eviction caused by NT stores will make it
> slower in macrobenchmark.
> 
> Apart from fairly extensive set of microbechmarks
> I tested kernel compiles (i.e. "real world load")
> and they are FASTER too, not slower, but Andi
> is fairly entrenched in his opinion ;)
> I gave up.

you know, with the kernel zeroing pages through the 1:1 phys mapping, and 
userland accessing pages through a different mapping... it seems that 
frequently virtual address bits 12..14 will differ between user and 
kernel.

on K8 this results in a virtual alias conflict which costs *70 cycles* per 
cache line.  (K8 L1 DC uses virtual bits 12..14 as part of the index.)  
this is larger than the cost for L1 miss L2 hit...
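
as a rough sketch of the condition (assuming, as stated above, that only
virtual bits 12..14 matter; the function and macro names are made up for
illustration):

```c
#include <stdint.h>

/* Bits 12..14 of a virtual address: the part of the K8 L1 DC index
 * that lies above the 4 KiB page offset (per the description above). */
#define ALIAS_SHIFT 12
#define ALIAS_MASK  0x7u

/* Returns 1 if two virtual mappings of the same physical page differ
 * in virtual bits 12..14, i.e. would index the L1 differently. */
int virtual_alias_conflict(uintptr_t va_a, uintptr_t va_b)
{
	return (((va_a ^ va_b) >> ALIAS_SHIFT) & ALIAS_MASK) != 0;
}
```

e.g. mappings ending in 0x1000 and 0x2000 conflict, while mappings ending in
0x3000 and 0xb000 do not, since they only differ in bit 15.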

this wouldn't happen with movnt... but then we get into the handwaving 
arguments about timing of accesses to the freshly zeroed page.  too bad 
there's no "evict from L1 to L2" operation -- that would avoid the virtual 
alias problem.

there's an event (75h unit mask 02h) to measure virtual alias conflicts... 
i've always wondered if there are workloads which trigger this behaviour. 
it can happen on copy to/from user as well.

-dean


Re: 2.6.20-rc3: known unfixed regressions - x86_64 boot failure: "IO-APIC + timer doesn't work"

2007-01-02 Thread Yinghai Lu

Please check the latest version. ( 01/02/2007)

YH
[PATCH] x86_64: check_timer with io apic setup before try_apic_pin

check_timer forgets to set up the IO-APIC before calling try_apic_pin for the
timer. Add set_try_apic_pin to set up the IO-APIC pin; otherwise
try_apic_pin will not work.

Also add remove_pin_to_irq as the counterpart of add_pin_to_irq, to make
irq_2_pin handling more complete and setting up the IO-APIC more convenient.

Also add add_irq_entry in mpparse.c to add an apic/pin pair to mp_irqs, in case
some nvidia based MB has a wrong IO-APIC pin entry for the timer. On some ck804
based MBs, MPTABLE or ACPI assumes the timer is on IO-APIC pin 2 but the HW is
set to pin 0, or the reverse.

check_timer will try:
1. apic1, pin1 with 8259 IRQ0 disabled
2. apic=0, pin=0 with 8259 IRQ0 disabled
3. apic=0, pin=2 with 8259 IRQ0 disabled
4. apic1, pin1 with 8259 IRQ0 enabled
5. apic2, pin2, pure 8259A routing on the 8259 as reported by BIOS

without the patch:
..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 disabled<3> .. failed
..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 enabled<7>APIC error on CPU0: 00(40) .. failed
..TIMER: trying IO-APIC=0 PIN=2 fallback with 8259 IRQ0 disabled<3> .. failed
..TIMER: trying IO-APIC=0 PIN=0 8259A broadcast ExtINT from BIOS<7>number of MP IRQ sources: 84.
testing the IO APIC...


 done.

with the patch:
..TIMER: trying IO-APIC=0 PIN=0 with 8259 IRQ0 disabled<3> .. failed
..TIMER: trying IO-APIC=0 PIN=2 fallback with 8259 IRQ0 disabled<7>number of MP IRQ sources: 85.
testing the IO APIC...


 done. 
 
cc: Andi Kleen <[EMAIL PROTECTED]>
cc: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 2a1dcd5..e200d0a 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -273,10 +273,17 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 	struct irq_pin_list *entry = irq_2_pin + irq;
 
 	BUG_ON(irq >= NR_IRQS);
-	while (entry->next)
+	while (entry->next) {
+		if (entry->apic == apic && entry->pin == pin) 
+			return;
+		if (entry->pin == -1) 
+			break;
 		entry = irq_2_pin + entry->next;
+	}
 
 	if (entry->pin != -1) {
+		if (entry->apic == apic && entry->pin == pin) 
+			return;
 		entry->next = first_free_entry;
 		entry = irq_2_pin + entry->next;
 		if (++first_free_entry >= PIN_MAP_SIZE)
@@ -286,6 +293,39 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 	entry->pin = pin;
 }
 
+static void remove_pin_to_irq(unsigned int irq, int apic, int pin)
+{
+	struct irq_pin_list *entry = irq_2_pin + irq;
+	struct irq_pin_list *pri;
+	struct irq_pin_list *next;
+
+	BUG_ON(irq >= NR_IRQS);
+
+	for (;;) {
+		if (entry->apic == apic && entry->pin == pin) {
+			if (entry->next) {
+				next = irq_2_pin + entry->next;
+				entry->apic = next->apic;
+				entry->pin = next->pin;
+				entry->next = next->next;
+				next->apic = -1;
+				next->pin = -1;
+				next->next = 0;
+			} else {
+				entry->apic = -1;
+				entry->pin = -1;
+			}
+			return;
+		}
+		pri = entry;
+		if (pri->next) 
+			entry = irq_2_pin + pri->next;
+		else
+			break;
+	} 
+
+}
+
 
 #define DO_ACTION(name,R,ACTION, FINAL)	\
 	\
@@ -1570,6 +1610,22 @@ static inline void unlock_ExtINT_logic(void)
  * fanatically on his truly buggy board.
  */
 
+static void set_try_apic_pin(int apic, int pin, int type)
+{
+	int idx;
+	int irq = 0;
+	int bus = 0; /* MP_ISA_BUS */
+	int irqflag = 5; /* MP_IRQ_TRIGGER_EDGE|MP_IRQ_POLARITY_HIGH */
+
+	idx = find_irq_entry(apic,pin,type);
+
+	if (idx == -1) 
+		idx = add_irq_entry(type, irqflag, bus, irq, apic, pin);
+
+	add_pin_to_irq(irq, apic, pin);
+	setup_IO_APIC_irq(apic, pin, idx, irq);
+}
+
 static int try_apic_pin(int apic, int pin, char *msg)
 {
 	apic_printk(APIC_VERBOSE, KERN_INFO
@@ -1588,7 +1644,7 @@ static int try_apic_pin(int apic, int pin, char *msg)
 		}
 		return 1;
 	}
-	clear_IO_APIC_pin(apic, pin);
+
 	apic_printk(APIC_QUIET, KERN_ERR " .. failed\n");
 	return 0;
 }
@@ -1599,6 +1655,7 @@ static void check_timer(void)
 	int apic1, pin1, apic2, pin2;
 	int vector;
 	cpumask_t mask;
+	int i;
 
 	/*
 	 * get/set the timer IRQ vector:
@@ -1621,33 +1678,60 @@ static void check_timer(void)
 	pin2  = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
-	/* Do this first, otherwise we get double interrupts on ATI boards */
-	if ((pin1 != -1) && try_apic_pin(apic1, pin1,"with 8259 IRQ0 disabled"))
-		return;
+	apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
+		vector, apic1, pin1, apic2, pin2);
 
-	/* Now try again with IRQ0 8259A enabled.
-	   Assumes timer is on IO-APIC 0 ?!? */
-	enable_8259A_irq(0);
-	unmask_IO_APIC_irq(0);
-	if (try_apic_pin(apic1, pin1, "with 8259 IRQ0 enabled"))
-		return;
-	disable_8259A_irq(0);
+	if (pin1 

Re: 2.6.19 and up to 2.6.20-rc2 Ethernet problems x86_64

2007-01-02 Thread Len Brown

> ..same problem with 2.6.20-rc3. Last worked with 
> 2.6.19-rc6-git12, so it was 2.6.19 where it failed.


> Attaching both case1 normal, case2 acpi=noirq. With acpi=noirq ethernet 
> doesn't get configured, route -n says it's an Unsupported operation, 
> ifconfig only shows for localhost, ifconfig eth0 192.168.10.5 also 
> complains of a config error.

It seems that the "acpi=noirq" (and probably also the acpi=off) case
is simply an additional broken case, not a success case to compare to.

The thing we really want to compare is dmesg and /proc/interrupts
from 2.6.19-rc6-git12, and the broken current release.
Perhaps you can put that info in the bug report when you file it.

thanks,
-Len



Re: Section mismatch on current git head

2007-01-02 Thread Vivek Goyal
On Sat, Dec 23, 2006 at 12:40:27AM -0500, Len Brown wrote:
> 
> > WARNING: vmlinux - Section mismatch: reference to
> > .init.data:acpi_sci_flags from .text between 'acpi_sci_ioapic_setup' (at
> > offset 0xc010e020) and 'acpi_unmap_lsapic'
> > WARNING: vmlinux - Section mismatch: reference to
> > .init.data:acpi_sci_flags from .text between 'acpi_sci_ioapic_setup' (at
> > offset 0xc010e04a) and 'acpi_unmap_lsapic'
> > WARNING: vmlinux - Section mismatch: reference to
> > .init.text:mp_override_legacy_irq from .text between
> > 'acpi_sci_ioapic_setup' (at offset 0xc010e062) and 'acpi_unmap_lsapic'
> > WARNING: vmlinux - Section mismatch: reference to
> > .init.data:acpi_sci_override_gsi from .text between
> > 'acpi_sci_ioapic_setup' (at offset 0xc010e068) and 'acpi_unmap_lsapic'
> 
> The acpi_sci_ioapic_setup ones should go away with the patch below, but do no 
> harm in the mean-time.
> cheers,
> -Len
> 
> commit 0351a612f7a46995c28d4ef6189229b5d1dfc6c3
> Author: Len Brown <[EMAIL PROTECTED]>
> Date:   Thu Dec 21 01:29:59 2006 -0500
> 
> ACPI: fix section mis-match build warning
> 
> Dunno why this pops out in only in the allmodconfig build.
> Though the warning is accurate, all the callers of the flagged
> non __init function are __init, this is not a functional change.
> 

Hi Len,

These warnings pop up on allmodconfig as CONFIG_RELOCATABLE is set.

This option retains relocation information in the vmlinux file, and MODPOST
goes through these relocations and generates warnings.

Until now, MODPOST never complained about vmlinux section mismatches because
it could not detect them. With CONFIG_RELOCATABLE=y, these mismatches
become visible to MODPOST.
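
For readers unfamiliar with the warning itself: it means code that stays
resident references something placed in an init section, which the kernel
frees after boot. Below is a minimal userspace sketch of the pattern, assuming
GCC or Clang on an ELF target. The DEMO_ macros and function names are made up
for illustration and only mimic the kernel's __init/__initdata annotations.

```c
/* Stand-ins for the kernel's __init and __initdata (illustrative only). */
#define DEMO_INIT     __attribute__((section(".init.text")))
#define DEMO_INITDATA __attribute__((section(".init.data")))

/* Lives in .init.data: in the kernel this memory is discarded after boot. */
static int demo_sci_flags DEMO_INITDATA = 3;

/* Fine: init code reading init data. */
int DEMO_INIT demo_early_setup(void)
{
	return demo_sci_flags;
}

/*
 * The pattern MODPOST warns about: a function in plain .text referencing
 * .init.data.  Harmless here, but in the kernel it is only safe if every
 * caller runs before the init sections are freed.
 */
int demo_late_user(void)
{
	return demo_sci_flags + 1;
}
```

In userspace nothing is freed, so both functions work; the point is only to
show which kind of cross-section reference the relocation scan reports.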

Thanks
Vivek


Re: Section mismatch on current git head

2007-01-02 Thread Vivek Goyal
On Fri, Dec 22, 2006 at 09:47:10AM -0800, Randy Dunlap wrote:
> On Fri, 22 Dec 2006 15:24:41 +0100 Thomas Meyer wrote:
> 
> > Hello.
> >
> > What kind of problem is this:
> > 1.) the one that should be fixed, but also can be ignored or
> > 2.) the one that have to be fixed and ignorance is a bad idea?
> >
> 
> Is this with CONFIG_RELOCATABLE=y?
> There were some patches posted to address section mismatches
> with that config option.  I suppose that they will be in the
> next -mm release (?), so this needs to be retested with
> those patches applied.
> 

Hi Randy,

I have posted some patches. But there are lots of warnings and I am
still working through the rest. Already posted patches are available
in rc2-mm1.

Any help from respective subsystem maintainers is appreciated. :-)

These problems are already present. CONFIG_RELOCATABLE=y just makes
them visible to MODPOST, as relocation information is retained in
vmlinux if CONFIG_RELOCATABLE=y.

Thanks
Vivek


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Willy Tarreau
On Wed, Jan 03, 2007 at 03:12:13AM +0100, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > > 
> > > - Select a different x86 CPU in the config.
> > >   -   Unfortunately the C3-2 flags seem to simply tell GCC
> > >   to schedule for ppro (like i686) and enabled MMX and SSE
> > >   -   Probably useless
> > 
> > Actually, try this one. Try using something that doesn't like "cmov". 
> > Maybe the C3-2 simply has some internal cmov bugginess. 
> 
> That's a good suggestion. Earlier C3s didn't have cmov so it's 
> not entirely unlikely that cmov in C3-2 is broken in some cases.

Agreed! When I developed the cmov emulator, I used an early C3 for the
tests (well, a "Samuel2" to be precise), because it did not report "cmov"
in its flags. I first thought "wow, my emulator is amazingly fast!" because
it took something like 50 cycles to do cmovne %eax,%ebx.

Then I realized that this processor performed cmov itself between
registers, and only triggered the invalid opcode when one of the operands
was a memory reference. And this time, for a hard-coded instruction, it
was really slow...

For this reason, I would not be surprised at all if there were some
buggy behaviour in the cmov right there. Maybe a bug in the decoder unit
making it skip a byte when the next instruction in the prefetch queue is
a cmov affecting the same registers... When vendors can do dirty things such
as executing unsupported instructions, we can expect anything from them.

> Configuring for P5MMX or 486 should be good safe alternatives.

I generally use the P5MMX target for such processors.

> /Mikael

Regards,
Willy



Re: OT Coffee (was Re: Open letter to Linux kernel developers (was Re: Binary Drivers)

2007-01-02 Thread Valdis . Kletnieks
On Tue, 02 Jan 2007 15:01:56 PST, David Schwartz said:
> There is simply no way you can argue that McDonald's failed to warn people
> about the risks. The cup says "hot" on it,

Actually, the "HOT" on the cup and the sticker in the drive-through that
says "Warning: Coffee is served very hot" were added after that lawsuit.




Re: WARNING: Absolute relocations present

2007-01-02 Thread Vivek Goyal
On Fri, Dec 22, 2006 at 03:33:12PM +0100, Thomas Meyer wrote:
> More warnings on current git head:
> 
>  OBJCOPY arch/i386/boot/compressed/vmlinux.bin
>  RELOCS  arch/i386/boot/compressed/vmlinux.relocs
> WARNING: Absolute relocations present
> Offset Info Type Sym.Value Sym.Name
> c0107bd7 00636601   R_386_32 c034f000  __smp_alt_instructions
> c0107bff 00622301   R_386_32 c034f000  __smp_alt_instructions_end
> c0107c68 00622301   R_386_32 c034f000  __smp_alt_instructions_end
> c0107c6d 00636601   R_386_32 c034f000  __smp_alt_instructions
> c01365aa 004aba01   R_386_32 c030ef3c  __stop___ksymtab_gpl_future
> c01365af 0053a101   R_386_32 c030ef3c  __start___ksymtab_gpl_future
> c01365e6 0053a101   R_386_32 c030ef3c  __start___ksymtab_gpl_future
> c01365ed 004aad01   R_386_32 c0311d38  __start___kcrctab_gpl_future
> c01365f4 00486d01   R_386_32 c030ef3c  __stop___ksymtab_unused
> c01365f9 004b6601   R_386_32 c030ef3c  __start___ksymtab_unused
> c0136614 004b6601   R_386_32 c030ef3c  __start___ksymtab_unused
> c013661b 004c4d01   R_386_32 c0311d38  __start___kcrctab_unused
> and so on...
> 
> Should i ignore these warnings, too?
> 

Hi Thomas,

What's your ld version? I don't remember exactly which, but some particular
versions of ld have this problem. These versions do some optimizations:
if a section's size is zero, the linker gets rid of that section, and
any symbol defined relative to the removed section is made absolute
instead of section relative. That's why you see the above warnings.

I had raised this issue on binutils mailing list and they fixed it.

http://sourceware.org/ml/binutils/2006-09/msg00305.html

I am using the following ld version and it works fine for me:

GNU ld version 2.17.50.0.6-2.el5 20061020
 
So you will have to move to the latest ld version, and the problem should be
resolved.

> I have to ignore a lot of warnings on the current linux tree...
> 

These warnings will not impact booting of your kernel as long as you
are running the kernel from its compiled address. It will run into
issues only if this kernel is loaded and run from an arbitrary address.

I think as of today, only the kexec bootloader has been modified to load
the bzImage at some other arbitrary address. Grub and lilo will still
load it at 1MB so it should work fine.

Thanks
Vivek


Re: Open letter to Linux kernel developers (was Re: Binary Drivers)

2007-01-02 Thread Valdis . Kletnieks
On Tue, 02 Jan 2007 12:14:54 PST, David Schwartz said:
> 
> > The recommendet _serving_ temperature for coffe is 55 °C or below.
> 
> Nonsense! 55C (100F) is ludicrously low for coffee.
> 
> 70C (125F) is the *minimum* recommended serving temperature. 165-190F is the

100F == 37C
125F == 52C

55C == 131F
70C == 158F

Yes, 100F *is* ludicrously low for coffee.  :)




Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Benjamin Herrenschmidt

> In fact, the 'ok' prompt is an ENORMOUS pain in the ass to support
> on machines with USB keyboards, because sharing the USB host
> controller is beyond non-trivial.  I've never implemented support
> for that on sparc64 and I frankly have no desire to do the work
> necessary to support that.  It simply is not worth it.

I was wondering about that  :-)

Device sharing with a "live" OF is just an absolute pain in the ass and
I'm actually pretty happy not to have to do it. Segher and I, and more
recently, paulus and I, have been discussing ways to deal with it
or make a version of SLOF (segher's pet OF implementation) that could
stay alive but the more I think about the burden, the more I'm inclined
to just give up...

There has been a recurring need for system vendors to provide
"firmware" type code to be called from the OS or to co-exist with the OS
though. Whether that's a good idea or not, or whether those vendor
justifications for that are valid or not, is of course debatable ;-)

Some of those attempts resulted in horrors ranging from SMM BIOSes to
RTAS, via ACPI AML stuff or even worse, Apple's platform function
"scripts" in the device-tree.

I think every single one of these approaches has proven that it actually
causes more problems than it solves.

Most of the time, the goal is to actually ease the OS work by
abstracting dodgy motherboard bits, like all the random IOs to toggle to
enable clocks on a given bus for power management etc... But every time,
it's been abused as a way to hide implementation details or hardware
specs, and every time, bugs in those "firmware" provided blobs have
proven more damaging than having clear documentation for the HW and
proper driver code to deal with it.
 
Ben.



Re: [RFC] Heads up on a series of AIO patchsets

2007-01-02 Thread Suparna Bhattacharya
On Tue, Jan 02, 2007 at 03:56:09PM -0800, Zach Brown wrote:
> Sorry for the delay, I'm finally back from the holiday break :)

Welcome back !

> 
> >(1) The filesystem AIO patchset, attempts to address one part of
> >the problem
> >which is to make regular file IO, (without O_DIRECT)
> >asynchronous (mainly
> >the case of reads of uncached or partially cached files, and
> >O_SYNC writes).
> 
> One of the properties of the currently implemented EIOCBRETRY aio
> path is that ->mm is the only field in current which matches the
> submitting task_struct while inside the retry path.

Yes, and that, as I guess you know, is to enable the aio worker thread to
operate on the caller's address space for copy_from/to_user.

The actual io setup and associated checks are expected to have been
handled at submission time.

> 
> It looks like a retry-based aio write path would be broken because of
> this.  generic_write_checks() could run in the aio thread and get its
> task_struct instead of that of the submitter.  The wrong rlimit will
> be tested and SIGXFSZ won't be raised.  remove_suid() could check the
> capabilities of the aio thread instead of those of the submitter.

generic_write_checks() are done in the submission path, not repeated during
retries, so such types of checks are not intended to run in the aio thread.

Did I miss something here ?

> 
> I don't think EIOCBRETRY is the way to go because of this increased
> (and subtle!) complexity.  What are the chances that we would have
> ever found those bugs outside code review?  How do we make sure that
> current references don't sneak back in after having initially audited
> the paths?

The EIOCBRETRY route is not something that is intended to be used blindly.
It is just one alternative to implement an aio operation by splitting up
responsibility between the submitter and aio threads, where aio threads
can run in the caller's address space.

> 
> Take the io_cmd_epoll_wait patch..
> 
> >issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach
> >Brown with
> >modifications from Jeff Moyer and me) addresses this problem
> >for native
> >linux aio in a simple manner.
> 
> It's simple looking, sure.  This current flipping didn't even occur
> to me while throwing the patch together!
> 
> But that patch ends up calling ->poll (and poll_table->qproc) and
> writing to userspace (so potentially calling ->nopage) from the aio

Yes of course, but why is that a problem ?
The copy_from/to_user/put_user constructs are designed to handle soft failures,
and we are already using the caller's ->mm. Do you see a need for any
additional asserts() ?

If there is something that is needed by ->nopage etc which is not abstracted
out within the ->mm, then we would need to fix that instead, for correctness
anyway, isn't that so ?

Now it is possible that there are minor blocking points in the code and the
effect of these would be to hold up / delay subsequent queued aio operations;
which is an efficiency issue, but not a correctness concern.

> threads.  Are we sure that none of them will behave surprisingly
> because current changed under them?

My take is that we should fix the problems that we see. It is likely that
what manifests relatively more easily with AIO is also a subtle problem
in other cases.

> 
> It might be safe now, but that isn't really the point.  I'd rather we
> didn't have yet one more subtle invariant to audit and maintain.
> 
> At the risk of making myself vulnerable to the charge of mentioning
> vapourware, I will admit that I've been working on a (slightly mad)
> implementation of async syscalls.  I've been quiet about it because I
> don't want to whip up complicated discussion without being able to
> show code that works, even if barely.  I mention it now only to make
> it clear that I want to be constructive, not just critical :).

That is great and I look forward to it :) I am, however, assuming that
whatever implementation you come up with will have a different interface
from current linux aio -- i.e. a next generation aio model, that will be
easy to integrate with kevents etc.

Which takes me back to Ingo's point - let's have the new evolve in parallel
with the old, if we can, and not hold up the patches for POSIX AIO to
start using kernel AIO, or for epoll to integrate with AIO.

OK, I just took a quick look at your blog and I see that you
are basically implementing Linus' microthreads scheduling approach -
one year since we had that discussion. Glad to see that you found a way
to make it workable ... (I'm guessing that you are copying over the part
of the stack in use at the time of every switch, is that correct ? At what
point do you do the allocation of the saved stacks ? Sorry I should hold
off all these questions till your patch comes out)

Regards
Suparna

> 
> - z

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India


Re: Contents of core dumps

2007-01-02 Thread David Miller
From: Daniel Jacobowitz <[EMAIL PROTECTED]>
Date: Tue, 2 Jan 2007 21:02:28 -0500

> On Thu, Apr 06, 2006 at 10:18:07PM -0700, David S. Miller wrote:
> > How about something like the following patch?  If it's executable
> > and not written to, skip it.  This would skip the main executable
> > image and all text segments of the shared libraries mapped in.
> 
> I've been going through GDB test failures (... again...) and I'm down
> to a respectably small number on x86_64, but this is one of the
> remaining ones.  I don't suppose there's been any change since we
> discussed this in April?

Not to my knowledge.

> Does Linux need knobs for this?

I don't think so.

The current behavior is very non-intuitive.

As a person who hacks gdb, the kernel, and the interactions between
them extensively, it took even me quite a while to track down this
problem.

Imagine some less skilled person trying to analyze a core dump
expecting the necessary information to be there and being unable
to figure out why?

So I'd say we should just put this change in, as-is.  It fixes bugs,
and in all the time that has passed since my initial posting there
has not been any serious dissent.
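
The rule from the April patch ("if it's executable and not written to,
skip it") can be sketched in a few lines of plain C.  This is an
illustration only, not the patch itself: SEG_EXEC, SEG_WRITTEN and
should_dump() are names invented here, and real code would look at the
kernel's own per-mapping state instead of these made-up flag bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are made up for this sketch. */
#define SEG_EXEC	0x1u	/* mapping is executable */
#define SEG_WRITTEN	0x2u	/* mapping has been written to since it was mapped */

/* Decide whether a mapping's contents belong in the core file. */
static bool should_dump(unsigned int flags)
{
	/* Only the two bits defined above are meaningful in this model. */
	assert((flags & ~(SEG_EXEC | SEG_WRITTEN)) == 0);

	/* Executable and never written: identical to the file on disk, so skip it. */
	if ((flags & SEG_EXEC) && !(flags & SEG_WRITTEN))
		return false;

	return true;
}
```

Under this model an untouched text segment is skipped, while an
executable mapping that has been written to is still dumped.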


Re: CONFIG_PHYSICAL_ALIGN limited to 4M?

2007-01-02 Thread Vivek Goyal
On Tue, Jan 02, 2007 at 12:05:18PM +0100, Rene Herman wrote:
> Good day.
> 
> A while ago it was remarked on list here that keeping the kernel 4M
> aligned physically might be a performance win if the added 1M (it
> normally loads at 1M) meant it would fit on one 4M aligned hugepage
> instead of 2 and since that time I've been doing such.
> 
> In fact, while I was at it, I ran the kernel at 16M; while admittedly a
> bit of a non-issue, having never experienced ZONE_DMA shortage, I am an
> ISA user on a >16M machine so this seemed to make sense -- no kernel
> eating up "precious" ISA-DMAable memory.
> 
> Recently CONFIG_PHYSICAL_START was replaced by CONFIG_PHYSICAL_ALIGN
> (commit e69f202d0a1419219198566e1c22218a5c71a9a6) and while 4M alignment
> is still possible, that's also the strictest alignment allowed meaning I
> can't load my (non-relocatable) kernel at 16M anymore.
> 
> If I just apply the following and set it to 16M, things seem to be
> working for me. Was there an important reason to limit the alignment to
> 4M, and if so, even on non relocatable kernels?

Hi Rene,

Can't think of any reason why we can't set the alignment upper limit to
16M. At that time I had kept 4M as the upper limit as that seemed to be the
only practical usage.

Recently I have restored the CONFIG_PHYSICAL_START option. That patch
is still in -mm. IMHO, your case fits better if we set
CONFIG_PHYSICAL_START to 16M rather than increasing the alignment upper limit
for CONFIG_PHYSICAL_ALIGN.
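
As a rough illustration of the alignment arithmetic being discussed (a
sketch only, not the kernel's code: align_up() is a name made up here,
and it assumes the start address is simply the usual 1M load address
rounded up to the configured alignment):

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align; align must be a power of two. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

With these assumptions, align_up(0x100000, 0x400000) gives 0x400000 (4M)
and align_up(0x100000, 0x1000000) gives 0x1000000 (16M), the address
Rene wants to run at.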

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm1/broken-out/i386-restore-config_physical_start-option.patch

Andrew, Can you please push this patch to 2.6.20-rc3?

Thanks
Vivek


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread David Miller
From: Jan Engelhardt <[EMAIL PROTECTED]>
Date: Wed, 3 Jan 2007 02:13:39 +0100 (MET)

> 
> On Jan 3 2007 01:52, Segher Boessenkool wrote:
> >> > Leaving aside the issue of in-memory or not, I don't think
> >> > it is realistic to think any completely common implementation
> >> > will work for this -- it might for current SPARC+PowerPC+OLPC,
> >> > but more stuff will be added over time...
> >> 
> >> I see nothing supporting this IMHO bogus claim.
> >
> > Please keep in mind that not all systems want to kill OF
> > as soon as they enter the kernel -- some want to keep it
> > active basically forever (or only remove it when the user
> > asks for it).
> 
> Kill OF? sparc does not want that IMO, how else should I return to
> the 'ok' prompt?

PowerPC kills OF because it really has to, that's one of numerous
reasons that it started sucking the device tree into a kernel copy
early in the bootup and using that for device discovery etc.

To be honest, the 'ok' prompt is of limited value when you have
things like Alt-SysRq and PPC's XMON debugger in the kernel already.

In fact, the 'ok' prompt is an ENORMOUS pain in the ass to support
on machines with USB keyboards, because sharing the USB host
controller is beyond non-trivial.  I've never implemented support
for that on sparc64 and I frankly have no desire to do the work
necessary to support that.  It simply is not worth it.


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread David Miller
From: Segher Boessenkool <[EMAIL PROTECTED]>
Date: Wed, 3 Jan 2007 02:14:34 +0100

> [snipping a bit for now]
> 
> > and then, "fix"
> > that so that it works on x86 :-)
> 
> That works, if the goal is to just add x86/OLPC to the list of
> platforms that have a device tree fs.  I thought the plan was
> to create a single, more generic, OF interface/API in the kernel
> though.

Absolutely.

But let's get the first-order issue solved so the OLPC people
can get the functionality they need.  That's the most practical
approach.



Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread David Miller
From: Segher Boessenkool <[EMAIL PROTECTED]>
Date: Wed, 3 Jan 2007 01:52:06 +0100

> >> Leaving aside the issue of in-memory or not, I don't think
> >> it is realistic to think any completely common implementation
> >> will work for this -- it might for current SPARC+PowerPC+OLPC,
> >> but more stuff will be added over time...
> >
> > I see nothing supporting this IMHO bogus claim.
> 
> Please keep in mind that not all systems want to kill OF
> as soon as they enter the kernel -- some want to keep it
> active basically forever (or only remove it when the user
> asks for it).

That's what we do on sparc32 and sparc64, I of course understand this.


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread David Miller
From: Segher Boessenkool <[EMAIL PROTECTED]>
Date: Wed, 3 Jan 2007 01:48:02 +0100

> > therefore you can't let multiple CPUs call
> > into OFW at one time.  You must use some kind of locking mechanism,
> > and that locking mechanism is not simple because it has to not just
> > stop the other cpus, it has to be able to stop the other cpus yet
> > still allow them to receive SMP cross-calls from the firmware if the
> > OFW call is 'stop' or similar.
> 
> You don't need to *stop* the other CPUs, you just need to
> prevent them from entering the client interface.  Put a lock
> in front.

That's not the issue.

If the global OFW lock disables interrupts, other cpus trying to get
that lock can't receive CPU cross calls since they are delivered via
interrupts, get it?  That's the issue you need to be careful about.

> > Please let's get over this memory consumption non-issue and move
> > on to more productive talk.
> 
> Okay -- so answer the second part of my concern please: if you keep
> a copy, you need to keep both in sync -- that means every change
> by the kernel has to be done twice, and you won't ever be told about
> changes by the OF, so you have to get a full fresh copy every single
> time you return from an OF client call that could have changed a
> property.

Sure, you need to call OFW when a property is changed, and we have
code to handle that perfectly fine in the sparc of_*() code.

There are mechanisms on the OFW implementations that do hot plugging
to learn about the OFW tree changes.  On some sparc64 platforms,
for example, you have to do a "SUNW,foo-operation" OFW call when a
board is hot-plugged in an Ex000 enterprise box, and after that call
finishes successfully, you know where the new board is in the OFW
tree so you import everything underneath.  You do the opposite for
a hot-unplug sequence.

Every platform is going to handle this differently.

But there is nothing about it that precludes doing an in-kernel
OFW tree.

Even the most brute-force implementation is possible.  When any
hot-plug event occurs, validate the current in-kernel tree.  It's that
easy.

I guess it's good to have someone like you, the dissenter in the
group, but Ben and I will snuff you out completely soon enough :-)))


Re: 2.6.19-rc5 libata PATA ATAPI CDROM SiS 5513 NOT WORKING

2007-01-02 Thread Tejun Heo
Joel Soete wrote:
> Hello Alan, Jeff,
> 
> Having read a paper on this new libata, I wanted to try it, but so far it
> has failed for exactly what this thread describes, "ATAPI CDROM" ;_(.
> 
> I first test the latest stable 2.6.19.1 without luck, so I also want to
> try latest 2.6.20-rc2 unfortunately without more success.

I'm attaching two patches, one against 2.6.19 and the other against
2.6.20-rc3.  Both have about the same effect.  Please apply the one that
matches your kernel and report what happens, along with the full dmesg.

Thanks and happy new year.

-- 
tejun
Index: work/drivers/ata/libata-core.c
===
--- work.orig/drivers/ata/libata-core.c	2007-01-03 12:33:36.0 +0900
+++ work/drivers/ata/libata-core.c	2007-01-03 12:36:28.0 +0900
@@ -59,6 +59,10 @@
 
 #include "libata.h"
 
+enum {
+	ATA_MODE_STRING_MAX	= 16,
+};
+
 /* debounce timing parameters in msecs { interval, duration, timeout } */
 const unsigned long sata_deb_timing_normal[]		= {   5,  100, 2000 };
 const unsigned long sata_deb_timing_hotplug[]		= {  25,  500, 2000 };
@@ -367,6 +371,7 @@ static int ata_xfer_mode2shift(unsigned 
 /**
  *	ata_mode_string - convert xfer_mask to string
  *	@xfer_mask: mask of bits supported; only highest bit counts.
+ *	@buf: buffer of ATA_MODE_STRING_MAX bytes
  *
  *	Determine string which represents the highest speed
  *	(highest bit in @modemask).
@@ -375,10 +380,10 @@ static int ata_xfer_mode2shift(unsigned 
  *	None.
  *
  *	RETURNS:
- *	Constant C string representing highest speed listed in
- *	@mode_mask, or the constant C string "".
+ *	Pointer to @buf which contains C string representing highest
+ *	DMA and PIO speeds listed in @mode_mask.
  */
-static const char *ata_mode_string(unsigned int xfer_mask)
+static const char *ata_mode_string(unsigned int xfer_mask, char *buf)
 {
 	static const char * const xfer_mode_str[] = {
 		"PIO0",
@@ -403,11 +408,24 @@ static const char *ata_mode_string(unsig
 		"UDMA7",
 	};
 	int highbit;
+	const char *str, *pio_str;
+
+	str = pio_str = "";
 
 	highbit = fls(xfer_mask) - 1;
 	if (highbit >= 0 && highbit < ARRAY_SIZE(xfer_mode_str))
-		return xfer_mode_str[highbit];
-	return "";
+		str = xfer_mode_str[highbit];
+
+	highbit = fls(xfer_mask & ATA_MASK_PIO) - 1;
+	if (highbit >= 0 && highbit < ARRAY_SIZE(xfer_mode_str))
+		pio_str = xfer_mode_str[highbit];
+
+	if (str != pio_str)
+		snprintf(buf, ATA_MODE_STRING_MAX, "%s:%s", str, pio_str);
+	else
+		snprintf(buf, ATA_MODE_STRING_MAX, "%s", str);
+
+	return buf;
 }
 
 static const char *sata_spd_string(unsigned int spd)
@@ -1389,7 +1407,7 @@ int ata_dev_configure(struct ata_device 
 {
 	struct ata_port *ap = dev->ap;
 	const u16 *id = dev->id;
-	unsigned int xfer_mask;
+	char xfer_buf[ATA_MODE_STRING_MAX];
 	char revbuf[7];		/* XYZ-99\0 */
 	int rc;
 
@@ -1427,7 +1445,7 @@ int ata_dev_configure(struct ata_device 
 	 */
 
 	/* find max transfer mode; for printk only */
-	xfer_mask = ata_id_xfermask(id);
+	ata_mode_string(ata_id_xfermask(id), xfer_buf);
 
 	if (ata_msg_probe(ap))
 		ata_dump_id(id);
@@ -1463,8 +1481,7 @@ int ata_dev_configure(struct ata_device 
 			if (ata_msg_drv(ap) && print_info)
 				ata_dev_printk(dev, KERN_INFO, "%s, "
 					"max %s, %Lu sectors: %s %s\n",
-					revbuf,
-					ata_mode_string(xfer_mask),
+					revbuf, xfer_buf,
 					(unsigned long long)dev->n_sectors,
 					lba_desc, ncq_desc);
 		} else {
@@ -1486,8 +1503,7 @@ int ata_dev_configure(struct ata_device 
 			if (ata_msg_drv(ap) && print_info)
 				ata_dev_printk(dev, KERN_INFO, "%s, "
 					"max %s, %Lu sectors: CHS %u/%u/%u\n",
-					revbuf,
-					ata_mode_string(xfer_mask),
+					revbuf, xfer_buf,
 					(unsigned long long)dev->n_sectors,
 					dev->cylinders, dev->heads,
 					dev->sectors);
@@ -1526,8 +1542,7 @@ int ata_dev_configure(struct ata_device 
 		/* print device info to dmesg */
 		if (ata_msg_drv(ap) && print_info)
 			ata_dev_printk(dev, KERN_INFO, "ATAPI, max %s%s\n",
-				       ata_mode_string(xfer_mask),
-				       cdb_intr_string);
+				       xfer_buf, cdb_intr_string);
 	}
 
 	if (dev->horkage & ATA_HORKAGE_DIAGNOSTIC) {
@@ -2121,6 +2136,7 @@ int ata_timing_compute(struct ata_device
 int ata_down_xfermask_limit(struct ata_device *dev, int force_pio0)
 {
 	unsigned long xfer_mask;
+	char xfer_buf[ATA_MODE_STRING_MAX];
 	int highbit;
 
 	xfer_mask = ata_pack_xfermask(dev->pio_mask, dev->mwdma_mask,
@@ -2143,7 +2159,7 @@ int ata_down_xfermask_limit(struct ata_d
 			    &dev->udma_mask);
 
 	ata_dev_printk(dev, KERN_WARNING, "limiting speed to %s\n",
-		   ata_mode_string(xfer_mask));
+		   ata_mode_string(xfer_mask, xfer_buf));
 
 	return 0;
 
@@ -2154,6 +2170,7 @@ int ata_down_xfermask_limit(struct ata_d
 static int ata_dev_set_mode(struct ata_device *dev)
 {
 	unsigned int err_mask;
+	char xfer_buf[ATA_MODE_STRING_MAX];
 	int rc;
 
 	dev->flags &= ~ATA_DFLAG_PIO;
@@ -2174,8 +2191,10 @@ static int ata_dev_set_mode(struct ata_d
 	DPRINTK("xfer_shift=%u, xfer_mode=0x%x\n",
 		dev->xfer_shift, 

[PATCH] i386 kernel instant reboot with older binutils fix

2007-01-02 Thread Vivek Goyal



o i386 kernel reboots instantly if compiled with binutils older than
  2.6.15.

o Older binutils required explicit flags to mark a section allocatable
  and executable(AX). Newer binutils automatically mark a section AX if
  the name starts with .text.

o While defining a new section using assembler "section" directive,
  explicitly mention section flags. 

Signed-off-by: Segher Boessenkool <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/boot/compressed/head.S |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/i386/boot/compressed/head.S~jean-reboot-issue-fix arch/i386/boot/compressed/head.S
--- linux-2.6.20-rc2-reloc/arch/i386/boot/compressed/head.S~jean-reboot-issue-fix	2007-01-02 09:54:56.0 +0530
+++ linux-2.6.20-rc2-reloc-root/arch/i386/boot/compressed/head.S	2007-01-02 09:57:46.0 +0530
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-.section ".text.head"
+.section ".text.head","ax",@progbits
 	.globl startup_32
 
 startup_32:
_


Re: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0) r0xj0

2007-01-02 Thread Tejun Heo
bbee wrote:
>> Yeap, I have major issues with SDB FISes which contains spurious
>> completions but most other spurious interrupts shouldn't be dangerous
>> and I haven't seen spurious completions for quite some time, so I was
>> thinking either removing the message or printing it only on SDB FIS
>> containing spurious completions.
>>
>> But, Andrew Lyon *is* reporting spurious completions.  Now I just wanna
>> update those printks such that more info is reported only on spurious
>> SDB FISes.
> 
> That would certainly help verify that I'm having the exact same problem,
> since Andrew didn't say anything about his drive going offline.

Okay.

[--snip--]
> I reverted the patch and am waiting for the exception while running
> "stress --io 2 --hdd 2". By past experience, it could take a while; I am
> already seeing the spurious interrupt messages though.

Thanks, please keep me posted.

>> Can you post the results of 'dmesg' and 'hdparm -I /dev/sdX'?
> 
> Follows at end of message (md init snipped from dmesg for brevity).
> 
>> Yeap, I'm definitely interested in resolving this problem.  It's not
>> likely but possible that the *controller* is responsible for spurious
>> interrupts.
> 
> Unfortunately I don't have any other model of SATA drive to test it
> with, but Andrew by his dmesg seems to be using a different brand of drive.
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: ATA-7, max UDMA/100, 586072368 sectors: LBA48 NCQ (depth 31/32)
> ata1.00: ata1: dev 0 multi count 16
> ata1.00: configured for UDMA/100
> scsi 0:0:0:0: Direct-Access ATA  Maxtor 6V300F0   VA11 PQ: 0 ANSI: 5

Yeah, a different drive.  I'll ask around.  Thanks.

-- 
tejun


Re: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0) r0xj0

2007-01-02 Thread bbee

On Wed, 3 Jan 2007, Tejun Heo wrote:

> bbee wrote:
>> Tejun Heo  gmail.com> writes:
>>> Andrew Lyon wrote:
>>>> ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
>>>>
>>>> Is this condition dangerous?
>>>
>>> Not usually.  Might indicate something is going wrong in some really
>>> rare cases.  I think vendors are getting NCQ right these days.  Maybe
>>> it's time to remove that printk.
>>
>> Hi Tejun, it's funny you should say that, because in the subthread at
>> http://thread.gmane.org/gmane.linux.ide/10264/focus=10334
>> you seemed to have major issues with this very error and were saying there
>> could even be data corruption.
>
> Yeap, I have major issues with SDB FISes which contains spurious
> completions but most other spurious interrupts shouldn't be dangerous
> and I haven't seen spurious completions for quite some time, so I was
> thinking either removing the message or printing it only on SDB FIS
> containing spurious completions.
>
> But, Andrew Lyon *is* reporting spurious completions.  Now I just wanna
> update those printks such that more info is reported only on spurious
> SDB FISes.

That would certainly help verify that I'm having the exact same problem,
since Andrew didn't say anything about his drive going offline.

>> However, in my case it gets a lot worse. The following happens infrequently,
>> usually within 15 days of uptime on a light I/O load:
>>
>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> [---snip---]
>> ata1.00: detaching (SCSI 0:0:0:0)
>> scsi 0:0:0:0: rejecting I/O to dead device
>>
>> The drive then disappears from the system. This is not preceded by any
>> spurious interrupt messages, but I have a hunch it is related because
>> following your grave comments in the referenced thread, I looked for a kernel
>> option to disable NCQ. Astonished to find none, I changed the source using the
>> flag you added in this patch:
>
> Yeah, it usually indicates lousy NCQ implementation on drive's side.  I
> can't tell whether the drive going offline is directly related tho.

Neither can I, but it has definitely stopped since I disabled NCQ.

>> With NCQ disabled, the spurious interrupt messages as well as the exceptions
>> go away.
>
> Hmmm... How certain are you about disabling NCQ fixing the problem?  Are
> other conditions controlled?

Well, it's not a lab environment ("production" PVR box), and I can't be
sure what conditions to control since the exception occurs unpredictably
(which is why I suspected noise issues). The spurious interrupts were more
frequent, 5-6 a day.

But I did only start using the SATA chip after the "major libata update"
the first thread was about so I can't say anything about stability with the
earlier ahci code.

> How many times did you verify the fix?

I can't perfectly verify the fix since I don't have a test case.
However by my syslog history in the past 3.5 months the system never went
above 15 days of uptime before the exception occurred, it's now been up for
24. The spurious interrupts are completely gone.

> If you undo the change and leave everything else the same, does the
> exception come back?

I reverted the patch and am waiting for the exception while running "stress
--io 2 --hdd 2". By past experience, it could take a while; I am already
seeing the spurious interrupt messages though.

> Can you post the results of 'dmesg' and 'hdparm -I /dev/sdX'?

Follows at end of message (md init snipped from dmesg for brevity).

> Yeap, I'm definitely interested in resolving this problem.  It's not
> likely but possible that the *controller* is responsible for spurious
> interrupts.

Unfortunately I don't have any other model of SATA drive to test it with,
but Andrew by his dmesg seems to be using a different brand of drive.



dmesg :

Linux version 2.6.19-hardened-r3 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #2 PREEMPT Wed Jan 3 03:45:15 CET 2007
Command line: root=/dev/md4
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e8000 - 0010 (reserved)
 BIOS-e820: 0010 - 7ffb (usable)
 BIOS-e820: 7ffb - 7ffc (ACPI data)
 BIOS-e820: 7ffc - 7fff (ACPI NVS)
 BIOS-e820: 7fff - 8000 (reserved)
 BIOS-e820: ff7c - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524208) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP (v000 ACPIAM) @ 0x000f9a20
ACPI: RSDT (v001 A M I  OEMRSDT  0x05000629 MSFT 0x0097) @ 0x7ffb
ACPI: FADT (v002 A M I  OEMFACP  0x05000629 MSFT 0x0097) @ 0x7ffb0200
ACPI: MADT (v001 A M I  OEMAPIC  0x05000629 MSFT 0x0097) @ 0x7ffb0390
ACPI: MCFG (v001 A M I  OEMMCFG  0x05000629 MSFT 0x0097) @ 0x7ffb0400
ACPI: OEMB (v001 A M I  AMI_OEM  0x05000629 MSFT 0x0097) @ 

Re: ACPI: EC: evaluating _Q10

2007-01-02 Thread Len Brown
> > The bigger question is why you get "tons of these" --
> > as EC  events are usually infrequent.
> > Do you have a big number next to "acpi" in /proc/interrupts?
> > If so, at what rate is it growing?
> 
> maybe "tons" was a bit overstated... After a fresh reboot, I count 110
> _q10 and one _q21 messages now, with 8 min. uptime and around 10300 acpi
> interrupts.

480 sec/110 ec events = 4 seconds/event.  This doesn't worry me.
Could be battery updates, thermal updates etc.

480/10300 = an interrupt every 46 ms.
This is certainly not right.
Have you always seen runaway acpi interrupts on this box, no matter the kernel?
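
For anyone who wants to repeat the arithmetic above on their own box,
here is a minimal sketch.  avg_interval_ms() is just a name made up
for this example; the inputs are the uptime in seconds and an event
count read off dmesg or /proc/interrupts by hand.

```c
#include <assert.h>

/* Average gap between events, in milliseconds, over an uptime given in seconds. */
static double avg_interval_ms(double uptime_sec, unsigned long events)
{
	assert(events > 0);	/* avoid dividing by zero */
	return uptime_sec * 1000.0 / (double)events;
}
```

With the numbers from the report, avg_interval_ms(480, 110) comes to
roughly 4364 ms per EC event and avg_interval_ms(480, 10300) to roughly
46.6 ms per interrupt.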

thanks,
-Len



Re: [patch 2.6.19-git] rts-rs5c372 updates: more chips, alarm, 12hr mode, etc

2007-01-02 Thread David Brownell
Hi Voipio,

Yes, the patch you sent (switching to "method 3" to work around the
evident bug in the i2c-ixp3xx driver) works on the platform I was
using too (after unrelated tweaks).

Here's an updated patch, using "method 3".  If it still behaves
for you, it'd seem ready to merge...

- Dave


CUT HERE
Update the rtc-rs5c372 driver:

 Bugfixes:
  - Handle RTCs which are configured to use 12-hour mode.
  - Never report bogus/un-initialized times.
  - Displaying "raw trim" requires not masking it first!
  - Fix the sysfs and procfs display of crystal and trim data.

 Features:
  - Handle other RTCs in this family, notably rv5c386/rv5c387.
  - Declare the other registers.
  - Provide alarm get/set functionality.
  - Handle AIE and UIE; but no IRQ handling yet.
  - Warn if the clock needs to be set.

 Cleanup:
  - Shrink object by not including needless sysfs or procfs support
  - We don't need no steenkin' forward declarations.  (Except one.)

Until the I2C framework merges "new style" driver support, matching
the driver model better, using rv5c chips or alarm IRQs requires a
separate board-specific patch.  (And an IRQ handler, handing off labor
through a work_struct...)

This uses the "method 3" register reads, but notes that it's done
to work around an evident i2c adapter driver bug, and the curious
issue where the chip behavior disagrees with the chip docs.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
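
For readers unfamiliar with the 12-hour issue listed under "Bugfixes",
here is a minimal userspace sketch of how an hours register might be
decoded.  It is not taken from the patch: bcd2bin(), reg_to_hour() and
HOUR_PM_FLAG are names invented for illustration, and the layout (BCD
hours, with bit 5 acting as the PM flag in 12-hour mode) is an
assumption that should be checked against the chip documentation.

```c
#include <assert.h>

/* Assumed layout: in 12-hour mode, bit 5 of the hours register marks PM. */
#define HOUR_PM_FLAG	0x20

/* Convert a packed BCD byte (two decimal digits) to binary. */
static unsigned int bcd2bin(unsigned int bcd)
{
	assert((bcd & 0x0f) <= 9);	/* low nibble must be a decimal digit */
	return (bcd >> 4) * 10 + (bcd & 0x0f);
}

/* Decode an hours register value into 0..23. */
static unsigned int reg_to_hour(unsigned int reg, int time24)
{
	unsigned int hour;

	if (time24)
		return bcd2bin(reg & 0x3f);

	hour = bcd2bin(reg & 0x1f);	/* 1..12 in 12-hour mode */
	if (hour == 12)
		hour = 0;		/* 12 AM is hour 0, 12 PM is hour 12 */
	if (reg & HOUR_PM_FLAG)
		hour += 12;
	return hour;
}
```

Under these assumptions a register value of 0x27 read in 12-hour mode
decodes to 19 (7 PM), and 0x12 decodes to 0 (12 AM).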

Index: g26/drivers/rtc/rtc-rs5c372.c
===
--- g26.orig/drivers/rtc/rtc-rs5c372.c  2006-12-27 17:19:55.0 -0800
+++ g26/drivers/rtc/rtc-rs5c372.c   2007-01-02 18:44:35.0 -0800
@@ -1,5 +1,5 @@
 /*
- * An I2C driver for the Ricoh RS5C372 RTC
+ * An I2C driver for Ricoh RS5C372 and RV5C38[67] RTCs
  *
  * Copyright (C) 2005 Pavel Mironchik <[EMAIL PROTECTED]>
  * Copyright (C) 2006 Tower Technologies
@@ -13,7 +13,7 @@
 #include 
 #include 
 
-#define DRV_VERSION "0.3"
+#define DRV_VERSION "0.4"
 
 /* Addresses to scan */
 static unsigned short normal_i2c[] = { /* 0x32,*/ I2C_CLIENT_END };
@@ -21,6 +21,13 @@ static unsigned short normal_i2c[] = { /
 /* Insmod parameters */
 I2C_CLIENT_INSMOD;
 
+
+/*
+ * Ricoh has a family of I2C based RTCs, which differ only slightly from
+ * each other.  Differences center on pinout (e.g. how many interrupts,
+ * output clock, etc) and how the control registers are used.  The '372
+ * is significant only because that's the one this driver first supported.
+ */
 #define RS5C372_REG_SECS   0
 #define RS5C372_REG_MINS   1
 #define RS5C372_REG_HOURS  2
@@ -29,59 +36,142 @@ I2C_CLIENT_INSMOD;
 #define RS5C372_REG_MONTH  5
 #define RS5C372_REG_YEAR   6
 #define RS5C372_REG_TRIM   7
+#  define RS5C372_TRIM_XSL	0x80
+#  define RS5C372_TRIM_MASK	0x7F
 
-#define RS5C372_TRIM_XSL   0x80
-#define RS5C372_TRIM_MASK  0x7F
-
-#define RS5C372_REG_BASE   0
-
-static int rs5c372_attach(struct i2c_adapter *adapter);
-static int rs5c372_detach(struct i2c_client *client);
-static int rs5c372_probe(struct i2c_adapter *adapter, int address, int kind);
+#define RS5C_REG_ALARM_A_MIN   8   /* or ALARM_W */
+#define RS5C_REG_ALARM_A_HOURS 9
+#define RS5C_REG_ALARM_A_WDAY  10
+
+#define RS5C_REG_ALARM_B_MIN   11  /* or ALARM_D */
+#define RS5C_REG_ALARM_B_HOURS 12
+#define RS5C_REG_ALARM_B_WDAY  13  /* (ALARM_B only) */
+
+#define RS5C_REG_CTRL1 14
+#  define RS5C_CTRL1_AALE	(1 << 7)	/* or WALE */
+#  define RS5C_CTRL1_BALE	(1 << 6)	/* or DALE */
+#  define RV5C387_CTRL1_24	(1 << 5)
+#  define RS5C372A_CTRL1_SL1	(1 << 5)
+#  define RS5C_CTRL1_CT_MASK	(7 << 0)
+#  define RS5C_CTRL1_CT0	(0 << 0)	/* no periodic irq */
+#  define RS5C_CTRL1_CT4	(4 << 0)	/* 1 Hz level irq */
+#define RS5C_REG_CTRL2 15
+#  define RS5C372_CTRL2_24	(1 << 5)
+#  define RS5C_CTRL2_XSTP	(1 << 4)
+#  define RS5C_CTRL2_CTFG	(1 << 2)
+#  define RS5C_CTRL2_AAFG	(1 << 1)	/* or WAFG */
+#  define RS5C_CTRL2_BAFG	(1 << 0)	/* or DAFG */
+
+
+/* to read (style 1) or write registers starting at R */
+#define RS5C_ADDR(R)   (((R) << 4) | 0)
+
+
+enum rtc_type {
+   rtc_undef = 0,
+   rtc_rs5c372a,
+   rtc_rs5c372b,
+   rtc_rv5c386,
+   rtc_rv5c387a,
+};
 
+/* REVISIT:  this assumes that:
+ *  - we're in the 21st century, so it's safe to ignore the century
+ *bit for rv5c38[67] (REG_MONTH bit 7);
+ *  - we should use ALARM_A not ALARM_B (may be wrong on some boards)
+ */
 struct rs5c372 {
-   u8 reg_addr;
-   u8 regs[17];
-   struct i2c_msg msg[1];
-   struct i2c_client client;
-   struct rtc_device *rtc;
-};
+   struct i2c_client   *client;
+   struct 

Re: [PATCH] video: pvrusb2-hdw kfree cleanup

2007-01-02 Thread Mike Isely
On Tue, 2 Jan 2007, Mariusz Kozlowski wrote:

> Hello, 
> 
> > This patch removes redundant argument check for kfree().
> > 
> >  drivers/media/video/pvrusb2/pvrusb2-hdw.c |   16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
> 
> 

Signed-off-by: Mike Isely <[EMAIL PROTECTED]>

  -Mike

-- 
| Mike Isely  | PGP fingerprint
 Spammers Die!! | | 03 54 43 4D 75 E5 CC 92
|   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
| |


RE: [Patch] scsi: megaraid_{mm,mbox}: init fix for kdump

2007-01-02 Thread Patro, Sumant

Thanks for the review. 
I will resubmit the patch.

Regards,

Sumant

-Original Message-
From: Randy Dunlap [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 29, 2006 1:38 PM
To: Patro, Sumant
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org; Kolli, Neela;
Yang, Bo; Patro, Sumant
Subject: Re: [Patch] scsi: megaraid_{mm,mbox}: init fix for kdump

On Fri, 29 Dec 2006 08:02:17 -0800 Sumant Patro wrote:

See Documentation/SubmittingPatches:
Please include output of "diffstat -p1 -w70" so that we can easily see
the scope of the changes.

and see Documentation/CodingStyle for comments below:


> diff -uprN linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c
> --- linux-2.6.orig/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-28 09:56:04.0 -0800
> +++ linux-2.6.new/drivers/scsi/megaraid/megaraid_mbox.c 2006-12-29 05:31:48.0 -0800
> @@ -779,6 +780,22 @@ megaraid_init_mbox(adapter_t *adapter)
>   goto out_release_regions;
>   }
>  
> + // initialize the mutual exclusion lock for the mailbox
> + spin_lock_init(&raid_dev->mailbox_lock);

Linux uses /*...*/ C89-style comments, not // C99 comments.

> + // allocate memory required for commands
> + if (megaraid_alloc_cmd_packets(adapter) != 0) {
> + goto out_iounmap;
> + }
> +
> + /*
> +  * Issue SYNC cmd to flush the pending cmds in the adapter
> +  * and initialize its internal state
> +  */
> +
> + if (megaraid_mbox_fire_sync_cmd(adapter))
> + con_log(CL_ANN, ("megaraid: sync cmd failed\n"));
> +

>   // Product info
>   if (megaraid_mbox_product_info(adapter) != 0) {
> - goto out_alloc_cmds;
> + goto out_free_irq;

Don't use {} braces around 1-statement "blocks".

> @@ -875,7 +883,7 @@ megaraid_init_mbox(adapter_t *adapter)
>* accessed
>*/
>   if (megaraid_sysfs_alloc_resources(adapter) != 0) {
> - goto out_alloc_cmds;
> + goto out_free_irq;

Ditto.

>   }
>  
>   // Set the DMA mask to 64-bit. All supported controllers as capable of
> @@ -3380,6 +3388,86 @@ megaraid_mbox_flush_cache(adapter_t *ada
>  
>  
>  /**
> + * megaraid_mbox_fire_sync_cmd - fire the sync cmd
> + * @param adapter: soft state for the controller
> + */
> +static int
> +megaraid_mbox_fire_sync_cmd(adapter_t *adapter) {
> + mbox_t  *mbox;
> + uint8_t raw_mbox[sizeof(mbox_t)];
> + mraid_device_t  *raid_dev = ADAP2RAIDDEV(adapter);
> + mbox64_t *mbox64;
> + uint8_t status = 0;
> + int i;
> + uint32_t dword;
> +
> + mbox = (mbox_t *)raw_mbox;
> +
> + memset((caddr_t)raw_mbox, 0, sizeof(mbox_t));
> +
> + raw_mbox[0] = 0xFF;
> +
> + mbox64  = raid_dev->mbox64;
> + mbox= raid_dev->mbox;
> +
> + /*
> +  * Wait until mailbox is free
> +  */
> + if (megaraid_busywait_mbox(raid_dev) != 0) {
> + status = 1;
> + goto blocked_mailbox;
> + }
> +
> + /*
> +  * Copy mailbox data into host structure
> +  */
> + memcpy((caddr_t)mbox, (caddr_t)raw_mbox, 16);
> + mbox->cmdid = 0xFE;
> + mbox->busy  = 1;
> + mbox->poll  = 0;
> + mbox->ack   = 0;
> + mbox->numstatus = 0;
> + mbox->status= 0;
> +
> + wmb();
> + WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1);
> +
> + // wait for maximum 1 min for status to post.
> + // If the Firmware SUPPORTS the ABOVE COMMAND,
> + // mbox->cmd will be set to 0
> + // else
> + // the firmware will reject the command with
> + // mbox->numstatus set to 1

Don't use // comment style.  Also, for multi-line comments in Linux,
please use this preferred style:

/*
 * This is the preferred style for multi-line
 * comments in the Linux kernel source code.
 * Please use it consistently.
 *
 * Description:  A column of asterisks on the left side,
 * with beginning and ending almost-blank lines.
 */

Thanks.
---
~Randy


snd/core, freeing the device driver when an USB audio device is unplugged

2007-01-02 Thread Jon Smirl

There is a basic problem in the way snd/core is handling the removal
of devices. If I unplug a USB audio device snd/core will end up in:
snd_card_free_when_closed. This isn't good because some desktop app
(don't know which one) keeps the sound device open until it exits.
This is not a good state to be in, the hardware is gone but the device
is still around.

Now if I plug the USB audio device back in, I get a sysfs error for
registering duplicate devices. Because snd_card_free_when_closed is
waiting for the old device to be closed, it never frees the sysfs
device. More looks to be messed up inside of snd/core when in this
state, but this is the obvious symptom.

Things do work properly if I restart gnome (killing whoever has the
card open) after I unplug the device and before I plug it back in
again.

I added a few printk to snd/core/init.c

-- I unplug my USB sound device
usb 2-1: USB disconnect, address 2
-- Inside sound core, I go into snd_card_free_when_closed
snd_card_free_when_closed

-- If I plug the snd device back in, the kernel will complain about a
device being registered twice. That's because gnome (or something in the
desktop) is still holding the device open.

--- Now I restart gnome which closes whatever had the card open
snd_card_file_remove
snd_card_do_free
-- device finishes getting unregistered
unregistering device

using 2.6.20-rc3

--
Jon Smirl
[EMAIL PROTECTED]


RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 6:06 PM
> On Jan 2, 2007, at 5:50 PM, Chen, Kenneth W wrote:
> > Zach Brown wrote on Tuesday, January 02, 2007 5:24 PM
> >>> That is not possible because when multiple tasks waiting for
> >>> events, they
> >>> enter the wait queue in FIFO order, prepare_to_wait_exclusive() does
> >>> __add_wait_queue_tail().  So first io_getevents() with min_nr of 2
> >>> will be woken up when 2 ops completes.
> >>
> >> So switch the order of the two sleepers in the example?
> >
> > Not sure why that would be a problem though:  whoever sleep first will
> > be woken up first.
> 
> Why would the min_nr = 3 sleeper be woken up in that case?  Only 2  
> ios were issued.
> 
> Maybe the app was relying on the min_nr = 2 completion to issue 3  
> more ios for the min_nr = 3 sleeper, who knows.
> 
> Does that clear up the confusion?


Not really. I don't think I understand your concern. You gave an example:

issue 2 ops
first io_getevents sleeps with a min_nr of 2
second io_getevents sleeps with min_nr of 3
2 ops complete but only test the second sleeper's min_nr of 3
first sleeper twiddles thumbs

Or:

issue 2 ops
first io_getevents sleeps with a min_nr of 3
second io_getevents sleeps with min_nr of 2
2 ops complete but only test the second sleeper's min_nr of 2
first sleeper twiddles thumbs


The first scenario doesn't exist because in the new scheme, we test the
first sleeper (as in head of the queue) when 2 ops complete. It wakes up
first.

The 2nd scenario is OK to me because the first sleeper is waiting for 3
events, and there are only 2 ops completed, so it waits.

The one scenario that I can think of that breaks down is that one task
sleeps with min_nr of 100.  Then 50 ops complete.  A 2nd thread comes
along, does an io_getevents, and takes all 50 events.  Is that what you
are talking about?  It doesn't involve two sleepers.  That I can fix by
testing whether the wait queue is active or not at the beginning of the
fast path in read_events().

The bigger question is: what are the semantics of event reap order across
threads? Random, FIFO or round robin?  It is not specified anywhere.
What would be the most optimal policy?
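
To make the two scenarios concrete, here is a small model of "test only the head of the FIFO wait queue" as I read the description above. It is a sketch under that assumption, not the patch itself; demo_sleeper and demo_wake_head are made-up names.

```c
#include <stddef.h>

/* Hypothetical model of one task sleeping in io_getevents(). */
struct demo_sleeper {
	int id;		/* which task this is */
	int min_nr;	/* number of events it asked for */
};

/*
 * Model of "test only the head of the FIFO wait queue":
 * return the id of the sleeper to wake, or -1 if nobody is woken.
 */
static int demo_wake_head(const struct demo_sleeper *queue, size_t nr_sleepers,
			  int completed)
{
	if (nr_sleepers == 0)
		return -1;
	if (completed >= queue[0].min_nr)
		return queue[0].id;
	return -1;
}
```

With sleepers queued as (min_nr 2, min_nr 3) and 2 completions, the model wakes the first sleeper. With the order swapped to (min_nr 3, min_nr 2), it wakes nobody, because only the head is tested.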



Re: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0) r0xj0

2007-01-02 Thread Tejun Heo
[cc'ing linux-ide]

bbee wrote:
> Tejun Heo  gmail.com> writes:
>> Andrew Lyon wrote:
>>> My system is gigabyte ds3 motherboard with onboard SATA JMicron
>>> 20360/20363 AHCI Controller (rev 02), drive connected is WDC
>>> WD740ADFD-00 20.0, I am running 2.6.18.6 32 bit, under heavy i/o I get
>>> the following messages:
>>>
>>> ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
>>> ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
>>>
>>> Is this condition dangerous?
>> Not usually.  Might indicate something is going wrong in some really
>> rare cases.  I think vendors are getting NCQ right these days.  Maybe
>> it's time to remove that printk.
> 
> Hi Tejun, it's funny you should say that, because in the subthread at
> http://thread.gmane.org/gmane.linux.ide/10264/focus=10334
> you seemed to have major issues with this very error and were saying there
> could even be data corruption.

Yeap, I have major issues with SDB FISes which contain spurious
completions but most other spurious interrupts shouldn't be dangerous
and I haven't seen spurious completions for quite some time, so I was
thinking either removing the message or printing it only on SDB FIS
containing spurious completions.

But, Andrew Lyon *is* reporting spurious completions.  Now I just wanna
update those printks such that more info is reported only on spurious
SDB FISes.

> I too have this error, on an Asrock 939Dual-SATA2 board which has the same
> controller. Syslog lines like
> ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xf4)
> every so often.
> 
> However, in my case it gets a lot worse. The following happens infrequently,
> usually within 15 days of uptime on a light I/O load:
> 
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: (irq_stat 0x4800, interface fatal error)
> ata1.00: tag 0 cmd 0xea Emask 0x12 stat 0x37 err 0x0 (ATA bus error)
> ata1: soft resetting port
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: qc timeout (cmd 0xec)
> ata1.00: failed to IDENTIFY (I/O error, err_mask=0x104)
> ata1.00: revalidation failed (errno=-5)
> ata1: failed to recover some devices, retrying in 5 secs
> ata1: hard resetting port
> ata1: SATA link down (SStatus 0 SControl 300)
> ata1: failed to recover some devices, retrying in 5 secs
> ata1: hard resetting port
> ata1: SATA link down (SStatus 0 SControl 300)
> ata1.00: disabled
> ata1: EH complete
> ata1.00: detaching (SCSI 0:0:0:0)
> scsi 0:0:0:0: rejecting I/O to dead device
> 
> The drive then disappears from the system. This is not preceded by any
> spurious interrupt messages, but I have a hunch it is related because
> following your grave comments in the referenced thread, I looked for a kernel
> option to disable NCQ. Astonished to find none, I changed the source using the
> flag you added in this patch:

Yeah, it usually indicates lousy NCQ implementation on drive's side.  I
can't tell whether the drive going offline is directly related tho.

> http://article.gmane.org/gmane.linux.ide/11527
> With NCQ disabled, the spurious interrupt messages as well as the exceptions
> go away.

Hmmm... How certain are you about disabling NCQ fixing the problem?  Are
other conditions controlled?  How many times did you verify the fix?  If
you undo the change and leave everything else the same, does the
exception come back?  Can you post the results of 'dmesg' and 'hdparm -I
/dev/sdX'?

> This has been happening for a few months on a box whose log I'd been neglecting
> and I hadn't even noticed the issue since the drive is part of a md array. The
> drive would get re-detected when I rebooted the box and md would rebuild the
> array.
> 
> Here comes the weird part. When I discovered the problem, I backtracked through
> the syslog to see when the problems started. They started a few months ago when
> I added a DVB card to the system (it is a mythtv box). I noticed in Andrew's
> dmesg that he also has a DVB card.
> 
> Could the DVB subsystem have anything to do with this? I realize the systems
> are completely unrelated..
> Perhaps the JMicron chip has noise issues? These are often triggered by adding
> tuner cards..
> 
> It probably won't make any difference to system performance, but it would be
> nice if we could resolve this so I can re-enable NCQ and stop patching my
> kernels ;)

Yeap, I'm definitely interested in resolving this problem.  It's not
likely but possible that the *controller* is responsible for spurious
interrupts.

Thanks.

-- 
tejun


[PATCH] fix sparse warnings from {asm,net}/checksum.h

2007-01-02 Thread Tilman Schmidt
Rename the variable "sum" in the __range_ok macros to avoid name
collisions causing lots of "symbol shadows an earlier one" warnings
by sparse.

Signed-off-by: Tilman Schmidt <[EMAIL PROTECTED]>
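
For illustration only, here is a portable-C sketch of why the rename helps. The macro declares its own scratch variable, and if that variable were called "sum" it would shadow the "sum" of any caller that has one, such as the checksum helpers. DEMO_RANGE_OK and demo_csum_step are hypothetical names, and the check is written in plain C rather than the inline asm in the real headers.

```c
/*
 * Hypothetical range check in the style of __range_ok, written in
 * portable C instead of inline asm.  The scratch variable is named
 * roksum so that it cannot shadow a caller's own "sum".
 */
#define DEMO_RANGE_OK(addr, size, limit, ok) do {			\
	unsigned long roksum = (unsigned long)(addr) + (size);		\
	(ok) = roksum >= (unsigned long)(addr) && roksum <= (limit);	\
} while (0)

/* A checksum-like helper that has its own variable called "sum". */
static unsigned int demo_csum_step(unsigned long addr, unsigned long len,
				   unsigned int sum)
{
	int ok;

	DEMO_RANGE_OK(addr, len, 4096UL, ok);
	if (ok)
		sum += (unsigned int)len;
	return sum;
}
```

The helper leaves sum untouched when the range wraps around or exceeds the (arbitrary) limit of 4096 used in this sketch.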

---

 asm-arm/uaccess.h   |4 ++--
 asm-arm26/uaccess-asm.h |4 ++--
 asm-i386/uaccess.h  |4 ++--
 asm-m32r/uaccess.h  |4 ++--
 asm-x86_64/uaccess.h|4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

diff -pru linux-2.6.20-rc3-work/include/asm-arm/uaccess.h linux-2.6.20-rc3-new/include/asm-arm/uaccess.h
--- linux-2.6.20-rc3-work/include/asm-arm/uaccess.h 2006-11-29 22:57:37.0 +0100
+++ linux-2.6.20-rc3-new/include/asm-arm/uaccess.h  2007-01-02 00:23:34.0 +0100
@@ -76,10 +76,10 @@ static inline void set_fs(mm_segment_t f
 
 /* We use 33-bit arithmetic here... */
 #define __range_ok(addr,size) ({ \
-   unsigned long flag, sum; \
+   unsigned long flag, roksum; \
__chk_user_ptr(addr);   \
__asm__("adds %1, %2, %3; sbcccs %1, %1, %0; movcc %0, #0" \
-   : "=&r" (flag), "=&r" (sum) \
+   : "=&r" (flag), "=&r" (roksum) \
: "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit) \
: "cc"); \
flag; })
diff -pru linux-2.6.20-rc3-work/include/asm-arm26/uaccess-asm.h linux-2.6.20-rc3-new/include/asm-arm26/uaccess-asm.h
--- linux-2.6.20-rc3-work/include/asm-arm26/uaccess-asm.h   2006-11-29 22:57:37.0 +0100
+++ linux-2.6.20-rc3-new/include/asm-arm26/uaccess-asm.h2007-01-02 00:23:50.0 +0100
@@ -34,9 +34,9 @@ static inline void set_fs (mm_segment_t 
 }
 
 #define __range_ok(addr,size) ({   \
-   unsigned long flag, sum;\
+   unsigned long flag, roksum; \
__asm__ __volatile__("subs %1, %0, %3; cmpcs %1, %2; movcs %0, #0" \
-   : "=&r" (flag), "=&r" (sum) \
+   : "=&r" (flag), "=&r" (roksum)  \
: "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit)  \
: "cc");\
flag; })
diff -pru linux-2.6.20-rc3-work/include/asm-i386/uaccess.h linux-2.6.20-rc3-new/include/asm-i386/uaccess.h
--- linux-2.6.20-rc3-work/include/asm-i386/uaccess.h2006-11-29 22:57:37.0 +0100
+++ linux-2.6.20-rc3-new/include/asm-i386/uaccess.h 2007-01-02 00:24:04.0 +0100
@@ -54,10 +54,10 @@ extern struct movsl_mask {
  * This needs 33-bit arithmetic. We have a carry...
  */
 #define __range_ok(addr,size) ({ \
-   unsigned long flag,sum; \
+   unsigned long flag,roksum; \
__chk_user_ptr(addr); \
asm("addl %3,%1 ; sbbl %0,%0; cmpl %1,%4; sbbl $0,%0" \
-   :"=&r" (flag), "=r" (sum) \
+   :"=&r" (flag), "=r" (roksum) \
:"1" (addr),"g" ((int)(size)),"rm" (current_thread_info()->addr_limit.seg)); \
flag; })
 
diff -pru linux-2.6.20-rc3-work/include/asm-m32r/uaccess.h linux-2.6.20-rc3-new/include/asm-m32r/uaccess.h
--- linux-2.6.20-rc3-work/include/asm-m32r/uaccess.h2006-11-29 22:57:37.0 +0100
+++ linux-2.6.20-rc3-new/include/asm-m32r/uaccess.h 2007-01-02 00:24:25.0 +0100
@@ -68,7 +68,7 @@ static inline void set_fs(mm_segment_t s
  * This needs 33-bit arithmetic. We have a carry...
  */
 #define __range_ok(addr,size) ({   \
-   unsigned long flag, sum;\
+   unsigned long flag, roksum; \
__chk_user_ptr(addr);   \
asm (   \
"   cmpu%1, %1; clear cbit\n"   \
@@ -76,7 +76,7 @@ static inline void set_fs(mm_segment_t s
"   subx%0, %0\n"   \
"   cmpu%4, %1\n"   \
"   subx%0, %5\n"   \
-   : "=&r" (flag), "=r" (sum)  \
+   : "=&r" (flag), "=r" (roksum)   \
: "1" (addr), "r" ((int)(size)),\
  "r" (current_thread_info()->addr_limit.seg), "r" (0)  \
: "cbit" ); \
diff -pru linux-2.6.20-rc3-work/include/asm-x86_64/uaccess.h linux-2.6.20-rc3-new/include/asm-x86_64/uaccess.h
--- linux-2.6.20-rc3-work/include/asm-x86_64/uaccess.h  2007-01-01 21:14:56.0 +0100
+++ linux-2.6.20-rc3-new/include/asm-x86_64/uaccess.h   2007-01-02 00:25:05.0 +0100
@@ -37,11 +37,11 @@
  * Uhhuh, this needs 65-bit arithmetic. We have a carry..
  */
 #define __range_not_ok(addr,size) ({ \
-  

Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Mikael Pettersson
On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > The suggestions I've had so far which I have not yet tried:
> > 
> > -   Select a different x86 CPU in the config.
> > -   Unfortunately the C3-2 flags seem to simply tell GCC
> > to schedule for ppro (like i686) and enabled MMX and SSE
> > -   Probably useless
> 
> Actually, try this one. Try using something that doesn't like "cmov". 
> Maybe the C3-2 simply has some internal cmov bugginess. 

That's a good suggestion. Earlier C3s didn't have cmov so it's 
not entirely unlikely that cmov in C3-2 is broken in some cases.
Configuring for P5MMX or 486 should be good safe alternatives.

/Mikael


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Alistair John Strachan
On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > >
> > > - Select a different x86 CPU in the config.
> > >   -   Unfortunately the C3-2 flags seem to simply tell GCC
> > >   to schedule for ppro (like i686) and enabled MMX and SSE
> > >   -   Probably useless
> >
> > Actually, try this one. Try using something that doesn't like "cmov".
> > Maybe the C3-2 simply has some internal cmov bugginess.
>
> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

Or just C3 (not C3-2), which is what I've done.

I'll report back whether it crashes or not.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: [patch] aio: streamline read events after woken up

2007-01-02 Thread Zach Brown



> buffer index there.  By then, most of you would probably veto the
> patch anyway ;-)


haha, touche :)

I still think it'd be the right thing, though.  We can let the patch  
speak for itself :).


- z


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Horst H. von Brand
D. Hazelton <[EMAIL PROTECTED]> wrote:

[...]

> None. I didn't file a report on this because I didn't find the bug, just
> noted a problem that appears to occur. In this case the calls generated
> seem to wrap loops - something I've never heard of anyone doing.

Example code showing this weirdness?

>  These
> *might* be causing the off-by-one that is causing the function to
> re-enter in the middle of an instruction.

If something like this happened, programs would be crashing left and right.

> Seeing this I'd guess that this follows for all system-level code
> generated by 4.1.1

Define "system-level code". What makes it different from, say,
bog-of-the-mill compiler code (yes, gcc compiles itself as part of its
sanity checking)?

>and this is exactly what I was reporting. If you'd
> like I'll go dig up the dumps he posted and post the two related segments
> side-by-side to give you a better example what I'm referring to.

If the related segments show code that is somehow wrong, by all means
report it /with your detailed analysis/ to the compiler people. Just a
warning, gcc is pretty smart in what it does, its code is often surprising
to the unwashed. Also, the C standard is subtle, the error might be in an
unwarranted assumption in the source code.


Re: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Zach Brown


On Jan 2, 2007, at 5:50 PM, Chen, Kenneth W wrote:

> Zach Brown wrote on Tuesday, January 02, 2007 5:24 PM
>>> That is not possible because when multiple tasks waiting for
>>> events, they
>>> enter the wait queue in FIFO order, prepare_to_wait_exclusive() does
>>> __add_wait_queue_tail().  So first io_getevents() with min_nr of 2
>>> will be woken up when 2 ops completes.
>>
>> So switch the order of the two sleepers in the example?
>
> Not sure why that would be a problem though:  whoever sleep first will
> be woken up first.

Why would the min_nr = 3 sleeper be woken up in that case?  Only 2
ios were issued.

Maybe the app was relying on the min_nr = 2 completion to issue 3
more ios for the min_nr = 3 sleeper, who knows.

> Before I challenge that semantics, I want to mention that in current
> implementation, dribbling AIO events will be distributed in round robin
> fashion to all pending tasks waiting in io_getevents.

Yeah, don't misunderstand me -- we agree that the current situation
is bad.

> In the example you
> gave earlier, task with min_nr of 2 will be woken up after 4 completed
> events.

I only gave 2 ios/events in that example.

Does that clear up the confusion?

- z


Contents of core dumps (was: Re: fs/binfmt_elf.c:maydump())

2007-01-02 Thread Daniel Jacobowitz
[Please CC, I am not subscribed to lkml.]

On Thu, Apr 06, 2006 at 10:18:07PM -0700, David S. Miller wrote:
> How about something like the following patch?  If it's executable
> and not written to, skip it.  This would skip the main executable
> image and all text segments of the shared libraries mapped in.

I've been going through GDB test failures (... again...) and I'm down
to a respectably small number on x86_64, but this is one of the
remaining ones.  I don't suppose there's been any change since we
discussed this in April?

A refresher for those following along: there's a GDB test that mmaps a
file using MAP_PRIVATE and PROT_WRITE.  It expects the contents to end
up in the core dump.  Right now, they don't.  I can fix the test by
making sure it writes to the mapping, but before I change the test,
I want to raise the question of what _should_ be in a core dump.
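
For reference, a sketch of the kind of mapping involved. This is not the GDB test itself: demo_private_map is a made-up helper, it maps a scratch temporary file rather than the test's file, and it adds PROT_READ so the first byte can be read back. It does not produce a core dump; it only shows the difference between a mapping that has been written to and one that has not.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map len bytes of a scratch file with MAP_PRIVATE and write permission.
 * If touch is nonzero, dirty the first byte, which gives the mapping a
 * private (copy-on-write) page.  Returns the first byte, or -1 on failure.
 */
static int demo_private_map(size_t len, int touch)
{
	char path[] = "/tmp/coredemoXXXXXX";
	int fd = mkstemp(path);
	char *p;
	int first;

	if (fd < 0)
		return -1;
	unlink(path);			/* file lives on until fd is closed */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping keeps the file alive */
	if (p == MAP_FAILED)
		return -1;
	if (touch)
		p[0] = 'x';		/* the write that "fixes" the test */
	first = p[0];
	munmap(p, len);
	return first;
}
```

Calling it with touch set to 0 corresponds to the case under discussion: a private, writable, file-backed mapping that has never been written to.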

I took a peek at what Solaris includes in core dumps.  They offer
(not surprisingly) a pile of configuration options.  The default is
just about everything except for file-backed shared memory and some
symbol table data - it includes text segments, rodata, anonymous shared
memory, file backed mappings, et cetera.  I guess that's another
argument in favor of dumping more.  Then you can control it globally,
per process, et cetera.

http://src.opensolaris.org/source/xref/loficc/crypto/usr/src/uts/common/sys/corectl.h

I also checked an AIX manual since there was a reference to SA_FULLDUMP
in the GDB test:

 By default, the user data, anonymously mapped regions, and vm_infox
 structures are not included in a core dump. This partial core dump
 includes the current thread stack, the thread thrdctx structures, the
 user structure, and the state of the registers at the time of the
 fault. A partial core dump contains sufficient information for a stack
 traceback. The size of a core dump can also be limited by the setrlimit
 or setrlimit64 subroutine.

 To enable a full core dump, set the SA_FULLDUMP flag in the sigaction
 subroutine for the signal that is to generate a full core dump. If this
 flag is set when the core is dumped, the user data section, vm_infox,
 and anonymously mapped region structures are included in the core dump.

Not really sure what that translates to, but it's less than what
Solaris dumps, I think.

Does Linux need knobs for this?

> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 537893a..9ec5c2b 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1167,8 +1167,10 @@ static int maydump(struct vm_area_struct
>   if (vma->vm_flags & VM_SHARED)
>   return vma->vm_file->f_dentry->d_inode->i_nlink == 0;
>  
> - /* If it hasn't been written to, don't write it out */
> - if (!vma->anon_vma)
> + /* If it is executable and hasn't been written to,
> +  * don't write it out.
> +  */
> + if ((vma->vm_flags & VM_EXEC) && !vma->anon_vma)
>   return 0;
>  
>   return 1;
> 
> 

-- 
Daniel Jacobowitz
CodeSourcery


Re: [2.6 patch] the scheduled IEEE1394_OUI_DB removal

2007-01-02 Thread Stefan Richter
Adrian Bunk wrote:
> This patch contains the scheduled IEEE1394_OUI_DB removal.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
>  Documentation/feature-removal-schedule.txt |8 
>  drivers/ieee1394/Kconfig   |   14 
>  drivers/ieee1394/Makefile  |   10 
>  drivers/ieee1394/nodemgr.c |   39 
>  drivers/ieee1394/oui.db| 7048 -
>  drivers/ieee1394/oui2c.sh  |   22 
>  6 files changed, 7141 deletions(-)

Thanks.

We can now also delete drivers/ieee1394/.gitignore. I'll do so when I
commit your patch, if nobody objects.
-- 
Stefan Richter
-=-=-=== ---= ---==
http://arcgraph.de/sr/


RE: [patch] aio: streamline read events after woken up

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 5:06 PM
> To: Chen, Kenneth W
> > Given the previous patch "aio: add per task aio wait event condition"
> > that we properly wake up event waiting process knowing that we have
> > enough events to reap, it's just plain waste of time to insert itself
> > into a wait queue, and then immediately remove itself from the wait
> > queue for *every* event reap iteration.
> 
> Hmm, I dunno.  It seems like we're still left with a pretty silly loop.
> 
> Would it be reasonable to have a loop that copied multiple events at  
> a time?  We could use some __copy_to_user_inatomic(), it didn't exist  
> when this stuff was first written.

It sounds reasonable, but I think it will be complicated because of
kmap_atomic on the ring buffer, along with tail wraps around ring
buffer index there.  By then, most of you would probably veto the
patch anyway ;-)



RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 5:24 PM
> > That is not possible because when multiple tasks waiting for  
> > events, they
> > enter the wait queue in FIFO order, prepare_to_wait_exclusive() does
> > __add_wait_queue_tail().  So first io_getevents() with min_nr of 2  
> > will be woken up when 2 ops completes.
> 
> So switch the order of the two sleepers in the example?

Not sure why that would be a problem though:  whoever sleeps first will
be woken up first.


> The point is that there's no way to guarantee that the head of the  
> wait queue will be the lowest min_nr.

Before I challenge those semantics, I want to mention that in the current
implementation, dribbling AIO events will be distributed in round robin
fashion to all pending tasks waiting in io_getevents.  In the example you
gave earlier, the task with min_nr of 2 will be woken up after 4 completed
events.  I consider that undesirable behavior as well.

Going back to your counter argument, why do we need the lowest min_nr at
the head of the queue?  These are tasks that share one aio ctx, and an ioctx
is shareable only among threads.  Any reason why a round robin policy is
superior to FIFO?  Also presumably, threads that share an ioctx should be
capable of handling events for the same ioctx.

From a wakeup order point of view, yes, tasks with the lowest min_nr wake up
first, but from an io completion order point of view, they do not. And these
are the source of excessive ctx switches.



Re: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0) r0xj0

2007-01-02 Thread bbee
Tejun Heo  gmail.com> writes:
> Andrew Lyon wrote:
> > My system is gigabyte ds3 motherboard with onboard SATA JMicron
> > 20360/20363 AHCI Controller (rev 02), drive connected is WDC
> > WD740ADFD-00 20.0, I am running 2.6.18.6 32 bit, under heavy i/o I get
> > the following messages:
> > 
> > ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
> > ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
> > 
> > Is this condition dangerous?
> 
> Not usually.  Might indicate something is going wrong in some really
> rare cases.  I think vendors are getting NCQ right these days.  Maybe
> it's time to remove that printk.

Hi Tejun, it's funny you should say that, because in the subthread at
http://thread.gmane.org/gmane.linux.ide/10264/focus=10334
you seemed to have major issues with this very error and were saying there
could even be data corruption.

I too have this error, on an Asrock 939Dual-SATA2 board which has the same
controller. Syslog lines like
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xf4)
every so often.

However, in my case it gets a lot worse. The following happens infrequently,
usually within 15 days of uptime on a light I/O load:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (irq_stat 0x4800, interface fatal error)
ata1.00: tag 0 cmd 0xea Emask 0x12 stat 0x37 err 0x0 (ATA bus error)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x104)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link down (SStatus 0 SControl 300)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link down (SStatus 0 SControl 300)
ata1.00: disabled
ata1: EH complete
ata1.00: detaching (SCSI 0:0:0:0)
scsi 0:0:0:0: rejecting I/O to dead device

The drive then disappears from the system. This is not preceded by any
spurious interrupt messages, but I have a hunch it is related because
following your grave comments in the referenced thread, I looked for a kernel
option to disable NCQ. Astonished to find none, I changed the source using the
flag you added in this patch:
http://article.gmane.org/gmane.linux.ide/11527
With NCQ disabled, the spurious interrupt messages as well as the exceptions
go away.

This has been happening for a few months on a box whose log I'd been neglecting
and I hadn't even noticed the issue since the drive is part of a md array. The
drive would get re-detected when I rebooted the box and md would rebuild the
array.

Here comes the weird part. When I discovered the problem, I backtracked through
the syslog to see when the problems started. They started a few months ago when
I added a DVB card to the system (it is a mythtv box). I noticed in Andrew's
dmesg that he also has a DVB card.

Could the DVB subsystem have anything to do with this? I realize the systems
are completely unrelated..
Perhaps the JMicron chip has noise issues? These are often triggered by adding
tuner cards..

It probably won't make any difference to system performance, but it would be
nice if we could resolve this so I can re-enable NCQ and stop patching my
kernels ;)

Thanks for reading and please CC,

bbee



announce: ls1394 - tool to list connected FireWire devices

2007-01-02 Thread Stefan Richter
Hi all,

I wrote a small utility which works similar to lspci, lsscsi, and lsusb:

http://me.in-berlin.de/~s5r6/linux1394/ls1394/

Usage: ls1394 [options]
List FireWire devices

Options:
  -h, --help
  Show this help
  -v, --verbose
  Increase verbosity
  -s [[bus]:][node]
  Show only devices at selected bus (in decimal) and/or with specified
  node number (node ID or physical ID, in hexadecimal)
  -d [companyid:][guid]
  Show only devices with specified company ID or GUID (in hexadecimal)
  -a, --active
  Show only nodes with active link and general ROM
  -r, --remote
  Show only remote nodes
  -i file
  Use specified company ID (OUI) database instead of
  /usr/share/misc/oui.db
  --fetch-oui-db
  Read http://standards.ieee.org/regauth/oui/oui.txt and save
  /usr/share/misc/oui.db; required for translation of company IDs to
  company names; location of oui.db can be overridden by -i file
  -V, --version
  Show version of program

This is a preliminary version. Notably, a big missing feature is the
listing of each node's unit directories.

ls1394 is written as a bash script and only uses sysfs information. I.e.
its only driver requirements are ieee1394 and ohci1394. There are no
libraries required. ls1394 can be used by unprivileged users.

In order to get the full functionality of ls1394, you need to run it once
with the --fetch-oui-db option. This requires that you have "sed" and one
of "wget" or "curl" installed. Alternatively, you can let ls1394 access
/usr/src/linux/drivers/ieee1394/oui.db if you have kernel sources installed.
However this file will probably no longer be available from Linux 2.6.21
onwards. If you don't install a oui.db (systemwide or in a user directory),
ls1394 won't translate company IDs into human-readable vendor names, but
everything else will still work.


Example output:
$ ls1394
0:ffc0 00027a0e010020c2 IOI Technology Corporation 'CD-ROM GCR-8520B' '91021U2     '
0:ffc1 0030e0a5e0080293 OXFORD SEMICONDUCTOR LTD. 'S8001'
0:ffc2 08002856319b Texas Instruments (local)
1:ffc0 0001042033000e16 DVICO Co., Ltd. 'MOMOBAY FX-3A'
1:ffc1 0010dc5600fed2d4 MICRO-STAR INTERNATIONAL CO., LTD. (local)
1:ffc2 0001041010004beb DVICO Co., Ltd. 'MOMOBAY CX-1'
1:ffc3 unknown
1:ffc4 00301bac2ba4 SHUTTLE, INC.

This is from a PC with two FireWire cards which are marked as "(local)".
Node 3 on bus 1 is a hub without configuration ROM. Node 4 on bus 1 is a
remote PC.

The first column contains the number of host adapter (card) and node ID. The
second column contains the GUID. Next come the company name, names of unit
directories enclosed in '' if any exist, and the (local) flag if appropriate.
Note, the company name should be the one of the manufacturer of the device or
author of the device's firmware. But sometimes the manufacturer chooses a
company ID which he doesn't own. For example, the company ID of the node 0:ffc2
should rather be the one of Sunix, not of Texas Instruments, as it is a Sunix
card (although with a TI chip.)

$ ls1394 -ar
0:ffc0 00027a0e010020c2 IOI Technology Corporation 'CD-ROM GCR-8520B' '91021U2     '
0:ffc1 0030e0a5e0080293 OXFORD SEMICONDUCTOR LTD. 'S8001'
1:ffc0 0001042033000e16 DVICO Co., Ltd. 'MOMOBAY FX-3A'
1:ffc2 0001041010004beb DVICO Co., Ltd. 'MOMOBAY CX-1'
1:ffc4 00301bac2ba4 SHUTTLE, INC.

The local nodes (host adapters) and the hub are suppressed in this example.

$ ls1394 -s 0:0 -v
0:ffc0 00027a0e010020c2 IOI Technology Corporation 'CD-ROM GCR-8520B' '91021U2     '
IRMC(0) CMC(0) ISC(0) BMC(0) PMC(0) GEN(0)
LSPD(3) MAX_REC(4096) MAX_ROM(0) CYC_CLK_ACC(255)
capabilities: 0x0083c0
vendor_id: 0x00027a IOI Technology Corporation
vendor_name_kv: IOI

Here, bus 0 and physical ID 0 is selected. This could also be written as
ls1394 -s 0:ffc0. The -v option adds further information. The device in this
example is a node with two unit directories. The units are named
'CD-ROM GCR-8520B' and '91021U2 ' respectively. All other nodes had
only one unit directory each.

$ ls1394 -d 000104: -v
1:ffc0 0001042033000e16 DVICO Co., Ltd. 'MOMOBAY FX-3A'
IRMC(0) CMC(0) ISC(0) BMC(0) PMC(0) GEN(0)
LSPD(2) MAX_REC(2048) MAX_ROM(0) CYC_CLK_ACC(255)
capabilities: 0x0083c0
vendor_id: 0x000104 DVICO Co., Ltd.
vendor_name_kv:  DViCO

1:ffc2 0001041010004beb DVICO Co., Ltd. 'MOMOBAY CX-1'
IRMC(0) CMC(0) ISC(0) BMC(0) PMC(0) GEN(0)
LSPD(2) MAX_REC(2048) MAX_ROM(0) CYC_CLK_ACC(255)
capabilities: 0x0083c0
vendor_id: 0x000104 DVICO Co., Ltd.
vendor_name_kv: "DViCO"

In this example, nodes were selected per company ID (first 6 digits of the
GUID, IOW the 24 high bits of the EUI-64).
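
The company ID selection described above can be sketched in a few lines of C.
This is only an illustration of the bit layout (the top 24 bits of the 64-bit
GUID); ls1394 itself is a bash script, and the function names below are
made up for this sketch.

```c
#include <stdint.h>

/* Return the 24-bit company ID stored in the high bits of a 64-bit GUID. */
static uint32_t guid_company_id(uint64_t guid)
{
	return (uint32_t)(guid >> 40) & 0xffffffu;
}

/* Return nonzero if the GUID carries the given company ID. */
static int guid_matches_company(uint64_t guid, uint32_t company_id)
{
	return guid_company_id(guid) == (company_id & 0xffffffu);
}
```

With the GUIDs from the example output, 0x0001042033000e16 and
0x0001041010004beb both yield company ID 0x000104, which is why
"-d 000104:" lists both DVICO nodes.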
-- 
Stefan Richter
-=-=-=== ---= ---==
http://arcgraph.de/sr/

Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Linus Torvalds


On Tue, 2 Jan 2007, Alistair John Strachan wrote:
>
> eax: 0008   ebx:    ecx: 0008   edx: 
> esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
>
> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 
> 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 
> 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> EIP: [] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c
> 
> Chuck observed that the kernel tries to reenter pipe_poll half way through an 
> instruction (c0156f5f->c0156f60); it's not a single-bit error but an 
> off-by-one.

It's not an off-by-one either (eg say we're taking an exception and 
screwing up %eip by one somehow).

The code sequence in question is

	mov    %ecx,%edx
	mov    0x6c(%esi),%eax
	or     $0x10,%edx
	cmp    0x168(%edi),%eax   <--
	cmovne %edx,%ecx
	jmp    ...

and it's in the second byte of the "cmp".

And yes, it definitely entered there, because trying other random 
entry-points will have either invalid instructions or instructions that 
would fault due to NULL pointers. HOWEVER, it's also not as simple as 
"took an interrupt, and returned with %eip incremented by one", because 
your %edx is zero, so it won't have done that "or $0x10,%edx" and then some 
interrupt happened and screwed up just %eip.

So it's literally a random %eip, but since you say it's consistently in 
that function, it's not truly "random". There's something that triggers it 
just _there_.

However, that's a damn simple function. There's _nothing_ there. The 
particular code that is involved right there is literally

if (!pipe->writers && filp->f_version != pipe->w_counter)
mask |= POLLHUP;

and that's it.  There's not even anything half-way interesting around it, 
except for the "poll_wait()" call, but even that is about as common as
you can humanly get..
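
For readers following the disassembly, here is a rough userland C rendering of
just the quoted instruction sequence (the f_version/w_counter comparison, not
the !pipe->writers test). It is a sketch under assumptions: the function name
and SKETCH_POLLHUP are made up, and 0x10 is taken from the "or $0x10,%edx"
in the listing.

```c
#include <stdint.h>

#define SKETCH_POLLHUP 0x10u	/* the $0x10 immediate from the listing */

/*
 * mask      ~ %ecx on entry to the sequence
 * f_version ~ value loaded by "mov 0x6c(%esi),%eax"
 * w_counter ~ memory operand of "cmp 0x168(%edi),%eax"
 */
static uint32_t hup_select(uint32_t mask, uint32_t f_version,
			   uint32_t w_counter)
{
	/* mov %ecx,%edx ; or $0x10,%edx */
	uint32_t with_hup = mask | SKETCH_POLLHUP;

	/* cmp ... ; cmovne %edx,%ecx : take with_hup only if they differ */
	return (f_version != w_counter) ? with_hup : mask;
}
```

The point of the rendering is that the "or" always executes before the
compare, so a nonzero mask would leave a nonzero %edx behind.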

Looking at the register set and the stack, I see:

Stack:  
  <- saved %ebx (dunno, seems dead in caller)
f70f3e9c  <- saved %esi (== pollfd in do_pollfd)
f6e111c0  <- saved %edi (== filp)
f70f3fa4  <- outer EBP (looks reasonable) 
c015d7f3  <- return address (do_sys_poll+0x253/0x480)

and the strange thing is that when the oops happens, it really looks like 
%esi _still_ contains the value it had originally (and that is saved on 
the stack). But afaik, from your disassembly, it should have been 
overwritten by the initial %eax, which should have had the same value as 
%edi on entry...

IOW, none of it really makes any sense. The stack frames look fine, so we 
_did_ enter at the beginning of the function (and it wasn't the *poll fn 
pointer that was corrupt).

> The suggestions I've had so far which I have not yet tried:
> 
> - Select a different x86 CPU in the config.
>   -   Unfortunately the C3-2 flags seem to simply tell GCC
>   to schedule for ppro (like i686) and enabled MMX and SSE
>   -   Probably useless

Actually, try this one. Try using something that doesn't like "cmov". 
Maybe the C3-2 simply has some internal cmov bugginess. 

Linus


[RFC PATCH -rt] Almost-ready-for-prime-time RCU priority boosting

2007-01-02 Thread Paul E. McKenney
Hello!

An update to the long-standing need for priority boosting of RCU readers
in -rt kernels.  This patch passes moderate testing, considerably more
severe than that faced by previous versions.  Known shortcomings:

o   This patch has not yet been subjected to enterprise-level
    stress testing.  It therefore likely still contains a few bugs.
    My next step is to write some enterprise-level tests, probably
    as extensions to the RCU torture tests.

o   No tie-in to the OOM system.  Note that the RCU priority booster,
    unlike other subsystems that respond to OOM, takes action over
    a timeframe.  Boosting the priority of long-blocked RCU readers
    does not immediately complete the grace period, so the RCU priority
    booster needs to know the duration of the OOM event rather than
    just being told to free up memory immediately.  This likely also
    means that the RCU priority booster should be given early warning
    of impending OOM, so that it has the time it needs to react.

    I have not worried much about this yet, since my belief is that
    the current approach will get the RCU callbacks processed in
    a timely manner in almost all cases.  However, the tie-in to
    OOM might be needed for small-memory systems.

o   Although the RCU priority booster's own priority is easily adjusted
    in a running kernel, it currently boosts blocked RCU readers
    to a fixed priority just below that of IRQ handler threads.
    This is straightforward and important, but I need to get all
    the bugs shaken out before worrying about ease of use.

o   A design document is needed.  This is on my list!

A couple of questions:

o   Currently, the rcu_boost_prio() and rcu_unboost_prio() functions
    are in kernel/rcupreempt.c, because this allows them to be
    declared as static.  But if I ignore the desire to declare them
    as static, I would instead put them into kernel/rtmutex.c
    with the other priority-inheritance code.  Should I move them
    to kernel/rtmutex.c?

o   It appears to me that one must always hold ->pi_lock when calling
    rt_mutex_setprio().  Is this really the case?  (If so, I will
    add words to this effect to its comment header.  And the longevity
    of my test kernels seemed to increase dramatically when I added
    this locking, for whatever that is worth.)

Anyway, here is the patch.  Any and all comments greatly appreciated.

Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
---

 include/linux/init_task.h  |   13
 include/linux/rcupdate.h   |   14
 include/linux/rcupreempt.h |   30 +
 include/linux/sched.h      |   17
 init/main.c                |    1
 kernel/Kconfig.preempt     |   32 +
 kernel/exit.c              |    1
 kernel/fork.c              |    7
 kernel/rcupreempt.c        |  826 +
 kernel/rtmutex.c           |    9
 kernel/sched.c             |    5
 11 files changed, 952 insertions(+), 3 deletions(-)

diff -urpNa -X dontdiff linux-2.6.19-rt12/include/linux/init_task.h 
linux-2.6.19-rt12-rcubpl/include/linux/init_task.h
--- linux-2.6.19-rt12/include/linux/init_task.h 2006-12-22 21:21:42.0 
-0800
+++ linux-2.6.19-rt12-rcubpl/include/linux/init_task.h  2006-12-24 
16:20:08.0 -0800
@@ -86,6 +86,18 @@ extern struct nsproxy init_nsproxy;
.siglock= __SPIN_LOCK_UNLOCKED(sighand.siglock),\
 }
 
+#ifdef CONFIG_PREEMPT_RCU_BOOST
+#define INIT_RCU_BOOST_PRIO .rcu_prio  = MAX_PRIO,
+#define INIT_PREEMPT_RCU_BOOST(tsk)\
+   .rcub_rbdp  = NULL, \
+   .rcub_state = RCU_BOOST_IDLE,   \
+   .rcub_entry = LIST_HEAD_INIT(tsk.rcub_entry),   \
+   .rcub_rbdp_wq   = NULL,
+#else /* #ifdef CONFIG_PREEMPT_RCU_BOOST */
+#define INIT_RCU_BOOST_PRIO
+#define INIT_PREEMPT_RCU_BOOST(tsk)
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU_BOOST */
+
 extern struct group_info init_groups;
 
 /*
@@ -142,6 +154,7 @@ extern struct group_info init_groups;
.pi_lock= RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock),  \
INIT_TRACE_IRQFLAGS \
INIT_LOCKDEP\
+   INIT_PREEMPT_RCU_BOOST(tsk) \
 }
 
 
diff -urpNa -X dontdiff linux-2.6.19-rt12/include/linux/rcupdate.h 
linux-2.6.19-rt12-rcubpl/include/linux/rcupdate.h
--- linux-2.6.19-rt12/include/linux/rcupdate.h  2006-12-22 21:21:42.0 
-0800
+++ linux-2.6.19-rt12-rcubpl/include/linux/rcupdate.h   2006-12-24 
23:27:52.0 -0800
@@ -227,6 +227,20 @@ extern void rcu_barrier(void);
 extern void rcu_init(void);
 extern void rcu_advance_callbacks(int cpu, int user);
 extern void rcu_check_callbacks(int cpu, int 

Re: [PATCH 2.6.20-rc3] fix for bugzilla #7544 (keyspan USB-to-serial converter)

2007-01-02 Thread Greg KH
On Tue, Jan 02, 2007 at 07:16:54PM +0100, Rainer Weikusat wrote:
> At least the Keyspan USA-19HS USB-to-serial converter supports
> two different configurations, one where the input endpoints
> have interrupt transfer type and one where they are bulk endpoints.
> The default UHCI configuration uses the interrupt input endpoints.
> The keyspan driver, OTOH, assumes that the device has only bulk
> endpoints (all URBs are initialized by calling usb_fill_bulk_urb
> in keyspan.c/ keyspan_setup_urb).

So, this means that Keyspan changed the USB configuration of this
device?

Can you send me the output of 'cat /proc/bus/usb/devices' with this
keyspan device plugged in?  I'll compare it with the devices I have
here.

> This causes the interval field
> of the input URBs to have a value of zero instead of one, which
> 'accidentally' worked with Linux at least up to 2.6.17.11 but
> stopped working with 2.6.18, which changed the UHCI support code handling
> URBs for interrupt endpoints. The patch below modifies the driver to
> initialize its input URBs either as interrupt or as bulk URBs,
> depending on the transfertype contained in the associated endpoint
> descriptor (only tested with the default configuration) enabling
> the driver to again receive data from the serial converter.
> 
> Signed-off-by: Rainer Weikusat <[EMAIL PROTECTED]>
> ---
> diff -pNur linux-2.6.20-rc3/drivers/usb/serial/keyspan.c 
> linux-2.6.20-rc3-keyspan/drivers/usb/serial/keyspan.c
> --- linux-2.6.20-rc3/drivers/usb/serial/keyspan.c 2007-01-02 
> 11:10:22.0 +0100
> +++ linux-2.6.20-rc3-keyspan/drivers/usb/serial/keyspan.c 2007-01-02 
> 18:54:16.0 +0100
> @@ -95,6 +95,7 @@
>  */
>  
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1275,11 +1276,29 @@ static int keyspan_fake_startup (struct 
>  }
>  
>  /* Helper functions used by keyspan_setup_urbs */
> +static struct usb_endpoint_descriptor const *
> +find_ep_desc_for(struct usb_serial const *serial, int endpoint)
> +{
> + struct usb_host_endpoint const *p, *e;
> +
> + p = serial->interface->cur_altsetting->endpoint;
> + e = p + serial->interface->cur_altsetting->desc.bNumEndpoints;
> + while (p < e && p->desc.bEndpointAddress != endpoint) ++p;
> + 
> + if (unlikely(p == e)) panic("found no endpoint descriptor for "
> + "endpoint %d\n", endpoint);

No need for "unlikely" on such a slow path.

Also, panic() should be on the next line for the proper coding style.

And, we don't want to panic() for such a trivial thing.  Just abort the
probe sequence at most, but never shut down the machine for an odd
device that we find.

> +
> + return &p->desc;
> +}
> +
>  static struct urb *keyspan_setup_urb (struct usb_serial *serial, int 
> endpoint,
> int dir, void *ctx, char *buf, int len,
> void (*callback)(struct urb *))
>  {
>   struct urb *urb;
> + struct usb_endpoint_descriptor const *ep_desc;
> + char const *ep_type_name;
> + unsigned ep_type;
>  
>   if (endpoint == -1)
>   return NULL;/* endpoint not needed */
> @@ -1290,12 +1309,31 @@ static struct urb *keyspan_setup_urb (st
>   dbg ("%s - alloc for endpoint %d failed.", __FUNCTION__, 
> endpoint);
>   return NULL;
>   }
> + 
> + ep_desc = find_ep_desc_for(serial, endpoint);
> + ep_type = ep_desc->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK;
> + switch (ep_type) {
> + case USB_ENDPOINT_XFER_INT:
> + ep_type_name = "INT";
> + usb_fill_int_urb(urb, serial->dev,
> +  usb_sndintpipe(serial->dev, endpoint) | dir,
> +  buf, len, callback, ctx,
> +  ep_desc->bInterval);
> + break;
> +
> + case USB_ENDPOINT_XFER_BULK:
> + ep_type_name = "BULK";
> + usb_fill_bulk_urb(urb, serial->dev,
> +   usb_sndbulkpipe(serial->dev, endpoint) | dir,
> +   buf, len, callback, ctx);
> + break;
>  
> - /* Fill URB using supplied data. */
> - usb_fill_bulk_urb(urb, serial->dev,
> -   usb_sndbulkpipe(serial->dev, endpoint) | dir,
> -   buf, len, callback, ctx);
> + default:
> + panic("unsupported endpoint type %d", ep_type);

Again, no usb driver should be calling panic().
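
A minimal userland sketch of the error handling asked for in this review:
the lookup returns NULL when no descriptor matches and the caller turns that,
or an unexpected transfer type, into an error code instead of calling panic().
The struct and names here are stand-ins invented for the sketch, not the
kernel's definitions; the SKETCH_XFER_* values mirror the USB transfer type
encoding (bulk = 2, interrupt = 3, mask = 0x03).

```c
#include <stddef.h>
#include <errno.h>

/* Stand-in for the endpoint descriptor; only the fields used here. */
struct sketch_ep_desc {
	unsigned char bEndpointAddress;
	unsigned char bmAttributes;
	unsigned char bInterval;
};

#define SKETCH_XFERTYPE_MASK	0x03
#define SKETCH_XFER_BULK	2
#define SKETCH_XFER_INT		3

/* Example table: one interrupt IN, one bulk OUT, one isochronous IN. */
static const struct sketch_ep_desc sample_eps[] = {
	{ 0x81, 3, 1 },
	{ 0x02, 2, 0 },
	{ 0x83, 1, 1 },
};

/* Return the descriptor for 'endpoint', or NULL if there is none. */
static const struct sketch_ep_desc *
find_ep_desc(const struct sketch_ep_desc *eps, size_t n, int endpoint)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (eps[i].bEndpointAddress == endpoint)
			return &eps[i];
	return NULL;
}

/*
 * Decide which kind of URB to set up.  Returns SKETCH_XFER_INT or
 * SKETCH_XFER_BULK, or a negative errno so the caller can abort the
 * probe cleanly.
 */
static int pick_urb_type(const struct sketch_ep_desc *eps, size_t n,
			 int endpoint)
{
	const struct sketch_ep_desc *d = find_ep_desc(eps, n, endpoint);

	if (!d)
		return -ENODEV;

	switch (d->bmAttributes & SKETCH_XFERTYPE_MASK) {
	case SKETCH_XFER_INT:
		return SKETCH_XFER_INT;
	case SKETCH_XFER_BULK:
		return SKETCH_XFER_BULK;
	default:
		return -EINVAL;
	}
}
```

In the driver itself the equivalent of a negative return would be to free
the URB and fail the setup, leaving the rest of the system running.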

thanks,

greg k-h


Re: [patch] aio: make aio_ring_info->nr_pages an unsigned int

2007-01-02 Thread Zach Brown


> I had those changes earlier, but dropped them to make the patch smaller.


Still have it kicking around?

Making this stuff more consistent would be nice, I agree, I'm just  
not sure it's worth the risk of running into some subtle bugs.


- z


RE: [patch] aio: make aio_ring_info->nr_pages an unsigned int

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 5:14 PM
> To: Chen, Kenneth W
> > --- ./include/linux/aio.h.orig  2006-12-24 22:31:55.0 -0800
> > +++ ./include/linux/aio.h   2006-12-24 22:41:28.0 -0800
> > @@ -165,7 +165,7 @@ struct aio_ring_info {
> >
> > struct page **ring_pages;
> > spinlock_t  ring_lock;
> > -   long        nr_pages;
> > +   unsigned    nr_pages;
> >
> > unsigned    nr, tail;
> 
> Hmm.
> 
> This seems so trivial as to not be worth it.  It'd be more compelling  
> if it was more thorough -- doing things like updating the 'long i'  
> iterators that a few have over ->nr_pages.  That kind of thing.   
> Giving some confidence that the references of ->nr_pages were audited.


I had those changes earlier, but dropped them to make the patch smaller. It
all started with the head and tail indexes, which are defined as unsigned int
in the structure, but in aio.c, all local variables that do temporary head and
tail calculations are unsigned long. While cleaning that up, it got expanded
into nr_pages etc.  Oh well.

- Ken



Re: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Zach Brown


> That is not possible because when multiple tasks waiting for events, they
> enter the wait queue in FIFO order, prepare_to_wait_exclusive() does
> __add_wait_queue_tail().  So first io_getevents() with min_nr of 2 will
> be woken up when 2 ops completes.


So switch the order of the two sleepers in the example?

The point is that there's no way to guarantee that the head of the  
wait queue will be the lowest min_nr.
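
The concern can be modelled in userland with an array standing in for the
FIFO wait queue. This is only a toy sketch: the function names and sample
queues are invented, and it models nothing but the "test the head's min_nr"
check from the quoted hunk.

```c
#include <stddef.h>

/* Two FIFO orderings of the same pair of sleepers (values are min_nr). */
static const unsigned fifo_low_first[]  = { 2, 3 };
static const unsigned fifo_high_first[] = { 3, 2 };

/*
 * Head-only test: returns 0 (the head's index) if the head waiter is
 * satisfied by nr_evt completed events, or -1 if nobody is woken.
 */
static int head_only_wake(const unsigned *min_nr, size_t n, unsigned nr_evt)
{
	if (n == 0)
		return -1;
	return nr_evt >= min_nr[0] ? 0 : -1;
}

/*
 * Index of the first waiter anywhere in the queue that nr_evt would
 * satisfy, or -1 if there is none.
 */
static int first_satisfied(const unsigned *min_nr, size_t n, unsigned nr_evt)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (nr_evt >= min_nr[i])
			return (int)i;
	return -1;
}
```

With two completed events, the head-only test wakes the min_nr=2 sleeper
when it queued first, but wakes nobody when the min_nr=3 sleeper is at the
head, even though the second waiter could be satisfied.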


I got list_add() from the add_wait_queue() still being used in  
wait_for_all_aios(), fwiw.  My mistake.


- z


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Jan Engelhardt

On Jan 3 2007 01:52, Segher Boessenkool wrote:
>> > Leaving aside the issue of in-memory or not, I don't think
>> > it is realistic to think any completely common implementation
>> > will work for this -- it might for current SPARC+PowerPC+OLPC,
>> > but more stuff will be added over time...
>> 
>> I see nothing supporting this IMHO bogus claim.
>
> Please keep in mind that not all systems want to kill OF
> as soon as they enter the kernel -- some want to keep it
> active basically forever (or only remove it when the user
> asks for it).

Kill OF? sparc does not want that IMO, how else should I return to
the 'ok' prompt?


-`J'
-- 


Re: [RFC] Heads up on a series of AIO patchsets

2007-01-02 Thread Kent Overstreet

>> Any details?
>
> Well, one path I tried I couldn't help but post a blog entry about
> for my friends.  I'm not sure it's the direction I'll take with
> linux-kernel, but the fundamentals are there:  the api should be the
> syscall interface, and there should be no difference between sync and
> async behaviour.
>
> http://www.zabbo.net/?p=72

Any code you're willing to let people play with? I could at least have
real test cases, and a library to go along with it as it gets
finished.

Another pie in the sky idea:
One thing that's been bugging me lately (working on a 9p server), is
sendfile is hard to use in practice because you need packet headers
and such, and they need to go out at the same time.

Sendfile listio support would fix this, but it's not a general
solution. What would be really useful is a way to say that a certain
batch of async ops either all succeed or all fail, and happen
atomically; i.e., transactions for syscalls.

Probably even harder to do than general async syscalls, but it'd be
the best thing since sliced bread... and hey, it seems the logical
next step.


RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 4:49 PM
> On Dec 29, 2006, at 6:31 PM, Chen, Kenneth W wrote:
> > This patch adds a wait condition to the wait queue and only wake-up
> > process when that condition meets.  And this condition is added on a
> > per task base for handling multi-threaded app that shares single  
> > ioctx.
> 
> But only one of the waiting tasks is tested, the one at the head of  
> the list.  It looks like this change could starve a io_getevents()  
> with a low min_nr in the presence of another io_getevents() with a  
> larger min_nr.
> 
> > +   if (waitqueue_active(&ctx->wait)) {
> > +   struct aio_wait_queue *wait;
> > +   wait = container_of(ctx->wait.task_list.next,
> > +   struct aio_wait_queue, wait.task_list);
> > +   if (nr_evt >= wait->nr_wait)
> > +   wake_up(&ctx->wait);
> > +   }
> 
> First is the fear of starvation as mentioned previously.
> 
> issue 2 ops
> first io_getevents sleeps with a min_nr of 2
> second io_getevents sleeps with min_nr of 3
> 2 ops complete but only test the second sleeper's min_nr of 3
> first sleeper twiddles thumbs

That is not possible because when multiple tasks waiting for events, they
enter the wait queue in FIFO order, prepare_to_wait_exclusive() does
__add_wait_queue_tail().  So first io_getevents() with min_nr of 2 will
be woken up when 2 ops completes.



Re: [patch] aio: make aio_ring_info->nr_pages an unsigned int

2007-01-02 Thread Zach Brown

> --- ./include/linux/aio.h.orig  2006-12-24 22:31:55.0 -0800
> +++ ./include/linux/aio.h   2006-12-24 22:41:28.0 -0800
> @@ -165,7 +165,7 @@ struct aio_ring_info {
>
> struct page **ring_pages;
> spinlock_t  ring_lock;
> -   long        nr_pages;
> +   unsigned    nr_pages;
>
> unsigned    nr, tail;




Hmm.

This seems so trivial as to not be worth it.  It'd be more compelling  
if it was more thorough -- doing things like updating the 'long i'  
iterators that a few have over ->nr_pages.  That kind of thing.   
Giving some confidence that the references of ->nr_pages were audited.


- z


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Segher Boessenkool

[snipping a bit for now]


> It's easier to start merging powerpc and sparc I reckon

Well it won't hurt to merge and clean up these two first, sure.

> and then, "fix"
> that so that it works on x86 :-)


That works, if the goal is to just add x86/OLPC to the list of
platforms that have a device tree fs.  I thought the plan was
to create a single, more generic, OF interface/API in the kernel
though.


Segher



Re: [patch] aio: remove spurious ring head index modulo info->nr

2007-01-02 Thread Zach Brown
> This makes the modulo of ring->head into local variable head
> unnecessary.
>
> This patch removes that bogus code.


Looks fine to me:

Acked-by: Zach Brown <[EMAIL PROTECTED]>

- z


Re: Shrink the held_lock struct by using bitfields.

2007-01-02 Thread Dave Jones
On Wed, Jan 03, 2007 at 01:47:36AM +0100, Bodo Eggert wrote:
 > Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > > Shrink the held_lock struct by using bitfields.
 > > This shrinks task_struct on lockdep enabled kernels by 480 bytes.
 > 
 > >  * The following field is used to detect when we cross into an
 > >  * interrupt context:
 > >  */
 > > - int irq_context;
 > [...]
 > > + unsigned char irq_context:1;
 > [...]
 > 
 > Can these fields be set by concurrent processes, e.g.:
 > CPU0            CPU1
 > load flags
 >                 load flags
 > flip bit
 > store
 >                 flip bit
 >                 store

It's a per-process structure.
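
A small sketch of the size effect being discussed. Only irq_context comes
from the quoted patch; the struct names and the other fields are placeholders
invented for illustration, and exact sizes are compiler-dependent, so the
sketch only relies on the bitfield version being smaller.

```c
#include <stddef.h>

/* Flags stored as full ints, as before the change. */
struct sketch_flags_plain {
	int irq_context;
	int flag_b;	/* placeholder */
	int flag_c;	/* placeholder */
	int flag_d;	/* placeholder */
};

/* The same flags packed into single-bit bitfields. */
struct sketch_flags_bits {
	unsigned char irq_context:1;
	unsigned char flag_b:1;	/* placeholder */
	unsigned char flag_c:1;	/* placeholder */
	unsigned char flag_d:1;	/* placeholder */
};

/* Bytes saved per structure by switching to bitfields. */
static size_t sketch_bytes_saved(void)
{
	return sizeof(struct sketch_flags_plain) -
	       sizeof(struct sketch_flags_bits);
}
```

Updating one bitfield is a read-modify-write of the storage shared with its
neighbours, which is the race asked about above; it is only a problem if
two contexts can update the same structure concurrently.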

Dave

-- 
http://www.codemonkey.org.uk


Re: [patch] aio: streamline read events after woken up

2007-01-02 Thread Zach Brown

> Given the previous patch "aio: add per task aio wait event condition"
> that we properly wake up event waiting process knowing that we have
> enough events to reap, it's just plain waste of time to insert itself
> into a wait queue, and then immediately remove itself from the wait
> queue for *every* event reap iteration.


Hmm, I dunno.  It seems like we're still left with a pretty silly loop.

Would it be reasonable to have a loop that copied multiple events at  
a time?  We could use some __copy_to_user_inatomic(), it didn't exist  
when this stuff was first written.
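
To make the "multiple events at a time" idea concrete, here is a userland
sketch that drains a ring in contiguous runs rather than one event per
iteration. Everything in it is an assumption for illustration: the struct,
the function name, and memcpy() standing in for whatever copy-to-user
primitive the kernel code would use.

```c
#include <stddef.h>
#include <string.h>

struct sketch_event {
	unsigned long data;
};

/*
 * Copy up to 'max' events from a ring of 'nr' slots into 'out'.
 * Events live in slots [*head, tail), wrapping at 'nr'.  Each pass
 * copies one contiguous run: up to tail, or up to the end of the ring
 * when the valid region wraps.  Advances *head and returns the number
 * of events copied.
 */
static size_t ring_copy_batch(const struct sketch_event *ring, size_t nr,
			      size_t *head, size_t tail,
			      struct sketch_event *out, size_t max)
{
	size_t copied = 0;

	while (copied < max && *head != tail) {
		size_t end = (*head < tail) ? tail : nr;
		size_t run = end - *head;

		if (run > max - copied)
			run = max - copied;

		memcpy(out + copied, ring + *head, run * sizeof(*ring));
		copied += run;
		*head = (*head + run) % nr;
	}
	return copied;
}
```

A wrapped ring is drained in at most two copies, regardless of how many
events it holds.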


- z


Re: [PATCH] libata: fix combined mode (was Re: Happy New Year (and v2.6.20-rc3 released))

2007-01-02 Thread Jeff Garzik

Alan wrote:

> Once combined mode is fixed not to abuse resources (and it originally
> did it that way for a good reason I grant and am not criticising that) the
> entire management for legacy mode, mixed mode and native mode resources
> for an ATA device (including 0x170, 0x3F6 and other wacky magic) becomes
>
> 	if (pci_request_regions(pdev, "libata")) ...
>
> Make sense ?


Yes.  For 2.6.21.  As I've always said.

But for 2.6.20, we are only HALFWAY there, and all these /new/ bugs 
exist as a result.


Your patch makes far more sense for 2.6.21, where the "halfway to 
salvation" state, and associated rough edges, is not exposed to users.


Fixing the resource tree was only half the solution, since the drivers 
that /use/ the resource tree now need updating.


Jeff




Re: [PATCH] cdrom: longer timeout for "Read Track Info" command

2007-01-02 Thread Jeremy Higdon
On Tue, Jan 02, 2007 at 02:50:53PM +0100, Jens Axboe wrote:
> Yep, I suspect this patch is long overdue. Jeremy, is this enough to fix
> it for you?

Yes, the 7 second timeout is fine.  It actually takes about 6.7 seconds.
I guess if "another popular OS" has a 7 second timeout that we won't find
multimedia devices out there that take longer than that.  :-)

My 15 seconds assumed that the observed case wasn't the worst case, but
it probably is.

This patch looks good.

Thanks

jeremy

> diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
> index 66d028d..3105ddd 100644
> --- a/drivers/cdrom/cdrom.c
> +++ b/drivers/cdrom/cdrom.c
> @@ -337,6 +337,12 @@ static const char *mrw_address_space[] = { "DMA", "GAA" 
> };
>  /* used in the audio ioctls */
>  #define CHECKAUDIO if ((ret=check_for_audio_disc(cdi, cdo))) return ret
>  
> +/*
> + * Another popular OS uses 7 seconds as the hard timeout for default
> + * commands, so it is a good choice for us as well.
> + */
> +#define CDROM_DEF_TIMEOUT (7 * HZ)
> +
>  /* Not-exported routines. */
>  static int open_for_data(struct cdrom_device_info * cdi);
>  static int check_for_audio_disc(struct cdrom_device_info * cdi,
> @@ -1528,7 +1534,7 @@ void init_cdrom_command(struct packet_command *cgc, void *buf, int len,
>   cgc->buffer = (char *) buf;
>   cgc->buflen = len;
>   cgc->data_direction = type;
> - cgc->timeout = 5*HZ;
> + cgc->timeout = CDROM_DEF_TIMEOUT;
>  }
>  
>  /* DVD handling */


Re: [2.6 patch] the scheduled IEEE1394_EXPORT_FULL_API removal

2007-01-02 Thread Stefan Richter
Adrian Bunk wrote:
>  Documentation/feature-removal-schedule.txt |8 ---
>  drivers/ieee1394/Kconfig   |7 --
>  drivers/ieee1394/ieee1394_core.c   |   22 -
>  3 files changed, 37 deletions(-)
> 
> --- linux-2.6.20-rc2-mm1/Documentation/feature-removal-schedule.txt.old   2007-01-02 21:09:52.0 +0100
> +++ linux-2.6.20-rc2-mm1/Documentation/feature-removal-schedule.txt   2007-01-02 21:10:00.0 +0100
> @@ -61,8 +60,0 @@
> -What:   ieee1394's *_oui sysfs attributes (CONFIG_IEEE1394_OUI_DB)
> -When:   January 2007
> -Files:  drivers/ieee1394/: oui.db, oui2c.sh
> -Why:    big size, little value
> -Who: Stefan Richter <[EMAIL PROTECTED]>
> -
> 
> -

This hunk is wrong.
-- 
Stefan Richter
-=-=-=== ---= ---==
http://arcgraph.de/sr/


Re: Shrink the held_lock struct by using bitfields.

2007-01-02 Thread Bodo Eggert
Dave Jones <[EMAIL PROTECTED]> wrote:

> Shrink the held_lock struct by using bitfields.
> This shrinks task_struct on lockdep enabled kernels by 480 bytes.

>  * The following field is used to detect when we cross into an
>  * interrupt context:
>  */
> - int irq_context;
[...]
> + unsigned char irq_context:1;
[...]

Can these fields be set by concurrent processes, e.g.:
CPU0            CPU1
load flags
                load flags
flip bit
store
                flip bit
                store

?
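
To make the question concrete, here is a minimal sketch that replays that
interleaving deterministically on one shared byte. The bit names and the
helper lost_update() are hypothetical; it only illustrates what a
read-modify-write on neighbouring bitfields can do, and says nothing about
whether held_lock is ever touched from two CPUs at once.

```c
#include <stdint.h>

#define BIT_A 0x01u   /* stands in for a one-bit field such as irq_context */
#define BIT_B 0x02u   /* hypothetical neighbouring one-bit field */

/* Replay the interleaving step by step on a single shared byte. */
static uint8_t lost_update(uint8_t shared)
{
    uint8_t cpu0 = shared;   /* CPU0: load flags */
    uint8_t cpu1 = shared;   /* CPU1: load flags */

    cpu0 ^= BIT_A;           /* CPU0: flip bit */
    shared = cpu0;           /* CPU0: store */

    cpu1 ^= BIT_B;           /* CPU1: flip bit */
    shared = cpu1;           /* CPU1: store, overwriting CPU0's change */

    return shared;
}
```

Starting from 0, the result has only BIT_B set: CPU0's flip of BIT_A is lost.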
-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

http://david.woodhou.se/why-not-spf.html


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Segher Boessenkool

Leaving aside the issue of in-memory or not, I don't think
it is realistic to think any completely common implementation
will work for this -- it might for current SPARC+PowerPC+OLPC,
but more stuff will be added over time...


I see nothing supporting this IMHO bogus claim.


Please keep in mind that not all systems want to kill OF
as soon as they enter the kernel -- some want to keep it
active basically forever (or only remove it when the user
asks for it).


Segher



Re: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Zach Brown


On Dec 29, 2006, at 6:31 PM, Chen, Kenneth W wrote:


The AIO wake-up notification from aio_complete is really inefficient
in current AIO implementation in the presence of process waiting in
io_getevents().


Yeah, it's a real deficiency.  Thanks for taking a stab at it.


This patch adds a wait condition to the wait queue and only wake-up
process when that condition meets.  And this condition is added on a
per task base for handling multi-threaded app that shares single  
ioctx.


But only one of the waiting tasks is tested, the one at the head of  
the list.  It looks like this change could starve a io_getevents()  
with a low min_nr in the presence of another io_getevents() with a  
larger min_nr.



Before:
 0  0  0 3972608   7056  3131200 14100 0 7885 13747  0  2 98  0

After:
 0  0  0 3972608   7056  3131200 13800 0 7885 42  0  2 98  0


Nice.  What min_nr was used in this test?


+struct aio_wait_queue {
+   int nr_wait;/* wake-up condition */


It appears that this is never assigned a negative?  Can we make it  
that explicit in the type so that we reviewers don't have to worry  
about wrapping and signed comparisons?



-   DECLARE_WAITQUEUE(wait, tsk);
+   struct aio_wait_queue wait;



+   aio_init_wait(&wait);


This just changed from using default_wake_function() to  
autoremove_wait_function().  Very sneaky!  wait_for_all_aios() should  
be adding the wait queue before going to sleep each time.  (better  
still to just use wait_event()).


Was this on purpose?  I'm all for it as a way to reduce wakeups from  
a stream of completions to a single waiter.



+   nr_evt = ring->tail - ring->head;
+   if (nr_evt < 0)
+   nr_evt += info->nr;


 int = unsigned - unsigned;
 if (int < 0)

My head already hurts.  Can we clean this up so one doesn't have to  
live and breathe type conversion rules to tell if this code is correct?
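
One way to sidestep the signed/unsigned mix, as a rough sketch only: keep
everything unsigned and branch on the comparison. ring_count() is a
hypothetical helper, not something in the patch, and it assumes head and
tail are both already less than nr.

```c
/*
 * Number of events between head and tail in a ring of nr slots.
 * Assumes head < nr and tail < nr; no signed arithmetic involved.
 */
static unsigned int ring_count(unsigned int head, unsigned int tail,
                               unsigned int nr)
{
    if (tail >= head)
        return tail - head;
    return nr - head + tail;   /* tail has wrapped around past the end */
}
```

With nr = 8, head = 6 and tail = 1 this gives 3, the same answer the
"add info->nr if negative" version is after.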



+   if (waitqueue_active(&ctx->wait)) {
+   struct aio_wait_queue *wait;
+   wait = container_of(ctx->wait.task_list.next,
+   struct aio_wait_queue, wait.task_list);
+   if (nr_evt >= wait->nr_wait)
+   wake_up(&ctx->wait);
+   }


First is the fear of starvation as mentioned previously.

issue 2 ops
first io_getevents sleeps with a min_nr of 2
second io_getevents sleeps with min_nr of 3
2 ops complete but only test the second sleeper's min_nr of 3
first sleeper twiddles thumbs

This makes me think this elegant task_list approach is doomed.  I  
think this is what stopped Ben and I from being interested in this  
last time we talked about it :).
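
The scenario above can be modelled in a few lines. This is a toy sketch, not
kernel code: struct waiter stands in for struct aio_wait_queue, the array
stands in for the wait queue with the most recent sleeper first, and both
function names are made up.

```c
#include <stddef.h>

struct waiter {              /* hypothetical stand-in for aio_wait_queue */
    unsigned int nr_wait;    /* wake-up condition (min_nr) */
};

/* Patch-style check: only the waiter at the head of the list is consulted. */
static int head_only_should_wake(const struct waiter *w, size_t n,
                                 unsigned int nr_evt)
{
    return n > 0 && nr_evt >= w[0].nr_wait;
}

/* Alternative: wake if any sleeping waiter's condition is satisfied. */
static int any_should_wake(const struct waiter *w, size_t n,
                           unsigned int nr_evt)
{
    for (size_t i = 0; i < n; i++)
        if (nr_evt >= w[i].nr_wait)
            return 1;
    return 0;
}
```

With the queue { min_nr 3, min_nr 2 } and 2 completed events, the head-only
check declines to wake anyone while the scan would.  The scan costs a walk
of the list on every completion and does nothing for the racing-wake-up
concern below.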


Also, is that container_of() and dereference safe in the presence of  
racing wake-ups?  It looks like we could deref a freed wait, get a  
bogus nr_wait, and decide not to wake.


Andrew, I fear we should remove this from -mm until it's fixed up.

- z


Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Segher Boessenkool

Not single thread -- but a "global OF lock" yes.  Not that
it matters too much, (almost) all property accesses are init
time anyway (which is effectively single threaded).


Not that true anymore. A lot of driver probe is being threaded
nowadays, either because of the new multithread probing bits, or
because they get loaded by userland from some initramfs etc..


The kernel doesn't care if one CPU is in OF land while the others
are doing other stuff -- well you have to make sure the OF won't
try to use a hardware device at the same time as the kernel, true.


True, but at the very least you have to prevent multiple cpus
from entering OFW.  In fact this is very important.


Yes.  "Global OF lock".


OFW is not multi-threaded


You are not _guaranteed_ it is multithreaded, and you don't
know its threading model (or how to do thread synchronisation).


therefore you can't let multiple CPUs call
into OFW at one time.  You must use some kind of locking mechanism,
and that locking mechanism is not simple because it has to not just
stop the other cpus, it has to be able to stop the other cpus yet
still allow them to receive SMP cross-calls from the firmware if the
OFW call is 'stop' or similar.


You don't need to *stop* the other CPUs, you just need to
prevent them from entering the client interface.  Put a lock
in front.
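
A rough userspace sketch of "put a lock in front", using C11 atomics rather
than any kernel primitive.  The names of_lock, of_call_fn, of_call_locked()
and probe_lock_held() are all hypothetical, and this ignores the question
of what a spinning CPU must still be able to service while it waits.

```c
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag of_lock = ATOMIC_FLAG_INIT;   /* hypothetical global OF lock */

typedef int (*of_call_fn)(void *arg);            /* stand-in for a client-interface call */

/* Spin until we own the lock, make the call, then release the lock. */
static int of_call_locked(of_call_fn fn, void *arg)
{
    int ret;

    while (atomic_flag_test_and_set_explicit(&of_lock, memory_order_acquire))
        ;   /* somebody else is inside the firmware; wait */
    ret = fn(arg);
    atomic_flag_clear_explicit(&of_lock, memory_order_release);
    return ret;
}

/* Example callee: returns 1 if the lock is held while it runs. */
static int probe_lock_held(void *arg)
{
    (void)arg;
    /* test_and_set returns the previous state; the flag stays set */
    return atomic_flag_test_and_set(&of_lock);
}
```

Every entry into the client interface would go through of_call_locked(), so
only one CPU is ever inside OF; the others keep running until they try to
enter themselves.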


I'm a bit concerned about the 100kB or so of data duplication
(on a *quite big* device tree), and the extra code you need
(all changes have to be done to both tree copies).  Maybe
I shouldn't be worried; still, it's obviously not a great
idea to *require* any arch to get and keep a full copy of
the tree -- it's wasteful and unnecessary.


The largest amount of memory I've ever seen consumed on sparc64
was 76K and this is 1) 64-bit and 2) an ENORMOUS machine with
lots of cpus and devices.  And I know because sparc64 prints
a kernel message at boot which states how much memory was
consumed by the in-kernel device tree copy.


The in-OF tree uses a bit more memory, depending on implementation.
It's hard to tell though, it contains so much more than the
properties-only tree, perhaps you're right.


Please let's get over this memory consumption non-issue and move
on to more productive talk.


Okay -- so answer the second part of my concern please: if you keep
a copy, you need to keep both in sync -- that means every change
by the kernel has to be done twice, and you won't ever be told about
changes by the OF, so you have to get a full fresh copy every single
time you return from an OF client call that could have changed a
property.


Segher



Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Benjamin Herrenschmidt

> The biggest problem is the huge collection of workarounds we have
> for PowerPC alone already -- if that can be moved into some
> quirk collection thing, where certain quirks are only run on some
> systems, it might scale.

Well, if you notice, I've been moving most of such workarounds to
prom_init.c, that is, to the code that actually fetches the device-tree
from the firmware, which make things easier.

There are still a few, notably in prom_parse.c, but I'm pretty confident
they aren't too invasive (and I need to backport more fixes from david
there) and in macio_asic, but the latter is specific enough to not be a
problem.

> You'll also have to deal with endianness finally (you can *not*
> access an integer property via an int*).

Yes, that's one big thing we'll need to do.
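
For illustration, a minimal sketch of reading one 32-bit cell without going
through an int*, assuming the usual Open Firmware encoding of integers as
big-endian byte sequences.  prop_read_be32() is a made-up name, not an
existing accessor.

```c
#include <stdint.h>

/* Decode one 32-bit big-endian cell from raw property bytes. */
static uint32_t prop_read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

This gives the same value on big- and little-endian hosts and does not care
about the alignment of the property buffer.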

> It will be easiest to start with a biggish collection of hooks,
> that doesn't require too much code change, and slowly converge
> stuff.

Hooks ? How so ?

It's easier to start merging powerpc and sparc I reckon and then, "fix"
that so that it works on x86 :-)

> All properties can be changed, any new property can be created.
> Oh you mean after you killed OF -- yeah, it gets a bit harder
> then eh :-)

You know very well what I meant :-) The ones where it makes sense to
write them back to OF. On machines where OF dies, there are mechanisms via
the nvram to store properties in /options (like boot-device etc...),
there are at least 2 such mechanisms (apple oldworld and chrp), and on
machines where OF is still alive, that should probably be an OF call of
some sort.

So we do want some sort of platform hook for -these-, but at the same
time, that's fairly low on the list. Currently, we have no code in the
kernel to deal with that on powerpc, it's all userland via /dev/nvram,
and I reckon it might just stay that way with platform specific userland
tools for a little longer.

Ben.
 



RE: Open letter to Linux kernel developers (was Re: Binary Drivers)

2007-01-02 Thread David Schwartz

> On Tue, 2007-01-02 at 12:14 -0800, David Schwartz wrote:
> > > The recommended _serving_ temperature for coffee is 55 °C or below.
> > 
> > Nonsense! 55C (100F) is ludicrously low for coffee.
> > 
> > 70C (125F) is the *minimum* recommended serving temperature.
> > 165-190F is the
> > preferred serving range. I can cite source after source for this. For
> > example:
> > http://www.bunn.com/pages/coffeebasics/cb6holding.html
> > http://www.millcreekcoffee.com/holding.htm
> 
> Do you actually read your citations? Your cited sources both give the
> SERVING temp as 155 - 175 F.

The conversion was incorrect. 70C is about 160F, and 55C is about 130F. As I 
said in the correction, every number is correct in the unit it was first posted 
in, and all the claims are correct.

160F is the minimum recommended serving temperature and 165-190F is the 
preferred range. 130F is a ludicrously low serving temperature for coffee. 180F 
seems to be about ideal.

Stella Liebeck's lawyers argued that coffee should never be served hotter than 
140F. This is no different from arguing that knives should be dull.

DS




FWA7304 VIA C3 system work-around for power-off bug.

2007-01-02 Thread Ben Greear

Older versions of the iBase FWA7304 BIOS have a bug that causes the
system to use way too much power when you run 'init 0', causing
the power brick to burn out after about 3 hours.

The fix for this is to get an updated BIOS from the manufacturer:
IB798F-T2-CP1A-1229

The problem still happens if you enable ACPI, but at least it seems
fixed if you disable ACPI.

Hope this helps someone!

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] Open Firmware device tree virtual filesystem

2007-01-02 Thread Segher Boessenkool

Leaving aside the issue of in-memory or not, I don't think
it is realistic to think any completely common implementation
will work for this -- it might for current SPARC+PowerPC+OLPC,
but more stuff will be added over time...


And ? I don't see why a mostly common implementation wouldn't work,
provided that we provide hooks in the right place.


Now read back and see that that is very close to what I said.


It's pretty clear to me that the actual construction of the in-memory
tree will remain platform specific (powerpc has this flattened format
used for the trampoline for example and so far, I don't think other
platforms plan to use it, though it might be a good idea too :-) sparc
has "issues" related to firmwares that aren't quite OF, etc...)

But it's also clear that the in-kernel representation, accessors and
filesystem could/should be totally identical, including all we build on
top, like prom_parse, of_device/of_platform device stuff etc.. (for
which I need to re-sync with davem too btw, as he did some fixes that I
didn't backport to powerpc... sigh)


The biggest problem is the huge collection of workarounds we have
for PowerPC alone already -- if that can be moved into some
quirk collection thing, where certain quirks are only run on some
systems, it might scale.

You'll also have to deal with endianness finally (you can *not*
access an integer property via an int*).

It will be easiest to start with a biggish collection of hooks,
that doesn't require too much code change, and slowly converge
stuff.


The other -one- thing that has to be different is the write back for
properties that can be changed (/options typically) where the write back
mechanism is definitely platform specific.


All properties can be changed, any new property can be created.
Oh you mean after you killed OF -- yeah, it gets a bit harder
then eh :-)


Segher



Re: [PATCH] libata: fix combined mode (was Re: Happy New Year (and v2.6.20-rc3 released))

2007-01-02 Thread Alan
> Thus, if you avoid calling pci_request_regions (as your patch does), you 
> must manually provide the same guarantees that pci_request_regions 
> provides to its callers.

pci_request_regions reserves only BAR4/BAR5 in legacy mode because of the
fact the resources are mashed and eventually cleared by the existing (pre
2.6.20-rc) PCI code. The new code does provide that guarantee which is
(unfortunately) precisely why you get the problem - because the combined
mode hack currently relies on it failing to do so.

Alan


Re: [PATCH 3/2] fix flush_workqueue() vs CPU_DEAD race

2007-01-02 Thread Andrew Morton
On Sat, 30 Dec 2006 19:10:31 +0300
Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> "[PATCH 1/2] reimplement flush_workqueue()" fixed one race when CPU goes down
> while flush_cpu_workqueue() plays with it. But there is another problem, CPU
> can die before flush_workqueue() has a chance to call flush_cpu_workqueue().
> In that case pending work_structs can migrate to CPU which was already
> checked, so we should redo the "for_each_online_cpu(cpu)" loop.
> 

I have a mental note that these:

extend-notifier_call_chain-to-count-nr_calls-made.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes.patch
extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release.patch
define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc.patch
eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch
handle-cpu_lock_acquire-and-cpu_lock_release-in-workqueue_cpu_callback.patch

should be scrapped.  But really I forget what their status is.  Gautham,
can you please remind us where we're at?



Re: [PATCH] libata: fix combined mode (was Re: Happy New Year (and v2.6.20-rc3 released))

2007-01-02 Thread Alan
>   1) Programmatically reserve /all/ resources associated with
>  our PCI device
>   2) Manually reserve resources associated with our PCI device,
>  but are not listed in struct pci_dev.

Which it doesn't actually do. You reserve 0x1F0-0x1F7 but forget the
other register. BTW on unload you forget to release 0x1F0-7 too. I've not
fixed that as it's not a regression, just a bug. It's just another symptom
of a pile of code trying to do things the wrong way, in the wrong place.

> But then 2.6.21 goes back to:
> 
>   1) Programmatically reserve /all/ resources associated with
>  our PCI device
>   2) Manually reserve resources associated with our PCI device,
>  but are not listed in struct pci_dev.

Nope. You are either very confused about how PCI bus resources work or
you are trying to implement the future code in a very very peculiar way 8)

Remember with the resource tree now correct all the resources for an IDE
controller *are* in the pci_dev struct properly - the special cases are
all gone in libata and in drivers/ide.

Once combined mode is fixed not to abuse resources (and it originally
did it that way for a good reason I grant and am not criticising that) the
entire management for legacy mode, mixed mode and native mode resources
for an ATA device (including 0x170, 0x3F6 and other wacky magic) becomes

if (pci_request_regions(pdev, "libata")) ...

You'll note:
- No special cases for differing modes
- No libata knowledge of PCI legacy mapping rules and addresses
- The death of the magic ATA_PRIMARY/SECONDARY constants and their magic
numbers
- Support for platforms that map legacy space differently
- Trivial cleanup from failure unlike the current code

all in one line. This will also fix all the existing bugs where unloading
a libata driver fails to free resources as pci_release_regions() will also
now do the correct thing.

*That* is one key reason why getting the PCI resource map right is so
important. We turn fifty lines of bug ridden hard to debug code into one
line of code that actually does more than the original, and gets it
right. For free we get the leaked resources after rmmod fixed, we get the
mixed mode resources fixed, we get all this stuff for free. We get to
shoot a chunk of code in drivers/ide if we want as well.

If we want to keep a combined mode in 2.6.21 with drivers/ide (which
seems dumb as libata has progressed far beyond the need for it) then the
-mm tree has a pci_request_regions_mask() function which we can push
to .21 development and we end up with five lines not three for these
cases.

Make sense ?

Alan


Re: [parisc-linux] [RFC][PATCH] use cycle_t instead of u64 in struct

2007-01-02 Thread John David Anglin
> The 32bit and 64bit PARISC Linux kernels suffer from the problem that the 
> gettimeofday() call sometimes returns non-monotonic times.

This certainly needs to be fixed.  I see stuff like this from ping:

64 bytes from 132.246.100.193: icmp_seq=19 ttl=255 time=0.4 ms
64 bytes from 132.246.100.193: icmp_seq=20 ttl=255 time=429496729.5 ms

tar also occasionally prints warnings about times.  This is with a
32bit kernel.
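
One possible reading of that number, sketched below: 429496729.5 ms is
4294967295 (2^32 - 1) tenths of a millisecond, which is what a difference of
-1 looks like once it lands in an unsigned 32-bit value.  This assumes the
round-trip time ends up in such a variable somewhere along the way; I have
not checked how ping actually computes it, and format_rtt() is a made-up
helper.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a round-trip time given in tenths of a millisecond, after
 * forcing it through an unsigned 32-bit value.
 */
static void format_rtt(int32_t tenths_ms, char *buf, size_t len)
{
    uint32_t u = (uint32_t)tenths_ms;   /* negative values wrap modulo 2^32 */

    snprintf(buf, len, "%u.%u ms", (unsigned)(u / 10), (unsigned)(u % 10));
}
```

Calling format_rtt(4, ...) yields "0.4 ms", and format_rtt(-1, ...) yields
"429496729.5 ms", which matches the two lines above if the clock stepped
backwards by a hair between send and receive.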

Dave
-- 
J. David Anglin  [EMAIL PROTECTED]
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)


Re: [PATCH] ppc: vio of_node_put cleanup

2007-01-02 Thread Segher Boessenkool

The comment used to be inside the "if" block, is this
change correct?


You'd prefer an empty line in there?


Obviously, you should change the comment to include the
conditional, if that is what is needed.


[And, do we want all these changes anyway?  I don't care
either way, both sides have their pros and their cons --
just asking :-) ]


You know my opinion already :-)


Heh.  Ok, I'll rephrase: is there _consensus_ that this is a
good thing :-)  [But never mind, I looked it up, and it is
*documented* as being supported, so fine with me].


Segher



Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again

2007-01-02 Thread David Miller
From: James Bottomley <[EMAIL PROTECTED]>
Date: Tue, 02 Jan 2007 17:34:18 -0600

> Erm ... for a device driver, if we're preparing to do I/O on the page
> something must have made the user caches coherent ... that can't be
> kmap, because the driver might elect to DMA on the page ... unless
> another component of this API is going to be to make dma_map_... also
> flush the user cache?

The DMA map/unmap/sync performs the necessary cache flushes.


[PATCH 2/2] EDAC: K8 Memory scrubbing patch

2007-01-02 Thread Doug Thompson
from:   Frithiof Jensen <[EMAIL PROTECTED]>

 This patch is meant for Kernel version 2.6.19
 
 This is a first, naive, attempt of providing an interface for memory 
 scrubbing in EDAC.
 
 The following things are still outstanding:
 
 - Only the K8 driver has been refactored with a HW scrub function
 
   The patch provides a method of configuring the K8 hardware memory
   scrubber via the 'mcX' sysfs directory. There should be some 
   fallback to a generic scrubber implemented in software if the 
   hardware does not support scrubbing.
 
   Or .. the scrubbing sysfs entry should not be visible at all.
 
 - Only works with SDRAM,
 
   The K8 can scrub cache and l2cache also - but I think this is
   not so useful as the cache is busy all the time (one hopes). 
 
   One would also expect that cache scrubbing requires hardware
   support.
 
 - Error Handling, 
 
   I would like that errors are returned to the user in 
   "terms of file system". 
 
 - Presentation,
 
   I chose Bandwidth in Bytes/Second as a representation of
   the scrubbing rate for the following reasons:
 
   I like that the sysfs entries are sort-of textual, related
   to something that makes sense instead of magical values
   that must be looked up.
 
   "My People" want "% main memory scrubbed per hour", others
   prefer "% memory bandwidth used" as representation; "bandwidth
   used" makes it easy to calculate both versions in one-liner
   scripts.
 
   If one later wants to scrub cache, the scaling becomes weird
   for K8 changing from "blocks of 64 byte memory" to "blocks of
   64 cache lines" to "blocks of 64 bit". Using "bandwidth used"
   makes sense in all three cases, (I.M.O. anyway ;-).
 
 - Discovery,
 
   There is no way to discover the possible settings and what 
   they do without reading the code and the documentation. 
   
   *I* do not know how to make that work in a practical way.
 
 - Bugs(??),
 
   other tools can set invalid values in the memory scrub 
   control register, those will read back as '-1', requiring
   the user to reset the scrub rate. This is how *I* think it
   should be.  
 
 - Afflicting other areas of code,
 
   I made changes to edac_mc.c and edac_mc.h which will show up
   globally - this is not nice, it would be better that the
   memory scrubbing functionality and interface could be entirely
   contained within the memory controller it applies to.
 
 
Signed-off-by: Frithiof Jensen <[EMAIL PROTECTED]>
Signed-off-by: doug thompson <[EMAIL PROTECTED]>
 
 k8_edac.c |  135 ++
 1 file changed, 135 insertions(+)

Index: linux-2.6.19/drivers/edac/k8_edac.c
===
--- linux-2.6.19.orig/drivers/edac/k8_edac.c
+++ linux-2.6.19/drivers/edac/k8_edac.c
@@ -254,6 +254,17 @@
 *  7:0  Err addr high 39:32
 */
 
+#define K8_SCRCTRL  0x58/* Memory scrub control register.
+*
+* 30:21 reserved
+* 20:16 dcache scrub
+* 15:13 reserved
+* 12:8  L2Scrub
+* 7:5   reserved
+* 4:0   dramscrub
+*
+*/
+
 #define K8_NBCAP   0xE8/* MCA NB capabilities (32b)
 *
 * 31:9  reserved
@@ -412,6 +423,45 @@ static const struct k8_dev_info k8_devs[
 .misc_ctl = PCI_DEVICE_ID_AMD_OPT_3_MISCCTL},
 };
 
+/* Valid scrub rates for the K8 hardware memory scrubber. We map
+   the scrubbing bandwidth to a valid bit pattern. The 'set'
+   operation finds the 'matching or higher value'.
+
+   FIXME: Produce a better mapping/linearisation.
+*/
+
+# define SDRATE_EOD 0x
+
+static struct scrubrate {
+   u32 scrubval;   /* bit pattern for scrub rate */
+   u32 bandwidth;  /* bandwidth consumed by scrubbing in bytes/sec */
+} scrubrates[] = {
+   {0x00,  0UL},   /* Scrubbing Off */
+   {0x16,761UL},   /* Slowest Rate  */
+   {0x15,   1523UL},
+   {0x14,   3051UL},
+   {0x13,   6101UL},
+   {0x12,  12213UL},
+   {0x11,  24427UL},
+   {0x10,  48854UL},
+   {0x0F,  97650UL},
+   {0x0E, 195300UL},
+   {0x0D, 390720UL},
+   {0x0C, 781440UL},
+   {0x0B,1560975UL},
+   {0x0A,3121951UL},
+   {0x09,6274509UL},
+   {0x08,   12284069UL},
+   {0x07,   25000000UL},
+   {0x06,   50000000UL},
+   {0x05,  100000000UL},
+   {0x04,  200000000UL},
+   {0x03,  400000000UL},
+   {0x02,  800000000UL},
+   {0x01, 1600000000UL},
+   {0x00, SDRATE_EOD}  /* End Of Data */
+};
+
 static struct 
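
As a sketch of the 'matching or higher value' lookup the table comment
describes (the patch text is cut off here, so this is not the patch's own
code): walk the table in ascending bandwidth order and take the first entry
that is large enough.  Only a short excerpt of the table is used, and
scrubval_for() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

struct scrubrate {
    uint32_t scrubval;    /* bit pattern for the scrub rate */
    uint32_t bandwidth;   /* bandwidth consumed by scrubbing, bytes/sec */
};

/* Excerpt of the table above, in ascending bandwidth order. */
static const struct scrubrate rates[] = {
    {0x16, 761}, {0x15, 1523}, {0x14, 3051}, {0x13, 6101}, {0x12, 12213},
};

/*
 * Return the scrubval of the first entry whose bandwidth is at least the
 * requested one, or -1 if the request exceeds every entry in the excerpt.
 */
static int scrubval_for(uint32_t bandwidth)
{
    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        if (rates[i].bandwidth >= bandwidth)
            return (int)rates[i].scrubval;
    return -1;
}
```

A request for 1000 bytes/sec maps to 0x15 (1523 bytes/sec).  A request of 0,
meaning "scrubbing off", would need to be handled before the walk, since the
excerpt leaves out that entry.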

[PATCH 2/2] EDAC: e752x-byte-access-fix

2007-01-02 Thread Doug Thompson
from: Brian Pomerantz <[EMAIL PROTECTED]>

Source: MontaVista Software, Inc.
MR: 17525
Type: Defect Fix
Disposition: local
Description:
The reading of the DRA registers should be a byte at a time (one
register at a time) instead of 4 bytes at a time (four registers).
Reading a dword at a time retrieves erroneous information from all
but the first register.  A change was made to read in each
register in a loop prior to using the data in those registers.

Signed-off-by: Brian Pomerantz <[EMAIL PROTECTED]>
Signed-off-by: Dave Jiang <[EMAIL PROTECTED]>
Signed-off-by: Doug Thompson <[EMAIL PROTECTED]>

 e752x_edac.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


Index: linux-2.6.18/drivers/edac/e752x_edac.c
===
--- linux-2.6.18.orig/drivers/edac/e752x_edac.c
+++ linux-2.6.18/drivers/edac/e752x_edac.c
@@ -787,7 +787,12 @@ static void e752x_init_csrows(struct mem
u8 value;
u32 dra, drc, cumul_size;
 
-   pci_read_config_dword(pdev, E752X_DRA, &dra);
+   dra = 0;
+   for (index=0; index < 4; index++) {
+   u8 dra_reg;
+   pci_read_config_byte(pdev, E752X_DRA+index, &dra_reg);
+   dra |= dra_reg << (index * 8);
+   }
 pci_read_config_dword(pdev, E752X_DRC, &drc);
drc_chan = dual_channel_active(ddrcsr);
drc_drbg = drc_chan + 1;  /* 128 in dual mode, 64 in single */
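
The byte-at-a-time assembly can be tried outside the kernel with a fake
config space.  This is only a sketch: fake_cfg and fake_read_config_byte()
are hypothetical stand-ins for the device and pci_read_config_byte(), and
read_dra() is a made-up wrapper around the loop from the patch.

```c
#include <stdint.h>

/* Hypothetical fake config space, one byte per register. */
static uint8_t fake_cfg[256];

/* Stand-in for pci_read_config_byte() operating on the fake space. */
static void fake_read_config_byte(unsigned int where, uint8_t *val)
{
    *val = fake_cfg[where & 0xff];
}

/* Read four consecutive byte registers; lowest offset lands in bits 7:0. */
static uint32_t read_dra(unsigned int base)
{
    uint32_t dra = 0;

    for (unsigned int index = 0; index < 4; index++) {
        uint8_t reg;

        fake_read_config_byte(base + index, &reg);
        /* widen before shifting so bit 31 is never reached via a signed int */
        dra |= (uint32_t)reg << (index * 8);
    }
    return dra;
}
```

The cast is the one difference from the hunk above: without it, a register
value of 0x80 or more in the fourth byte is shifted left by 24 as a plain
int.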



Subject: [PATCH 2/2] EDAC: K8 Memory scrubbing patch

2007-01-02 Thread Doug Thompson
from:   Frithiof Jensen <[EMAIL PROTECTED]>

 This patch is meant for Kernel version 2.6.19
 
 This is a first, naive, attempt of providing an interface for memory 
 scrubbing in EDAC.
 
 The following things are still outstanding:
 
 - Only the K8 driver has been refactored with a HW scrub function
 
   The patch provide a method of configuring the K8 hardware memory 
   scrubber via the 'mcX' sysfs directory. There should be some 
   fallback to a generic scrubber implemented in software if the 
   hardware does not support scrubbing.
 
   Or .. the scrubbing sysfs entry should not be visible at all.
 
 - Only works with SDRAM,
 
   The K8 can scrub cache and l2cache also - but I think this is
   not so useful as the cache is busy all the time (one hopes). 
 
   One would also expect that cache scrubbing requires hardware
   support.
 
 - Error Handling, 
 
   I would like that errors are returned to the user in 
   "terms of file system". 
 
 - Presentation, 
 
   I chose Bandwidth in Bytes/Second as a representation of 
   the scrubbing rate for the following reasons:
 
   I like that the sysfs entries are sort-of textual, related 
   to something that makes sense instead of magical values 
   that must be looked up. 
 
   "My People" wants "% main memory scrubbed per hour" others 
   prefer "% memory bandwidth used" as representation, "bandwith 
   used" makes it easy to calculate both versions in one-liner 
   scripts.  
 
   If one later wants to scrub cache, the scaling becomes wierd
   for K8 changing from "blocks of 64 byte memory" to "blocks of
   64 cache lines" to "blocks of 64 bit". Using "bandwidth used" 
   makes sense in all three cases, (I.M.O. anyway ;-).
 
 - Discovery,
 
   There is no way to discover the possible settings and what 
   they do without reading the code and the documentation. 
   
   *I* do not know how to make that work in a practical way.
 
 - Bugs(??),
 
   other tools can set invalid values in the memory scrub 
   control register, those will read back as '-1', requiring
   the user to reset the scrub rate. This is how *I* think it
   should be.  
 
 - Affecting other areas of code.

   I made changes to edac_mc.c and edac_mc.h which will show up
   globally. This is not nice; it would be better if the
   memory scrubbing functionality and interface could be entirely
   contained within the memory controller it applies to.
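
The "one-liner" conversions mentioned under Presentation can be sketched as follows. This is only an illustration in plain userspace C, not part of the patch; the function names are hypothetical, and the memory size and total bus bandwidth are assumed to be known to the caller.

```c
#include <stdint.h>

/* Percentage of main memory covered by the scrubber in one hour,
 * given a scrub bandwidth in bytes/second and the memory size in bytes. */
double scrub_pct_memory_per_hour(uint64_t bandwidth, uint64_t mem_bytes)
{
    if (mem_bytes == 0)
        return 0.0;
    return 100.0 * 3600.0 * (double)bandwidth / (double)mem_bytes;
}

/* Percentage of the total memory bandwidth consumed by scrubbing. */
double scrub_pct_bandwidth_used(uint64_t bandwidth, uint64_t total_bandwidth)
{
    if (total_bandwidth == 0)
        return 0.0;
    return 100.0 * (double)bandwidth / (double)total_bandwidth;
}
```

Both values derive from the single bytes/second figure, which is the argument made above for exposing bandwidth rather than a register bit pattern.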
 
 
Signed-off-by: Frithiof Jensen <[EMAIL PROTECTED]>
Signed-off-by: doug thompson <[EMAIL PROTECTED]>
 
 k8_edac.c |  135
++
 1 file changed, 135 insertions(+)

Index: linux-2.6.19/drivers/edac/k8_edac.c
===
--- linux-2.6.19.orig/drivers/edac/k8_edac.c
+++ linux-2.6.19/drivers/edac/k8_edac.c
@@ -254,6 +254,17 @@
 *  7:0  Err addr high 39:32
 */
 
+#define K8_SCRCTRL  0x58/* Memory scrub control register.
+*
+* 30:21 reserved
+* 20:16 dcache scrub
+* 15:13 reserved
+* 12:8  L2Scrub
+* 7:5   reserved
+* 4:0   dramscrub
+*
+*/
+
 #define K8_NBCAP   0xE8/* MCA NB capabilities (32b)
 *
 * 31:9  reserved
@@ -412,6 +423,45 @@ static const struct k8_dev_info k8_devs[
 .misc_ctl = PCI_DEVICE_ID_AMD_OPT_3_MISCCTL},
 };
 
+/* Valid scrub rates for the K8 hardware memory scrubber. We map
+   the scrubbing bandwidth to a valid bit pattern. The 'set'
+   operation finds the 'matching or higher' value.
+
+   FIXME: Produce a better mapping/linearisation.
+*/
+
+# define SDRATE_EOD 0xFFFFFFFF
+
+static struct scrubrate {
+   u32 scrubval;   /* bit pattern for scrub rate */
+   u32 bandwidth;  /* bandwidth consumed by scrubbing in bytes/sec */
+} scrubrates[] = {
+   {0x00,  0UL},   /* Scrubbing Off */
+   {0x16,761UL},   /* Slowest Rate  */
+   {0x15,   1523UL},
+   {0x14,   3051UL},
+   {0x13,   6101UL},
+   {0x12,  12213UL},
+   {0x11,  24427UL},
+   {0x10,  48854UL},
+   {0x0F,  97650UL},
+   {0x0E, 195300UL},
+   {0x0D, 390720UL},
+   {0x0C, 781440UL},
+   {0x0B,1560975UL},
+   {0x0A,3121951UL},
+   {0x09,6274509UL},
+   {0x08,   12284069UL},
+   {0x07,   25000000UL},
+   {0x06,   50000000UL},
+   {0x05,  100000000UL},
+   {0x04,  200000000UL},
+   {0x03,  400000000UL},
+   {0x02,  800000000UL},
+   {0x01, 1600000000UL},
+   {0x00, SDRATE_EOD}  /* End Of Data */
+};
+
 static struct 
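
The "matching or higher value" lookup described in the table comment could look roughly like the sketch below. It is plain userspace C, not the code from the truncated patch: the table is only an excerpt of the slower rates, the function name is hypothetical, and clamping to the fastest listed rate when the request exceeds every entry is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

struct scrubrate {
    uint32_t scrubval;   /* bit pattern for scrub rate */
    uint32_t bandwidth;  /* bandwidth consumed by scrubbing in bytes/sec */
};

/* Excerpt of the patch's table, sorted by ascending bandwidth. */
const struct scrubrate scrubrates_excerpt[] = {
    {0x00,     0u},   /* scrubbing off */
    {0x16,   761u},   /* slowest rate */
    {0x15,  1523u},
    {0x14,  3051u},
    {0x13,  6101u},
    {0x12, 12213u},
    {0x11, 24427u},
    {0x10, 48854u},
};

/* Return the bit pattern of the first entry whose bandwidth is equal to
 * or higher than the requested one. Assumption: a request above the
 * fastest entry is clamped to that entry. */
uint32_t scrubval_for_bandwidth(uint32_t requested)
{
    size_t n = sizeof(scrubrates_excerpt) / sizeof(scrubrates_excerpt[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (scrubrates_excerpt[i].bandwidth >= requested)
            return scrubrates_excerpt[i].scrubval;
    }
    return scrubrates_excerpt[n - 1].scrubval;
}
```

Because a request is rounded up to the next table entry, the value read back can be larger than the value written.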

[PATCH 1/2] EDAC: e752x-bit-mask-fix

2007-01-02 Thread Doug Thompson
from: Brian Pomerantz <[EMAIL PROTECTED]>

Description:
The fatal vs. non-fatal mask for the sysbus FERR status is incorrect
according to the E7520 datasheet.  This patch corrects the mask so that
fatal and non-fatal errors are handled properly.

Signed-off-by: Brian Pomerantz <[EMAIL PROTECTED]>
Signed-off-by: Dave Jiang <[EMAIL PROTECTED]>
Signed-off-by: Doug Thompson <[EMAIL PROTECTED]>

 e752x_edac.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux-2.6.18/drivers/edac/e752x_edac.c
===
--- linux-2.6.18.orig/drivers/edac/e752x_edac.c
+++ linux-2.6.18/drivers/edac/e752x_edac.c
@@ -561,17 +561,17 @@ static void e752x_check_sysbus(struct e7
error32 = (stat32 >> 16) & 0x3ff;
stat32 = stat32 & 0x3ff;
 
-   if(stat32 & 0x083)
-   sysbus_error(1, stat32 & 0x083, error_found, handle_error);
+   if(stat32 & 0x087)
+   sysbus_error(1, stat32 & 0x087, error_found, handle_error);
 
-   if(stat32 & 0x37c)
-   sysbus_error(0, stat32 & 0x37c, error_found, handle_error);
+   if(stat32 & 0x378)
+   sysbus_error(0, stat32 & 0x378, error_found, handle_error);
 
-   if(error32 & 0x083)
-   sysbus_error(1, error32 & 0x083, error_found, handle_error);
+   if(error32 & 0x087)
+   sysbus_error(1, error32 & 0x087, error_found, handle_error);
 
-   if(error32 & 0x37c)
-   sysbus_error(0, error32 & 0x37c, error_found, handle_error);
+   if(error32 & 0x378)
+   sysbus_error(0, error32 & 0x378, error_found, handle_error);
 }
 
 static void e752x_check_membuf (struct e752x_error_info *info,
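
To see what the mask change does: both the old pair (0x083, 0x37c) and the new pair (0x087, 0x378) partition the ten status bits 0x3ff, and the patch moves bit 2 (0x004) from the non-fatal group to the fatal group. A minimal sketch in plain userspace C, assuming the first argument of sysbus_error() selects fatal (1) versus non-fatal (0); the macro, struct and function names are hypothetical and not from the driver:

```c
#include <stdint.h>

#define SYSBUS_STATUS_BITS   0x3ffu
#define SYSBUS_FATAL_OLD     0x083u
#define SYSBUS_NONFATAL_OLD  0x37cu
#define SYSBUS_FATAL_NEW     0x087u
#define SYSBUS_NONFATAL_NEW  0x378u

struct sysbus_split {
    uint32_t fatal;      /* bits reported as fatal errors */
    uint32_t nonfatal;   /* bits reported as non-fatal errors */
};

/* Split a 10-bit sysbus status value using the corrected masks. */
struct sysbus_split split_sysbus_status(uint32_t stat)
{
    struct sysbus_split s;

    stat &= SYSBUS_STATUS_BITS;
    s.fatal = stat & SYSBUS_FATAL_NEW;
    s.nonfatal = stat & SYSBUS_NONFATAL_NEW;
    return s;
}

/* Bits whose classification changed between the old and new masks. */
uint32_t reclassified_bits(void)
{
    return SYSBUS_FATAL_OLD ^ SYSBUS_FATAL_NEW;
}
```

With the old masks, a status of 0x004 would have been reported through the non-fatal call; with the new ones it goes through the fatal call.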



Re: [2.6 patch] the scheduled eepro100 removal

2007-01-02 Thread Eric Piel

02.01.2007 22:57, Adrian Bunk wrote:

> This patch contains the scheduled removal of the eepro100 driver.

Hi, I've been using e100 for years with no problems. However, more out of
curiosity than necessity, I'd like to know how the devices that are
(supposedly) supported by eepro100 but not by e100 will be handled.

According to "modinfo eepro100" and "modinfo e100", these device IDs are
matched only by eepro100:

+alias:  pci:v8086d1035sv
+alias:  pci:v8086d1036sv
+alias:  pci:v8086d1037sv
+alias:  pci:v8086d1227sv
+alias:  pci:v8086d5200sv
+alias:  pci:v8086d5201sv

Are they matched by some wildcard rule that I haven't noticed in e100, or
is support for them really going to disappear?
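
For readers unfamiliar with the idea, a wildcard rule in a PCI ID table would behave roughly as sketched below. This is a userspace illustration only: ID_ANY mimics the kernel's PCI_ANY_ID, the struct, tables and function names are hypothetical, and nothing here says whether e100 actually contains such a rule. The device ID 0x1035 is read from the first alias in the list above.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_ANY 0xffffffffu   /* hypothetical wildcard, in the spirit of PCI_ANY_ID */

struct pci_match {
    uint32_t vendor;
    uint32_t device;
};

/* A rule matches if each field is either the wildcard or equal. */
int id_matches(const struct pci_match *rule, uint32_t vendor, uint32_t device)
{
    return (rule->vendor == ID_ANY || rule->vendor == vendor) &&
           (rule->device == ID_ANY || rule->device == device);
}

/* Return 1 if any rule in the table matches the given device. */
int table_matches(const struct pci_match *table, size_t n,
                  uint32_t vendor, uint32_t device)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (id_matches(&table[i], vendor, device))
            return 1;
    }
    return 0;
}

/* Hypothetical tables: one with an exact entry, one with a wildcard. */
const struct pci_match exact_table[] = { {0x8086, 0x1229} };
const struct pci_match wildcard_table[] = { {0x8086, ID_ANY} };
```

A table with only exact entries would miss 8086:1035, while a vendor-wide wildcard entry would claim it; an alias generated from such an entry would show a wildcard in place of the device number.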


See you,
Eric


RE: Open letter to Linux kernel developers (was Re: Binary Drivers)

2007-01-02 Thread Brian Beattie
On Tue, 2007-01-02 at 12:14 -0800, David Schwartz wrote:
> > The recommended _serving_ temperature for coffee is 55 °C or below.
> 
> Nonsense! 55C (100F) is ludicrously low for coffee.
> 
> 70C (125F) is the *minimum* recommended serving temperature. 165-190F is the
> preferred serving range. I can cite source after source for this. For
> example:
> http://www.bunn.com/pages/coffeebasics/cb6holding.html
> http://www.millcreekcoffee.com/holding.htm

Do you actually read your citations? Your cited sources both give the
SERVING temp as 155 - 175 F.
-- 
Brian Beattie
Firmware Engineer
APCON, Inc.
[EMAIL PROTECTED]



Bus #06 (-#09) is hidden behind transparent bridge

2007-01-02 Thread Michael Ringe
I am posting this because my kernel told me so...
The attached file contains dmesg's with and without pci-assign-busses.
My hardware is a Samsung Q35 Pro. Feel free to contact me
if you need any further information.
--Michael






dmesg.bz2
Description: Binary data


inaccurate migration cost calculation?

2007-01-02 Thread Dave Jones
Across different boots using the same 2.6.19 kernel on a quad-core xeon
I see huge variance in the migration_cost being reported during boot.

-migration_cost=39,3940
+migration_cost=25,4941

This CPU has a very large cache which could be key here...
 L1 Instruction cache: 32KB, 8-way associative. 64 byte line size.
 L1 Data cache: 32KB, 8-way associative. 64 byte line size.
 L3 unified cache: 4MB, 16-way associative. 64 byte line size.


Here's the output of migration_debug=1 for another two boots..

-> [0][1][  65536]   0.0 [  0.0] (1): (   42507    21253)
-> [0][1][  72817]   0.0 [  0.0] (1): (   46350    12548)
-> [0][1][  80907]   0.0 [  0.0] (1): (   58571    12384)
-> [0][1][  89896]   0.0 [  0.0] (1): (   57738     6608)
-> [0][1][  99884]   0.0 [  0.0] (1): (   60743     4806)
-> [0][1][ 110982]   0.0 [  0.0] (1): (   67393     5728)
-> [0][1][ 123313]   0.0 [  0.0] (1): (   72416     5375)
-> [0][1][ 137014]   0.0 [  0.0] (1): (   73890     3424)
-> [0][1][ 152237]   0.0 [  0.0] (1): (   75735     2634)
-> [0][1][ 169152]   0.0 [  0.0] (1): (   91870     9384)
-> [0][1][ 187946]   0.0 [  0.0] (1): (   88701     6276)
-> [0][1][ 208828]   0.1 [  0.1] (1): (  102106     9840)
-> [0][1][ 232031]   0.1 [  0.1] (1): (  112433    10083)
-> [0][1][ 257812]   0.1 [  0.1] (1): (  118441     8045)
-> [0][1][ 286457]   0.1 [  0.1] (1): (  126302     7953)
-> [0][1][ 318285]   0.1 [  0.1] (1): (  135442     8546)
-> [0][1][ 353650]   0.1 [  0.1] (1): (  165438    19271)
-> [0][1][ 392944]   0.1 [  0.1] (1): (  196143    24988)
-> [0][1][ 436604]   0.1 [  0.1] (1): (  180626    20252)
-> [0][1][ 485115]   0.2 [  0.2] (1): (  209017    24321)
-> [0][1][ 539016]   0.2 [  0.2] (1): (  231664    23484)
-> [0][1][ 598906]   0.2 [  0.2] (1): (  277569    34694)
-> [0][1][ 665451]   0.2 [  0.2] (1): (  276351    17956)
-> [0][1][ 739390]   0.3 [  0.3] (1): (  318707    30156)
-> [0][1][ 821544]   0.3 [  0.3] (1): (  382865    47157)
-> [0][1][ 912826]   0.3 [  0.3] (1): (  398566    31429)
-> [0][1][1014251]   0.4 [  0.4] (1): (  441757    37310)
-> [0][1][1126945]   0.5 [  0.5] (1): (  516100    55826)
-> [0][1][1252161]   0.5 [  0.5] (1): (  543549    41637)
-> [0][1][1391290]   0.6 [  0.6] (1): (  603068    50578)
-> [0][1][1545877]   0.6 [  0.6] (1): (  637083    42296)
-> [0][1][1717641]   0.8 [  0.8] (1): (  800596   102904)
-> [0][1][1908490]   0.8 [  0.8] (1): (  856786    79547)
-> [0][1][2120544]   0.9 [  0.9] (1): (  919895    71328)
-> [0][1][2356160]   0.9 [  0.9] (1): (  973085    62259)
-> [0][1][2617955]   1.1 [  1.1] (1): ( 1114043   101608)
-> [0][1][2908838]   1.2 [  1.2] (1): ( 1219850   103707)
-> [0][1][3232042]   1.3 [  1.3] (1): ( 1327969   105913)
-> [0][1][3591157]   1.4 [  1.4] (1): ( 1463296   120620)
-> [0][1][3990174]   1.6 [  1.6] (1): ( 1630438   143881)
-> [0][1][4433526]   1.7 [  1.7] (1): ( 1729272   121357)
-> [0][1][4926140]   1.7 [  1.7] (1): ( 1733278    62681)
-> [0][1][5473488]   1.9 [  1.9] (1): ( 1958482   143942)
-> [0][1][6081653]   1.8 [  1.9] (1): ( 1872926   114749)
-> [0][1][6757392]   1.7 [  1.9] (1): ( 1713013   137331)
-> [0][1][7508213]   1.3 [  1.9] (1): ( 1392757   228793)
-> [0][1][8342458]   1.1 [  1.9] (1): ( 1116378   252586)
-> found max.
[0][1] working set size found: 5473488, cost: 1958482
-> [0][2][  65536]   0.0 [  0.0] (0): (    9187     4593)
-> [0][2][  72817]   0.0 [  0.0] (0): (   13719     4562)
-> [0][2][  80907]   0.0 [  0.0] (0): (    8008     5136)
-> [0][2][  89896]   0.0 [  0.0] (0): (    7924     2610)
-> [0][2][  99884]   0.0 [  0.0] (0): (    2640     3947)
-> found max.
[0][2] working set size found: 72817, cost: 13719
migration: max_cache_size: 0, cpu: 2666 MHz:
migration_cost=27,3916
migration: 0 seconds



-> [0][1][  65536]   0.0 [  0.0] (1): (   36355    18177)
-> [0][1][  72817]   0.0 [  0.0] (1): (   49107    15464)
-> [0][1][  80907]   0.0 [  0.0] (1): (   57124    11740)
-> [0][1][  89896]   0.0 [  0.0] (1): (   57766     6191)
-> [0][1][  99884]   0.0 [  0.0] (1): (   72324    10374)
-> [0][1][ 110982]   0.0 [  0.0] (1): (   80125     9087)
-> [0][1][ 123313]   0.0 [  0.0] (1): (   74042     7585)
-> [0][1][ 137014]   0.0 [  0.0] (1): (   78227     5885)
-> [0][1][ 152237]   0.0 [  0.0] (1): (   80157     3907)
-> [0][1][ 169152]   0.0 [  0.0] (1): (   1 6315)
-> [0][1][ 187946]   0.0 [  0.0] (1): (   99344     8389)
-> [0][1][ 208828]   0.1 [  0.1] (1): (  104998     7021)
-> [0][1][ 232031]   0.1 [  0.1] (1): (  113660     7841)
-> [0][1][ 257812]   0.1 [  0.1] (1): (  124690     9435)
-> [0][1][ 286457]   0.1 [  0.1] (1): (  135835    10290)
-> [0][1][ 318285]   0.1 [  0.1] (1): (  153135    13795)
-> [0][1][ 353650]   0.1 [  0.1] (1): (  145024    10953)
-> [0][1][ 392944]   0.2 [  0.2] (1): (  215998    40963)
-> [0][1][ 436604]   0.2 [  0.2] (1): (  212086    22437)
-> [0][1][ 485115]   0.2 [  0.2] (1): (  250632    30491)
-> [0][1][ 539016]   0.2 [  0.2] (1): (  221014    30054)
-> [0][1][ 598906]   0.2 [  0.2] 

Re: [RFC] Heads up on a series of AIO patchsets

2007-01-02 Thread Zach Brown

Sorry for the delay, I'm finally back from the holiday break :)

> (1) The filesystem AIO patchset, attempts to address one part of the problem
> which is to make regular file IO, (without O_DIRECT) asynchronous (mainly
> the case of reads of uncached or partially cached files, and O_SYNC writes).


One of the properties of the currently implemented EIOCBRETRY aio  
path is that ->mm is the only field in current which matches the  
submitting task_struct while inside the retry path.


It looks like a retry-based aio write path would be broken because of  
this.  generic_write_checks() could run in the aio thread and get its  
task_struct instead of that of the submitter.  The wrong rlimit will  
be tested and SIGXFSZ won't be raised.  remove_suid() could check the  
capabilities of the aio thread instead of those of the submitter.
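
The hazard can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: current_task stands in for the kernel's current, fsize_limit for the file size rlimit, and every name here is hypothetical. Recording the limit at submit time is shown purely for contrast; it is not something proposed in this thread.

```c
#include <stdint.h>

struct task {
    const char *name;
    uint64_t fsize_limit;    /* stand-in for the file size rlimit */
};

/* Stand-in for the kernel's "current" pointer. */
const struct task *current_task;

struct aio_request {
    uint64_t end_pos;        /* file offset the write would reach */
    uint64_t submit_limit;   /* submitter's limit recorded at submit time */
};

/* Limit check as a retried path would see it: through "current". */
int write_allowed_via_current(const struct aio_request *req)
{
    return req->end_pos <= current_task->fsize_limit;
}

/* Limit check against the value recorded when the request was submitted. */
int write_allowed_captured(const struct aio_request *req)
{
    return req->end_pos <= req->submit_limit;
}

/* Returns 1 if the retry, running in the worker context, allows a write
 * that the submitter's own limit forbids. */
int retry_uses_wrong_limit(void)
{
    struct task submitter = { "submitter", 1000 };
    struct task aio_worker = { "aio", UINT64_MAX };
    struct aio_request req;
    int wrong;

    current_task = &submitter;          /* submission context */
    req.end_pos = 5000;
    req.submit_limit = current_task->fsize_limit;

    current_task = &aio_worker;         /* the retry runs in the aio thread */
    wrong = write_allowed_via_current(&req) && !write_allowed_captured(&req);

    current_task = 0;                   /* do not leave a dangling pointer */
    return wrong;
}
```

The check through current succeeds because it sees the worker's limit, so the oversized write goes unnoticed, which is the kind of silent divergence described above.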


I don't think EIOCBRETRY is the way to go because of this increased  
(and subtle!) complexity.  What are the chances that we would have  
ever found those bugs outside code review?  How do we make sure that  
current references don't sneak back in after having initially audited  
the paths?


Take the io_cmd_epoll_wait patch..

> issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with
> modifications from Jeff Moyer and me) addresses this problem for native
> linux aio in a simple manner.


It's simple looking, sure.  This current flipping didn't even occur  
to me while throwing the patch together!


But that patch ends up calling ->poll (and poll_table->qproc) and  
writing to userspace (so potentially calling ->nopage) from the aio  
threads.  Are we sure that none of them will behave surprisingly  
because current changed under them?


It might be safe now, but that isn't really the point.  I'd rather we  
didn't have yet one more subtle invariant to audit and maintain.


At the risk of making myself vulnerable to the charge of mentioning  
vapourware, I will admit that I've been working on a (slightly mad)  
implementation of async syscalls.  I've been quiet about it because I  
don't want to whip up complicated discussion without being able to  
show code that works, even if barely.  I mention it now only to make  
it clear that I want to be constructive, not just critical :).


- z


Re: [PATCH] ppc: vio of_node_put cleanup

2007-01-02 Thread Mariusz Kozlowski
Hello Segher, 

> The comment used to be inside the "if" block, is this
> change correct?

You'd prefer an empty line in there?

> [And, do we want all these changes anyway?  I don't care
> either way, both sides have their pros and their cons --
> just asking :-) ]

You know my opinion already :-)

-- 
Regards,

Mariusz Kozlowski


Re: [PATCH] libata: fix combined mode (was Re: Happy New Year (and v2.6.20-rc3 released))

2007-01-02 Thread Jeff Garzik

Or maybe this rephrase helps:

Regardless of how the IDE quirks have configured the PCI BARs, libata is 
written to assume that /all/ struct pci_dev resources for a single PCI 
device are reserved to the libata driver.


Thus, if you avoid calling pci_request_regions (as your patch does), you 
must manually provide the same guarantees that pci_request_regions 
provides to its callers.


Jeff




