date:20050121

[PATCH] PPC: fix stack alignment for signal handlers

2005-01-21 Thread Roland McGrath

Both the PPC32 and PPC64 ABIs specify that the stack should be kept aligned
to 16 bytes.  However, signal handlers on PPC64 are getting run with the
stack misaligned (sp % 16 == 8).  This patch fixes that by ensuring that
the signal frame allocated is a multiple of 16 bytes.  The PPC32 signal
frame structures are already sized appropriately, though it may be wise to
put an __attribute__ on them as well to make sure they stay that way.

In addition to the PPC64 signal frame itself being of misaligned size, the
explicit alignment of the starting stack pointer is also to 8 instead of
16.  I've corrected this as well, so signal frames are aligned even if the
interrupted registers contained a misaligned stack pointer.  

For PPC32 signal handlers, while the frame itself was of properly aligned
size, no alignment of the starting stack pointer was done at all, so that a
signal handler can still get a misaligned stack pointer if the interrupted
registers had one, though the kernel isn't gratuitously misaligning good
ones like it is for PPC64.  I added explicit alignment to fix that.
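
The masking arithmetic described above can be sketched in userspace C. This is only an illustration, not part of the patch: the helper names and the sample addresses are made up for the example, and the frame size is chosen simply to show the sp % 16 == 8 case.

```c
#include <stdint.h>

/* Round sp down to a multiple of 16, like `newsp &= -16UL` in the patch. */
uint64_t align_down_16(uint64_t sp)
{
    return sp & ~(uint64_t)15;
}

/* Old PPC64 get_sigframe() arithmetic: frame address aligned to 8 only. */
uint64_t frame_addr_old(uint64_t newsp, uint64_t frame_size)
{
    return (newsp - frame_size) & ~(uint64_t)7;
}

/* Patched arithmetic: frame address aligned to 16. */
uint64_t frame_addr_new(uint64_t newsp, uint64_t frame_size)
{
    return (newsp - frame_size) & ~(uint64_t)15;
}
```

With a well-aligned starting pointer such as 0x10000 and a frame size of 0x128 (a multiple of 8 but not of 16), the old arithmetic yields 0xFED8 and the new one yields 0xFED0.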

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>

--- linux-2.6/arch/ppc64/kernel/signal.c
+++ linux-2.6/arch/ppc64/kernel/signal.c
@@ -67,7 +67,7 @@ struct rt_sigframe {
struct siginfo info;
/* 64 bit ABI allows for 288 bytes below sp before decrementing it. */
char abigap[288];
-};
+} __attribute__ ((aligned (16)));
 
 
 /*
@@ -254,7 +254,7 @@ static inline void __user * get_sigframe
newsp = (current->sas_ss_sp + current->sas_ss_size);
}
 
-return (void __user *)((newsp - frame_size) & -8ul);
+return (void __user *)((newsp - frame_size) & -16ul);
 }
 
 /*
--- linux-2.6/arch/ppc64/kernel/signal32.c
+++ linux-2.6/arch/ppc64/kernel/signal32.c
@@ -626,9 +626,12 @@ static int handle_rt_signal32(unsigned l
 {
struct rt_sigframe32 __user *rt_sf;
struct mcontext32 __user *frame;
-   unsigned long origsp = newsp;
+   unsigned long origsp;
compat_sigset_t c_oldset;
 
+   newsp &= -16UL;  /* Force the stack to be aligned properly.  */
+   origsp = newsp;
+
/* Set up Signal Frame */
/* Put a Real Time Context onto stack */
newsp -= sizeof(*rt_sf);
@@ -799,7 +802,10 @@ static int handle_signal32(unsigned long
 {
struct sigcontext32 __user *sc;
struct sigregs32 __user *frame;
-   unsigned long origsp = newsp;
+   unsigned long origsp;
+
+   newsp &= -16UL;  /* Force the stack to be aligned properly.  */
+   origsp = newsp;
 
/* Set up Signal Frame */
newsp -= sizeof(struct sigregs32);
--- linux-2.6/arch/ppc/kernel/signal.c
+++ linux-2.6/arch/ppc/kernel/signal.c
@@ -366,7 +366,10 @@ handle_rt_signal(unsigned long sig, stru
 {
struct rt_sigframe __user *rt_sf;
struct mcontext __user *frame;
-   unsigned long origsp = newsp;
+   unsigned long origsp;
+
+   newsp &= -16UL;  /* Force the stack to be aligned properly.  */
+   origsp = newsp;
 
/* Set up Signal Frame */
/* Put a Real Time Context onto stack */
@@ -609,7 +612,10 @@ handle_signal(unsigned long sig, struct 
 {
struct sigcontext __user *sc;
struct sigregs __user *frame;
-   unsigned long origsp = newsp;
+   unsigned long origsp;
+
+   newsp &= -16UL;  /* Force the stack to be aligned properly.  */
+   origsp = newsp;
 
/* Set up Signal Frame */
newsp -= sizeof(struct sigregs);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Pollable Semaphores

2005-01-21 Thread Ulrich Drepper

On Fri, 21 Jan 2005 23:05:04 -0800, Chris Wright <[EMAIL PROTECTED]> wrote:
> Yeah, here it is.  I refreshed it against a current kernel.  It passes my
> same old test, where I select on /proc/<pid>/status fd in exceptfds.

Looks certainly attractive to me.  Nice small patch.  How quickly
after the death of the process is proc_pid_flush() called?

If this could go in and the futex stuff is handled, there is "only"
async I/O to handle.  After that we could finally create a uniform
event mechanism at userlevel which binds all these events (I/O,
process/thread termination, sync primitives) together.  Maybe support
for legacy sync primitives (SysV semaphores, msg queues) is needed as
well, don't know yet.  Note that I assume that polling of POSIX
mqueues works as it did the last time I tried it.

[PATCH] PPC64: Trivial Cleanup: EEH_REGION

2005-01-21 Thread Paul Mackerras

This patch is originally from Linas Vepstas <[EMAIL PROTECTED]>.

This is a dumb, dorky cleanup patch:
Per last round of emails, the concept of EEH_REGION is gone, 
but a few stubs remained.  This patch removes them.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/mm/hash_utils.c test/arch/ppc64/mm/hash_utils.c
--- linux-2.5/arch/ppc64/mm/hash_utils.c 2005-01-06 13:13:08.0 +1100
+++ test/arch/ppc64/mm/hash_utils.c 2005-01-22 16:42:48.0 +1100
@@ -294,12 +294,6 @@
vsid = get_kernel_vsid(ea);
break;
 #if 0
-   case EEH_REGION_ID:
-   /*
-* Should only be hit if there is an access to MMIO space
-* which is protected by EEH.
-* Send the problem up to do_page_fault 
-*/
case KERNEL_REGION_ID:
/*
 * Should never get here - entire 0xC0... region is bolted.
diff -urN linux-2.5/arch/ppc64/mm/slb.c test/arch/ppc64/mm/slb.c
--- linux-2.5/arch/ppc64/mm/slb.c   2005-01-06 13:13:08.0 +1100
+++ test/arch/ppc64/mm/slb.c 2005-01-22 16:44:26.0 +1100
@@ -78,7 +78,7 @@
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
unsigned long offset = get_paca()->slb_cache_ptr;
-   unsigned long esid_data;
+   unsigned long esid_data = 0;
unsigned long pc = KSTK_EIP(tsk);
unsigned long stack = KSTK_ESP(tsk);
unsigned long unmapped_base;
@@ -97,11 +97,8 @@
}
 
/* Workaround POWER5 < DD2.1 issue */
-   if (offset == 1 || offset > SLB_CACHE_ENTRIES) {
-   /* flush segment in EEH region, we shouldn't ever
-* access addresses in this region. */
-   asm volatile("slbie %0" : : "r"(EEHREGIONBASE));
-   }
+   if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+   asm volatile("slbie %0" : : "r" (esid_data));
 
get_paca()->slb_cache_ptr = 0;
get_paca()->context = mm->context;
diff -urN linux-2.5/include/asm-ppc64/page.h test/include/asm-ppc64/page.h
--- linux-2.5/include/asm-ppc64/page.h  2005-01-06 13:13:10.0 +1100
+++ test/include/asm-ppc64/page.h   2005-01-22 16:42:48.0 +1100
@@ -205,10 +205,8 @@
 #define KERNELBASE  PAGE_OFFSET
 #define VMALLOCBASE ASM_CONST(0xD000)
 #define IOREGIONBASE    ASM_CONST(0xE000)
-#define EEHREGIONBASE   ASM_CONST(0xA000)
 
 #define IO_REGION_ID   (IOREGIONBASE>>REGION_SHIFT)
-#define EEH_REGION_ID  (EEHREGIONBASE>>REGION_SHIFT)
 #define VMALLOC_REGION_ID  (VMALLOCBASE>>REGION_SHIFT)
 #define KERNEL_REGION_ID   (KERNELBASE>>REGION_SHIFT)
 #define USER_REGION_ID (0UL)

Re: [PATCH] dynamic tick patch

2005-01-21 Thread George Anzinger

Zwane Mwaikambo wrote:
> Hello George,
>
> On Fri, 21 Jan 2005, George Anzinger wrote:
>
>> The VST patch on sourceforge
>> (http://sourceforge.net/projects/high-res-timers/) uses the local apic timer
>> to do the wake up.  This is the same timer that is used for the High Res work.
>
> I've been meaning to look into it, although it's quite a bit of work going
> through all the extra code from the highres timer patch.

Well, really all it uses is the HR timer.  The rest of HRT is not really used
for VST.  (Unless, of course, you are referring to the rework of the tsc timer
tick code.)

-g

> Thanks,
> Zwane
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

Con Kolivas wrote:
> Con Kolivas wrote:
>> Jack O'Quin wrote:
>>> Con Kolivas <[EMAIL PROTECTED]> writes:
>>>
>>>> Here's fresh results on more stressed hardware (on ext3) with
>>>> 2.6.11-rc1-mm2 (which by the way has SCHED_ISO v2 included). The load
>>>> hovering at 50% spikes at times close to 70 which tests the behaviour
>>>> under iso throttling.
>>>
>>> What version of JACK are you running (`jackd --version')?
>>> You're still getting zero Delay Max.  That is an important measure.
>>
>> Ok updated jackd

So let's try again, sorry about the noise:
==> jack_test4-2.6.11-rc1-mm2-fifo.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . : 3
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 20161   usecs
Cycle Maximum . . . . . . . . :  1072   usecs
Average DSP Load. . . . . . . :47.2 %
Average CPU System Load . . . : 5.1 %
Average CPU User Load . . . . :18.0 %
Average CPU Nice Load . . . . : 0.1 %
Average CPU I/O Wait Load . . : 0.3 %
Average CPU IRQ Load  . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1701.6 /sec
Average Context-Switch Rate . : 19343.7 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
==> jack_test4-2.6.11-rc1-mm2-iso.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . : 6
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . :  4604   usecs
Cycle Maximum . . . . . . . . :  1190   usecs
Average DSP Load. . . . . . . :54.5 %
Average CPU System Load . . . :11.6 %
Average CPU User Load . . . . :18.4 %
Average CPU Nice Load . . . . : 0.1 %
Average CPU I/O Wait Load . . : 0.0 %
Average CPU IRQ Load  . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1697.9 /sec
Average Context-Switch Rate . : 19046.2 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
Pretty pictures:
http://ck.kolivas.org/patches/SCHED_ISO/iso2-benchmarks/
Note these are on a full desktop environment, although it is pretty much 
idle apart from checking email. No changes between fifo and iso runs.

Cheers,
Con



Re: Pollable Semaphores

2005-01-21 Thread Chris Wright

* Chris Wright ([EMAIL PROTECTED]) wrote:
> * Ulrich Drepper ([EMAIL PROTECTED]) wrote:
> > And is another thing to consider.  There is at least one other event
> > which should be pollable: process (maybe threads) deaths.  I was
> > hoping that we get support for this, perhaps in the form of polling
> > the /proc/PID directory.  For poll(), a POLLERR value could mean the
> > process/thread died.  For select(), once again a  bit in the except
> > array could be set.
> 
> I have a simple patch that does just that.  It worked after brief testing,
> then I never went back to look at it any more.  I'll see if I can't dig
> it up, maybe it's useful.

Yeah, here it is.  I refreshed it against a current kernel.  It passes my
same old test, where I select on /proc/<pid>/status fd in exceptfds.
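
A test of the kind described (select() with the status fd in exceptfds) might look roughly like the sketch below. It is an assumption about the shape of such a test, not the actual test program; the function names are hypothetical, and the wait is only meaningful on a kernel that carries this patch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/status" into buf; 0 on success, -1 if buf is too small. */
int proc_status_path(char *buf, size_t len, pid_t pid)
{
    int n = snprintf(buf, len, "/proc/%ld/status", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Block until the status fd is reported in exceptfds.
 * Returns 1 if the exceptional condition fired, 0 if select() returned
 * without it, -1 on error.
 */
int wait_for_exit(pid_t pid)
{
    char path[64];
    fd_set exceptfds;
    int fd, rc;

    if (proc_status_path(path, sizeof(path), pid) < 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    FD_ZERO(&exceptfds);
    FD_SET(fd, &exceptfds);
    /* No timeout: sleep until the kernel wakes the wait queue. */
    rc = select(fd + 1, NULL, NULL, &exceptfds, NULL);
    if (rc > 0)
        rc = FD_ISSET(fd, &exceptfds) ? 1 : 0;
    close(fd);
    return rc;
}
```

The patch below reports POLLPRI once the task is no longer alive, which select() surfaces through the exceptfds set.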

= fs/proc/base.c 1.86 vs edited =
--- 1.86/fs/proc/base.c 2005-01-10 17:29:31 -08:00
+++ edited/fs/proc/base.c   2005-01-21 22:51:00 -08:00
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 /*
@@ -519,8 +520,21 @@ static ssize_t proc_info_read(struct fil
return length;
 }
 
+static unsigned int proc_info_poll(struct file *file, poll_table *wait)
+{
+   struct inode *inode = file->f_dentry->d_inode;
+   struct task_struct *task = proc_task(inode);
+   wait_queue_head_t *pid_wait = &PROC_I(task->proc_dentry->d_inode)->wait;
+
+   poll_wait(file, pid_wait, wait);
+   if (!pid_alive(task))
+   return POLLPRI;
+   return 0;
+}
+
 static struct file_operations proc_info_file_operations = {
.read   = proc_info_read,
+   .poll   = proc_info_poll,
 };
 
 static int mem_open(struct inode* inode, struct file* file)
@@ -1489,6 +1503,8 @@ void proc_pid_flush(struct dentry *proc_
 {
might_sleep();
if(proc_dentry != NULL) {
+   wait_queue_head_t *pid_wait=&PROC_I(proc_dentry->d_inode)->wait;
+   wake_up_interruptible(pid_wait);
shrink_dcache_parent(proc_dentry);
dput(proc_dentry);
}
= fs/proc/inode.c 1.31 vs edited =
--- 1.31/fs/proc/inode.c 2005-01-04 18:48:14 -08:00
+++ edited/fs/proc/inode.c  2005-01-21 22:48:06 -08:00
@@ -97,6 +97,7 @@ static struct inode *proc_alloc_inode(st
ei->type = 0;
ei->op.proc_get_link = NULL;
ei->pde = NULL;
+   init_waitqueue_head(&ei->wait);
inode = &ei->vfs_inode;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
return inode;
= include/linux/proc_fs.h 1.41 vs edited =
--- 1.41/include/linux/proc_fs.h 2005-01-07 21:44:33 -08:00
+++ edited/include/linux/proc_fs.h  2005-01-21 22:48:06 -08:00
@@ -243,6 +243,7 @@ struct proc_inode {
int (*proc_read)(struct task_struct *task, char *page);
} op;
struct proc_dir_entry *pde;
+   wait_queue_head_t wait;
struct inode vfs_inode;
 };
 

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

Con Kolivas wrote:
> Jack O'Quin wrote:
>> Con Kolivas <[EMAIL PROTECTED]> writes:
>>
>>> Here's fresh results on more stressed hardware (on ext3) with
>>> 2.6.11-rc1-mm2 (which by the way has SCHED_ISO v2 included). The load
>>> hovering at 50% spikes at times close to 70 which tests the behaviour
>>> under iso throttling.
>>
>> What version of JACK are you running (`jackd --version')?
>> You're still getting zero Delay Max.  That is an important measure.
>
> Ok updated jackd
> Here's an updated set of runs. Not very impressive even with SCHED_FIFO,
> but the same from both policies.
>
> ==> jack_test4-2.6.11-rc1-mm2-fifo.log <==
> Number of runs  . . . . . . . :(1)
> *
> Timeout Count . . . . . . . . :(0)
> XRUN Count  . . . . . . . . . :   404
> Delay Count (>spare time) . . : 0
> Delay Count (>1000 usecs) . . : 0
> Delay Maximum . . . . . . . . : 261254   usecs
> Cycle Maximum . . . . . . . . :  2701   usecs
> Average DSP Load. . . . . . . :52.4 %
> Average CPU System Load . . . : 5.1 %
> Average CPU User Load . . . . :18.1 %
> Average CPU Nice Load . . . . : 0.0 %
> Average CPU I/O Wait Load . . : 0.0 %
> Average CPU IRQ Load  . . . . : 0.0 %
> Average CPU Soft-IRQ Load . . : 0.0 %
> Average Interrupt Rate  . . . :  1699.3 /sec
> Average Context-Switch Rate . : 19018.9 /sec
> *
> Delta Maximum . . . . . . . . : 0.0
> *
> ==> jack_test4-2.6.11-rc1-mm2-iso.log <==
> Number of runs  . . . . . . . :(1)
> *
> Timeout Count . . . . . . . . :(0)
> XRUN Count  . . . . . . . . . :   408
> Delay Count (>spare time) . . : 0
> Delay Count (>1000 usecs) . . : 0
> Delay Maximum . . . . . . . . : 269804   usecs
> Cycle Maximum . . . . . . . . :  2449   usecs
> Average DSP Load. . . . . . . :52.6 %
> Average CPU System Load . . . : 5.0 %
> Average CPU User Load . . . . :17.8 %
> Average CPU Nice Load . . . . : 0.0 %
> Average CPU I/O Wait Load . . : 0.1 %
> Average CPU IRQ Load  . . . . : 0.0 %
> Average CPU Soft-IRQ Load . . : 0.0 %
> Average Interrupt Rate  . . . :  1699.2 /sec
> Average Context-Switch Rate . : 19041.0 /sec
> *
> Delta Maximum . . . . . . . . : 0.0
> *

Bah stupid me. Both of those are SCHED_NORMAL.
Ignore those, and I'll try again.
Con



Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

Jack O'Quin wrote:
> Con Kolivas <[EMAIL PROTECTED]> writes:
>
>> Here's fresh results on more stressed hardware (on ext3) with
>> 2.6.11-rc1-mm2 (which by the way has SCHED_ISO v2 included). The load
>> hovering at 50% spikes at times close to 70 which tests the behaviour
>> under iso throttling.
>
> What version of JACK are you running (`jackd --version')?
> You're still getting zero Delay Max.  That is an important measure.

Ok updated jackd
Here's an updated set of runs. Not very impressive even with SCHED_FIFO, 
but the same from both policies.

==> jack_test4-2.6.11-rc1-mm2-fifo.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . :   404
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 261254   usecs
Cycle Maximum . . . . . . . . :  2701   usecs
Average DSP Load. . . . . . . :52.4 %
Average CPU System Load . . . : 5.1 %
Average CPU User Load . . . . :18.1 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.0 %
Average CPU IRQ Load  . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1699.3 /sec
Average Context-Switch Rate . : 19018.9 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
==> jack_test4-2.6.11-rc1-mm2-iso.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . :   408
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 269804   usecs
Cycle Maximum . . . . . . . . :  2449   usecs
Average DSP Load. . . . . . . :52.6 %
Average CPU System Load . . . : 5.0 %
Average CPU User Load . . . . :17.8 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.1 %
Average CPU IRQ Load  . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1699.2 /sec
Average Context-Switch Rate . : 19041.0 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
I've updated the pretty graphs and removed the dud runs from here:
http://ck.kolivas.org/patches/SCHED_ISO/iso2-benchmarks/
Con



Re: OOM fixes 1/5

2005-01-21 Thread Andrea Arcangeli

I noticed 1/5 had a glitch, this is an update. It won't alter the
ordering, the other patches will still apply cleanly.

Thanks.

From: [EMAIL PROTECTED]
Subject: protect-pids

This is protect-pids, a patch to allow the admin to tune the oom killer.
The tweak is inherited between parent and child so it's easy to write a
wrapper for complex apps.
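
As a rough sketch of what such a wrapper could do from userspace: the helper names here are hypothetical, the /proc/<pid>/oom_adj file name and the -16..15 range are taken from the patch below, and the write only succeeds on a kernel carrying the patch with CAP_SYS_RESOURCE.

```c
#include <stdio.h>
#include <sys/types.h>

/* Range accepted by oom_adjust_write() in the patch: -16..15 inclusive. */
int oom_adj_valid(int value)
{
    return value >= -16 && value <= 15;
}

/*
 * Write `value` to /proc/<pid>/oom_adj.
 * Returns 0 on success, -1 on any failure (value out of range, file not
 * present, write rejected by the kernel).
 */
int set_oom_adj(pid_t pid, int value)
{
    char path[64];
    FILE *f;
    int ok;

    if (!oom_adj_valid(value))
        return -1;
    snprintf(path, sizeof(path), "/proc/%ld/oom_adj", (long)pid);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)   /* the kernel's verdict arrives when the buffer is flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```

A wrapper would call set_oom_adj(getpid(), value) and then exec the real application, relying on the setting being inherited by children.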

I made used_math a char in light of later patches. The current patch
breaks alpha, but future patches will fix it.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/fs/proc/base.c 2005-01-15 20:44:58.0 +0100
+++ xx/fs/proc/base.c   2005-01-22 07:02:50.0 +0100
@@ -72,6 +72,8 @@ enum pid_directory_inos {
PROC_TGID_ATTR_FSCREATE,
 #endif
PROC_TGID_FD_DIR,
+   PROC_TGID_OOM_SCORE,
+   PROC_TGID_OOM_ADJUST,
PROC_TID_INO,
PROC_TID_STATUS,
PROC_TID_MEM,
@@ -98,6 +100,8 @@ enum pid_directory_inos {
PROC_TID_ATTR_FSCREATE,
 #endif
PROC_TID_FD_DIR = 0x8000,   /* 0x8000-0x */
+   PROC_TID_OOM_SCORE,
+   PROC_TID_OOM_ADJUST,
 };
 
 struct pid_entry {
@@ -133,6 +137,8 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_SCHEDSTATS
E(PROC_TGID_SCHEDSTAT, "schedstat", S_IFREG|S_IRUGO),
 #endif
+   E(PROC_TGID_OOM_SCORE, "oom_score",S_IFREG|S_IRUGO),
+   E(PROC_TGID_OOM_ADJUST,"oom_adj", S_IFREG|S_IRUGO|S_IWUSR),
{0,0,NULL,0}
 };
 static struct pid_entry tid_base_stuff[] = {
@@ -158,6 +164,8 @@ static struct pid_entry tid_base_stuff[]
 #ifdef CONFIG_SCHEDSTATS
E(PROC_TID_SCHEDSTAT, "schedstat",S_IFREG|S_IRUGO),
 #endif
+   E(PROC_TID_OOM_SCORE,  "oom_score",S_IFREG|S_IRUGO),
+   E(PROC_TID_OOM_ADJUST, "oom_adj", S_IFREG|S_IRUGO|S_IWUSR),
{0,0,NULL,0}
 };
 
@@ -384,6 +392,18 @@ static int proc_pid_schedstat(struct tas
 }
 #endif
 
+/* The badness from the OOM killer */
+unsigned long badness(struct task_struct *p, unsigned long uptime);
+static int proc_oom_score(struct task_struct *task, char *buffer)
+{
+   unsigned long points;
+   struct timespec uptime;
+
+   do_posix_clock_monotonic_gettime(&uptime);
+   points = badness(task, uptime.tv_sec);
+   return sprintf(buffer, "%lu\n", points);
+}
+
 //
 /*   Here the fs part begins*/
 //
@@ -657,6 +677,56 @@ static struct file_operations proc_mem_o
.open   = mem_open,
 };
 
+static ssize_t oom_adjust_read(struct file * file, char * buf,
+   size_t count, loff_t *ppos)
+{
+   struct task_struct *task = proc_task(file->f_dentry->d_inode);
+   char buffer[8];
+   size_t len;
+   int oom_adjust = task->oomkilladj;
+   loff_t __ppos = *ppos;
+
+   len = sprintf(buffer, "%i\n", oom_adjust);
+   if (__ppos >= len)
+   return 0;
+   if (count > len-__ppos)
+   count = len-__ppos;
+   if (copy_to_user(buf, buffer + __ppos, count)) 
+   return -EFAULT;
+   *ppos = __ppos + count;
+   return count;
+}
+
+static ssize_t oom_adjust_write(struct file * file, const char * buf,
+   size_t count, loff_t *ppos)
+{
+   struct task_struct *task = proc_task(file->f_dentry->d_inode);
+   char buffer[8], *end;
+   int oom_adjust;
+
+   if (!capable(CAP_SYS_RESOURCE))
+   return -EPERM;
+   memset(buffer, 0, 8);   
+   if (count > 6)
+   count = 6;
+   if (copy_from_user(buffer, buf, count)) 
+   return -EFAULT;
+   oom_adjust = simple_strtol(buffer, &end, 0);
+   if (oom_adjust < -16 || oom_adjust > 15)
+   return -EINVAL;
+   if (*end == '\n')
+   end++;
+   task->oomkilladj = oom_adjust;
+   if (end - buffer == 0) 
+   return -EIO;
+   return end - buffer;
+}
+
+static struct file_operations proc_oom_adjust_operations = {
+   read:   oom_adjust_read,
+   write:  oom_adjust_write,
+};
+
 static struct inode_operations proc_mem_inode_operations = {
.permission = proc_permission,
 };
@@ -1336,6 +1406,15 @@ static struct dentry *proc_pident_lookup
ei->op.proc_read = proc_pid_schedstat;
break;
 #endif
+   case PROC_TID_OOM_SCORE:
+   case PROC_TGID_OOM_SCORE:
+   inode->i_fop = &proc_info_file_operations;
+   ei->op.proc_read = proc_oom_score;
+   break;
+   case PROC_TID_OOM_ADJUST:
+   case PROC_TGID_OOM_ADJUST:
+   inode->i_fop = &proc_oom_adjust_operations;
+   break;
default:
printk("procfs: impossible type (%d)",p->type);
iput(inode);
---

Re: Linux 2.6.11-rc2

2005-01-21 Thread Udo A. Steinberg

On Fri, 21 Jan 2005 18:13:55 -0800 (PST) Linus Torvalds (LT) wrote:

LT> Ok, trying to calm things down again for a 2.6.11 release.

Connection tracking does not compile...

 CC  net/ipv4/netfilter/ip_conntrack_standalone.o
In file included from net/ipv4/netfilter/ip_conntrack_standalone.c:34:
include/linux/netfilter_ipv4/ip_conntrack.h:135: warning: "struct ip_conntrack" 
declared inside parameter list
include/linux/netfilter_ipv4/ip_conntrack.h:135: warning: its scope is only 
this definition or declaration, which is probably not what you want
include/linux/netfilter_ipv4/ip_conntrack.h:305: warning: "enum 
ip_nat_manip_type" declared inside parameter list
include/linux/netfilter_ipv4/ip_conntrack.h:306: error: parameter `manip' has 
incomplete type
include/linux/netfilter_ipv4/ip_conntrack.h: In function `ip_nat_initialized':
include/linux/netfilter_ipv4/ip_conntrack.h:307: error: `IP_NAT_MANIP_SRC' 
undeclared (first use in this function)
include/linux/netfilter_ipv4/ip_conntrack.h:307: error: (Each undeclared 
identifier is reported only once
include/linux/netfilter_ipv4/ip_conntrack.h:307: error: for each function it 
appears in.)


-Udo.



Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

Jack O'Quin wrote:
> Con Kolivas <[EMAIL PROTECTED]> writes:
>
>> Here's fresh results on more stressed hardware (on ext3) with
>> 2.6.11-rc1-mm2 (which by the way has SCHED_ISO v2 included). The load
>> hovering at 50% spikes at times close to 70 which tests the behaviour
>> under iso throttling.
>
> What version of JACK are you running (`jackd --version')?
> You're still getting zero Delay Max.  That is an important measure.

Oops I haven't updated it on this machine.
jackd version 0.99.0 tmpdir /tmp protocol 13
Con



Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Jack O'Quin

Con Kolivas <[EMAIL PROTECTED]> writes:

> Here's fresh results on more stressed hardware (on ext3) with
> 2.6.11-rc1-mm2 (which by the way has SCHED_ISO v2 included). The load
> hovering at 50% spikes at times close to 70 which tests the behaviour
> under iso throttling.

What version of JACK are you running (`jackd --version')?

You're still getting zero Delay Max.  That is an important measure.
-- 
  joq

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Jack O'Quin

Con Kolivas <[EMAIL PROTECTED]> writes:

> As for priority support, I have been working on it. While the test
> cases I've been involved in show no need for it, I can understand why
> it would be desirable.

Yes.  Rui's jack_test3.2 does not require multiple realtime
priorities, but I can point to applications that do.  Their reasons
for working that way make sense and should be supported.

For example, the JACK Audio Mastering interface (JAMin) does a Fast
Fourier Transform on the audio for phase-neutral frequency domain
crossover and EQ processing.  This is very CPU intensive, but modern
processors can handle it and the sound is outstanding.  The FFT
algorithm uses a moving window with a natural block size of 256
frames.  When the JACK buffer size is large enough, JAMin performs
this operation directly in the process callback.

When the JACK buffer size is smaller than 256 frames that won't work.
So, JAMin queues the audio to a realtime helper thread running at a
priority one less than the JACK process thread.  So, when JACK is
running at 64 frames per cycle (the jack_test3.2 default), JAMin's FFT
thread will have four process cycles in which to compute its next FFT
window.  This adds latency, but permits the application to work even
when the overall JACK graph is running at rather low latencies.  If
the scheduler were to run that thread at the same priority as the JACK
process thread, it would practically guarantee xruns.  This would
cause JAMin to be unfairly ejected from the JACK graph for failing to
meet its realtime deadlines.
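
The cycle budget in that example works out as follows. This is a small illustrative sketch only; the function names are invented here and are not taken from JACK or JAMin.

```c
/* Number of JACK process cycles it takes to fill one FFT window. */
unsigned cycles_per_fft_window(unsigned fft_window_frames,
                               unsigned jack_buffer_frames)
{
    /* Round up so a partially filled final cycle still counts. */
    return (fft_window_frames + jack_buffer_frames - 1) / jack_buffer_frames;
}

/* Nonzero if the FFT can run directly in the process callback. */
int fft_fits_in_callback(unsigned fft_window_frames,
                         unsigned jack_buffer_frames)
{
    return jack_buffer_frames >= fft_window_frames;
}
```

With a 256-frame window and 64 frames per cycle, the helper thread has 256 / 64 = 4 cycles to finish each window.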

So, there are legitimate examples of realtime applications needing to
use more than one scheduler priority.
-- 
  joq

Re: Pollable Semaphores

2005-01-21 Thread Chris Wright

* Ulrich Drepper ([EMAIL PROTECTED]) wrote:
> And is another thing to consider.  There is at least one other event
> which should be pollable: process (maybe threads) deaths.  I was
> hoping that we get support for this, perhaps in the form of polling
> the /proc/PID directory.  For poll(), a POLLERR value could mean the
> process/thread died.  For select(), once again a  bit in the except
> array could be set.

I have a simple patch that does just that.  It worked after brief testing,
then I never went back to look at it any more.  I'll see if I can't dig
it up, maybe it's useful.

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

[PATCH] PPC64 replace schedule_timeout in __cpu_up

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with msleep to simplify the code and to
express the delay in milliseconds instead of HZ.
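
For reference, the equivalence between HZ/5 jiffies and 200 ms can be checked with a small userspace sketch. The helper name is hypothetical and the tick rates are example values only.

```c
/* Convert a tick count to milliseconds for a given tick rate (ticks/second). */
unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    return ticks * 1000UL / hz;
}
```

For HZ values of 100, 250 and 1000, HZ/5 ticks come out as 200 ms each time, and 25 iterations of 200 ms give the five seconds mentioned in the code comment.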

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/smp.c 2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/smp.c   2005-01-15 17:30:16.0 -0800
@@ -459,8 +459,7 @@ int __devinit __cpu_up(unsigned int cpu)
 * hotplug case.  Wait five seconds.
 */
for (c = 25; c && !cpu_callin_map[cpu]; c--) {
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(HZ/5);
+   msleep(200);
}
 #endif
 

[PATCH] PPC64 replace schedule_timeout in iSeries_pci_reset

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with msleep to simplify the code and to
express the delay in milliseconds instead of HZ.
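
In this file the times are given in tenths of a second, so the conversion changes from (t * HZ) / 10 jiffies to t * 100 milliseconds. A userspace sketch of the two forms, with hypothetical helper names and example tick rates:

```c
/* Old form: delay in jiffies for a time given in tenths of a second. */
unsigned long tenths_to_jiffies(unsigned long tenths, unsigned long hz)
{
    return tenths * hz / 10;
}

/* New form: the same delay in milliseconds, independent of HZ. */
unsigned int tenths_to_ms(unsigned int tenths)
{
    return tenths * 100;
}
```

The defaults of 5 and 30 tenths become 500 ms and 3000 ms, matching the "Assert is .5 second, Wait is 3 seconds" comment in the code.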

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/iSeries_pci_reset.c   2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/iSeries_pci_reset.c 2005-01-15 17:17:54.0 -0800
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -49,7 +50,7 @@
 int iSeries_Device_ToggleReset(struct pci_dev *PciDev, int AssertTime,
int DelayTime)
 {
-   unsigned long AssertDelay, WaitDelay;
+   unsigned int AssertDelay, WaitDelay;
struct iSeries_Device_Node *DeviceNode =
(struct iSeries_Device_Node *)PciDev->sysdata;
 
@@ -62,14 +63,14 @@ int iSeries_Device_ToggleReset(struct pc
 * Set defaults, Assert is .5 second, Wait is 3 seconds.
 */
if (AssertTime == 0)
-   AssertDelay = (5 * HZ) / 10;
+   AssertDelay = 500;
else
-   AssertDelay = (AssertTime * HZ) / 10;
+   AssertDelay = AssertTime * 100;
 
if (DelayTime == 0)
-   WaitDelay = (30 * HZ) / 10;
+   WaitDelay = 3000;
else
-   WaitDelay = (DelayTime * HZ) / 10;
+   WaitDelay = DelayTime * 100;
 
/*
 * Assert reset
@@ -77,8 +78,7 @@ int iSeries_Device_ToggleReset(struct pc
DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode),
0x00, DeviceNode->AgentId, 1);
if (DeviceNode->ReturnCode == 0) {
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(AssertDelay);   /* Sleep for the time */
+   msleep(AssertDelay);/* Sleep for the time */
DeviceNode->ReturnCode =
HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode),
0x00, DeviceNode->AgentId, 0);
@@ -86,8 +86,7 @@ int iSeries_Device_ToggleReset(struct pc
/*
 * Wait for device to reset
 */
-   set_current_state(TASK_UNINTERRUPTIBLE);  
-   schedule_timeout(WaitDelay);
+   msleep(WaitDelay);
}
if (DeviceNode->ReturnCode == 0)
PCIFR("Slot 0x%04X.%02 Reset\n", ISERIES_BUS(DeviceNode),

[PATCH] PPC64 replace schedule_timeout in die

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with ssleep to simplify the code and to
express the delay in seconds instead of HZ.

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/traps.c   2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/traps.c 2005-01-15 17:30:39.0 -0800
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -137,8 +138,7 @@ int die(const char *str, struct pt_regs 
 
if (panic_on_oops) {
printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(5 * HZ);
+   ssleep(5);
panic("Fatal exception");
}
do_exit(SIGSEGV);

Re: [LTP] Re: [Dev] Re: Kernel Panic with LTP on 2.6.11-rc1 (was Re: LTP Results for 2.6.x and 2.4.x)

2005-01-21 Thread Chris Wright

* Bryce Harrington ([EMAIL PROTECTED]) wrote:
> Well, I'm not having much luck.  strace isn't installed on the system
> (and is giving errors when trying to compile it).  Also, the ssh session
> (and sshd) quits whenever I try running the following growfiles command
> manually, so I'm having trouble replicating the kernel panic manually.

Sounds very much like oom killer gone nuts.

> # growfiles -W gf14 -b -e 1 -u -i 0 -L 20 -w -l -C 1 -T 10 glseek19 glseek19.2
> 
> Anyway, if anyone wants to investigate this further, I can provide
> access to the machine (email me).  Otherwise, I'm probably just going to
> wait for -rc2 and see if the problem's still there.

Wait no longer, it's here ;-)

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Re: Loopback mounting from a file with a partition table?

2005-01-21 Thread kernel

On Fri, 2005-01-21 at 20:45, Dan Stromberg wrote:
> Has anyone tried loopback mounting individual partitions from within a
> file that contains a partition table?
> 

Yes, lots of folks.


> When I mount -o loop the file, I seem to get the first partition in the
> file, but I don't see anything in the man page for mount that indicates a
> way of getting any other partitions from a file with a partition table.
> 
> Any comments?

Sure, find the starting offset for the filesystem and pass that to mount
via the '-o offset=XXX' flag.
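
Finding that starting offset can be sketched in a few lines, assuming the file carries a classic MBR partition table and 512-byte sectors. The function name is hypothetical; the on-disk layout used (table at byte 446, 16-byte entries, little-endian start LBA at entry offset 8, 0x55AA signature) is the standard MBR format.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE   512u   /* assumed logical sector size */
#define MBR_TABLE_OFF 446u   /* partition table starts here in the MBR */
#define MBR_ENTRY_LEN 16u    /* bytes per partition entry */

/*
 * Return the byte offset of primary partition `index` (0..3) inside a
 * disk image, given the image's first sector, or -1 if the sector has no
 * MBR signature or the slot is unused.
 */
int64_t mbr_partition_offset(const uint8_t mbr[512], unsigned index)
{
	const uint8_t *e;
	uint32_t start_lba;

	assert(index < 4);
	if (mbr[510] != 0x55 || mbr[511] != 0xAA)
		return -1;

	e = mbr + MBR_TABLE_OFF + index * MBR_ENTRY_LEN;
	/* Starting LBA is a little-endian 32-bit value at entry offset 8. */
	start_lba = (uint32_t)e[8] | ((uint32_t)e[9] << 8) |
		    ((uint32_t)e[10] << 16) | ((uint32_t)e[11] << 24);
	if (e[4] == 0 || start_lba == 0)	/* type 0 marks an unused slot */
		return -1;

	return (int64_t)start_lba * SECTOR_SIZE;
}
```

For a first partition starting at sector 63 this yields 32256, which is the number that would go into the offset= option.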


regards,

-fd

www.farmerdude.com





[PATCH] PPC64 Clear MSR_RI earlier in syscall exit path

2005-01-21 Thread Paul Mackerras

This patch is from Craig Chaney <[EMAIL PROTECTED]>.

This patch moves the restoring of the stack pointer in the system call
exit path to after the point where we clear the RI (recoverable
interrupt) bit in the MSR.  Normally, loading the stack pointer before
clearing RI doesn't cause any problem because there is no trap that
can normally occur in between.  But if we are tracing the code using a
tool that single-steps instructions, this can cause a problem.  In
this case, clearing RI serves as an indication that the following code
can't be safely single-stepped.

Signed-off-by: Craig Chaney <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -Naur clean/arch/ppc64/kernel/entry.S edited/arch/ppc64/kernel/entry.S
--- clean/arch/ppc64/kernel/entry.S 2004-09-26 14:24:27.0 +
+++ edited/arch/ppc64/kernel/entry.S    2004-09-27 14:36:29.221308744 +
@@ -185,10 +185,10 @@
beq-    1f  /* only restore r13 if */
ld  r13,GPR13(r1)   /* returning to usermode */
 1: ld  r2,GPR2(r1)
-   ld  r1,GPR1(r1)
li  r12,MSR_RI
andc    r10,r10,r12
mtmsrd  r10,1   /* clear MSR.RI */
+   ld  r1,GPR1(r1)
mtlr    r4
mtcr    r5
mtspr   SRR0,r7

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

utz lehmann wrote:
> On Sat, 2005-01-22 at 10:48 +1100, Con Kolivas wrote:
> > utz lehmann wrote:
> > > Hi
> > > I dislike the behavior of the SCHED_ISO patch that iso tasks are
> > > degraded to SCHED_NORMAL if they exceed the limit.
> > > IMHO it's better to throttle them at the iso_cpu limit.
> > > I have modified Con's iso2 patch to do this. If iso_cpu > 50 iso tasks
> > > only get stalled for 1 tick (1ms on x86).
> > Some tasks are so cache intensive they would make almost no forward
> > progress running for only 1ms.
>
> Ok. The throttle duration can be extended.
> What is a good value? 5ms, 10ms?

It's architecture and cpu dependent. Useful timeslices to avoid cache
thrashing vary from 2ms to 20ms. Also HZ varies between architectures and
setups, and almost certainly will vary in some dynamic way in the future,
substantially altering the accuracy of such a setup.

> > > Fortunately there is a currently unused task prio (MAX_RT_PRIO-1) [1]. I
> > Your implementation is not correct. The "prio" field of real time tasks
> > is determined by MAX_RT_PRIO-1-rt_priority. Therefore you're limiting
> > the best real time priority, not the other way around.
>
> Really? The task prios are (lower value is higher priority):
> 0
> ..      For SCHED_FIFO/SCHED_RR (rt_priority 99..1)
> 98      MAX_RT_PRIO-2
> 99      MAX_RT_PRIO-1   ISO_PRIO (rt_priority 0)
> 100     MAX_RT_PRIO
> ..      For SCHED_NORMAL
> 139     MAX_PRIO-1
> ISO_PRIO is between the SCHED_FIFO/SCHED_RR and the SCHED_NORMAL range.

I wasn't debating that fact. I was saying that decreasing the range of
priorities you can have for real time will lose the highest priority ones.

if (SCHED_RT(policy))
p->prio = MAX_USER_RT_PRIO-1 - p->rt_priority;
Throttling them for only 1ms will make it very easy to starve the system 
 with 1 or more short running (<1ms) SCHED_NORMAL tasks running. Lower 
priority tasks will never run.
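
The mapping quoted above can be tried out in isolation. This is only a sketch: the function name is hypothetical, and MAX_USER_RT_PRIO is assumed to be 100, consistent with the priority table earlier in this message.

```c
#include <assert.h>

#define MAX_USER_RT_PRIO 100	/* assumed value, matching the table above */

/* Map a real-time rt_priority to the scheduler's internal prio.
 * A lower prio value means a higher scheduling priority. */
int rt_prio_to_prio(int rt_priority)
{
	assert(rt_priority >= 0 && rt_priority < MAX_USER_RT_PRIO);
	return MAX_USER_RT_PRIO - 1 - rt_priority;
}
```

Under that assumption rt_priority 99 maps to prio 0, rt_priority 1 to prio 98, and only rt_priority 0 would land on prio 99.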
can I also comment on:
+   while (!list_empty(queue)) {
+   next = list_entry(queue->next, task_t, run_list);
+   dequeue_task(next, active);
+   enqueue_task(next, expired);
+   }
O(n) functions are a bad idea in critical codepaths, even if they only 
get hit when there is more than one SCHED_ISO task queued.

Apart from those, I'm not really sure what advantage this different 
design has. Once you go over the cpu limit the behaviour is grey and 
your design basically complicates what is already simple - to make an 
unprivileged task starvation free you run it SCHED_NORMAL. I know you 
want it back to high priority as soon as possible, but I fail to see how 
this is any better. They're either real time or not depending on what 
limits you set in either design.

As for priority support, I have been working on it. While the test cases 
I've been involved in show no need for it, I can understand why it would 
be desirable.

Cheers,
Con



Re: Pollable Semaphores

2005-01-21 Thread Ulrich Drepper

On Fri, 21 Jan 2005 17:17:51 -0600, Brent Casavant <[EMAIL PROTECTED]> wrote:

>   2. select/poll on the fd return EWOULDBLOCK if the current value of
>  the futex is not equal to the value of interest.  Otherwise it
>  behaves as FUTEX_FD currently does.

This is the problematic part.  The expected value, as you suggested,
can be handled with a write() and since the expected value is often
constant, this is a low-overhead method.

But the poll() interface is not so easy.  You cannot change the poll()
semantic to return such an error.  It makes really no sense.

What I thought could be done is to define instead a new POLL* constant
which signals the EWOULDBLOCK condition of the futex() syscall in the
revents member.  The poll/epoll syscall would do its normal work and
just fill in all the appropriate revents.  A futex value mismatch would
mean the call is not blocking at all, just as available data would be
for POLLIN.

For select, I would use the exception bitmap.  The bit is set for
futex fds in the EWOULDBLOCK case.
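
From the caller's side, the revents idea might look roughly like the sketch below. POLLFUTEX_MISMATCH is a made-up name and value; no such constant exists in <poll.h>. The sketch also assumes that a normal futex wakeup is reported as POLLIN.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical flag: not defined by any real <poll.h>. The value is an
 * assumption picked only to stay clear of the standard POLL* bits. */
#define POLLFUTEX_MISMATCH 0x4000

enum fd_event { EV_NONE, EV_WOKEN, EV_MISMATCH, EV_ERROR };

/* Decide what a futex fd's revents field is telling the caller. */
enum fd_event classify_revents(short revents)
{
	if (revents & (POLLERR | POLLNVAL))
		return EV_ERROR;
	if (revents & POLLFUTEX_MISMATCH)
		return EV_MISMATCH;	/* the EWOULDBLOCK case: value changed */
	if (revents & POLLIN)
		return EV_WOKEN;	/* assumed: a wakeup shows up as POLLIN */
	return EV_NONE;
}
```

A caller would run this on each pollfd after poll() returns and re-read the futex word whenever EV_MISMATCH comes back.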

All this _could_ work.  But we've been bitten quite a few times in the
past.  There might be special cases which may need at least some
additional functionality.  This should be taken into account in the
original design.

So, if people are interested in this, code something up and try it.
Stress it as much as you can.  I would oppose adding any new futex
interface created on a hunch if I were Andrew.

And there is another thing to consider.  There is at least one other event
which should be pollable: process (maybe thread) deaths.  I was
hoping that we get support for this, perhaps in the form of polling
the /proc/PID directory.  For poll(), a POLLERR value could mean the
process/thread died.  For select(), once again a bit in the except
array could be set.

Re: User space out of memory approach

2005-01-21 Thread Andrea Arcangeli

On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> Hi Andrew,
> 
> I have another question. You included an oom_adj entry in /proc for
> each process. This was the approach you used in order to allow someone
> or something to interfere with the ranking algorithm from userland, right?
> So if I have another ranking algorithm in user space, I can use it
> to complement the kernel decision as necessary. Was that your idea?

Yes, you should use your userspace algorithm to tune the oom killer via
the oom_adj and you can check the effect of your changes with oom_score.
I posted a one liner ugly script to do that a few days ago on l-k.

The oom_adj has this effect on the badness() code:

	/*
	 * Adjust the score by oomkilladj.
	 */
	if (p->oomkilladj) {
		if (p->oomkilladj > 0)
			points <<= p->oomkilladj;
		else
			points >>= -(p->oomkilladj);
	}

The bigger the points become, the more likely the task is to be chosen
by the oom killer.
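
To get a feel for the numbers, the adjustment can be lifted into a stand-alone function. This is a user-space sketch only; the function name and the range check are assumptions added for illustration, not part of badness().

```c
#include <assert.h>

/* User-space sketch of the adjustment step quoted above. */
unsigned long apply_oom_adj(unsigned long points, int oomkilladj)
{
	/* Shifting by the type width or more is undefined in C, so this
	 * sketch only accepts small magnitudes. */
	assert(oomkilladj > -32 && oomkilladj < 32);

	if (oomkilladj > 0)
		points <<= oomkilladj;		/* each step doubles the score */
	else if (oomkilladj < 0)
		points >>= -oomkilladj;		/* each step halves the score */
	return points;
}
```

So an oom_adj of 3 turns a score of 100 into 800, while -2 turns it into 25.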

Re: seccomp for 2.6.11-rc1-bk8

2005-01-21 Thread Andrea Arcangeli

On Fri, Jan 21, 2005 at 01:31:46PM -0800, Roland McGrath wrote:
> When gdb has a bug, people want to be able to kill it and get on with using
> their program, not have their program always be killed too.

What I need is that the program is killed right away synchronously as
soon as the "debugger" detaches (to me that's a needed feature). No
matter why the debugger detached.  This is the opposite of what
ptrace/strace does right now.

Just try to attach to a task with strace -p, then kill strace with -9:
the task will keep going as if nothing had happened. I need the child
killed too instead (before the parent unptraces the child).

Probably the reason why the app gets killed is that gdb, the ptracing
task, is the process leader of the process group, as Ingo suggested. But
I'd rather not depend on leaders/groups/pids/signals, when I can do it
with do_exit and a check on the syscall number.

Ptrace does a lot more than what I need: I don't care about parameters or
anything more than the syscall number, I don't need to change the
retvals during syscall return or to check registers or to stop a task.
Even the auditing subsystem could be implemented by putting all tasks
under strace and by having the ptracers communicate with each other
through pipes to generate global info. But it wouldn't be as reliable and
as simple as having kernel code doing it.

I'm still open to doing it with ptrace if there's a consensus on l-k to go
in that direction; it's probably going to work fine too, but if I
didn't feel safer with seccomp I would have been doing ptrace in the first
place. It's not like I've forgotten I could do it with ptrace too (as
Pavel already reminded me some months ago).

Re: [LTP] Re: [Dev] Re: Kernel Panic with LTP on 2.6.11-rc1 (was Re: LTP Results for 2.6.x and 2.4.x)

2005-01-21 Thread Bryce Harrington

On Fri, 21 Jan 2005, Bryce Harrington wrote:
> On Fri, 21 Jan 2005, Chris Wright wrote:
> > * Andrew Morton ([EMAIL PROTECTED]) wrote:
> > > Bryce Harrington <[EMAIL PROTECTED]> wrote:
> > > I am unable to find the oops trace amongst all that stuff.  Help?
> > >
> > > (It would have been handy to include it in the bug report, actually)
> >
> > Yes, it would.  Or at least some better granularity leading up to the
> > problem.  I ran growfiles locally on 2.6.11-rc-bk and didn't have any
> > problem.  Could you strace growfiles and see what it was doing when it
> > killed the machine?
>
> Okay, I'll set up another run and try collecting that info.  Is there
> any other data that would be useful to collect while I'm at it?

Well, I'm not having much luck.  strace isn't installed on the system
(and is giving errors when trying to compile it).  Also, the ssh session
(and sshd) quits whenever I try running the following growfiles command
manually, so I'm having trouble replicating the kernel panic manually.

# growfiles -W gf14 -b -e 1 -u -i 0 -L 20 -w -l -C 1 -T 10 glseek19 glseek19.2

Anyway, if anyone wants to investigate this further, I can provide
access to the machine (email me).  Otherwise, I'm probably just going to
wait for -rc2 and see if the problem's still there.

Thanks,
Bryce


Re: User space out of memory approach

2005-01-21 Thread Andrea Arcangeli

On Fri, Jan 21, 2005 at 05:27:11PM -0400, Mauricio Lin wrote:
> Hi Andrea,
> 
> I applied your patch and I am checking your code. It is really a very
> interesting work. I have a question about the function
> __set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory
> function. Do not you think it would be better put set_current_state
> instead of __set_current_state function? AFAIK the set_current_state
> function is more feasible for SMP systems, right?

set_current_state is needed only when you need to place a memory barrier
after __set_current_state. So it's needed in the usual wait_event loop,
right after registering in the waitqueue. Example:

	unsigned long flags;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	/*
	 * don't alter the task state if this is just going to
	 * queue an async wait queue callback
	 */
	if (is_sync_wait(wait))
		set_current_state(state);
	spin_unlock_irqrestore(&q->lock, flags);

and even in the above it is needed only because spin_unlock has inclusive
semantics in ia64. In 2.4 there was no unlock at all after
set_current_state and it was like this:

	set_current_state(TASK_UNINTERRUPTIBLE);	\
	if (condition)					\
		break;					\
	schedule();					\

The rule of thumb is that if there's nothing between set_current_state
and schedule() then __set_current_state is more efficient and equally
safe to use. And the oom killer path I posted falls in this category:
nothing in between set_current_state and schedule, so no reason to place
memory barriers in there.
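
As a rough user-space analogy only (this is not the kernel implementation), the difference between the two helpers can be sketched with C11 atomics. The struct, the function names and the state values are illustrative assumptions.

```c
#include <assert.h>
#include <stdatomic.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };	/* illustrative values */

struct task_sketch { _Atomic long state; };

/* Analogue of __set_current_state(): a plain store, no ordering implied. */
void task_set_state_nobarrier(struct task_sketch *t, long state)
{
	assert(t);
	atomic_store_explicit(&t->state, state, memory_order_relaxed);
}

/* Analogue of set_current_state(): the store is followed by a full
 * barrier, so a later load of the wakeup condition cannot be reordered
 * before the state change becomes visible. */
void task_set_state(struct task_sketch *t, long state)
{
	assert(t);
	atomic_store_explicit(&t->state, state, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}
```

The barrier variant only pays off when a condition is tested between the state change and the call that puts the task to sleep; with nothing in between, the cheaper store does the same job.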

Hope this helps ;)

Re: [patch 2.4.29] i810_audio: offset LVI from CIV to avoid stalled start

2005-01-21 Thread Herbert Xu

On Fri, Jan 21, 2005 at 09:07:13AM +1100, herbert wrote:
> On Thu, Jan 20, 2005 at 05:01:21PM -0500, John W. Linville wrote:
> > On Thu, Jan 20, 2005 at 04:23:46PM -0500, John W. Linville wrote:
> > 
> > > + /* if we are currently stopped, then our CIV is actually set to our
> > > +  * *last* sg segment and we are ready to wrap to the next.  However,
> > > +  * if we set our LVI to the last sg segment, then it won't wrap to
> > > +  * the next sg segment, it won't even get a start.  So, instead, when
> > > +  * we are stopped, we increment the CIV value to the next sg segment
> > > +  * to be played so that when we call start, things will operate
> > > +  * properly
> > > +  */
> > 
> > Is this (slightly altered) comment more to your liking?  If so,
> > I'll post an additive patch for the 2.6 version...
> 
> IMHO the last sentence is still wrong.  We're not touching the value
> of CIV at all.  We're setting LVI to CIV + 1...
> 
> OTOH, perhaps we should actually try implementing what the comment
> suggests?

OK I dug into the archives and found that the reason we need to do
it this way is because you can't set the value of CIV directly.  So
how about s/CIV/LVI/ in the last sentence?
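
The relationship under discussion, LVI set to CIV + 1, could be sketched as below. The ring size of 32 entries and the helper name are assumptions for illustration, not taken from the driver code quoted here.

```c
#include <assert.h>

#define SG_LEN 32	/* assumed number of scatter-gather entries in the ring */

/* Pick the Last Valid Index one slot past the Current Index Value,
 * wrapping around the end of the ring. */
unsigned int lvi_after_civ(unsigned int civ)
{
	assert(civ < SG_LEN);
	return (civ + 1) % SG_LEN;
}
```

With CIV on the last segment (31) the result wraps to 0, which is the case the comment is trying to describe.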

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

RE: possible CPU bug and request for Intel contacts

2005-01-21 Thread Seth, Rohit

Hello Kirill,

Thanks for sending the detailed information. Based on our experiments
and analysis, we believe at this point that this is a known E80 issue
mentioned in the PIII spec update at this location
(http://www.intel.com/design/pentiumiii/specupdt/24445351.pdf)

Could you please try one of the suggested workarounds for this issue?

Thanks, rohit

Kirill Korotaev  wrote on Friday, January 21, 2005
4:47 AM:

> Hello,
> 
> Here are the details about CPU bug I mentioned in my previous post.
> Though it turned out later that it happens on P-III systems only I
> still 
> hope it can be of interest.
> 
> Brief description
> ~~~~~~~~~~~~~~~~~
> 
> This issue was found by Vasily Averin ([EMAIL PROTECTED]) when playing
> with uselib security exploit on kernels with my 4gb split patch.
> 
> This bug results in strange effects such as calltraces below,
> reboots, impossible call traces and so on.
> 
> I started to resolve the bug, narrowed down uselib exploit and
> got a simple testcase for the bug, which can be found in attach.
> This testcase does a simple thing - it maps pages at low addresses
> from 0x0400 downto 0x, page by page and touches them
> for write. Sometimes when running this exploit I got oopses,
> sometimes reboots and I found that this is sensitive to the page
> addresses which exploit maps.
> 
> Why does it crash? I think this is because the virtual addresses of
> kernel code and mapped user space pages overlap. I was even able to
> reboot the machine if mapped user space pages were filled with some
> appropriate asm code.
> 
> I found that Ingo Molnar's 4gb split is not vulnerable, and after
> investigation I found that Ingo's patch doesn't map kernel entry code
> (trampoline) as _PAGE_GLOBAL. This was the answer.
> 
> I tested it on 4 different P-III machines - all of them were
> vulnerable. 
> But lately I tested it on Celeron 2.4Ghz and P4 systems - it doesn't
> happen, so this bug can be of low interest to Intel people :(
> 
> Below you can find the way how to reproduce the bug, call traces
> and why I think it's a hardware bug.
> 
> How to reproduce a bug
> ~~~~~~~~~~~~~~~~~~~~~~
> 
> - take any FedoraCore kernel with Ingo Molnar 4gb split patch
>or mainstream kernel and apply 4GB split patch
> - apply attached diff-arch-4gb-global patch to make
>    trampoline code GLOBAL
> - compile kernel with turned on 4gb split, i.e. CONFIG_X86_4GB=y
> - boot the kernel and run the attached testcase:
> 
> # while true; do ./4gbtest; done;
> 
> or
> 
> # ./elflbl -l ./lib -a 0x400  (where elflbl is uselib exploit)
> 
> During each 4-5 test runs I get the following oops:
> 
> Jan 21 12:15:17 ts Unable to handle kernel NULL pointer dereference at
> virtual address 00c0
> Jan 21 12:15:17 ts  printing eip:
> Jan 21 12:15:17 ts 02114450
> Jan 21 12:15:17 ts *pde = 
> Jan 21 12:15:17 ts Oops: 0002
> Jan 21 12:15:17 ts SMP
> Jan 21 12:15:17 ts Modules linked in:
> Jan 21 12:15:17 ts CPU:0
> Jan 21 12:15:17 ts EIP:0060:[<02114450>]Not tainted
> Jan 21 12:15:17 ts EFLAGS: 00010246   (2.6.8-dev)
> Jan 21 12:15:17 ts EIP is at sys_mmap2+0x0/0xb0
> Jan 21 12:15:17 ts eax: 00c0   ebx: 31524fc4   ecx: 1000  
> edx: 004ec000
> Jan 21 12:15:17 ts esi: 0032   edi:    ebp: 31524000  
> esp: 31524fc0
> Jan 21 12:15:17 ts ds: 007b   es: 007b   ss: 0068
> Jan 21 12:15:17 ts Process test (pid: 25, threadinfo=31524000
> task=31f680c0) Jan 21 12:15:17 ts Stack: fffec200 01a2a000 1000
> 0003 0032   00c0
> Jan 21 12:15:17 ts007b 007b 00c0 08048541 0073
> 0282 bdcc 007b
> Jan 21 12:15:17 ts Call Trace:
> Jan 21 12:15:17 ts Code: 55 bd f7 ff ff ff 57 31 ff 56 53 83 ec 18 8b
> 44 24 38 89 c6
> 
>   Unable to handle kernel NULL pointer dereference at virtual address
> 00c0
>   02114450
>   *pde = 
>   Oops: 0002
>   CPU:0
>   EIP:0060:[<02114450>]Not tainted
>   EFLAGS: 00010246   (2.6.8-dev)
>   eax: 00c0   ebx: 31524fc4   ecx: 1000   edx: 004ec000
>   esi: 0032   edi:    ebp: 31524000   esp: 31524fc0
>   ds: 007b   es: 007b   ss: 0068
>   Stack: fffec200 01a2a000 1000 0003 0032 
>  00c0
>  007b 007b 00c0 08048541 0073 0282
> bdcc 007b
>   Call Trace:
>   Code: 55 bd f7 ff ff ff 57 31 ff 56 53 83 ec 18 8b 44 24 38 89 c6
> 
> 
>  >>EIP; 02114450<=
> 
>  >>ebx; 31524fc4 
>  >>ebp; 31524000 
>  >>esp; 31524fc0 
> 
> Code;  02114450 
>  <_EIP>:
> Code;  02114450<=
> 0:   55push   %ebp   <=
> Code;  02114451 
> 1:   bd f7 ff ff ffmov$0xfff7,%ebp
> Code;  02114456 
> 6:   57push   %edi
> Code;  02114457 
> 7:   31 ff xor%edi,%edi
> Code;  02114459 
> 9:   56push   %esi
> Code;  0211445a 
> a:   53

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Christoph Lameter writes:

> I had the name "zero_page" in V1 and V2 of the patch where it was
> separate. Then someone complained about code duplication.

Well, if you duplicated each arch's clear_page implementation in
zero_page, then yes, that would be unnecessary code duplication.  I
would suggest that for architectures where the clear_page
implementation can easily be extended, rename it to clear_page_order
(or something) and #define clear_page(x) to be clear_page_order(x, 0).
For architectures where it can't, leave clear_page as clear_page and
define clear_page_order as an inline function that calls clear_page in
a loop.
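
For the second kind of architecture, the fallback might look something like the sketch below. It runs in user space, assumes a 4096-byte page, and uses memset() as a stand-in for an arch's real clear_page; only the names clear_page and clear_page_order come from the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Stand-in for an architecture's existing single-page routine. */
void clear_page(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
}

/* Generic fallback: clear 2^order contiguous pages one page at a time. */
void clear_page_order(void *page, unsigned int order)
{
	size_t i, nr_pages;

	assert(order < 16);	/* arbitrary sanity bound for the sketch */
	nr_pages = (size_t)1 << order;
	for (i = 0; i < nr_pages; i++)
		clear_page((char *)page + i * SKETCH_PAGE_SIZE);
}
```

Architectures with an extensible implementation would go the other way and define clear_page(x) as clear_page_order(x, 0).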

> clear_page is called clear_page because it clears one page of *any* order
> not just higher orders. zero-order pages are not segregated nor are they
> intrinsically better just because they contain more memory ;-).

You have missed my point, which was about address constraints, not a
distinction between zero-order pages and higher-order pages.

Anyway, I remain of the opinion that your naming is inconsistent with
the naming of other functions that deal with zero-order and
higher-order pages, such as get_free_pages, alloc_pages, free_pages,
etc., and that your patch is unnecessarily intrusive.  I guess it's up
to Andrew to decide which way we go.

Paul.

Re: seccomp for 2.6.11-rc1-bk8

2005-01-21 Thread Andrea Arcangeli

On Fri, Jan 21, 2005 at 09:54:16PM +0100, Ingo Molnar wrote:
> - the second barrier is the 'jail' of the ptraced task. Especially with
>   PTRACE_SYSCALL, the things a child ptraced process can do are
>   extremely limited, everything it tries to do will trap, the task will
>   suspend and the parent runs. The task is completely passive and ptrace
>   on that end is a pretty small engine that stops/traps/restarts user
>   processing without a lot of frills.
> 
> historically there has been a lot less problems with the second barrier. 
> (in fact i cannot remember even one security issue in that area.)

I agree there are fewer problems in that area.  But there's still a great
deal of complexity in ptrace that I preferred to keep out of the
security equation.

uml can't run with seccomp, uml is forced to ptrace, it has to trap the
arguments and everything.

Once kernel CVS comes back up, I'll get an email as soon as somebody
touches kernel/seccomp.c or the other files involved, and I can keep an
eye on the code and verify all modifications very quickly (plus there
will be very few modifications on those files, unlike for the ptrace
code that is much more under development). Keeping ptrace under control
would be more costly on my side.

> i'm not forcing anyone to do anything, but i think the most logical
> solution is to use ptrace. It's there on every Linux box so your client
> can run even on 'older' Linux boxes. (You might want to detect in the
> client whether the OOM race is fixed in a kernel, but it should not be a
> truly big issue.) Waiting for any extra API to get significant userbase
> takes at least 1-2 years - while ptrace is here and available on every

Note that I'm not ready for production myself yet, I'm suggesting to
include this now, exactly to get some real userbase ready in 1-2 years.
And after that with trusted computing it'll take another few years
before the trusted Cpushare exchange can start in parallel to the
seccomp one.  My schedule is planned for a much longer timeframe, I
doubt anything significant could happen this year regardless of ptrace
or seccomp.

Plus I would never depend on the users to do the right thing (i.e. not
to run oom etc..). So I'm forced to wait the 1-2 years anyway, either to
get seccomp merged, or to get your ptrace extension merged. If I use
ptrace, the current kernels can't prevent the Cpushare users from hurting
themselves, so I won't allow current unpatched kernels to run.

I have no hurry, my first prio is to do everything safely, I don't care
to grow the userbase fast if I have to add some risk to the users to
do that.

Note also that all Cpushare client software that runs on the user
computers is GPL, in turn without pending patents and completely free
software, so you're very free to take it, rewrite it with ptrace, and
ship it to your users now. Even Microsoft can write its own Cpushare
client and ship it in Windows just fine.  You can fake the kernel
version to tell the server 2.6.11+seccomp is running, even though 2.6.9 with
the insecure ptrace might be running instead (the Cpushare protocol does
most checks on the server side btw).  I have no control on that and as
long as I have no liability I'm fine (and I write in capital letters no
liability and no warranty in the account creation procedure of course).
But the client I will ship myself on cpushare.com will have security as
priority number 1 in mind, and in turn I can't allow it to run with the
current ptrace kernel code.

(however if you want to write your own client for your own OS, please
let me know privately, instead of faking the kernel version, that's
going to be more secure shall you need me to shutdown just your clients
because you found a security issue in your code)

If you noticed, I also made sure that after seccomp is enabled, it is
impossible to disable it:

	/* can set it only once to be even more secure */
	if (unlikely(tsk->seccomp_mode))
		return -EPERM;

This is a *major* feature. I'm sure we can hack ptrace for that too with
yet another patch, but isn't it so much simpler to merge seccomp to get
the highest degree of security? The only way a user can screw himself
with seccomp is to write the right bit in /dev/mem at the right bit
offset. And I exclude that this can happen by mistake. I mean, it has a
lower probability than a ram bitflip ;).
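
The one-way behaviour of that check can be demonstrated outside the kernel. In this sketch the struct and the setter are hypothetical stand-ins, and the -EINVAL branch is an assumption added for illustration; only the seccomp_mode test and the -EPERM return mirror the snippet above.

```c
#include <assert.h>
#include <errno.h>

struct fake_task { int seccomp_mode; };	/* stand-in for task_struct */

/* One-way switch: the mode can be set once and never changed again. */
int seccomp_set_mode(struct fake_task *tsk, int mode)
{
	assert(tsk);
	if (mode <= 0)
		return -EINVAL;		/* sketch only accepts positive modes */
	if (tsk->seccomp_mode)
		return -EPERM;		/* already enabled: refuse any change */
	tsk->seccomp_mode = mode;
	return 0;
}
```

The first call succeeds and every later call fails with -EPERM, so there is no code path that turns the mode off again.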

> Linux box. If you require 'users' to go with a new (or worse: patched)
> kernel then you are creating a pretty significant artificial market
> penetration barrier for your application.

This is fine. It's a long term project, I don't care about the short
term, I only care that the users are as safe as possible.

> also, with more applications relying on ptrace it will become more
> tested, more robust and people will do speedups. I think the fact that
> UML uses ptrace is already a very good sign that it's robust for such
> purposes. (_Also_, if there's a security problem in the ptrace barrier,
> you'd like to know about

Re: [PATCH][RFC] swsusp: speed up image restoring on x86-64

2005-01-21 Thread Andi Kleen

On Thu, Jan 20, 2005 at 08:32:31PM +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> The following patch speeds up the restoring of swsusp images on x86-64
> and makes the assembly code more readable (tested and works on AMD64).  It's
> against 2.6.11-rc1-mm1, but applies to 2.6.11-rc1-mm2.  Please consider for 
> applying.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

Thanks. I applied it with some small changes to not hardcode any 
C fields. 

BTW Pavel, while reading the code I noticed some dubious things
in the code:

- The TLB flush doesn't flush global pages (turn off PGE and turn it
on again). Since that handles kernel pages which are marked global
this is surely wrong. 

- Also is it really needed to flush the TLB after each page and wouldn't
INVLPG be better here? Or do you want to flush other pages than the
just copied one there too? INVLPG would also take care of the global
pages at least on x86-64 (iirc there are some bugs in this regard on some
older i386 cpus) 

- There is a comment that says the code shouldn't use stack, but 
it definitely uses the stack for some things. Either the comment
or the code is wrong. Which is it?


-Andi


The following patch speeds up the restoring of swsusp images on x86-64
and makes the assembly code more readable (tested and works on AMD64).  It's
against 2.6.11-rc1-mm1, but applies to 2.6.11-rc1-mm2.  Please consider for 
applying.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

Changed by AK to not hardcode any C values and get them from offset.h
instead.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/arch/x86_64/kernel/suspend_asm.S
===
--- linux.orig/arch/x86_64/kernel/suspend_asm.S 2004-10-19 01:55:08.%N +0200
+++ linux/arch/x86_64/kernel/suspend_asm.S  2005-01-22 03:20:28.%N +0100
@@ -11,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 ENTRY(swsusp_arch_suspend)
 
@@ -49,43 +51,31 @@
movq    %rcx, %cr3;
movq    %rax, %cr4;  # turn PGE back on
 
+   movq    pagedir_nosave(%rip), %rdx
+   /* compute the limit */
movl    nr_copy_pages(%rip), %eax
-   xorl    %ecx, %ecx
-   movq    $0, %r10
testl   %eax, %eax
jz  done
-.L105:
-   xorl    %esi, %esi
-   movq    $0, %r11
-   jmp .L104
-   .p2align 4,,7
-copy_one_page:
-   movq    %r10, %rcx
-.L104:
-   movq    pagedir_nosave(%rip), %rdx
-   movq    %rcx, %rax
-   salq    $5, %rax
-   movq    8(%rdx,%rax), %rcx
-   movq    (%rdx,%rax), %rax
-   movzbl  (%rsi,%rax), %eax
-   movb    %al, (%rsi,%rcx)
-
-   movq    %cr3, %rax;  # flush TLB
-   movq    %rax, %cr3;
-
-   movq    %r11, %rax
-   incq    %rax
-   cmpq    $4095, %rax
-   movq    %rax, %rsi
-   movq    %rax, %r11
-   jbe copy_one_page
-   movq    %r10, %rax
-   incq    %rax
-   movq    %rax, %rcx
-   movq    %rax, %r10
-   mov nr_copy_pages(%rip), %eax
-   cmpq    %rax, %rcx
-   jb  .L105
+   movq    %rdx,%r8
+   movl    $SIZEOF_PBE,%r9d
+   mul %r9  # with rax, clobbers rdx
+   movq    %r8, %rdx
+   addq    %r8, %rax
+loop:
+   /* get addresses from the pbe and copy the page */
+   movq    pbe_address(%rdx), %rsi
+   movq    pbe_orig_address(%rdx), %rdi
+   movq    $512, %rcx
+   rep
+   movsq
+
+   movq    %cr3, %rcx;  # flush TLB
+   movq    %rcx, %cr3;
+
+   /* progress to the next pbe */
+   addq    $SIZEOF_PBE, %rdx
+   cmpq    %rax, %rdx
+   jb  loop
 done:
movl    $24, %eax
movl    %eax, %ds
Index: linux/arch/x86_64/kernel/asm-offsets.c
===
--- linux.orig/arch/x86_64/kernel/asm-offsets.c 2004-10-19 01:55:08.%N +0200
+++ linux/arch/x86_64/kernel/asm-offsets.c  2005-01-22 03:09:50.%N +0100
@@ -8,6 +8,7 @@
 #include 
 #include  
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,6 +62,8 @@
   offsetof (struct rt_sigframe32, uc.uc_mcontext));
BLANK();
 #endif
-
+   DEFINE(SIZEOF_PBE, sizeof(struct pbe));
+   DEFINE(pbe_address, offsetof(struct pbe, address));
+   DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address));   
return 0;
 }
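
For readers who find the assembly hard to follow, the new loop can be
sketched in C.  This is only an illustration under stated assumptions, not
the kernel's code: struct pbe_sketch is a hypothetical stand-in that models
just the two pbe fields the assembly reads, SKETCH_PAGE_SIZE assumes
4096-byte pages (512 quadwords, matching the rep movsq count), and the
per-page CR3 reload has no user-space equivalent, so it is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size: 512 quadwords */

/* Hypothetical stand-in for the kernel's struct pbe; only the two
 * fields used by the restore loop are modeled. */
struct pbe_sketch {
	uintptr_t address;	/* where the saved copy of the page lives */
	uintptr_t orig_address;	/* where the page has to be restored to */
};

/* Walk the page directory once and copy one whole page per entry,
 * mirroring the "compute the limit, then loop until rdx reaches it"
 * structure of the patched assembly. */
static void restore_pages_sketch(const struct pbe_sketch *pagedir,
				 unsigned int nr_copy_pages)
{
	const struct pbe_sketch *p = pagedir;
	const struct pbe_sketch *limit;

	assert(pagedir != NULL);
	limit = pagedir + nr_copy_pages;

	for (; p < limit; p++)
		memcpy((void *)p->orig_address,
		       (const void *)p->address, SKETCH_PAGE_SIZE);
}
```

The old code copied one byte per inner iteration and reloaded CR3 after
each byte; the sketch makes it visible that the new code does one bulk copy
per page table entry instead.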
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH] Thoughts about capabilities and prototype patch for user-capabilities

2005-01-21 Thread Chris Wright

* Alexander Nyberg ([EMAIL PROTECTED]) wrote:
> I recently had an idea of having something similar
> to /etc/user_capabilities which would consist of
> username:CAP_CHOWN,CAP_SOMETHING,CAP_SOMETHING2

pam_cap should do this (alas, due to the brokenness of the current scheme,
it doesn't).

> This could very well be loaded into linux at the time of an application
> doing sys_setuid, sys_setreuid and the likes by hooking into glibc. The
> kernel also requires a small patch which I'll inline below for comments.
> I sure would use some of these capabilities on my user. And they could
> be used to solve things like a certain program/user needs to use
> real-time scheduling (just something like CAP_SCHED allowing arbitrary
> scheduling or CAP_MEMLOCK allowing a user to lock memory). This would
> solve some of the old problems.

ITYM CAP_IPC_LOCK and CAP_SYS_NICE.  The ability to mlock is already
controlled by an rlimit, so the capability is not needed except as an
override.  As for scheduling, this is being actively worked on.  I have
a patch to make scheduling controlled by rlimits as well, and Ingo and
Con are working on scheduler modifications that may allow low latency
for normal unprivileged users.

> I haven't looked into making capability bits for certain files, mostly
> cause I don't know vfs much and also I don't see this colliding with it,
> I think both should exist. 

This work has been done a few times, but never got much traction mainly
due to the complexity of managing such a scheme.

> I don't know what the plans are for capabilities in the future but there
> are quite some shortcomings in the current implementation. I think it at
> least should allow setting/getting capabilities of process ids, user ids and

Olaf Dietsche posted a patch recently that allows you to do this as well,
you might look at that.  It's a bit different approach, but all of these
are working around the fact that it's simply broken right now.

> group ids. These are the things that come to my mind directly. I also
> see that 28 of the 32 currently possible slots are taken which will be a
> problem if capabilities are ever gonna get serious.

Actually we just added two more.  Yes, we're up against the edge.  FreeBSD
went straight to 64 bits.  That's probably the easiest transition.
Of course, there's also the problem that some capabilities are so broad
as to be meaningless for privilege separation.
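
To make the 32-slot limit concrete: the CAP_TO_MASK(x) (1 << (x)) macro
quoted in the original post shifts a plain int, so it cannot address a bit
above 31.  The following is a minimal sketch of what a 64-bit set could look
like, not a proposed kernel interface; the cap_mask64_t type and the cap64_*
helpers are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit capability set; not an existing kernel type. */
typedef uint64_t cap_mask64_t;

/* The cast matters: shifting a plain int by 32 or more is undefined. */
#define CAP64_TO_MASK(x)	((cap_mask64_t)1 << (x))

static inline void cap64_raise(cap_mask64_t *set, unsigned int cap)
{
	assert(cap < 64);
	*set |= CAP64_TO_MASK(cap);
}

static inline void cap64_lower(cap_mask64_t *set, unsigned int cap)
{
	assert(cap < 64);
	*set &= ~CAP64_TO_MASK(cap);
}

static inline int cap64_raised(cap_mask64_t set, unsigned int cap)
{
	assert(cap < 64);
	return (set & CAP64_TO_MASK(cap)) != 0;
}
```

Widening the type is the easy part; the user-visible structures passed to
capget/capset would also have to change, which is where the compatibility
question raised in this thread comes in.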

> While looking at it today I noticed user-space over here doesn't even
> seem to have macros/functions to set/test/clear bits in the blackboxed
> types, which means anyone who wants to use it will be exposed to the raw
> structure.

Check libcap, there are quite a few functions for handling that.

> So I'm curious of what the thoughts are for capabilities, are they gonna
> go away some day in the future? I hope not, I think the idea is simple
> and good but something needs to be done about the implementation to
> allow it to be of full use. And yes, I volunteer, but I'm not all sure
> about which direction has been discussed and where this is going.

The biggest thing to fix right now is the fact that you can't allow
capabilities to be inherited as they should be.  This is what makes them
of very limited use.

> Below you will find a test program and at the bottom a patch which allows
> a privileged user to set the capabilities of another user (i386 only). 
> Setting capabilities on groups does not work, it requires some more code
> (I don't think much though).

This is potentially quite risky.  In fact, the underlying ability is
there already (in capset), but the required privilege has been removed
(CAP_SETPCAP).  As for the syscalls, don't think we want to add them or
this type of functionality now.  We're better off fixing what we have
instead of adding more features.

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Advise on: panic - Attempting to free lock with active block list

2005-01-21 Thread Stuart Sheldon

Just a heads up,
I had the same panic and screen error with a 2.6.9 PIII SMP system
acting as an NFS client. This was after downgrading from a 2.6.10 kernel
that was panic'ing in the same way. I reverted to 2.6.8 but left the
Server (also a PIII SMP system) running 2.6.9. This occurred during
moderate to heavy NFS activity. The patch referenced in
http://seclists.org/lists/linux-kernel/2005/Jan/1237.html appeared to
resolve the panic with 2.6.10, but I was having strange things happen,
like failing to release file locks when the client reboots. This is a
production system and needs to be available for users. I am currently
trying to piece together another smp box to test with. I will post more
if I can duplicate the problem on demand.
We have a duplicate system that is not SMP that has not shown any of
these problems. That might be by chance though...
Hope this info is useful,
Stu Sheldon

[PATCH] drivers/block/scsi_ioctl.c, Video DVD playback support

2005-01-21 Thread Elias da Silva

Hi.

Attached patch fixes a problem of reading Video DVDs
through the cdrom_ioctl interface. VMware is among
the prominent victims.

The bug was introduced in kernel version 2.6.8 in the
function verify_command().

Regards,

Elias da Silva
--- linux-2.6.10/drivers/block/scsi_ioctl.c	2004-12-24 22:35:40.0 +0100
+++ linux-2.6.10-dvd/drivers/block/scsi_ioctl.c	2005-01-22 02:31:28.223951296 +0100
@@ -159,6 +159,11 @@
 		safe_for_read(GPCMD_SEEK),
 		safe_for_read(GPCMD_STOP_PLAY_SCAN),
 
+/* Video DVD playback support */
+		safe_for_read(GPCMD_SET_STREAMING),
+		safe_for_read(GPCMD_SEND_KEY),
+/* safe_for_read(0xe9), missing this opcode definition */
+
 		/* Basic writing commands */
 		safe_for_write(WRITE_6),
 		safe_for_write(WRITE_10),
@@ -179,13 +184,11 @@
 		safe_for_write(GPCMD_RESERVE_RZONE_TRACK),
 		safe_for_write(GPCMD_SEND_DVD_STRUCTURE),
 		safe_for_write(GPCMD_SEND_EVENT),
-		safe_for_write(GPCMD_SEND_KEY),
 		safe_for_write(GPCMD_SEND_OPC),
 		safe_for_write(GPCMD_SEND_CUE_SHEET),
 		safe_for_write(GPCMD_SET_SPEED),
 		safe_for_write(GPCMD_PREVENT_ALLOW_MEDIUM_REMOVAL),
 		safe_for_write(GPCMD_LOAD_UNLOAD),
-		safe_for_write(GPCMD_SET_STREAMING),
 	};
 	unsigned char type = cmd_type[cmd[0]];
 
@@ -194,13 +197,11 @@
 		return 0;
 
 	/* Write-safe commands just require a writable open.. */
-	if (type & CMD_WRITE_SAFE) {
-		if (file->f_mode & FMODE_WRITE)
-			return 0;
-	}
+	if ((type & CMD_WRITE_SAFE) && (file->f_mode & FMODE_WRITE))
+		return 0;
 
-	if (!(type & CMD_WARNED)) {
-		cmd_type[cmd[0]] = CMD_WARNED;
+	if (!type) {
+		type = cmd_type[cmd[0]] = CMD_WARNED;
 		printk(KERN_WARNING "scsi: unknown opcode 0x%02x\n", cmd[0]);
 	}
 
@@ -208,7 +209,14 @@
 	if (capable(CAP_SYS_RAWIO))
 		return 0;
 
-	/* Otherwise fail it with an "Operation not permitted" */
+if (!(type & CMD_WARNED))
+{
+  cmd_type[cmd[0]] |= CMD_WARNED;
+  printk(KERN_WARNING "scsi: opcode 0x%02x write/rawio"
+ " permission denied\n", cmd[0]);
+}
+
+/* Otherwise fail it with an "Operation not permitted" */
 	return -EPERM;
 }
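
The decision flow that verify_command() ends up with after this patch can
be modeled in user space as follows.  This is a sketch under assumptions,
not the kernel function: the flag values, the boolean parameters standing in
for file->f_mode & FMODE_WRITE and capable(CAP_SYS_RAWIO), and the _sketch
name are all chosen for illustration, and printk is replaced by fprintf.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed flag values for this sketch. */
enum { CMD_READ_SAFE = 0x01, CMD_WRITE_SAFE = 0x02, CMD_WARNED = 0x04 };

/* One entry per SCSI opcode; zero means "never classified". */
static unsigned char cmd_type[256];

static int verify_command_sketch(const unsigned char *cmd,
				 bool writable_open, bool has_rawio)
{
	unsigned char type;

	assert(cmd != NULL);
	type = cmd_type[cmd[0]];

	/* Read-safe commands are always allowed. */
	if (type & CMD_READ_SAFE)
		return 0;

	/* Write-safe commands just require a writable open. */
	if ((type & CMD_WRITE_SAFE) && writable_open)
		return 0;

	/* Completely unknown opcode: warn once and remember it. */
	if (!type) {
		type = cmd_type[cmd[0]] = CMD_WARNED;
		fprintf(stderr, "scsi: unknown opcode 0x%02x\n", cmd[0]);
	}

	/* Raw I/O privilege overrides everything else. */
	if (has_rawio)
		return 0;

	/* Known write-safe opcode on a read-only open: warn once. */
	if (!(type & CMD_WARNED)) {
		cmd_type[cmd[0]] |= CMD_WARNED;
		fprintf(stderr, "scsi: opcode 0x%02x write/rawio"
			" permission denied\n", cmd[0]);
	}

	return -EPERM;
}
```

Moving GPCMD_SET_STREAMING and GPCMD_SEND_KEY from the write-safe list to
the read-safe list means they hit the first return, so a player that opened
the drive read-only is no longer refused.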

Re: Memory leak in 2.6.11-rc1?

2005-01-21 Thread Alexander Nyberg

On Fri, 2005-01-21 at 17:19 +0100, Jan Kasprzak wrote:
>   Hi all,
> 
> I've been running 2.6.11-rc1 on my dual opteron Fedora Core 3 box for a week
> now, and I think there is a memory leak somewhere. I am measuring the
> size of active and inactive pages (from /proc/meminfo), and it seems
> that the count of sum (active+inactive) pages is decreasing. Please
> take look at the graphs at
> 
> http://www.linux.cz/stats/mrtg-rrd/vm_active.html
> 
> (especially the "monthly" graph) - I've booted 2.6.11-rc1 last Friday,
> and since then the size of "inactive" pages is decreasing almost
> constantly, while "active" is not increasing. The active+inactive
> sum has been steady before, as you can see from both the monthly
> and yearly graphs.
> 
> Now I am playing with 2.6.11-rc1-bk snapshots to see what happens.
> I have been running 2.6.10-rc3 before. More info is available, please ask me.
> The box runs 3ware 7506-8 controller with SW RAID-0, 1, and 5 volumes,
> Tigon3 network card. The main load is FTP server, and there is also
> a HTTP server and Qmail.

Others have seen this as well; the reports indicate that it takes a day
or two before it becomes noticeable. When it happens next time, please
capture the output of /proc/meminfo and /proc/slabinfo.

Thanks
Alexander
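
The active+inactive sum Jan is graphing can be pulled out of a captured
/proc/meminfo snapshot with a few lines of C.  This is a minimal sketch
assuming the usual "Name:   value kB" line format; the function name is made
up for the example, and it works on text already read into memory rather
than opening the file itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return Active + Inactive in kB from /proc/meminfo-style text,
 * or -1 if either field is missing. */
static long active_plus_inactive_kb(const char *meminfo_text)
{
	long active = -1, inactive = -1;
	const char *line = meminfo_text;

	assert(meminfo_text != NULL);

	while (*line) {
		long v;

		/* Each sscanf only matches when the line starts with
		 * exactly that field name followed by a colon. */
		if (sscanf(line, "Active: %ld kB", &v) == 1)
			active = v;
		else if (sscanf(line, "Inactive: %ld kB", &v) == 1)
			inactive = v;

		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}

	if (active < 0 || inactive < 0)
		return -1;
	return active + inactive;
}
```

Logging this value together with a /proc/slabinfo dump at regular intervals
would show whether the missing pages are turning up in a slab cache.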


Re: ps/2 mouse going crazy

2005-01-21 Thread zhilla

Martin Zwickel wrote:
> Hmm, I have similar problems with my mouse since I'm using kernel 2.6.
> Sometimes (once a day or only every second day) my mouse goes to the
> upper left corner. But then it works as normal. Extremely annoying while
> playing UT2004.
> But I don't get any kernel messages. With 2.4 everything worked fine.
> (Currently using 2.6.8-rc2-mm1)

Well, the problem is still not that uncommon. I saw a few other examples on
Google. Does anybody have any ideas for fixes?

Linux 2.6.11-rc2

2005-01-21 Thread Linus Torvalds


Ok, trying to calm things down again for a 2.6.11 release.

Tons of small cleanups, annotations and fixes here. Driver updates, 
cpufreq, ppc, parisc, arm.. Pls check that I got it all.

Linus

---
Summary of changes from v2.6.11-rc1 to v2.6.11-rc2


Adam Kropelin:
  o contort getdents64 to pacify gcc-2.96

Adrian Bunk:
  o [NET]: misc cleanups
  o SCSI aic7xxx: kill kernel 2.2 #ifdef's
  o [DECNET]: Misc cleanups
  o [IPV6]: Misc cleanups
  o [NET]: net/802/: some cleanups
  o [XFRM]: Unexport xfrm_policy_delete

Alan Cox:
  o [AX25]: Revert to 2.6.9 behavior
  o smbfs fixes

Alan Stern:
  o USB UHCI: protect DMA-able fields with barriers
  o USB: correct and clarify error-code documentation

Alexander Viro:
  o miri_sbus iomem annotations
  o hamachi iomem annotations
  o bmac iomem annotations
  o s2io iomem annotations and cleanups

Alexey Dobriyan:
  o USB: drivers/usb/*: s/0/NULL/ in pointer context

Alexey Kuznetsov:
  o [TCP]: Do not try to collapse multi-packet SKBs

Andi Kleen:
  o Fix gcc4 compilation in s2io net driver
  o x86_64: Fix ACPI SRAT NUMA parsing
  o x86_64: Fix K8 NUMA discovery
  o [3/4] x86_64: Fix NUMA hash setup
  o [4/4] Fix numa=off command line parsing
  o x86_64: Add brackets to bitops
  o x86_64: Move early CPU detection earlier
  o x86_64: Disable uselib when possible
  o x86_64: Optimize nodemask operations slightly
  o [NET]: Use unlocked_ioctl for sock_ioctl
  o x86_64: Fix CMP with interleaving
  o x86_64: fix flush race on context switch
  o i386/x86-64: Fix SMP NMI watchdog race
  o x86-64: Fix pud typo in ioremap
  o x86-64: Clean up cpuid level detection
  o Use -Wno-pointer-sign for gcc 4.0
  o Convert XFS to unlocked_ioctl and compat_ioctl
  o Some fixes for compat ioctl
  o Convert Infiniband MAD driver to compat/unlocked_ioctl
  o Support compat_ioctl for block devices
  o Convert cciss to compat_ioctl
  o Add compat_ioctl to frame buffer layer
  o Convert sis fb driver to compat_ioctl
  o Convert dv1394 driver to compat_ioctl
  o Convert video1394 driver to compat_ioctl
  o Convert amdtp driver to compat_ioctl

Andreas Gruenbacher:
  o ext3/ea: revert old ea-in-inode patch
  o ext3/EA: mbcache cleanup
  o ext3/EA: Race in ext[23] xattr sharing code
  o ext3/EA: Ext3: do not use journal_release_buffer
  o ext3/EA: Ext3: factor our common xattr code; unnecessary lock
  o ext3/EA: Ext[23]: no spare xattr handler slots needed
  o ext3/EA: Cleanup and prepare ext3 for in-inode xattrs
  o ext3/EA: Hide ext3_get_inode_loc in_mem option
  o ext3/EA: In-inode extended attributes for ext3

Andreas Schwab:
  o [IA64] Fix PTRACE_GETEVENTMSG ia32 emulation

Andrew Morton:
  o eepro build fix
  o ixgb whitespace fix
  o 3c515 warning fix
  o [SPARC64]: Make first arg to find_next_zero_bit() const
  o acpi build fix
  o convert-cciss-to-compat_ioctl fix

Anton Blanchard:
  o ppc64: lacks definition of MM_VM_SIZE()
  o ppc64: Remove CONFIG_IRQ_ALL_CPUS

Antonino Daplas:
  o fbdev: Cleanup broken edid fixup code
  o fbcon: Catch blank events on both device and console level
  o fbcon: Fix compile error
  o fbdev: Fbmon cleanup
  o i810fb: Module param fix
  o atyfb: Fix module parameter descriptions
  o radeonfb: Fix init/exit section usage
  o pxafb: Reorder add_wait_queue() and set_current_state()
  o sa1100fb: Reorder add_wait_queue() and set_current_state()
  o backlight: Add Backlight/LCD device basic support
  o fbdev: Add w100 framebuffer driver

Aristeu Sergio Rozanski Filho:
  o eepro: cache EEPROM values
  o eepro: use module_param macros
  o eepro: basic ethtool support
  o eepro: fix return value in init_module()
  o eepro: fix auto-detection option

Arjan van de Ven:
  o [NETLINK]: Kill netlink_post, no longer used
  o [IPVS]: Kill check_for_ip_vs_out, no longer used

Arkadiusz Miskiewicz:
  o USB: add Ever UPS vendor/product id to ftdi_sio driver

Arnaldo Carvalho de Melo:
  o [UDP] merge udp_sock with udp_opt
  o [RAW] merge raw_sock with raw_opt
  o [SCTP] merge sctp_sock with sctp_opt
  o [IPV6] merge raw6_sock with raw6_opt
  o [IPX] use a private slab cache for socks

Arthur Kepner:
  o [TG3]: Always copy receive packets when 5701 PCIX workaround
enabled

Bart De Schuymer:
  o [BRIDGE-NF]: Check ipv4 vs ipv6 more reliably in ip_sabotage_out()

Bartlomiej Zolnierkiewicz:
  o [ide] ide-cd: use ssleep() instead of schedule_timeout()
  o [ide] make try_to_flush_leftover_data() static
  o [ide] kill ide_drive_t->suspend_reset
  o [ide] icside: use ide_dma_intr()
  o [ide] ide-v10: use ide_dma_intr()
  o [ide] kill default_{attach,cleanup}()

Ben Dooks:
  o [ARM PATCH] 2376/1: S3C2410 - cleanup 2410/2440 distinctions, fix
build
  o [ARM PATCH] 2390/1: Simtec Electronics MAINTAINERS file entries
  o [ARM PATCH] 2403/1: S3C2410 - clock initialsation tidy
  o [ARM PATCH] 2407/1: S3C2410 - remove fixed base from IIS registers
  o [ARM PATCH] 2408/1: S3C2410 - dma get position call
  o [ARM PATCH]

Re: Something very strange on x86_64 2.6.X kernels

2005-01-21 Thread Linus Torvalds



On Sat, 22 Jan 2005, Andi Kleen wrote:
> 
> I applied the patch to my tree.

I already applied it as obvious ;)

Linus

[RFC][PATCH] Thoughts about capabilities and prototype patch for user-capabilities

2005-01-21 Thread Alexander Nyberg

Hi!

I recently had an idea of having something similar
to /etc/user_capabilities which would consist of
username:CAP_CHOWN,CAP_SOMETHING,CAP_SOMETHING2

This could very well be loaded into linux at the time of an application
doing sys_setuid, sys_setreuid and the likes by hooking into glibc. The
kernel also requires a small patch which I'll inline below for comments.
I sure would use some of these capabilities on my user. And they could
be used to solve things like a certain program/user needs to use
real-time scheduling (just something like CAP_SCHED allowing arbitrary
scheduling or CAP_MEMLOCK allowing a user to lock memory). This would
solve some of the old problems.

I haven't looked into making capability bits for certain files, mostly
cause I don't know vfs much and also I don't see this colliding with it,
I think both should exist. 

I don't know what the plans are for capabilities in the future but there
are quite some shortcomings in the current implementation. I think it at
least should allow setting/getting capabilities of process ids, user ids and
group ids. These are the things that come to my mind directly. I also
see that 28 of the 32 currently possible slots are taken which will be a
problem if capabilities are ever gonna get serious.

While looking at it today I noticed user-space over here doesn't even
seem to have macros/functions to set/test/clear bits in the blackboxed
types, which means anyone who wants to use it will be exposed to the raw
structure.

So I'm curious of what the thoughts are for capabilities, are they gonna
go away some day in the future? I hope not, I think the idea is simple
and good but something needs to be done about the implementation to
allow it to be of full use. And yes, I volunteer, but I'm not all sure
about which direction has been discussed and where this is going.

Below you will find a test program and at the bottom a patch which allows
a privileged user to set the capabilities of another user (i386 only).
Setting capabilities on groups does not work; it requires some more code
(I don't think much though).
Unfortunately the patch has to introduce two system calls and a new
data structure so as not to break backward compatibility. But if something
is to be done in the future it looks like either compatibility has to be
broken completely (2.7 maybe? if ever?) or new syscalls must be introduced.
I guess it depends on how many programs actually use capabilities, does
anyone know?




#undef _POSIX_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 
#include 


#define __NR_sys_id_capget  289
#define __NR_sys_id_capset  290

typedef struct __user_cap_id_header_struct {
	__u32 version;
	uid_t uid;
	gid_t gid;
} *cap_id_header_t;

inline _syscall3(int, sys_id_capget, int, type, void *, head, void *, data);
inline _syscall3(int, sys_id_capset, int, type, void *, head, void *, data);

/* 
 * Had to steal these from kernel header...
*/
#define cap_t(x) (x)
#define CAP_TO_MASK(x) (1 << (x))
#define cap_raise(c, flag)   (cap_t(c) |=  CAP_TO_MASK(flag))
#define cap_clear(c) do { cap_t(c) =  0; } while(0)
#define cap_raised(c, flag)  (cap_t(c) & CAP_TO_MASK(flag))
#define cap_isclear(c)   (!cap_t(c))

#define CAP_TYPE_UID 1
#define CAP_TYPE_GID 2


/* 
 * Silly program that sets UID 1000 to have CAP_CHOWN
 * and tests that it returns a cap_user_data_t with
 * CAP_CHOWN set.
 */
int main()
{
	int ret;
	cap_id_header_t head = malloc(sizeof(*head));
	cap_user_data_t data = malloc(sizeof(*data));


	head->version = _LINUX_CAPABILITY_VERSION;
	head->gid = 0;
	head->uid = 1000;

	cap_clear(data->effective);
	cap_clear(data->permitted);
	cap_clear(data->inheritable);

	cap_raise(data->effective, CAP_CHOWN);
	ret = sys_id_capset(CAP_TYPE_UID, head, data);
	if (ret) {
		/* perror() appends ": " and the error text itself */
		perror("capset");
	}

	cap_clear(data->effective);
	cap_clear(data->permitted);
	cap_clear(data->inheritable);
	ret = sys_id_capget(CAP_TYPE_UID, head, data);
	if (ret)
		perror("capget");

	if (cap_raised(data->effective, CAP_CHOWN))
		printf("yay!\n");


	return 0;
}





= arch/i386/kernel/entry.S 1.89 vs edited =
--- 1.89/arch/i386/kernel/entry.S   2005-01-08 06:44:02 +01:00
+++ edited/arch/i386/kernel/entry.S 2005-01-22 00:11:38 +01:00
@@ -864,5 +864,7 @@ ENTRY(sys_call_table)
.long sys_add_key
.long sys_request_key
.long sys_keyctl
+   .long sys_id_capget
+   .long sys_id_capset /* 290 */
 
 syscall_table_size=(.-sys_call_table)
= include/linux/capability.h 1.6 vs edited =
--- 1.6/include/linux/capability.h  2003-05-12 23:35:19 +02:00
+++ edited/include/linux/capability.h   2005-01-22 02:08:34 +01:00
@@ -34,6 +34,12 @@ typedef struct __user_cap_header_struct 
int

Re: [PATCH][RFC] swsusp: speed up image restoring on x86-64

2005-01-21 Thread Andi Kleen

> With this patch, at least 8 times fewer memory accesses are required to
> restore an image than without it, and in the original code cr3 is reloaded
> after copying each _byte_, let alone the SIB arithmetic.  I'd expect it to
> be 10 times faster or so.

Probably more. CR3 reload is a serializing operation and is really expensive.

-Andi

Re: Extend clear_page by an order parameter

2005-01-21 Thread Christoph Lameter

On Sat, 22 Jan 2005, Paul Mackerras wrote:

> Christoph's patch is bigger than it needs to be because he has to
> change all the occurrences of clear_page(x) to clear_page(x, 0), and
> then he has to change a lot of architectures' clear_page functions to
> be called _clear_page instead.  If he picked a different name for the
> "clear a higher order page" function it would end up being less
> invasive as well as less confusing.

I had the name "zero_page" in V1 and V2 of the patch where it was
separate. Then someone complained about code duplication.

> The argument that clear_page is called that because it clears a higher
> order page won't wash; all the clear_page implementations in his patch
> are perfectly capable of clearing any contiguous set of 2^order pages
> (oops, I mean "zero-order pages"), not just a "higher order page".

clear_page is called clear_page because it clears one page of *any* order
not just higher orders. zero-order pages are not segregated nor are they
intrinsically better just because they contain more memory ;-).

Re: Something very strange on x86_64 2.6.X kernels

2005-01-21 Thread Andi Kleen

On Fri, Jan 21, 2005 at 05:26:01PM +0100, Petr Vandrovec wrote:
> On Thu, Jan 20, 2005 at 09:53:36PM +0100, Eric Dumazet wrote:
> > 
> > Examining linux sources, I found that 0xe000 is 'special' (ia 32 
> > vsyscall) and 0xe600 is about sigreturn subsection of this special area.
> > 
> > Is it possible some vm trick just kicks in and corrupts my true 64bits 
> > program ?
> 
> Maybe I already missed the answer, but try the patch below.  It is
> definitely bad to mark the syscall page as a global one...

Patch looks good, thanks. Ugh, what a stupid bug.

I applied the patch to my tree.

-Andi

[PATCH] PPC64 Fix in_be64 definition

2005-01-21 Thread Paul Mackerras

This patch is from Jake Moilanen <[EMAIL PROTECTED]>.

The instruction syntax for the in_be64 inline asm was incorrect for
the "m" constraint for the address parameter.  This patch fixes the
instruction in the inline asm.

Signed-off-by: Jake Moilanen <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
--- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan  4 15:33:22 2005
+++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h	Wed Jan  5 08:08:03 2005
@@ -371,7 +371,7 @@ static inline unsigned long in_be64(cons
 {
unsigned long ret;
 
-   __asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
+   __asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
 : "=r" (ret) : "m" (*addr));
return ret;
 }

Loopback mounting from a file with a partition table?

2005-01-21 Thread Dan Stromberg

Has anyone tried loopback mounting individual partitions from within a
file that contains a partition table?

When I mount -o loop the file, I seem to get the first partition in the
file, but I don't see anything in the man page for mount that indicates a
way of getting any other partitions from a file with a partition table.

Any comments?

Thanks!



[PATCH] PPC64 xmon data breakpoints on partitioned systems

2005-01-21 Thread Paul Mackerras

This patch is originally from Jake Moilanen <[EMAIL PROTECTED]>,
substantially modified by me.

On PPC64 systems with a hypervisor, we can't set the Data Address
Breakpoint Register (DABR) directly, we have to do it through a
hypervisor call.

Signed-off-by: Jake Moilanen <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/xmon/xmon.c test/arch/ppc64/xmon/xmon.c
--- linux-2.5/arch/ppc64/xmon/xmon.c	2005-01-12 18:20:48.0 +1100
+++ test/arch/ppc64/xmon/xmon.c 2005-01-22 10:55:46.664345064 +1100
@@ -624,6 +624,17 @@
return 0;
 }
 
+/* On systems with a hypervisor, we can't set the DABR
+   (data address breakpoint register) directly. */
+static void set_controlled_dabr(unsigned long val)
+{
+	if (systemcfg->platform == PLATFORM_PSERIES_LPAR) {
+		int rc = plpar_hcall_norets(H_SET_DABR, val);
+		if (rc != H_Success)
+			xmon_printf("Warning: setting DABR failed (%d)\n", rc);
+	} else
+		set_dabr(val);
+}
 
 static struct bpt *at_breakpoint(unsigned long pc)
 {
@@ -711,7 +722,7 @@
 static void insert_cpu_bpts(void)
 {
if (dabr.enabled)
-   set_dabr(dabr.address | (dabr.enabled & 7));
+   set_controlled_dabr(dabr.address | (dabr.enabled & 7));
if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR))
set_iabr(iabr->address
 | (iabr->enabled & (BP_IABR|BP_IABR_TE)));
@@ -739,7 +750,7 @@
 
 static void remove_cpu_bpts(void)
 {
-   set_dabr(0);
+   set_controlled_dabr(0);
if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR))
set_iabr(0);
 }
@@ -1049,8 +1060,8 @@
 "b  [cnt]   set breakpoint at given instr addr\n"
 "bc   clear all breakpoints\n"
 "bc   clear breakpoint number n or at addr\n"
-"bi  [cnt]  set hardware instr breakpoint (broken?)\n"
-"bd  [cnt]  set hardware data breakpoint (broken?)\n"
+"bi  [cnt]  set hardware instr breakpoint (POWER3/RS64 only)\n"
+"bd  [cnt]  set hardware data breakpoint\n"
 "";
 
 static void

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Andrew Morton writes:

> It is, actually, from the POV of the page allocator.  It's a "higher order
> page" and is controlled by a struct page*, just like a zero-order page...

So why is the function that gets me one of these "higher order pages"
called "get_free_pages" with an "s"? :)

Christoph's patch is bigger than it needs to be because he has to
change all the occurrences of clear_page(x) to clear_page(x, 0), and
then he has to change a lot of architectures' clear_page functions to
be called _clear_page instead.  If he picked a different name for the
"clear a higher order page" function it would end up being less
invasive as well as less confusing.

The argument that clear_page is called that because it clears a higher
order page won't wash; all the clear_page implementations in his patch
are perfectly capable of clearing any contiguous set of 2^order pages
(oops, I mean "zero-order pages"), not just a "higher order page".

Paul.
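
The alternative naming argued for above can be sketched as follows: leave
clear_page(x) with its one-argument signature and give the multi-page
operation its own name, so existing callers need no change.  This is a
user-space illustration only; the _sketch names and the 4096-byte page size
are assumptions, and real architectures use optimized clearing routines
rather than memset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed zero-order page size */

/* Clear 2^order contiguous zero-order pages starting at `page`. */
static void clear_pages_sketch(void *page, unsigned int order)
{
	assert(page != NULL);
	memset(page, 0, SKETCH_PAGE_SIZE << order);
}

/* Existing single-page callers keep their one-argument form. */
#define clear_page_sketch(page)	clear_pages_sketch((page), 0)
```

With this split, only code that actually wants to clear more than one page
has to name an order.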

Re: Extend clear_page by an order parameter

2005-01-21 Thread Roman Zippel

Hi,

On Fri, 21 Jan 2005, Andrew Morton wrote:

> Paul Mackerras <[EMAIL PROTECTED]> wrote:
> >
> > A cluster of 2^n contiguous pages
> >  isn't one page by any normal definition.
> 
> It is, actually, from the POV of the page allocator.  It's a "higher order
> page" and is controlled by a struct page*, just like a zero-order page...

OTOH we also have alloc_page/alloc_pages.

bye, Roman

Page fault in umount

2005-01-21 Thread Pierre Ossman

When I yank out my MP3 player, the programs trying to umount the disk 
cause the following page fault:

usb 1-5: USB disconnect, address 2
scsi0 (0:0): rejecting I/O to dead device
FAT bread failed in fat_clusters_flush
Unable to handle kernel paging request at virtual address 6b6b6b6b
 printing eip:
e0a0ecaf
*pde = 
Oops:  [#1]
PREEMPT
Modules linked in: radeon pcspkr iptable_filter tun parport_pc lp 
parport irport irnet ppp_generic slhc ircomm_tty ircomm irda crc_ccitt 
sd_mod autofs4 hidp rfcomm l2cap bluetooth pcmcia sunrpc ipt_MASQUERADE 
iptable_nat usb_storage scsi_mod ip_conntrack ip_tables microcode 
binfmt_misc nls_iso8859_1 nls_cp437 vfat fat ext3 jbd video button 
battery ac ohci1394 ieee1394 yenta_socket pcmcia_core uhci_hcd ehci_hcd 
i2c_i801 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss 
snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc 8139cp mii
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010296   (2.6.10)
EIP is at scsi_device_put+0xf/0x70 [scsi_mod]
eax: 6b6b6b6b   ebx: d8994150   ecx:    edx: 
esi: 6b6b6b6b   edi: dfcae46c   ebp:    esp: ce6b3ee4
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 7190, threadinfo=ce6b2000 task=cbe5c000)
Stack: d883b9f4 d8994150 d883b9f4 e0b38103 6b6b6b6b e0b3a190 d8994150 
e0b386f7
   d8994150  dc6a1764 dc6a17f8 c0195e56 dc6a17f8  
dc6a19b8
   c0472680   c0195ed0 dc6a1764 c0193acd d89645f8 
e09f6ec0
Call Trace:
 [] scsi_disk_put+0x33/0x50 [sd_mod]
 [] scsi_disk_release+0x0/0x1b0 [sd_mod]
 [] sd_release+0x47/0x90 [sd_mod]
 [] blkdev_put+0xc6/0x170
 [] blkdev_put+0x140/0x170
 [] kill_block_super+0x3d/0x60
 [] deactivate_super+0xac/0x120
 [] __mntput+0x25/0x40
 [] sys_umount+0x3f/0xa0
 [] sys_munmap+0x44/0x70
 [] sysenter_past_esp+0x52/0x75
Code: 34 24 e8 05 ba 90 df ba fa ff ff ff eb e0 8d b4 26 00 00 00 00 8d 
bc 27 00 00 00 00 83 ec 0c 89 74 24 08 8b 74 24 10 89 5c 24 04 <8b> 06 
8b 80 b4 00 00 00 8b 00 85 c0 74 1f bb 00 e0 ff ff 21 e3

The device is mounted with the sync option, so shouldn't this be safe as long 
as the device isn't busy? The same thing works on the other USB devices I have.
A page fault seems too severe either way.

Rgds
Pierre

Re: [Dev] Re: Kernel Panic with LTP on 2.6.11-rc1 (was Re: LTP Results for 2.6.x and 2.4.x)

2005-01-21 Thread Bryce Harrington

On Fri, 21 Jan 2005, Chris Wright wrote:
> * Andrew Morton ([EMAIL PROTECTED]) wrote:
> > Bryce Harrington <[EMAIL PROTECTED]> wrote:
> > I am unable to find the oops trace amongst all that stuff.  Help?
> >
> > (It would have been handy to include it in the bug report, actually)
>
> Yes, it would.  Or at least some better granularity leading up to the
> problem.  I ran growfiles locally on 2.6.11-rc-bk and didn't have any
> problem.  Could you strace growfiles and see what it was doing when it
> killed the machine?

Okay, I'll set up another run and try collecting that info.  Is there
any other data that would be useful to collect while I'm at it?

Bryce


Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Andrew Morton writes:

> It is, actually, from the POV of the page allocator.  It's a "higher order
> page" and is controlled by a struct page*, just like a zero-order page...

OK.  I still reckon it's confusing terminology for the rest of us who
don't have our heads deep in the page allocator code.

Paul.

[PATCH 6/8] core-small: Shrink futex queue hash

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL reduce futex hash table

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny-new/kernel/futex.c
===
--- tiny-new.orig/kernel/futex.c2004-11-17 00:04:03.0 -0800
+++ tiny-new/kernel/futex.c 2004-11-17 10:30:20.749824672 -0800
@@ -40,7 +40,11 @@
 #include 
 #include 
 
+#ifdef CONFIG_CORE_SMALL
+#define FUTEX_HASHBITS 4
+#else
 #define FUTEX_HASHBITS 8
+#endif
 
 /*
  * Futexes are matched on equal values of this key.
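
As a rough illustration of what the bit count controls: the table has
1 << FUTEX_HASHBITS buckets, so going from 8 bits to 4 shrinks it from 256
buckets to 16. The userspace sketch below is not taken from kernel/futex.c;
futex_hash_buckets() and futex_bucket_index() are hypothetical helpers, and
masking a precomputed hash is only an assumption about how the index is
derived.

```c
#include <stddef.h>

/* Number of buckets a table gets for a given bit count (hypothetical helper). */
size_t futex_hash_buckets(unsigned int hashbits)
{
	return (size_t)1 << hashbits;
}

/* Bucket index: keep only the low `hashbits` bits of a precomputed hash.
 * Assumption: the hash value itself is computed elsewhere. */
size_t futex_bucket_index(unsigned long hash, unsigned int hashbits)
{
	return (size_t)hash & (futex_hash_buckets(hashbits) - 1);
}
```

With fewer buckets more futexes share a chain, which is the usual trade of
lookup time for memory on a small machine.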

Re: Radeon framebuffer weirdness in -mm2

2005-01-21 Thread Matt Mackall

On Fri, Jan 21, 2005 at 01:33:39PM +0100, Roman Zippel wrote:
> Hi,
> 
> On Thu, 20 Jan 2005, Matt Mackall wrote:
> 
> > On Thu, Jan 20, 2005 at 08:07:11PM -0800, Andrew Morton wrote:
> > > Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Next suspects would be:
> > > > 
> > > >  +cleanup-vc-array-access.patch
> > > >  +remove-console_macrosh.patch
> > > >  +merge-vt_struct-into-vc_data.patch
> > > > 
> > > > 
> > > 
> > > Make that:
> > > 
> > > +cleanup-vc-array-access.patch
> > > +remove-console_macrosh.patch
> > > +merge-vt_struct-into-vc_data.patch
> > > +vgacon-fixes-to-help-font-restauration-in-x11.patch
> > 
> > It's something in this batch. Which is good, as I'd be a bit
> > disappointed if the "vt leakage" were somehow attributable to the fb
> > layer. More bisection after dinner.
> 
> Could you try the patch below. I cleaned up the logic a little in 
> redraw_screen() and the code below really wants to do a update_screen().
> The old switch_screen(fg_console) behaved like update_screen(fg_console).

Same behaviour.

-- 
Mathematics is the supreme nostalgia of our time.

[PATCH 2/8] core-small: Collapse major names hash

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL degrade genhd major names hash to linked list

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny-new/drivers/block/genhd.c
===
--- tiny-new.orig/drivers/block/genhd.c 2004-11-17 00:04:36.0 -0800
+++ tiny-new/drivers/block/genhd.c  2004-11-17 10:30:10.992098381 -0800
@@ -15,7 +15,11 @@
 #include 
 #include 
 
+#ifdef CONFIG_CORE_SMALL
+#define MAX_PROBE_HASH 1 /* degrade to linked list */
+#else
 #define MAX_PROBE_HASH 255 /* random */
+#endif
 
 static struct subsystem block_subsys;
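
To make "degrade to linked list" concrete, here is a small userspace sketch
of a chained hash table indexed by major % nbuckets. It is not the genhd.c
code: struct probe, probe_insert() and probe_lookup_steps() are hypothetical
names, and the modulo indexing is an assumption about how the bucket is
chosen. With a single bucket every entry lands on the same chain, so the
table behaves like one list.

```c
#include <stddef.h>

/* Hypothetical chain node; the real structure carries more fields. */
struct probe {
	struct probe *next;
	unsigned int major;
};

/* Bucket selection by modulo; with nbuckets == 1 this is always 0. */
size_t major_to_index(unsigned int major, size_t nbuckets)
{
	return major % nbuckets;
}

/* Push an entry on the front of its bucket's chain. */
void probe_insert(struct probe **table, size_t nbuckets, struct probe *p)
{
	size_t i = major_to_index(p->major, nbuckets);

	p->next = table[i];
	table[i] = p;
}

/* Walk one chain; returns how many nodes were visited up to the hit, or -1. */
int probe_lookup_steps(struct probe **table, size_t nbuckets, unsigned int major)
{
	struct probe *p;
	int steps = 0;

	for (p = table[major_to_index(major, nbuckets)]; p; p = p->next) {
		steps++;
		if (p->major == major)
			return steps;
	}
	return -1;
}
```

The table shrinks to a single pointer, and a lookup may walk every
registered entry, which should be acceptable when only a handful exist.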
 

[PATCH 4/8] core-small: Shrink PID lookup tables

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL reduce size of pidmap table for small machines

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny/include/linux/threads.h
===
--- tiny.orig/include/linux/threads.h   2004-12-04 15:42:35.0 -0800
+++ tiny/include/linux/threads.h2004-12-04 19:42:19.032212529 -0800
@@ -25,11 +25,19 @@
 /*
  * This controls the default maximum pid allocated to a process
  */
+#ifdef CONFIG_CORE_SMALL
+#define PID_MAX_DEFAULT 0x1000
+#else
 #define PID_MAX_DEFAULT 0x8000
+#endif
 
 /*
  * A maximum of 4 million PIDs should be enough for a while:
  */
+#ifdef CONFIG_CORE_SMALL
+#define PID_MAX_LIMIT (PAGE_SIZE*8) /* one pidmap entry */
+#else
 #define PID_MAX_LIMIT (sizeof(long) > 4 ? 4*1024*1024 : PID_MAX_DEFAULT)
+#endif
 
 #endif
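
For a sense of scale, the arithmetic behind "one pidmap entry" can be
sketched as below. This assumes that each pidmap entry is a bitmap filling
one page and that pages are 4 KiB; bits_per_page() and pidmap_entries() are
hypothetical helpers, not kernel functions.

```c
#include <stddef.h>

/* One pidmap entry is assumed to be a one-page bitmap: page_size * 8 bits. */
size_t bits_per_page(size_t page_size)
{
	return page_size * 8;
}

/* Bitmap pages needed to cover pid_max_limit PIDs, rounded up. */
size_t pidmap_entries(size_t pid_max_limit, size_t page_size)
{
	size_t bpp = bits_per_page(page_size);

	return (pid_max_limit + bpp - 1) / bpp;
}
```

Under those assumptions the 4 million PID limit needs 128 entries, while
PAGE_SIZE*8 needs exactly one. The 32-bit limit of 0x8000 already fits in a
single 4 KiB bitmap, so the saving would mostly show up on 64-bit builds.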

Re: Extend clear_page by an order parameter

2005-01-21 Thread Andrew Morton

Paul Mackerras <[EMAIL PROTECTED]> wrote:
>
> A cluster of 2^n contiguous pages
>  isn't one page by any normal definition.

It is, actually, from the POV of the page allocator.  It's a "higher order
page" and is controlled by a struct page*, just like a zero-order page...

[patch 5/8] ide-disk: add basic refcounting

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Similar changes as for ide-cd.c (except that struct ide_disk_obj is added).
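
For readers unfamiliar with the pattern, the get/put refcounting can be
sketched in userspace roughly as follows. Everything here is a hypothetical
stand-in: struct ref plays the role of struct kref, struct disk_obj the role
of struct ide_disk_obj, and there is no locking, whereas the patch
serializes get/put with a semaphore.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kref. */
struct ref {
	int count;
};

/* Hypothetical stand-in for struct ide_disk_obj. */
struct disk_obj {
	int drive_id;		/* stands in for ide_drive_t *drive */
	struct ref ref;
};

int released;			/* counts release calls, for demonstration only */

void ref_init(struct ref *r) { r->count = 1; }
void ref_get(struct ref *r)  { r->count++; }

/* Drop one reference; call release() when the last one goes away. */
void ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (--r->count == 0)
		release(r);
}

/* Recover the enclosing object from its embedded member, like container_of(). */
void disk_obj_release(struct ref *r)
{
	struct disk_obj *obj =
		(struct disk_obj *)((char *)r - offsetof(struct disk_obj, ref));

	released++;
	free(obj);
}

/* Allocate an object holding the initial ("attach") reference. */
struct disk_obj *disk_obj_new(int drive_id)
{
	struct disk_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->drive_id = drive_id;
	ref_init(&obj->ref);
	return obj;
}
```

The point of the design is that cleanup only drops the attach reference; the
object is freed when the last opener releases it, not while it is in use.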

diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c2005-01-21 23:41:03 +01:00
+++ b/drivers/ide/ide-disk.c2005-01-21 23:41:03 +01:00
@@ -71,6 +71,38 @@
 #include 
 #include 

+struct ide_disk_obj {
+   ide_drive_t *drive;
+   struct kref kref;
+};
+
+static DECLARE_MUTEX(idedisk_ref_sem);
+
+#define to_ide_disk(obj) container_of(obj, struct ide_disk_obj, kref)
+
+#define ide_disk_g(disk) ((disk)->private_data)
+
+static struct ide_disk_obj *ide_disk_get(struct gendisk *disk)
+{
+   struct ide_disk_obj *idkp = NULL;
+
+   down(&idedisk_ref_sem);
+   idkp = ide_disk_g(disk);
+   if (idkp)
+           kref_get(&idkp->kref);
+   up(&idedisk_ref_sem);
+   return idkp;
+}
+
+static void ide_disk_release(struct kref *);
+
+static void ide_disk_put(struct ide_disk_obj *idkp)
+{
+   down(&idedisk_ref_sem);
+   kref_put(&idkp->kref, ide_disk_release);
+   up(&idedisk_ref_sem);
+}
+
 /*
  * lba_capacity_is_ok() performs a sanity check on the claimed "lba_capacity"
  * value for this drive (from its reported identification information).
@@ -941,14 +973,30 @@

 static int idedisk_cleanup (ide_drive_t *drive)
 {
+   struct ide_disk_obj *idkp = drive->driver_data;
struct gendisk *g = drive->disk;
+
ide_cacheflush_p(drive);
if (ide_unregister_subdriver(drive))
return 1;
del_gendisk(g);
+
+   ide_disk_put(idkp);
+
+   return 0;
+}
+
+static void ide_disk_release(struct kref *kref)
+{
+   struct ide_disk_obj *idkp = to_ide_disk(kref);
+   ide_drive_t *drive = idkp->drive;
+   struct gendisk *g = drive->disk;
+
+   drive->driver_data = NULL;
drive->devfs_name[0] = '\0';
+   g->private_data = NULL;
g->fops = ide_fops;
-   return 0;
+   kfree(idkp);
 }

 static int idedisk_attach(ide_drive_t *drive);
@@ -1006,7 +1054,15 @@

 static int idedisk_open(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_disk_obj *idkp;
+   ide_drive_t *drive;
+
+   if (!(idkp = ide_disk_get(disk)))
+   return -ENXIO;
+
+   drive = idkp->drive;
+
drive->usage++;
if (drive->removable && drive->usage == 1) {
ide_task_t args;
@@ -1028,7 +1084,10 @@

 static int idedisk_release(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_disk_obj *idkp = ide_disk_g(disk);
+   ide_drive_t *drive = idkp->drive;
+
if (drive->usage == 1)
ide_cacheflush_p(drive);
if (drive->removable && drive->usage == 1) {
@@ -1041,6 +1100,9 @@
drive->doorlocking = 0;
}
drive->usage--;
+
+   ide_disk_put(idkp);
+
return 0;
 }

@@ -1048,13 +1110,14 @@
unsigned int cmd, unsigned long arg)
 {
struct block_device *bdev = inode->i_bdev;
-   ide_drive_t *drive = bdev->bd_disk->private_data;
-   return generic_ide_ioctl(drive, file, bdev, cmd, arg);
+   struct ide_disk_obj *idkp = ide_disk_g(bdev->bd_disk);
+   return generic_ide_ioctl(idkp->drive, file, bdev, cmd, arg);
 }

 static int idedisk_media_changed(struct gendisk *disk)
 {
-   ide_drive_t *drive = disk->private_data;
+   struct ide_disk_obj *idkp = ide_disk_g(disk);
+   ide_drive_t *drive = idkp->drive;

/* do not scan partitions twice if this is a removable device */
if (drive->attach) {
@@ -1067,8 +1130,8 @@

 static int idedisk_revalidate_disk(struct gendisk *disk)
 {
-   ide_drive_t *drive = disk->private_data;
-   set_capacity(disk, idedisk_capacity(drive));
+   struct ide_disk_obj *idkp = ide_disk_g(disk);
+   set_capacity(disk, idedisk_capacity(idkp->drive));
return 0;
 }

@@ -1085,6 +1148,7 @@

 static int idedisk_attach(ide_drive_t *drive)
 {
+   struct ide_disk_obj *idkp;
struct gendisk *g = drive->disk;

/* strstr("foo", "") is non-NULL */
@@ -1095,10 +1159,22 @@
if (drive->media != ide_disk)
goto failed;

+   idkp = kmalloc(sizeof(*idkp), GFP_KERNEL);
+   if (!idkp)
+   goto failed;
+
if (ide_register_subdriver(drive, &idedisk_driver)) {
printk (KERN_ERR "ide-disk: %s: Failed to register the driver with ide.c\n", drive->name);
-   goto failed;
+   goto out_free_idkp;
}
+
+   memset(idkp, 0, sizeof(*idkp));
+
+   kref_init(&idkp->kref);
+
+   idkp->drive = drive;
+   drive->driver_data = idkp;
+
DRIVER(drive)->busy++;
idedisk_setup(drive);
if ((!drive->head || drive->head > 16) && !drive->select.b.lba) {
@@ -1114,8 +1190,11 @@

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Jack O'Quin

Ingo Molnar <[EMAIL PROTECTED]> writes:

> just finished a short testrun with nice--20 compared to SCHED_FIFO, on a
> relatively slow 466 MHz box:

> this shows the surprising result that putting all RT tasks on nice--20
> reduced context-switch rate by 20% and the Delay Maximum is lower as
> well. (although the Delay Maximum is quite unreliable so this could be a
> fluke.) But the XRUN count is the same.

> can anyone else reproduce this, with the test-patch below applied?

I finally made new kernel builds for the latest patches from both Ingo
and Con.  I kept the two patch sets separate, as they modify some of
the same files.

I ran three sets of tests with three or more 5 minute runs for each
case.  The results (log files and graphs) are in these directories...

  1) sched-fifo -- as a baseline
 http://www.joq.us/jack/benchmarks/sched-fifo

  2) sched-iso -- Con's scheduler, no privileges
 http://www.joq.us/jack/benchmarks/sched-iso

  3) nice-20 -- Ingo's "nice --20" scheduler hack
 http://www.joq.us/jack/benchmarks/nice-20

The SCHED_FIFO runs are all with Con's scheduler.  I could not figure
out how to get SCHED_FIFO working with Ingo's version.  With or
without the appropriate privileges, it used nice --20, instead.  I
used schedtool to verify that the realtime threads were running in the
expected class for each test.

It's hard to make much sense out of all this information.  The
SCHED_FIFO results are clearly best.  There were no xruns at all in
those three runs.  All of the others had at least a few, some quite
severe.  But, one of the nice-20 runs had just one small sub-
millisecond xrun.  I made some extra runs with that, because I was
puzzled by its lack of consistency.

Yet, both Ingo's and Con's schedulers basically seem to work well.
I'm not sure how to explain the xruns.  Maybe they are caused by other
kernel latency bugs.  (But then, why not SCHED_FIFO?)  Maybe those
schedulers work most of the time, but are not sufficiently careful to
always preempt the running process when an audio interrupt arrives?

I had some problems with the y2 graph axis (for XRUN and DELAY).  In
most of the graphs it is unreadable.  In some it is inconsistent.  I
hacked on the jack_test3_plot.sh script several times, trying to set
readable values, mostly without success.  There is too much variation
in those numbers.  So, be careful reading and comparing that
information.  Some xruns look better or worse than they really are.

These tests were run without any other heavy demands on the system.  I
want to try some with a compile running in the background.  But, I
won't have time for that until tomorrow at the earliest.  So, I'll
post these preliminary results now for your enjoyment.
-- 
  joq

[ide-dev 5/5] kill ide_driver_t->pre_reset

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Add ide_drive_t->post_reset flag and use it to signal post reset
condition to the ide-tape driver (the only user of ->pre_reset).
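
The idea of replacing a callback with a one-shot flag can be sketched like
this. The types are hypothetical, pared-down stand-ins (struct drive for
ide_drive_t, ignore_dsc for the IDETAPE_IGNORE_DSC bit); only the
set-then-consume logic is meant to mirror the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of ide_drive_t. */
struct drive {
	unsigned post_reset : 1;	/* set by the reset path */
	bool ignore_dsc;		/* stands in for IDETAPE_IGNORE_DSC */
};

/* Reset path: only record that a reset happened. */
void drive_reset(struct drive *d)
{
	d->post_reset = 1;
}

/* Request path: act on the flag once, then clear it.
 * Returns true if a reset was pending. */
bool drive_handle_request(struct drive *d)
{
	if (d->post_reset) {
		d->ignore_dsc = true;
		d->post_reset = 0;
		return true;
	}
	return false;
}
```

The core no longer needs to call into the subdriver at reset time; the
subdriver notices the condition the next time it handles a request.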

diff -Nru a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
--- a/drivers/ide/ide-iops.c2005-01-22 00:09:32 +01:00
+++ b/drivers/ide/ide-iops.c2005-01-22 00:09:32 +01:00
@@ -1132,7 +1132,7 @@
if (drive->media == ide_disk)
ide_disk_pre_reset(drive);
else
-   drive->driver->pre_reset(drive);
+   drive->post_reset = 1;

if (!drive->keep_settings) {
if (drive->using_dma) {
diff -Nru a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
--- a/drivers/ide/ide-tape.c2005-01-22 00:09:32 +01:00
+++ b/drivers/ide/ide-tape.c2005-01-22 00:09:32 +01:00
@@ -2428,6 +2428,11 @@
if (!drive->dsc_overlap && !(rq->cmd[0] & REQ_IDETAPE_PC2))
set_bit(IDETAPE_IGNORE_DSC, &tape->flags);

+   if (drive->post_reset == 1) {
+   set_bit(IDETAPE_IGNORE_DSC, &tape->flags);
+   drive->post_reset = 0;
+   }
+
if (tape->tape_still_time > 100 && tape->tape_still_time < 200)
tape->measure_insert_time = 1;
if (time_after(jiffies, tape->insert_time))
@@ -3558,16 +3563,6 @@
 }

 /*
- * idetape_pre_reset is called before an ATAPI/ATA software reset.
- */
-static void idetape_pre_reset (ide_drive_t *drive)
-{
-   idetape_tape_t *tape = drive->driver_data;
-   if (tape != NULL)
-   set_bit(IDETAPE_IGNORE_DSC, &tape->flags);
-}
-
-/*
  * idetape_space_over_filemarks is now a bit more complicated than just
  * passing the command to the tape since we may have crossed some
  * filemarks during our pipelined read-ahead mode.
@@ -4690,7 +4685,6 @@
.cleanup= idetape_cleanup,
.do_request = idetape_do_request,
.end_request= idetape_end_request,
-   .pre_reset  = idetape_pre_reset,
.proc   = idetape_proc,
.attach = idetape_attach,
.drives = LIST_HEAD_INIT(idetape_driver.drives),
diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
--- a/drivers/ide/ide.c 2005-01-22 00:09:32 +01:00
+++ b/drivers/ide/ide.c 2005-01-22 00:09:32 +01:00
@@ -2037,10 +2037,6 @@
return __ide_error(drive, rq, stat, err);
 }

-static void default_pre_reset (ide_drive_t *drive)
-{
-}
-
 static sector_t default_capacity (ide_drive_t *drive)
 {
return 0x7fff;
@@ -2059,7 +2055,6 @@
if (d->end_request == NULL) d->end_request = default_end_request;
if (d->error == NULL)   d->error = default_error;
if (d->abort == NULL)   d->abort = default_abort;
-   if (d->pre_reset == NULL)   d->pre_reset = default_pre_reset;
if (d->capacity == NULL)d->capacity = default_capacity;
 }

diff -Nru a/include/linux/ide.h b/include/linux/ide.h
--- a/include/linux/ide.h   2005-01-22 00:09:32 +01:00
+++ b/include/linux/ide.h   2005-01-22 00:09:32 +01:00
@@ -721,6 +721,7 @@
 *  3=64-bit
 */
unsigned scsi   : 1;/* 0=default, 1=ide-scsi emulation */
+   unsigned post_reset : 1;

 u8 quirk_list; /* considered quirky, set for a specific host */
 u8 init_speed; /* transfer rate set at boot */
@@ -1099,7 +1100,6 @@
ide_startstop_t (*error)(ide_drive_t *, struct request *rq, u8, u8);
ide_startstop_t (*abort)(ide_drive_t *, struct request *rq);
int (*ioctl)(ide_drive_t *, struct inode *, struct file *, unsigned int, unsigned long);
-   void(*pre_reset)(ide_drive_t *);
sector_t(*capacity)(ide_drive_t *);
ide_proc_entry_t*proc;
int (*attach)(ide_drive_t *);

[ide-dev 3/5] generic Power Management for IDE devices

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Move PM code from ide-cd.c and ide-disk.c to IDE core so:
* PM is supported for other ATAPI devices (floppy, tape)
* PM is supported even if specific driver is not loaded
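
As a reading aid, the step transitions of the ide-disk state machine that
this patch moves into the core can be sketched as a plain function. The enum
names below are hypothetical (the kernel uses the ide_pm_state_* and
idedisk_pm_* constants), and only the transitions visible in the removed
idedisk_complete_power_step() are modelled.

```c
/* Hypothetical step names for the sketch. */
enum pm_step {
	PM_FLUSH_CACHE,		/* suspend step 1 */
	PM_STANDBY,		/* suspend step 2 */
	PM_IDLE,		/* resume step 1 */
	PM_RESTORE_DMA,		/* resume step 2 */
	PM_COMPLETED
};

/* Return the step that follows completion of `step`.
 * A pm_state of 4 skips the standby command, as in the removed code. */
enum pm_step pm_complete_step(enum pm_step step, int pm_state)
{
	switch (step) {
	case PM_FLUSH_CACHE:
		return pm_state == 4 ? PM_COMPLETED : PM_STANDBY;
	case PM_STANDBY:
		return PM_COMPLETED;
	case PM_IDLE:
		return PM_RESTORE_DMA;
	default:
		return step;	/* other steps are not advanced here */
	}
}
```

Keeping the transitions in one place is what lets drives without a loaded
subdriver go through the same suspend and resume sequence.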

diff -Nru a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c  2005-01-21 23:53:31 +01:00
+++ b/drivers/ide/ide-cd.c  2005-01-21 23:53:31 +01:00
@@ -3251,45 +3251,6 @@

 static int ide_cdrom_attach (ide_drive_t *drive);

-/*
- * Power Management state machine.
- *
- * We don't do much for CDs right now.
- */
-
-static void ide_cdrom_complete_power_step (ide_drive_t *drive, struct request *rq, u8 stat, u8 error)
-{
-}
-
-static ide_startstop_t ide_cdrom_start_power_step (ide_drive_t *drive, struct request *rq)
-{
-   ide_task_t *args = rq->special;
-
-   memset(args, 0, sizeof(*args));
-
-   switch (rq->pm->pm_step) {
-   case ide_pm_state_start_suspend:
-   break;
-
-   case ide_pm_state_start_resume: /* Resume step 1 (restore DMA) */
-   /*
-* Right now, all we do is call hwif->ide_dma_check(drive),
-* we could be smarter and check for current xfer_speed
-* in struct drive etc...
-* Also, this step could be implemented as a generic helper
-* as most subdrivers will use it.
-*/
-   if ((drive->id->capability & 1) == 0)
-   break;
-   if (HWIF(drive)->ide_dma_check == NULL)
-   break;
-   HWIF(drive)->ide_dma_check(drive);
-   break;
-   }
-   rq->pm->pm_step = ide_pm_state_completed;
-   return ide_stopped;
-}
-
 static ide_driver_t ide_cdrom_driver = {
.owner  = THIS_MODULE,
.name   = "ide-cdrom",
@@ -3302,8 +3263,6 @@
.capacity   = ide_cdrom_capacity,
.attach = ide_cdrom_attach,
.drives = LIST_HEAD_INIT(ide_cdrom_driver.drives),
-   .start_power_step   = ide_cdrom_start_power_step,
-   .complete_power_step= ide_cdrom_complete_power_step,
 };

 static int idecd_open(struct inode * inode, struct file * file)
diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c2005-01-21 23:53:31 +01:00
+++ b/drivers/ide/ide-disk.c2005-01-21 23:53:31 +01:00
@@ -855,90 +855,6 @@
ide_add_setting(drive, "max_failures", SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, &drive->max_failures, NULL);
 }

-/*
- * Power Management state machine. This one is rather trivial for now,
- * we should probably add more, like switching back to PIO on suspend
- * to help some BIOSes, re-do the door locking on resume, etc...
- */
-
-enum {
-   idedisk_pm_flush_cache  = ide_pm_state_start_suspend,
-   idedisk_pm_standby,
-
-   idedisk_pm_idle = ide_pm_state_start_resume,
-   idedisk_pm_restore_dma,
-};
-
-static void idedisk_complete_power_step (ide_drive_t *drive, struct request *rq, u8 stat, u8 error)
-{
-   switch (rq->pm->pm_step) {
-   case idedisk_pm_flush_cache:/* Suspend step 1 (flush cache) complete */
-   if (rq->pm->pm_state == 4)
-   rq->pm->pm_step = ide_pm_state_completed;
-   else
-   rq->pm->pm_step = idedisk_pm_standby;
-   break;
-   case idedisk_pm_standby:/* Suspend step 2 (standby) complete */
-   rq->pm->pm_step = ide_pm_state_completed;
-   break;
-   case idedisk_pm_idle:   /* Resume step 1 (idle) complete */
-   rq->pm->pm_step = idedisk_pm_restore_dma;
-   break;
-   }
-}
-
-static ide_startstop_t idedisk_start_power_step (ide_drive_t *drive, struct request *rq)
-{
-   ide_task_t *args = rq->special;
-
-   memset(args, 0, sizeof(*args));
-
-   switch (rq->pm->pm_step) {
-   case idedisk_pm_flush_cache:/* Suspend step 1 (flush cache) */
-   /* Not supported? Switch to next step now. */
-   if (!drive->wcache || !ide_id_has_flush_cache(drive->id)) {
-   idedisk_complete_power_step(drive, rq, 0, 0);
-   return ide_stopped;
-   }
-   if (ide_id_has_flush_cache_ext(drive->id))
-   args->tfRegister[IDE_COMMAND_OFFSET] = WIN_FLUSH_CACHE_EXT;
-   else
-   args->tfRegister[IDE_COMMAND_OFFSET] = WIN_FLUSH_CACHE;
-   args->command_type = IDE_DRIVE_TASK_NO_DATA;
-   args->handler  = &task_no_data_intr;
-   return do_rw_taskfile(drive, args);
-
-   case idedisk_pm_standby:/* Suspend step 2 (standby) */
-   args->tfRegister[IDE_COMMAND_OFFSET] = WIN_STANDBYNOW1;
-

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Christoph Lameter writes:

> clear_page clears one page of the specified order.

Now you're really being confusing.  A cluster of 2^n contiguous pages
isn't one page by any normal definition.  Call it "clear_page_cluster"
or "clear_page_order" or something, but not "clear_page".
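
To illustrate the naming point, a userspace sketch of an order-aware clear
might look like this. It uses one of the names suggested above; the 4 KiB
page size and the helper order_bytes() are assumptions made for the example,
and this is not the code from the patch under discussion.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Bytes covered by one allocation of the given order: 2^order pages. */
size_t order_bytes(unsigned int order)
{
	return (size_t)SKETCH_PAGE_SIZE << order;
}

/* Zero a cluster of 2^order contiguous pages starting at addr. */
void clear_page_order(void *addr, unsigned int order)
{
	memset(addr, 0, order_bytes(order));
}
```

Order 0 clears a single page; order 2 clears four contiguous pages, which is
the case where calling it "one page" gets confusing.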

Paul.

[PATCH 0/8] core-small: Introduce CONFIG_CORE_SMALL from -tiny

2005-01-21 Thread Matt Mackall

This set of patches introduces a new config option CONFIG_CORE_SMALL
from the -tiny tree for small systems. This series should apply
cleanly against 2.6.11-rc1-mm2.

When selected, it enables various tweaks to miscellaneous core data
structures to shrink their size on small systems. While each tweak is
fairly small, in aggregate they can save a substantial amount of
memory.

1 Add option to embedded menu
2 Collapse major names hash
3 Collapse chrdevs hash
4 Shrink PID lookup tables
5 Shrink uid hash
6 Shrink futex queue hash
7 Shrink timer lists
8 Shrink console buffer

TurboChannel Bus sysfs port.

2005-01-21 Thread James Simmons


I'm experimenting with sysfs to figure out how it works, so I'm attempting to 
port the TurboChannel bus code to it. It's a proof of concept and a 
learning experience. Comments welcome.

diff -urN -X /home/jsimmons/dontdiff linus-2.6/drivers/tc/Makefile fbdev-2.6/drivers/tc/Makefile
--- linus-2.6/drivers/tc/Makefile   2005-01-21 10:23:32.0 -0800
+++ fbdev-2.6/drivers/tc/Makefile   2005-01-20 14:14:54.0 -0800
@@ -4,7 +4,7 @@
 
 # Object file lists.
 
-obj-$(CONFIG_TC) += tc.o
+obj-$(CONFIG_TC) += tc.o tc-driver.o
 obj-$(CONFIG_ZS) += zs.o
 obj-$(CONFIG_VT) += lk201.o lk201-map.o lk201-remap.o
 
diff -urN -X /home/jsimmons/dontdiff linus-2.6/drivers/tc/tc-driver.c fbdev-2.6/drivers/tc/tc-driver.c
--- linus-2.6/drivers/tc/tc-driver.c1969-12-31 16:00:00.0 -0800
+++ fbdev-2.6/drivers/tc/tc-driver.c2005-01-21 10:22:29.0 -0800
@@ -0,0 +1,92 @@
+/*
+ *  TURBO Channel Driver Services
+ *
+ *  Copyright (C) 2005 James Simmons 
+ *
+ *  Loosely based on drivers/tc/dio-driver.c
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of this archive
+ *  for more details.
+ */
+
+#include 
+#include 
+#include 
+
+   /**
+*  tc_register_driver - register a new TC driver
+*  @drv: the driver structure to register
+*
+*  Adds the driver structure to the list of registered drivers
+*  Returns the number of TC devices which were claimed by the driver
+*  during registration.  The driver remains registered even if the
+*  return value is zero.
+*/
+
+int tc_register_driver(struct tc_driver *drv)
+{
+   int count = 0;
+
+   /* initialize common driver fields */
+   drv->driver.name = drv->name;
+   drv->driver.bus = &tc_bus_type;
+
+   /* register with core */
+   count = driver_register(&drv->driver);
+   return count ? count : 1;
+}
+
+   /**
+*  tc_unregister_driver - unregister a TC driver
+*  @drv: the driver structure to unregister
+*
+*  Deletes the driver structure from the list of registered TC drivers,
+*  gives it a chance to clean up by calling its remove() function for
+*  each device it was responsible for, and marks those devices as
+*  driverless.
+*/
+
+void tc_unregister_driver(struct tc_driver *drv)
+{
+   driver_unregister(&drv->driver);
+}
+
+
+   /**
+*  tc_bus_match - Tell if a TC device structure has a matching TC
+*  device id structure
+*  @ids: array of TC device id structures to search in
+*  @dev: the TC device structure to match against
+*
+*  Used by a driver to check whether a TC device present in the
+*  system is in its list of supported devices. Returns the matching
+*  tc_device_id structure or %NULL if there is no match.
+*/
+
+static int tc_bus_match(struct device *dev, struct device_driver *drv)
+{
+   struct tc_driver *tc_drv = to_tc_driver(drv);
+   struct tc_dev *tdev = to_tc_dev(dev);
+
+   return (strncmp(tdev->name, drv->name, 9) == 0);
+}
+
+struct bus_type tc_bus_type = {
+   .name   = "tc",
+   .match  = tc_bus_match
+};
+
+
+static int __init tc_driver_init(void)
+{
+   return bus_register(&tc_bus_type);
+}
+
+postcore_initcall(tc_driver_init);
+
+EXPORT_SYMBOL(tc_match_device);
+EXPORT_SYMBOL(tc_register_driver);
+EXPORT_SYMBOL(tc_unregister_driver);
+EXPORT_SYMBOL(tc_dev_driver);
+EXPORT_SYMBOL(tc_bus_type);
diff -urN -X /home/jsimmons/dontdiff linus-2.6/drivers/tc/tc.c fbdev-2.6/drivers/tc/tc.c
--- linus-2.6/drivers/tc/tc.c   2005-01-21 10:23:32.0 -0800
+++ fbdev-2.6/drivers/tc/tc.c   2005-01-21 10:28:15.0 -0800
@@ -26,9 +26,6 @@
 #define TC_DEBUG
 
 MODULE_LICENSE("GPL");
-slot_info tc_bus[MAX_SLOT];
-static int max_tcslot;
-static tcinfo *info;
 
 unsigned long system_base;
 
@@ -40,54 +37,14 @@
  * Interface to the world. Read comment in include/asm-mips/tc.h.
  */
 
-int search_tc_card(char *name)
-{
-   int slot;
-   slot_info *sip;
-
-   for (slot = 0; slot <= max_tcslot; slot++) {
-   sip = &tc_bus[slot];
-   if ((sip->flags & FREE) && (strncmp(sip->name, name, strlen(name)) == 0)) {
-   return slot;
-   }
-   }
-
-   return -ENODEV;
-}
-
-void claim_tc_card(int slot)
-{
-   if (tc_bus[slot].flags & IN_USE) {
-   printk("claim_tc_card: attempting to claim a card already in use\n");
-   return;
-   }
-   tc_bus[slot].flags &= ~FREE;
-   tc_bus[slot].flags |= IN_USE;
-}
-
-void release_tc_card(int slot)
-{
-   if (tc_bus[slot].flags & FREE) {
-   printk("release_tc_card: attempting to release a card already free\n");
-   return;
-   }
-   tc_bus[slot].flags &= ~IN_USE;
-   tc_bus[slot].flags |= FREE;

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread utz lehmann

On Sat, 2005-01-22 at 10:48 +1100, Con Kolivas wrote:
> utz lehmann wrote:
> > Hi
> > 
> > I dislike the behavior of the SCHED_ISO patch that iso tasks are
> > degraded to SCHED_NORMAL if they exceed the limit.
> > IMHO it's better to throttle them at the iso_cpu limit.
> > 
> > I have modified Con's iso2 patch to do this. If iso_cpu > 50 iso tasks
> > only get stalled for 1 tick (1ms on x86).
> 
> Some tasks are so cache intensive they would make almost no forward 
> progress running for only 1ms.

Ok. The throttle duration can be extended.
What is a good value? 5ms, 10ms?
 
> 
> > Fortunately there is a currently unused task prio (MAX_RT_PRIO-1) [1]. I
> 
> Your implementation is not correct. The "prio" field of real time tasks 
> is determined by MAX_RT_PRIO-1-rt_priority. Therefore you're limiting 
> the best real time priority, not the other way around.

Really? The task prios are (lower value is higher priority):

0
..  For SCHED_FIFO/SCHED_RR (rt_priority 99..1)
98  MAX_RT_PRIO-2

99  MAX_RT_PRIO-1   ISO_PRIO (rt_priority 0)

100 MAX_RT_PRIO
..  For SCHED_NORMAL
139 MAX_PRIO-1

ISO_PRIO is between the SCHED_FIFO/SCHED_RR and the SCHED_NORMAL range.
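
The table above can be checked with a few lines of C. This is only a
sketch: MAX_RT_PRIO and MAX_PRIO take the values implied by the table, the
real-time mapping is the MAX_RT_PRIO-1-rt_priority formula quoted earlier,
and rt_prio() and nice_to_prio() are hypothetical helper names.

```c
#define MAX_RT_PRIO 100
#define MAX_PRIO    140		/* MAX_RT_PRIO plus 40 nice levels */

/* Effective prio for a given rt_priority (lower value = higher priority). */
int rt_prio(int rt_priority)
{
	return MAX_RT_PRIO - 1 - rt_priority;
}

/* Static prio of a SCHED_NORMAL task for a nice value in -20..19. */
int nice_to_prio(int nice)
{
	return MAX_RT_PRIO + nice + 20;
}
```

rt_priority 99..1 maps to prio 0..98, so slot 99 (rt_priority 0) sits
between the real-time range and the SCHED_NORMAL range of 100..139.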

> 
> Throttling them for only 1ms will make it very easy to starve the system 
>   with 1 or more short running (<1ms) SCHED_NORMAL tasks running. Lower 
> priority tasks will never run.
> 
> Cheers,
> Con


[PATCH 1/8] core-small: Add option to embedded menu

2005-01-21 Thread Matt Mackall

Add CONFIG_CORE_SMALL for miscellaneous core size reductions that don't warrant
their own options. Example users to follow.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny/init/Kconfig
===
--- tiny.orig/init/Kconfig  2004-12-04 15:42:40.394703286 -0800
+++ tiny/init/Kconfig   2004-12-04 19:24:36.404346070 -0800
@@ -287,6 +287,12 @@
   reported.  KALLSYMS_EXTRA_PASS is only a temporary workaround while
   you wait for kallsyms to be fixed.
 
+config CORE_SMALL
+   default n
+   bool "Enable various size reductions for core" if EMBEDDED
+   help
+ This reduces the size of miscellaneous core kernel data structures.
+
 config FUTEX
bool "Enable futex support" if EMBEDDED
default y

[PATCH 8/8] core-small: Shrink console buffer

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL reduce console transfer buffer

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny-queue/include/linux/vt_kern.h
===
--- tiny-queue.orig/include/linux/vt_kern.h 2005-01-21 09:59:49.0 -0800
+++ tiny-queue/include/linux/vt_kern.h  2005-01-21 15:48:39.0 -0800
@@ -84,7 +84,11 @@
  * vc_screen.c shares this temporary buffer with the console write code so that
  * we can easily avoid touching user space while holding the console spinlock.
  */
-#define CON_BUF_SIZE   PAGE_SIZE
+#ifdef CONFIG_CORE_SMALL
+#define CON_BUF_SIZE 512
+#else
+#define CON_BUF_SIZE PAGE_SIZE
+#endif
 extern char con_buf[CON_BUF_SIZE];
 extern struct semaphore con_buf_sem;
 

[PATCH 7/8] core-small: Shrink timer lists

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL: reduce timer list hashes (TVN_BITS 6 -> 4, TVR_BITS 8 -> 6)

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny-queue/kernel/timer.c
===
--- tiny-queue.orig/kernel/timer.c  2005-01-21 09:59:50.0 -0800
+++ tiny-queue/kernel/timer.c   2005-01-21 15:31:58.0 -0800
@@ -50,8 +50,14 @@
 /*
  * per-CPU timer vector definitions:
  */
+
+#ifdef CONFIG_CORE_SMALL
+#define TVN_BITS 4
+#define TVR_BITS 6
+#else
 #define TVN_BITS 6
 #define TVR_BITS 8
+#endif
 #define TVN_SIZE (1 << TVN_BITS)
 #define TVR_SIZE (1 << TVR_BITS)
 #define TVN_MASK (TVN_SIZE - 1)
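
For a rough sense of the saving, the table sizes follow directly from the two constants. The sketch below assumes each per-CPU timer base holds one root vector of TVR_SIZE list heads plus four cascade vectors of TVN_SIZE list heads, and that a list head is two pointers. Both are assumptions about the surrounding code, not something this hunk shows, and the helper names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of list heads in one per-CPU timer base,
 * assuming one root vector plus four cascade vectors. */
static size_t timer_list_heads(unsigned int tvr_bits, unsigned int tvn_bits)
{
	size_t tvr_size = (size_t)1 << tvr_bits;
	size_t tvn_size = (size_t)1 << tvn_bits;

	return tvr_size + 4 * tvn_size;
}

/* Bytes taken by those list heads, assuming two pointers per head. */
static size_t timer_table_bytes(unsigned int tvr_bits, unsigned int tvn_bits)
{
	return timer_list_heads(tvr_bits, tvn_bits) * 2 * sizeof(void *);
}
```

Under these assumptions timer_list_heads(8, 6) is 512 and timer_list_heads(6, 4) is 128, so the CONFIG_CORE_SMALL setting would cut the per-CPU tables to a quarter of their default size.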

[PATCH 3/8] core-small: Collapse chrdevs hash

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL: degrade the char dev hash table to a linked list

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny-queue/fs/char_dev.c
===
--- tiny-queue.orig/fs/char_dev.c   2005-01-21 09:59:45.0 -0800
+++ tiny-queue/fs/char_dev.c2005-01-21 15:31:52.0 -0800
@@ -26,7 +26,11 @@
 
 static struct kobj_map *cdev_map;
 
+#ifdef CONFIG_CORE_SMALL
+#define MAX_PROBE_HASH 1 /* degrade to linked list */
+#else
 #define MAX_PROBE_HASH 255 /* random */
+#endif
 
 static DEFINE_RWLOCK(chrdevs_lock);
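
Why a table size of 1 amounts to a linked list can be shown with a tiny sketch. It assumes the bucket is picked by taking the major number modulo the table size, which is the usual scheme for a chained table; the function name is hypothetical.

```c
/* Illustrative bucket selection, assuming index = major % table size. */
static unsigned int probe_index(unsigned int major, unsigned int table_size)
{
	return major % table_size;
}
```

With table_size set to 1 every major number lands in bucket 0, so all entries end up on one chain and lookups walk that single list.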
 

[PATCH 5/8] core-small: Shrink uid hash

2005-01-21 Thread Matt Mackall

CONFIG_CORE_SMALL: reduce the UID lookup hash from 256 to 8 buckets

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: tiny/kernel/user.c
===
--- tiny.orig/kernel/user.c 2004-12-04 15:42:41.0 -0800
+++ tiny/kernel/user.c  2004-12-04 19:42:32.462123939 -0800
@@ -18,7 +18,11 @@
  * UID task count cache, to get fast user lookup in "alloc_uid"
  * when changing user ID's (ie setuid() and friends).
  */
+#ifdef CONFIG_CORE_SMALL
+#define UIDHASH_BITS   3
+#else
 #define UIDHASH_BITS   8
+#endif
 #define UIDHASH_SZ (1 << UIDHASH_BITS)
 #define UIDHASH_MASK   (UIDHASH_SZ - 1)
 #define __uidhashfn(uid)   (((uid >> UIDHASH_BITS) + uid) & UIDHASH_MASK)
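
The effect of the smaller bit count on bucket selection can be checked with a standalone version of the hash. This is a sketch with the same shape as the __uidhashfn() macro above, with the bit count passed in as a parameter so that both settings can be compared; the function name is made up for illustration.

```c
/* Same arithmetic as __uidhashfn(), with UIDHASH_BITS as a parameter. */
static unsigned int uidhash_bucket(unsigned int uid, unsigned int bits)
{
	unsigned int mask = (1u << bits) - 1;

	return ((uid >> bits) + uid) & mask;
}
```

For example, uid 1000 maps to bucket 235 with 8 bits and to bucket 5 with 3 bits. With 3 bits there are only 8 buckets, so systems with many distinct users get longer chains in exchange for a smaller table.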

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

Rui Nuno Capela wrote:
> OK. Here goes my fresh and new jack_test4.1 test suite. It might be
> still rough, as usual ;)

Thanks
Here are fresh results on more stressed hardware (on ext3) with
2.6.11-rc1-mm2 (which by the way has SCHED_ISO v2 included). The load
hovers at 50% and at times spikes to close to 70%, which tests the
behaviour under iso throttling.

==> jack_test4-2.6.11-rc1-mm2-fifo.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . :41
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 0   usecs
Cycle Maximum . . . . . . . . : 10968   usecs
Average DSP Load. . . . . . . :44.3 %
Average CPU System Load . . . : 4.9 %
Average CPU User Load . . . . :17.1 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.0 %
Average CPU IRQ Load  . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1689.9 /sec
Average Context-Switch Rate . : 19052.6 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
==> jack_test4-2.6.11-rc1-mm2-iso.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . : 2
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 0   usecs
Cycle Maximum . . . . . . . . :  1282   usecs
Average DSP Load. . . . . . . :50.5 %
Average CPU System Load . . . :11.2 %
Average CPU User Load . . . . :17.6 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.0 %
Average CPU IRQ Load  . . . . : 0.0 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1688.8 /sec
Average Context-Switch Rate . : 18985.1 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
==> jack_test4-2.6.11-rc1-mm2-normal.log <==
Number of runs  . . . . . . . :(1)
*
Timeout Count . . . . . . . . :(0)
XRUN Count  . . . . . . . . . :   325
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 0   usecs
Cycle Maximum . . . . . . . . :  4726   usecs
Average DSP Load. . . . . . . :50.0 %
Average CPU System Load . . . : 5.1 %
Average CPU User Load . . . . :18.7 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.0 %
Average CPU IRQ Load  . . . . : 0.1 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate  . . . :  1704.5 /sec
Average Context-Switch Rate . : 18875.2 /sec
*
Delta Maximum . . . . . . . . : 0.0
*
Full data and pretty pictures:
http://ck.kolivas.org/patches/SCHED_ISO/iso2-benchmarks/
Cheers,
Con



[patch 3/8] make ide_generic_ioctl() take ide_drive_t * as an argument

2005-01-21 Thread Bartlomiej Zolnierkiewicz


As a result disk->private_data can be used by device drivers now.

diff -Nru a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c  2005-01-21 22:30:19 +01:00
+++ b/drivers/ide/ide-cd.c  2005-01-21 22:30:19 +01:00
@@ -3317,7 +3317,7 @@
 {
struct block_device *bdev = inode->i_bdev;
ide_drive_t *drive = bdev->bd_disk->private_data;
-   int err = generic_ide_ioctl(file, bdev, cmd, arg);
+   int err = generic_ide_ioctl(drive, file, bdev, cmd, arg);
if (err == -EINVAL) {
struct cdrom_info *info = drive->driver_data;
err = cdrom_ioctl(file, &info->devinfo, inode, cmd, arg);
diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c2005-01-21 22:30:19 +01:00
+++ b/drivers/ide/ide-disk.c2005-01-21 22:30:19 +01:00
@@ -1048,7 +1048,8 @@
unsigned int cmd, unsigned long arg)
 {
struct block_device *bdev = inode->i_bdev;
-   return generic_ide_ioctl(file, bdev, cmd, arg);
+   ide_drive_t *drive = bdev->bd_disk->private_data;
+   return generic_ide_ioctl(drive, file, bdev, cmd, arg);
 }

 static int idedisk_media_changed(struct gendisk *disk)
diff -Nru a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
--- a/drivers/ide/ide-floppy.c  2005-01-21 22:30:19 +01:00
+++ b/drivers/ide/ide-floppy.c  2005-01-21 22:30:19 +01:00
@@ -1970,7 +1970,7 @@
ide_drive_t *drive = bdev->bd_disk->private_data;
idefloppy_floppy_t *floppy = drive->driver_data;
void __user *argp = (void __user *)arg;
-   int err = generic_ide_ioctl(file, bdev, cmd, arg);
+   int err = generic_ide_ioctl(drive, file, bdev, cmd, arg);
int prevent = (arg) ? 1 : 0;
idefloppy_pc_t pc;
if (err != -EINVAL)
diff -Nru a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
--- a/drivers/ide/ide-tape.c2005-01-21 22:30:19 +01:00
+++ b/drivers/ide/ide-tape.c2005-01-21 22:30:19 +01:00
@@ -4724,7 +4724,7 @@
 {
struct block_device *bdev = inode->i_bdev;
ide_drive_t *drive = bdev->bd_disk->private_data;
-   int err = generic_ide_ioctl(file, bdev, cmd, arg);
+   int err = generic_ide_ioctl(drive, file, bdev, cmd, arg);
if (err == -EINVAL)
err = idetape_blkdev_ioctl(drive, cmd, arg);
return err;
diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
--- a/drivers/ide/ide.c 2005-01-21 22:30:19 +01:00
+++ b/drivers/ide/ide.c 2005-01-21 22:30:19 +01:00
@@ -1410,10 +1410,9 @@
return ide_do_drive_cmd(drive, , ide_head_wait);
 }

-int generic_ide_ioctl(struct file *file, struct block_device *bdev,
+int generic_ide_ioctl(ide_drive_t *drive, struct file *file, struct block_device *bdev,
unsigned int cmd, unsigned long arg)
 {
-   ide_drive_t *drive = bdev->bd_disk->private_data;
ide_settings_t *setting;
int err = 0;
void __user *p = (void __user *)arg;
diff -Nru a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
--- a/drivers/scsi/ide-scsi.c   2005-01-21 22:30:19 +01:00
+++ b/drivers/scsi/ide-scsi.c   2005-01-21 22:30:19 +01:00
@@ -755,7 +755,8 @@
unsigned int cmd, unsigned long arg)
 {
struct block_device *bdev = inode->i_bdev;
-   return generic_ide_ioctl(file, bdev, cmd, arg);
+   ide_drive_t *drive = bdev->bd_disk->private_data;
+   return generic_ide_ioctl(drive, file, bdev, cmd, arg);
 }

 static struct block_device_operations idescsi_ops = {
diff -Nru a/include/linux/ide.h b/include/linux/ide.h
--- a/include/linux/ide.h   2005-01-21 22:30:19 +01:00
+++ b/include/linux/ide.h   2005-01-21 22:30:19 +01:00
@@ -,7 +,7 @@

 #define DRIVER(drive)  ((drive)->driver)

-extern int generic_ide_ioctl(struct file *, struct block_device *, unsigned, unsigned long);
+int generic_ide_ioctl(ide_drive_t *, struct file *, struct block_device *, unsigned, unsigned long);

 /*
  * ide_hwifs[] is the master data structure used to keep track

linux 2.4.28 umount oops

2005-01-21 Thread Bjoern Brill

Hello list,

my machine oopses in the umount script at shutdown once every few
weeks (at 1-2 shutdowns / day). Two times this resulted in repairable
errors on an EXT3 filesystem during the next bootup.

This is on an i386 (actually AMD K6-II) machine with a single IDE disk.
The mounted filesystems are EXT3 and VFAT. I don't use LVM, cryptoloop, or
other fancy stuff. The kernel is a self-compiled kernel.org 2.4.28 with no
extra patches; I think the problem did also occur with at least 2.4.27
and 2.4.26.

Attached is the output of ver_linux and ksymoops. (I had to copy the oops
by hand. I tried to be careful, but may have made errors.) More data
(dmesg, etc) can be found at

and I can produce anything else on request.

According to ksymoops, EIP points to an address in
fs/buffer.c::__remove_inode_queue(). Could somebody please have a look at
the oops or point me to the right person? (I couldn't find a suitable
entry in the MAINTAINERS file.)

Please CC me on list replies.


Thanks,

Bjoern Brill
--
Bj"orn Brill <[EMAIL PROTECTED]>
Frankfurt am Main, Germany
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux kvaefjord 2.4.28 #3 Tue Nov 23 01:50:14 CET 2004 i586 unknown
 
Gnu C  2.95.4
Gnu make   3.79.1
util-linux 2.11n
mount  2.11n
modutils   2.4.26
e2fsprogs  1.27
PPP2.4.1
Linux C Library2.2.5
Dynamic linker (ldd)   2.2.5
Procps 2.0.7
Net-tools  1.60
Console-tools  0.2.3
Sh-utils   2.0.11
Modules Loaded radeon ne 8390 crc32 usb-uhci usbcore rtc emu10k1-gp 
joydev analog snd-emu10k1-synth snd-emux-synth snd-seq-midi-emul 
snd-seq-virmidi snd-seq-midi-event snd-seq snd-emu10k1 snd-pcm-oss 
snd-mixer-oss snd-pcm snd-timer snd-hwdep snd-page-alloc snd-util-mem 
snd-ac97-codec snd-rawmidi snd-seq-device snd soundcore aha152x ide-scsi
ksymoops 2.4.5 on i586 2.4.28.  Options used
 -v /usr/src/linux-2.4.28/vmlinux (specified)
 -k 20050119010807.ksyms (specified)
 -l 20050119010807.modules (specified)
 -o /lib/modules/2.4.28/ (default)
 -m /boot/System.map-2.4.28 (default)

Deactivating swap...done
Unmounting local filesystems...Unable to handle kernel NULL pointer dereference
c0131a12
*pde = 
Oops: 0002
CPU: 0
EIP: 0010:[] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 000c   ebx: cca06ad8 ecx: 1000   edx: cca06a84
esi: cca06a80   edi: cca06888 ebp: cf157f54   esp: cf157ef4
ds: 0018es: 0018ss: 0018
Stack:  cca06a98 c0131e73 cca06a84 cca06a88 cca06a80 c01425ec cca06a80 cfd4fc00
cf157f54 cf157f54 0001 0001dcc4  c014269b c0264904 cfd4fc00
cf157f54 c02648fc cfd4fc00 cf157f54 cfd4fc00 cfd4fc48 c0265ba0 bf1b
Call Trace: [] [] [] [] []
[] [] []
Code: 89 48 04 89 01 c7 42 54 00 00 00 00 c7 43 04 00 00 00 00 b8


>>EIP; c0131a12 <__remove_inode_queue+e/2c>   <=

>>ebx; cca06ad8 <_end+c720578/105ceaa0>
>>ecx; 1000 Before first symbol
>>edx; cca06a84 <_end+c720524/105ceaa0>
>>esi; cca06a80 <_end+c720520/105ceaa0>
>>edi; cca06888 <_end+c720328/105ceaa0>
>>ebp; cf157f54 <_end+ee719f4/105ceaa0>
>>esp; cf157ef4 <_end+ee71994/105ceaa0>

Trace; c0131e73 
Trace; c01425ec 
Trace; c014269b 
Trace; c013529d 
Trace; c0144081 <__mntput+1d/24>
Trace; c0138e0e 
Trace; c01446ad 
Trace; c0106b23 

Code;  c0131a12 <__remove_inode_queue+e/2c>
 <_EIP>:
Code;  c0131a12 <__remove_inode_queue+e/2c>   <=
   0:   89 48 04  mov%ecx,0x4(%eax)   <=
Code;  c0131a15 <__remove_inode_queue+11/2c>
   3:   89 01 mov%eax,(%ecx)
Code;  c0131a17 <__remove_inode_queue+13/2c>
   5:   c7 42 54 00 00 00 00  movl   $0x0,0x54(%edx)
Code;  c0131a1e <__remove_inode_queue+1a/2c>
   c:   c7 43 04 00 00 00 00  movl   $0x0,0x4(%ebx)
Code;  c0131a25 <__remove_inode_queue+21/2c>
  13:   b8 00 00 00 00mov$0x0,%eax

Re: Pollable Semaphores

2005-01-21 Thread Davide Libenzi

On Fri, 21 Jan 2005, Brandon Corey wrote:

> I'm trying to find out if there is a pollable semaphore equivalent on Linux.
> 
> The main idea of a "pollable semaphore", is a semaphore with a related
> file descriptor.  The file descriptor can be used to select() when the
> semaphore is acquirable.  This provides a convenient way for users to
> implement code synchronization between threads, where multiple file
> descriptors are already being selected against.
> 
> We have a pollable semaphore implementation on IRIX that provides this
> functionality.  The API consists of a handful of calls for creation and
> destruction of pollable semaphores, as well as a means to attach them
> to a file descriptor.  Beyond that, from the user's point of view, they're
> just treated as any other file descriptor.
> 
> These calls are routed through a library and then passed off to a kernel
> driver that handles the events.  If someone selects against a semaphore
> when it's unacquirable, the driver sleeps on a synchronization variable.
> When the semaphore is subsequently made acquirable, the driver will wake up
> any waiters.  Multiple pollable semaphores mixed with other file
> descriptors can be selected against, and a wakeup will occur when any of
> the semaphores become acquirable.
> 
> Is anyone aware of any equivalent functionality?

I used pipe-based semaphores when I needed that functionality (call
psem_down_fd() to get the pollable fd):

http://www.xmailserver.org/pipe-sem.c
http://www.xmailserver.org/pipe-sem.h

They have the problem of the maximum pipe buffer size that affects the 
maximum count, but in my case it was fine. Or at least bugs did not come 
biting me at the time ;)
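
The general idea can be sketched in a few lines of C. This is not the code from pipe-sem.c; it is a minimal illustration under the assumption that the count is represented by the number of bytes sitting in a pipe, and every pipesem_* name is hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical pipe-backed counting semaphore (not the pipe-sem.c API). */
struct pipesem {
	int rfd;	/* read end: readable while the count is above zero */
	int wfd;	/* write end: one byte per unit of count */
};

/* Release: add one token. Blocks if the pipe buffer is full, which is
 * the maximum-count limitation mentioned above. */
static int pipesem_up(struct pipesem *s)
{
	char token = 1;
	ssize_t n;

	do {
		n = write(s->wfd, &token, 1);
	} while (n < 0 && errno == EINTR);
	return n == 1 ? 0 : -1;
}

/* Acquire: consume one token, blocking until one is available. */
static int pipesem_down(struct pipesem *s)
{
	char token;
	ssize_t n;

	do {
		n = read(s->rfd, &token, 1);
	} while (n < 0 && errno == EINTR);
	return n == 1 ? 0 : -1;
}

static void pipesem_destroy(struct pipesem *s)
{
	close(s->rfd);
	close(s->wfd);
}

/* Create the semaphore; count must stay below the pipe capacity. */
static int pipesem_init(struct pipesem *s, unsigned int count)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	s->rfd = fds[0];
	s->wfd = fds[1];
	while (count-- > 0) {
		if (pipesem_up(s) < 0) {
			pipesem_destroy(s);
			return -1;
		}
	}
	return 0;
}

/* Non-blocking check: 1 if a down() would succeed right now. */
static int pipesem_acquirable(const struct pipesem *s)
{
	struct pollfd pfd = { .fd = s->rfd, .events = POLLIN };

	return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
}
```

The read end (rfd) is what a caller would add to its select() or poll() set alongside its other descriptors. Readiness only says that a token was present when poll() returned; with several waiters another thread may take it first, so a caller that must not block would need a non-blocking read end and a retry.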



- Davide


Re: Pollable Semaphores

2005-01-21 Thread Brent Casavant

On Fri, 21 Jan 2005, Roland Dreier wrote:

> Brandon> I'm trying to find out if there is a pollable semaphore
> Brandon> equivalent on Linux.  The main idea of a "pollable
> Brandon> semaphore", is a semaphore with a related file
> Brandon> descriptor.  The file descriptor can be used to select()
> Brandon> when the semaphore is acquirable.  This provides a
> Brandon> convenient way for users to implement code
> Brandon> synchronization between threads, where multiple file
> Brandon> descriptors are already being selected against.
> 
> Yes, I believe futexes and specifically FUTEX_FD can be used to
> implement this.  See http://people.redhat.com/~drepper/futex.pdf for
> full details.

Thanks for pointing out that paper, I wasn't aware of it (Brandon and
I talked about this problem before he posted).

The problem listed in section 9 of that paper is a showstopper to using
FUTEX_FD.  It's the very problem I ran into when trying to brainstorm a
wrapper library call around it to roughly implement the behavior Brandon
is looking for.

An additional major inconvenience is that the file descriptor can only
be used once, after which it must be closed and another reopened.  If
somehow the poll/select call could reuse the same file descriptor we
could avoid a whole bunch of library glue goo to make it work that
way (i.e. special library routines similar to poll(2) and select(2)
which do some fake file descriptor table hand waving to make it look
like there's reusable futex fds).

Perhaps a new FUTEX_POLLFD would be a reasonable solution for both
problems?  The semantics would be a bit different than FUTEX_FD.

It could solve the race condition (i.e. the problem in section 9 of
the paper) by the following:

  1. A write to the fd is used to set the value of interest analogous to
 the second parameter to FUTEX_WAIT.  This value is stored on a
 per-fd (or maybe per-fd per-thread if it's not possible to have
 multiple fds per futex) basis.
  2. select/poll on the fd return EWOULDBLOCK if the current value of
 the futex is not equal to the value of interest.  Otherwise it
 behaves as FUTEX_FD currently does.

I think it's even possible that this behavior could be wedged into
the current FUTEX_FD, as a write(2) to this fd is meaningless on it
at present.  It's not a perfect solution, however, in that it would
be possible for one or more of the futexes that are the target of a
select(2) call to be updated so rapidly that we can never make progress
in the "write value then select" loop.

I'm not sure how to fix the reusability problem, as I haven't
determined the technical reason behind the current one-shot design.
Maybe it could be as simple as using another currently unused call
such as read(2) to reset the fd?  But I suspect it's harder than
that, otherwise it wouldn't be a one-shot in the first place.

Brent

-- 
Brent Casavant  If you had nothing to fear,
[EMAIL PROTECTED]how then could you be brave?
Silicon Graphics, Inc.-- Queen Dama, Source Wars

[patch 0/8] next bunch of IDE fixes (on the way to the driver-model)

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Hi,

All patches are against ide-dev-2.6 tree (== incremental to 5 previous
patches).  The main part of this series is adding reference counting to
IDE device drivers (ide-scsi is a problematic one and I need some help
from SCSI people).

Bartlomiej

Re: [RFC 2.6.10 5/22] xfrm: Attempt to offload bundled xfrm_states for outbound xfrms

2005-01-21 Thread David S. Miller

On Thu, 30 Dec 2004 03:48:35 -0500
David Dillow <[EMAIL PROTECTED]> wrote:

> +static void xfrm_accel_bundle(struct dst_entry *dst)
> +{
> + struct xfrm_bundle_list bundle, *xbl, *tmp;
> + struct net_device *dev = dst->dev;
> + INIT_LIST_HEAD();
> +
> + if (dev && netif_running(dev) && (dev->features & NETIF_F_IPSEC)) {

netif_running() is only steady while the RTNL semaphore is held,
which is not necessarily true when xfrm_lookup() is invoked.

[patch 1/8] kill ide_driver_t->capacity

2005-01-21 Thread Bartlomiej Zolnierkiewicz


* add private /proc/ide/hd?/capacity handlers to ide-{cd,disk,floppy}.c
* use generic proc_ide_read_capacity() for ide-{scsi,tape}.c
* kill ->capacity, default_capacity() and generic_subdriver_entries[]

diff -Nru a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c  2005-01-21 22:23:17 +01:00
+++ b/drivers/ide/ide-cd.c  2005-01-21 22:23:17 +01:00
@@ -3251,6 +3251,25 @@

 static int ide_cdrom_attach (ide_drive_t *drive);

+#ifdef CONFIG_PROC_FS
+static int proc_idecd_read_capacity
+   (char *page, char **start, off_t off, int count, int *eof, void *data)
+{
+   ide_drive_t*drive = (ide_drive_t *)data;
+   int len;
+
+   len = sprintf(page,"%llu\n", (long long)ide_cdrom_capacity(drive));
+   PROC_IDE_READ_RETURN(page,start,off,count,eof,len);
+}
+
+static ide_proc_entry_t idecd_proc[] = {
+   { "capacity", S_IFREG|S_IRUGO, proc_idecd_read_capacity, NULL },
+   { NULL, 0, NULL, NULL }
+};
+#else
+# define idecd_procNULL
+#endif
+
 static ide_driver_t ide_cdrom_driver = {
.owner  = THIS_MODULE,
.name   = "ide-cdrom",
@@ -3260,7 +3279,7 @@
.supports_dsc_overlap   = 1,
.cleanup= ide_cdrom_cleanup,
.do_request = ide_do_rw_cdrom,
-   .capacity   = ide_cdrom_capacity,
+   .proc   = idecd_proc,
.attach = ide_cdrom_attach,
.drives = LIST_HEAD_INIT(ide_cdrom_driver.drives),
 };
diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c2005-01-21 22:23:17 +01:00
+++ b/drivers/ide/ide-disk.c2005-01-21 22:23:17 +01:00
@@ -580,6 +580,16 @@
PROC_IDE_READ_RETURN(page,start,off,count,eof,len);
 }

+static int proc_idedisk_read_capacity
+   (char *page, char **start, off_t off, int count, int *eof, void *data)
+{
+   ide_drive_t*drive = (ide_drive_t *)data;
+   int len;
+
+   len = sprintf(page,"%llu\n", (long long)idedisk_capacity(drive));
+   PROC_IDE_READ_RETURN(page,start,off,count,eof,len);
+}
+
 static int proc_idedisk_read_smart_thresholds
(char *page, char **start, off_t off, int count, int *eof, void *data)
 {
@@ -620,6 +630,7 @@

 static ide_proc_entry_t idedisk_proc[] = {
{ "cache",  S_IFREG|S_IRUGO,
proc_idedisk_read_cache,NULL },
+   { "capacity",   S_IFREG|S_IRUGO,
proc_idedisk_read_capacity, NULL },
{ "geometry",   S_IFREG|S_IRUGO,proc_ide_read_geometry, 
NULL },
{ "smart_values",   S_IFREG|S_IRUSR,
proc_idedisk_read_smart_values, NULL },
{ "smart_thresholds",   S_IFREG|S_IRUSR,
proc_idedisk_read_smart_thresholds, NULL },
@@ -985,7 +996,6 @@
.supports_dsc_overlap   = 0,
.cleanup= idedisk_cleanup,
.do_request = ide_do_rw_disk,
-   .capacity   = idedisk_capacity,
.proc   = idedisk_proc,
.attach = idedisk_attach,
.drives = LIST_HEAD_INIT(idedisk_driver.drives),
diff -Nru a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
--- a/drivers/ide/ide-floppy.c  2005-01-21 22:23:17 +01:00
+++ b/drivers/ide/ide-floppy.c  2005-01-21 22:23:17 +01:00
@@ -1847,7 +1847,18 @@

 #ifdef CONFIG_PROC_FS

+static int proc_idefloppy_read_capacity
+   (char *page, char **start, off_t off, int count, int *eof, void *data)
+{
+   ide_drive_t*drive = (ide_drive_t *)data;
+   int len;
+
+   len = sprintf(page,"%llu\n", (long long)idefloppy_capacity(drive));
+   PROC_IDE_READ_RETURN(page,start,off,count,eof,len);
+}
+
 static ide_proc_entry_t idefloppy_proc[] = {
+   { "capacity",   S_IFREG|S_IRUGO,proc_idefloppy_read_capacity, 
NULL },
{ "geometry",   S_IFREG|S_IRUGO,proc_ide_read_geometry, NULL },
{ NULL, 0, NULL, NULL }
 };
@@ -1873,7 +1884,6 @@
.cleanup= idefloppy_cleanup,
.do_request = idefloppy_do_request,
.end_request= idefloppy_do_end_request,
-   .capacity   = idefloppy_capacity,
.proc   = idefloppy_proc,
.attach = idefloppy_attach,
.drives = LIST_HEAD_INIT(idefloppy_driver.drives),
diff -Nru a/drivers/ide/ide-proc.c b/drivers/ide/ide-proc.c
--- a/drivers/ide/ide-proc.c2005-01-21 22:23:17 +01:00
+++ b/drivers/ide/ide-proc.c2005-01-21 22:23:17 +01:00
@@ -269,13 +269,11 @@
 int proc_ide_read_capacity
(char *page, char **start, off_t off, int count, int *eof, void *data)
 {
-   ide_drive_t *drive = (ide_drive_t *) data;
-   int len;
-
-   len = sprintf(page,"%llu\n",
- (long long) (DRIVER(drive)->capacity(drive)));
+   int len =

RE: How to add/drop SCSI drives from within the driver?

2005-01-21 Thread James Bottomley

On Fri, 2005-01-21 at 17:11 -0500, Mukker, Atul wrote:
> All right! The implementation is complete for this and the driver has
> thoroughly gone through testing. Everything looks good except for a minor
> glitch.

That's good news.

> After the new logical drives are created with "- - -" written to the
> scsi_host scan attribute, there is a highly noticeable delay before device
> names (e.g., sda) appear in the /dev directory. If the management
> application tries to access a device immediately after creating it, the
> access fails. Putting in a 1 second delay helped, but of course this is not a
> deterministic solution.
> 
> What are the other possibilities?

Well, how about hotplug.  The device addition actually triggers a hot
plug event already (there's no need to add anything, it's done by the
mid-layer), so if you just listen for that, you'll know when the scan
has detected a device.
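
One way to listen from a management application is sketched below. It assumes a kernel that broadcasts kobject uevents over a netlink socket and that each message starts with a header of the form "action@devpath"; on systems that only run /sbin/hotplug the same check would live in a hotplug script instead. The uevent_* helper names are made up for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Open a socket subscribed to kernel uevent broadcasts; -1 on failure. */
static int uevent_open(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;	/* kernel uevent multicast group */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return 1 if the header "action@devpath" is an "add" event whose
 * devpath ends in "/<name>" (for example name "sda"), else 0. */
static int uevent_is_add_of(const char *msg, const char *name)
{
	const char *at = strchr(msg, '@');
	const char *leaf;

	if (!at || (size_t)(at - msg) != 3 || strncmp(msg, "add", 3) != 0)
		return 0;
	leaf = strrchr(at + 1, '/');
	return leaf && strcmp(leaf + 1, name) == 0;
}
```

The application would recv() on the descriptor in a loop and test each message with uevent_is_add_of() instead of sleeping for a fixed time. The /dev node itself is created by userspace in response to the same event, so a short wait for the node may still be needed after the event arrives.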

James



Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread utz lehmann

Hi

I dislike the behavior of the SCHED_ISO patch that iso tasks are
degraded to SCHED_NORMAL if they exceed the limit.
IMHO it's better to throttle them at the iso_cpu limit.

I have modified Con's iso2 patch to do this. If iso_cpu > 50 iso tasks
only get stalled for 1 tick (1ms on x86).

Fortunately there is a currently unused task prio (MAX_RT_PRIO-1) [1]. I
used it for ISO_PRIO. All SCHED_ISO tasks use it and they do not change
to other priorities. SCHED_ISO is a realtime class with the specialty
that it can be preempted by SCHED_NORMAL tasks if iso_throttle is set. With
this the iso queue stuff is not needed.

iso_throttle controls if a SCHED_ISO task can be preempted. It's set by
the RT task load.

With my patch rt_task() also includes iso tasks. I have added a
posix_rt_task() for only SCHED_FIFO and SCHED_RR.
I changed the iso_period sysctl to iso_timeout, which is in centisecs.
An iso_throttle_count sysctl is added which counts the ticks when an iso
task is preempted by the timer. It currently uses a simple global
variable; it should be per runqueue. And I'm not sure a sysctl is an
appropriate place for it (/sys, /proc?).

It's for 2.6.11-rc1 and I have tested it only on UP x86.

I'm a kernel hacker newbie. Please tell me if this is nonsense, good,
can be improved, ...


utz

[1] Actually MAX_RT_PRIO-1 is used by sched_idle_next() and
migration_call(). I changed it to MAX_RT_PRIO-2 for them. I think it's
ok.


diff -Nrup linux-2.6.11-rc1/include/linux/sched.h linux-2.6.11-rc1-uiso2/include/linux/sched.h
--- linux-2.6.11-rc1/include/linux/sched.h  2005-01-21 19:46:54.677616421 +0100
+++ linux-2.6.11-rc1-uiso2/include/linux/sched.h    2005-01-21 20:30:29.616340716 +0100
@@ -130,6 +130,24 @@ extern unsigned long nr_iowait(void);
 #define SCHED_NORMAL   0
 #define SCHED_FIFO 1
 #define SCHED_RR   2
+/* policy 3 reserved for SCHED_BATCH */
+#define SCHED_ISO  4
+
+extern int iso_cpu, iso_timeout;
+extern int iso_throttle_count;
+extern void account_iso_ticks(struct task_struct *p);
+
+#define SCHED_RANGE(policy)((policy) == SCHED_NORMAL || \
+   (policy) == SCHED_FIFO || \
+   (policy) == SCHED_RR || \
+   (policy) == SCHED_ISO)
+
+#define SCHED_RT(policy)   ((policy) == SCHED_FIFO || \
+   (policy) == SCHED_RR || \
+   (policy) == SCHED_ISO)
+
+#define SCHED_POSIX_RT(policy) ((policy) == SCHED_FIFO || \
+   (policy) == SCHED_RR)
 
 struct sched_param {
int sched_priority;
@@ -342,9 +360,11 @@ struct signal_struct {
 
 /*
  * Priority of a process goes from 0..MAX_PRIO-1, valid RT
- * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL tasks are
- * in the range MAX_RT_PRIO..MAX_PRIO-1. Priority values
- * are inverted: lower p->prio value means higher priority.
+ * priority is 0..MAX_RT_PRIO-1. SCHED_FIFO and SCHED_RR use
+ * 0..MAX_RT_PRIO-2, SCHED_ISO uses MAX_RT_PRIO-1.
+ * SCHED_NORMAL tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
+ * Priority values are inverted: lower p->prio value means
+ * higher priority.
  *
  * The MAX_USER_RT_PRIO value allows the actual maximum
  * RT priority to be separate from the value exported to
@@ -358,7 +378,12 @@ struct signal_struct {
 
 #define MAX_PRIO   (MAX_RT_PRIO + 40)
 
+#define ISO_PRIO   (MAX_RT_PRIO - 1)
+
 #define rt_task(p) (unlikely((p)->prio < MAX_RT_PRIO))
+#define posix_rt_task(p)   (unlikely((p)->policy == SCHED_FIFO || \
+ (p)->policy == SCHED_RR))
+#define iso_task(p)(unlikely((p)->policy == SCHED_ISO))
 
 /*
  * Some day this will be a full-fledged user tracking system..
diff -Nrup linux-2.6.11-rc1/include/linux/sysctl.h linux-2.6.11-rc1-uiso2/include/linux/sysctl.h
--- linux-2.6.11-rc1/include/linux/sysctl.h 2005-01-21 19:46:54.717612339 +0100
+++ linux-2.6.11-rc1-uiso2/include/linux/sysctl.h   2005-01-21 20:30:38.105484416 +0100
@@ -135,6 +135,9 @@ enum
KERN_HZ_TIMER=65,   /* int: hz timer on or off */
KERN_UNKNOWN_NMI_PANIC=66, /* int: unknown nmi panic flag */
KERN_BOOTLOADER_TYPE=67, /* int: boot loader type */
+   KERN_ISO_CPU=68,/* int: cpu% allowed by SCHED_ISO class */
+   KERN_ISO_TIMEOUT=69,/* int: centisecs after SCHED_ISO is throttled */
+   KERN_ISO_THROTTLE_COUNT=70, /* int: no. of throttled SCHED_ISO ticks */
 };
 
 
diff -Nrup linux-2.6.11-rc1/kernel/sched.c linux-2.6.11-rc1-uiso2/kernel/sched.c
--- linux-2.6.11-rc1/kernel/sched.c 2005-01-21 19:46:55.650517137 +0100
+++ linux-2.6.11-rc1-uiso2/kernel/sched.c   2005-01-21 23:35:11.531981295 +0100
@@ -149,9 +149,6 @@
(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
 
-#define TASK_PREEMPTS_CURR(p, rq) \
-   ((p)->prio <

[PATCH] SATA AHCI support for Intel ICH7R - 2.6.11-rc1

2005-01-21 Thread Jason Gaston

This patch adds the Intel ICH7R device IDs to the ahci.c SATA AHCI driver
for ICH7R SATA support.
If acceptable, please apply.

Thanks,

Jason Gaston

Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>

--- linux-2.6.11-rc1/drivers/scsi/ahci.c.orig   2005-01-21 07:46:58.202269784 -0800
+++ linux-2.6.11-rc1/drivers/scsi/ahci.c2005-01-21 07:48:58.732946336 -0800
@@ -246,6 +246,10 @@ static struct pci_device_id ahci_pci_tbl
  board_ahci }, /* ICH7 */
{ PCI_VENDOR_ID_INTEL, 0x27c5, PCI_ANY_ID, PCI_ANY_ID, 0, 0,
  board_ahci }, /* ICH7M */
+   { PCI_VENDOR_ID_INTEL, 0x27c2, PCI_ANY_ID, PCI_ANY_ID, 0, 0,
+ board_ahci }, /* ICH7R */
+   { PCI_VENDOR_ID_INTEL, 0x27c3, PCI_ANY_ID, PCI_ANY_ID, 0, 0,
+ board_ahci }, /* ICH7R */
{ } /* terminate list */
 };
 

Re: [PATCH]sched: Isochronous class v2 for unprivileged soft rt scheduling

2005-01-21 Thread Con Kolivas

utz lehmann wrote:
> Hi
> I dislike the behavior of the SCHED_ISO patch that iso tasks are
> degraded to SCHED_NORMAL if they exceed the limit.
> IMHO it's better to throttle them at the iso_cpu limit.
> I have modified Con's iso2 patch to do this. If iso_cpu > 50 iso tasks
> only get stalled for 1 tick (1ms on x86).

Some tasks are so cache intensive they would make almost no forward 
progress running for only 1ms.

> Fortunately there is a currently unused task prio (MAX_RT_PRIO-1) [1]. I
Your implementation is not correct. The "prio" field of real time tasks 
is determined by MAX_RT_PRIO-1-rt_priority. Therefore you're limiting 
the best real time priority, not the other way around.
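
The mapping quoted above can be written out as a small sketch. MAX_RT_PRIO is assumed to be 100 here, and rt_prio_of() is a made-up helper name, not a kernel function.

```c
#define MAX_RT_PRIO 100	/* assumed value */

/* Hypothetical helper: the "prio" field of a real time task, using the
 * formula MAX_RT_PRIO-1-rt_priority. Lower prio means higher priority. */
static int rt_prio_of(int rt_priority)
{
	return MAX_RT_PRIO - 1 - rt_priority;
}
```

With these numbers rt_priority 99 gives prio 0, rt_priority 1 gives prio 98, and prio MAX_RT_PRIO-1 (99) corresponds to rt_priority 0.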

Throttling them for only 1ms will make it very easy to starve the system
with 1 or more short running (<1ms) SCHED_NORMAL tasks running. Lower
priority tasks will never run.

Cheers,
Con



Re: Extend clear_page by an order parameter

2005-01-21 Thread Christoph Lameter

On Sat, 22 Jan 2005, Paul Mackerras wrote:

> Christoph Lameter writes:
>
> > The zeroing of a page of an arbitrary order in page_alloc.c and in
> > hugetlb.c may benefit from a clear_page that is capable of zeroing
> > multiple pages at once (and scrubd too but that is now an independent
> > patch). The following patch extends clear_page with a second parameter
> > specifying the order of the page to be zeroed to allow an efficient
> > zeroing of pages. Hope I caught everything
>
> Wouldn't it be nicer to call the version that takes the order
> parameter "clear_pages" and then define clear_page(p) as
> clear_pages(p, 0) ?

clear_page clears one page of the specified order. clear_page cannot clear
multiple pages. Calling the function clear_pages would give a wrong
impression of what the function does and may lead to attempts to specify
the number of zero order pages as a parameter instead of the order.
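
To make the order-versus-count distinction concrete: a page of order n covers 2^n base pages, so the second argument is an exponent, not a page count. A minimal user-space sketch, assuming a 4 KiB base page (the real size is architecture-specific) and using hypothetical names (SKETCH_PAGE_SIZE, page_bytes_sketch, clear_page_order_sketch) that are not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB base pages; the real value depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Size in bytes of one page of the given order: base size times 2^order. */
static size_t page_bytes_sketch(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;
}

/* Hypothetical user-space stand-in for an order-aware clear_page(). */
static void clear_page_order_sketch(void *page, unsigned int order)
{
    assert(page != NULL);
    memset(page, 0, page_bytes_sketch(order));
}
```

Passing order 2 zeroes 16384 bytes here, not two pages, which is the confusion the naming argument is about.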

Re: [ea-in-inode 0/5] Further fixes

2005-01-21 Thread Andreas Gruenbacher

On Fri, 2005-01-21 at 23:58, Stephen C. Tweedie wrote:
> Hi Andreas,
> 
> On Thu, 2005-01-20 at 02:01, Andreas Gruenbacher wrote:
> 
> > here is a set of fixes for ext3 in-inode attributes:
> 
> Obvious first question --- have these diffs survived the same
> torture-by-tridgell that the previous batch suffered?

No. The fixes are a lot less intrusive than the full xattr rework
though. I obviously ran tests; this included dbench.

Tridge, can you beat the code some more?

Andrew has the five fixes in 2.6.11-rc1-mm2.

-- Andreas.


re: sysfs patches

2005-01-21 Thread Mitch Williams

My apologies -- I appear to have sent the patches out in reverse order.
Please apply patch 3 before the other two.

This is the first time I've used our automated tools to make small patches
out of big ones, but I think I have it figured out now.

Thanks for your patience.

-Mitch Williams


Re: Kernel Panic with LTP on 2.6.11-rc1 (was Re: LTP Results for 2.6.x and 2.4.x)

2005-01-21 Thread William Lee Irwin III

On Fri, Jan 21, 2005 at 03:35:20PM -0800, Andrew Morton wrote:
> I am unable to find the oops trace amongst all that stuff.  Help?
> (It would have been handy to include it in the bug report, actually)

There was no oops. The panic() in oom_kill.c was triggered.


-- wli

[patch 8/8] ide-scsi: add basic refcounting

2005-01-21 Thread Bartlomiej Zolnierkiewicz


* pointers to a SCSI host and a drive are added to idescsi_scsi_t
* pointer to the SCSI host is stored in disk->private_data
* ide_scsi_{get,put}() is used to {get,put} reference to the SCSI host

Unfortunately this is not a complete fix for the ->open() vs ->cleanup()
race; there are two TODO items left (as stated by FIXMEs):

* don't use drive->driver_data
* add driver's private struct gendisk [already DONE]

Last minute note: it seems we can't get rid of drive->driver_data
because of interactions between ide-scsi and IDE core (no surprise),
drive->driver_data should be set to NULL when SCSI host object is
being destroyed - callback from scsi_host_dev_release() is needed.

This will help in fixing another problem - for IDE we need to wait
until driver is really no longer using the device because we have more
than 1 driver for the same class of devices (ide-{cd,floppy,tape} vs
ide-scsi, also there is ide-default hack but it will die soon).

diff -Nru a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
--- a/drivers/scsi/ide-scsi.c   2005-01-21 23:41:42 +01:00
+++ b/drivers/scsi/ide-scsi.c   2005-01-21 23:41:42 +01:00
@@ -96,14 +96,39 @@
  */
 #define IDESCSI_LOG_CMD    0   /* Log SCSI commands */

-typedef struct {
-   ide_drive_t *drive;
+typedef struct ide_scsi_obj {
+   ide_drive_t *drive;
+   struct Scsi_Host *host;
+
idescsi_pc_t *pc;   /* Current packet command */
unsigned long flags;/* Status/Action flags */
unsigned long transform;/* SCSI cmd translation layer */
unsigned long log;  /* log flags */
 } idescsi_scsi_t;

+static DECLARE_MUTEX(idescsi_ref_sem);
+
+#define ide_scsi_g(disk)   ((disk)->private_data)
+
+static struct ide_scsi_obj *ide_scsi_get(struct gendisk *disk)
+{
+   struct ide_scsi_obj *scsi = NULL;
+
+   down(&idescsi_ref_sem);
+   scsi = ide_scsi_g(disk);
+   if (scsi)
+           scsi_host_get(scsi->host);
+   up(&idescsi_ref_sem);
+   return scsi;
+}
+
+static void ide_scsi_put(struct ide_scsi_obj *scsi)
+{
+   down(&idescsi_ref_sem);
+   scsi_host_put(scsi->host);
+   up(&idescsi_ref_sem);
+}
+
 static inline idescsi_scsi_t *scsihost_to_idescsi(struct Scsi_Host *host)
 {
return (idescsi_scsi_t*) (&host[1]);
@@ -693,16 +718,20 @@
 static int idescsi_cleanup (ide_drive_t *drive)
 {
struct Scsi_Host *scsihost = drive->driver_data;
+   struct gendisk *g = drive->disk;

if (ide_unregister_subdriver(drive))
return 1;
-
-   /* FIXME?: Are these two statements necessary? */
+
+   /* FIXME: drive->driver_data shouldn't be used */
drive->driver_data = NULL;
-   drive->disk->fops = ide_fops;
+   /* FIXME: add driver's private struct gendisk */
+   g->private_data = NULL;
+   g->fops = ide_fops;

scsi_remove_host(scsihost);
-   scsi_host_put(scsihost);
+   ide_scsi_put(scsihost_to_idescsi(scsihost));
+
return 0;
 }

@@ -739,15 +768,30 @@

 static int idescsi_ide_open(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_scsi_obj *scsi;
+   ide_drive_t *drive;
+
+   if (!(scsi = ide_scsi_get(disk)))
+   return -ENXIO;
+
+   drive = scsi->drive;
+
drive->usage++;
+
return 0;
 }

 static int idescsi_ide_release(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_scsi_obj *scsi = ide_scsi_g(disk);
+   ide_drive_t *drive = scsi->drive;
+
drive->usage--;
+
+   ide_scsi_put(scsi);
+
return 0;
 }

@@ -755,8 +799,8 @@
unsigned int cmd, unsigned long arg)
 {
struct block_device *bdev = inode->i_bdev;
-   ide_drive_t *drive = bdev->bd_disk->private_data;
-   return generic_ide_ioctl(drive, file, bdev, cmd, arg);
+   struct ide_scsi_obj *scsi = ide_scsi_g(bdev->bd_disk);
+   return generic_ide_ioctl(scsi->drive, file, bdev, cmd, arg);
 }

 static struct block_device_operations idescsi_ops = {
@@ -1043,6 +1087,7 @@
 {
idescsi_scsi_t *idescsi;
struct Scsi_Host *host;
+   struct gendisk *g = drive->disk;
static int warned;
int err;

@@ -1071,10 +1116,12 @@
drive->driver_data = host;
idescsi = scsihost_to_idescsi(host);
idescsi->drive = drive;
+   idescsi->host = host;
err = ide_register_subdriver(drive, _driver);
if (!err) {
idescsi_setup (drive, idescsi);
-   drive->disk->fops = &idescsi_ops;
+   g->fops = &idescsi_ops;
+   g->private_data = idescsi;
err = scsi_add_host(host, >gendev);
if (!err) {

Re: [Dev] Re: Kernel Panic with LTP on 2.6.11-rc1 (was Re: LTP Results for 2.6.x and 2.4.x)

2005-01-21 Thread Chris Wright

* Andrew Morton ([EMAIL PROTECTED]) wrote:
> Bryce Harrington <[EMAIL PROTECTED]> wrote:
> I am unable to find the oops trace amongst all that stuff.  Help?
> 
> (It would have been handy to include it in the bug report, actually)

Yes, it would.  Or at least some better granularity leading up to the
problem.  I ran growfiles locally on 2.6.11-rc-bk and didn't have any
problem.  Could you strace growfiles and see what it was doing when it
killed the machine?

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

[patch 6/8] ide-floppy: add basic refcounting

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Similar changes as for ide-cd.c.

diff -Nru a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
--- a/drivers/ide/ide-floppy.c  2005-01-21 23:41:14 +01:00
+++ b/drivers/ide/ide-floppy.c  2005-01-21 23:41:14 +01:00
@@ -274,8 +274,9 @@
  * driver due to an interrupt or a timer event is stored in a variable
  * of type idefloppy_floppy_t, defined below.
  */
-typedef struct {
-   ide_drive_t *drive;
+typedef struct ide_floppy_obj {
+   ide_drive_t *drive;
+   struct kref kref;

/* Current packet command */
idefloppy_pc_t *pc;
@@ -514,6 +515,33 @@
u8  reserved[4];
 } idefloppy_mode_parameter_header_t;

+static DECLARE_MUTEX(idefloppy_ref_sem);
+
+#define to_ide_floppy(obj) container_of(obj, struct ide_floppy_obj, kref)
+
+#define ide_floppy_g(disk) ((disk)->private_data)
+
+static struct ide_floppy_obj *ide_floppy_get(struct gendisk *disk)
+{
+   struct ide_floppy_obj *floppy = NULL;
+
+   down(&idefloppy_ref_sem);
+   floppy = ide_floppy_g(disk);
+   if (floppy)
+           kref_get(&floppy->kref);
+   up(&idefloppy_ref_sem);
+   return floppy;
+}
+
+static void ide_floppy_release(struct kref *);
+
+static void ide_floppy_put(struct ide_floppy_obj *floppy)
+{
+   down(&idefloppy_ref_sem);
+   kref_put(&floppy->kref, ide_floppy_release);
+   up(&idefloppy_ref_sem);
+}
+
 /*
  * Too bad. The drive wants to send us data which we are not ready to accept.
  * Just throw it away.
@@ -1792,9 +1820,6 @@
struct idefloppy_id_gcw gcw;

*((u16 *) &gcw) = drive->id->config;
-   drive->driver_data = floppy;
-   memset(floppy, 0, sizeof(idefloppy_floppy_t));
-   floppy->drive = drive;
floppy->pc = floppy->pc_stack;
if (gcw.drq_type == 1)
set_bit(IDEFLOPPY_DRQ_INTERRUPT, &floppy->flags);
@@ -1838,13 +1863,26 @@

if (ide_unregister_subdriver(drive))
return 1;
-   drive->driver_data = NULL;
-   kfree(floppy);
+
del_gendisk(g);
-   g->fops = ide_fops;
+
+   ide_floppy_put(floppy);
+
return 0;
 }

+static void ide_floppy_release(struct kref *kref)
+{
+   struct ide_floppy_obj *floppy = to_ide_floppy(kref);
+   ide_drive_t *drive = floppy->drive;
+   struct gendisk *g = drive->disk;
+
+   drive->driver_data = NULL;
+   g->private_data = NULL;
+   g->fops = ide_fops;
+   kfree(floppy);
+}
+
 #ifdef CONFIG_PROC_FS

 static int proc_idefloppy_read_capacity
@@ -1893,14 +1931,21 @@

 static int idefloppy_open(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
-   idefloppy_floppy_t *floppy = drive->driver_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_floppy_obj *floppy;
+   ide_drive_t *drive;
idefloppy_pc_t pc;
+   int ret = 0;

-   drive->usage++;
-
debug_log(KERN_INFO "Reached idefloppy_open\n");

+   if (!(floppy = ide_floppy_get(disk)))
+   return -ENXIO;
+
+   drive = floppy->drive;
+
+   drive->usage++;
+
if (drive->usage == 1) {
clear_bit(IDEFLOPPY_FORMAT_IN_PROGRESS, &floppy->flags);
/* Just in case */
@@ -1920,13 +1965,15 @@
*/
) {
drive->usage--;
-   return -EIO;
+   ret = -EIO;
+   goto out_put_floppy;
}

if (floppy->wp && (filp->f_mode & 2)) {
drive->usage--;
-   return -EROFS;
-   }
+   ret = -EROFS;
+   goto out_put_floppy;
+   }
set_bit(IDEFLOPPY_MEDIA_CHANGED, &floppy->flags);
/* IOMEGA Clik! drives do not support lock/unlock commands */
 if (!test_bit(IDEFLOPPY_CLIK_DRIVE, &floppy->flags)) {
@@ -1936,21 +1983,26 @@
check_disk_change(inode->i_bdev);
} else if (test_bit(IDEFLOPPY_FORMAT_IN_PROGRESS, &floppy->flags)) {
drive->usage--;
-   return -EBUSY;
+   ret = -EBUSY;
+   goto out_put_floppy;
}
return 0;
+
+out_put_floppy:
+   ide_floppy_put(floppy);
+   return ret;
 }

 static int idefloppy_release(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_floppy_obj *floppy = ide_floppy_g(disk);
+   ide_drive_t *drive = floppy->drive;
idefloppy_pc_t pc;

debug_log(KERN_INFO "Reached idefloppy_release\n");

if (drive->usage == 1) {
-   idefloppy_floppy_t *floppy = drive->driver_data;
-
/* IOMEGA Clik! drives do not support lock/unlock commands */
 if (!test_bit(IDEFLOPPY_CLIK_DRIVE, &floppy->flags)) {
idefloppy_create_prevent_cmd(&pc, 0);
@@ -1960,6 +2012,9

quick question on dmesg output

2005-01-21 Thread Chris Friesen

The following is an edited output of dmesg, for a dual Xeon running 2.6.9:
Detected 1196.514 MHz processor.

Calibrating delay loop... 2383.87 BogoMIPS (lpj=1191936)

CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 09

I'm a bit confused why a 2GHz chip gets detected as a 1.2 GHz cpu.  Is 
this something strange in my hardware?

Chris

[patch 7/8] ide-tape: add basic refcounting

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Similar changes as for ide-cd.c.

diff -Nru a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
--- a/drivers/ide/ide-tape.c    2005-01-21 23:41:25 +01:00
+++ b/drivers/ide/ide-tape.c    2005-01-21 23:41:25 +01:00
@@ -781,8 +781,10 @@
  * driver due to an interrupt or a timer event is stored in a variable
  * of type idetape_tape_t, defined below.
  */
-typedef struct {
-   ide_drive_t *drive;
+typedef struct ide_tape_obj {
+   ide_drive_t *drive;
+   struct kref kref;
+
/*
 *  Since a typical character device operation requires more
 *  than one packet command, we provide here enough memory
@@ -1007,6 +1009,33 @@
  int debug_level;
 } idetape_tape_t;

+static DECLARE_MUTEX(idetape_ref_sem);
+
+#define to_ide_tape(obj) container_of(obj, struct ide_tape_obj, kref)
+
+#define ide_tape_g(disk)   ((disk)->private_data)
+
+static struct ide_tape_obj *ide_tape_get(struct gendisk *disk)
+{
+   struct ide_tape_obj *tape = NULL;
+
+   down(&idetape_ref_sem);
+   tape = ide_tape_g(disk);
+   if (tape)
+           kref_get(&tape->kref);
+   up(&idetape_ref_sem);
+   return tape;
+}
+
+static void ide_tape_release(struct kref *);
+
+static void ide_tape_put(struct ide_tape_obj *tape)
+{
+   down(&idetape_ref_sem);
+   kref_put(&tape->kref, ide_tape_release);
+   up(&idetape_ref_sem);
+}
+
 /*
  * Tape door status
  */
@@ -4522,9 +4551,7 @@
int stage_size;
struct sysinfo si;

-   memset(tape, 0, sizeof (idetape_tape_t));
spin_lock_init(&tape->spinlock);
-   drive->driver_data = tape;
drive->dsc_overlap = 1;
 #ifdef CONFIG_BLK_DEV_IDEPCI
if (HWIF(drive)->pci_dev != NULL) {
@@ -4542,7 +4569,6 @@
/* Seagate Travan drives do not support DSC overlap. */
if (strstr(drive->id->model, "Seagate STT3401"))
drive->dsc_overlap = 0;
-   tape->drive = drive;
tape->minor = minor;
tape->name[0] = 'h';
tape->name[1] = 't';
@@ -4636,13 +4662,25 @@
spin_unlock_irqrestore(_lock, flags);
DRIVER(drive)->busy = 0;
(void) ide_unregister_subdriver(drive);
+
+   ide_tape_put(tape);
+
+   return 0;
+}
+
+static void ide_tape_release(struct kref *kref)
+{
+   struct ide_tape_obj *tape = to_ide_tape(kref);
+   ide_drive_t *drive = tape->drive;
+   struct gendisk *g = drive->disk;
+
drive->driver_data = NULL;
devfs_remove("%s/mt", drive->devfs_name);
devfs_remove("%s/mtn", drive->devfs_name);
-   devfs_unregister_tape(drive->disk->number);
-   kfree (tape);
-   drive->disk->fops = ide_fops;
-   return 0;
+   devfs_unregister_tape(g->number);
+   g->private_data = NULL;
+   g->fops = ide_fops;
+   kfree(tape);
 }

 #ifdef CONFIG_PROC_FS
@@ -4707,15 +4745,30 @@

 static int idetape_open(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_tape_obj *tape;
+   ide_drive_t *drive;
+
+   if (!(tape = ide_tape_get(disk)))
+   return -ENXIO;
+
+   drive = tape->drive;
+
drive->usage++;
+
return 0;
 }

 static int idetape_release(struct inode *inode, struct file *filp)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct ide_tape_obj *tape = ide_tape_g(disk);
+   ide_drive_t *drive = tape->drive;
+
drive->usage--;
+
+   ide_tape_put(tape);
+
return 0;
 }

@@ -4723,7 +4776,8 @@
unsigned int cmd, unsigned long arg)
 {
struct block_device *bdev = inode->i_bdev;
-   ide_drive_t *drive = bdev->bd_disk->private_data;
+   struct ide_tape_obj *tape = ide_tape_g(bdev->bd_disk);
+   ide_drive_t *drive = tape->drive;
int err = generic_ide_ioctl(drive, file, bdev, cmd, arg);
if (err == -EINVAL)
err = idetape_blkdev_ioctl(drive, cmd, arg);
@@ -4740,6 +4794,7 @@
 static int idetape_attach (ide_drive_t *drive)
 {
idetape_tape_t *tape;
+   struct gendisk *g = drive->disk;
int minor;

if (!strstr("ide-tape", drive->driver_req))
@@ -4772,6 +4827,15 @@
}
for (minor = 0; idetape_chrdevs[minor].drive != NULL; minor++)
;
+
+   memset(tape, 0, sizeof(*tape));
+
+   kref_init(&tape->kref);
+
+   tape->drive = drive;
+
+   drive->driver_data = tape;
+
idetape_setup(drive, tape, minor);
idetape_chrdevs[minor].drive = drive;

@@ -4782,8 +4846,10 @@
S_IFCHR | S_IRUGO | S_IWUGO,
"%s/mtn", drive->devfs_name);

-   drive->disk->number = devfs_register_tape(drive->devfs_name);
-   drive->disk->fops = _block_ops;
+   g->number = devfs_register_tape(drive->devfs_name);
+   g->fops = _block_ops;
+   g->private_data =

Re: Kernel Panic with LTP on 2.6.11-rc1 (was Re: LTP Results for 2.6.x and 2.4.x)

2005-01-21 Thread Andrew Morton

Bryce Harrington <[EMAIL PROTECTED]> wrote:
>
> cmdline="mkfifo gffifo18; growfiles -b -W gf13 -e 1 -u -i 0 -L 30 -I r
> -r 1-4096 gffifo18"
> contacts=""
> analysis=exit
> initiation_status="ok"
> <<>>
> growfiles(gf13): 17094 DEBUG1 Using random seed of 1106350453
> Kernel panic - not syncing: Out of memory and no killable processes...
> 
> 
> The full output are available at these links:
> 
> FAIL   LTP  2.6.11-rc1  SuSE 9.0  2-way  http://khack.osdl.org/stp/300213/
> FAIL   LTP  2.6.11-rc1  SuSE 9.2  2-way  http://khack.osdl.org/stp/300219/
> FAIL   LTP  2.6.11-rc1  SuSE 9.2  1-way  http://khack.osdl.org/stp/300209/
> 
> OK LTP  2.6.10  SuSE 9.2  2-way  http://khack.osdl.org/stp/300230
> OK LTP  2.6.10  SuSE 9.0  2-way  http://khack.osdl.org/stp/300229
> OK LTP  2.6.10  RH 9.02-way  http://khack.osdl.org/stp/300228
> OK OPTS 2.6.11-rc1  RH 9.22-way  http://khack.osdl.org/stp/300227

I am unable to find the oops trace amongst all that stuff.  Help?

(It would have been handy to include it in the bug report, actually)

[patch 4/8] ide-cd: add basic refcounting

2005-01-21 Thread Bartlomiej Zolnierkiewicz


* based on reference counting in drivers/scsi/{sd,sr}.c
* fixes race between ->open() and ->cleanup() (ide_unregister_subdriver()
  tests for drive->usage != 0 but there is no protection against new users)
* struct kref and pointer to a drive are added to struct ide_cdrom_info
* pointer to drive's struct ide_cdrom_info is stored in disk->private_data
* ide_cd_{get,put}() is used to {get,put} reference to struct ide_cdrom_info
* ide_cd_release() is a release method for struct ide_cdrom_info
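
The scheme in the list above can be sketched in user space roughly as follows. This is not the patch's code: a plain integer stands in for struct kref, a pthread mutex stands in for the semaphore, and all names ending in _sketch are hypothetical. The point it illustrates is that lookup and reference grab happen under one lock, so a new opener either gets a counted reference or sees NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct disk_sketch {
    void *private_data;          /* published pointer to the driver object */
};

struct cd_sketch {
    int refcount;                /* plays the role of struct kref */
    struct disk_sketch *disk;    /* back-pointer used by the release step */
};

static pthread_mutex_t ref_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Attach: create the object holding one reference and publish it. */
static struct cd_sketch *cd_attach_sketch(struct disk_sketch *disk)
{
    struct cd_sketch *cd = malloc(sizeof(*cd));

    if (!cd)
        return NULL;
    cd->refcount = 1;
    cd->disk = disk;
    disk->private_data = cd;
    return cd;
}

/* Look up and take a reference under the lock; NULL once released. */
static struct cd_sketch *cd_get_sketch(struct disk_sketch *disk)
{
    struct cd_sketch *cd;

    pthread_mutex_lock(&ref_lock_sketch);
    cd = disk->private_data;
    if (cd)
        cd->refcount++;
    pthread_mutex_unlock(&ref_lock_sketch);
    return cd;
}

/* Drop a reference; the last put unpublishes and frees the object. */
static void cd_put_sketch(struct cd_sketch *cd)
{
    assert(cd != NULL);
    pthread_mutex_lock(&ref_lock_sketch);
    if (--cd->refcount == 0) {
        cd->disk->private_data = NULL;
        free(cd);
    }
    pthread_mutex_unlock(&ref_lock_sketch);
}
```

In this sketch the cleanup path only drops the driver's own reference; the object stays alive until the last opener calls the put function, after which further get calls return NULL (the patch maps that case to -ENXIO in ->open()).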

diff -Nru a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c  2005-01-21 23:40:52 +01:00
+++ b/drivers/ide/ide-cd.c  2005-01-21 23:40:52 +01:00
@@ -324,6 +324,33 @@

 #include "ide-cd.h"

+static DECLARE_MUTEX(idecd_ref_sem);
+
+#define to_ide_cd(obj) container_of(obj, struct cdrom_info, kref)
+
+#define ide_cd_g(disk) ((disk)->private_data)
+
+static struct cdrom_info *ide_cd_get(struct gendisk *disk)
+{
+   struct cdrom_info *cd = NULL;
+
+   down(&idecd_ref_sem);
+   cd = ide_cd_g(disk);
+   if (cd)
+           kref_get(&cd->kref);
+   up(&idecd_ref_sem);
+   return cd;
+}
+
+static void ide_cd_release(struct kref *);
+
+static void ide_cd_put(struct cdrom_info *cd)
+{
+   down(&idecd_ref_sem);
+   kref_put(&cd->kref, ide_cd_release);
+   up(&idecd_ref_sem);
+}
+
 /
  * Generic packet command support and error handling routines.
  */
@@ -3225,14 +3252,27 @@
 int ide_cdrom_cleanup(ide_drive_t *drive)
 {
struct cdrom_info *info = drive->driver_data;
-   struct cdrom_device_info *devinfo = &info->devinfo;
-   struct gendisk *g = drive->disk;

if (ide_unregister_subdriver(drive)) {
printk(KERN_ERR "%s: %s: failed to ide_unregister_subdriver\n",
__FUNCTION__, drive->name);
return 1;
}
+
+   del_gendisk(drive->disk);
+
+   ide_cd_put(info);
+
+   return 0;
+}
+
+static void ide_cd_release(struct kref *kref)
+{
+   struct cdrom_info *info = to_ide_cd(kref);
+   struct cdrom_device_info *devinfo = &info->devinfo;
+   ide_drive_t *drive = info->drive;
+   struct gendisk *g = drive->disk;
+
if (info->buffer != NULL)
kfree(info->buffer);
if (info->toc != NULL)
@@ -3240,13 +3280,13 @@
if (info->changer_info != NULL)
kfree(info->changer_info);
if (devinfo->handle == drive && unregister_cdrom(devinfo))
-   printk(KERN_ERR "%s: ide_cdrom_cleanup failed to unregister device from the cdrom driver.\n", drive->name);
-   kfree(info);
+   printk(KERN_ERR "%s: %s failed to unregister device from the cdrom "
+   "driver.\n", __FUNCTION__, drive->name);
drive->driver_data = NULL;
blk_queue_prep_rq(drive->queue, NULL);
-   del_gendisk(g);
+   g->private_data = NULL;
g->fops = ide_fops;
-   return 0;
+   kfree(info);
 }

 static int ide_cdrom_attach (ide_drive_t *drive);
@@ -3289,9 +3329,16 @@

 static int idecd_open(struct inode * inode, struct file * file)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
-   struct cdrom_info *info = drive->driver_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct cdrom_info *info;
+   ide_drive_t *drive;
int rc = -ENOMEM;
+
+   if (!(info = ide_cd_get(disk)))
+   return -ENXIO;
+
+   drive = info->drive;
+
drive->usage++;

if (!info->buffer)
@@ -3299,16 +3346,24 @@
GFP_KERNEL|__GFP_REPEAT);
 if (!info->buffer || (rc = cdrom_open(&info->devinfo, inode, file)))
drive->usage--;
+
+   if (rc < 0)
+   ide_cd_put(info);
+
return rc;
 }

 static int idecd_release(struct inode * inode, struct file * file)
 {
-   ide_drive_t *drive = inode->i_bdev->bd_disk->private_data;
-   struct cdrom_info *info = drive->driver_data;
+   struct gendisk *disk = inode->i_bdev->bd_disk;
+   struct cdrom_info *info = ide_cd_g(disk);
+   ide_drive_t *drive = info->drive;

cdrom_release (&info->devinfo, file);
drive->usage--;
+
+   ide_cd_put(info);
+
return 0;
 }

@@ -3316,27 +3371,27 @@
unsigned int cmd, unsigned long arg)
 {
struct block_device *bdev = inode->i_bdev;
-   ide_drive_t *drive = bdev->bd_disk->private_data;
-   int err = generic_ide_ioctl(drive, file, bdev, cmd, arg);
-   if (err == -EINVAL) {
-   struct cdrom_info *info = drive->driver_data;
+   struct cdrom_info *info = ide_cd_g(bdev->bd_disk);
+   int err;
+
+   err  = generic_ide_ioctl(info->drive, file, bdev, cmd, arg);
+   if (err == -EINVAL)
err = cdrom_ioctl(file, &info->devinfo, inode, cmd, arg);
-   }
+
return err;
 }

 static int idecd_media_changed(struct gendisk *disk)

[patch 2/8] kill setup_driver_defaults()

2005-01-21 Thread Bartlomiej Zolnierkiewicz


* move default_do_request() to ide-default.c
* fix drivers to set ide_driver_t->{do_request,end_request,error,abort}
* kill setup_driver_defaults()
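
The idea behind these three points, as a rough user-space sketch: each driver lists all of its methods in its static initializer, so nothing has to patch in defaults at run time. The struct and function names below are hypothetical and much smaller than ide_driver_t; only the shape of the idea is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct drive_sketch {
    int ended;  /* counts requests completed on this drive */
};

/* Hypothetical, cut-down stand-in for a driver method table. */
struct driver_sketch {
    const char *name;
    int (*do_request)(struct drive_sketch *);
    int (*end_request)(struct drive_sketch *);
};

/* Generic completion routine shared by drivers. */
static int generic_end_request_sketch(struct drive_sketch *d)
{
    d->ended++;
    return 0;
}

/* The fallback driver simply ends every request it is handed. */
static int default_do_request_sketch(struct drive_sketch *d)
{
    return generic_end_request_sketch(d);
}

/* Every method is named explicitly; no run-time default filling. */
static struct driver_sketch default_driver_sketch = {
    .name        = "default-sketch",
    .do_request  = default_do_request_sketch,
    .end_request = generic_end_request_sketch,
};

/* Callers may assume the method is present. */
static int dispatch_sketch(const struct driver_sketch *drv,
                           struct drive_sketch *d)
{
    assert(drv->do_request != NULL);  /* no default to fall back on */
    return drv->do_request(d);
}
```

A driver that forgets a method now shows up as a NULL pointer in its own table instead of silently inheriting a default, which is the trade-off of dropping the setup helper.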

diff -Nru a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c  2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide-cd.c  2005-01-21 22:27:18 +01:00
@@ -3279,6 +3279,9 @@
.supports_dsc_overlap   = 1,
.cleanup= ide_cdrom_cleanup,
.do_request = ide_do_rw_cdrom,
+   .end_request= ide_end_request,
+   .error  = __ide_error,
+   .abort  = __ide_abort,
.proc   = idecd_proc,
.attach = ide_cdrom_attach,
.drives = LIST_HEAD_INIT(ide_cdrom_driver.drives),
diff -Nru a/drivers/ide/ide-default.c b/drivers/ide/ide-default.c
--- a/drivers/ide/ide-default.c 2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide-default.c 2005-01-21 22:27:18 +01:00
@@ -38,6 +38,12 @@

 static int idedefault_attach(ide_drive_t *drive);

+static ide_startstop_t idedefault_do_request(ide_drive_t *drive, struct request *rq, sector_t block)
+{
+   ide_end_request(drive, 0, 0);
+   return ide_stopped;
+}
+
 /*
  * IDE subdriver functions, registered with ide.c
  */
@@ -47,6 +53,10 @@
.version=   IDEDEFAULT_VERSION,
.attach =   idedefault_attach,
.cleanup=   ide_unregister_subdriver,
+   .do_request =   idedefault_do_request,
+   .end_request=   ide_end_request,
+   .error  =   __ide_error,
+   .abort  =   __ide_abort,
.drives =   LIST_HEAD_INIT(idedefault_driver.drives)
 };

diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c    2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide-disk.c    2005-01-21 22:27:18 +01:00
@@ -996,6 +996,9 @@
.supports_dsc_overlap   = 0,
.cleanup= idedisk_cleanup,
.do_request = ide_do_rw_disk,
+   .end_request= ide_end_request,
+   .error  = __ide_error,
+   .abort  = __ide_abort,
.proc   = idedisk_proc,
.attach = idedisk_attach,
.drives = LIST_HEAD_INIT(idedisk_driver.drives),
diff -Nru a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
--- a/drivers/ide/ide-floppy.c  2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide-floppy.c  2005-01-21 22:27:18 +01:00
@@ -1884,6 +1884,8 @@
.cleanup= idefloppy_cleanup,
.do_request = idefloppy_do_request,
.end_request= idefloppy_do_end_request,
+   .error  = __ide_error,
+   .abort  = __ide_abort,
.proc   = idefloppy_proc,
.attach = idefloppy_attach,
.drives = LIST_HEAD_INIT(idefloppy_driver.drives),
diff -Nru a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
--- a/drivers/ide/ide-io.c  2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide-io.c  2005-01-21 22:27:18 +01:00
@@ -622,6 +622,8 @@
return ide_atapi_error(drive, rq, stat, err);
 }

+EXPORT_SYMBOL_GPL(__ide_error);
+
 /**
  * ide_error   -   handle an error on the IDE
  * @drive: drive the error occurred on
@@ -665,6 +667,8 @@
DRIVER(drive)->end_request(drive, 0, 0);
return ide_stopped;
 }
+
+EXPORT_SYMBOL_GPL(__ide_abort);

 /**
  * ide_abort   -   abort pending IDE operatins
diff -Nru a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
--- a/drivers/ide/ide-tape.c    2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide-tape.c    2005-01-21 22:27:18 +01:00
@@ -4686,6 +4686,8 @@
.cleanup= idetape_cleanup,
.do_request = idetape_do_request,
.end_request= idetape_end_request,
+   .error  = __ide_error,
+   .abort  = __ide_abort,
.proc   = idetape_proc,
.attach = idetape_attach,
.drives = LIST_HEAD_INIT(idetape_driver.drives),
diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
--- a/drivers/ide/ide.c 2005-01-21 22:27:18 +01:00
+++ b/drivers/ide/ide.c 2005-01-21 22:27:18 +01:00
@@ -197,7 +197,6 @@
 EXPORT_SYMBOL(ide_hwifs);

 extern ide_driver_t idedefault_driver;
-static void setup_driver_defaults(ide_driver_t *driver);

 /*
  * Do not even *think* about calling this!
@@ -301,8 +300,6 @@
return; /* already initialized */
magic_cookie = 0;

-   setup_driver_defaults(&idedefault_driver);
-
/* Initialise all interface structures */
for (index = 0; index < MAX_HWIFS; ++index) {
hwif = &ide_hwifs[index];
@@ -2015,38 +2012,6 @@
 #endif
 }

-static ide_startstop_t default_do_request (ide_drive_t

Re: linux capabilities ?

2005-01-21 Thread Olaf Dietsche

jnf <[EMAIL PROTECTED]> writes:

> Thank you, when I get a second I will take a look through it. I've already
> written a couple programs to set/get capabilities, so I am aware of the
> interface/api, it was just that even with the capabilities it was not
> working ;]
> Either way I will take a look through the code, I appreciate the reply.

You might want to look at

And in
,
you will find some example programs: execcap, setpcaps and sucap.

Regards, Olaf.

[ide-dev 4/5] fix some rare ide-default vs ide-disk races

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Some rare races between ide-default and ide-disk are possible, e.g.:
* ide-default is used, an I/O request is triggered (e.g. /proc/ide/hd?/identify),
  drive->special is cleared silently (so CHS is not initialized properly),
  ide-disk is loaded and fails if the drive uses CHS
* ide-disk is used, the drive is reset, ide-disk is unloaded, ide-default
  takes control over the drive and on the first I/O request silently clears
  drive->special without restoring settings

Fix them by moving idedisk_{special,pre_reset}() and company to IDE core.

diff -Nru a/drivers/ide/Kconfig b/drivers/ide/Kconfig
--- a/drivers/ide/Kconfig   2005-01-21 23:53:42 +01:00
+++ b/drivers/ide/Kconfig   2005-01-21 23:53:42 +01:00
@@ -150,7 +150,6 @@

 config IDEDISK_MULTI_MODE
bool "Use multi-mode by default"
-   depends on BLK_DEV_IDEDISK
help
  If you get this error, try to say Y here:

diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c    2005-01-21 23:53:42 +01:00
+++ b/drivers/ide/ide-disk.c    2005-01-21 23:53:42 +01:00
@@ -517,75 +517,6 @@
return drive->capacity64 - drive->sect0;
 }

-#define IS_PDC4030_DRIVE   0
-
-static ide_startstop_t idedisk_special (ide_drive_t *drive)
-{
-   special_t *s = &drive->special;
-
-   if (s->b.set_geometry) {
-   s->b.set_geometry   = 0;
-   if (!IS_PDC4030_DRIVE) {
-   ide_task_t args;
-   memset(&args, 0, sizeof(ide_task_t));
-   args.tfRegister[IDE_NSECTOR_OFFSET] = drive->sect;
-   args.tfRegister[IDE_SECTOR_OFFSET]  = drive->sect;
-   args.tfRegister[IDE_LCYL_OFFSET]= drive->cyl;
-   args.tfRegister[IDE_HCYL_OFFSET]= drive->cyl>>8;
-   args.tfRegister[IDE_SELECT_OFFSET]  = ((drive->head-1)|drive->select.all)&0xBF;
-   args.tfRegister[IDE_COMMAND_OFFSET] = WIN_SPECIFY;
-   args.command_type = IDE_DRIVE_TASK_NO_DATA;
-   args.handler  = &set_geometry_intr;
-   do_rw_taskfile(drive, &args);
-   }
-   } else if (s->b.recalibrate) {
-   s->b.recalibrate = 0;
-   if (!IS_PDC4030_DRIVE) {
-   ide_task_t args;
-   memset(&args, 0, sizeof(ide_task_t));
-   args.tfRegister[IDE_NSECTOR_OFFSET] = drive->sect;
-   args.tfRegister[IDE_COMMAND_OFFSET] = WIN_RESTORE;
-   args.command_type = IDE_DRIVE_TASK_NO_DATA;
-   args.handler  = &recal_intr;
-   do_rw_taskfile(drive, &args);
-   }
-   } else if (s->b.set_multmode) {
-   s->b.set_multmode = 0;
-   if (drive->mult_req > drive->id->max_multsect)
-   drive->mult_req = drive->id->max_multsect;
-   if (!IS_PDC4030_DRIVE) {
-   ide_task_t args;
-   memset(&args, 0, sizeof(ide_task_t));
-   args.tfRegister[IDE_NSECTOR_OFFSET] = drive->mult_req;
-   args.tfRegister[IDE_COMMAND_OFFSET] = WIN_SETMULT;
-   args.command_type = IDE_DRIVE_TASK_NO_DATA;
-   args.handler  = &set_multmode_intr;
-   do_rw_taskfile(drive, &args);
-   }
-   } else if (s->all) {
-   int special = s->all;
-   s->all = 0;
-   printk(KERN_ERR "%s: bad special flag: 0x%02x\n", drive->name, special);
-   return ide_stopped;
-   }
-   return IS_PDC4030_DRIVE ? ide_stopped : ide_started;
-}
-
-static void idedisk_pre_reset (ide_drive_t *drive)
-{
-   int legacy = (drive->id->cfs_enable_2 & 0x0400) ? 0 : 1;
-
-   drive->special.all = 0;
-   drive->special.b.set_geometry = legacy;
-   drive->special.b.recalibrate  = legacy;
-   if (OK_TO_RESET_CONTROLLER)
-   drive->mult_count = 0;
-   if (!drive->keep_settings && !drive->using_dma)
-   drive->mult_req = 0;
-   if (drive->mult_req != drive->mult_count)
-   drive->special.b.set_multmode = 1;
-}
-
 #ifdef CONFIG_PROC_FS

 static int smart_enable(ide_drive_t *drive)
@@ -893,28 +824,6 @@

printk(KERN_INFO "%s: max request size: %dKiB\n", drive->name, drive->queue->max_sectors / 2);

-   /* Extract geometry if we did not already have one for the drive */
-   if (!drive->cyl || !drive->head || !drive->sect) {
-   drive->cyl = drive->bios_cyl  = id->cyls;
-   drive->head= drive->bios_head = id->heads;
-   drive->sect= drive->bios_sect = id->sectors;
-   }
-
-   /* Handle logical geometry translation by the drive */
-   if ((id->field_valid & 1) && id->cur_cyls &&
-   id->cur_heads && (id->cur_heads <= 16) && id->cur_sectors) {
-   drive->cyl

[ide-dev 1/5] ignore BIOS enable bits for Promise controllers

2005-01-21 Thread Bartlomiej Zolnierkiewicz


Since there are no Promise binary drivers for 2.6.x kernels:
* ignore BIOS enable bits completely
* remove CONFIG_PDC202XX_FORCE
* kill IDEPCI_FLAG_FORCE_PDC hack
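
For readers unfamiliar with the `.enablebits` entries removed below: each triple such as `{0x50,0x02,0x02}` reads as (config register, mask, expected value), and a port is skipped when the masked register does not match. The following is only a user-space sketch of that test, not the kernel code; `struct enablebit` and `port_enabled()` are hypothetical names, and a plain byte stands in for the PCI config-space read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one {reg, mask, val} triple from the tables. */
struct enablebit {
	uint8_t reg;	/* PCI config-space offset, 0 = no enable bits */
	uint8_t mask;	/* bits to examine in that register */
	uint8_t val;	/* value the masked bits must have */
};

/*
 * Returns 1 if the port should be probed, 0 if it should be skipped.
 * config_byte stands in for the byte read from register e->reg.
 */
static int port_enabled(const struct enablebit *e, uint8_t config_byte)
{
	assert(e != NULL);
	if (e->reg == 0)	/* no enable bits listed: always probe */
		return 1;
	return (config_byte & e->mask) == e->val;
}
```

With the entries dropped from the tables the triple is all zeroes, so under this reading the port is probed no matter what the BIOS left in register 0x50.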

diff -Nru a/drivers/ide/Kconfig b/drivers/ide/Kconfig
--- a/drivers/ide/Kconfig   2005-01-21 23:53:09 +01:00
+++ b/drivers/ide/Kconfig   2005-01-21 23:53:09 +01:00
@@ -659,13 +659,6 @@
 config BLK_DEV_PDC202XX_NEW
tristate "PROMISE PDC202{68|69|70|71|75|76|77} support"

-# FIXME - probably wants to be one for old and for new
-config PDC202XX_FORCE
-   bool "Enable controller even if disabled by BIOS"
-   depends on BLK_DEV_PDC202XX_NEW
-   help
- Enable the PDC202xx controller even if it has been disabled in the BIOS setup.
-
 config BLK_DEV_SVWKS
tristate "ServerWorks OSB4/CSB5/CSB6 chipsets support"
help
diff -Nru a/drivers/ide/pci/pdc202xx_new.h b/drivers/ide/pci/pdc202xx_new.h
--- a/drivers/ide/pci/pdc202xx_new.h2005-01-21 23:53:09 +01:00
+++ b/drivers/ide/pci/pdc202xx_new.h2005-01-21 23:53:09 +01:00
@@ -73,9 +73,6 @@
.init_hwif  = init_hwif_pdc202new,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
},{ /* 3 */
.name   = "PDC20271",
@@ -100,9 +97,6 @@
.init_hwif  = init_hwif_pdc202new,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
},{ /* 6 */
.name   = "PDC20277",
diff -Nru a/drivers/ide/pci/pdc202xx_old.h b/drivers/ide/pci/pdc202xx_old.h
--- a/drivers/ide/pci/pdc202xx_old.h2005-01-21 23:53:09 +01:00
+++ b/drivers/ide/pci/pdc202xx_old.h2005-01-21 23:53:09 +01:00
@@ -79,9 +79,6 @@
.init_dma   = init_dma_pdc202xx,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
.extra  = 16,
},{ /* 1 */
@@ -92,12 +89,8 @@
.init_dma   = init_dma_pdc202xx,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
.extra  = 48,
-   .flags  = IDEPCI_FLAG_FORCE_PDC,
},{ /* 2 */
.name   = "PDC20263",
.init_setup = init_setup_pdc202ata4,
@@ -106,9 +99,6 @@
.init_dma   = init_dma_pdc202xx,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
.extra  = 48,
},{ /* 3 */
@@ -119,12 +109,8 @@
.init_dma   = init_dma_pdc202xx,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
.extra  = 48,
-   .flags  = IDEPCI_FLAG_FORCE_PDC,
},{ /* 4 */
.name   = "PDC20267",
.init_setup = init_setup_pdc202xx,
@@ -133,9 +119,6 @@
.init_dma   = init_dma_pdc202xx,
.channels   = 2,
.autodma= AUTODMA,
-#ifndef CONFIG_PDC202XX_FORCE
-   .enablebits = {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
-#endif
.bootable   = OFF_BOARD,
.extra  = 48,
}
diff -Nru a/drivers/ide/setup-pci.c b/drivers/ide/setup-pci.c
--- a/drivers/ide/setup-pci.c   2005-01-21 23:53:09 +01:00
+++ b/drivers/ide/setup-pci.c   2005-01-21 23:53:09 +01:00
@@ -579,7 +579,6 @@
int port;
int at_least_one_hwif_enabled = 0;
ide_hwif_t *hwif, *mate = NULL;
-   static int secondpdc = 0;
u8 tmp;

index->all = 0xf0f0;
@@ -590,22 +589,10 @@

for (port = 0; port <= 1; ++port) {
ide_pci_enablebit_t *e = &(d->enablebits[port]);
-
-   /*
-* If this is a Promise FakeRaid controller,
-* the 2nd controller will be marked as
-* disabled while it is actually there and enabled
-* by the bios for raid purposes.
-* Skip the normal "is it enabled" test for

[ide-dev 2/5] fix drive->ready_stat for ATAPI

2005-01-21 Thread Bartlomiej Zolnierkiewicz


ATAPI devices ignore the DRDY bit, so drive->ready_stat must be set to zero.
This is currently done by the device drivers (including the ide-default fake
driver), but for the PMAC driver that is too late, as wait_for_ready() may be
called during probe: probe_hwif()->pmac_ide_dma_check()->pmac_ide_{mdma,udma}_enable()->
->pmac_ide_do_setfeature()->wait_for_ready().

Fix up drive->ready_stat just after detecting an ATAPI device.
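
To illustrate why a non-zero ready_stat breaks ATAPI devices, here is a small stand-alone sketch of a status test in the usual "good bits set, bad bits clear" style. It assumes the wait compares the status byte against drive->ready_stat in that fashion; `status_ok()` and the simplified `BAD_STAT_SKETCH` mask are illustrative names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define READY_STAT	0x40	/* DRDY bit of the ATA status register */
#define BUSY_STAT	0x80	/* BSY bit */
#define ERR_STAT	0x01	/* ERR bit */
#define BAD_STAT_SKETCH	(BUSY_STAT | ERR_STAT)	/* simplified "bad" mask */

/* True when every "good" bit is set and every "bad" bit is clear. */
static int status_ok(uint8_t stat, uint8_t good, uint8_t bad)
{
	assert((good & bad) == 0);	/* the two masks must not overlap */
	return (stat & (good | bad)) == good;
}
```

An idle ATAPI device may report a status of 0x00 (DRDY clear). With good = READY_STAT the test fails and the wait would time out; with good = 0 it passes, which is the effect of setting drive->ready_stat to zero at detection time.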

diff -Nru a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
--- a/drivers/ide/ide-cd.c  2005-01-21 23:53:20 +01:00
+++ b/drivers/ide/ide-cd.c  2005-01-21 23:53:20 +01:00
@@ -3088,7 +3088,6 @@
drive->queue->unplug_delay = 1;

drive->special.all  = 0;
-   drive->ready_stat   = 0;

CDROM_STATE_FLAGS(drive)->media_changed = 1;
CDROM_STATE_FLAGS(drive)->toc_valid = 0;
diff -Nru a/drivers/ide/ide-default.c b/drivers/ide/ide-default.c
--- a/drivers/ide/ide-default.c 2005-01-21 23:53:20 +01:00
+++ b/drivers/ide/ide-default.c 2005-01-21 23:53:20 +01:00
@@ -57,13 +57,6 @@
"driver with ide.c\n", drive->name);
return 1;
}
-
-   /* For the sake of the request layer, we must make sure we have a
-* correct ready_stat value, that is 0 for ATAPI devices or we will
-* fail any request like Power Management
-*/
-   if (drive->media != ide_disk)
-   drive->ready_stat = 0;

return 0;
 }
diff -Nru a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
--- a/drivers/ide/ide-floppy.c  2005-01-21 23:53:20 +01:00
+++ b/drivers/ide/ide-floppy.c  2005-01-21 23:53:20 +01:00
@@ -1793,7 +1793,6 @@

*((u16 *) &gcw) = drive->id->config;
drive->driver_data = floppy;
-   drive->ready_stat = 0;
memset(floppy, 0, sizeof(idefloppy_floppy_t));
floppy->drive = drive;
floppy->pc = floppy->pc_stack;
diff -Nru a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
--- a/drivers/ide/ide-probe.c   2005-01-21 23:53:20 +01:00
+++ b/drivers/ide/ide-probe.c   2005-01-21 23:53:20 +01:00
@@ -221,6 +221,8 @@
}
printk (" drive\n");
drive->media = type;
+   /* an ATAPI device ignores DRDY */
+   drive->ready_stat = 0;
return;
}

diff -Nru a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
--- a/drivers/ide/ide-tape.c2005-01-21 23:53:20 +01:00
+++ b/drivers/ide/ide-tape.c2005-01-21 23:53:20 +01:00
@@ -4530,8 +4530,6 @@
memset(tape, 0, sizeof (idetape_tape_t));
spin_lock_init(&tape->spinlock);
drive->driver_data = tape;
-   /* An ATAPI device ignores DRDY */
-   drive->ready_stat = 0;
drive->dsc_overlap = 1;
 #ifdef CONFIG_BLK_DEV_IDEPCI
if (HWIF(drive)->pci_dev != NULL) {
diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
--- a/drivers/ide/ide.c 2005-01-21 23:53:20 +01:00
+++ b/drivers/ide/ide.c 2005-01-21 23:53:20 +01:00
@@ -1747,6 +1747,8 @@
case -4: /* "cdrom" */
drive->present = 1;
drive->media = ide_cdrom;
+   /* an ATAPI device ignores DRDY */
+   drive->ready_stat = 0;
hwif->noprobe = 0;
goto done;
case -5: /* "serialize" */
diff -Nru a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
--- a/drivers/scsi/ide-scsi.c   2005-01-21 23:53:20 +01:00
+++ b/drivers/scsi/ide-scsi.c   2005-01-21 23:53:20 +01:00
@@ -679,7 +679,6 @@
 static void idescsi_setup (ide_drive_t *drive, idescsi_scsi_t *scsi)
 {
DRIVER(drive)->busy++;
-   drive->ready_stat = 0;
if (drive->id && (drive->id->config & 0x0060) == 0x20)
set_bit (IDESCSI_DRQ_INTERRUPT, &scsi->flags);
set_bit(IDESCSI_TRANSFORM, &scsi->transform);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch to fix race between the NMI code and the CMOS clock

2005-01-21 Thread Corey Minyard


This patch fixes a race between the CMOS clock setting and the NMI
code.  The NMI code indiscriminately sets index registers and values
in the same place the CMOS clock is set.  If you are setting the
CMOS clock and an NMI occurs, bad values could be written to or
read from the CMOS RAM, or the NMI operation might not occur
correctly.

Fixing this requires creating a special lock so the NMI code can
know its CPU owns the lock and "do the right thing" in that case.

This was discovered and the fix has been tested by a very demanding
customer who tests the heck out of the software we deliver.
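
The owner-encoding idea can be tried outside the kernel. The sketch below is a user-space analogue using C11 atomics rather than the kernel's cmpxchg; the `*_sketch` names and the explicit `cpu` argument (standing in for smp_processor_id()) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 means unowned; otherwise ((cpu + 1) << 8) | index register. */
static atomic_ulong cmos_lock_sketch;

static unsigned long encode_owner(int cpu, unsigned char reg)
{
	assert(cpu >= 0);
	return ((unsigned long)(cpu + 1) << 8) | reg;
}

/* Spin until the word goes from 0 to our owner/register encoding. */
static void lock_cmos_sketch(int cpu, unsigned char reg)
{
	unsigned long desired = encode_owner(cpu, reg);
	unsigned long expected;

	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_weak(&cmos_lock_sketch,
						 &expected, desired))
			return;
	}
}

static void unlock_cmos_sketch(void)
{
	atomic_store(&cmos_lock_sketch, 0);
}

/* What an NMI handler would ask: did I interrupt my own CPU's access? */
static int cpu_owns_cmos_lock_sketch(int cpu)
{
	return (atomic_load(&cmos_lock_sketch) >> 8) == (unsigned long)(cpu + 1);
}

/* The index register to restore after the NMI's own accesses. */
static unsigned char locked_cmos_reg_sketch(void)
{
	return atomic_load(&cmos_lock_sketch) & 0xff;
}
```

Storing cpu + 1 rather than the bare CPU number keeps 0 free to mean "unlocked", so CPU 0 holding the lock on register 0 is still distinguishable from nobody holding it.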

Signed-off-by: Corey Minyard <[EMAIL PROTECTED]>

Index: linux-2.6.11-rc1/arch/i386/kernel/time.c
===
--- linux-2.6.11-rc1.orig/arch/i386/kernel/time.c   2005-01-19 09:53:59.0 -0600
+++ linux-2.6.11-rc1/arch/i386/kernel/time.c 2005-01-21 09:42:09.0 -0600
@@ -83,6 +83,14 @@
 
 spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
 
+/*
+ * This is a special lock that is owned by the CPU and holds the index
+ * register we are working with.  It is required for NMI access to the
+ * CMOS/RTC registers.  See include/asm-i386/mc146818rtc.h for details.
+ */
+volatile unsigned long cmos_lock = 0;
+EXPORT_SYMBOL(cmos_lock);
+
 spinlock_t i8253_lock = SPIN_LOCK_UNLOCKED;
 EXPORT_SYMBOL(i8253_lock);
 
Index: linux-2.6.11-rc1/include/asm-i386/mach-default/mach_traps.h
===
--- linux-2.6.11-rc1.orig/include/asm-i386/mach-default/mach_traps.h 2004-12-24 15:34:58.0 -0600
+++ linux-2.6.11-rc1/include/asm-i386/mach-default/mach_traps.h 2005-01-21 09:42:09.0 -0600
@@ -7,6 +7,8 @@
 #ifndef _MACH_TRAPS_H
 #define _MACH_TRAPS_H
 
+#include 
+
 static inline void clear_mem_error(unsigned char reason)
 {
reason = (reason & 0xf) | 4;
@@ -20,10 +22,20 @@
 
 static inline void reassert_nmi(void)
 {
+   int old_reg = -1;
+
+   if (do_i_have_lock_cmos())
+   old_reg = current_lock_cmos_reg();
+   else
+   lock_cmos(0); /* register doesn't matter here */
outb(0x8f, 0x70);
inb(0x71);  /* dummy */
outb(0x0f, 0x70);
inb(0x71);  /* dummy */
+   if (old_reg >= 0)
+   outb(old_reg, 0x70);
+   else
+   unlock_cmos();
 }
 
 #endif /* !_MACH_TRAPS_H */
Index: linux-2.6.11-rc1/include/asm-i386/mc146818rtc.h
===
--- linux-2.6.11-rc1.orig/include/asm-i386/mc146818rtc.h 2004-12-24 15:35:23.0 -0600
+++ linux-2.6.11-rc1/include/asm-i386/mc146818rtc.h 2005-01-21 09:42:10.0 -0600
@@ -5,24 +5,102 @@
 #define _ASM_MC146818RTC_H
 
 #include 
+#include 
+#include 
 
 #ifndef RTC_PORT
 #define RTC_PORT(x)(0x70 + (x))
 #define RTC_ALWAYS_BCD 1   /* RTC operates in binary mode */
 #endif
 
+#ifdef __HAVE_ARCH_CMPXCHG
+/*
+ * This lock provides nmi access to the CMOS/RTC registers.  It has some
+ * special properties.  It is owned by a CPU and stores the index register
+ * currently being accessed (if owned).  The idea here is that it works
+ * like a normal lock (normally).  However, in an NMI, the NMI code will
+ * first check to see if its CPU owns the lock, meaning that the NMI
+ * interrupted during the read/write of the device.  If it does, it goes ahead
+ * and performs the access and then restores the index register.  If it does
+ * not, it locks normally.
+ *
+ * Note that since we are working with NMIs, we need this lock even in
+ * a non-SMP machine just to mark that the lock is owned.
+ *
+ * This only works with compare-and-swap.  There is no other way to
+ * atomically claim the lock and set the owner.
+ */
+extern volatile unsigned long cmos_lock;
+
+/*
+ * All of these below must be called with interrupts off, preempt
+ * disabled, etc.
+ */
+
+static inline void lock_cmos(unsigned char reg)
+{
+   unsigned long new;
+   new = ((smp_processor_id()+1) << 8) | reg;
+   for (;;) {
+   if (cmos_lock)
+   continue;
+   if (__cmpxchg(&cmos_lock, 0, new, sizeof(cmos_lock)) == 0)
+   return;
+   }
+}
+
+static inline void unlock_cmos(void)
+{
+   cmos_lock = 0;
+}
+static inline int do_i_have_lock_cmos(void)
+{
+   return (cmos_lock >> 8) == (smp_processor_id()+1);
+}
+static inline unsigned char current_lock_cmos_reg(void)
+{
+   return cmos_lock & 0xff;
+}
+#define lock_cmos_prefix(reg) \
+   do {\
+   unsigned long cmos_flags;   \
+   local_irq_save(cmos_flags); \
+   lock_cmos(reg)
+#define lock_cmos_suffix(reg) \
+   unlock_cmos();  \
+   local_irq_restore(cmos_flags);  \
+   } while (0)
+#else
+#define lock_cmos_prefix(reg) do {} while (0)
+#define

Re: 2.6.11-rc1 vs. PowerMac 8500/G3 (and VAIO laptop) [usb-storage]

2005-01-21 Thread John Mock

> We always used to byte-swap just a few fields in the descriptor, to
> optimise access to those. We never bothered to put them back when we
> passed them up to userspace via usbdevfs -- we gave a structure which
> was mostly LE but had precisely four fields byteswapped to host-endian.

> The upstream version of usbutils doesn't expect this -- it expects the
> descriptor to be entirely little-endian, as it's received from the
> device. John's version of usbutils (which distro, is that, btw?)
> evidently has a hack to work around it.

I'm running Debian 'Sarge' and there are currently no bug reports for
either 'usbutils' or 'libusb'.

I would support doing things consistently, especially if the relevant 
utility(s) can readily determine whether byte-swapping is necessary or 
not.
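
As a rough illustration of what "consistently little-endian" would mean for a userspace tool: a 16-bit descriptor field can be decoded from the raw bytes without knowing the host byte order at all. This is only a sketch; `get_le16()` is a hypothetical helper, not something taken from usbutils or libusb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 16-bit field, independent of host endianness. */
static uint16_t get_le16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

If every multi-byte field arrives as sent by the device, a helper like this gives the same answer on ppc and x86, and no per-field guessing is needed.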
   -- JM

P.S. Other readers:  Hey, what about the SCSI oops???
---
-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: powerpc (ppc)
Kernel: Linux 2.6.10
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages usbutils depends on:
ii  libc6   2.3.2.ds1-20 GNU C Library: Shared libraries an
ii  libusb-0.1-41:0.1.8-17   Userspace USB programming library

-- no debconf information
===

Re: [BK PATCHES] ide-dev-2.6 update

2005-01-21 Thread Bartlomiej Zolnierkiewicz

On Fri, 21 Jan 2005 14:13:08 -0800, Greg KH <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 21, 2005 at 09:02:39PM +0100, Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> >
> > ide-dev-2.6 tree has been resurrected.  It now contains first bunch
> > of fixes needed for converting IDE device drivers to driver-model.
> 
> Yeah!
> 
> > NOTE: If you have a local copy of the tree please re-clone it, thanks.
> >
> > BK users:
> >
> >   bk pull bk://bart.bkbits.net/ide-dev-2.6
> 
> Hm, have a patch anywhere that people can look at?

I sent patches to linux-ide (since they are highly IDE specific and I
don't want to spam LKML too much) some time ago, but since they are not
that big I will resend them here...

[ I don't make patch versions of ide-dev because it is pulled into -mm ]

Bartlomiej

Re: [ea-in-inode 0/5] Further fixes

2005-01-21 Thread Stephen C. Tweedie

Hi Andreas,

On Thu, 2005-01-20 at 02:01, Andreas Gruenbacher wrote:

> here is a set of fixes for ext3 in-inode attributes:

Obvious first question --- have these diffs survived the same
torture-by-tridgell that the previous batch suffered?

Cheers,
 Stephen

