date:20071127

[PATCH, v2] get rid of NR_OPEN and introduce a sysctl_nr_open

2007-11-27 Thread Eric Dumazet


V1->V2

Some NR_OPEN were left unchanged for alpha, mips & sparc64.

As changing NR_OPEN from 1024*1024 to 16*1024*1024 was considered a little
bit dangerous, just let it default to 1024*1024 but add a new sysctl
to let sysadmins change this value.

Thank you

[PATCH] get rid of NR_OPEN and introduce a sysctl_nr_open

NR_OPEN (historically set to 1024*1024) actually forbids processes from
opening more than 1024*1024 handles.

Unfortunately some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because it could exhaust vmalloc
space.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) which defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.
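
As a rough illustration of how the new limit is meant to interact with
RLIMIT_NOFILE, here is a user-space model in C. It is only a sketch: the
model_* names are made up for this example, privilege checks are left out,
and the EPERM/EMFILE return values are an assumption about how the
kernel/sys.c and fs/file.c hunks behave, since those hunks are not shown in
this mail.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's sysctl_nr_open (fs.nr_open). */
unsigned long model_sysctl_nr_open = 1024UL * 1024;

/* Hypothetical stand-in for a process's RLIMIT_NOFILE soft/hard pair. */
struct model_rlimit {
	unsigned long cur;
	unsigned long max;
};

/*
 * Model of setrlimit(RLIMIT_NOFILE): the hard limit may not be raised
 * above the sysctl value (assumed to fail with EPERM).
 */
int model_setrlimit_nofile(struct model_rlimit *lim,
			   unsigned long new_cur, unsigned long new_max)
{
	if (new_cur > new_max)
		return -EINVAL;
	if (new_max > model_sysctl_nr_open)
		return -EPERM;
	lim->cur = new_cur;
	lim->max = new_max;
	return 0;
}

/*
 * Model of the descriptor range check: an fd number must stay below both
 * the soft rlimit and the sysctl (assumed to fail with EMFILE).
 */
int model_fd_in_range(const struct model_rlimit *lim, unsigned int fd)
{
	assert(lim->cur <= lim->max);
	if (fd >= lim->cur || fd >= model_sysctl_nr_open)
		return -EMFILE;
	return 0;
}
```

In this model, raising the sysctl first is what allows a larger hard limit
to be set afterwards; the per-process rlimit still decides the actual cap.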



Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

 Documentation/filesystems/proc.txt |    8
 Documentation/sysctl/fs.txt        |   10 ++
 arch/alpha/kernel/osf_sys.c        |    2 +-
 arch/mips/kernel/sysirix.c         |    2 +-
 arch/sparc64/solaris/fs.c          |    2 +-
 arch/sparc64/solaris/timod.c       |    6 --
 fs/file.c                          |    8 +---
 include/linux/fs.h                 |    2 +-
 kernel/sys.c                       |    2 +-
 kernel/sysctl.c                    |    8
 10 files changed, 40 insertions(+), 10 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index dec9945..9b390d7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -989,6 +989,14 @@ nr_inodes
 Denotes the  number  of  inodes the system has allocated. This number will
 grow and shrink dynamically.
 
+nr_open
+---
+
+Denotes the maximum number of file-handles a process can
+allocate. Default value is 1024*1024 (1048576) which should be
+enough for most machines. Actual limit depends on RLIMIT_NOFILE
+resource limit.
+
 nr_free_inodes
 --
 
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index aa986a3..f992543 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -23,6 +23,7 @@ Currently, these files are in /proc/sys/fs:
 - inode-max
 - inode-nr
 - inode-state
+- nr_open
 - overflowuid
 - overflowgid
 - suid_dumpable
@@ -91,6 +92,15 @@ usage of file handles and you don't need to increase the maximum.
 
 ==
 
+nr_open:
+
+This denotes the maximum number of file-handles a process can
+allocate. Default value is 1024*1024 (1048576) which should be
+enough for most machines. Actual limit depends on RLIMIT_NOFILE
+resource limit.
+
+==
+
 inode-max, inode-nr & inode-state:
 
 As with file handles, the kernel allocates the inode structures
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 6413c5f..72f9a61 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -430,7 +430,7 @@ sys_getpagesize(void)
 asmlinkage unsigned long
 sys_getdtablesize(void)
 {
-   return NR_OPEN;
+   return sysctl_nr_open;
 }
 
 /*
diff --git a/arch/mips/kernel/sysirix.c b/arch/mips/kernel/sysirix.c
index 4c477c7..22fd41e 100644
--- a/arch/mips/kernel/sysirix.c
+++ b/arch/mips/kernel/sysirix.c
@@ -356,7 +356,7 @@ asmlinkage int irix_syssgi(struct pt_regs *regs)
 		retval = NGROUPS_MAX;
 		goto out;
 	case 5:
-		retval = NR_OPEN;
+		retval = sysctl_nr_open;
 		goto out;
 	case 6:
 		retval = 1;
diff --git a/arch/sparc64/solaris/fs.c b/arch/sparc64/solaris/fs.c
index 61be597..9311bfe 100644
--- a/arch/sparc64/solaris/fs.c
+++ b/arch/sparc64/solaris/fs.c
@@ -624,7 +624,7 @@ asmlinkage int solaris_ulimit(int cmd, int val)
 	case 3: /* UL_GMEMLIM */
 		return current->signal->rlim[RLIMIT_DATA].rlim_cur;
 	case 4: /* UL_GDESLIM */
-		return NR_OPEN;
+		return sysctl_nr_open;
 	}
 	return -EINVAL;
 }
diff --git a/arch/sparc64/solaris/timod.c b/arch/sparc64/solaris/timod.c
index a9d32ce..f53123c 100644
--- a/arch/sparc64/solaris/timod.c
+++ b/arch/sparc64/solaris/timod.c
@@ -859,7 +859,8 @@ asmlinkage int solaris_getmsg(unsigned int fd, u32 arg1, u32 arg2, u32 arg3)
 
 	SOLD("entry");
 	lock_kernel();
-	if(fd >= NR_OPEN) goto out;
+	if (fd >= sysctl_nr_open)
+		goto out;
 
 	fdt = files_fdtable(current->files);
 	filp = fdt->fd[fd];
@@ -927,7 +928,8 @@ asmlinkage int solaris_putmsg(unsigned int fd, u32 arg1, u32 arg2, u32 arg3)
 
 	SOLD("entry");
 	lock_kernel();
-	if(fd >= NR_OPEN) goto out;
+	if (fd >= sysctl_nr_open)
+

Re: [ANNOUNCE] Open-FCoE - Fibre Channel over Ethernet Project

2007-11-27 Thread Christoph Hellwig

I just did a very quick glance over the tree.  Some extremely high-level
comments to start with before actually starting the source review:


 - why do you need your own libcrc?  lib/crc32.c has a crc32_le
 - libsa should go.  Much of it is just wrappers of kernel functions
   that should be used directly.  Others, like the hash, event or state
   helpers, might either be open-coded in the caller or made completely
   generic in lib/.  Probably the former but we'll have to see.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Error returns not handled correctly by sysfs.c:subsys_attr_store()

2007-11-27 Thread Tejun Heo

Greg KH wrote:
> On Mon, Nov 26, 2007 at 08:31:16PM -0800, Andrew Morton wrote:
>> On Wed, 21 Nov 2007 15:16:59 -0700 Andrew Patterson <[EMAIL PROTECTED]> 
>> wrote:
>>
>>> The buf in fs/sysfs.c:subsys_attr_store() does not seem to be updated
>>> correctly when a negative value (indicating that an error condition
>>> has occurred) is returned.  If a negative value is returned, the next
>>> call to subsys_attr_store will have the contents of buf appended to
>>> the previous call.
>> subsys_attr_store() gets deleted by
>> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver/kset-kill-subsys-attr.patch
>>
>> So maybe we will soon accidentally fix whatever-this-is?  Or maybe we will
>> faithfully maintain it.
> 
> Yes, subsys attributes go away, but this is showing a bug in the sysfs
> core with attributes, not in the "middle" layers of attributes.
> 
> I bounced the original bug report to Tejun, who has been changing the
> logic around this area to see if he sees anything that might be
> different now.
> 
> Tejun?

Weird, the problem is not reproducible here.

# echo a > allow_restart
-bash: echo: write error: Invalid argument
[  437.518024] buf_ptr = 0x810005e2, buf = x
[  437.518027] , count = 2
# echo b > allow_restart
-bash: echo: write error: Invalid argument
[  438.972973] buf_ptr = 0x81001be6f000, buf = y
[  438.972976] , count = 2
# echo c > allow_restart
-bash: echo: write error: Invalid argument
[  440.539747] buf_ptr = 0x81001d4ba000, buf = z
[  440.539750] , count = 2

Which is expected.  On each open, sysfs_buffer is allocated with kzalloc
and the buffer is freed on close, so I don't see how it can happen.
Behavior for multiple writes can be considered peculiar in that ppos is
essentially ignored and each write is passed to the ->store method as if
it were a brand new write, but this too is the expected behavior.

# (echo a; echo b; echo c) > allow_restart
[  765.257132] buf_ptr = 0x81001be4f000, buf = a
[  765.257135] , count = 2
[  765.285474] buf_ptr = 0x81001be4f000, buf = b
[  765.285484] , count = 2
[  765.314002] buf_ptr = 0x81001be4f000, buf = c
[  765.314004] , count = 2
-bash: echo: write error: Invalid argument
-bash: echo: write error: Invalid argument
-bash: echo: write error: Invalid argument
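
The write path described above can be modelled in user space roughly as
follows. This is a sketch under assumptions, not the fs/sysfs code: the
model_* names, the 4096-byte buffer and the always-failing store callback
are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_BUF_SIZE 4096

/* Hypothetical per-open state: one zeroed buffer, freed on close. */
struct model_sysfs_file {
	char *buf;
};

typedef long (*model_store_fn)(const char *buf, size_t count);

/* Records what the last store call saw, so the effect can be inspected. */
char model_last_seen[MODEL_BUF_SIZE];

/* Example store callback that rejects every input, like the test above. */
long model_reject_store(const char *buf, size_t count)
{
	(void)count;
	strcpy(model_last_seen, buf);
	return -EINVAL;
}

int model_open(struct model_sysfs_file *f)
{
	f->buf = calloc(1, MODEL_BUF_SIZE);	/* zeroed, as with kzalloc */
	return f->buf ? 0 : -ENOMEM;
}

/*
 * Every write starts at offset 0 of the buffer; no file position is
 * consulted, so nothing is ever appended to earlier data.
 */
long model_write(struct model_sysfs_file *f, const char *data,
		 size_t count, model_store_fn store)
{
	assert(f->buf != NULL);
	if (count >= MODEL_BUF_SIZE)
		count = MODEL_BUF_SIZE - 1;
	memcpy(f->buf, data, count);
	f->buf[count] = '\0';
	return store(f->buf, count);
}

void model_close(struct model_sysfs_file *f)
{
	free(f->buf);
	f->buf = NULL;
}
```

With this model, writing "a\n" and then "b\n" on the same open file hands
the callback "a\n" and then "b\n", never "a\nb\n", even though both writes
fail.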

Andrew Patterson, can you please build 2.6.24-rc3 from a clean source tree
and retry?

-- 
tejun

Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-27 Thread Andrew Morton

On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> 
> > wrote:
> > 
> >> Hi,
> > 
> > (cc linux-scsi, for sym53c8xx)
> > 
> >> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
> > 
> > I assume this is a post-2.6.23 regression?
> > 
> >> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
> >> NIP: c002f02c LR: d01414fc CTR: c002f018
> >> REGS: c0077cbef0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git2-autotest)
> >> MSR: 80009032   CR: 24022088  XER: 
> >> TASK = c0077cbd8000[375] 'insmod' THREAD: c0077cbec000 CPU: 1
> >> GPR00: d01414fc c0077cbef330 c052b930 d80080002014 
> >> GPR04: d8008000202c  c0077ca1cb00 d014ce54 
> >> GPR08: c0077ca1c63c  002a c002f018 
> >> GPR12: d0143610 c0473d00 
> >> NIP [c002f02c] .ioread8+0x14/0x60
> >> LR [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> >> Call Trace:
> >> [c0077cbef330] [c0077cbef3c0] 0xc0077cbef3c0 (unreliable)
> >> [c0077cbef3a0] [d01414fc] .sym_hcb_attach+0x1188/0x1378 
> >> [sym53c8xx]
> >> [c0077cbef470] [d01395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
> >> [c0077cbef710] [c01bc118] .pci_device_probe+0x124/0x1b0
> >> [c0077cbef7b0] [c0221138] .driver_probe_device+0x144/0x20c
> >> [c0077cbef850] [c0221450] .__driver_attach+0xcc/0x154
> >> [c0077cbef8e0] [c021ff94] .bus_for_each_dev+0x7c/0xd4
> >> [c0077cbef9a0] [c0220e9c] .driver_attach+0x28/0x40
> >> [c0077cbefa20] [c02204d8] .bus_add_driver+0x90/0x228
> >> [c0077cbefac0] [c0221858] .driver_register+0x94/0xb0
> >> [c0077cbefb40] [c01bc430] .__pci_register_driver+0x6c/0xcc
> >> [c0077cbefbe0] [d0143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
> >> [c0077cbefc80] [c008ce80] .sys_init_module+0x17c4/0x1958
> >> [c0077cbefe30] [c000872c] syscall_exit+0x0/0x40
> >> Instruction dump:
> >> 6000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 
> >> f8010010 f821ff91 7c0004ac 8923 <0c09> 4c00012c 79290620 2f8900ff 
> > 
> > I see no obvious lockup sites near the end of sym_hcb_attach().  Maybe it's
> > being called lots of times from a higher level..  Do the traces all look
> > the same?
> 
> Hi Andrew,
> 
> I see this call trace twice and both look similar, and on another reboot
> the following trace is seen twice, on different CPUs
> 
> BUG: soft lockup detected on CPU#3!
> Call Trace:
> [C0003FEDEDA0] [C0010220] .show_stack+0x68/0x1b0 (unreliable)
> [C0003FEDEE40] [C00A061C] .softlockup_tick+0xf0/0x13c
> [C0003FEDEEF0] [C0072E2C] .run_local_timers+0x1c/0x30
> [C0003FEDEF70] [C0022FA0] .timer_interrupt+0xa8/0x488
> [C0003FEDF050] [C00034EC] decrementer_common+0xec/0x100
> --- Exception: 901 at .ioread8+0x14/0x60
> LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
> [C0003FEDF340] [D02B3BC0] 0xd02b3bc0 (unreliable)
> [C0003FEDF3B0] [D029A3C0] .sym_hcb_attach+0x1194/0x1384 
> [sym53c8xx]
> [C0003FEDF480] [D0291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
> [C0003FEDF710] [C01B65A4] .pci_device_probe+0x13c/0x1dc
> [C0003FEDF7D0] [C0219A0C] .driver_probe_device+0xa0/0x15c
> [C0003FEDF870] [C0219C64] .__driver_attach+0xb4/0x138
> [C0003FEDF900] [C021913C] .bus_for_each_dev+0x7c/0xd4
> [C0003FEDF9C0] [C02198B0] .driver_attach+0x28/0x40
> [C0003FEDFA40] [C0218BA4] .bus_add_driver+0x98/0x18c
> [C0003FEDFAE0] [C021A064] .driver_register+0xa8/0xc4
> [C0003FEDFB60] [C01B68AC] .__pci_register_driver+0x5c/0xa4
> [C0003FEDFBF0] [D029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
> [C0003FEDFC90] [C008D1F4] .sys_init_module+0x1764/0x1998
> [C0003FEDFE30] [C000869C] syscall_exit+0x0/0x40
> 

hm, odd.

Can you look up sym_hcb_attach+0x1194/0x1384 in gdb?  Something like

- Enable CONFIG_DEBUG_INFO

- gdb sym53c8xx.o

(gdb) p sym_hcb_attach

(gdb) p/x 0xsomething + 0x1194

(gdb) l *0xsomethingelse


Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-27 Thread Kamalesh Babulal

Andrew Morton wrote:
> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> Hi,
> 
> (cc linux-scsi, for sym53c8xx)
> 
>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
> 
> I assume this is a post-2.6.23 regression?
> 
>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
>> NIP: c002f02c LR: d01414fc CTR: c002f018
>> REGS: c0077cbef0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git2-autotest)
>> MSR: 80009032   CR: 24022088  XER: 
>> TASK = c0077cbd8000[375] 'insmod' THREAD: c0077cbec000 CPU: 1
>> GPR00: d01414fc c0077cbef330 c052b930 d80080002014 
>> GPR04: d8008000202c  c0077ca1cb00 d014ce54 
>> GPR08: c0077ca1c63c  002a c002f018 
>> GPR12: d0143610 c0473d00 
>> NIP [c002f02c] .ioread8+0x14/0x60
>> LR [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>> Call Trace:
>> [c0077cbef330] [c0077cbef3c0] 0xc0077cbef3c0 (unreliable)
>> [c0077cbef3a0] [d01414fc] .sym_hcb_attach+0x1188/0x1378 
>> [sym53c8xx]
>> [c0077cbef470] [d01395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>> [c0077cbef710] [c01bc118] .pci_device_probe+0x124/0x1b0
>> [c0077cbef7b0] [c0221138] .driver_probe_device+0x144/0x20c
>> [c0077cbef850] [c0221450] .__driver_attach+0xcc/0x154
>> [c0077cbef8e0] [c021ff94] .bus_for_each_dev+0x7c/0xd4
>> [c0077cbef9a0] [c0220e9c] .driver_attach+0x28/0x40
>> [c0077cbefa20] [c02204d8] .bus_add_driver+0x90/0x228
>> [c0077cbefac0] [c0221858] .driver_register+0x94/0xb0
>> [c0077cbefb40] [c01bc430] .__pci_register_driver+0x6c/0xcc
>> [c0077cbefbe0] [d0143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>> [c0077cbefc80] [c008ce80] .sys_init_module+0x17c4/0x1958
>> [c0077cbefe30] [c000872c] syscall_exit+0x0/0x40
>> Instruction dump:
>> 6000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 
>> f8010010 f821ff91 7c0004ac 8923 <0c09> 4c00012c 79290620 2f8900ff 
> 
> I see no obvious lockup sites near the end of sym_hcb_attach().  Maybe it's
> being called lots of times from a higher level..  Do the traces all look
> the same?

Hi Andrew,

I see this call trace twice and both look similar, and on another reboot
the following trace is seen twice, on different CPUs

BUG: soft lockup detected on CPU#3!
Call Trace:
[C0003FEDEDA0] [C0010220] .show_stack+0x68/0x1b0 (unreliable)
[C0003FEDEE40] [C00A061C] .softlockup_tick+0xf0/0x13c
[C0003FEDEEF0] [C0072E2C] .run_local_timers+0x1c/0x30
[C0003FEDEF70] [C0022FA0] .timer_interrupt+0xa8/0x488
[C0003FEDF050] [C00034EC] decrementer_common+0xec/0x100
--- Exception: 901 at .ioread8+0x14/0x60
LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
[C0003FEDF340] [D02B3BC0] 0xd02b3bc0 (unreliable)
[C0003FEDF3B0] [D029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
[C0003FEDF480] [D0291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
[C0003FEDF710] [C01B65A4] .pci_device_probe+0x13c/0x1dc
[C0003FEDF7D0] [C0219A0C] .driver_probe_device+0xa0/0x15c
[C0003FEDF870] [C0219C64] .__driver_attach+0xb4/0x138
[C0003FEDF900] [C021913C] .bus_for_each_dev+0x7c/0xd4
[C0003FEDF9C0] [C02198B0] .driver_attach+0x28/0x40
[C0003FEDFA40] [C0218BA4] .bus_add_driver+0x98/0x18c
[C0003FEDFAE0] [C021A064] .driver_register+0xa8/0xc4
[C0003FEDFB60] [C01B68AC] .__pci_register_driver+0x5c/0xa4
[C0003FEDFBF0] [D029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
[C0003FEDFC90] [C008D1F4] .sys_init_module+0x1764/0x1998
[C0003FEDFE30] [C000869C] syscall_exit+0x0/0x40


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Re: [PATCH] i386 IOAPIC: de-fang IRQ compression

2007-11-27 Thread Eric W. Biederman

Len Brown <[EMAIL PROTECTED]> writes:

> commit c434b7a6aedfe428ad17cd61b21b125a7b7a29ce
> (x86: avoid wasting IRQs for PCI devices)
> created a concept of "IRQ compression" on i386
> to conserve IRQ numbers on systems with many
> sparsely populated IO APICs.
> 
> The same scheme was also added to x86_64,
> but later removed when x86_64 received an IRQ over-haul
> that made it unnecessary -- including per-CPU
> IRQ vectors that greatly increased the IRQ capacity
> on the machine.
> 
> i386 has not received the analogous over-haul,
> and thus a previous attempt to delete IRQ compression
> from i386 was rejected on the theory that there may
> exist machines that actually need it.  The fact is
> that the author of IRQ compression patch was unable
> to confirm the actual existence of such a system.
> 
> As a result, all i386 kernels with IOAPIC support
> pay the following:
> 
> 1. confusion
> 
> IRQ compression re-names the traditional IOAPIC
> pin numbers (aka ACPI GSI's) into sequential IRQ #s:
> 
> ACPI: PCI Interrupt :00:1c.0[A] -> GSI 20 (level, low) -> IRQ 16
> ACPI: PCI Interrupt :00:1c.1[B] -> GSI 21 (level, low) -> IRQ 17
> ACPI: PCI Interrupt :00:1c.2[C] -> GSI 22 (level, low) -> IRQ 18
> ACPI: PCI Interrupt :00:1c.3[D] -> GSI 23 (level, low) -> IRQ 19
> ACPI: PCI Interrupt :00:1c.4[A] -> GSI 20 (level, low) -> IRQ 16
> 
> This makes /proc/interrupts look different
> depending on system configuration and device probe order.
> It is also different than the x86_64 kernel running
> on the exact same system.  As a result, programmers
> get confused when comparing systems.
> 
> 2. complexity
> 
> The IRQ code in Linux is already overly complex,
> and IRQ compression makes it worse.  There have
> already been two bug workarounds related to IRQ
> compression -- the IRQ0 timer workaround and
> the VIA PCI IRQ workaround.
> 
> 3. size
> 
> All i386 kernels with IOAPIC support contain an int[4096] --
> a 4 page array to contain the renamed IRQs.
> 
> So while the irq compression code on i386 should really
> be deleted -- even before merging the x86_64 irq-overhaul,
> this patch simply disables it on all high volume systems
> to avoid problems #1 and #2 on most all i386 systems.
> 
> A large system with pin numbers >=64 will still have compression
> to conserve limited IRQ numbers for sparse IOAPICS.  However,
> the vast majority of the planet, those with only pin numbers < 64
> will use an identity GSI -> IRQ mapping.
> 
> Signed-off-by: Len Brown <[EMAIL PROTECTED]>

Looks reasonable.  As a further cleanup it might be worth yanking the
Via workaround because we simply can not hit it with your change of
disabling irq compression for the first 64 gsis.

I honestly don't understand the "(gsi == 0 && !timer_uses_ioapic_pin_0)"
test but I do know killing irq compression was safe and worked on
x86_64 so I don't expect any problems there.
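
For reference, the GSI -> IRQ mapping that the patch description outlines
can be sketched in user-space C like this. It is a simplified model only:
the model_* names are hypothetical, and the IRQ0 timer and VIA special
cases and the IOAPIC pin bookkeeping are deliberately left out.

```c
#include <assert.h>

#define MODEL_MAX_GSI_NUM		4096
#define MODEL_IRQ_COMPRESSION_START	64

/* 0 means "no compressed IRQ assigned yet" for that GSI. */
int model_gsi_to_irq[MODEL_MAX_GSI_NUM];
int model_next_irq = MODEL_IRQ_COMPRESSION_START;

/*
 * Returns the IRQ number for a GSI, or -1 if the GSI is out of range.
 * GSIs below 64 and edge-triggered GSIs keep an identity mapping;
 * level-triggered GSIs >= 64 get sequential IRQ numbers in probe order.
 */
int model_register_gsi(unsigned int gsi, int level_triggered)
{
	if (gsi >= MODEL_MAX_GSI_NUM)
		return -1;

	if (gsi < MODEL_IRQ_COMPRESSION_START || !level_triggered)
		return (int)gsi;

	if (model_gsi_to_irq[gsi] == 0)
		model_gsi_to_irq[gsi] = model_next_irq++;

	assert(model_gsi_to_irq[gsi] >= MODEL_IRQ_COMPRESSION_START);
	return model_gsi_to_irq[gsi];
}
```

In this model a level-triggered GSI 20 stays IRQ 20, while the first
level-triggered GSI at or above 64 to be registered becomes IRQ 64, the
next one IRQ 65, and so on.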

Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>




> diff --git a/arch/x86/kernel/mpparse_32.c b/arch/x86/kernel/mpparse_32.c
> index 7a05a7f..468d6ed 100644
> --- a/arch/x86/kernel/mpparse_32.c
> +++ b/arch/x86/kernel/mpparse_32.c
> @@ -1041,13 +1041,14 @@ void __init mp_config_acpi_legacy_irqs (void)
>  }
>  
>  #define MAX_GSI_NUM  4096
> +#define IRQ_COMPRESSION_START	64
>  
>  int mp_register_gsi(u32 gsi, int triggering, int polarity)
>  {
>   int ioapic = -1;
>   int ioapic_pin = 0;
>   int idx, bit = 0;
> - static int pci_irq = 16;
> + static int pci_irq = IRQ_COMPRESSION_START;
>   /*
>* Mapping between Global System Interrups, which
>* represent all possible interrupts, and IRQs
> @@ -1086,12 +1087,16 @@ int mp_register_gsi(u32 gsi, int triggering, int polarity)
>   if ((1<<bit) & mp_ioapic_routing[ioapic].pin_programmed[idx]) {
>   Dprintk(KERN_DEBUG "Pin %d-%d already programmed\n",
>   mp_ioapic_routing[ioapic].apic_id, ioapic_pin);
> - return gsi_to_irq[gsi];
> + return (gsi < IRQ_COMPRESSION_START ? gsi : gsi_to_irq[gsi]);
>   }
>  
>   mp_ioapic_routing[ioapic].pin_programmed[idx] |= (1<<bit);
>  
> - if (triggering == ACPI_LEVEL_SENSITIVE) {
> + /*
> +  * For GSI >= 64, use IRQ compression
> +  */
> + if ((gsi >= IRQ_COMPRESSION_START)
> + && (triggering == ACPI_LEVEL_SENSITIVE)) {
>   /*
>* For PCI devices assign IRQs in order, avoiding gaps
>* due to unused I/O APIC pins.

Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)

2007-11-27 Thread Richard Knutsson


Vegard Nossum wrote:
General description: kmemcheck will trap every read and write to memory
that was allocated dynamically (ie. with kmalloc()). If a memory address
is read that has not previously been written to, a message is printed to
the kernel log.



diff --git a/arch/x86/kernel/kmemcheck_32.c b/arch/x86/kernel/kmemcheck_32.c

new file mode 100644
index 000..9d065b9
--- /dev/null
+++ b/arch/x86/kernel/kmemcheck_32.c
@@ -0,0 +1,290 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int

Not 'static bool'?

+page_is_tracked(struct page *page)
+{
+struct page *head;
+
+if (!page)
+return 0;
+head = compound_head(page);
+if (!head)
+return 0;
+if (!(head->flags & (1 << PG_slab)))
+return 0;
+if (!head->slab)
+return 0;
+if (head->slab->flags & __GFP_NOTRACK)
+return 0;
+return 1;

Why not return 'false' and 'true'?

+}
+
+static void
+show_addr(uint32_t addr)
+{
+struct page *page;
+pte_t *pte;
+
+if (!addr)
+return;
+page = virt_to_page(addr);
+if (!page_is_tracked(page))
+return;
+
+pte = lookup_address(addr);
+change_page_attr(page, 1, __pgprot(pte->pte_low | _PAGE_VISIBLE));
+__flush_tlb_one(addr);
+}
+
+static void
+hide_addr(uint32_t addr)
+{
+struct page *page;
+pte_t *pte;
+
+if (!addr)
+return;
+page = virt_to_page(addr);
+if (!page_is_tracked(page))
+return;
+pte = lookup_address(addr);
+change_page_attr(page, 1, __pgprot(pte->pte_low & ~_PAGE_VISIBLE));
+__flush_tlb_one(addr);
+}
+
+DEFINE_PER_CPU(uint32_t, kmemcheck_read_addr);
+DEFINE_PER_CPU(uint32_t, kmemcheck_write_addr);
+
+void
+kmemcheck_show(struct pt_regs *regs)
+{
+show_addr(__get_cpu_var(kmemcheck_read_addr));
+show_addr(__get_cpu_var(kmemcheck_write_addr));
+regs->eflags |= TF_MASK;
+}
+
+void
+kmemcheck_hide(struct pt_regs *regs)
+{
+hide_addr(__get_cpu_var(kmemcheck_read_addr));
+hide_addr(__get_cpu_var(kmemcheck_write_addr));
+regs->eflags &= ~TF_MASK;
+}
+
+void
+kmemcheck_hide_pages(struct page *p, unsigned int n)
+{
+unsigned int i;
+
+for(i = 0; i < n; ++i) {
+unsigned long address = (unsigned long) page_address(&p[i]);
+pte_t *pte = lookup_address(address);
+
+change_page_attr(&p[i], 1,
+__pgprot(pte->pte_low & ~_PAGE_VISIBLE));
+__flush_tlb_one(address);
+}
+}
+
+static int

'static bool'?

+opcode_is_prefix(uint8_t b)
+{
+return
+/* Group 1 */
+b == 0xf0 || b == 0xf2 || b == 0xf3
+/* Group 2 */
+|| b == 0x2e || b == 0x36 || b == 0x3e || b == 0x26
+|| b == 0x64 || b == 0x65 || b == 0x2e || b == 0x3e
+/* Group 3 */
+|| b == 0x66
+/* Group 4 */
+|| b == 0x67;
+}
+
+/* This is a VERY crude opcode decoder. We only need to find the size of the
+ * load/store that caused our #PF and this should work for all the opcodes
+ * that we care about. Moreover, the ones who invented this instruction set
+ * should be shot. */
+static unsigned int
+opcode_get_size(const uint8_t *opcode)

Are we not using 'u8' in the kernel?

+{
+const uint8_t *i;
and here. Also, I find the name 'i' a bit confusing in this context, 
since it is not really a counter. What about 'cur', 'prefix_op' or 
something of the sort?

+
+/* Default operand size */
+int operand_size_override = 32;
+
+/* prefixes */
+for (i = opcode; opcode_is_prefix(*i); ++i) {
+if (*i == 0x66)
+operand_size_override = 16;
+}
+
+/* escape opcode */
+if (*i == 0x0f) {
+++i;
+
+if(*i == 0xb6)
Please write 'if (' with a space, as in the previous cases. A good checker
for this kind of thing is 'scripts/checkpatch.pl'...

+return operand_size_override >> 1;
+if(*i == 0xb7)
+return 16;
+}
+
+return (*i & 1) ? operand_size_override : 8;
+}
+
+static uint8_t

and here...

+opcode_get_primary(const uint8_t *opcode)

and here..

+{
+const uint8_t *i;
and here. Also, I find the name 'i' a bit confusing in this context, 
since it is not really a counter. What about 'cur', 'prim_op', 
'prefix_op' or something of the sort?

+
+/* skip prefixes */
+for (i = opcode; opcode_is_prefix(*i); ++i);
+return *i;
+}
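
To make the 'bool' and naming suggestions concrete, here is how the two
prefix helpers might look as a stand-alone user-space sketch. It assumes
<stdbool.h>/<stdint.h> types instead of the kernel's, uses 'cur' for the
cursor, and lists each prefix byte once; it is an illustration, not a
replacement hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if b is one of the x86 legacy instruction prefix bytes. */
bool opcode_is_prefix(uint8_t b)
{
	switch (b) {
	/* Group 1: lock, repne, rep */
	case 0xf0: case 0xf2: case 0xf3:
	/* Group 2: segment overrides (0x2e and 0x3e double as branch hints) */
	case 0x2e: case 0x36: case 0x3e: case 0x26:
	case 0x64: case 0x65:
	/* Group 3: operand-size override */
	case 0x66:
	/* Group 4: address-size override */
	case 0x67:
		return true;
	default:
		return false;
	}
}

/* Skip any prefix bytes and return the first non-prefix opcode byte. */
uint8_t opcode_get_primary(const uint8_t *opcode)
{
	const uint8_t *cur = opcode;

	assert(opcode != NULL);
	while (opcode_is_prefix(*cur))
		++cur;
	return *cur;
}
```

The caller is still responsible for passing a buffer that contains a
non-prefix byte, exactly as in the quoted code.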
+
+static void *address_get_shadow_slab(unsigned long address,
+struct page *head)
+{
+if (!head->slab)
+return NULL;
+
+if (head->slab->flags & __GFP_NOTRACK)
+return NULL;
+
+return (void *) address + (PAGE_SIZE << head->slab->order);
+}
+
+static void *address_get_shadow(unsigned long address)
+{
+struct page *page = virt_to_page(address);
+struct page *head = compound_head(page);
+
+if (!head)
+return NULL;
+
+if (head->flags & (1 << PG_slab))
+return address_get_shadow_slab(address, head);
+
+return NULL;
+}
+
+static int

'static bool'?

+test(void *shadow,

Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-27 Thread Andrew Morton

On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Hi,

(cc linux-scsi, for sym53c8xx)

> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox

I assume this is a post-2.6.23 regression?

> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
> NIP: c002f02c LR: d01414fc CTR: c002f018
> REGS: c0077cbef0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git2-autotest)
> MSR: 80009032   CR: 24022088  XER: 
> TASK = c0077cbd8000[375] 'insmod' THREAD: c0077cbec000 CPU: 1
> GPR00: d01414fc c0077cbef330 c052b930 d80080002014 
> GPR04: d8008000202c  c0077ca1cb00 d014ce54 
> GPR08: c0077ca1c63c  002a c002f018 
> GPR12: d0143610 c0473d00 
> NIP [c002f02c] .ioread8+0x14/0x60
> LR [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> Call Trace:
> [c0077cbef330] [c0077cbef3c0] 0xc0077cbef3c0 (unreliable)
> [c0077cbef3a0] [d01414fc] .sym_hcb_attach+0x1188/0x1378 
> [sym53c8xx]
> [c0077cbef470] [d01395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
> [c0077cbef710] [c01bc118] .pci_device_probe+0x124/0x1b0
> [c0077cbef7b0] [c0221138] .driver_probe_device+0x144/0x20c
> [c0077cbef850] [c0221450] .__driver_attach+0xcc/0x154
> [c0077cbef8e0] [c021ff94] .bus_for_each_dev+0x7c/0xd4
> [c0077cbef9a0] [c0220e9c] .driver_attach+0x28/0x40
> [c0077cbefa20] [c02204d8] .bus_add_driver+0x90/0x228
> [c0077cbefac0] [c0221858] .driver_register+0x94/0xb0
> [c0077cbefb40] [c01bc430] .__pci_register_driver+0x6c/0xcc
> [c0077cbefbe0] [d0143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
> [c0077cbefc80] [c008ce80] .sys_init_module+0x17c4/0x1958
> [c0077cbefe30] [c000872c] syscall_exit+0x0/0x40
> Instruction dump:
> 6000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 
> f8010010 f821ff91 7c0004ac 8923 <0c09> 4c00012c 79290620 2f8900ff 

I see no obvious lockup sites near the end of sym_hcb_attach().  Maybe it's
being called lots of times from a higher level..  Do the traces all look
the same?


Re: [PATCH -mm 1/4 -v6] x86_64 EFI runtime service support: EFI basic runtime service support

2007-11-27 Thread Huang, Ying

On Tue, 2007-11-27 at 02:02 -0800, Andrew Morton wrote:
> > +
> > +static pgd_t save_pgd __initdata;
> > +static unsigned long efi_flags __initdata;
> > +/* efi_lock protects efi physical mode call */
> > +static __initdata DEFINE_SPINLOCK(efi_lock);
> 
> It's peculiar to have a spinlock in __initdata.  Often there just isn't any
> code path by which multiple threads/CPUs can access the same data that
> early in boot.

Yes. This spinlock is used only before efi_enter_virtual_mode, which is
far before smp_init, so this spinlock is unnecessary, I will remove it.

> > +void __init efi_call_phys_prelog(void) __acquires(efi_lock)
> > +{
> > +   unsigned long vaddress;
> > +
> > +   /*
> > +* Lock sequence is different from normal case because
> > +* efi_flags is global
> > +*/
> > +   spin_lock(&efi_lock);
> > +   local_irq_save(efi_flags);
> 
> I think we discussed this before, but I forget the result.  It really
> should be described better in the comments here, because this code leaps out
> and shouts "wrong".
> 
> a) Why not use spin_lock_irqsave()?
> 
> b) If this is an open-coded spin_lock_irqsave() then it gets the two
> operations in the wrong order and is hence deadlockable.
> 
> c) it isn't obvious to the reader that this locking is even needed in
> initial bootup.
> 
> Now I _think_ all these issues were addressed in discussion.  But unless
> the code comment knocks them all on the head (it doesn't) then it will all
> come up again.

Because efi_lock will be removed, this will no longer be a problem.

> > +   early_runtime_code_mapping_set_exec(1);
> > +   vaddress = (unsigned long)__va(0x0UL);
> > +   pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
> > +   set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
> > +   global_flush_tlb();
> > +}
> > +
> > +void __init efi_call_phys_epilog(void) __releases(efi_lock)
> > +{
> > +   /*
> > +* After the lock is released, the original page table is restored.
> > +*/
> > +   set_pgd(pgd_offset_k(0x0UL), save_pgd);
> > +   early_runtime_code_mapping_set_exec(0);
> > +   global_flush_tlb();
> > +   local_irq_restore(efi_flags);
> > +   spin_unlock(&efi_lock);
> > +}
> > +
> >
> > ...
> >
> > +void __init runtime_code_page_mkexec(void)
> > +{
> > +   efi_memory_desc_t *md;
> 
> I thought we were going to use `struct efi_memory_desc'?

There isn't even a struct efi_memory_desc definition in
include/linux/efi.h. If desired, I can fix all such coding style problems
across all platforms in another patchset.

> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define EFI_DEBUG  0
> 
> I suspect you really want to turn on debug mode during initial public
> testing.  Verify that it generates sufficient information for you to be
> able to fix problems if/when people report them.

OK, I will do it.

> > +void __init efi_init(void)
> > +{
> > +   efi_config_table_t *config_tables;
> > +   efi_runtime_services_t *runtime;
> > +   efi_char16_t *c16;
> > +   char vendor[100] = "unknown";
> > +   int i = 0;
> > +   void *tmp;
> > +
> > +   memset(&efi, 0, sizeof(efi));
> > +   memset(&efi_phys, 0, sizeof(efi_phys));
> 
> These were already zeroed by the compiler (I have a feeling I said that a
> couple of months back)

I will fix it.

> > +#ifdef CONFIG_X86_32
> 
> Strictly this isn't needed until [patch 4/4] but that's a very minor point.
> 
> > +   efi_phys.systab = (efi_system_table_t *)boot_params.efi_info.efi_systab;
> > +   memmap.phys_map = (void *)boot_params.efi_info.efi_memmap;
> > +#else
> > +   efi_phys.systab = (efi_system_table_t *)
> > +   (boot_params.efi_info.efi_systab |
> > +((__u64)boot_params.efi_info.efi_systab_hi<<32));
> > +   memmap.phys_map = (void *)
> > +   (boot_params.efi_info.efi_memmap |
> > +((__u64)boot_params.efi_info.efi_memmap_hi<<32));
> > +#endif
> > +   memmap.nr_map = boot_params.efi_info.efi_memmap_size /
> > +   boot_params.efi_info.efi_memdesc_size;
> > +   memmap.desc_version = boot_params.efi_info.efi_memdesc_version;
> > +   memmap.desc_size = boot_params.efi_info.efi_memdesc_size;
> > +
> > +   efi.systab = efi_early_ioremap((unsigned long)efi_phys.systab,
> > +  sizeof(efi_system_table_t));
> > +   if (efi.systab == NULL)
> > +   printk(KERN_ERR "Woah! Couldn't map the EFI systema table.\n");
> 
> s/systema/system/.
> 
> I'd be inclined to s/Woah! //, too.  Sorry, I'm boring.

I will fix it.

> > +   memcpy(&efi_systab, efi.systab, sizeof(efi_system_table_t));
> > +   efi_early_iounmap(efi.systab, sizeof(efi_system_table_t));
> > +   efi.systab = &efi_systab;
> > +
> > +   /*
> > +* Verify the EFI Table
> > +*/
> > +   if (efi.systab->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
> > +   printk(KERN_ERR "Woah! EFI system table "
> > +  "signature incorrect\n");
> > +   if ((efi.systab->hdr.revision >> 16) == 0)
> > +   printk(KERN_ERR "Warning: EFI system table version "
> > +

RE: ACPI related Warning in 2.6.24-rc3-git2

2007-11-27 Thread Pallipadi, Venkatesh


Yakui,

Can you look at this? It seems to be coming from commit f79f06ab9f86;
the FixedHW support tries to read an MSR with interrupts disabled.

Thanks,
Venki 

>-Original Message-
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of 
>Rafael J. Wysocki
>Sent: Tuesday, November 27, 2007 7:37 AM
>To: Lukas Hejtmanek
>Cc: linux-kernel@vger.kernel.org; ACPI Devel Maling List; Len 
>Brown; Alexey Starikovskiy
>Subject: Re: ACPI related Warning in 2.6.24-rc3-git2
>
>On Tuesday, 27 of November 2007, Lukas Hejtmanek wrote:
>> Hello,
>> 
>> in a recent kernel, I got the following warnings while booting. It's ACPI
>> related. Does anybody care? Lenovo ThinkPad T61 (6465CTO).
>
>Appropriate Ccs added.
>
>Did it happen before?
>
>> [   13.114814] Pid: 1, comm: swapper Not tainted 2.6.24-rc3-git2 #3
>> [   13.114885] 
>> [   13.114885] Call Trace:
>> [   13.115020]  [] 
>acpi_ut_update_ref_count+0x50/0x9d
>> [   13.115095]  [] 
>smp_call_function_single+0xbd/0xd0
>> [   13.115169]  [] _rdmsr_on_cpu+0x5c/0x60
>> [   13.115241]  []
>> acpi_processor_get_throttling_ptc+0xf3/0x158
>> [   13.115323]  []
>> acpi_processor_get_throttling_info+0x460/0x4af
>> [   13.115406]  [] acpi_processor_start+0x54a/0x606
>> [   13.115478]  [] ifind+0x48/0xd0
>> [   13.115550]  [] 
>acpi_start_single_object+0x24/0x46
>> [   13.115622]  [] acpi_device_probe+0x7d/0x91
>> [   13.115694]  [] driver_probe_device+0x9c/0x1b0
>> [   13.115766]  [] __driver_attach+0xc9/0xd0
>> [   13.115840]  [] __driver_attach+0x0/0xd0
>> [   13.115924]  [] bus_for_each_dev+0x4d/0x80
>> [   13.115994]  [] bus_add_driver+0xac/0x220
>> [   13.116080]  [] acpi_processor_init+0x8f/0xfc
>> [   13.116153]  [] kernel_init+0x154/0x330
>> [   13.116225]  [] child_rip+0xa/0x12
>> [   13.116295]  [] kernel_init+0x0/0x330
>> [   13.116365]  [] child_rip+0x0/0x12
>> [   13.116435] 
>> [   13.116504] WARNING: at arch/x86/kernel/smp_64.c:397
>> smp_call_function_mask()
>> [   13.116577] Pid: 1, comm: swapper Not tainted 2.6.24-rc3-git2 #3
>> [   13.116648] 
>> [   13.116648] Call Trace:
>> [   13.116779]  [] 
>acpi_ut_update_ref_count+0x50/0x9d
>> [   13.116851]  [] smp_call_function_mask+0x8f/0xa0
>> [   13.116923]  [] _rdmsr_on_cpu+0x5c/0x60
>> [   13.116994]  []
>> acpi_processor_get_throttling_ptc+0xf3/0x158
>> [   13.117077]  []
>> acpi_processor_get_throttling_info+0x460/0x4af
>> [   13.117169]  [] acpi_processor_start+0x54a/0x606
>> [   13.117248]  [] ifind+0x48/0xd0
>> [   13.117330]  [] 
>acpi_start_single_object+0x24/0x46
>> [   13.117402]  [] acpi_device_probe+0x7d/0x91
>> [   13.117488]  [] driver_probe_device+0x9c/0x1b0
>> [   13.117559]  [] __driver_attach+0xc9/0xd0
>> [   13.117631]  [] __driver_attach+0x0/0xd0
>> [   13.117715]  [] bus_for_each_dev+0x4d/0x80
>> [   13.117786]  [] bus_add_driver+0xac/0x220
>> [   13.117856]  [] acpi_processor_init+0x8f/0xfc
>> [   13.117941]  [] kernel_init+0x154/0x330
>> [   13.118018]  [] child_rip+0xa/0x12
>> [   13.118088]  [] kernel_init+0x0/0x330
>> [   13.118158]  [] child_rip+0x0/0x12
>> [   13.118227] 
>> [...]
>> [   13.124714] WARNING: at arch/x86/kernel/smp_64.c:427
>> smp_call_function_single()
>> [   13.124798] Pid: 1, comm: swapper Not tainted 2.6.24-rc3-git2 #3
>> [   13.125460] 
>> [   13.125461] Call Trace:
>> [   13.125592]  [] 
>acpi_ut_update_ref_count+0x50/0x9d
>> [   13.125665]  [] 
>smp_call_function_single+0xbd/0xd0
>> [   13.125737]  [] _rdmsr_on_cpu+0x5c/0x60
>> [   13.125807]  []
>> acpi_processor_get_throttling_ptc+0xf3/0x158
>> [   13.125903]  []
>> acpi_processor_get_throttling_info+0x460/0x4af
>> [   13.125999]  [] acpi_processor_start+0x54a/0x606
>> [   13.126071]  [] acpi_processor_add+0x24/0x6b
>> [   13.126142]  [] 
>acpi_start_single_object+0x24/0x46
>> [   13.126214]  [] acpi_device_probe+0x7d/0x91
>> [   13.126285]  [] driver_probe_device+0x9c/0x1b0
>> [   13.126357]  [] __driver_attach+0xc9/0xd0
>> [   13.126441]  [] __driver_attach+0x0/0xd0
>> [   13.126518]  [] bus_for_each_dev+0x4d/0x80
>> [   13.126600]  [] bus_add_driver+0xac/0x220
>> [   13.126670]  [] acpi_processor_init+0x8f/0xfc
>> [   13.126755]  [] kernel_init+0x154/0x330
>> [   13.126832]  [] child_rip+0xa/0x12
>> [   13.126916]  [] kernel_init+0x0/0x330
>> [   13.126986]  [] child_rip+0x0/0x12
>> [   13.127059] 
>> [   13.127124] WARNING: at arch/x86/kernel/smp_64.c:397
>> smp_call_function_mask()
>> [   13.127197] Pid: 1, comm: swapper Not tainted 2.6.24-rc3-git2 #3
>> [   13.127267] 
>> [   13.127268] Call Trace:
>> [   13.127398]  [] 
>acpi_ut_update_ref_count+0x50/0x9d
>> [   13.127473]  [] smp_call_function_mask+0x8f/0xa0
>> [   13.127545]  [] _rdmsr_on_cpu+0x5c/0x60
>> [   13.127616]  []
>> acpi_processor_get_throttling_ptc+0xf3/0x158
>> [   13.127699]  []
>> acpi_processor_get_throttling_info+0x460/0x4af
>> [   13.127782]  [] acpi_processor_start+0x54a/0x606
>> [   13.127861]  [] acpi_processor_add+0x24/0x6b
>> [   13.127933]  [] 
>acpi_start_single_object+0x24/0x46
>> [   13.128005]  []

[BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-27 Thread Kamalesh Babulal

Hi,

A soft lockup is detected during bootup with 2.6.24-rc3-git2 on a powerbox.

BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
NIP: c002f02c LR: d01414fc CTR: c002f018
REGS: c0077cbef0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git2-autotest)
MSR: 80009032   CR: 24022088  XER: 
TASK = c0077cbd8000[375] 'insmod' THREAD: c0077cbec000 CPU: 1
GPR00: d01414fc c0077cbef330 c052b930 d80080002014 
GPR04: d8008000202c  c0077ca1cb00 d014ce54 
GPR08: c0077ca1c63c  002a c002f018 
GPR12: d0143610 c0473d00 
NIP [c002f02c] .ioread8+0x14/0x60
LR [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
Call Trace:
[c0077cbef330] [c0077cbef3c0] 0xc0077cbef3c0 (unreliable)
[c0077cbef3a0] [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
[c0077cbef470] [d01395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
[c0077cbef710] [c01bc118] .pci_device_probe+0x124/0x1b0
[c0077cbef7b0] [c0221138] .driver_probe_device+0x144/0x20c
[c0077cbef850] [c0221450] .__driver_attach+0xcc/0x154
[c0077cbef8e0] [c021ff94] .bus_for_each_dev+0x7c/0xd4
[c0077cbef9a0] [c0220e9c] .driver_attach+0x28/0x40
[c0077cbefa20] [c02204d8] .bus_add_driver+0x90/0x228
[c0077cbefac0] [c0221858] .driver_register+0x94/0xb0
[c0077cbefb40] [c01bc430] .__pci_register_driver+0x6c/0xcc
[c0077cbefbe0] [d0143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
[c0077cbefc80] [c008ce80] .sys_init_module+0x17c4/0x1958
[c0077cbefe30] [c000872c] syscall_exit+0x0/0x40
Instruction dump:
6000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 
f8010010 f821ff91 7c0004ac 8923 <0c09> 4c00012c 79290620 2f8900ff 

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] i386 IOAPIC: de-fang IRQ compression

2007-11-27 Thread Len Brown

commit c434b7a6aedfe428ad17cd61b21b125a7b7a29ce
(x86: avoid wasting IRQs for PCI devices)
created a concept of "IRQ compression" on i386
to conserve IRQ numbers on systems with many
sparsely populated IO APICs.

The same scheme was also added to x86_64,
but later removed when x86_64 received an IRQ over-haul
that made it unnecessary -- including per-CPU
IRQ vectors that greatly increased the IRQ capacity
on the machine.

i386 has not received the analogous over-haul,
and thus a previous attempt to delete IRQ compression
from i386 was rejected on the theory that there may
exist machines that actually need it.  The fact is
that the author of IRQ compression patch was unable
to confirm the actual existence of such a system.

As a result, all i386 kernels with IOAPIC support
pay the following:

1. confusion

IRQ compression re-names the traditional IOAPIC
pin numbers (aka ACPI GSI's) into sequential IRQ #s:

ACPI: PCI Interrupt :00:1c.0[A] -> GSI 20 (level, low) -> IRQ 16
ACPI: PCI Interrupt :00:1c.1[B] -> GSI 21 (level, low) -> IRQ 17
ACPI: PCI Interrupt :00:1c.2[C] -> GSI 22 (level, low) -> IRQ 18
ACPI: PCI Interrupt :00:1c.3[D] -> GSI 23 (level, low) -> IRQ 19
ACPI: PCI Interrupt :00:1c.4[A] -> GSI 20 (level, low) -> IRQ 16

This makes /proc/interrupts look different
depending on system configuration and device probe order.
It is also different than the x86_64 kernel running
on the exact same system.  As a result, programmers
get confused when comparing systems.

2. complexity

The IRQ code in Linux is already overly complex,
and IRQ compression makes it worse.  There have
already been two bug workarounds related to IRQ
compression -- the IRQ0 timer workaround and
the VIA PCI IRQ workaround.

3. size

All i386 kernels with IOAPIC support contain an int[4096] --
a 4 page array to contain the renamed IRQs.

So while the irq compression code on i386 should really
be deleted -- even before merging the x86_64 irq-overhaul,
this patch simply disables it on all high volume systems
to avoid problems #1 and #2 on almost all i386 systems.

A large system with pin numbers >=64 will still have compression
to conserve limited IRQ numbers for sparse IOAPICS.  However,
the vast majority of the planet, those with only pin numbers < 64
will use an identity GSI -> IRQ mapping.

Signed-off-by: Len Brown <[EMAIL PROTECTED]>
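
The mapping policy described above can be sketched in plain C. This is only a
simplified userspace illustration, not the kernel's mp_register_gsi(): the
names sketch_gsi_to_irq, sketch_gsi_irq_map and sketch_next_irq, and the
level_triggered flag, are hypothetical, and the real function also deals with
IOAPIC lookup and pin programming.

```c
#include <assert.h>

#define MAX_GSI_NUM            4096
#define IRQ_COMPRESSION_START  64

/* Remap table for compressed GSIs; 0 means "no IRQ assigned yet". */
static int sketch_gsi_irq_map[MAX_GSI_NUM];
/* Next IRQ number to hand out to a compressed GSI. */
static int sketch_next_irq = IRQ_COMPRESSION_START;

/*
 * Hypothetical helper: return the IRQ for a GSI, or -1 if out of range.
 * GSIs below IRQ_COMPRESSION_START (and edge-triggered ones) map 1:1.
 * Level-triggered GSIs at or above it get sequential IRQ numbers, so
 * sparsely populated IOAPIC pins do not leave gaps.
 */
int sketch_gsi_to_irq(unsigned int gsi, int level_triggered)
{
	if (gsi >= MAX_GSI_NUM)
		return -1;
	if (gsi < IRQ_COMPRESSION_START || !level_triggered)
		return (int)gsi;		/* identity mapping */
	if (sketch_gsi_irq_map[gsi] == 0) {
		/* Sketch only: no handling of IRQ number exhaustion. */
		assert(sketch_next_irq < MAX_GSI_NUM);
		sketch_gsi_irq_map[gsi] = sketch_next_irq++;
	}
	return sketch_gsi_irq_map[gsi];
}
```

With this sketch, GSI 20 stays IRQ 20, while the first level-triggered GSI at
or above 64 that gets registered becomes IRQ 64, the next one IRQ 65, and so on.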

diff --git a/arch/x86/kernel/mpparse_32.c b/arch/x86/kernel/mpparse_32.c
index 7a05a7f..468d6ed 100644
--- a/arch/x86/kernel/mpparse_32.c
+++ b/arch/x86/kernel/mpparse_32.c
@@ -1041,13 +1041,14 @@ void __init mp_config_acpi_legacy_irqs (void)
 }
 
 #define MAX_GSI_NUM4096
+#define IRQ_COMPRESSION_START  64
 
 int mp_register_gsi(u32 gsi, int triggering, int polarity)
 {
int ioapic = -1;
int ioapic_pin = 0;
int idx, bit = 0;
-   static int pci_irq = 16;
+   static int pci_irq = IRQ_COMPRESSION_START;
/*
 * Mapping between Global System Interrups, which
 * represent all possible interrupts, and IRQs
@@ -1086,12 +1087,16 @@ int mp_register_gsi(u32 gsi, int triggering, int 
polarity)
if ((1<= 64, use IRQ compression
+*/
+   if ((gsi >= IRQ_COMPRESSION_START)
+   && (triggering == ACPI_LEVEL_SENSITIVE)) {
/*
 * For PCI devices assign IRQs in order, avoiding gaps
 * due to unused I/O APIC pins.

Re: __rcu_process_callbacks() in Linux 2.6

2007-11-27 Thread Paul E. McKenney

On Tue, Nov 27, 2007 at 05:49:15PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 26, 2007 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 26, 2007 at 02:48:08PM -0800, James Huang wrote:
> > > > From: James Huang [mailto:[EMAIL PROTECTED]
> > > > Sent: Monday, November 26, 2007 2:21 PM
> > > > To: James Huang
> > > > Subject: Fw: __rcu_process_callbacks() in Linux 2.6
> > > > 
> > > > - Forwarded Message 
> > > > From: Manfred Spraul <[EMAIL PROTECTED]>
> > > > To: James Huang <[EMAIL PROTECTED]>
> > > > Cc: Paul E. McKenney <[EMAIL PROTECTED]>; linux-
> > > > [EMAIL PROTECTED]
> > > > Sent: Monday, November 26, 2007 10:28:37 AM
> > > > Subject: __rcu_process_callbacks() in Linux 2.6
> > > > 
> > > > Hi James,
> > > > 
> > > > If I understand the issue correctly, then the race is:
> > > > 
> > > > step 1: cpu 1: starts a new rcu batch (i.e. rcp->cur++, smp_mb)
> > > > 
> > > > step 2: cpu 2: completes the quiet state
> > > > step 3: cpu 2: reads pointer 0x123 (ptr to a rcu protected struct)
> > > > 
> > > > step 4: cpu 3: call_rcu(0x123): rcu protected struct added to
> > > rdp->nxtlist
> > > > step 5: cpu 3: moves a new batch into rdp->curlist, rdp->batch = rcp-
> > > > >cur+1.
> > > > xxx Problem: where is the smp_rmb() that guarantees that
> > > > xxx  update to rcp->cur from step 1 is seen by cpu 3?
> > > > step 6: cpu 3: completes quiet state
> > > > step 7: cpu 3: struct 0x123 destroyed
> > > > 
> > > > step 8: cpu 2: accesses pointer 0x123, but the struct is already
> > > destroyed
> > > > 
> > > > James: Is that the race?
> > > 
> > > 
> > > [James Huang] 
> > > 
> > > Yes, this is the race condition that I am concerned about.
> > > 
> > > 
> > > > 
> > > > I agree with Paul, there are smp_rmb's on cpu 3 between Step 1 and
> > > Step 5:
> > > > Either the test_and_set_bit in tasklet_action for rcu_process_callback
> > > > if step 4 happens before the tasklet or somewhere in the irq handler
> > > > path if step 4 happens in an irq handler that interrupted
> > > > rcu_process_callback.
> > > > 
> > > > Thus theoretically no additional smp_rmb() should be necessary.
> > > > What is missing is proper documentation.
> > > > 
> > > 
> > > 
> > > [James Huang] 
> > > 
> > > Is it true that a smp_rmb() before a read operation (say from variable
> > > X) will guarantee that the read will always retrieve the most "current"
> > > value of X?   I can not find such a guarantee in atomic_ops.txt or
> > > memory-barriers.txt under Linux's documentation directory.  What is
> > > described in both documents is relative ordering, e.g.
> > > 
> > >   CPU1                    CPU2
> > >   --------------          --------------
> > >   write X = x1
> > >   smp_wmb()
> > >   write Y = y1
> > > 
> > >                           read Y
> > >                           smp_rmb()
> > >                           read X
> > > 
> > > Then CPU2 will read X with a value of x1 if it reads Y with a value of
> > > y1.
> > > 
> > > Please point me to the right section in the document if smp_rmb() does
> > > provide such a guarantee.
> > 
> > You are correct, smp_rmb() is about ordering rather than about any sort
> > of immediacy.  For one thing, it can be quite difficult to say exactly what
> > the most "current" version of X might be at a given point in time from
> > the viewpoint of a given CPU -- the different CPUs might well disagree as
> > to what the "current" version is for awhile (though they are guaranteed
> > to come to agreement).
> > 
> > > Thanks,
> > > -- James Huang
> > > 
> > > > I'm analyzing the code right now:
> > > > Is it really true that typically a cpu only completes data in every
> > > other
> > > > rcu
> > > > cycle? I.e. that most structures are stored in the rcu callback list
> > > until
> > > > two
> > > > quiet states happened?
> > 
> > That is correct.  This does mean that we should be able to leverage
> > locking primitives and memory barriers executed from the scheduling
> > clock interrupt.
> 
> And I managed to get some time on a 64-CPU POWER5+ system.  Been running
> rcutorture for about 2.5 hours without a failure (128 reader processes)
> running through not quite 1.5M RCU updates.  Of course, this is not
> proof that the Classic RCU implementation works, but it should provide
> some reassurance.
> 
> I will keep it running until I get kicked off (probably rather soon).

More than seven hours, more than 4M RCU updates without failure.
Someone else's turn for the machine.

Again, not proof, but at least some reassurance.

Thanx, Paul
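
The point made earlier in this thread, that smp_wmb()/smp_rmb() give relative
ordering rather than immediacy, can be sketched in userspace C11. This is an
illustration under stated assumptions: the C11 release/acquire fences are
stand-ins for the kernel barriers (they are not the same primitives), and the
names cpu1_publish and cpu2_consume are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int X;			/* payload, written before the flag */
static atomic_int Y;		/* flag, written after the payload */

/* "CPU1": write X, barrier, write Y. */
void cpu1_publish(int x1)
{
	X = x1;
	atomic_thread_fence(memory_order_release);	/* stand-in for smp_wmb() */
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
}

/*
 * "CPU2": read Y, barrier, read X.
 * Returns 1 and stores X in *out only if the new Y was observed.
 * Returning 0 is always allowed: the barriers say nothing about WHEN
 * the store to Y becomes visible, only that a reader which sees Y == 1
 * must also see X == x1.
 */
int cpu2_consume(int *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&Y, memory_order_relaxed) != 1)
		return 0;
	atomic_thread_fence(memory_order_acquire);	/* stand-in for smp_rmb() */
	*out = X;
	return 1;
}
```

A reader thread would call cpu2_consume() in a loop until it returns 1; how
long that takes is not bounded by either barrier.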
> 
> > > > I've tried to track the values of rcp->cur and rdp->batch.
> > > > If next_pending is set, then cpu_quiet() immediately starts
> > > > the next rcu cycle and a cpu cannot both complete the currently
> > > > pending rcu callbacks and add new callbacks to the next cycle,
> > > > thus a cpu

Re: Fw: Re: [PATCH 1/3] signal(i386): alternative signal stack wraparound occurs

2007-11-27 Thread Shi Weihua



Roland McGrath wrote:
> cf http://lkml.org/lkml/2007/10/3/41
> 
> To summarize: on Linux, SA_ONSTACK decides whether you are already on the
> signal stack based on the value of the SP at the time of a signal.  If
> you are not already inside the range, you are not "on the signal stack"
> and so the new signal handler frame starts over at the base of the signal
> stack.
> 
> sigaltstack (and sigstack before it) was invented in BSD.  There, the
> SA_ONSTACK behavior has always been different.  It uses a kernel state
> flag to decide, rather than the SP value.  When you first take an
> SA_ONSTACK signal and switch to the alternate signal stack, it sets the
> SS_ONSTACK flag in the thread's sigaltstack state in the kernel.
> Thereafter you are "on the signal stack" and don't switch SP before
> pushing a handler frame no matter what the SP value is.  Only when you
> sigreturn from the original handler context do you clear the SS_ONSTACK
> flag so that a new handler frame will start over at the base of the
> alternate signal stack.
> 
> The undesirable effect of the Linux behavior is that an overflow of the
> alternate signal stack can not only go undetected, but lead to a ring
> buffer effect of clobbering the original handler frame at the base of the
> signal stack for each successive signal that comes just after the
> overflow.  This is what Shi Weihua's test case demonstrates.  Normally
> this does not come up because of the signal mask, but the test case uses
> SA_NODEFER for its SIGSEGV handler.
> 
> The other subtle part of the existing Linux semantics is that a simple
> longjmp out of a signal handler serves to take you off the signal stack
> in a safe and reliable fashion without having used sigreturn (nor having
> just returned from the handler normally, which means the same).  After
> the longjmp (or even informal stack switching not via any proper libc or
> kernel interface), the alternate signal stack stands ready to be used
> again.
> 
> A paranoid program would allocate a PROT_NONE red zone around its
> alternate signal stack.  Then a small overflow would trigger a SIGSEGV in
> handler setup, and be fatal (core dump) whether or not SIGSEGV is
> blocked.  As with thread stack red zones, that cannot catch all overflows
> (or underflows).  e.g., a local array as large as page size allocated in
> a function called from a handler, but not actually touched before more
> calls push more stack, could cause an overflow that silently pushes into
> some unrelated allocated pages.
> 
> The BSD behavior does not do anything in particular about overflow.  But
> it does at least avoid the wraparound or "ring buffer effect", so you'll
> just get a straightforward all-out overflow down your address space past
> the low end of the alternate signal stack.  I don't know what the BSD
> behavior is for longjmp out of an SA_ONSTACK handler.
> 
> The POSIX wording relating to sigaltstack is pretty minimal.  I don't
> think it speaks to this issue one way or another.  (The program that
> overflows its stack is clearly in undefined behavior territory of one
> sort or another anyhow.)
> 
> Given the longjmp issue and the potential for highly subtle complications
> in existing programs relying on this in arcane ways deep in their code, I
> am very dubious about changing the behavior to the BSD style persistent
> flag.  I think Shi Weihua's patches have a similar effect by tracking the
> SP used in the last handler setup.
> 
> I think it would be sensible for the signal handler setup code to detect
> when it would itself be causing a stack overflow.  Maybe something like
> the following patch (untested).  This issue exists in the same way on all
> machines, so ideally they would all do a similar check.  
> 
> When it's the handler function itself or its callees that cause the
> overflow, rather than the signal handler frame setup alone crossing the
> boundary, this still won't help.  But I don't see any way to distinguish
> that from the valid longjmp case.
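
The "ring buffer effect" described above can be modelled with a few lines of
C. This is a sketch based only on the description in this message (the
decision is made from the SP value alone); the struct and function names are
hypothetical and no real signal delivery is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_altstack {
	uintptr_t ss_sp;	/* lowest address of the alternate stack */
	size_t    ss_size;	/* size in bytes */
};

/* SP-based test: "on the signal stack" iff SP lies inside the range. */
int sketch_on_sig_stack(const struct sketch_altstack *s, uintptr_t sp)
{
	/* Unsigned wraparound makes any sp below ss_sp fail the test. */
	return sp - s->ss_sp < s->ss_size;
}

/* Where the next handler frame ends up (the stack grows down). */
uintptr_t sketch_next_frame_sp(const struct sketch_altstack *s,
			       uintptr_t sp, size_t frame)
{
	assert(frame <= s->ss_size);
	if (!sketch_on_sig_stack(s, sp))
		sp = s->ss_sp + s->ss_size;	/* start over at the base */
	return sp - frame;
}
```

Feeding the returned SP back in repeatedly shows the wraparound: frames walk
down the alternate stack, one frame lands below ss_sp (the overflow), the
range test then reports "not on the signal stack", and the following frame is
placed at the base again, on top of the first handler's frame.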

Thank you for your detailed explanation and patch.
I tested your patch; unfortunately, it cannot stop all kinds of overflow.

The esp is a user space pointer, so the user can move it. 
For example, the user defines "int i[1000];" in the handler function.
Please run the following test program, and pay attention to "int i[1000];".
-
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include 

volatile int counter = 0;

#ifdef __i386__
void print_esp()
{
        unsigned long esp;
        __asm__ __volatile__("movl %%esp, %0":"=g"(esp));

        printf("esp = 0x%08lx\n", esp);
}
#endif

void segv_handler()
{
#ifdef __i386__
        print_esp();
#endif

        int *c = NULL;
        counter++;
        printf("%d\n", counter);

        int i[1000];    // <- Pay attention here.

        *c = 1;         // SEGV
}

int main()
{
        int *c = NULL;
        char *s = malloc(SIGSTKSZ);
        stack_t stack;

m68k build failure

2007-11-27 Thread Andrew Morton


Current Linus tree gives me this, with m68k allmodconfig:

FATAL: drivers/bluetooth/btsdio: sizeof(struct sdio_device_id)=12 is not a 
modulo of the size of section __mod_sdio_device_table=30.
Fix definition of struct sdio_device_id in mod_devicetable.h

which I haven't seen before.  Any ideas?

Thanks.

Re: Oops with 2.6.24 git when loading iwl3945

2007-11-27 Thread Andrew Morton

On Tue, 27 Nov 2007 15:43:57 -0500 "Thomas Tuttle" <[EMAIL PROTECTED]> wrote:

> Hey.
> 
> I'm using a git snapshot that gentoo distributed mere hours ago (so I'm
> fairly confident it's current), and I'm getting an Oops when I try to
> load the iwl3945 driver.  I've attached it as plain text.
> 

Let's cc linux-wireless.

> ieee80211_crypt: registered algorithm 'NULL'
> ieee80211_crypt: registered algorithm 'WEP'
> iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for 
> Linux, 1.1.17kds
> iwl3945: Copyright(c) 2003-2007 Intel Corporation
> ACPI: PCI Interrupt :0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
> PCI: Setting latency timer of device :0c:00.0 to 64
> iwl3945: Detected Intel PRO/Wireless 3945ABG Network Connection
> iwl3945: Tunable channels: 11 802.11bg, 13 802.11a channels
> general protection fault:  [1] SMP 
> CPU 0 
> Modules linked in: iwl3945 ieee80211_crypt_wep ieee80211_crypt mac80211 
> cfg80211 i915 drm snd_seq_oss snd_seq_device snd_seq_midi_event snd_seq 
> snd_pcm_oss snd_mixer_oss snd_hda_intel snd_pcm snd_timer snd soundcore 
> snd_page_alloc coretemp hwmon rtc rfcomm l2cap hci_usb bluetooth uhci_hcd 
> ehci_hcd usbcore evdev b44 ssb psmouse
> Pid: 5230, comm: iwl3945/0 Not tainted 2.6.24-rc3-git2 #1
> RIP: 0010:[]  [] strcmp+0x0/0x1a
> RSP: 0018:810074d93dd8  EFLAGS: 00010287
> RAX: 881c2e80 RBX: 8100752b84e0 RCX: 810074d92000
> RDX: 810074de2073 RSI: 881c086b RDI: 00020074016b
> RBP: 881c8e80 R08: 810074d92000 R09: 810002bf77a0
> R10:  R11: 0001 R12: 88197540
> R13: 810074de2060 R14: 810074de2030 R15: 0038
> FS:  () GS:8059d000() knlGS:
> CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 2b313ecef000 CR3: 7d10 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process iwl3945/0 (pid: 5230, threadinfo 810074d92000, task 
> 81007c812000)
> Stack:  88183a0e 810074d52220 810074d518c0 810074de2000
>  881a81e2   74d93e30
>   81007c812000  
> Call Trace:
>  [] :mac80211:ieee80211_rate_control_register+0x3d/0xdd
>  [] :iwl3945:iwl_bg_alive_start+0xc96/0xef9
>  [] thread_return+0x3d/0x81
>  [] :iwl3945:iwl_bg_alive_start+0x0/0xef9
>  [] run_workqueue+0x7f/0x10b
>  [] worker_thread+0x0/0xe4
>  [] worker_thread+0xda/0xe4
>  [] autoremove_wake_function+0x0/0x2e
>  [] kthread+0x47/0x75
>  [] child_rip+0xa/0x12
>  [] kthread+0x0/0x75
>  [] child_rip+0x0/0x12
> 
> 
> Code: 8a 17 89 d0 2a 06 48 ff c6 84 c0 75 09 84 d2 74 05 48 ff c7 
> RIP  [] strcmp+0x0/0x1a
>  RSP 
> 

I have a shiny new t61p which uses iwl3945.  It barely runs carefully
selected bits of the fc8 2.6.23-based kernel and heaven knows what 2.6.24
will do to it.  It will join my Vaio as a
tool-of-kernel-developer-tormenting.  This means that your wireless should
keep working ;)

Re: [uml-devel] leak in do_ubd_request

2007-11-27 Thread Andrew Morton

On Tue, 27 Nov 2007 17:20:20 -0500 Jeff Dike <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 27, 2007 at 09:29:23PM +0100, Miklos Szeredi wrote:
> > Sure.  The patch works for me, but please check that it also makes
> > sense.
> 
> I did - it's a straight-forward leak and fix.
> 

Do we have any idea which patch this patch fixes?

Re: Dynticks Causing High Context Switch Rate in ksoftirqd

2007-11-27 Thread Andrew Morton

On Mon, 26 Nov 2007 22:36:17 -0600 Robert Hancock <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> > system? The problem occurs if I config the kernel with tickless
> > support (i.e. CONFIG_TICK_ONESHOT=y).  (Thanks to "oprofile" for putting me
> > onto this.)
> > 
> > I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> > 2.6.23.9
> > 
> > **
> > *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> > *** This is on a idle machine! ***
> > **
> > 
> > procs ---memory-- ---swap-- -io --system--
> > cpu
> >  r  b   swpd   free   buff  cache   si   sobibo   incs us sy
> > id wa
> >  0  0  0 1925556   4768 11610400   124 26  7538  1  2
> > 96  1
> >  0  0  0 1925556   4768 11610400 0 02 147329  0  1
> > 99  0
> 
> What did oprofile show? It should be able to narrow down what 
> function(s) are responsible for the CPU usage..

Sigh.  I just asked a similar thing.   Let's look at the mail headers:

Message-ID: <[EMAIL PROTECTED]>
...
From: [EMAIL PROTECTED]

From: Robert Hancock <[EMAIL PROTECTED]>
...
In-reply-to: 

Please fix your email client so as to not break threading?

Re: Dynticks Causing High Context Switch Rate in ksoftirqd

2007-11-27 Thread Andrew Morton

On Mon, 26 Nov 2007 20:36:32 -0600 (CST) [EMAIL PROTECTED] wrote:

> Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> system? The problem occurs if I config the kernel with tickless
> support (i.e. CONFIG_TICK_ONESHOT=y).  (Thanks to "oprofile" for putting me
> onto this.)

Beware that oprofile can provide misleading results on a partially-idle
system.  You may have discovered that ksoftirqd is consuming 5-10% of the
non-idle time on that idle system, which is less surprising.

> I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> 2.6.23.9
> 
> **
> *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> *** This is on a idle machine! ***
> **
> 
> procs ---memory-- ---swap-- -io --system--
> cpu
>  r  b   swpd   free   buff  cache   si   sobibo   incs us sy
> id wa
>  0  0  0 1925556   4768 11610400   124 26  7538  1  2
> 96  1
>  0  0  0 1925556   4768 11610400 0 02 147329  0  1
> 99  0
>  0  0  0 1925548   4768 11610400 0 00 154515  0  1
> 99  0
>  0  0  0 1925548   4768 11610400 0 01 153898  0  2
> 98  0
>  0  0  0 1925548   4780 11610400 0163 155216  0  1
> 99  0
>  0  0  0 1925548   4780 11610400 0 01 161718  0  1
> 99  0
>  0  0  0 1925548   4780 11610400 0 00 147587  0  2
> 98  0
>  0  0  0 1925548   4780 11610400 0 01 153524  0  2
> 98  0
>  0  0  0 1925448   4780 11610400 0 00 153434  0  1
> 99  0
>  0  0  0 1925448   4792 11609200 0164 153527  0  2
> 98  0

So what piece of code is scheduling so much?  What does `top' say?  What
does the (sorted) output of oprofile look like?

Did you try shutting down as much userspace code as possible to find out if
some userspace task is misbehaving?


Re: Question regarding mutex locking

2007-11-27 Thread Larry Finger

Robert Hancock wrote:
> Larry Finger wrote:
>> If a particular routine needs to lock a mutex, but it may be entered
>> with that mutex already locked,
>> would the following code be SMP safe?
>>
>> hold_lock = mutex_trylock()
>>
>> ..
>>
>> if (hold_lock)
>> mutex_unlock()
> 
> Not if another task could be acquiring that lock at the same time, which
> is probably the case, otherwise you wouldn't need the mutex. In other
> words, if you're going to do this, you might as well toss the mutex
> entirely as it's about the same effect..
> 

Thanks for the help. Someday, I hope to understand this stuff.

Larry
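
The problem with the trylock pattern in this thread can be sketched in
userspace with pthreads as a stand-in for the kernel mutex API (note that
pthread_mutex_trylock() returns 0 on success, unlike the kernel's
mutex_trylock()). All sketch_* names are hypothetical. The second half shows
one common alternative, which is an assumption and was not proposed in the
thread: split the routine into a variant that requires the lock to be held
and a wrapper that takes it.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_counter;

/*
 * The pattern from the question. Returns 1 if the body ran WITHOUT this
 * caller having acquired the lock. A failed trylock only says "somebody
 * holds it"; it cannot tell whether that somebody is our own caller or
 * an unrelated task, so the update below may be unprotected.
 */
int sketch_update_trylock(void)
{
	int got_lock = (pthread_mutex_trylock(&sketch_lock) == 0);

	sketch_counter++;
	if (got_lock)
		pthread_mutex_unlock(&sketch_lock);
	return !got_lock;
}

/* Alternative: the caller must already hold sketch_lock. */
void sketch_update_locked(void)
{
	sketch_counter++;
}

/* Alternative: entry point for callers that do not hold the lock. */
void sketch_update(void)
{
	int rc = pthread_mutex_lock(&sketch_lock);

	assert(rc == 0);
	(void)rc;
	sketch_update_locked();
	pthread_mutex_unlock(&sketch_lock);
}
```

With the split, each call site states which case it is in, instead of the
routine guessing from the lock state.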


Re: [patch/rfc 1/4] GPIO implementation framework

2007-11-27 Thread eric miao

Sorry, I thought you wanted a preliminary one. Here's the tested one,
which incorporates your comments except for one:

if the caller holds gpio_lock, gpio_ensure_requested() is actually
safe.

Please review:

---
 include/asm-generic/gpio.h |   35 +++-
 lib/gpiolib.c  |  212 +++-
 2 files changed, 104 insertions(+), 143 deletions(-)

diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h
index 869b739..34e60ba 100644
--- a/include/asm-generic/gpio.h
+++ b/include/asm-generic/gpio.h
@@ -14,14 +14,22 @@
  */

 #ifndef ARCH_NR_GPIOS
-#define ARCH_NR_GPIOS  512
+#define ARCH_NR_GPIOS  256
 #endif

-#ifndef ARCH_GPIOS_PER_CHIP
-#define ARCH_GPIOS_PER_CHIPBITS_PER_LONG
+struct seq_file;
+struct gpio_chip;
+
+struct gpio_desc {
+   struct gpio_chip*chip;
+   unsignedis_out:1;
+   unsignedrequested:1;
+#ifdef CONFIG_DEBUG_FS
+   const char  *requested_label;
 #endif
+};

-struct seq_file;
+extern struct gpio_desc gpio_desc[ARCH_NR_GPIOS];

 /**
  * struct gpio_chip - abstract a GPIO controller
@@ -41,8 +49,6 @@ struct seq_file;
  * (base + ngpio - 1).
  * @can_sleep: flag must be set iff get()/set() methods sleep, as they
  * must while accessing GPIO expander chips over I2C or SPI
- * @is_out: bit array where bit N is true iff GPIO with offset N has been
- *  called successfully to configure this as an output
  *
  * A gpio_chip can help platforms abstract various sources of GPIOs so
  * they can all be accessed through a common programing interface.
@@ -70,17 +76,6 @@ struct gpio_chip {
int base;
u16 ngpio;
unsignedcan_sleep:1;
-
-   /* other fields are modified by the gpio library only */
-   DECLARE_BITMAP(is_out, ARCH_GPIOS_PER_CHIP);
-
-#ifdef CONFIG_DEBUG_FS
-   /* fat bits */
-   const char  *requested[ARCH_GPIOS_PER_CHIP];
-#else
-   /* thin bits */
-   DECLARE_BITMAP(requested, ARCH_GPIOS_PER_CHIP);
-#endif
 };

 /* returns true iff a given gpio signal has been requested;
@@ -89,11 +84,7 @@ struct gpio_chip {
 static inline int
 gpiochip_is_requested(struct gpio_chip *chip, unsigned offset)
 {
-#ifdef CONFIG_DEBUG_FS
-   return chip->requested[offset] != NULL;
-#else
-   return test_bit(offset, chip->requested);
-#endif
+   return gpio_desc[chip->base + offset].requested;
 }

 /* add/remove chips */
diff --git a/lib/gpiolib.c b/lib/gpiolib.c
index a853715..6050af5 100644
--- a/lib/gpiolib.c
+++ b/lib/gpiolib.c
@@ -28,39 +28,30 @@
 #defineextra_checks0
 #endif

-/* gpio_lock protects the table of chips and gpio_chip->requested.
+/* gpio_lock protects the table of gpio_desc[] and desc->requested.
  * While any GPIO is requested, its gpio_chip is not removable;
  * each GPIO's "requested" flag serves as a lock and refcount.
  */
 static DEFINE_SPINLOCK(gpio_lock);
-static struct gpio_chip *chips[DIV_ROUND_UP(ARCH_NR_GPIOS,
-   ARCH_GPIOS_PER_CHIP)];
-
+struct gpio_desc gpio_desc[ARCH_NR_GPIOS];

 /* Warn when drivers omit gpio_request() calls -- legal but
  * ill-advised when setting direction, and otherwise illegal.
  */
-static void gpio_ensure_requested(struct gpio_chip *chip, unsigned offset)
+static void gpio_ensure_requested(struct gpio_desc *desc, unsigned gpio)
 {
-   int requested;
-
 #ifdef CONFIG_DEBUG_FS
-   requested = (int) chip->requested[offset];
-   if (!requested)
-   chip->requested[offset] = "[auto]";
-#else
-   requested = test_and_set_bit(offset, chip->requested);
+   if (!desc->requested)
+   desc->requested_label = "(auto)";
 #endif
-
-   if (!requested)
-   printk(KERN_DEBUG "GPIO-%d autorequested\n",
-   chip->base + offset);
+   if (!desc->requested)
+   printk(KERN_DEBUG "GPIO-%d autorequested\n", gpio);
 }

 /* caller holds gpio_lock *OR* gpio is marked as requested */
 static inline struct gpio_chip *gpio_to_chip(unsigned gpio)
 {
-   return chips[gpio / ARCH_GPIOS_PER_CHIP];
+   return gpio_desc[gpio].chip;
 }

 /**
@@ -75,26 +66,25 @@ static inline struct gpio_chip *gpio_to_chip(unsigned gpio)
 int gpiochip_add(struct gpio_chip *chip)
 {
unsigned long   flags;
-   int status = 0;
unsigned        id;

-   if (chip->base < 0 || (chip->base % ARCH_GPIOS_PER_CHIP) != 0)
-   return -EINVAL;
-   if ((chip->base + chip->ngpio) >= ARCH_NR_GPIOS)
-   return -EINVAL;
-   if (chip->ngpio > ARCH_GPIOS_PER_CHIP)
+   if (chip->base < 0 || (chip->base  + chip->ngpio) >= ARCH_NR_GPIOS)
return -EINVAL;

spin_lock_irqsave(&gpio_lock, flags);

-   id = chip->base / ARCH_GPIOS_PER_CHIP;
-   if (chips[id] == NULL)
-   chips[id] = chip;

[PATCH 2/2] x86: eflags enum

2007-11-27 Thread Roland McGrath


This removes the EF_* enum from include/asm-x86/ptrace.h.  It is no longer
used, and duplicates the X86_EFLAGS_* constants.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 include/asm-x86/ptrace.h |   22 --
 1 files changed, 0 insertions(+), 22 deletions(-)

diff --git a/include/asm-x86/ptrace.h b/include/asm-x86/ptrace.h
index 04204f3..5d73bfc 100644
--- a/include/asm-x86/ptrace.h
+++ b/include/asm-x86/ptrace.h
@@ -116,28 +116,6 @@ extern int ptrace_set_debugreg(struct task_struct *child, int n, unsigned long);
 extern unsigned long
 convert_rip_to_linear(struct task_struct *child, struct pt_regs *regs);
 
-enum {
-   EF_CF   = 0x0001,
-   EF_PF   = 0x0004,
-   EF_AF   = 0x0010,
-   EF_ZF   = 0x0040,
-   EF_SF   = 0x0080,
-   EF_TF   = 0x0100,
-   EF_IE   = 0x0200,
-   EF_DF   = 0x0400,
-   EF_OF   = 0x0800,
-   EF_IOPL = 0x3000,
-   EF_IOPL_RING0 = 0x0000,
-   EF_IOPL_RING1 = 0x1000,
-   EF_IOPL_RING2 = 0x2000,
-   EF_NT   = 0x4000,   /* nested task */
-   EF_RF   = 0x00010000,   /* resume */
-   EF_VM   = 0x00020000,   /* virtual mode */
-   EF_AC   = 0x00040000,   /* alignment */
-   EF_VIF  = 0x00080000,   /* virtual interrupt */
-   EF_VIP  = 0x00100000,   /* virtual interrupt pending */
-   EF_ID   = 0x00200000,   /* id */
-};
 #endif /* __KERNEL__ */
 #endif /* !__i386__ */
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] x86: setup64 eflags constants

2007-11-27 Thread Roland McGrath


This cleans up arch/x86/kernel/setup64.c to use the X86_EFLAGS_* constants
instead of the EF_* enum.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/setup64.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup64.c b/arch/x86/kernel/setup64.c
index 3558ac7..51297cc 100644
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -169,7 +169,8 @@ void syscall_init(void)
 #endif
 
/* Flags to clear on syscall */
-   wrmsrl(MSR_SYSCALL_MASK, EF_TF|EF_DF|EF_IE|0x3000); 
+   wrmsrl(MSR_SYSCALL_MASK,
+  X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|X86_EFLAGS_IOPL);
 }
 
 void __cpuinit check_efer(void)
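
The replacement in the hunk above is meant to be value-preserving: the old expression ORed three EF_* bits with a literal 0x3000, and the new one spells that literal as X86_EFLAGS_IOPL. A minimal userspace sketch of the arithmetic follows. The constant values are the architectural EFLAGS bit positions; the helper name syscall_flag_mask() is hypothetical and exists only for this illustration.

```c
#include <stdint.h>

/* EFLAGS bits as architecturally defined for x86. */
#define X86_EFLAGS_TF   0x00000100u /* trap flag */
#define X86_EFLAGS_IF   0x00000200u /* interrupt enable flag */
#define X86_EFLAGS_DF   0x00000400u /* direction flag */
#define X86_EFLAGS_IOPL 0x00003000u /* I/O privilege level (two bits) */

/* Hypothetical helper: the mask the patch passes for MSR_SYSCALL_MASK. */
static inline uint64_t syscall_flag_mask(void)
{
	return X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF | X86_EFLAGS_IOPL;
}
```

The result is 0x3700, the same value the old EF_TF|EF_DF|EF_IE|0x3000 expression produced.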

Re: [RFC] Re: nozomi version 2.1d for review

2007-11-27 Thread Frank Seidel

On Tuesday 27 November 2007 23:12:20, you (Greg KH) wrote:
> Frank, want me to take the last version you just sent out and update my
> version?

Yes, that would be great.

Thanks a lot,
Frank

Re: 2.6.24-rc3-mm1 make headers_check fails

2007-11-27 Thread Andrew Morton

On Wed, 21 Nov 2007 12:17:14 +0200 Avi Kivity <[EMAIL PROTECTED]> wrote:

> Avi Kivity wrote:
> >   
> >> The make headers_check fails,
> >>
> >>  CHECK   include/linux/usb/gadgetfs.h
> >>  CHECK   include/linux/usb/ch9.h
> >>  CHECK   include/linux/usb/cdc.h
> >>  CHECK   include/linux/usb/audio.h
> >>  CHECK   include/linux/kvm.h
> >> /root/kernels/linux-2.6.24-rc3/usr/include/linux/kvm.h requires 
> >> asm/kvm.h, which does not exist in exported headers
> >>
> > hm, works for me, on i386 and x86_64.  What's different over there?
> >
>  Hi Andrew,
> 
>  It fails on the powerpc box, with allyesconfig option.
> 
>   
>    
> >>> How do we fix this?  Export linux/kvm.h only on x86?  Seems ugly.
> >>> 
> >>
> >> Is kvm x86 specific? Then move the .h file to asm-x86.
> >> Otherwise no good idea...
> >>
> >>   
> >
> > kvm.h is x86 specific today, but will be s390, ppc, ia64, and x86 
> > specific tomorrow.
> >
> > What about having an asm-generic/kvm.h with a nice #error?  Would
> > that suit?
> >
> 
> headers_check continues to complain.  Is the only recourse to add 
> asm/kvm.h for all archs?
> 

That would work.

Meanwhile my recourse is to drop the kvm tree ;)

A question about slab: double free detected in cache

2007-11-27 Thread 旭东王

Hi,

The environment of the device is as follows:

The kernel version is 2.6.14; it is the kernel from the OS distribution
(Fedora 2). The file system is ext3. The CPU type is ARM9.

I use "yafc" to upload a file from the SD card in the device to an FTP
server over Wi-Fi.

I have tried twice, and both attempts triggered the problem below:

1)
*
slab: double free detected in cache 'journal_head', objp c0613dc8
kernel BUG at mm/slab.c:2641!
Unable to handle kernel NULL pointer dereference at virtual address 
pgd = c0004000
[] *pgd=
Internal error: Oops: 817 [#1]
Modules linked in: rt73 fxcgpio rtc_pcf8563 keypad gpio_irda hi_sio
tlv320 hi3510_vs adv7179 ohci_hcd tvp5150 hi_i2c hi_gpio
CPU: 0
PC is at __bug+0x44/0x58
LR is at vprintk+0x304/0x368
pc : []lr : []Not tainted
sp : c2ee9e10  ip : c2ee9d70  fp : c2ee9e1c
r10: 000b  r9 : 0010  r8 : c0613dc8
r7 : c2c382c0  r6 : 0033  r5 : c2c179f0  r4 : 
r3 :   r2 : 0004  r1 : c2ee8000  r0 : 0001
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  Segment user
Control: 5317F  Table: 62F6  DAC: 0015
Process kjournald (pid: 324, stack limit = 0xc2ee8194)
Stack: (0xc2ee9e10 to 0xc2eea000)
9e00: c2ee9e50 c2ee9e20 c00651cc c0026ed4
9e20:  c2c3c7d8 c2c3b1c8 0010 c2c179f0 c2c3c7b8 c2c382c0 
9e40: c2c17a18 c2ee9e78 c2ee9e54 c00653f4 c0065108 c00d8cb8 c2c382c0 c2c3c7b8
9e60: c2f39888 4013 c2a54268 c2ee9e9c c2ee9e7c c0064f84 c0065358 c2f3988c
9e80: c1e886f8   c2a5413c c2ee9eb0 c2ee9ea0 c00d8cb8 c0064efc
9ea0: c2f3988c c2ee9ed4 c2ee9eb4 c00d976c c00d8c7c c194a644 c1e886f8 c2ee8000
9ec0: 007d c1d74848 c2ee9ee8 c2ee9ed8 c00d9824 c00d95c4 c1e886f8 c2ee9f80
9ee0: c2ee9eec c00d2da0 c00d97b4 c2ee9f08 c2a54150   
9f00:  c217c8f8 c212d38c 511b 3b9aca00 c2ee9f40 c2ee9f24 c0037864
9f20:  c2ee9f40 c208b5a8 c2a541f0 8013 c2a541f0 c2a541f0 c2ee8000
9f40: 8013 c2ee9f64 c2ee9f54 c2a54150 c2a5413c c02fdd58 c2a54150 c2a5413c
9f60: c02fdd58  c2ee8000 c02fdd58 c2a541c0 c2ee9ff4 c2ee9f84 c00d6d78
9f80: c00d2734 c2ee9f80  c2f863a0 c005216c c2ee9fa8 c2ee9fa8 
9fa0: c2f863a0 c005216c c2ee9fa8 c2ee9fa8  00200200 001a6049 4b87ad6e
9fc0: c00d6c4c c2f863a0 c02f96f8     
9fe0:    c2ee9ff8 c003e304 c00d6c70  
Backtrace:
[] (__bug+0x0/0x58) from [] (free_block+0xd4/0x19c)
[] (free_block+0x0/0x19c) from []
(cache_flusharray+0xac/0x130)
[] (cache_flusharray+0x0/0x130) from []
(kmem_cache_free+0x98/0xb4)
[] (kmem_cache_free+0x0/0xb4) from []
(journal_free_journal_head+0x4c/0x58)
 r8 = C2A5413C  r7 =   r6 =   r5 = C1E886F8
 r4 = C2F3988C
[] (journal_free_journal_head+0x0/0x58) from []
(__journal_remove_journal_head+0x1b8/0x1f0)
 r4 = C2F3988C
[] (__journal_remove_journal_head+0x0/0x1f0) from
[] (journal_remove_journal_head+0x80/0xe0)
 r7 = C1D74848  r6 = 007D  r5 = C2EE8000  r4 = C1E886F8
[] (journal_remove_journal_head+0x0/0xe0) from []
(journal_commit_transaction+0x67c/0x19f0)
 r4 = C1E886F8
[] (journal_commit_transaction+0x0/0x19f0) from []
(kjournald+0x118/0x318)
[] (kjournald+0x0/0x318) from [] (do_exit+0x0/0xc38)
Code: eb005734 e59f0014 eb005732 e3a03000 (e5833000)
 <6>note: kjournald[324] exited with preempt_count 3
BUG: spinlock cpu recursion on CPU#0, subwayD/328
 lock: c2a54268, .magic: dead4ead, .owner: kjournald/324, .owner_cpu: 0
[] (dump_stack+0x0/0x14) from [] (spin_bug+0x9c/0xb4)
[] (spin_bug+0x0/0xb4) from [] (_raw_spin_lock+0x68/0x170)
 r6 = C1FFF50C  r5 =   r4 = C2A54268
[] (_raw_spin_lock+0x0/0x170) from [] (_spin_lock+0x20/0x24)
 r8 = C2A54268  r7 = C20893A4  r6 = C1FFF50C  r5 = 
 r4 = C2A54268
[] (_spin_lock+0x0/0x24) from []
(journal_dirty_data+0xfc/0x394)
 r4 = C1709D4C
[] (journal_dirty_data+0x0/0x394) from []
(ext3_journal_dirty_data+0x1c/0x48)
[] (ext3_journal_dirty_data+0x0/0x48) from []
(walk_page_buffers+0x80/0xb4)
 r6 = C1709D4C  r5 = C1709D4C  r4 = 1000
[] (walk_page_buffers+0x0/0xb4) from []
(ext3_ordered_commit_write+0x74/0x108)
[] (ext3_ordered_commit_write+0x0/0x108) from []
(generic_file_buffered_write+0x418/0x698)
[] (generic_file_buffered_write+0x4/0x698) from []
(__generic_file_aio_write_nolock+0x530/0x560)
[] (__generic_file_aio_write_nolock+0x0/0x560) from
[] (generic_file_aio_write+0xb0/0x144)
[] (generic_file_aio_write+0x4/0x144) from []
(ext3_file_write+0x34/0xb8)
[] (ext3_file_write+0x4/0xb8) from []
(do_sync_write+0xc4/0x108)
 r7 = 0C80  r6 = C20B3F78  r5 = C20B3EEC  r4 = C20B3EB0
[] (do_sync_write+0x0/0x108) from [] (vfs_write+0xbc/0x178)
[] (vfs_write+0x0/0x178) from [] (sys_write+0x4c/0x78)
[] (sys_write+0x0/0x78) from [] (ret_fast_syscall+0x0/0x2c)
 r8 = C0021F64  r7 = 0004  r6 = 0C80  r5 = 000500D0
 r4 = 001D
BUG: spinlock lockup on CPU#0, subwayD/328, c2a54268
[]

Re: Question regarding mutex locking

2007-11-27 Thread Robert Hancock


Larry Finger wrote:
> If a particular routine needs to lock a mutex, but it may be entered with that
> mutex already locked, would the following code be SMP safe?
>
> hold_lock = mutex_trylock()
>
> ...
>
> if (hold_lock)
>     mutex_unlock()


Not if another task could be acquiring that lock at the same time, which 
is probably the case, otherwise you wouldn't need the mutex. In other 
words, if you're going to do this, you might as well toss the mutex
entirely, as it has about the same effect.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
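
The reason the trylock pattern fails is that a failed trylock cannot tell "my caller holds the lock" apart from "some other task holds the lock", so the body may run unprotected. One common alternative, which nobody in this thread proposed, is to split the routine into a variant that requires the lock to be held and a wrapper that takes it. The sketch below is a userspace stand-in using pthreads rather than the kernel mutex API; struct counter and both function names are hypothetical.

```c
#include <pthread.h>

struct counter {
	pthread_mutex_t lock;
	long value;
};

/* Caller must already hold c->lock. */
static long counter_add_locked(struct counter *c, long n)
{
	c->value += n;
	return c->value;
}

/* Caller must NOT hold c->lock; takes and releases it around the update. */
static long counter_add(struct counter *c, long n)
{
	long v;

	pthread_mutex_lock(&c->lock);
	v = counter_add_locked(c, n);
	pthread_mutex_unlock(&c->lock);
	return v;
}
```

Each call site then picks the variant that matches its own locking state, so the routine never has to guess who owns the mutex.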


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker

On 11/27/07 7:27 PM, "Rusty Russell" <[EMAIL PROTECTED]> wrote:

> On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote:
>> On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
>> Explicitly documenting what comprises the kernel API (external,
>> supported) and what comprises the kernel implementation (internal, not
>> supported).
> 
> But the former is currently an empty set.
> 

Yes, I overstated this.

>> - making it obvious to developers when they are binding their
>> implementation to a particular kernel release
> 
> See, there's your problem.  All interfaces can, and will, change.  You're
> always binding yourself to a particular release.
> 

Absolutely in the limit. But there are many bits of code that work quite
nicely from release to release because they use services that live in the
smooth water in the wake of the Linux head.

I think defining that smooth water has merit. I also think that it would be
nice to limit the scope of module externs to avoid polluting the global
namespace. I'm not sure that this particular patch reaches these goals, but
it prompted me to comment.

> So you're not proposing we mark what's not stable, you're arguing that we
> create a subset which is stable.
> 

Well, this is an interesting question. The answer is I think both are
important. It would be nice (and arguably necessary long term) to limit the
scope of externs. This can be accomplished with name spaces "I want bob's
implementation of read."

I think it also has value to define interfaces that are considered stable
(but not inviolate) to allow developers to make better informed decisions
when choosing interfaces. Having this info explicit in the code seems
logical to me.

> That's an argument we're not (yet) having.
> 

Yeah, maybe I'm off in the weeds on this one...

Tom

> Cheers,
> Rusty.


Question regarding mutex locking

2007-11-27 Thread Larry Finger

If a particular routine needs to lock a mutex, but it may be entered with that 
mutex already locked,
would the following code be SMP safe?

hold_lock = mutex_trylock()

...

if (hold_lock)
mutex_unlock()

Thanks,

Larry

Re: ACPI related Warning in 2.6.24-rc3-git2

2007-11-27 Thread Henrique de Moraes Holschuh

On Tue, 27 Nov 2007, Rafael J. Wysocki wrote:
> > in recent kernel, I got the following warnings while booting. It's ACPI
> > related. Does anybody care? Lenovo ThinkPad T61 (6465CTO).

Could we know the BIOS version, please?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-27 Thread Jonathan Corbet

Greg KH <[EMAIL PROTECTED]> wrote:

> Jonathan, I used your old lwn.net article about kobjects as the basis
> for this document, I hope you don't mind

Certainly I have no objections, I'm glad it was useful.

A few little things...

> It is rare (even unknown) for kernel code to create a standalone kobject;
> with one major exception explained below.

You don't keep this promise - bet you thought we wouldn't notice...
Actually I guess you do, in the "creating simple kobjects" section.
When you get to that point, you should mention that this is a situation
where standalone kobjects make sense.

Given that there are quite a few standalone kobjects created by this
patch set (kernel_kobj, security_kobj, s390_kobj, etc.), the "(even
unknown)" should probably come out.

> So, for example, UIO code has a structure that defines the memory region
> associated with a uio device:

*The* UIO code, presumably.

> the given type. So, for example, a pointer to a struct kobject embedded
> within a struct cdev called "kp" could be converted to a pointer to the
> containing structure with:

That should be "struct uio_mem", I think.

> one.  Calling kobject_init() is not sufficient, however. Kobject users
> must, at a minimum, set the name of the kobject; this is the name that will
> be used in sysfs entries.

Is setting the name mandatory now, or are there still places where
kobjects (which do not appear in sysfs) do have - and do not need - a
name?

> Because kobjects are dynamic, they must not be declared statically or on
> the stack, but instead, always from the heap.  Future versions of the

"always be allocated from the heap"?

> "empty" release function, you will be mocked merciously by the kobject
> maintainer if you attempt this.

So just how severely should we mock kobject maintainers who can't
spell "mercilessly"?  :)

>  - A kset can provide a set of default attributes that all kobjects that
>belong to it automatically inherit and have created whenever a kobject
>is registered belonging to the kset.

Can we try that one again?

 - A kset can provide a set of default attributes for all kobjects which
   belong to it.

> There is currently
> no other way to add a kobject to a kset without directly messing with the
> list pointers.

Presumably the latter way is not recommended; I would either say so or
not mention this possibility at all.

jon
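
The container_of() conversion discussed above (from a pointer to an embedded struct kobject back to the enclosing struct uio_mem) can be tried out in userspace. This is a minimal sketch under stated assumptions: the macro is a simplified stand-in for the kernel's container_of(), both structures are cut-down stand-ins rather than the real kernel definitions, and kobj_to_uio_mem() is a hypothetical helper name.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-ins; the real kernel structures carry more fields. */
struct kobject {
	const char *name;
};

struct uio_mem {
	unsigned long addr;
	unsigned long size;
	struct kobject kobj;	/* embedded object, not a pointer */
};

/* Recover the enclosing uio_mem from a pointer to its embedded kobject. */
static struct uio_mem *kobj_to_uio_mem(struct kobject *kp)
{
	return container_of(kp, struct uio_mem, kobj);
}
```

The macro only subtracts the member's offset from the member's address, which is why the kobject has to be embedded by value for the conversion to be valid.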

Re: [PATCH 1/3] Remove trailing NULs from network bonding sysfs interface.

2007-11-27 Thread Randy Dunlap

On Wed, 28 Nov 2007 01:49:54 +0100 Ferenc Wágner wrote:

> Also remove trailing spaces from multivalued files.
> 
> This fixes output like for example:
> 
> $ od -c /sys/class/net/bond0/bonding/slaves
> 0000000   e   t   h   -   l   e   f   t       e   t   h   -   r   i   g
> 0000020   h   t      \n  \0
> 0000025
> 
> It mostly entails deleting '+1'-s after sprintf() calls: the return value
> of sprintf is the number of characters printed, without the closing NUL,
> ie. exactly what the sysfs interface requires.  The three multivalue
> cases are different, because they also have to swallow back a trailing
> space.
> 
> Signed-off-by: Ferenc Wágner <[EMAIL PROTECTED]>
> ---
>  drivers/net/bonding/bond_sysfs.c |   66 +
>  1 files changed, 30 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
> index b29330d..a3f1b4a 100644
> --- a/drivers/net/bonding/bond_sysfs.c
> +++ b/drivers/net/bonding/bond_sysfs.c
> @@ -86,14 +86,13 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
>   /* not enough space for another interface name */
>   if ((PAGE_SIZE - res) > 10)
>   res = PAGE_SIZE - 10;
> - res += sprintf(buffer + res, "++more++");
> + res += sprintf(buffer + res, "++more++ ");
>   break;
>   }
>   res += sprintf(buffer + res, "%s ",
>  bond->dev->name);
>   }
> - res += sprintf(buffer + res, "\n");
> - res++;
> + if (res) buffer[res-1] = '\n'; /* eat the leftover space */
>   up_read(&(bonding_rwsem));
>   return res;
>  }
> @@ -235,14 +234,13 @@ static ssize_t bonding_show_slaves(struct device *d,
>   /* not enough space for another interface name */
>   if ((PAGE_SIZE - res) > 10)
>   res = PAGE_SIZE - 10;
> - res += sprintf(buf + res, "++more++");
> + res += sprintf(buf + res, "++more++ ");
>   break;
>   }
>   res += sprintf(buf + res, "%s ", slave->dev->name);
>   }
>   read_unlock(&bond->lock);
> - res += sprintf(buf + res, "\n");
> - res++;
> + if (res) buf[res-1] = '\n'; /* eat the leftover space */
>   return res;
>  }
>  
> @@ -711,10 +709,7 @@ static ssize_t bonding_show_arp_targets(struct device *d,
>   res += sprintf(buf + res, "%u.%u.%u.%u ",
>  NIPQUAD(bond->params.arp_targets[i]));
>   }
> - if (res)
> - res--;  /* eat the leftover space */
> - res += sprintf(buf + res, "\n");
> - res++;
> + if (res) buf[res-1] = '\n'; /* eat the leftover space */
>   return res;
>  }

Hi,

Patches 1 & 3 use

if (res) statement;

but the preferred form is

if (res)
statement;

Even if this style was already used in the source file, it should
be cleaned up.

Thanks,
---
~Randy
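
For readers following the patch description, here is a small userspace sketch of the idiom it relies on: sprintf() returns the number of characters written, not counting the terminating NUL, so the running total is already the length a sysfs show routine should return, and the last separator can be overwritten with the newline. The function show_names() and its parameters are hypothetical; the `if` is laid out in the form recommended above.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the names as "a b c\n" into buf and return
 * the byte count, excluding the terminating NUL.  The caller must supply
 * a buffer large enough for all names.
 */
static int show_names(char *buf, const char *const *names, size_t n)
{
	int res = 0;
	size_t i;

	for (i = 0; i < n; i++)
		res += sprintf(buf + res, "%s ", names[i]);
	if (res)
		buf[res - 1] = '\n';	/* eat the leftover space */
	return res;
}
```

With the two names from the od example, this yields 19 bytes ending in a newline, with no trailing space and no NUL counted in the length.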

[patch/rfc 2.6.24-rc3-mm] gpiolib grows a gpio_desc

2007-11-27 Thread David Brownell

Update gpiolib to use a table of per-GPIO "struct gpio_desc" instead of
a table of "struct gpio_chip".

 - Change "is_out" and "requested" from arrays in "struct gpio_chip" to
   bit fields in "struct gpio_desc", eliminating ARCH_GPIOS_PER_CHIP.

 - Stop overloading "requested" flag with "label" tracked for debugfs.

 - Change gpiochip_is_requested() into a regular function, since it
   now accesses data that's not exported from the gpiolib code.  Also
   change its signature, for the same reason.

 - Reduce default ARCH_NR_GPIOS to 256 to shrink gpio_desc table cost.
   On 32-bit platforms without debugfs, that table size is 2KB.

This makes it easier to work with chips with different numbers of GPIOs,
and to avoid holes in GPIOs number sequences.  Those holes can cost a
lot of unusable irq_desc space for GPIOs that act as IRQs.

Based on a patch from Eric Miao.

# NOT signed-off yet ... purely for comment.  It's been sanity tested.
---
The question I'm most interested in is whether it's worth paying the
extra data memory.  I'm currently leaning towards "yes", mostly since
it'll let me be lazy about some development boards with four different
kinds of GPIO controller, each with different numbers of GPIOs.

Note that the ARM AT91 updates are against a patch which has been
circulated for comment but not yet submitted.  The at32ap and mcp23s08
parts are currently in the MM tree.  The PXA platform support didn't
need changes; DaVinci won't either.  The OMAP support (not recently
updated) will need slightly more updates than those shown here.
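
As a rough illustration of the per-GPIO table described above, the following userspace sketch gives each GPIO number its own descriptor that points back at the owning chip, so chips of any size can sit back to back with no per-chip alignment. It is not the patch itself: the structures are reduced to a few fields, chip_add() is a hypothetical simplification of chip registration, and the range check is written for this sketch only.

```c
#include <stddef.h>

#define ARCH_NR_GPIOS 256	/* default proposed in the description above */

struct gpio_chip {
	int base;		/* first GPIO number handled by this chip */
	unsigned short ngpio;	/* number of GPIOs it handles */
};

struct gpio_desc {
	struct gpio_chip *chip;	/* NULL while no chip claims this number */
	unsigned is_out:1;
	unsigned requested:1;
};

static struct gpio_desc gpio_desc[ARCH_NR_GPIOS];

/* Hypothetical registration: claim [base, base + ngpio) for the chip. */
static int chip_add(struct gpio_chip *chip)
{
	int i;

	if (chip->base < 0 || chip->base + chip->ngpio > ARCH_NR_GPIOS)
		return -1;
	for (i = chip->base; i < chip->base + chip->ngpio; i++)
		if (gpio_desc[i].chip)
			return -1;	/* range overlaps an existing chip */
	for (i = chip->base; i < chip->base + chip->ngpio; i++)
		gpio_desc[i].chip = chip;
	return 0;
}

/* Map a GPIO number to its chip with a single table lookup. */
static struct gpio_chip *gpio_to_chip(unsigned gpio)
{
	return gpio < ARCH_NR_GPIOS ? gpio_desc[gpio].chip : NULL;
}
```

On a typical 32-bit ABI each entry here would be 8 bytes (a 4-byte pointer plus one word holding the two bit-fields), so 256 entries come to 2KB, which matches the figure quoted above; on a 64-bit host the entries are larger.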

 arch/arm/mach-at91/gpio.c|7 -
 arch/avr32/mach-at32ap/pio.c |7 -
 drivers/spi/mcp23s08.c   |8 -
 include/asm-avr32/arch-at32ap/gpio.h |2 
 include/asm-generic/gpio.h   |   45 +---
 lib/gpiolib.c|  194 ++-
 6 files changed, 129 insertions(+), 134 deletions(-)

--- at91.orig/arch/arm/mach-at91/gpio.c 2007-11-27 14:31:05.0 -0800
+++ at91/arch/arm/mach-at91/gpio.c  2007-11-27 14:32:01.0 -0800
@@ -484,7 +484,10 @@ at91_bank_show(struct seq_file *s, struc
mdsr = __raw_readl(pio + PIO_MDSR);
 
for (i = 0, mask = 1; i < chip->ngpio; i++, mask <<= 1) {
-   if (!gpiochip_is_requested(chip, i)) {
+   const char  *label;
+
+   label = gpiochip_is_requested(chip, i);
+   if (!label) {
/* peripheral-A or peripheral B?
 * else unused/unrequested GPIO; ignore.
 */
@@ -501,7 +504,7 @@ at91_bank_show(struct seq_file *s, struc
 */
seq_printf(s, " gpio-%-3d P%c%-2d (%-12s) %s %s %s",
chip->base + i, 'A' + bank_num, i,
-   chip->requested[i],
+   label,
(osr & mask) ? "out" : "in ",
(mask & pdsr) ? "hi" : "lo",
(mask & pusr) ? "  " : "up");
--- at91.orig/arch/avr32/mach-at32ap/pio.c  2007-11-27 14:30:03.0 -0800
+++ at91/arch/avr32/mach-at32ap/pio.c   2007-11-27 14:30:04.0 -0800
@@ -315,12 +315,15 @@ static void pio_bank_show(struct seq_fil
bank = 'A' + pio->pdev->id;
 
for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
-   if (!gpiochip_is_requested(chip, i))
+   const char *label;
+
+   label = gpiochip_is_requested(chip, i);
+   if (!label)
continue;
 
seq_printf(s, " gpio-%-3d P%c%-2d (%-12s) %s %s %s",
chip->base + i, bank, i,
-   chip->requested[i],
+   label,
(osr & mask) ? "out" : "in ",
(mask & pdsr) ? "hi" : "lo",
(mask & pusr) ? "  " : "up");
--- at91.orig/drivers/spi/mcp23s08.c2007-11-27 14:29:20.0 -0800
+++ at91/drivers/spi/mcp23s08.c 2007-11-27 14:30:04.0 -0800
@@ -184,12 +184,14 @@ static void mcp23s08_dbg_show(struct seq
}
 
for (t = 0, mask = 1; t < 8; t++, mask <<= 1) {
-   if (!gpiochip_is_requested(chip, t))
+   const char  *label;
+
+   label = gpiochip_is_requested(chip, t);
+   if (!label)
continue;
 
seq_printf(s, " gpio-%-3d P%c.%d (%-12s) %s %s %s",
-   chip->base + t, bank, t,
-   chip->requested[t],
+   chip->base + t, bank, t, label,
(mcp->cache[MCP_IODIR] & mask) ? "in " : "out",
(mcp->cache[MCP_GPIO] & mask) ? "hi" : "lo",
(mcp->cache[MCP_GPPU] & mask) ? "  " : "up");
--- at91.orig/include/asm-avr32/arch-at32ap/gpio.h  2007-11-27 14:30:03.0 -0800
+++ at91/include/asm-avr32/arch-at32ap/gpio.h

Re: [patch] Fix hpet wrong values

2007-11-27 Thread Andrew Morton

On Tue, 27 Nov 2007 13:23:08 +0100 Pavel Machek <[EMAIL PROTECTED]> wrote:

> If hpet is not enabled in config, its init should not pretend to work,
> and people should not try to get time from it. 
> 
> Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>
> 
> diff --git a/include/asm-x86/hpet.h b/include/asm-x86/hpet.h
> index b1f3c1e..1777d68 100644
> --- a/include/asm-x86/hpet.h
> +++ b/include/asm-x86/hpet.h
> @@ -81,8 +81,8 @@ #endif /* CONFIG_HPET_EMULATE_RTC */
>  
>  #else
>  
> -static inline int hpet_enable(void) { return 0; }
> -static inline unsigned long hpet_readl(unsigned long a) { return 0; }
> +static inline int hpet_enable(void) { return -ENODEV; }
> +static inline unsigned long hpet_readl(unsigned long a) { BUG(); }
>  
>  #endif /* CONFIG_HPET_TIMER */
>  #endif /* ASM_X86_HPET_H */
> 

Dare I point out Documentation/SubmitChecklist?



include/asm/hpet.h: In function 'hpet_readl':
include/asm/hpet.h:86: warning: no return statement in function returning non-void

just delete it and rely on -Werror-implicit-function-declaration

--- a/include/asm-x86/hpet.h~fix-hpet-wrong-values-fix
+++ a/include/asm-x86/hpet.h
@@ -82,7 +82,6 @@ extern irqreturn_t hpet_rtc_interrupt(in
 #else
 
 static inline int hpet_enable(void) { return -ENODEV; }
-static inline unsigned long hpet_readl(unsigned long a) { BUG(); }
 
 #endif /* CONFIG_HPET_TIMER */
 #endif /* ASM_X86_HPET_H */
_




Re: 2.6.22.14 + rt? vs 2.6.23.9-rt12

2007-11-27 Thread Fernando Lopez-Lezcano

On Tue, 2007-11-27 at 17:02 -0800, Fernando Lopez-Lezcano wrote:
> Hi Ingo... any hope of an updated realtime patch for 2.6.22.14? I'm
> having problems with 2.6.23.1 + rt11 (I spent the morning rediffing
> against 2.6.23.9 and just _now_ pressed reload in my browser and there it
> is..., rt12 for 2.6.23.9!, argh! :-) and wanted to compare with 2.6.22.x
> and the latest I managed to repatch and run successfully is 2.6.22.10. I
> did 2.6.22.14 in the afternoon but I obviously bungled it somewhere as
> the boot... takes... a... long... time... I can send my .14 patch off
> the list if you want/need it. 
> 
> [in my 2.6.23.1-rt11 tests I am getting "delayed..." messages from
> jackd, smells like a problem with internal timing in the kernel]
> 
> I'll try rt12...

Same problems in rt12, getting lots of "delay of xxx usecs exceeds
estimated spare time of ; restart" in jackd (on my T61 Lenovo laptop
running fc7). Does not happen with 2.6.22.10 + rt9. This is both with
the internal snd-hda-intel card and a pcmcia rme hdsp multiface. 

-- Fernando



Re: [patch 11/14] Powerpc: Use generic per cpu

2007-11-27 Thread Paul Mackerras

Christoph Lameter writes:

> On Wed, 28 Nov 2007, Paul Mackerras wrote:
> 
> > Did you try both 32-bit (CONFIG_64BIT=n) and 64-bit (CONFIG_64BIT=y)
> > configurations?  The paca only exists in 64-bit kernels.
> 
> I build both and there is no dependency on 32bit or 64 bit in 
> include/asm-powerpc/percpu.h. Both access the paca (whatever that is).

Look at line 3 of include/asm-powerpc/percpu.h:

#ifdef __powerpc64__

As far as I can see, after applying your series of patches, I end up
with an unbalanced #ifdef in include/asm-powerpc/percpu.h.  I see 3
#ifdef/#ifndef, but only two #endifs.  It needs another #endif after
the #endif /* SMP */ to match the #ifdef __powerpc64__.  With that
change it looks OK, since 32-bit uses asm-generic/percpu.h for both
SMP and UP.

Paul.

Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Rusty Russell

On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote:
> On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
> Explicitly documenting what comprises the kernel API (external,
> supported) and what comprises the kernel implementation (internal, not
> supported).

But the former is currently an empty set.

> - making it obvious to developers when they are binding their
> implementation to a particular kernel release

See, there's your problem.  All interfaces can, and will, change.  You're 
always binding yourself to a particular release.

So you're not proposing we mark what's not stable, you're arguing that we 
create a subset which is stable.

That's an argument we're not (yet) having.

Cheers,
Rusty.

Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Rusty Russell

On Tuesday 27 November 2007 21:50:16 Andi Kleen wrote:
> Goals are:
> - Limit the interfaces available for out of tree modules to reasonably
> stable ones that are already used by a larger set of drivers.

Not the goals.  I haven't seen the *problem* yet.

> - Limit size of exported API to make stable ABIs for enterprise
> distributions easier
> [Yes I know that is not a popular topic on l-k, but it's a day-to-day
> problem for these distros and out of tree solutions do not work]

That's a real problem, and I sympathise with the idea of marking symbols as 
externally useful (or, practically, mark internal).

But we now need to decide what's "externally useful".  The current line for 
exports is simple: someone in-tree needs it.  You dislike the suggestion to 
extend this to "if more than one in-tree needs it it's open".

Currently your criterion seems to be "does the maintainer hate external 
modules?" which I don't think will be what you want...

Cheers,
Rusty.

Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Rusty Russell

On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Rusty Russell wrote:
> > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > code? It's a bit messier because some archs might not have set up NUMA
> > stuff yet, but it's logically generic...
>
> Yes that will happen later. This is just the early cleanup work. I
> plan to generally bring the two x86 arches in line. The pda will be
> folded into the per cpu area and after that its easy to do.

Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose 
the ability to use the stack protection config option.  That's because gcc 
assumes that gs:0x68 (or something) is the stack canary; we need yet another 
gcc change to make this gs:__builtin_stack_canary_off (where gcc can emit 
__builtin_stack_canary_off as a weak absolute symbol, so we can override it 
for the kernel).

Cheers,
Rusty.

Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Rick Jones


Adrian Bunk wrote:
> On Tue, Nov 27, 2007 at 01:15:23PM -0800, Rick Jones wrote:
>>> The real problem is that these drivers are not in the upstream kernel.
>>>
>>> Are there common reasons why these drivers are not upstream?
>>
>> One might be that upstream has not accepted them.  Anything doing or 
>> smelling of TOE comes to mind right away.
>
> Which modules doing or smelling of TOE do work with unmodified vendor 
> kernels?

At the very real risk of further demonstrating my Linux vocabulary 
limitations, I believe there is a "Linux Sockets Acceleration" 
module/whatnot for NetXen and related 10G NICs, and a cxgb3_toe (?) 
module for Chelsio 10G NICs.


rick jones

Re: /proc dcache deadlock in do_exit

2007-11-27 Thread Eric W. Biederman

Nacked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

Andrew Morton <[EMAIL PROTECTED]> writes:

> I don't see why the schedule() will not return?  Because the task has
> PF_EXITING set?  Doesn't TASK_DEAD do that?

This appears to be a work around for an old bug only present in sles9.

It looks like it has been safe to schedule in release task for years
in mainline, and that is simply much more robust.

Eric

Re: /proc dcache deadlock in do_exit

2007-11-27 Thread Eric W. Biederman

Andrea Arcangeli <[EMAIL PROTECTED]> writes:

>> 
>> So you may need to apply this one too (this one is needed to fix the
>> second bug, my previous patch is needed after applying this one):
>
> thinking what happened once already, I think this would be more
> debuggable but maybe not... dunno.

Please god no.   The inability to schedule is a bug that apparently
was fixed after sles9 forked off.

In mainline it is safe to sleep until the very end of do_exit and
your suggested patch breaks that.

None of your patches are necessary for mainline.



I believe the commit below is the one that really fixed this issue,
although it may have been something earlier, and there have been some
significant cleanups since then.

> commit 00732b345b940de766fefaf1297ce26a67bdcea9
> Author: mingo 
> Date:   Tue Oct 19 06:12:06 2004 +
> 
> [PATCH] fix & clean up zombie/dead task handling & preemption
> 
> This patch fixes all the preempt-after-task->state-is-TASK_DEAD problems
> we had.  Right now, the moment procfs does a down() that sleeps in
> proc_pid_flush() [it could] our TASK_DEAD state is zapped and we might be
> back to TASK_RUNNING to and we trigger this assert:
> 
> schedule();
> BUG();
> /* Avoid "noreturn function does return".  */
> for (;;) ;
> 
> I have split out TASK_ZOMBIE and TASK_DEAD into a separate p->exit_state
> field, to allow the detaching of exit-signal/parent/wait-handling from
> descheduling a dead task.  Dead-task freeing is done via PF_DEAD.
> 
> Tested the patch on x86 SMP and UP, but all architectures should work
> fine.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> 
> BKrev: 4174b036IkpU2-XeLFkjqH681u4Pyg
> 

Eric

[patch 2/2] reiser4: new export ops

2007-11-27 Thread Edward Shishkin


[EMAIL PROTECTED] wrote:


> The patch titled
>      git-nfsd-broke-reiser4
> has been removed from the -mm tree.  Its filename was
>      git-nfsd-broke-reiser4.patch
>
> This patch was dropped because it was folded into reiser4.patch
>
> --
> Subject: git-nfsd-broke-reiser4
> From: Andrew Morton <[EMAIL PROTECTED]>
>
> fs/reiser4/export_ops.c: In function 'reiser4_decode_fh':
> fs/reiser4/export_ops.c:96: error: 'const struct export_operations' has no member named 'find_exported_dentry'
> fs/reiser4/export_ops.c:96: warning: type defaults to 'int' in declaration of 'fn'
> fs/reiser4/export_ops.c:98: error: 'const struct export_operations' has no member named 'find_exported_dentry'
> fs/reiser4/export_ops.c:99: warning: comparison between pointer and integer
> fs/reiser4/export_ops.c:101: error: called object 'fn' is not a function
> fs/reiser4/export_ops.c: At top level:
> fs/reiser4/export_ops.c:282: error: unknown field 'decode_fh' specified in initializer
> fs/reiser4/export_ops.c:282: warning: initialization from incompatible pointer type
> fs/reiser4/export_ops.c:284: error: unknown field 'get_dentry' specified in initializer
> fs/reiser4/export_ops.c:285: warning: excess elements in struct initializer
> fs/reiser4/export_ops.c:285: warning: (near initialization for 'reiser4_export_operations')
>
> help!

done

Thanks,
Edward.
Adjust reiser4 for the new export ops.

Signed-off-by: Edward Shishkin <[EMAIL PROTECTED]>
---
 linux-2.6.23-mm1/fs/reiser4/dscale.c|   20 
 linux-2.6.23-mm1/fs/reiser4/dscale.h|3 
 linux-2.6.23-mm1/fs/reiser4/export_ops.c|   72 
 linux-2.6.23-mm1/fs/reiser4/kassign.c   |   19 +++-
 linux-2.6.23-mm1/fs/reiser4/kassign.h   |1 
 linux-2.6.23-mm1/fs/reiser4/plugin/file_plugin_common.c |2 
 6 files changed, 78 insertions(+), 39 deletions(-)

--- linux-2.6.23-mm1/fs/reiser4/export_ops.c.orig
+++ linux-2.6.23-mm1/fs/reiser4/export_ops.c
@@ -29,7 +29,8 @@
 
 /*
  * read serialized object identity from @addr and store information about
- * object in @obj. This is dual to encode_inode().
+ * object in @obj. If @obj == NULL, then don't read, just skip the encoded
+ * object (only return updated position).
  */
 static char *decode_inode(struct super_block *s, char *addr,
 			  reiser4_object_on_wire * obj)
@@ -41,7 +42,8 @@
 	fplug = file_plugin_by_disk_id(reiser4_get_tree(s), (d16 *) addr);
 	if (fplug != NULL) {
 		addr += sizeof(d16);
-		obj->plugin = fplug;
+		if (obj)
+			obj->plugin = fplug;
 		assert("nikita-3520", fplug->wire.read != NULL);
 		/* plugin specific encoding of object identity. */
 		addr = fplug->wire.read(addr, obj);
@@ -50,28 +52,23 @@
 	return addr;
 }
 
+static struct dentry *reiser4_get_dentry(struct super_block *super,
+	 void *data);
 /**
- * reiser4_decode_fh - decode_fh of export operations
- * @super: super block
- * @fh: nfsd file handle
- * @len: length of file handle
- * @fhtype: type of file handle
- * @acceptable: acceptability testing function
- * @context: argument for @acceptable
+ * reiser4_decode_fh: decode onwire object - helper function
+ * for fh_to_dentry, fh_to_parent export operations;
+ * @super: super block;
+ * @fh: here are onwire objects to be extracted for decoding;
+ * @parent: skip first onwire object and decode parent.
  *
- * Returns dentry referring to the same file as @fh.
+ * Returns dentry referring to the object being decoded.
  */
 static struct dentry *reiser4_decode_fh(struct super_block *super, __u32 *fh,
-	int len, int fhtype,
-	int (*acceptable) (void *context,
-			   struct dentry *de),
-	void *context)
+	int len, int fhtype, int parent)
 {
 	reiser4_context *ctx;
 	reiser4_object_on_wire object;
-	reiser4_object_on_wire parent;
 	char *addr;
-	int with_parent;
 
 	ctx = reiser4_init_context(super);
 	if (IS_ERR(ctx))
@@ -80,25 +77,19 @@
 	assert("vs-1482",
 	   fhtype == FH_WITH_PARENT || fhtype == FH_WITHOUT_PARENT);
 
-	with_parent = (fhtype == FH_WITH_PARENT);
-
 	addr = (char *)fh;
 
 	object_on_wire_init(&object);
-	object_on_wire_init(&parent);
-#if 0
-	addr = decode_inode(super, addr, &object);
+
+	if (parent)
+		/* skip first onwire object */
+		addr = decode_inode(super, addr, NULL);
 	if (!IS_ERR(addr)) {
-		if (with_parent)
-			addr = decode_inode(super, addr, &parent);
+		addr = decode_inode(super, addr, &object);
 		if (!IS_ERR(addr)) {
 			struct dentry *d;
-			typeof(super->s_export_op->find_exported_dentry) fn;
 
-			fn = super->s_export_op->find_exported_dentry;
-			assert("nikita-3521", fn != NULL);
-			d = fn(super, &object, with_parent ? &parent : NULL,
-			   acceptable, context);
+			d = reiser4_get_dentry(super, &object);
 			if (d != NULL && !IS_ERR(d))
 /* FIXME check for -ENOMEM */
 			  	reiser4_get_dentry_fsdata(d)->stateless = 1;
@@ -106,13 +97,24 @@
 		}
 	}
 	object_on_wire_done(&object);
-	object_on_wire_done(&parent);
-
 	reiser4_exit_context(ctx);
 	return (void *)addr;
-#else
-	return

Re: Possibly SATA related freeze killed networking and RAID

2007-11-27 Thread Tejun Heo

Pavel Machek wrote:
> Hi!
> 
>>>  kernel: [734344.717844] irq 21: nobody cared (try booting with the
>>> "irqpoll" option)
>>>  kernel: [734344.717866]
>> Your machine decided to emit interrupt 21 without an apparent reason.
>> Whatever caused that made the kernel shut down IRQ 21 at which point the
>> disk drives on that IRQ were no longer being serviced. Everything on IRQ
>> 21 would have died - which may be why your networking failed too.
> 
> Hmm, perhaps that 'nobody cared' message should be worded more
> strongly, and printed at KERN_CRIT?

Agreed.  Nobody cared on ATA controllers is usually very effective at
taking the whole machine down.  Is there any reason why we don't turn on
irqpoll on turned off IRQs automatically?

Thanks.

-- 
tejun

Re: /proc dcache deadlock in do_exit

2007-11-27 Thread Eric W. Biederman

Andrea Arcangeli <[EMAIL PROTECTED]> writes:

> On Tue, Nov 27, 2007 at 02:38:52PM -0800, Andrew Morton wrote:
>> I don't see why the schedule() will not return?  Because the task has
>> PF_EXITING set?  Doesn't TASK_DEAD do that?
>
> Ouch, I assumed you couldn't sleep safely anymore in release_task
> given it's the function that will free the task structure itself and
> there was no preempt related action anywhere close to it!
> delayed_put_task_struct can be called if a quiescent point is reached
> and any scheduling would exactly allow it to run (it requires quite a
> bit of a race, with local irq triggering a reschedule and the timer
> irq invoking the tasklet to run to free the task struct before do_exit
> finishes and all other cpus in quiescent state too).
>
> So a corollary question is how can it be safe to call
> preempt_disable() after call_rcu(delayed_put_task_struct)?


> Back in sles9 preempt_disable was implemented as
> _raw_write_unlock(&tasklist_lock) and it happened _before_
> release_task, and scheduling there wouldn't return because PF_DEAD was
> already set. If mainline can come back, it will crash for a different
> reason because the task struct is long gone by the time
> release_task+schedule() runs. Either ways, still a kernel crashing bug
> there is. Or is there some magic that prevents call_rcu + schedule to
> invoke the rcu callback?

I don't quite see where it comes from, but there is another reference
on the task struct held by the scheduler, which we don't drop until
finish_task_switch with state == TASK_DEAD.

So since we have one additional reference the task_struct won't
get freed.

We don't set TASK_DEAD until much later.

> So you may need to apply this one too (this one is needed to fix the
> second bug, my previous patch is needed after applying this one):

No.  We should be fine.

In fact it looks like this is just a sles9 issue and mainline is
fine, as we can safely schedule until just before the end of do_exit.

Eric
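
The extra-reference argument above can be sketched in user space.  This
is only a toy model, not kernel code: struct fake_task and the fake_*
and model_* helpers are hypothetical names, and the initial count of 2
is an assumption standing for "one reference dropped on the
release_task() side, one held by the scheduler".  It shows why the RCU
callback queued by release_task() cannot free the task while the
scheduler's reference is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct: only the refcount matters here. */
struct fake_task {
	int usage;	/* number of outstanding references */
	bool freed;	/* set once the last reference is dropped */
};

/* Assumption of this model: a task starts with two references,
 * one for the release_task() side and one for the scheduler. */
static void fake_task_init(struct fake_task *t)
{
	t->usage = 2;
	t->freed = false;
}

/* Drop one reference; "free" the task only when the count hits zero. */
static void fake_put_task(struct fake_task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->freed = true;
}

/* Models release_task() plus its RCU callback
 * (delayed_put_task_struct in the thread): drops the first reference. */
static void model_release_task(struct fake_task *t)
{
	fake_put_task(t);
}

/* Models the final put after the dead task has been switched out:
 * drops the scheduler's reference. */
static void model_finish_task_switch(struct fake_task *t)
{
	fake_put_task(t);
}
```

In this model, calling model_release_task() alone leaves usage at 1 and
freed false, so any amount of scheduling in between is harmless; only
the later model_finish_task_switch() call frees the task.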

Re: __rcu_process_callbacks() in Linux 2.6

2007-11-27 Thread Paul E. McKenney

On Mon, Nov 26, 2007 at 06:39:58PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 26, 2007 at 02:48:08PM -0800, James Huang wrote:
> > 
> > > -Original Message-
> > > From: James Huang [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, November 26, 2007 2:21 PM
> > > To: James Huang
> > > Subject: Fw: __rcu_process_callbacks() in Linux 2.6
> > > 
> > > - Forwarded Message 
> > > From: Manfred Spraul <[EMAIL PROTECTED]>
> > > To: James Huang <[EMAIL PROTECTED]>
> > > Cc: Paul E. McKenney <[EMAIL PROTECTED]>; linux-
> > > [EMAIL PROTECTED]
> > > Sent: Monday, November 26, 2007 10:28:37 AM
> > > Subject: __rcu_process_callbacks() in Linux 2.6
> > > 
> > > Hi James,
> > > 
> > > If I understand the issue correctly, then the race is:
> > > 
> > > step 1: cpu 1: starts a new rcu batch (i.e. rcp->cur++, smp_mb)
> > > 
> > > step 2: cpu 2: completes the quiet state
> > > step 3: cpu 2: reads pointer 0x123 (ptr to a rcu protected struct)
> > > 
> > > step 4: cpu 3: call_rcu(0x123): rcu protected struct added to
> > >         rdp->nxtlist
> > > step 5: cpu 3: moves a new batch into rdp->curlist,
> > >         rdp->batch = rcp->cur+1.
> > > xxx Problem: where is the smp_rmb() that guarantees that
> > > xxx  update to rcp->cur from step 1 is seen by cpu 3?
> > > step 6: cpu 3: completes quiet state
> > > step 7: cpu 3: struct 0x123 destroyed
> > > 
> > > step 8: cpu 2: accesses pointer 0x123, but the struct is already
> > >         destroyed
> > > 
> > > James: Is that the race?
> > 
> > 
> > [James Huang] 
> > 
> > Yes, this is the race condition that I am concerned about.
> > 
> > 
> > > 
> > > I agree with Paul, there are smp_rmb()s on cpu 3 between Step 1 and
> > > Step 5:
> > > Either the test_and_set_bit in tasklet_action for rcu_process_callback
> > > if step 4 happens before the tasklet or somewhere in the irq handler
> > > path if step 4 happens in an irq handler that interrupted
> > > rcu_process_callback.
> > > 
> > > Thus theoretically no additional smp_rmb() should be necessary.
> > > What is missing is proper documentation.
> > > 
> > 
> > 
> > [James Huang] 
> > 
> > Is it true that a smp_rmb() before a read operation (say from variable
> > X) will guarantee that the read will always retrieve the most "current"
> > value of X?   I can not find such a guarantee in atomic_ops.txt or
> > memory-barriers.txt under Linux's documentation directory.  What is
> > described in both documents is relative ordering, e.g.
> > 
> >     CPU1                CPU2
> >     ----                ----
> >     write X = x1
> >     smp_wmb()
> >     write Y = y1
> > 
> >                         read Y
> >                         smp_rmb()
> >                         read X
> > 
> > Then CPU2 will read X with a value of x1 if it reads Y with a value of
> > y1.
> > 
> > Please point me to the right section in the document if smp_rmb() does
> > provide such a guarantee.
> 
> You are correct, smp_rmb() is about ordering rather than about any sort
> of immediacy.  For one thing, it can be quite difficult to say exactly what
> the most "current" version of X might be at a given point in time from
> the viewpoint of a given CPU -- the different CPUs might well disagree as
> to what the "current" version is for awhile (though they are guaranteed
> to come to agreement).
> 
> > Thanks,
> > -- James Huang
> > 
> > > I'm analyzing the code right now:
> > > Is it really true that typically a cpu only completes data in every
> > > other rcu cycle? I.e. that most structures are stored in the rcu
> > > callback list until two quiet states happened?
> 
> That is correct.  This does mean that we should be able to leverage
> locking primitives and memory barriers executed from the scheduling
> clock interrupt.

And I managed to get some time on a 64-CPU POWER5+ system.  Been running
rcutorture for about 2.5 hours without a failure (128 reader processes),
running through not quite 1.5M RCU updates.  Of course, this is not
proof that the Classic RCU implementation works, but it should provide
some reassurance.

I will keep it running until I get kicked off (probably rather soon).

Thanx, Paul

> > > I've tried to track the values of rcp->cur and rdp->batch.
> > > If next_pending is set, then cpu_quiet() immediately starts
> > > the next rcu cycle and a cpu cannot both complete the currently
> > > pending rcu callbacks and add new callbacks to the next cycle,
> > > thus a cpu only takes part in every other rcu cycle.
> > > 
> > > The oocalc file is at
> > > http://www.colorfullife.com/~manfred/rcu.ods
> > > http://www.colorfullife.com/~manfred/rcu.pdf
> > > 
> > > Is that analysis correct? Perhaps the whole code should be rewritten?
> 
> I believe that the sequencing in spreadsheet is correct (and thank
> you very much for going through it!!!), but it seems to be silent on
>
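
The CPU1/CPU2 diagram quoted above can be tried in user space with C11
atomics.  This is a sketch under assumptions, not the kernel primitives:
atomic_thread_fence(memory_order_release) and
atomic_thread_fence(memory_order_acquire) stand in for smp_wmb() and
smp_rmb() (they are somewhat stronger), and writer(), reader() and
run_once() are hypothetical names.  The reader checks only the ordering
guarantee: if it saw Y == 1 it must also see X == 1.  Nothing says it
will see Y == 1 at all, which is the "ordering, not immediacy" point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int X;	/* payload, written first */
static atomic_int Y;	/* flag, written second */

/* CPU1 in the diagram: write X, write barrier, write Y. */
static void *writer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	return NULL;
}

/* CPU2 in the diagram: read Y, read barrier, read X.
 * Stores 1 in *arg if the ordering guarantee held, 0 otherwise. */
static void *reader(void *arg)
{
	int *ok = arg;
	int y = atomic_load_explicit(&Y, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	int x = atomic_load_explicit(&X, memory_order_relaxed);

	/* Seeing y == 0 is allowed and says nothing about x. */
	*ok = (y == 0) || (x == 1);
	return NULL;
}

/* Run one writer/reader pair from a clean state; returns the reader's verdict. */
static int run_once(void)
{
	pthread_t w, r;
	int ok = 0;

	atomic_store(&X, 0);
	atomic_store(&Y, 0);
	pthread_create(&r, NULL, reader, &ok);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return ok;
}
```

Calling run_once() in a loop should always return 1; on some runs the
reader simply observes Y == 0, which the barriers do not forbid.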

Re: /proc dcache deadlock in do_exit

2007-11-27 Thread Andrea Arcangeli

On Wed, Nov 28, 2007 at 02:21:29AM +0100, Andrea Arcangeli wrote:
> On Tue, Nov 27, 2007 at 02:38:52PM -0800, Andrew Morton wrote:
> > I don't see why the schedule() will not return?  Because the task has
> > PF_EXITING set?  Doesn't TASK_DEAD do that?
> 
> Ouch, I assumed you couldn't sleep safely anymore in release_task
> given it's the function that will free the task structure itself and
> there was no preempt related action anywhere close to it!
> delayed_put_task_struct can be called if a quiescent point is reached
> and any scheduling would exactly allow it to run (it requires quite a
> bit of a race, with local irq triggering a reschedule and the timer
> irq invoking the tasklet to run to free the task struct before do_exit
> finishes and all other cpus in quiescent state too).
> 
> So a corollary question is how can it be safe to call
> preempt_disable() after call_rcu(delayed_put_task_struct)?
> 
> Back in sles9 preempt_disable was implemented as
> _raw_write_unlock(&tasklist_lock) and it happened _before_
> release_task, and scheduling there wouldn't return because PF_DEAD was
> already set. If mainline can come back, it will crash for a different
> reason because the task struct is long gone by the time
> release_task+schedule() runs. Either ways, still a kernel crashing bug
> there is. Or is there some magic that prevents call_rcu + schedule to
> invoke the rcu callback?
> 
> So you may need to apply this one too (this one is needed to fix the
> second bug, my previous patch is needed after applying this one):

thinking what happened once already, I think this would be more
debuggable but maybe not... dunno.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/kernel/exit.c b/kernel/exit.c
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -841,6 +841,14 @@ static void exit_notify(struct task_stru
 
write_unlock_irq(&tasklist_lock);
 
+   /*
+* Task struct can go away at the first schedule if this was a
+* self reaping task after calling release_task. Scheduling is
+* forbidden until do_exit finishes.
+*/
+   preempt_disable();
+   tsk->state = TASK_DEAD;
+
/* If the process is dead, release it - nobody will wait for it */
if (state == EXIT_DEAD)
release_task(tsk);
@@ -1042,10 +1050,7 @@ fastcall NORET_TYPE void do_exit(long co
if (tsk->splice_pipe)
__free_pipe_info(tsk->splice_pipe);
 
-   preempt_disable();
/* causes final put_task_struct in finish_task_switch(). */
-   tsk->state = TASK_DEAD;
-
schedule();
BUG();
/* Avoid "noreturn function does return".  */


Re: /proc dcache deadlock in do_exit

2007-11-27 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

> On Tue, 27 Nov 2007 14:20:22 +0100
> Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> 
>> this patch fixes a sles9 system hang in start_this_handle from a
>> customer with some heavy workload where all tasks are waiting on
>> kjournald to commit the transaction, but kjournald waits on t_updates
>> to go down to zero (it never does). This was reported as a lowmem
>> shortage deadlock but when checking the debug data I noticed the VM
>> wasn't under pressure at all (well it was really under vm pressure,
>> because lots of tasks hanged in the VM prune_dcache methods trying to
>> flush dirty inodes, but no task was hanging in GFP_NOFS mode, the
>> holder of the journal handle should have if this was a vm issue in the
>> first place). No task was apparently holding the leftover handle in
>> the committing transaction, so I deduced t_updates was stuck to 1
>> because a journal_stop was never run by some path (this turned out to
>> be correct). With a debug patch adding proper reverse links and stack
>> trace logging in ext3 deployed in production, I found journal_stop is
>> never run because mark_inode_dirty_sync is called inside release_task
>> called by do_exit. (that was quite fun because I would have never
>> thought about this subtleness, I thought a regular path in ext3 had a
>> bug and it forgot to call journal_stop)
>> 
>> do_exit->release_task->mark_inode_dirty_sync->schedule() (will never
>> come back to run journal_stop)
>
> I don't see why the schedule() will not return?  Because the task has
> PF_EXITING set?  Doesn't TASK_DEAD do that?

Yes, why do we not come back from schedule?

If we are not allowed to schedule after setting PF_EXITING and before
we set TASK_DEAD, that entire code path sounds brittle and
error prone.


> What are the implications of not running shrink_dcache_parent() on the exit
> path sometimes?  We'll leave procfs stuff behind?  Will they be reaped by
> memory pressure later on?

They should be.  I think the reaping is just an optimization, because we
know we will never need those dentries again, and they can be pinned by
open directories or open files.  What I don't know off the top of
my head is if there is a d_drop equivalent going on that might be a
problem if we don't address it.

Eric


Re: [Bluez-devel] 2.6.23.8: kernel panic

2007-11-27 Thread Dave Young

On Tue, Nov 27, 2007 at 04:36:45PM +0100, Marco Pracucci wrote:
> Hi Dave,
> > This problem is caused by the workqueue in hci_sysfs.c, the del_conn
> > is scheduled after the add_conn with same bluetooth address.
> > Please try this patch:
> > 
> >
> > The bluetooth hci conn sysfs add/del executed in the default workqueue. If 
> > the conn del function is executed after the new conn add function with same 
> > bluetooth target address, the connection add will failed and warning about 
> > same kobject name.
> >
> > Here add a btconn workqueue, and flush the workqueue in the add_conn 
> > function to avoid the above issue. 
> >   
> I have applied your patch against kernel 2.6.24-rc3 and I've got the
> following error:
> 
> Jan 1 00:13:01 user.warn kernel: run_workqueue: recursion depth exceeded: 4
> Jan 1 00:13:01 user.warn kernel: [] (dump_stack+0x0/0x14) from
> [] (run_workqueue+0x4c/0x144)
> Jan 1 00:13:01 user.warn kernel: [] (run_workqueue+0x0/0x144)
> from [] (flush_cpu_workqueue+0x34/0x94)
> Jan 1 00:13:01 user.warn kernel: r6:c020d624 r5:c05bc088 r4:0001
> Jan 1 00:13:01 user.warn kernel: []
> (flush_cpu_workqueue+0x0/0x94) from [] (flush_workqueue+0x14/0x18)
> Jan 1 00:13:01 user.warn kernel: r4:c03c9420
> Jan 1 00:13:01 user.warn kernel: [] (flush_workqueue+0x0/0x18)
> from [] (add_conn+0x1c/0x80)
> Jan 1 00:13:01 user.warn kernel: [] (add_conn+0x0/0x80) from
> [] (run_workqueue+0xb4/0x144)
> Jan 1 00:13:01 user.warn kernel: r5:c034 r4:c03c9420
> Jan 1 00:13:01 user.warn kernel: [] (run_workqueue+0x0/0x144)
> from [] (flush_cpu_workqueue+0x34/0x94)
> Jan 1 00:13:01 user.warn kernel: r6:c020d624 r5:c1051e88 r4:0001
> Jan 1 00:13:01 user.warn kernel: []
> (flush_cpu_workqueue+0x0/0x94) from [] (flush_workqueue+0x14/0x18)
> Jan 1 00:13:01 user.warn kernel: r4:c03c9420
> Jan 1 00:13:01 user.warn kernel: [] (flush_workqueue+0x0/0x18)
> from [] (add_conn+0x1c/0x80)
> Jan 1 00:13:01 user.warn kernel: [] (add_conn+0x0/0x80) from
> [] (run_workqueue+0xb4/0x144)
> Jan 1 00:13:01 user.warn kernel: r5:c034 r4:c03c9420
> Jan 1 00:13:01 user.warn kernel: [] (run_workqueue+0x0/0x144)
> from [] (flush_cpu_workqueue+0x34/0x94)
> Jan 1 00:13:01 user.warn kernel: r6:c020d624 r5:c042ca88 r4:0001
> Jan 1 00:13:01 user.warn kernel: []
> (flush_cpu_workqueue+0x0/0x94) from [] (flush_workqueue+0x14/0x18)
> Jan 1 00:13:01 user.warn kernel: r4:c03c9420
> Jan 1 00:13:01 user.warn kernel: [] (flush_workqueue+0x0/0x18)
> from [] (add_conn+0x1c/0x80)
> Jan 1 00:13:01 user.warn kernel: [] (add_conn+0x0/0x80) from
> [] (run_workqueue+0xb4/0x144)
> Jan 1 00:13:01 user.warn kernel: r5:c034 r4:c03c9420
> Jan 1 00:13:01 user.warn kernel: [] (run_workqueue+0x0/0x144)
> from [] (worker_thread+0xa4/0xb8)
> Jan 1 00:13:01 user.warn kernel: r6:c00487e4 r5:c03c9420 r4:c03c9428
> Jan 1 00:13:01 user.warn kernel: [] (worker_thread+0x0/0xb8)
> from [] (kthread+0x5c/0x90)
> Jan 1 00:13:01 user.warn kernel: r5:c03c9420 r4:c034
> Jan 1 00:13:01 user.warn kernel: [] (kthread+0x0/0x90) from
> [] (do_exit+0x0/0x690)
> Jan 1 00:13:01 user.warn kernel: r6: r5: r4:
> 

Hi, Marco
Thanks for testing. Could you please try the patch below instead?

Marcel, thanks for considering my patch. Flushing a workqueue from within 
one of its own work items recurses, so maybe we should use two workqueues.
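
To make the recursion concrete, here is a single-threaded toy model in 
plain C. It is not the kernel workqueue API: struct toy_wq and the 
toy_* helpers are hypothetical, and the model assumes that flushing a 
queue from one of its own work items runs the remaining items inline, 
which is what the add_conn -> flush_workqueue -> run_workqueue -> 
add_conn chain in the backtrace above suggests.

```c
#include <assert.h>

#define MAX_WORK 8

typedef void (*work_fn)(void);

/* Hypothetical toy workqueue: a fixed array of pending callbacks. */
struct toy_wq {
	work_fn pending[MAX_WORK];
	int head, tail;		/* items in [head, tail) are pending */
	int depth;		/* current nesting of toy_run() */
	int max_depth;		/* deepest nesting seen so far */
};

static void toy_queue(struct toy_wq *wq, work_fn fn)
{
	assert(wq->tail < MAX_WORK);
	wq->pending[wq->tail++] = fn;
}

/* Run everything pending; nests if called from a work item of this queue. */
static void toy_run(struct toy_wq *wq)
{
	wq->depth++;
	if (wq->depth > wq->max_depth)
		wq->max_depth = wq->depth;
	while (wq->head < wq->tail) {
		work_fn fn = wq->pending[wq->head++];
		fn();
	}
	wq->depth--;
}

/* In this model a flush simply runs the pending items inline. */
static void toy_flush(struct toy_wq *wq)
{
	toy_run(wq);
}

static struct toy_wq shared, addq, delq;
static int dels_done;

static void del_conn_work(void) { dels_done++; }
static void add_conn_flush_shared(void) { toy_flush(&shared); }
static void add_conn_flush_delq(void) { toy_flush(&delq); }

/* One queue: every add item flushes the queue it is running on. */
static int one_queue_max_depth(int n_adds)
{
	shared = (struct toy_wq){0};
	for (int i = 0; i < n_adds; i++)
		toy_queue(&shared, add_conn_flush_shared);
	toy_run(&shared);
	return shared.max_depth;
}

/* Two queues: add items flush only the separate delete queue. */
static int two_queue_max_depth(int n_adds)
{
	addq = (struct toy_wq){0};
	delq = (struct toy_wq){0};
	dels_done = 0;
	toy_queue(&delq, del_conn_work);
	for (int i = 0; i < n_adds; i++)
		toy_queue(&addq, add_conn_flush_delq);
	toy_run(&addq);
	return addq.max_depth > delq.max_depth ? addq.max_depth : delq.max_depth;
}
```

With three add items, one_queue_max_depth(3) nests four levels deep, 
while two_queue_max_depth(3) never goes deeper than one level and still 
runs the pending delete before the adds proceed.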

Regards
dave
---

The bluetooth hci conn sysfs add/del functions are executed in the default 
workqueue. If the conn del function is executed after the new conn add 
function for the same bluetooth target address, the connection add will fail 
with a warning about a duplicate kobject name.

Add separate btaddconn and btdelconn workqueues, and flush btdelconn in the 
add_conn function to avoid the above issue. 

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
net/bluetooth/hci_sysfs.c |   25 -
1 file changed, 24 insertions(+), 1 deletion(-)

diff -upr linux/net/bluetooth/hci_sysfs.c linux.new/net/bluetooth/hci_sysfs.c
--- linux/net/bluetooth/hci_sysfs.c 2007-11-27 18:11:11.0 +0800
+++ linux.new/net/bluetooth/hci_sysfs.c 2007-11-28 09:18:45.0 +0800
@@ -12,6 +12,8 @@
 #undef  BT_DBG
 #define BT_DBG(D...)
 #endif
+static struct workqueue_struct *btaddconn;
+static struct workqueue_struct *btdelconn;
 
 static inline char *typetostr(int type)
 {
@@ -279,6 +281,7 @@ static void add_conn(struct work_struct 
struct hci_conn *conn = container_of(work, struct hci_conn, work);
int i;
 
+   flush_workqueue(btdelconn);
if (device_add(&conn->dev) < 0) {
BT_ERR("Failed to register connection device");
return;
@@ -313,6 +316,7 @@ void hci_conn_add_sysfs(struct hci_conn 
 
INIT_WORK(&conn->work, add_conn);
 
+   queue_work(btaddconn, &conn->work);
schedule_work(&conn->work);
 }
 
@@ -331,6 +335,7 @@ void hci_conn_del_sysfs(struct hci_conn 
 
INIT_WORK(&conn->work, del_conn);
 
+   queue_work(btdelconn, &conn->work);

2.6.22.14 + rt?

2007-11-27 Thread Fernando Lopez-Lezcano

Hi Ingo... any hope of an updated realtime patch for 2.6.22.14? I'm
having problems with 2.6.23.1 + rt11 (I spent the morning rediffing
against 2.6.23.9 and just _now_ pressed reload in my browser and there it
is..., rt12 for 2.6.23.9!, argh! :-) and wanted to compare with 2.6.22.x
and the latest I managed to repatch and run successfully is 2.6.22.10. I
did 2.6.22.14 in the afternoon but I obviously bungled it somewhere as
the boot... takes... a... long... time... I can send my .14 patch off
the list if you want/need it. 

[in my 2.6.23.1-rt11 tests I am getting "delayed..." messages from
jackd, smells like a problem with internal timing in the kernel]

I'll try rt12...
-- Fernando



Re: /proc dcache deadlock in do_exit

2007-11-27 Thread Andrea Arcangeli

On Tue, Nov 27, 2007 at 02:38:52PM -0800, Andrew Morton wrote:
> I don't see why the schedule() will not return?  Because the task has
> PF_EXITING set?  Doesn't TASK_DEAD do that?

Ouch, I assumed you couldn't sleep safely anymore in release_task
given it's the function that will free the task structure itself and
there was no preempt related action anywhere close to it!
delayed_put_task_struct can be called if a quiescent point is reached
and any scheduling would exactly allow it to run (it requires quite a
bit of a race, with local irq triggering a reschedule and the timer
irq invoking the tasklet to run to free the task struct before do_exit
finishes and all other cpus in quiescent state too).

So a corollary question is how can it be safe to call
preempt_disable() after call_rcu(delayed_put_task_struct)?

Back in sles9 preempt_disable was implemented as
_raw_write_unlock(&tasklist_lock) and it happened _before_
release_task, and scheduling there wouldn't return because PF_DEAD was
already set. If mainline can come back, it will crash for a different
reason because the task struct is long gone by the time
release_task+schedule() runs. Either ways, still a kernel crashing bug
there is. Or is there some magic that prevents call_rcu + schedule to
invoke the rcu callback?

So you may need to apply this one too (this one is needed to fix the
second bug, my previous patch is needed after applying this one):

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/kernel/exit.c b/kernel/exit.c
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -841,6 +841,13 @@ static void exit_notify(struct task_stru

write_unlock_irq(&tasklist_lock);

+   /*
+* Task struct can go away at the first schedule if this was a
+* self reaping task. Scheduling is forbidden until we set
+* the state to TASK_DEAD.
+*/
+   preempt_disable();
+
/* If the process is dead, release it - nobody will wait for it */
if (state == EXIT_DEAD)
release_task(tsk);
@@ -1042,7 +1049,6 @@ fastcall NORET_TYPE void do_exit(long co
if (tsk->splice_pipe)
__free_pipe_info(tsk->splice_pipe);

-   preempt_disable();
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;

> What are the implications of not running shrink_dcache_parent() on
> the exit path sometimes?  We'll leave procfs stuff behind?  Will
> they be reaped by memory pressure later on?

Yes.

[git patches] ocfs2 fixes

2007-11-27 Thread Mark Fasheh

Hi Linus,
Here's some more Ocfs2 patches for 2.6.24. This series also includes
a small performance fixup from Jan Kara. The rest of the patches are bug
fixes or very trivial cleanups.

Please pull from 'upstream-linus' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus

to receive the following updates:

 fs/Kconfig |9 +
 fs/ocfs2/aops.c|2 +-
 fs/ocfs2/cluster/masklog.h |2 +-
 fs/ocfs2/dcache.c  |   20 
 fs/ocfs2/dlm/dlmmaster.c   |4 ++--
 fs/ocfs2/file.c|   19 +++
 fs/ocfs2/inode.c   |6 +++---
 fs/ocfs2/localalloc.c  |5 +++--
 fs/ocfs2/super.c   |6 +++---
 9 files changed, 53 insertions(+), 20 deletions(-)

Jan Kara (1):
  ocfs2: Remove expensive bitmap scanning

Joe Perches (1):
  fs/ocfs2: Add missing "space"

Mark Fasheh (6):
  ocfs2: Reset journal parameters after s_mount_opt update
  ocfs2: Filter -ENOSPC in mlog_errno()
  ocfs2: log valid inode # on bad inode
  ocfs2: Remove bug statement in ocfs2_dentry_iput()
  ocfs2: Fix comparison in ocfs2_size_fits_inline_data()
  ocfs2: reverse inline-data truncate args

diff --git a/fs/Kconfig b/fs/Kconfig
index 429a002..635f3e2 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -459,6 +459,15 @@ config OCFS2_DEBUG_MASKLOG
  This option will enlarge your kernel, but it allows debugging of
  ocfs2 filesystem issues.
 
+config OCFS2_DEBUG_FS
+   bool "OCFS2 expensive checks"
+   depends on OCFS2_FS
+   default n
+   help
+ This option will enable expensive consistency checks. Enable
+ this option for debugging only as it is likely to decrease
+ performance of the filesystem.
+
 config MINIX_FS
tristate "Minix fs support"
help
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 556e34c..56f7790 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1514,7 +1514,7 @@ int ocfs2_size_fits_inline_data(struct buffer_head *di_bh, u64 new_size)
 {
struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
 
-   if (new_size < le16_to_cpu(di->id2.i_data.id_count))
+   if (new_size <= le16_to_cpu(di->id2.i_data.id_count))
return 1;
return 0;
 }
diff --git a/fs/ocfs2/cluster/masklog.h b/fs/ocfs2/cluster/masklog.h
index cd04606..597e064 100644
--- a/fs/ocfs2/cluster/masklog.h
+++ b/fs/ocfs2/cluster/masklog.h
@@ -212,7 +212,7 @@ extern struct mlog_bits mlog_and_bits, mlog_not_bits;
 #define mlog_errno(st) do {\
int _st = (st); \
if (_st != -ERESTARTSYS && _st != -EINTR && \
-   _st != AOP_TRUNCATED_PAGE)  \
+   _st != AOP_TRUNCATED_PAGE && _st != -ENOSPC)\
mlog(ML_ERROR, "status = %lld\n", (long long)_st);  \
 } while (0)
 
diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index 1957a5e..9923278 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -344,12 +344,24 @@ static void ocfs2_dentry_iput(struct dentry *dentry, struct inode *inode)
 {
struct ocfs2_dentry_lock *dl = dentry->d_fsdata;
 
-   mlog_bug_on_msg(!dl && !(dentry->d_flags & DCACHE_DISCONNECTED),
-   "dentry: %.*s\n", dentry->d_name.len,
-   dentry->d_name.name);
+   if (!dl) {
+   /*
+* No dentry lock is ok if we're disconnected or
+* unhashed.
+*/
+   if (!(dentry->d_flags & DCACHE_DISCONNECTED) &&
+   !d_unhashed(dentry)) {
+   unsigned long long ino = 0ULL;
+   if (inode)
+   ino = (unsigned long long)OCFS2_I(inode)->ip_blkno;
+   mlog(ML_ERROR, "Dentry is missing cluster lock. "
+"inode: %llu, d_flags: 0x%x, d_name: %.*s\n",
+ino, dentry->d_flags, dentry->d_name.len,
+dentry->d_name.name);
+   }
 
-   if (!dl)
goto out;
+   }
 
mlog_bug_on_msg(dl->dl_count == 0, "dentry: %.*s, count: %u\n",
dentry->d_name.len, dentry->d_name.name,
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 62e4a7d..a54d33d 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -908,7 +908,7 @@ lookup:
 * but they might own this lockres.  wait on them. */
bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0);
if (bit < O2NM_MAX_NODES) {
-   mlog(ML_NOTICE, "%s:%.*s: at least one node (%d) to"
+   mlog(ML_NOTICE, "%s:%.*s: at least one node (%d) to "
 "recover

Re: laptop reboots right after hibernation

2007-11-27 Thread Tejun Heo

Kjartan Maraas wrote:
> I get this exact error message on a normal first time boot here. I'm
> using the latest fedora development kernel which is 2.6.24-rc2-git6
> based. And I have the latest BIOS from HP IIRC.
> 
> This is an HP nc6400 intel based laptop:

Care to post boot dmesg?  Or does harddisk detection fail because of this?

-- 
tejun

Module for keyboard statistics / What should I look at?

2007-11-27 Thread Nelson Castillo

Hi.

More than 2 years ago I wrote a small non-portable patch [1] to gather
some statistics about keyboard usage [2]. (I show the counters modulo 10,
but this is still a security risk. Perhaps some random noise should be
added.)

[1] 
http://svn.arhuaco.org/svn/src/pcgotchi/trunk/proc-keystrokes.patch.linux-2.6.12.2.txt
[2] http://pcgotchi.blogspot.com/2005/10/11140-times.html

I liked using this on my PC but stopped because I didn't want to
recompile whenever I upgraded kernels (Debian).

I thought I had already asked on LKML as suggested by Blaisorblade [3],
but I had actually asked on kernelnewbies instead [4].

[3] http://www.mail-archive.com/[EMAIL PROTECTED]/msg02168.html
[4] http://mail.nl.linux.org/kernelnewbies/2006-05/msg00367.html

My questions are:

* What should I read to do this in a module?
* Is there a better way to do this (perhaps using /dev/input/... and
a user-space program)?
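
For the user-space route, here is a minimal sketch, assuming the evdev
interface (struct input_event from <linux/input.h>, read from a
/dev/input/eventN node). The helper names count_key_presses() and
count_from_fd() are made up for illustration, and which event node belongs
to the keyboard is left open:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <linux/input.h>

/*
 * Count key-down events in a batch. For EV_KEY events, value 1 is a
 * press; releases (0) and autorepeat (2) are deliberately ignored.
 */
static size_t count_key_presses(const struct input_event *ev, size_t n)
{
	size_t i, presses = 0;

	assert(ev != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ev[i].type == EV_KEY && ev[i].value == 1)
			presses++;
	return presses;
}

/*
 * Drain an already opened evdev descriptor until EOF or a read error
 * and return the total number of presses seen. Only the count is kept,
 * never the key codes.
 */
static size_t count_from_fd(int fd)
{
	struct input_event ev[64];
	size_t total = 0;
	ssize_t len;

	while ((len = read(fd, ev, sizeof(ev))) > 0)
		total += count_key_presses(ev, (size_t)len / sizeof(ev[0]));
	return total;
}
```

A caller would open() the keyboard's event node read-only and hand the
descriptor to count_from_fd(); reading these nodes typically needs
elevated privileges, so the security caveat above still applies.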

Regards,
Nelson.-

PS: I don't wish to write a keylogger. I really want the statistics,
and you are screwed if someone gets to be root on your machine anyway.

-- 
http://arhuaco.org

Re: bonding sysfs output

2007-11-27 Thread Wagner Ferenc

Andrew Morton <[EMAIL PROTECTED]> writes:

> On Tue, 27 Nov 2007 10:56:43 +0100 Ferenc Wagner <[EMAIL PROTECTED]> wrote:
>
>>> - raise patches against the latest Linus tree
>>> (ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/)
>> 
>> I thought it was better to change to git.  Isn't it so?
>> SubmittingPatches has nothing to say about that...
>> Can I find collected best practices somewhere?  Which tree, which
>> branch, how/when to rebase, format-patch, etc...
>
> gosh.  Documentation/Submit*,
> http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt,
> http://linux.yyz.us/patch-format.html, other places.  Probably people have
> written books about it by now.  But don't sweat it - you're close enough ;)

I wonder where the information got lost...  I miss docs on submitting
patches from git ONLY.  The general documentation is pretty good and
helpful, just doesn't treat git (not using git in general, but using
it for submitting patches to the Linux kernel).  On the other hand
there's a multitude of repositories to clone times a zillion branches
to follow.  Which should be the basis of the patches?  That's not very
clear.

Anyway, find them in my previous mails.  Too bad I realised just after
the fact that cosmetic changes should go first.  Hope it's mostly OK.
-- 
Regards,
Feri.

[PATCH 3/3] net/bonding: Purely cosmetic: rename a local variable.

2007-11-27 Thread Ferenc Wágner

Code for rendering multivalue sysfs files occurs three times
in this module.  Rename 'buffer' to 'buf' in the first, for
the sake of consistency.

Signed-off-by: Ferenc Wágner <[EMAIL PROTECTED]>
---
 drivers/net/bonding/bond_sysfs.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 6bb91e2..5c31f5c 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -74,7 +74,7 @@ struct rw_semaphore bonding_rwsem;
  * "show" function for the bond_masters attribute.
  * The class parameter is ignored.
  */
-static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
+static ssize_t bonding_show_bonds(struct class *cls, char *buf)
 {
int res = 0;
struct bonding *bond;
@@ -86,13 +86,12 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
/* not enough space for another interface name */
if ((PAGE_SIZE - res) > 10)
res = PAGE_SIZE - 10;
-   res += sprintf(buffer + res, "++more++ ");
+   res += sprintf(buf + res, "++more++ ");
break;
}
-   res += sprintf(buffer + res, "%s ",
-  bond->dev->name);
+   res += sprintf(buf + res, "%s ", bond->dev->name);
}
-   if (res) buffer[res-1] = '\n'; /* eat the leftover space */
+   if (res) buf[res-1] = '\n'; /* eat the leftover space */
up_read(&(bonding_rwsem));
return res;
 }
-- 
1.4.4.4


Re: git guidance

2007-11-27 Thread Randy Dunlap

On Wed, 28 Nov 2007 00:52:38 +0100 Willy Tarreau wrote:

> Tilman, there was a howto by Jeff Garzik I believe. It helped me
> a lot when I didn't understand a damn command, even if it was in
> the very old ages (version 0.5 or something like this). The tutorials
> on the GIT site are quite good too. You must read them entirely and
> proceed with the examples as you read them. Believe me, it helps you
> understand a lot of things, specially about the split in 3 parts
> (objects, cache, and working dir).

FYI, Jeff's git info is at
  http://linux.yyz.us/git-howto.html

---
~Randy

[PATCH 2/3] net/bonding: Return nothing for not applicable values

2007-11-27 Thread Ferenc Wágner

The previous code returned '\n' (that is, a single empty line)
from most files, with one exception (xmit_hash_policy), where
it returned 'NA\n'.  This patch consolidates each file to return
nothing at all if not applicable, not even a '\n'.

I find this behaviour more usual, more useful, more efficient
and shorter to code from both sides.
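
To illustrate the pattern outside the kernel, here is a minimal user-space
sketch. The enum and the function demo_show_xmit_hash() are hypothetical
stand-ins for illustration, not the bonding driver's own definitions:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the bonding modes; not the kernel's values. */
enum demo_mode { DEMO_MODE_ROUNDROBIN, DEMO_MODE_XOR, DEMO_MODE_8023AD };

/*
 * Sketch of a "show" routine: count starts at 0 and is only set when
 * the attribute applies, so the not-applicable case yields an empty file.
 */
static int demo_show_xmit_hash(char *buf, enum demo_mode mode,
			       const char *policy_name, int policy)
{
	int count = 0;

	assert(buf != NULL);
	if (mode == DEMO_MODE_XOR || mode == DEMO_MODE_8023AD)
		count = sprintf(buf, "%s %d\n", policy_name, policy);
	return count;
}
```

On the reading side, a zero-length read then means "not applicable", so
there is no need to compare against "\n" or "NA\n".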

Signed-off-by: Ferenc Wágner <[EMAIL PROTECTED]>
---
 drivers/net/bonding/bond_sysfs.c |   25 -
 1 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a3f1b4a..6bb91e2 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -455,14 +455,11 @@ static ssize_t bonding_show_xmit_hash(struct device *d,
  struct device_attribute *attr,
  char *buf)
 {
-   int count;
+   int count = 0;
struct bonding *bond = to_bond(d);
 
-   if ((bond->params.mode != BOND_MODE_XOR) &&
-   (bond->params.mode != BOND_MODE_8023AD)) {
-   // Not Applicable
-   count = sprintf(buf, "NA\n");
-   } else {
+   if ((bond->params.mode == BOND_MODE_XOR) ||
+   (bond->params.mode == BOND_MODE_8023AD)) {
count = sprintf(buf, "%s %d\n",
xmit_hashtype_tbl[bond->params.xmit_policy].modename,
bond->params.xmit_policy);
@@ -1079,8 +1076,6 @@ static ssize_t bonding_show_primary(struct device *d,
 
if (bond->primary_slave)
count = sprintf(buf, "%s\n", bond->primary_slave->dev->name);
-   else
-   count = sprintf(buf, "\n");
 
return count;
 }
@@ -1186,7 +1181,7 @@ static ssize_t bonding_show_active_slave(struct device *d,
 {
struct slave *curr;
struct bonding *bond = to_bond(d);
-   int count;
+   int count = 0;
 
read_lock(&bond->curr_slave_lock);
curr = bond->curr_active_slave;
@@ -1194,8 +1189,6 @@ static ssize_t bonding_show_active_slave(struct device *d,
 
if (USES_PRIMARY(bond->params.mode) && curr)
count = sprintf(buf, "%s\n", curr->dev->name);
-   else
-   count = sprintf(buf, "\n");
return count;
 }
 
@@ -1309,8 +1302,6 @@ static ssize_t bonding_show_ad_aggregator(struct device *d,
struct ad_info ad_info;
count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0 : ad_info.aggregator_id);
}
-   else
-   count = sprintf(buf, "\n");
 
return count;
 }
@@ -1331,8 +1322,6 @@ static ssize_t bonding_show_ad_num_ports(struct device *d,
struct ad_info ad_info;
count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0: ad_info.ports);
}
-   else
-   count = sprintf(buf, "\n");
 
return count;
 }
@@ -1353,8 +1342,6 @@ static ssize_t bonding_show_ad_actor_key(struct device *d,
struct ad_info ad_info;
count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0 : ad_info.actor_key);
}
-   else
-   count = sprintf(buf, "\n");
 
return count;
 }
@@ -1375,8 +1362,6 @@ static ssize_t bonding_show_ad_partner_key(struct device *d,
struct ad_info ad_info;
count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0 : ad_info.partner_key);
}
-   else
-   count = sprintf(buf, "\n");
 
return count;
 }
@@ -1401,8 +1386,6 @@ static ssize_t bonding_show_ad_partner_mac(struct device *d,
print_mac(mac, ad_info.partner_system));
}
}
-   else
-   count = sprintf(buf, "\n");
 
return count;
 }
-- 
1.4.4.4


[PATCH 1/3] Remove trailing NULs from network bonding sysfs interface.

2007-11-27 Thread Ferenc Wágner

Also remove trailing spaces from multivalued files.

This fixes output such as:

$ od -c /sys/class/net/bond0/bonding/slaves
000   e   t   h   -   l   e   f   t   e   t   h   -   r   i   g
020   h   t  \n  \0
025

It mostly entails deleting '+1'-s after sprintf() calls: the return value
of sprintf() is the number of characters printed, without the closing NUL,
i.e. exactly what the sysfs interface requires.  The three multivalue
cases are different, because they also have to swallow a trailing
space.
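
As a user-space illustration of both points, here is a minimal sketch.
demo_show_names() is a hypothetical stand-in for a multivalue "show"
routine, not the driver code itself:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Print each name followed by a space, then turn the last space into
 * the terminating newline. The return value is the sum of the sprintf()
 * return values, i.e. the number of bytes written, closing NUL excluded.
 */
static int demo_show_names(char *buf, const char *const *names, int n)
{
	int res = 0;
	int i;

	assert(buf != NULL);
	for (i = 0; i < n; i++)
		res += sprintf(buf + res, "%s ", names[i]);
	if (res)
		buf[res - 1] = '\n';	/* eat the leftover space */
	return res;
}
```

With the names "eth-left" and "eth-right" this writes
"eth-left eth-right\n" and returns 19, with no trailing space and no NUL
counted in the length.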

Signed-off-by: Ferenc Wágner <[EMAIL PROTECTED]>
---
 drivers/net/bonding/bond_sysfs.c |   66 +
 1 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index b29330d..a3f1b4a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -86,14 +86,13 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
/* not enough space for another interface name */
if ((PAGE_SIZE - res) > 10)
res = PAGE_SIZE - 10;
-   res += sprintf(buffer + res, "++more++");
+   res += sprintf(buffer + res, "++more++ ");
break;
}
res += sprintf(buffer + res, "%s ",
   bond->dev->name);
}
-   res += sprintf(buffer + res, "\n");
-   res++;
+   if (res) buffer[res-1] = '\n'; /* eat the leftover space */
up_read(&(bonding_rwsem));
return res;
 }
@@ -235,14 +234,13 @@ static ssize_t bonding_show_slaves(struct device *d,
/* not enough space for another interface name */
if ((PAGE_SIZE - res) > 10)
res = PAGE_SIZE - 10;
-   res += sprintf(buf + res, "++more++");
+   res += sprintf(buf + res, "++more++ ");
break;
}
res += sprintf(buf + res, "%s ", slave->dev->name);
}
read_unlock(>lock);
-   res += sprintf(buf + res, "\n");
-   res++;
+   if (res) buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
@@ -406,7 +404,7 @@ static ssize_t bonding_show_mode(struct device *d,
 
return sprintf(buf, "%s %d\n",
bond_mode_tbl[bond->params.mode].modename,
-   bond->params.mode) + 1;
+   bond->params.mode);
 }
 
 static ssize_t bonding_store_mode(struct device *d,
@@ -463,11 +461,11 @@ static ssize_t bonding_show_xmit_hash(struct device *d,
if ((bond->params.mode != BOND_MODE_XOR) &&
(bond->params.mode != BOND_MODE_8023AD)) {
// Not Applicable
-   count = sprintf(buf, "NA\n") + 1;
+   count = sprintf(buf, "NA\n");
} else {
count = sprintf(buf, "%s %d\n",
xmit_hashtype_tbl[bond->params.xmit_policy].modename,
-   bond->params.xmit_policy) + 1;
+   bond->params.xmit_policy);
}
 
return count;
@@ -527,7 +525,7 @@ static ssize_t bonding_show_arp_validate(struct device *d,
 
return sprintf(buf, "%s %d\n",
   arp_validate_tbl[bond->params.arp_validate].modename,
-  bond->params.arp_validate) + 1;
+  bond->params.arp_validate);
 }
 
 static ssize_t bonding_store_arp_validate(struct device *d,
@@ -627,7 +625,7 @@ static ssize_t bonding_show_arp_interval(struct device *d,
 {
struct bonding *bond = to_bond(d);
 
-   return sprintf(buf, "%d\n", bond->params.arp_interval) + 1;
+   return sprintf(buf, "%d\n", bond->params.arp_interval);
 }
 
 static ssize_t bonding_store_arp_interval(struct device *d,
@@ -711,10 +709,7 @@ static ssize_t bonding_show_arp_targets(struct device *d,
res += sprintf(buf + res, "%u.%u.%u.%u ",
   NIPQUAD(bond->params.arp_targets[i]));
}
-   if (res)
-   res--;  /* eat the leftover space */
-   res += sprintf(buf + res, "\n");
-   res++;
+   if (res) buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
@@ -815,7 +810,7 @@ static ssize_t bonding_show_downdelay(struct device *d,
 {
struct bonding *bond = to_bond(d);
 
-   return sprintf(buf, "%d\n", bond->params.downdelay * bond->params.miimon) + 1;
+   return sprintf(buf, "%d\n", bond->params.downdelay * bond->params.miimon);
 }
 
 static ssize_t bonding_store_downdelay(struct device *d,
@@ -872,7 +867,7 @@ static ssize_t bonding_show_updelay(struct device *d,
 {
struct bonding *bond = to_bond(d);
 
-   return sprintf(buf, "%d\n", bond->params.updelay * bond->params.miimon) + 1;
+   return sprintf(buf,

Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-27 Thread Eric W. Biederman

Neil Horman <[EMAIL PROTECTED]> writes:

>> Close.  There are two options with virtual wire mode.  
>> - Either the local apic is in virtual wire mode, and somehow the
>>   legacy interrupts make it to the local cpu.

> I assume this is the case if the ioapic is also in virtual wire
> mode.

No.  The ioapic is completely disabled in this case.

> and the destination field for the appropriate interrupt(s) (the timer
> interrupt in this case) is set to either physical mode with a destination
> id of the lapic for the running cpu, or if it is set to logical mode and
> the destination id has the corresponding bit for the running cpu set.
> Is that right?

No.  All of the ioapic routing entries are disabled.

>> - Or an ioapic is in virtual wire mode and the legacy interrupt
>>   controller is connected to it.
>> 
> I thought we only had one ioapic in this system (Ben correct me if I'm
> wrong on that please).  I thought the above printk told us that, because
> apic2 and pin2 are both -1, that means that the 8259 isn't physically
> connected to any cpu, and instead is routed through apic 0, and asserts
> on pin 2 of that ioapic.
>
>> So I guess fundamentally for any SMP system that only supports the
>> cpu being in local apic mode and only routes interrupts to the boot
>> strap processor we could be in trouble.  That is what our current
>> information about your system suggests.
>> 
> If that were the case, then we would need to support moving kexec boot to
> cpu0, at least in some limited cases.  I've got a patch together that
> enables the handshaking I was brainstorming earlier, which should allow an
> attempted jump to cpu0 on a crash, with a fallback to booting on the
> crashing processor.  If we wind up confirming the above case, then I'll
> post it.
>
>> However most systems actually connect the i8254 PIC interrupt
> Sorry, to split hairs here, but you mean the 8259 right? 

Grr. Yes. The i8259.  Got the timer and the PIC numbers confused.  Oops.
The legacy configuration is the timer to the PIC to either the local
or the ioapic.

> Just want to be sure I'm clear on whats going on.  I thought the
> 8254 was the external timer. 

Yep.  You have it straight.

>> controller to the ioapic in virtual wire mode.  As I recall the
>> standard mapping is to ioapic 0, pin 0.  With ioapic 0, pin 2 being
>> the timer interrupt (Possibly it is the other way around).
>> 
>> So as a test we could feed those values into ioapic_8259 and see
>> if the kdump case works.  I believe we prefer putting the ioapic
>> into virtual wire mode over putting the cpu into virtual wire
>> mode.  We can only control which cpu receives the legacy interrupts if
>> we are putting the ioapic in virtual wire mode.
>> 
> I'm sorry, I can't find ioapic_8259 defined anywhere.  Where is that
> supposed to be?  Show me where it's defined and I'll happily write the patch.

Grr.  That should have been ioapic_i8259.  The second value we print out
in the ..Timer printk.

>> It may also be an interesting test to just enable the timer for the
>> ioapic in early boot, as you have suggested.  I don't have a clue what
>> that will do.
>> 
> Unfortunately nothing.  We've tried using the local apic timer in a previous
> test, and it resulted in no change, as did transitioning the cpu to the apic
> timer via a call to switch_ipi_to_APIC_timer.  It's possible I did something
> wrong however.

Well if you didn't have the local apic enabled beyond virtual wire
mode that could have caused problems.

Sure.  I suspect the problem is that you were still in a legacy irq
mode.

> Currently I'm writing a patch that calls setup_ioapic_dest after we call
> disable_IO_APIC.  Looking at the implementation, it appears that calling this
> function should rewrite the irq routing table in the ioapic to deliver
> interrupts to the set of online cpus, as defined by the TARGET_CPUS macro.  I
> assume that if the ioapic is in virtual wire mode from the call to
> disable_IO_APIC, then calling setup_ioapic_dest will force interrupts to be
> delivered to the crashing cpu, as it should be the only bit set in the online
> cpu mask.  Please feel free to poke holes in this idea.

Just try and make certain: ioapic_i8259.pin != -1
Which should cause disable_IO_APIC to put the ioapic and not the local
apic in virtual wire mode.

Anything else is likely to do strange things.

Eric

Re: git guidance

2007-11-27 Thread Kristoffer Ericson

On Wed, 28 Nov 2007 00:52:38 +0100
Willy Tarreau <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 27, 2007 at 11:55:11PM +0100, Kristoffer Ericson wrote:
> > Greetings,
> > 
> > Google is your friend. If you're looking for irc channels you can always
> > try #git at irc.freenode.net
> > Git howto/tutorial/... doesn't belong in the kernel mailinglist.
> 
> Well, I don't agree with you. His question is about how to use GIT to
> develop his driver.
>1) linux-kernel is a development ML.
>2) he needs help from people who already encountered such beginner's
>   issues and who might give very good advice.
Agreed, my main concern was turning the list into a "git-support" list, and
since I used the tutorials myself to get started, I felt they were quite
satisfactory. However, as you pointed out, needing help to develop his driver
is a kernel matter. Point taken. :)

> 
> It should not turn into an endless thread led by people who want to
> redefine GIT's roadmap, but experience sharing helps a lot with GIT.
> 
> Tilman, there was a howto by Jeff Garzik I believe. It helped me
> a lot when I didn't understand a damn command, even if it was in
> the very old ages (version 0.5 or something like this). The tutorials
> on the GIT site are quite good too. You must read them entirely and
> proceed with the examples as you read them. Believe me, it helps you
> understand a lot of things, specially about the split in 3 parts
> (objects, cache, and working dir).
> 
> I really think that if your patches do not apply, it's because you
> have lost some changes due to a wrong initial use possibly caused
> by a mis-understanding of the tool. It happened to me too, but in
> this case you can almost certainly find your old changes in older
> commits.
> 
> I really hope that soon someone will come up with a big 400-pages
> book called "GIT" with a lot of good advices. It would be awesome.
I second that :)

> 
> Anyway, don't get demotivated about the tool or the workflow. If
> you find it inconvenient to use, you're doing something wrong and
> you don't know it.
> 
> Regards,
> Willy
> 

[PATCH 7/7] Hibernation: Update messages

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Make hibernation messages start with one common prefix "PM: " and use
the word "hibernation" in messages as a synonym of "suspend to disk".

Turn some KERN_INFO messages into debug ones.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/power/disk.c |   28 +++-
 kernel/power/snapshot.c |   23 ---
 kernel/power/swap.c |   31 +--
 kernel/power/swsusp.c   |5 +++--
 4 files changed, 47 insertions(+), 40 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -191,8 +191,8 @@ int create_image(int platform_mode)
 */
error = device_power_down(PMSG_FREEZE);
if (error) {
-   printk(KERN_ERR "Some devices failed to power down, "
-   KERN_ERR "aborting suspend\n");
+   printk(KERN_ERR "PM: Some devices failed to power down, "
+   "aborting hibernation\n");
goto Enable_irqs;
}
 
@@ -203,7 +203,8 @@ int create_image(int platform_mode)
save_processor_state();
error = swsusp_arch_suspend();
if (error)
-   printk(KERN_ERR "Error %d while creating the image\n", error);
+   printk(KERN_ERR "PM: Error %d creating hibernation image\n",
+   error);
/* Restore control flow magically appears here */
restore_processor_state();
if (!in_suspend)
@@ -289,7 +290,7 @@ static int resume_target_kernel(void)
local_irq_disable();
error = device_power_down(PMSG_PRETHAW);
if (error) {
-   printk(KERN_ERR "Some devices failed to power down, "
+   printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
goto Enable_irqs;
}
@@ -438,7 +439,7 @@ static void power_down(void)
 * Valid image is on the disk, if we continue we risk serious data
 * corruption after resume.
 */
-   printk(KERN_CRIT "Please power me down manually\n");
+   printk(KERN_CRIT "PM: Please power down manually\n");
while(1);
 }
 
@@ -484,7 +485,7 @@ int hibernate(void)
if (error)
goto Exit;
 
-   printk("Syncing filesystems ... ");
+   printk(KERN_INFO "PM: Syncing filesystems ... ");
sys_sync();
printk("done.\n");
 
@@ -560,10 +561,11 @@ static int software_resume(void)
return -ENOENT;
}
swsusp_resume_device = name_to_dev_t(resume_file);
-   pr_debug("swsusp: Resume From Partition %s\n", resume_file);
+   pr_debug("PM: Resume from partition %s\n", resume_file);
} else {
-   pr_debug("swsusp: Resume From Partition %d:%d\n",
-MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
+   pr_debug("PM: Resume from partition %d:%d\n",
+   MAJOR(swsusp_resume_device),
+   MINOR(swsusp_resume_device));
}
 
if (noresume) {
@@ -575,7 +577,7 @@ static int software_resume(void)
return 0;
}
 
-   pr_debug("PM: Checking swsusp image.\n");
+   pr_debug("PM: Checking hibernation image.\n");
error = swsusp_check();
if (error)
goto Unlock;
@@ -601,7 +603,7 @@ static int software_resume(void)
goto Done;
}
 
-   pr_debug("PM: Reading swsusp image.\n");
+   pr_debug("PM: Reading hibernation image.\n");
 
error = swsusp_read();
if (!error)
@@ -726,7 +728,7 @@ static ssize_t disk_store(struct kset *k
error = -EINVAL;
 
if (!error)
-   pr_debug("PM: suspend-to-disk mode set to '%s'\n",
+   pr_debug("PM: Hibernation mode set to '%s'\n",
 hibernation_modes[mode]);
mutex_unlock(_mutex);
return error ? error : n;
@@ -756,7 +758,7 @@ static ssize_t resume_store(struct kset 
mutex_lock(_mutex);
swsusp_resume_device = res;
mutex_unlock(_mutex);
-   printk("Attempting manual resume\n");
+   printk(KERN_INFO "PM: Starting manual resume from disk\n");
noresume = 0;
software_resume();
ret = n;
Index: linux-2.6/kernel/power/snapshot.c
===
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -635,7 +635,7 @@ __register_nosave_region(unsigned long s
region->end_pfn = end_pfn;
list_add_tail(>list, _regions);
  Report:
-   printk("swsusp: Registered nosave memory region: %016lx - %016lx\n",
+   printk(KERN_INFO "PM: Registered nosave memory: %016lx -

[PATCH 4/7] Hibernation: Fix comment in disk.c

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Fix a comment in kernel/power/disk.c so that it doesn't contain lines
longer than 80 characters.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/power/disk.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -568,8 +568,8 @@ static int software_resume(void)
 
if (noresume) {
/**
-* FIXME: If noresume is specified, we need to find the partition
-* and reset it back to normal swap space.
+* FIXME: If noresume is specified, we need to find the
+* partition and reset it back to normal swap space.
 */
mutex_unlock(_mutex);
return 0;


[PATCH 5/7] Hibernation: Remove unnecessary variable declaration

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Remove the unnecessary extern declaration of resume_file[]
from kernel/power/swap.c .

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/power/swap.c |2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6/kernel/power/swap.c
===
--- linux-2.6.orig/kernel/power/swap.c
+++ linux-2.6/kernel/power/swap.c
@@ -28,8 +28,6 @@
 
 #include "power.h"
 
-extern char resume_file[];
-
 #define SWSUSP_SIG "S1SUSPEND"
 
 struct swsusp_header {


[PATCH 6/7] Suspend: Use common prefix in messages

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Make suspend messages start with one common prefix "PM: ".

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/power/main.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/main.c
===
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -227,7 +227,7 @@ static int suspend_enter(suspend_state_t
BUG_ON(!irqs_disabled());
 
if ((error = device_power_down(PMSG_SUSPEND))) {
-   printk(KERN_ERR "Some devices failed to power down\n");
+   printk(KERN_ERR "PM: Some devices failed to power down\n");
goto Done;
}
 
@@ -261,7 +261,7 @@ int suspend_devices_and_enter(suspend_st
suspend_console();
error = device_suspend(PMSG_SUSPEND);
if (error) {
-   printk(KERN_ERR "Some devices failed to suspend\n");
+   printk(KERN_ERR "PM: Some devices failed to suspend\n");
goto Resume_console;
}
 
@@ -344,7 +344,7 @@ static int enter_state(suspend_state_t s
if (!mutex_trylock(_mutex))
return -EBUSY;
 
-   printk("Syncing filesystems ... ");
+   printk(KERN_INFO "PM: Syncing filesystems ... ");
sys_sync();
printk("done.\n");
 


[PATCH 3/7] Suspend: Fix comment in main.c

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Fix a comment in kernel/power/main.c so that it doesn't contain lines
longer than 80 characters.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
---
 kernel/power/main.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/power/main.c
===
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -242,8 +242,8 @@ static int suspend_enter(suspend_state_t
 }
 
 /**
- * suspend_devices_and_enter - suspend devices and enter the desired system sleep
- *   state.
+ * suspend_devices_and_enter - suspend devices and enter the desired system
+ * sleep state.
  * @state:   state to enter
  */
 int suspend_devices_and_enter(suspend_state_t state)


[PATCH 2/7] Hibernation: Move low level resume to disk.c

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Move the low level restore code to kernel/power/disk.c , since the
corresponding low level hibernation code is already there.

Make restore fail if device_power_down(PMSG_PRETHAW) returns an
error.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
---
 kernel/power/disk.c   |   49 -
 kernel/power/power.h  |1 -
 kernel/power/swsusp.c |   35 ---
 3 files changed, 48 insertions(+), 37 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -275,6 +275,53 @@ int hibernation_snapshot(int platform_mo
 }
 
 /**
+ * resume_target_kernel - prepare devices that need to be suspended with
+ * interrupts off, restore the contents of highmem that have not been
+ * restored yet from the image and run the low level code that will restore
+ * the remaining contents of memory and switch to the just restored target
+ * kernel.
+ */
+
+static int resume_target_kernel(void)
+{
+   int error;
+
+   local_irq_disable();
+   error = device_power_down(PMSG_PRETHAW);
+   if (error) {
+   printk(KERN_ERR "Some devices failed to power down, "
+   "aborting resume\n");
+   goto Enable_irqs;
+   }
+   /* We'll ignore saved state, but this gets preempt count (etc) right */
+   save_processor_state();
+   error = restore_highmem();
+   if (!error) {
+   error = swsusp_arch_resume();
+   /*
+* The code below is only ever reached in case of a failure.
+* Otherwise execution continues at place where
+* swsusp_arch_suspend() was called
+*/
+   BUG_ON(!error);
+   /* This call to restore_highmem() undos the previous one */
+   restore_highmem();
+   }
+   /*
+* The only reason why swsusp_arch_resume() can fail is memory being
+* very tight, so we have to free it as soon as we can to avoid
+* subsequent failures
+*/
+   swsusp_free();
+   restore_processor_state();
+   touch_softlockup_watchdog();
+   device_power_up();
+ Enable_irqs:
+   local_irq_enable();
+   return error;
+}
+
+/**
  * hibernation_restore - quiesce devices and restore the hibernation
  * snapshot image.  If successful, control returns in hibernation_snaphot()
  * @platform_mode - if set, use the platform driver, if available, to
@@ -297,7 +344,7 @@ int hibernation_restore(int platform_mod
if (!error) {
error = disable_nonboot_cpus();
if (!error)
-   error = swsusp_resume();
+   error = resume_target_kernel();
enable_nonboot_cpus();
}
platform_restore_cleanup(platform_mode);
Index: linux-2.6/kernel/power/swsusp.c
===
--- linux-2.6.orig/kernel/power/swsusp.c
+++ linux-2.6/kernel/power/swsusp.c
@@ -261,38 +261,3 @@ int swsusp_shrink_memory(void)
 
return 0;
 }
-
-int swsusp_resume(void)
-{
-   int error;
-
-   local_irq_disable();
-   /* NOTE:  device_power_down() is just a suspend() with irqs off;
-* it has no special "power things down" semantics
-*/
-   if (device_power_down(PMSG_PRETHAW))
-   printk(KERN_ERR "Some devices failed to power down, very bad\n");
-   /* We'll ignore saved state, but this gets preempt count (etc) right */
-   save_processor_state();
-   error = restore_highmem();
-   if (!error) {
-   error = swsusp_arch_resume();
-   /* The code below is only ever reached in case of a failure.
-* Otherwise execution continues at place where
-* swsusp_arch_suspend() was called
-*/
-   BUG_ON(!error);
-   /* This call to restore_highmem() undos the previous one */
-   restore_highmem();
-   }
-   /* The only reason why swsusp_arch_resume() can fail is memory being
-* very tight, so we have to free it as soon as we can to avoid
-* subsequent failures
-*/
-   swsusp_free();
-   restore_processor_state();
-   touch_softlockup_watchdog();
-   device_power_up();
-   local_irq_enable();
-   return error;
-}
Index: linux-2.6/kernel/power/power.h
===
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -154,7 +154,6 @@ extern int swsusp_swap_in_use(void);
 extern int swsusp_check(void);
 extern int swsusp_shrink_memory(void);
 extern void swsusp_free(void);
-extern int
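
Editorial sketch, not part of the patch: the behavioural change in this patch is that a failing device_power_down(PMSG_PRETHAW) now aborts the restore and jumps straight to re-enabling interrupts. The control flow can be modelled in user space as below. All names here (trace, step(), fake_power_down(), model_resume()) are hypothetical stand-ins, and the restore itself is reduced to an error code passed in by the caller.

```c
#include <string.h>

/* Hypothetical trace buffer recording which steps ran, in order. */
static char trace[64];

static void step(const char *name)
{
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Stand-in for device_power_down(); returns the error it is given. */
static int fake_power_down(int err)
{
	step("D");
	return err;
}

/*
 * Models the shape of resume_target_kernel(): if powering down fails,
 * skip the restore and the device power-up, re-enable interrupts and
 * report the error.
 */
static int model_resume(int power_down_err, int restore_err)
{
	int error;

	trace[0] = '\0';
	step("I");			/* local_irq_disable() */
	error = fake_power_down(power_down_err);
	if (error)
		goto enable_irqs;
	step("R");			/* restore attempt */
	error = restore_err;
	step("U");			/* device_power_up() */
enable_irqs:
	step("E");			/* local_irq_enable() */
	return error;
}
```

With a power-down failure the trace is "IDE"; with a failed restore it is "IDRUE", which matches the two exit paths of the new function.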

[PATCH 0/7] Suspend and hibernation code cleanups (rev. 2)

2007-11-27 Thread Rafael J. Wysocki

Hi,

This is the second revision of the series of suspend and hibernation cleanup
patches I sent yesterday.

The patches are on top of the suspend branch of the linux-acpi-2.6 tree.

Please review.

Greetings,
Rafael


[PATCH 1/7] Suspend: Fix compilation warning for CONFIG_SUSPEND unset

2007-11-27 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Make the new suspend debug facility code depend on CONFIG_SUSPEND,
as appropriate, to remove the compiler warning printed when CONFIG_PM is set
and CONFIG_SUSPEND is not set.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
---
 kernel/power/main.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/power/main.c
===
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -52,6 +52,8 @@ int pm_notifier_call_chain(unsigned long
 
 #endif /* CONFIG_PM_SLEEP */
 
+#ifdef CONFIG_SUSPEND
+
 #ifdef CONFIG_PM_DEBUG
 int pm_test_level = TEST_NONE;
 
@@ -125,9 +127,6 @@ power_attr(pm_test);
 static inline int suspend_test(int level) { return 0; }
 #endif /* !CONFIG_PM_DEBUG */
 
-
-#ifdef CONFIG_SUSPEND
-
 /* This is just an arbitrary number */
 #define FREE_PAGE_NUMBER (100)
 


Re: [PATCH 1/3] x86: partial unification of asm-x86/bitops.h

2007-11-27 Thread Ingo Molnar


* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> This unifies the set/clear/test bit functions of asm/bitops.h.

great! I've added your 3 patches to x86.git. It built and booted fine.

Ingo

Re: Dynticks Causing High Context Switch Rate in ksoftirqd

2007-11-27 Thread Robert Hancock


[EMAIL PROTECTED] wrote:

> Hello Robert,
>
> I've attached additional detail on the config of the misbehaving system
> including output from oprofile and PowerTop. PowerTop output leads me to
> believe that maybe this is an interaction between my bridged ethernet
> setup and dynticks? Hmmm...


Don't know about that, your top wakeups are from br_stp_enable_bridge, 
but that is only 26 a second - that doesn't explain a context switch 
rate of 150,000 a second..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [2.6 patch] make I/O schedulers non-modular

2007-11-27 Thread Jarek Poplawski

Adrian Bunk wrote, On 11/27/2007 11:53 PM:

> On Tue, Nov 27, 2007 at 11:15:48PM +0100, Jarek Poplawski wrote:
... 
> Most Google hits are about abortion.
> 
> The fact that people use this term in some completely different 
> context does not give it the meaning you implied it had.
> 
> Oh, and this right of choice also does not exist in Poland...

Anyway, your later arguments suggest you've understood what I meant.
And maybe abortion isn't a bad association here...

...

> As one of the most active code removers in the kernel [1], I can tell 
> you what actually happens in practice:

...
> It's always surprising how many people complain when you deprecate or 
> remove a choice B that choice A wouldn't work for them, and who had 
> never reported their problems before since choice B worked for them...

Of course, all these choices should be reasonably limited, so the
opinions of users and maintainers should be always considered.

But, I was rather against something else: removing some maybe not very
popular, but still not buggy options, only to save a few kilobytes or
maintainers' time.

> [1] http://lwn.net/Articles/247582/

My congratulations! Of course, removing is something necessary, but I wish
you many problems! (== many users)

Thanks,
Jarek P.

Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-27 Thread Eric W. Biederman

Ben Woodard <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Neil Horman <[EMAIL PROTECTED]> writes:
>>
 ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
>>
>> Ben, what chipset is this?
>
> nVidia MCP55 pro
>
> It is the original version of
> http://www.supermicro.com/Aplus/motherboard/Opteron8000/MCP55/H8QM8-2.cfm
>
> i.e. not the -2. However, they don't seem to advertise the original
> version. Supermicro assures me that they are practically the same but I
> haven't played with the -2 version yet.
>

That is enough for an initial approximation.  Unless something has
changed radically the Nvidia chipsets can put the ioapic instead of
the local apic in virtual wire mode so that is worth testing.

Eric

Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

2007-11-27 Thread Christoph Lameter

On Wed, 28 Nov 2007, Andi Kleen wrote:

> It was demonstrated useful for some specific cases, like context switch early
> fetch on IA64. But I agree the prefetch on each list_for_each() is probably
> a bad idea and should be removed. Will also help code size.

Looks like sum_vm_events() is only ever called from all_vm_events(). 
Callers of all_vm_events():

App monitoring?
arch/s390/appldata/appldata_mem.c:  all_vm_events(ev);

Leds:
drivers/parisc/led.c:   all_vm_events(events);

proc output for /proc/vmstat:
mm/vmstat.c:all_vm_events(e);

All of that does not seem to be performance critical 

Re: [PATCH] UML - Fix !NO_HZ busy-loop

2007-11-27 Thread Jeff Dike

On Tue, Nov 27, 2007 at 03:01:08PM -0800, Andrew Morton wrote:
> > +#ifdef UML_CONFIG_NO_HZ
> 
> Nothing ever defines this?

$ grep HZ obj/arch/um/include/uml-config.h 
#define UML_CONFIG_NO_HZ 1

uml-config.h is auto-generated to provide the config definitions to the
userspace side of UML.

Jeff

-- 
Work email - jdike at linux dot intel dot com

Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

2007-11-27 Thread Christoph Lameter

On Tue, 27 Nov 2007, Andrew Morton wrote:

> I don't recall anyone ever demonstrating that prefetch is useful in-kernel.

vmstat: remove prefetch

Remove the prefetch logic in order to avoid touching impossible per cpu 
areas.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/vmstat.c |   11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

Index: linux-2.6/mm/vmstat.c
===
--- linux-2.6.orig/mm/vmstat.c  2007-11-27 16:04:15.345713812 -0800
+++ linux-2.6/mm/vmstat.c   2007-11-27 16:07:00.552713192 -0800
@@ -21,21 +21,14 @@ EXPORT_PER_CPU_SYMBOL(vm_event_states);
 
 static void sum_vm_events(unsigned long *ret, cpumask_t *cpumask)
 {
-   int cpu = 0;
+   int cpu;
int i;
 
memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
 
-   cpu = first_cpu(*cpumask);
-   while (cpu < NR_CPUS) {
+   for_each_cpu_mask(cpu, *cpumask) {
struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
 
-   cpu = next_cpu(cpu, *cpumask);
-
-   if (cpu < NR_CPUS)
-   prefetch(&per_cpu(vm_event_states, cpu));
-
-
for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
ret[i] += this->event[i];
}
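
Editorial sketch, not part of the patch: after the change the loop simply visits each CPU present in the mask and adds up its counters, with no look-ahead. A user-space model of that loop shape is shown below. The names SKETCH_NR_CPUS, SKETCH_NR_ITEMS, struct event_state and sum_events() are hypothetical, and a plain unsigned int stands in for cpumask_t.

```c
#define SKETCH_NR_CPUS  8	/* hypothetical small CPU count */
#define SKETCH_NR_ITEMS 3	/* stands in for NR_VM_EVENT_ITEMS */

/* One block of event counters per CPU slot. */
struct event_state {
	unsigned long event[SKETCH_NR_ITEMS];
};

/*
 * Sum the counters of every CPU whose bit is set in mask.  Slots for
 * CPUs outside the mask are never read.
 */
static void sum_events(unsigned long *ret, const struct event_state *states,
		       unsigned int mask)
{
	int cpu, i;

	for (i = 0; i < SKETCH_NR_ITEMS; i++)
		ret[i] = 0;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;	/* CPU not in the mask: skip its slot */
		for (i = 0; i < SKETCH_NR_ITEMS; i++)
			ret[i] += states[cpu].event[i];
	}
}
```

The kernel's for_each_cpu_mask() does the mask walking itself; the explicit bit test here is only the portable equivalent.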

Re: [PATCH 01/27] ptrace: arch_has_single_step

2007-11-27 Thread Roland McGrath

> clean. Unless there are major problems with it this looks like 2.6.25 
> stuff. Would you mind to send updates/fixes against this tree?

I'd be glad to.  I'd be very happy to see this stuff make 2.6.25.

Thanks,
Roland

Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Randy Dunlap


Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Randy Dunlap wrote:
>
>>> +config ARCH_SETS_UP_PER_CPU_AREA
>>> +   bool
>>> +   default y
>>
>> def_bool y
>>   is the preferred form for those 2-liners above...
>>
>>> +
>>>  config ARCH_NO_VIRT_TO_BUS
>>> def_bool y
>
> Ok. Changed.
>
> x86 should use
>
> config ARCH_SETS_UP_PER_CPU_AREA
> def_bool X86_64
>
> ?


Yes, you can do
def_bool 
as well to make the new symbol be variable instead of constant.


--
~Randy

Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Christoph Lameter

On Tue, 27 Nov 2007, Randy Dunlap wrote:

> > +config ARCH_SETS_UP_PER_CPU_AREA
> > +   bool
> > +   default y
> 
>   def_bool y
>   is the preferred form for those 2-liners above...
> 
> 
> > +
> >  config ARCH_NO_VIRT_TO_BUS
> > def_bool y
> >  

Ok. Changed.

x86 should use

config ARCH_SETS_UP_PER_CPU_AREA
def_bool X86_64

?


Re: [PATCH 01/27] ptrace: arch_has_single_step

2007-11-27 Thread Ingo Molnar


* Roland McGrath <[EMAIL PROTECTED]> wrote:

> > I did do an experimental will-it-apply and got a tremendous number 
> > of rejects against the x86 git tree, almost all of which went away 
> > when `patch -l' was used.  Seems that someone has gone on a 
> > whitespace rampage through arch/x86/ia32/ptrace32.c and 
> > arch/x86/ia32/ptrace64.c.
> 
> Damn, sorry about that.  I could have sworn I cranked everything 
> through the whitespace inoculation machine, but I guess I missed some.

i've resolved those and i've added your 27 patches to x86.git. You can 
pick it up from the 'mm' branch of x86.git:

 git-pull git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git mm

it builds and boots fine here, and the patches certainly look very 
clean. Unless there are major problems with it this looks like 2.6.25 
stuff. Would you mind to send updates/fixes against this tree?

Ingo

Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-27 Thread Andi Kleen

> Would anyone have any problems with code that simply verified that the 
> state which we are restoring allowed interrupts to get to the processor 
> that we are currently crashing on and if not, poked in a reasonable value.

Sounds reasonable by itself.

> 
> Yes this would add some complexity to the code paths where we were 
> crashing but it could prevent the problem that we are seeing. It seems 
> like a small fairly safe change rather than a big disruptive change like 
> moving the initialization of the IOAPIC earlier in the boot process.

But longer (or not so long) term moving the IOAPIC earlier is the better
option, simply because the short use of PIC mode traditionally was a source
of problems on a lot of boxes.

And it does not really make sense to keep this source of trouble just for a
short time during boot when we could as well go directly into IO-APIC mode.
This would probably also match what other OS are doing better and that is
always a good idea for stable operation.

-Andi

Re: git guidance

2007-11-27 Thread Willy Tarreau

On Tue, Nov 27, 2007 at 11:55:11PM +0100, Kristoffer Ericson wrote:
> Greetings,
> 
> Google is your friend. If you're looking for irc channels you can always try 
> #git at irc.freenode.net
> Git howto/tutorial/... doesn't belong in the kernel mailinglist.

Well, I don't agree with you. His question is about how to use GIT to
develop his driver.
   1) linux-kernel is a development ML.
   2) he needs help from people who have already encountered such beginner's
  issues and who might give very good advice.

It should not turn into an endless thread led by people who want to
redefine GIT's roadmap, but experience sharing helps a lot with GIT.

Tilman, there was a howto by Jeff Garzik I believe. It helped me
a lot when I didn't understand a damn command, even if it was in
the very old ages (version 0.5 or something like this). The tutorials
on the GIT site are quite good too. You must read them entirely and
proceed with the examples as you read them. Believe me, it helps you
understand a lot of things, specially about the split in 3 parts
(objects, cache, and working dir).

I really think that if your patches do not apply, it's because you
have lost some changes due to a wrong initial use possibly caused
by a mis-understanding of the tool. It happened to me too, but in
this case you can almost certainly find your old changes in older
commits.

I really hope that soon someone will come up with a big 400-page
book called "GIT" with a lot of good advice. It would be awesome.

Anyway, don't get demotivated about the tool or the workflow. If
you find it inconvenient to use, you're doing something wrong and
you don't know it.

Regards,
Willy


Re: [patch 2/4] Timerfd v3 - new timerfd API

2007-11-27 Thread Davide Libenzi

On Tue, 27 Nov 2007, Andrew Morton wrote:

> On Tue, 27 Nov 2007 12:47:46 -0800 (PST)
> Davide Libenzi <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 27 Nov 2007, Andrew Morton wrote:
> > 
> > > On Sun, 25 Nov 2007 14:14:19 -0800 Davide Libenzi <[EMAIL PROTECTED]> 
> > > wrote:
> > > 
> > > > +static struct file *timerfd_fget(int fd)
> > > > +{
> > > > +   struct file *file;
> > > > +
> > > > +   file = fget(fd);
> > > > +   if (!file)
> > > > +   return ERR_PTR(-EBADF);
> > > > +   if (file->f_op != &timerfd_fops) {
> > > > +   fput(file);
> > > > +   return ERR_PTR(-EINVAL);
> > > > +   }
> > > > +
> > > > +   return file;
> > > > +}
> > > 
> > > I suppose we could use fget_light() in here sometime.  It is significantly
> > > quicker in microbenchmarks.  Or it was - nobody has checked that in a few
> > > years afaik.
> > 
> > Should I now?
> 
> No great rush.  It'd be fun to see if it actually makes any measurable
> difference on modern hardware (hint ;)).

I was going to say BS, given that at the time of the tests, the files 
struct was protected by an rwlock whereas now there's a rcu one.
But that seems not the case:

http://www.xmailserver.org/fget-light-test.c


$ fget-light-test -r 9   
warming up ...
testing non-shared ...
time = 314.354000 ms
testing shared ...
time = 390.781000 ms



And here is the oprofile output:

[SHARED CASE]
samples  %        app name             symbol name
7436     28.9339  vmlinux              __clear_user
2369      9.2179  vmlinux              system_call
1710      6.6537  vmlinux              fget_light
1244      4.8405  vmlinux              inotify_dentry_parent_queue_event
1128      4.3891  vmlinux              sys_read
1041      4.0506  libpthread-2.6.1.so  __read_nocancel
1027      3.9961  Xorg                 (no symbols)
978       3.8054  vmlinux              read_zero
755       2.9377  vmlinux              vfs_read
545       2.1206  vmlinux              inotify_inode_queue_event
507       1.9728  vmlinux              sysret_check
414       1.6109  vmlinux              rw_verify_area
405       1.5759  vmlinux              unix_poll
371       1.4436  nvidia               (no symbols)
333       1.2957  vmlinux              acpi_pm_read
311       1.2101  nvidia_drv.so        (no symbols)
290       1.1284  vmlinux              do_select
288       1.1206  vmlinux              dnotify_parent
253       0.9844  libc-2.6.1.so        (no symbols)
232       0.9027  bash                 (no symbols)
216       0.8405  libpthread-2.6.1.so  __pthread_enable_asynccancel


[UNSHARED CASE]
samples  %        app name             symbol name
6542     27.6922  vmlinux              __clear_user
4074     17.2452  vmlinux              vfs_read
1266      5.3590  vmlinux              inotify_inode_queue_event
1091      4.6182  Xorg                 (no symbols)
1059      4.4827  vmlinux              system_call
937       3.9663  libpthread-2.6.1.so  __read_nocancel
544       2.3027  vmlinux              clear_user
484       2.0488  vmlinux              dnotify_parent
445       1.8837  vmlinux              read_zero
438       1.8540  vmlinux              sysret_check
414       1.7525  vmlinux              unix_poll
407       1.7228  nvidia               (no symbols)
389       1.6466  vmlinux              acpi_pm_read
315       1.3334  nvidia_drv.so        (no symbols)
312       1.3207  vmlinux              inotify_dentry_parent_queue_event
312       1.3207  vmlinux              sys_read
305       1.2911  vmlinux              rw_verify_area
304       1.2868  vmlinux              do_select
222       0.9397  libc-2.6.1.so        (no symbols)
214       0.9059  bash                 (no symbols)
196       0.8297  fget-light-test      read_test
185       0.7831  vmlinux              fget_light



You can clearly notice the fget_light() drop out of the relevance window.
BTW, are all those "notify" noises supposed to be there?
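
Editorial sketch, not part of the mail: the timerfd_fget() helper quoted above returns either a valid pointer or an errno value encoded with ERR_PTR(). A user-space approximation of that encoding is shown below. The lower-case helpers err_ptr(), ptr_err() and is_err(), as well as struct fake_file and lookup(), are hypothetical re-implementations for illustration, not the kernel's own definitions.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer near the top of memory. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical file object; only records whether it is a timerfd. */
struct fake_file {
	int is_timerfd;
};

/* Mirrors the shape of timerfd_fget(): bad fd, wrong type, or success. */
static struct fake_file *lookup(struct fake_file *table, int nfiles, int fd)
{
	if (fd < 0 || fd >= nfiles)
		return err_ptr(-EBADF);
	if (!table[fd].is_timerfd)
		return err_ptr(-EINVAL);
	return &table[fd];
}
```

A caller checks is_err() on the result first and only then dereferences it, so one return value carries both the object and the failure reason.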




- Davide



[PATCH] x86: clean up process_32/64.c

2007-11-27 Thread Hiroshi Shimamoto

White space and coding style clean up.
Make process_32/64.c similar.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |   20 ++--
 arch/x86/kernel/process_64.c |  307 +-
 2 files changed, 163 insertions(+), 164 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a20de7f..bd707db 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -133,7 +133,7 @@ EXPORT_SYMBOL(default_idle);
  * to poll the ->work.need_resched flag instead of waiting for the
  * cross-CPU IPI to arrive. Use this option with caution.
  */
-static void poll_idle (void)
+static void poll_idle(void)
 {
cpu_relax();
 }
@@ -330,8 +330,8 @@ void __show_registers(struct pt_regs *regs, int all)
printk("ESI: %08lx EDI: %08lx EBP: %08lx ESP: %08lx\n",
regs->esi, regs->edi, regs->ebp, esp);
printk(" DS: %04x ES: %04x FS: %04x GS: %04x SS: %04x\n",
-  regs->xds & 0xffff, regs->xes & 0xffff,
-  regs->xfs & 0xffff, gs, ss);
+   regs->xds & 0xffff, regs->xes & 0xffff,
+   regs->xfs & 0xffff, gs, ss);
 
if (!all)
return;
@@ -426,7 +426,7 @@ void flush_thread(void)
struct task_struct *tsk = current;
 
memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-   memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+   memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
clear_tsk_thread_flag(tsk, TIF_DEBUG);
/*
 * Forget coprocessor state..
@@ -451,8 +451,8 @@ void prepare_to_copy(struct task_struct *tsk)
 }
 
 int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
-   unsigned long unused,
-   struct task_struct * p, struct pt_regs * regs)
+   unsigned long unused,
+   struct task_struct * p, struct pt_regs * regs)
 {
struct pt_regs * childregs;
struct task_struct *tsk;
@@ -468,7 +468,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
 
p->thread.eip = (unsigned long) ret_from_fork;
 
-   savesegment(gs,p->thread.gs);
+   savesegment(gs, p->thread.gs);
 
tsk = current;
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
@@ -513,7 +513,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
for (i = 0; i < 8; i++)
-   dump->u_debugreg[i] = current->thread.debugreg[i];  
+   dump->u_debugreg[i] = current->thread.debugreg[i];
 
if (dump->start_stack < TASK_SIZE)
dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -528,7 +528,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
dump->regs.ds = regs->xds;
dump->regs.es = regs->xes;
dump->regs.fs = regs->xfs;
-   savesegment(gs,dump->regs.gs);
+   savesegment(gs, dump->regs.gs);
dump->regs.orig_eax = regs->orig_eax;
dump->regs.eip = regs->eip;
dump->regs.cs = regs->xcs;
@@ -540,7 +540,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
 }
 EXPORT_SYMBOL(dump_thread);
 
-/* 
+/*
  * Capture the user space registers if the task is not running (in user space)
  */
 int dump_task_regs(struct task_struct *tsk, elf_gregset_t *regs)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 87c8e7f..57167dc 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -3,7 +3,7 @@
  *
  *  Pentium III FXSR, SSE support
  * Gareth Hughes <[EMAIL PROTECTED]>, May 2000
- * 
+ *
  *  X86-64 port
  * Andi Kleen.
  *
@@ -19,19 +19,19 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -122,43 +122,12 @@ static void default_idle(void)
  * to poll the ->need_resched flag instead of waiting for the
  * cross-CPU IPI to arrive. Use this option with caution.
  */
-static void poll_idle (void)
+static void poll_idle(void)
 {
local_irq_enable();
cpu_relax();
 }
 
-void cpu_idle_wait(void)
-{
-   unsigned int cpu, this_cpu = get_cpu();
-   cpumask_t map, tmp = current->cpus_allowed;
-
-   set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
-   put_cpu();
-
-   cpus_clear(map);
-   for_each_online_cpu(cpu) {
-   per_cpu(cpu_idle_state, cpu) = 1;
-   cpu_set(cpu, map);
-   }
-
-   __get_cpu_var(cpu_idle_state) = 0;
-
-   wmb();
-   do {
-   ssleep(1);
-   for_each_online_cpu(cpu) {
-   if (cpu_isset(cpu, map) &&
-   !per_cpu(cpu_idle_state,

[PATCH 1/3] x86: partial unification of asm-x86/bitops.h

2007-11-27 Thread Jeremy Fitzhardinge

This unifies the set/clear/test bit functions of asm/bitops.h.

I have not attempted to merge the bit-finding functions, since they
rely on the machine word size and can't be easily restructured to work
generically without a lot of #ifdefs.  In particular, the 64-bit code
can assume the presence of conditional move instructions, whereas
32-bit needs to be more careful.

The inline assembly for the bit operations has been changed to remove
explicit sizing hints on the instructions, so the assembler will pick
the appropriate instruction forms depending on the architecture and
the context.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Thomas Gleixner <[EMAIL PROTECTED]>

---
 include/asm-x86/bitops.h|  315 +++
 include/asm-x86/bitops_32.h |  308 --
 include/asm-x86/bitops_64.h |  297 
 3 files changed, 315 insertions(+), 605 deletions(-)

===
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -1,5 +1,320 @@
+#ifndef _ASM_X86_BITOPS_H
+#define _ASM_X86_BITOPS_H
+
+/*
+ * Copyright 1992, Linus Torvalds.
+ */
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+#include 
+#include 
+
+/*
+ * These have to be done with inline assembly: that way the bit-setting
+ * is guaranteed to be atomic. All bit operations return 0 if the bit
+ * was cleared before the operation and != 0 if it was not.
+ *
+ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
+ */
+
+#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
+/* Technically wrong, but this avoids compilation errors on some gcc
+   versions. */
+#define ADDR "=m" (*(volatile long *) addr)
+#else
+#define ADDR "+m" (*(volatile long *) addr)
+#endif
+
+/**
+ * set_bit - Atomically set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * This function is atomic and may not be reordered.  See __set_bit()
+ * if you do not require the atomic guarantees.
+ *
+ * Note: there are no guarantees that this function will not be reordered
+ * on non x86 architectures, so if you are writing portable code,
+ * make sure not to rely on its reordering guarantees.
+ *
+ * Note that @nr may be almost arbitrarily large; this function is not
+ * restricted to acting on a single-word quantity.
+ */
+static inline void set_bit(int nr, volatile unsigned long *addr)
+{
+   asm volatile(LOCK_PREFIX "bts %1,%0"
+: ADDR
+: "Ir" (nr) : "memory");
+}
+
+/**
+ * __set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static inline void __set_bit(int nr, volatile unsigned long *addr)
+{
+   asm volatile("bts %1,%0"
+: ADDR
+: "Ir" (nr) : "memory");
+}
+
+
+/**
+ * clear_bit - Clears a bit in memory
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ *
+ * clear_bit() is atomic and may not be reordered.  However, it does
+ * not contain a memory barrier, so if it is used for locking purposes,
+ * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
+ * in order to ensure changes are visible on other processors.
+ */
+static inline void clear_bit(int nr, volatile unsigned long *addr)
+{
+   asm volatile(LOCK_PREFIX "btr %1,%0"
+: ADDR
+: "Ir" (nr));
+}
+
+/*
+ * clear_bit_unlock - Clears a bit in memory
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ *
+ * clear_bit() is atomic and implies release semantics before the memory
+ * operation. It can be used for an unlock.
+ */
+static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr)
+{
+   barrier();
+   clear_bit(nr, addr);
+}
+
+static inline void __clear_bit(int nr, volatile unsigned long *addr)
+{
+   asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
+}
+
+/*
+ * __clear_bit_unlock - Clears a bit in memory
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ *
+ * __clear_bit() is non-atomic and implies release semantics before the memory
+ * operation. It can be used for an unlock if no other CPUs can concurrently
+ * modify other bits in the word.
+ *
+ * No memory barrier is required here, because x86 cannot reorder stores past
+ * older loads. Same principle as spin_unlock.
+ */
+static inline void __clear_bit_unlock(unsigned nr, volatile unsigned long *addr)
+{
+   barrier();
+   __clear_bit(nr, addr);
+}
+
+#define smp_mb__before_clear_bit() barrier()
+#define

Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

2007-11-27 Thread Andi Kleen

On Tue, Nov 27, 2007 at 03:42:13PM -0800, Andrew Morton wrote:
> On Tue, 27 Nov 2007 15:22:56 -0800 (PST)
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 27 Nov 2007, Andrew Morton wrote:
> > 
> > > The prefetch however might still need some work - we can indeed do
> > > prefetch() against a not-possible CPU's memory here.  And I do recall that
> > > 4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
> > > prefetching from a bad address.  I forget what the conclusion was on that
> > > matter.
> > > 
> > > If we do want to fix the prefetch-from-outer-space then we should be using
> > > cpu_isset(cpu, *cpumask) here rather than cpu_possible().
> > 
> > Generally the prefetch things have turned out to be not that useful. How 
> > about dropping the prefetch? I kept it because it was there.
> 
> I don't recall anyone ever demonstrating that prefetch is useful in-kernel.

It was demonstrated useful for some specific cases, like context switch early
fetch on IA64. But I agree the prefetch on each list_for_each() is probably
a bad idea and should be removed. Will also help code size.

The best strategy is probably to figure out which oprofile counters
to use and then do some profiling and only insert prefetches where
the profiler actually finds significant cache misses.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/3] x86: add set/clear_cpu_cap operations

2007-11-27 Thread Jeremy Fitzhardinge

The patch to suppress bitops-related warnings added a pile of ugly
casts.  Many of these were related to the management of x86 CPU
capabilities.  Clean these up by adding specific set/clear_cpu_cap
macros, and use them consistently.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/kernel/alternative.c  |   13 ++--
 arch/x86/kernel/apic_32.c  |8 +++
 arch/x86/kernel/apic_64.c  |2 -
 arch/x86/kernel/cpu/addon_cpuid_features.c |2 -
 arch/x86/kernel/cpu/mcheck/mce_64.c|2 -
 arch/x86/kernel/setup_32.c |2 -
 arch/x86/kernel/setup_64.c |   29 +---
 arch/x86/kernel/vmi_32.c   |   10 -
 include/asm-x86/cpufeature.h   |5 +++-
 9 files changed, 38 insertions(+), 35 deletions(-)

===
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -356,15 +356,15 @@ void alternatives_smp_switch(int smp)
spin_lock_irqsave(_alt, flags);
if (smp) {
printk(KERN_INFO "SMP alternatives: switching to SMP code\n");
-   clear_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
-   clear_bit(X86_FEATURE_UP, cpu_data(0).x86_capability);
+   clear_cpu_cap(&boot_cpu_data, X86_FEATURE_UP);
+   clear_cpu_cap(&cpu_data(0), X86_FEATURE_UP);
list_for_each_entry(mod, _alt_modules, next)
alternatives_smp_lock(mod->locks, mod->locks_end,
  mod->text, mod->text_end);
} else {
printk(KERN_INFO "SMP alternatives: switching to UP code\n");
-   set_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
-   set_bit(X86_FEATURE_UP, cpu_data(0).x86_capability);
+   set_cpu_cap(&boot_cpu_data, X86_FEATURE_UP);
+   set_cpu_cap(&cpu_data(0), X86_FEATURE_UP);
list_for_each_entry(mod, _alt_modules, next)
alternatives_smp_unlock(mod->locks, mod->locks_end,
mod->text, mod->text_end);
@@ -431,8 +431,9 @@ void __init alternative_instructions(voi
if (smp_alt_once) {
if (1 == num_possible_cpus()) {
printk(KERN_INFO "SMP alternatives: switching to UP code\n");
-   set_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
-   set_bit(X86_FEATURE_UP, cpu_data(0).x86_capability);
+   set_cpu_cap(&boot_cpu_data, X86_FEATURE_UP);
+   set_cpu_cap(&cpu_data(0), X86_FEATURE_UP);
+
alternatives_smp_unlock(__smp_locks, __smp_locks_end,
_text, _etext);
}
===
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1078,7 +1078,7 @@ static int __init detect_init_APIC (void
printk(KERN_WARNING "Could not enable APIC!\n");
return -1;
}
-   set_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+   set_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC);
mp_lapic_addr = APIC_DEFAULT_PHYS_BASE;
 
/* The BIOS may have set up the APIC at some other address */
@@ -1168,7 +1168,7 @@ int __init APIC_init_uniprocessor (void)
 int __init APIC_init_uniprocessor (void)
 {
if (enable_local_apic < 0)
-   clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+   clear_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC);
 
if (!smp_found_config && !cpu_has_apic)
return -1;
@@ -1180,7 +1180,7 @@ int __init APIC_init_uniprocessor (void)
APIC_INTEGRATED(apic_version[boot_cpu_physical_apicid])) {
printk(KERN_ERR "BIOS bug, local APIC #%d not detected!...\n",
   boot_cpu_physical_apicid);
-   clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+   clear_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC);
return -1;
}
 
@@ -1536,7 +1536,7 @@ static int __init parse_nolapic(char *ar
 static int __init parse_nolapic(char *arg)
 {
enable_local_apic = -1;
-   clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+   clear_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC);
return 0;
 }
 early_param("nolapic", parse_nolapic);
===
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -1193,7 +1193,7 @@ static __init int setup_disableapic(char
 static __init int setup_disableapic(char *str)
 {
disable_apic = 1;
-   clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+   clear_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC);
return

[PATCH 2/3] x86: clean up bitops-related warnings

2007-11-27 Thread Jeremy Fitzhardinge

Add casts to appropriate places to silence spurious bitops warnings.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/kernel/setup_64.c   |   31 ---
 arch/x86/kernel/smpboot_64.c |4 ++--
 arch/x86/mm/numa_64.c|2 +-
 include/asm-x86/cpufeature.h |2 +-
 include/asm-x86/numa_64.h|2 +-
 include/linux/thread_info.h  |   10 +-
 6 files changed, 26 insertions(+), 25 deletions(-)

===
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -685,19 +685,19 @@ static void __cpuinit init_amd(struct cp
 
/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
-   clear_bit(0*32+31, &c->x86_capability);
+   clear_bit(0*32+31, (unsigned long *)&c->x86_capability);
 
/* On C+ stepping K8 rep microcode works well for copy/memset */
level = cpuid_eax(1);
if (c->x86 == 15 && ((level >= 0x0f48 && level < 0x0f50) ||
 level >= 0x0f58))
-   set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);
+   set_bit(X86_FEATURE_REP_GOOD, (unsigned long *)&c->x86_capability);
if (c->x86 == 0x10 || c->x86 == 0x11)
-   set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);
+   set_bit(X86_FEATURE_REP_GOOD, (unsigned long *)&c->x86_capability);
 
/* Enable workaround for FXSAVE leak */
if (c->x86 >= 6)
-   set_bit(X86_FEATURE_FXSAVE_LEAK, &c->x86_capability);
+   set_bit(X86_FEATURE_FXSAVE_LEAK, (unsigned long *)&c->x86_capability);
 
level = get_model_name(c);
if (!level) {
@@ -713,7 +713,7 @@ static void __cpuinit init_amd(struct cp
 
/* c->x86_power is 8000_0007 edx. Bit 8 is constant TSC */
if (c->x86_power & (1<<8))
-   set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
+   set_bit(X86_FEATURE_CONSTANT_TSC, (unsigned long *)&c->x86_capability);
 
/* Multi core CPU? */
if (c->extended_cpuid_level >= 0x8008)
@@ -726,14 +726,14 @@ static void __cpuinit init_amd(struct cp
num_cache_leaves = 3;
 
if (c->x86 == 0xf || c->x86 == 0x10 || c->x86 == 0x11)
-   set_bit(X86_FEATURE_K8, &c->x86_capability);
+   set_bit(X86_FEATURE_K8, (unsigned long *)&c->x86_capability);
 
/* RDTSC can be speculated around */
-   clear_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
+   clear_bit(X86_FEATURE_SYNC_RDTSC, (unsigned long *)&c->x86_capability);
 
/* Family 10 doesn't support C states in MWAIT so don't use it */
if (c->x86 == 0x10 && !force_mwait)
-   clear_bit(X86_FEATURE_MWAIT, &c->x86_capability);
+   clear_bit(X86_FEATURE_MWAIT, (unsigned long *)&c->x86_capability);
 
if (c->x86 == 0x10)
fam10h_check_enable_mmcfg(c);
@@ -838,16 +838,17 @@ static void __cpuinit init_intel(struct 
unsigned eax = cpuid_eax(10);
/* Check for version and the number of counters */
if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
-   set_bit(X86_FEATURE_ARCH_PERFMON, &c->x86_capability);
+   set_bit(X86_FEATURE_ARCH_PERFMON,
+   (unsigned long *)&c->x86_capability);
}
 
if (cpu_has_ds) {
unsigned int l1, l2;
rdmsr(MSR_IA32_MISC_ENABLE, l1, l2);
if (!(l1 & (1<<11)))
-   set_bit(X86_FEATURE_BTS, c->x86_capability);
+   set_bit(X86_FEATURE_BTS, (unsigned long *)c->x86_capability);
if (!(l1 & (1<<12)))
-   set_bit(X86_FEATURE_PEBS, c->x86_capability);
+   set_bit(X86_FEATURE_PEBS, (unsigned long *)c->x86_capability);
}
 
n = c->extended_cpuid_level;
@@ -866,13 +867,13 @@ static void __cpuinit init_intel(struct 
c->x86_cache_alignment = c->x86_clflush_size * 2;
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e))
-   set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
+   set_bit(X86_FEATURE_CONSTANT_TSC, (unsigned long *)&c->x86_capability);
if (c->x86 == 6)
-   set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);
+   set_bit(X86_FEATURE_REP_GOOD, (unsigned long *)&c->x86_capability);
if (c->x86 == 15)
-   set_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
+   set_bit(X86_FEATURE_SYNC_RDTSC, (unsigned long *)&c->x86_capability);
else
-   clear_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
+   clear_bit(X86_FEATURE_SYNC_RDTSC, (unsigned long *)&c->x86_capability);
c->x86_max_cores = intel_num_cpu_cores(c);
 
srat_detect_node();

[EMAIL PROTECTED]: intel-iommu-PMEN-think-oh patch.]

2007-11-27 Thread mark gross

I forgot to cc the list.

--mgross

- Forwarded message from mark gross <[EMAIL PROTECTED]> -

Date: Tue, 27 Nov 2007 15:46:09 -0800
From: mark gross <[EMAIL PROTECTED]>
To: Andrew Morton <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: intel-iommu-PMEN-think-oh patch.

I screwed up with my earlier patch to enable the protected memory.  The
macro IOMMU_WAIT_OP exits when the condition becomes true.  Without this
patch the code will hang at boot on some (all?) VT-d enabled systems.
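
The point about the exit condition can be illustrated with a small userspace sketch. This is not the driver's macro: WAIT_UNTIL, fake_readl, SKETCH_PRS and the poll bound are all hypothetical names invented for the illustration, and the "register" is just a counter that reports the status bit as set for a few reads.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PRS (1u << 0) /* hypothetical "protected region status" bit */

/* Hypothetical stand-in for a status register: the bit reads back set
 * for the first few polls, then clear. */
static unsigned polls_until_clear = 3;

static uint32_t fake_readl(void)
{
	if (polls_until_clear > 0) {
		polls_until_clear--;
		return SKETCH_PRS;
	}
	return 0;
}

/*
 * Poll until `cond` becomes TRUE, i.e. `cond` is the exit condition.
 * Bounded by max_polls so that a wrong condition shows up as a timeout
 * in this sketch instead of a hang.
 */
#define WAIT_UNTIL(read_op, cond, sts, max_polls, timed_out)	\
	do {							\
		unsigned tries_ = (max_polls);			\
		(timed_out) = 1;				\
		while (tries_--) {				\
			(sts) = read_op();			\
			if (cond) {				\
				(timed_out) = 0;		\
				break;				\
			}					\
		}						\
	} while (0)

/* Returns 0 once the status bit has cleared, -1 on timeout. */
static int wait_for_prs_clear(void)
{
	uint32_t pmen = 0;
	int timed_out;

	/* To wait for the bit to CLEAR, the exit condition is its negation. */
	WAIT_UNTIL(fake_readl, !(pmen & SKETCH_PRS), pmen, 1000, timed_out);
	if (!timed_out)
		assert(!(pmen & SKETCH_PRS));
	return timed_out ? -1 : 0;
}
```

With the un-negated condition the loop would instead exit as soon as the bit reads back set, and would never exit on hardware where the bit is already clear, which matches the hang described above.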

--mgross

Signed-off-by: mark gross <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc2+/drivers/pci/intel-iommu.c
===
--- linux-2.6.24-rc2+.orig/drivers/pci/intel-iommu.c 2007-11-27 15:29:38.0 -0800
+++ linux-2.6.24-rc2+/drivers/pci/intel-iommu.c 2007-11-27 15:30:06.0 -0800
@@ -704,7 +704,7 @@

/* wait for the protected region status bit to clear */
IOMMU_WAIT_OP(iommu, DMAR_PMEN_REG,
-   readl, (pmen & DMA_PMEN_PRS), pmen);
+   readl, !(pmen & DMA_PMEN_PRS), pmen);

spin_unlock_irqrestore(&iommu->register_lock, flags);
 }

- End forwarded message -

Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

2007-11-27 Thread Andrew Morton

On Tue, 27 Nov 2007 15:22:56 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 27 Nov 2007, Andrew Morton wrote:
> 
> > The prefetch however might still need some work - we can indeed do
> > prefetch() against a not-possible CPU's memory here.  And I do recall that
> > 4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
> > prefetching from a bad address.  I forget what the conclusion was on that
> > matter.
> > 
> > If we do want to fix the prefetch-from-outer-space then we should be using
> > cpu_isset(cpu, *cpumask) here rather than cpu_possible().
> 
> Generally the prefetch things have turned out to be not that useful. How 
> about dropping the prefetch? I kept it because it was there.

I don't recall anyone ever demonstrating that prefetch is useful in-kernel.

I think I've heard of situations where benefits have been seen in userspace
- if a loop does a lot of calculation on each datum which it fetches then
there's a good opportunity to pipeline the fetch with the on-core
crunching.  But kernel doesn't do that sort of thing..
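
A minimal userspace sketch of the pattern described above, assuming GCC or Clang (for __builtin_prefetch). The node layout, crunch() and both function names are made up for the illustration; whether the prefetch helps at all depends on the hardware and on how expensive the per-datum work really is, so it would need to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	struct node *next;
	uint64_t value;
};

/* Placeholder for "a lot of calculation on each datum". */
static uint64_t crunch(uint64_t x)
{
	for (int i = 0; i < 64; i++)
		x = x * 6364136223846793005ULL + 1442695040888963407ULL;
	return x;
}

/* Baseline: fetch, compute, fetch, compute. */
static uint64_t sum_plain(const struct node *head)
{
	uint64_t total = 0;

	for (const struct node *n = head; n != NULL; n = n->next)
		total += crunch(n->value);
	return total;
}

/* Ask for the next node before crunching the current one, so the
 * memory fetch can overlap with the computation. */
static uint64_t sum_prefetch(const struct node *head)
{
	uint64_t total = 0;

	for (const struct node *n = head; n != NULL; n = n->next) {
		if (n->next != NULL)
			__builtin_prefetch(n->next, 0, 1); /* read access, low temporal locality */
		total += crunch(n->value);
	}
	return total;
}

/* A prefetch is only a hint: it must never change the result. */
static void check_same_result(const struct node *head)
{
	assert(sum_prefetch(head) == sum_plain(head));
}
```

Note that the sketch only prefetches a pointer it already knows to be valid (n->next, checked against NULL), which sidesteps the bad-address concern raised earlier in the thread.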


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Randy Dunlap

On Mon, 26 Nov 2007 16:14:12 -0800 Christoph Lameter wrote:

> The use of the __GENERIC_PERCPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it through a kconfig variable.
> 
> Cc: Rusty Russell <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> ---
> 
> Index: linux-2.6/arch/ia64/Kconfig
> ===
> --- linux-2.6.orig/arch/ia64/Kconfig  2007-11-26 15:38:56.415112360 -0800
> +++ linux-2.6/arch/ia64/Kconfig   2007-11-26 15:40:10.425862722 -0800
> @@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
>   bool
>   default y
>  
> +config ARCH_SETS_UP_PER_CPU_AREA
> + bool
> + default y
> +
>  config DMI
>   bool
>   default y

> Index: linux-2.6/arch/sparc64/Kconfig
> ===
> --- linux-2.6.orig/arch/sparc64/Kconfig   2007-11-26 15:38:56.447111936 
> -0800
> +++ linux-2.6/arch/sparc64/Kconfig2007-11-26 15:40:10.425862722 -0800
> @@ -66,6 +66,10 @@ config AUDIT_ARCH
>   bool
>   default y
>  
> +config ARCH_SETS_UP_PER_CPU_AREA
> + bool
> + default y

def_bool y
  is the preferred form for those 2-liners above...


> +
>  config ARCH_NO_VIRT_TO_BUS
>   def_bool y
>  


---
~Randy

Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-27 Thread Neil Horman

On Tue, Nov 27, 2007 at 03:38:52PM -0700, Eric W. Biederman wrote:
> Neil Horman <[EMAIL PROTECTED]> writes:
> 
> >> 
> >> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> 
> Ben, what chipset is this?
> 
> >
> > Ok, I think from what I understand of what we're reading here, the apic2 = -1
> > and pin2 = -1 indicate that the 8259 has no direct connection to any cpu, which
> > means that on shutdown disable_IO_APIC should take us into virtual wire mode.
> > As such enabling the APIC early in boot should fix us, but more concisely,
> > rewriting the entry in the IOAPIC to deliver int0 to the only running cpu should
> > accomplish the same goal for this problem.  Does that sound reasonable (at least
> > as a test to ensure we understand the problem) to everyone?
> 
> Close.  There are two options with virtual wire mode.  
> - Either the local apic is in virtual wire mode, and somehow the
>   legacy interrupts make it to the local cpu.
I assume this is the case if the ioapic is also in virtual wire mode and the
destination field for the appropriate interrupt(s) (the timer interrupt in this
case) is set to either physical mode with a destination id of the lapic for the
running cpu, or if it is set to logical mode and the destination id has the
corresponding bit for the running cpu set.  Is that right?

> - Or an ioapic is in virtual wire mode and the legacy interrupt
>   controller is connected to it.
> 
I thought we only had one ioapic in this system (Ben correct me if I'm wrong on
that please).  I thought the above printk told us that, because apic2 and pin2
are both -1, that means that the 8259 isn't physically connected to any cpu, and
instead is routed through apic 0, and asserts on pin 2 of that ioapic. 

> So I guess fundamentally for any SMP system that only supports the
> cpu being in local apic mode and only routes interrupts to the boot
> strap processor we could be in trouble.  That is what our current
> information about your system suggests.
> 
If that were the case, then we would need to support moving kexec boot to cpu0,
at least in some limited cases.  I've got a patch together that enables the
handshaking I was brainstorming earlier, which should allow an attempted jump to
cpu0 on a crash, with a fallback to booting on the crashing processor.  If we
wind up confirming the above case, then I'll post it.

> However most systems actually connect the i8254 PIC interrupt
Sorry to split hairs here, but you mean the 8259, right?  Just want to be sure
I'm clear on what's going on.  I thought the 8254 was the external timer.

> controller to the ioapic in virtual wire mode.  As I recall the
> standard mapping is to ioapic 0, pin 0.  With ioapic 0, pin 2 being
> the timer interrupt (Possibly it is the other way around).
> 
> So as a test we could feed those values into ioapic_8259 and see
> if the kdump case works.  I believe we prefer putting the ioapic
> into virtual wire mode over putting the cpu into virtual wire
> mode.  We can only control which cpu receives the legacy interrupts if
> we are putting the ioapic in virtual wire mode.
> 
I'm sorry, I can't find ioapic_8259 defined anywhere.  Where is that supposed to
be?  Show me where it's defined and I'll happily write the patch.


> It may also be an interesting test to just enable the timer for the
> ioapic in early boot, as you have suggested.  I don't have a clue what
> that will do.
> 
Unfortunately nothing.  We've tried using the local apic timer in a previous
test, and it resulted in no change, as did transitioning the cpu to the apic
timer via a call to switch_ipi_to_APIC_timer.  It's possible I did something
wrong, however.

Currently I'm writing a patch that calls setup_ioapic_dest after we call
disable_IO_APIC.  Looking at the implementation, it appears that calling this
function should rewrite the irq routing table in the ioapic to deliver
interrupts to the set of online cpus, as defined by the TARGET_CPUS macro.  I
assume that if the ioapic is in virtual wire mode from the call to
disable_IO_APIC, then calling setup_ioapic_dest will force interrupts to be
delivered to the crashing cpu, as it should be the only bit set in the online
cpu mask.  Please feel free to poke holes in this idea.


Thanks & Regards
Neil

> Eric

-- 
/***
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 [EMAIL PROTECTED]
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***/

Re: [PATCH 2/3] x86: clean up bitops-related warnings

2007-11-27 Thread Jeremy Fitzhardinge

Andi Kleen wrote:
> On Wednesday 28 November 2007 00:01:26 Jeremy Fitzhardinge wrote:
>   
>> Add casts to appropriate places to silence spurious bitops warnings.
>> 
>
> Looks a bit ugly. Perhaps it would be better to define standard wrappers
>
> clear_bit_unaligned/any/bettername() etc. ?

Well, most of the casts are already in wrapper functions.  The
setup_64.c ones are ugly, but the following patch cleans them up.
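
As a rough illustration of what such wrappers buy, here is a userspace sketch. It assumes nothing about the real definitions: the struct, the feature numbers, the sketch_* helpers and cpu_has_cap are hypothetical, and the helpers work on 32-bit words directly so that the example stays valid C outside the kernel (in the kernel the wrapper would be the single place that hides the cast to unsigned long *).

```c
#include <assert.h>

#define NCAPINTS_SKETCH 8 /* hypothetical number of 32-bit capability words */

struct cpuinfo_sketch {
	unsigned int x86_capability[NCAPINTS_SKETCH];
};

/* Hypothetical feature numbers, encoded as word * 32 + bit. */
#define FEATURE_SKETCH_A (0 * 32 + 9)
#define FEATURE_SKETCH_B (3 * 32 + 20)

/* Non-atomic bit helpers operating on an array of 32-bit words. */
static inline void sketch_set_bit(int nr, unsigned int *words)
{
	assert(nr >= 0 && nr < NCAPINTS_SKETCH * 32);
	words[nr / 32] |= 1u << (nr % 32);
}

static inline void sketch_clear_bit(int nr, unsigned int *words)
{
	assert(nr >= 0 && nr < NCAPINTS_SKETCH * 32);
	words[nr / 32] &= ~(1u << (nr % 32));
}

static inline int sketch_test_bit(int nr, const unsigned int *words)
{
	assert(nr >= 0 && nr < NCAPINTS_SKETCH * 32);
	return (words[nr / 32] >> (nr % 32)) & 1u;
}

/* Callers name the CPU and the feature; how the bit array is reached
 * is decided in exactly one place. */
#define set_cpu_cap(c, bit)   sketch_set_bit((bit), (c)->x86_capability)
#define clear_cpu_cap(c, bit) sketch_clear_bit((bit), (c)->x86_capability)
#define cpu_has_cap(c, bit)   sketch_test_bit((bit), (c)->x86_capability)
```

A call site then reads set_cpu_cap(&cpu, FEATURE_SKETCH_B) rather than repeating a cast at every use.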

J


Re: freeze vs freezer

2007-11-27 Thread Jeremy Fitzhardinge

Kyle Moffett wrote:
> On Nov 27, 2007, at 17:49:18, Jeremy Fitzhardinge wrote:
>> Rafael J. Wysocki wrote:
>>> Well, this is more-or-less how we all imagine that should be done
>>> eventually.
>>>
>>> The main problem is how to implement it without causing too much
>>> breakage.  Also, there are some dirty details that need to be taken
>>> into consideration.
>>
>> For Xen suspend/resume, I'd like to use the freezer to get all
>> threads into a known consistent state (where, specifically, they
>> don't have any outstanding pagetable updates pending).  In other
>> words, the freezer as it currently stands is what I want, modulo some
>> of these issues where it gets caught up unexpectedly.  If threads end
>> up getting frozen anywhere preempt isn't explicitly disabled, it
>> wouldn't work for me.
>
> The problem with "one freezer" is that "known consistent state" means
> something completely different to every single driver and subsystem.

Not really.  The freezer puts tasks into a particular well-understood
state: they're either in usermode, or in the kernel in the
refrigerator.  And since the places which call into the refrigerator are
explicit in the source, and not terribly numerous, it's easy to audit
exactly what the state is at each call.

> Xen wants it to mean "No pending page table updates and no more
> updates from this point forward".  A network driver wants it to mean
> "All pending network packets DMAed out or in and the device shut down
> with all remaining packets queued".  A SATA controller wants it to mean
> "All DMA quiesced and no more commands", etc.

Well, those are somewhat different.  The existing suspend/resume driver
callbacks are sufficient for a device to be in that state.  What I want
for Xen is more global: I just want to make sure tasks are not preempted
in the middle of a state which can't be suspended.  The specific details
of the state I want are moderately complex, but short lived.  The
problem with other mechanisms - like stop_machine - is that they can
leave threads preempted in one of the states I can't handle, whereas
the freezer is more deterministic.

> The only way to have that work is to put minimal definitions of what
> state you care about in the drivers themselves.  For Xen this means
> that you need to have an appropriately-timed suspend handler which
> hooks into Xen code very precisely to create and preserve the "No
> pending page table updates" state that you care about.  It will be
> more work in the short term but it's the only maintainable solution in
> the long term IMO.

No, that doesn't really work.  Aside from scattering hooks everywhere
there are pagetable updates, there's no real existing place to hook into.
While I could put those hooks in, they would amount to changing the
kernel-internal pagetable update interface for everyone to deal with a
corner case of a fairly obscure user - I don't think it's a good tradeoff.

The freezer is nice because the state it puts each task into is
well-defined, and is well-suited for Xen's use.  In fact, I would agree
with you that the use I want to put the freezer to better suits it than
its current use in suspend/resume.

J

Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-27 Thread Greg KH

On Tue, Nov 27, 2007 at 06:10:42PM -0500, Kyle McMartin wrote:
> On Tue, Nov 27, 2007 at 03:02:52PM -0800, Greg KH wrote:
> > Last updated November 27, 2008
> > 
> 
> The future is now! ;-)

/me goes off to recharge the flux generator...

Heh, thanks, I'll go fix that.

greg k-h

Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

2007-11-27 Thread Ben Woodard


Andi Kleen wrote:
>> Are we putting the system back in PIC mode or virtual wire mode? I have
>> not seen systems which support PIC mode. All latest systems seems
>> to be having virtual wire mode. I think in case of PIC mode, interrupts
>
> Yes it's probably virtual wire. For real PIC mode we would need really
> old systems without APIC.
>
>> can be delivered to cpu0 only. In virt wire mode, one can program IOAPIC
>> to deliver interrupt to any of the cpus and that's what we have been
>
> The code doesn't try to program anything specific, it just restores the state
> that was left over originally by the BIOS.

So if the BIOS originally left the IOAPIC in a state where the timer
interrupts were only going to CPU0, then by restoring that state we could
be bringing this problem upon ourselves.

Would anyone have any problems with code that simply verified that the
state which we are restoring allowed interrupts to get to the processor
that we are currently crashing on and, if not, poked in a reasonable value?

Yes, this would add some complexity to the code paths where we were
crashing, but it could prevent the problem that we are seeing. It seems
like a small, fairly safe change rather than a big disruptive change like
moving the initialization of the IOAPIC earlier in the boot process.

> -Andi



--
-ben
-=-

Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

2007-11-27 Thread Christoph Lameter

On Tue, 27 Nov 2007, Andrew Morton wrote:

> The prefetch however might still need some work - we can indeed do
> prefetch() against a not-possible CPU's memory here.  And I do recall that
> 4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
> prefetching from a bad address.  I forget what the conclusion was on that
> matter.
> 
> If we do want to fix the prefetch-from-outer-space then we should be using
> cpu_isset(cpu, *cpumask) here rather than cpu_possible().

Generally the prefetch things have turned out to be not that useful. How 
about dropping the prefetch? I kept it because it was there.


Re: [PATCH 1/1] mm: Prevent dereferencing non-allocated per_cpu variables

2007-11-27 Thread Andrew Morton

On Tue, 27 Nov 2007 15:12:41 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Tue, 27 Nov 2007 23:16:28 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, Nov 27, 2007 at 01:50:53PM -0800, [EMAIL PROTECTED] wrote:
> > > Change loops controlled by 'for (i = 0; i < NR_CPUS; i++)' to use
> > > 'for_each_possible_cpu(i)' when there's a _remote possibility_ of
> > > dereferencing a non-allocated per_cpu variable involved.
> > > 
> > > All files except mm/vmstat.c are x86 arch.
> > > 
> > > Based on 2.6.24-rc3-mm1 .
> > > 
> > > Thanks to [EMAIL PROTECTED] for pointing this out.
> > 
> > Looks good to me. 2.6.24 candidate.
> 
> hm.  Has anyone any evidence that we're actually touching
> not-possible-cpu's memory here?
> 
> Also, the sum_vm_events() change looks buggy - it assumes that
> cpu_possible_map has no gaps in it.  But that change is unneeded because
> sum_vm_events() is only ever passed cpu_online_map and I'm hoping that we
> don't usually online not-possible CPUs.
> 
> --- a/mm/vmstat.c~mm-prevent-dereferencing-non-allocated-per_cpu-variables-fix
> +++ a/mm/vmstat.c
> @@ -27,12 +27,12 @@ static void sum_vm_events(unsigned long 
>   memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
>  
>   cpu = first_cpu(*cpumask);
> - while (cpu < NR_CPUS && cpu_possible(cpu)) {
> + while (cpu < NR_CPUS) {
>   struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
>  
>   cpu = next_cpu(cpu, *cpumask);
>  
> - if (cpu < NR_CPUS && cpu_possible(cpu))
> + if (cpu < NR_CPUS)
>   prefetch(&per_cpu(vm_event_states, cpu));

The prefetch however might still need some work - we can indeed do
prefetch() against a not-possible CPU's memory here.  And I do recall that
4-5 years ago we did have a CPU (one of mine, iirc) which would oops when
prefetching from a bad address.  I forget what the conclusion was on that
matter.

If we do want to fix the prefetch-from-outer-space then we should be using
cpu_isset(cpu, *cpumask) here rather than cpu_possible().
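
To make the "gaps" argument concrete, here is a small userspace sketch of mask-driven iteration. It is not the kernel's cpumask API: the mask is a plain unsigned int and NR_CPUS_SKETCH, mask_test, mask_first, mask_next and sum_counters are hypothetical names. The loop only ever visits CPUs whose bit is set in the mask it was handed, so holes in the mask are skipped rather than ending the walk early, and membership in that same mask is the natural test before touching (or prefetching) a CPU's data.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 16 /* hypothetical CPU limit for the sketch */

/* Nonzero if `cpu` is a member of `mask`. */
static int mask_test(unsigned int mask, int cpu)
{
	return cpu >= 0 && cpu < NR_CPUS_SKETCH && ((mask >> cpu) & 1u);
}

/* First member of `mask` strictly after `cpu`, or NR_CPUS_SKETCH if none. */
static int mask_next(int cpu, unsigned int mask)
{
	for (int n = cpu + 1; n < NR_CPUS_SKETCH; n++)
		if ((mask >> n) & 1u)
			return n;
	return NR_CPUS_SKETCH;
}

/* First member of `mask`, or NR_CPUS_SKETCH if the mask is empty. */
static int mask_first(unsigned int mask)
{
	return mask_next(-1, mask);
}

/* Sum the per-CPU counters of every CPU in `mask`. */
static unsigned long sum_counters(const unsigned long *percpu, unsigned int mask)
{
	unsigned long total = 0;
	int cpu = mask_first(mask);

	while (cpu < NR_CPUS_SKETCH) {
		/* Only members of the caller's mask are ever dereferenced. */
		assert(mask_test(mask, cpu));
		total += percpu[cpu];
		cpu = mask_next(cpu, mask);
	}
	return total;
}
```

With counters set to 1 << cpu and a mask of 0x25 (CPUs 0, 2 and 5), the sum comes out as 0x25; a loop that stopped at the first missing CPU would have returned 1.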


Re: [2.6 patch] make I/O schedulers non-modular

2007-11-27 Thread Adrian Bunk

On Wed, Nov 28, 2007 at 12:02:08AM +0100, Jarek Poplawski wrote:
> Jarek Poplawski wrote, On 11/27/2007 11:15 PM:
> ...
> 
> > Otherwise it's not so hard to overlook some stagnation.
> 
> Btw., after this 'forking' thing etc. it seems I might have lost the point
> a little: which removed choices should justify such a fork.

Let me try to rephrase it:

If you think an open source project does something wrong you have the 
right to fork it and offer an (in your opinion) better version.

This is the right you have.

But if you think open source gives you any legal or moral right to 
demand any features or choices or whatever from developers you are 
completely mistaken.

> But, I hope,
> you didn't mean your patch only, because then e.g. this stagnation threat
> looks like a bit exaggerated...

The question of how many I/O schedulers we need is, in any case, not
related to my patch.

> Jarek P.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-27 Thread Frans Pop

Greg KH wrote:
> Based on an original article by Jon Corbet for lwn.net written October 1,
> 2003 and located at http://lwn.net/Articles/51437/
> 
> Last updated November 27, 2008
  ^^^ 
Wow, that's impressive: both kobjects de-mystified and time travel made 
possible!

;-)
