Re: [PATCH 2/2] Fix possible leakage of blocks in UDF

2007-06-01 Thread Andrew Morton
On Sat, 02 Jun 2007 00:17:51 -0500 Eric Sandeen <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Fri, 01 Jun 2007 17:37:49 -0500
> > Eric Sandeen <[EMAIL PROTECTED]> wrote:
> 
> >> going for the inode_lock twice?
> >>
> > 
> > lockdep should catch that.
> > 
> 
> hey that's a good idea...!  *sigh* sometimes I worry about myself... but 
> hey at least I got it right. :)
> 
> =
> [ INFO: possible recursive locking detected ]
> 2.6.22-rc3 #8
> -
> lt-fsstress/3285 is trying to acquire lock:
>   (inode_lock){--..}, at: [] __mark_inode_dirty+0xe2/0x16c
> 
> but task is already holding lock:
>   (inode_lock){--..}, at: [] 
> _atomic_dec_and_lock+0x39/0x58
> 
> other info that might help us debug this:
> 3 locks held by lt-fsstress/3285:
>   #0:  (>i_mutex/1){--..}, at: [] 
> do_rmdir+0x7c/0xe3
>   #1:  (>i_mutex){--..}, at: [] 
> mutex_lock+0x22/0x24
>   #2:  (inode_lock){--..}, at: [] 
> _atomic_dec_and_lock+0x39/0x58
> 
> stack backtrace:
> 
> Call Trace:
>   [] __lock_acquire+0x155/0xbaa
>   [] __mark_inode_dirty+0xe2/0x16c
>   [] lock_acquire+0x7b/0x9f
>   [] __mark_inode_dirty+0xe2/0x16c
>   [] _spin_lock+0x1e/0x28
>   [] __mark_inode_dirty+0xe2/0x16c
>   [] :udf:udf_write_aext+0x101/0x11b
>   [] :udf:extent_trunc+0xd6/0x123
>   [] :udf:udf_truncate_tail_extent+0xda/0x171
>   [] :udf:udf_drop_inode+0x26/0x35
>   [] iput+0x74/0x76
>   [] dentry_iput+0xa0/0xb8
>   [] prune_dcache+0xa2/0x174
>   [] d_kill+0x21/0x43
>   [] prune_one_dentry+0x3a/0xef
>   [] prune_dcache+0xed/0x174
>   [] shrink_dcache_parent+0x21/0x10e
>   [] dentry_unhash+0x26/0x84
>   [] vfs_rmdir+0x88/0x117
>   [] do_rmdir+0xa1/0xe3
>   [] syscall_trace_enter+0x8d/0x8f
>   [] sys_rmdir+0x11/0x13
>   [] tracesys+0xdc/0xe1
> 

Well.  Documentation/filesystems/Locking says

drop_inode: no  !!!inode_lock!!!

That patch is DOA, methinks.
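
For readers following along, the trace reduces to one code path taking a non-recursive lock twice: iput() takes inode_lock via _atomic_dec_and_lock(), ->drop_inode runs with it held, and the UDF truncate path then reaches __mark_inode_dirty(), which takes inode_lock again. Below is a minimal single-threaded user-space sketch of that pattern. The toy_* names are hypothetical, and this only models the situation; it is not how lockdep or the VFS is implemented.

```c
#include <stdio.h>

/* Hypothetical stand-in for a non-recursive lock such as inode_lock. */
struct toy_lock {
    const char *name;
    int held;           /* 1 while the current context owns the lock */
};

/*
 * Try to take the lock. Returns 0 on success, -1 if this context already
 * holds it. A real spinlock would simply spin forever in the second case.
 */
static int toy_acquire(struct toy_lock *l)
{
    if (l->held) {
        fprintf(stderr, "possible recursive locking detected: %s\n", l->name);
        return -1;
    }
    l->held = 1;
    return 0;
}

static void toy_release(struct toy_lock *l)
{
    l->held = 0;
}

/* Models __mark_inode_dirty(): it needs the lock for itself. */
static int toy_mark_dirty(struct toy_lock *l)
{
    if (toy_acquire(l) != 0)
        return -1;
    toy_release(l);
    return 0;
}

/* Models iput() -> drop_inode(): the callback runs with the lock held. */
static int toy_drop_inode(struct toy_lock *l)
{
    int ret;

    if (toy_acquire(l) != 0)
        return -1;
    ret = toy_mark_dirty(l);    /* second acquisition of the same lock */
    toy_release(l);
    return ret;
}
```

Calling toy_mark_dirty() on its own succeeds; calling it through toy_drop_inode() fails, which is the shape of the report above.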

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PATCH] ACPI patches for 2.6.22 - part 3

2007-06-01 Thread Len Brown
Hi Linus,

please pull from: 

git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release

This batch fixes two places where Linux currently fails to load tables
of the format that we expect OEMs to ship soon -- OEMx and
packages-as-method-arguments for _DSM.  In both cases the changed
code has no effect on systems without those needs.

The acpi_osi= boot parameter is expanded and invoked automatically
via DMI for some systems that need it.  We stop short here of changing
the default to acpi_osi=Linux -- that waits for 2.6.23.
The Thinkpad commit is for a 2.6.22 regression,
as is the section mis-match build fix.

This will update the files shown below.

thanks!

-Len

ps. individual patches are available on [EMAIL PROTECTED]
and a consolidated plain patch is available here:
ftp://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.22/acpi-release-20070126-2.6.22-rc3.diff.gz

 Documentation/kernel-parameters.txt |5 -
 Documentation/thinkpad-acpi.txt |   25 ++---
 drivers/acpi/numa.c |2 
 drivers/acpi/osl.c  |  118 +--
 drivers/acpi/tables/tbinstal.c  |8 -
 drivers/acpi/thermal.c  |   13 +-
 drivers/acpi/utilities/utcopy.c |  120 ++--
 drivers/acpi/utilities/uteval.c |   28 +
 drivers/acpi/utilities/utobject.c   |   42 
 drivers/acpi/utilities/utxface.c|4 
 drivers/misc/thinkpad_acpi.c|   17 +--
 drivers/misc/thinkpad_acpi.h|6 -
 include/acpi/acpi_numa.h|2 
 include/acpi/acpiosxf.h |3 
 include/acpi/acpixf.h   |2 
 include/acpi/acutils.h  |2 
 16 files changed, 267 insertions(+), 130 deletions(-)

through these commits:

Bob Moore (1):
  ACPICA: Support for external package objects as method arguments

Henrique de Moraes Holschuh (1):
  ACPI: thinkpad-acpi: do not use named sysfs groups

Len Brown (4):
  ACPICA: allow Load(OEMx) tables
  ACPI: extend "acpi_osi=" boot option
  ACPI: Make _OSI(Linux) a special case
  ACPI: add __init to acpi_initialize_subsystem()

Thomas Renninger (1):
  ACPI: thermal: Replace pointer with name in trip_points

Tony Luck (1):
  ACPI: Section mismatch ... acpi_map_pxm_to_node

with this log:

commit c4d36a822e7c51cd6ffcf9133854d5e32489d269
Merge: fcf7535... dd272b5...
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Sat Jun 2 01:02:09 2007 -0400

Pull osi-now into release branch

commit fcf75356e9cf0460ef47a5b756bc3b0951ecab59
Merge: f285e3d... 6287ee3...
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Sat Jun 2 00:48:48 2007 -0400

Pull now into release branch

commit 6287ee32952b502c23d54f12895c3895ddbe5013
Author: Bob Moore <[EMAIL PROTECTED]>
Date:   Tue Apr 3 19:59:37 2007 -0400

ACPICA: Support for external package objects as method arguments

Implemented support to allow Package objects to be passed as
method arguments to the acpi_evaluate_object interface. Previously,
this would return an AE_NOT_IMPLEMENTED exception.

Signed-off-by: Bob Moore <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit 8ff6f48d99a0351bcc9ceab422042ef9d3bad9aa
Author: Luck, Tony <[EMAIL PROTECTED]>
Date:   Thu May 24 13:57:40 2007 -0700

ACPI: Section mismatch ... acpi_map_pxm_to_node

Last of the "Section mismatch" errors from ia64 builds!
acpi_map_pxm_to_node() is defined with attribute __cpuinit, but is called
by "normal" kernel functions acpi_getnode() and acpi_map_cpu2node().

Commit f363d16fbb9374c0bd7f2757d412c287169094c9 moved the data structures on
which this routine operates from __cpuinitdata to regular memory, so this
routine can also move out of init space.

Signed-off-by: Tony Luck <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>

commit cc4c24e115ca7bc2e4ec74d70bcb8fda1d1a8df8
Author: Henrique de Moraes Holschuh <[EMAIL PROTECTED]>
Date:   Wed May 30 20:50:14 2007 -0300

ACPI: thinkpad-acpi: do not use named sysfs groups

The initial version of the thinkpad-acpi sysfs interface (not yet released
in any stable mainline kernel) made liberal use of named sysfs groups, in
order to get the attributes more organized.

This proved to be a really bad design decision.  Maybe if attribute groups
were as flexible as a real directory, and if binary attributes were not
second-class citizens, the idea of subdirs and named groups would not have
been so bad.

This patch makes all the thinkpad-acpi sysfs groups anonymous (thus
removing the subdirs), adds the former group names as a prefix (so that
hotkey/enable becomes hotkey_enable for example), and updates the
documentation.

These changes will make the thinkpad-acpi sysfs ABI a lot easier to
maintain.

Signed-off-by: Henrique de Moraes Holschuh <[EMAIL PROTECTED]>
  

Re: [PATCH 2/2] Fix possible leakage of blocks in UDF

2007-06-01 Thread Eric Sandeen

Andrew Morton wrote:

On Fri, 01 Jun 2007 17:37:49 -0500
Eric Sandeen <[EMAIL PROTECTED]> wrote:



going for the inode_lock twice?



lockdep should catch that.



hey that's a good idea...!  *sigh* sometimes I worry about myself... but 
hey at least I got it right. :)


=
[ INFO: possible recursive locking detected ]
2.6.22-rc3 #8
-
lt-fsstress/3285 is trying to acquire lock:
 (inode_lock){--..}, at: [] __mark_inode_dirty+0xe2/0x16c

but task is already holding lock:
 (inode_lock){--..}, at: [] 
_atomic_dec_and_lock+0x39/0x58


other info that might help us debug this:
3 locks held by lt-fsstress/3285:
 #0:  (>i_mutex/1){--..}, at: [] 
do_rmdir+0x7c/0xe3
 #1:  (>i_mutex){--..}, at: [] 
mutex_lock+0x22/0x24
 #2:  (inode_lock){--..}, at: [] 
_atomic_dec_and_lock+0x39/0x58


stack backtrace:

Call Trace:
 [] __lock_acquire+0x155/0xbaa
 [] __mark_inode_dirty+0xe2/0x16c
 [] lock_acquire+0x7b/0x9f
 [] __mark_inode_dirty+0xe2/0x16c
 [] _spin_lock+0x1e/0x28
 [] __mark_inode_dirty+0xe2/0x16c
 [] :udf:udf_write_aext+0x101/0x11b
 [] :udf:extent_trunc+0xd6/0x123
 [] :udf:udf_truncate_tail_extent+0xda/0x171
 [] :udf:udf_drop_inode+0x26/0x35
 [] iput+0x74/0x76
 [] dentry_iput+0xa0/0xb8
 [] prune_dcache+0xa2/0x174
 [] d_kill+0x21/0x43
 [] prune_one_dentry+0x3a/0xef
 [] prune_dcache+0xed/0x174
 [] shrink_dcache_parent+0x21/0x10e
 [] dentry_unhash+0x26/0x84
 [] vfs_rmdir+0x88/0x117
 [] do_rmdir+0xa1/0xe3
 [] syscall_trace_enter+0x8d/0x8f
 [] sys_rmdir+0x11/0x13
 [] tracesys+0xdc/0xe1

-Eric


Re: Compact Flash performance...

2007-06-01 Thread Willy Tarreau
On Thu, May 31, 2007 at 06:43:46PM -0400, Mark Lord wrote:
> Jeff Garzik wrote:
> >Mark Lord wrote:
> >>Some cards may perform better when their "memory" interface is used
> >>instead of the "I/O" interface, or vice-versa.  I'm not sure which
> >>of the two methods was selected by libata (probably the "memory" 
> >>interface).
> >
> >I am very CF-ignorant.  How does libata select a memory or I/O interface 
> >on a CF device?
> 
> Right.  Usually we cannot select them, as it's the wires between
> the ATA chipset (motherboard) and the CFCARD that determine this.

CF cards support 3 modes (MEM, I/O and True IDE), and neither MEM nor I/O
modes can talk IDE. Most often, the PIN 9 is simply shorted to the ground
at the connector to set the card in True IDE mode, which makes it emulate
a standard IDE disk.

Cheers,
Willy



Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 21:45:15 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Fri, 1 Jun 2007, Andrew Morton wrote:
> 
> > They are different instances which happen to have the same length (zero).
> 
> I guess one could use the slab allocators as a type of reservation 
> ticket generator with zero sized objects. Hmmm But is that really a 
> useful thing to do?
> 
> > But the code will incorrectly decide that they are the same instance.  It
> > might cause refcounting or accounting errors, for example.  I don't know -
> > the kernel's a big place.
> 
> That would have to occur with objects that are repeatedly allocated and 
> then linked together etc. Linking typically requires a listhead so it's 
> typically difficult to do zero length objects.

Well I can't immediately think of a scenario in which it's likely to occur,
but we're in the position of trying to prove a negative.

Poke Bill Irwin - he'll think of something ;)

> > I agree the risk is low, but if something _does_ blow up, it will do so 
> > subtly.
> 
> The cases that we have seen so far are due to array allocations of N 
> elements where N == 0 leads to the creation of a zero sized object.
> The objects of the array are not zero sized; it is just that zero of 
> them are allocated.

We lose leak-detection and double-free detection this way, too.  Not a big
deal.



Missing RAM on x86_64

2007-06-01 Thread Mike Richards

Hi, I appear to be missing quite a bit of RAM on an x86_64 system. I
have 1GB installed, but 'free' only shows 878MB:

pokey$ free -m
             total       used       free     shared    buffers     cached
Mem:           878        571        306          0         52        332
-/+ buffers/cache:        186        691
Swap:         1023          0       1023

I'm used to seeing a little bit of RAM missing with 32bit systems, but
146MB seems a bit much. The part of dmesg that concerns the RAM is
shown below. Anyone know what's up here? Is this normal for an x86_64
system?

Linux version 2.6.20.11 ([EMAIL PROTECTED]) (gcc version 4.1.2) #1 SMP Thu
May 24 18:29:52 GMT 2007
Command line: auto BOOT_IMAGE=2.6.20.11 rw root=801
BIOS-provided physical RAM map:
BIOS-e820:  - 0009fc00 (usable)
BIOS-e820: 0009fc00 - 000a (reserved)
BIOS-e820: 000e - 0010 (reserved)
BIOS-e820: 0010 - 37fd (usable)
BIOS-e820: 37fd - 37fde000 (ACPI data)
BIOS-e820: 37fde000 - 3800 (ACPI NVS)
BIOS-e820: fec0 - fec01000 (reserved)
BIOS-e820: fee0 - fef0 (reserved)
BIOS-e820: ff78 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 229328) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.3 present.
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 229328) 1 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   229328
On node 0 totalpages: 229231
 DMA zone: 56 pages used for memmap
 DMA zone: 890 pages reserved
 DMA zone: 3053 pages, LIFO batch:0
 DMA32 zone: 3079 pages used for memmap
 DMA32 zone: 222153 pages, LIFO batch:31
 Normal zone: 0 pages used for memmap
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: nVidia   MPTABLE: Product ID: MCP51G/M MPTABLE:
APIC at: 0xFEE0
Processor #0 (Bootup-CPU)
I/O APIC #1 at 0xFEC0.
Setting APIC routing to flat
Processors: 1
Nosave address range: 0009f000 - 000a
Nosave address range: 000a - 000e
Nosave address range: 000e - 0010
Allocating PCI resources starting at 4000 (gap: 3800:c6c0)
PERCPU: Allocating 25088 bytes of per cpu data
Built 1 zonelists.  Total pages: 225206
Kernel command line: auto BOOT_IMAGE=2.6.20.11 rw root=801
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Checking aperture...
CPU 0: aperture @ 61ba00 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Memory: 898964k/917312k available (2235k kernel code, 17744k reserved,
750k data, 200k init)
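
The figures in the log are at least consistent with one another. The last active PFN range ends at page 229328; assuming the usual 4 KiB x86_64 page size, that is exactly the 917312k total on the "Memory:" line, a little under 896 MB. Most of the missing 146 MB therefore never shows up as usable RAM in the BIOS map at all; the log does not say why the BIOS withholds it. A small sketch of the arithmetic follows (the helper names are made up for illustration):

```c
/* Assumption: 4 KiB pages, the default on x86_64. */
#define PAGE_KIB 4UL

/* Convert a page count into KiB. */
static unsigned long pages_to_kib(unsigned long pages)
{
    return pages * PAGE_KIB;
}

/* Convert KiB into whole MiB (truncating division). */
static unsigned long kib_to_mib(unsigned long kib)
{
    return kib / 1024;
}
```

With these, pages_to_kib(229328) gives 917312, and kib_to_mib() puts the 917312k total at 895 whole MiB and the 898964k available at 877 whole MiB, in line with the 878 that free reports.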


Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Andrew Morton wrote:

> They are different instances which happen to have the same length (zero).

I guess one could use the slab allocators as a type of reservation 
ticket generator with zero sized objects. Hmmm But is that really a 
useful thing to do?

> But the code will incorrectly decide that they are the same instance.  It
> might cause refcounting or accounting errors, for example.  I don't know - the
> kernel's a big place.

That would have to occur with objects that are repeatedly allocated and 
then linked together etc. Linking typically requires a listhead so it's 
typically difficult to do zero length objects.
 
> I agree the risk is low, but if something _does_ blow up, it will do so 
> subtly.

The cases that we have seen so far are due to array allocations of N 
elements where N == 0 leads to the creation of a zero sized object.
The objects of the array are not zero sized; it is just that zero of 
them are allocated.


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-06-01 Thread David Wagner
[EMAIL PROTECTED] writes:
>Experience over on the Windows side of the fence indicates that "remote bad
>guys get some local user first" is a *MAJOR* part of the current real-world
>threat model - the vast majority of successful attacks on end-user boxes these
>days start off with either "Get user to (click on link|open attachment)" or
>"Subvert the path to a website (either by hacking the real site or hijacking
>the DNS) and deliver a drive-by fruiting when the user visits the page".

AppArmor isn't trying to defend everyday users from getting phished or
social engineered; it is trying to protect servers from getting rooted
because of security holes in their network daemons.  I find that a
laudable goal.  Sure, it doesn't solve every security problem in the
world, but so what?  A tool that could solve that one security problem
would still be a useful thing, even if it did nothing else.

I don't find the Windows stuff too relevant here.  As I understand it,
AppArmor isn't aimed at defending Windows desktop users; it is aimed at
defending Linux servers.  A pretty different environment, I'd say.

Ultimately, there are some things AppArmor may be good at, and there
are also sure to be some things it is bloody useless for.  My hammer
isn't very good for screwing in screws, but I still find it useful.
I confess I don't understand the kvetching about AppArmor's goals.
What are you expecting, some kind of silver bullet?

A question I'd find more interesting is whether AppArmor is able to
meet its stated goals, under a reasonable threat model, and with what
degree of assurance, and at what cost.  But I don't know whether that's
relevant for the linux-kernel mailing list.


Re: [2.6.22-rc3][ACPI?] Resume from s2r doesn't work.

2007-06-01 Thread Andrey Borzenkov
Olaf Dietsche wrote:

> resume from suspend to ram doesn't work for my laptop and never
> has. So, this is not a regression.
> 
> Hibernate (aka suspend to disk) works, however.
> 
> When I resume, everything seems to come up (fan becomes busy, disk and
> dvd spin up for a short time), but the machine is not responding to
> anything - neither keyboard, mouse nor ping from another machine. The
> laptop is effectively dead and only a power cycle helps.
> 

I understand that every case of failed resume is different, but this looks
quite similar to the problem I have:
http://bugzilla.kernel.org/show_bug.cgi?id=7499, at least symptoms :)

> I've tried a minimal config and init=/bin/bash as well, but the result
> is the same.
> 

One more similarity :) And BTW s2ram does not work either.

-andrey



Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 21:01:09 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Fri, 1 Jun 2007, Andrew Morton wrote:
> 
> > > On Fri, 1 Jun 2007 18:37:46 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > >
> > > +#define ZERO_SIZE_PTR ((void *)16)
> > 
> > Jeremy's point was a good one.  The kernel _does_ use address-comparison
> > to determine object-inequality in an unknown but non-zero number of places.
> > 
> > It is of course unlikely that this will occur in conjunction with zero-sized
> > objects, but who knows?
> 
> The zero sized objects are always the same and have the same content of 
> nothingness. So the kernel would find that they are the same which they 
> indeed are. Why could this be a problem?

They are different instances which happen to have the same length (zero).

But the code will incorrectly decide that they are the same instance.  It
might cause refcounting or accounting errors, for example.  I don't know - the
kernel's a big place.

I agree the risk is low, but if something _does_ blow up, it will do so subtly.
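
To make the concern concrete: if every zero-size allocation returns the same sentinel, any code that uses pointer equality as an identity test will treat two separate zero-size allocations as one object. A minimal user-space sketch, assuming a hypothetical toy_alloc()/toy_free() pair in place of kmalloc()/kfree(); only the ZERO_SIZE_PTR value is taken from the quoted patch.

```c
#include <stdlib.h>

/* Same value as in the quoted patch; it is never dereferenced. */
#define ZERO_SIZE_PTR ((void *)16)

/* Hypothetical allocator: all zero-size requests share one sentinel. */
static void *toy_alloc(size_t size)
{
    if (size == 0)
        return ZERO_SIZE_PTR;
    return malloc(size);
}

/* Freeing the sentinel is a no-op, so a double free goes unnoticed. */
static void toy_free(void *p)
{
    if (p == ZERO_SIZE_PTR)
        return;
    free(p);
}

/* Address comparison used as an identity test. */
static int same_instance(const void *a, const void *b)
{
    return a == b;
}
```

Two toy_alloc(0) results compare as the same instance, while two toy_alloc(8) results do not. Whether any kernel code actually depends on that distinction for zero-sized objects is exactly the open question in this thread.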



Conditionals for development tests and output

2007-06-01 Thread Christoph Lameter
This introduces 

CONFIG_DEVELKERNEL

If CONFIG_DEVELKERNEL is set then this is a development kernel.
Otherwise the kernel to be built is a production kernel.


If CONFIG_DEVELKERNEL is set then the constant

DEVELKERNEL

is set to 1. Otherwise it is zero.

The following functions are defined in kernel.h for diagnostics

DEVEL_WARN_ON(condition)

Warn on the condition but only if this is a development kernel

DEVEL_WARN_ON_ONCE(condition)

Warn on the condition once but only if this is a development kernel

devel_printk(fmt, ...)

Output that should only be generated for a development kernel.
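
A rough user-space sketch of how the gated warning behaves, for illustration only. toy_warn_on() and warnings_emitted are hypothetical stand-ins for the kernel's WARN_ON machinery, and DEVELKERNEL is hard-wired here rather than derived from the build.

```c
#include <stdio.h>

/* Stand-in for the build-time setting; the patch derives it from the Makefile. */
#ifndef DEVELKERNEL
#define DEVELKERNEL 1
#endif

static int warnings_emitted;    /* counts warnings, for illustration only */

/* Simplified WARN_ON: report the condition and return its value. */
static int toy_warn_on(int condition, const char *expr)
{
    if (condition) {
        warnings_emitted++;
        fprintf(stderr, "WARNING: %s\n", expr);
    }
    return condition;
}

/*
 * Same shape as the patch: when DEVELKERNEL is 0 the condition is
 * constant false, so the warning can never fire.
 */
#define DEVEL_WARN_ON(x) toy_warn_on(DEVELKERNEL && (x), #x)
```

Building the sketch with -DDEVELKERNEL=0 turns every DEVEL_WARN_ON() into a check that is always false, which is the production-kernel behaviour described above.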


CONFIG_DEVELKERNEL is set and cleared by editing the Makefile. Line 7 contains

DEVELKERNEL = 1


Updates SLAB and SLUB to not perform the kmalloc(0) tests anymore
if the kernel is not a development kernel.


Editing DEVELKERNEL requires a 

make clean

to correctly rebuild the kernel with the proper settings. This is similar 
to editing the version number. Sam: Is there any way to avoid the make 
clean?

Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>


---
 Makefile |3 ++-
 include/linux/kernel.h   |   14 ++
 include/linux/slub_def.h |2 +-
 mm/slab.c|2 +-
 mm/slub.c|4 ++--
 scripts/kconfig/symbol.c |9 +
 6 files changed, 29 insertions(+), 5 deletions(-)

Index: slub/Makefile
===
--- slub.orig/Makefile  2007-06-01 20:11:57.0 -0700
+++ slub/Makefile   2007-06-01 20:56:27.0 -0700
@@ -3,6 +3,7 @@ PATCHLEVEL = 6
 SUBLEVEL = 22
 EXTRAVERSION = -rc3-mm1
 NAME = Jeff Thinks I Should Change This, But To What?
+DEVELKERNEL = 1
 
 # *DOCUMENTATION*
 # To see a list of typical targets execute "make help"
@@ -320,7 +321,7 @@ AFLAGS  := -D__ASSEMBLY__
 KERNELRELEASE = $(shell cat include/config/kernel.release 2> /dev/null)
 KERNELVERSION = $(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
 
-export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION
+export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION DEVELKERNEL
 export ARCH CONFIG_SHELL HOSTCC HOSTCFLAGS CROSS_COMPILE AS LD CC
 export CPP AR NM STRIP OBJCOPY OBJDUMP MAKE AWK GENKSYMS PERL UTS_MACHINE
 export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
Index: slub/scripts/kconfig/symbol.c
===
--- slub.orig/scripts/kconfig/symbol.c  2007-06-01 20:13:24.0 -0700
+++ slub/scripts/kconfig/symbol.c   2007-06-01 21:07:52.0 -0700
@@ -72,6 +72,15 @@ void sym_init(void)
sym->type = S_STRING;
sym->flags |= SYMBOL_AUTO;
sym_add_default(sym, uts.release);
+
+   sym = sym_lookup("DEVELKERNEL", 0);
+   sym->type = S_BOOLEAN;
+   sym->flags |= SYMBOL_VALID|SYMBOL_AUTO;
+   p = getenv("DEVELKERNEL");
+   if (p && atoi(p))
+   sym_add_default(sym, "y");
+   else
+   sym_add_default(sym, "n");
 }
 
 enum symbol_type sym_get_type(struct symbol *sym)
Index: slub/include/linux/kernel.h
===
--- slub.orig/include/linux/kernel.h2007-06-01 20:19:17.0 -0700
+++ slub/include/linux/kernel.h 2007-06-01 20:30:16.0 -0700
@@ -375,4 +375,18 @@ struct sysinfo {
 #define NUMA_BUILD 0
 #endif
 
+#ifdef CONFIG_DEVELKERNEL
+#define DEVELKERNEL 1
+#else
+#define DEVELKERNEL 0
+#endif
+
+#define DEVEL_WARN_ON(x)   WARN_ON(DEVELKERNEL && (x))
+#define DEVEL_WARN_ON_ONCE(x)  WARN_ON_ONCE(DEVELKERNEL && (x))
+
+#define devel_printk(fmt,arg...)   ({                  \
+   if (DEVELKERNEL)\
+   printk(fmt, ##arg); \
+})
+
 #endif
Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h  2007-06-01 20:25:06.0 -0700
+++ slub/include/linux/slub_def.h   2007-06-01 20:28:02.0 -0700
@@ -80,7 +80,7 @@ static inline int kmalloc_index(size_t s
 * allocate memory but return ZERO_SIZE_PTR.
 * WARN so that people can review and fix their code.
 */
-   WARN_ON_ONCE(size == 0);
+   DEVEL_WARN_ON_ONCE(size == 0);
 
if (!size)
return 0;
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-06-01 20:24:26.0 -0700
+++ slub/mm/slub.c  2007-06-01 20:28:20.0 -0700
@@ -2525,8 +2525,8 @@ void __init kmem_cache_init(void)
kmem_size = offsetof(struct kmem_cache, cpu_slab) +
nr_cpu_ids * sizeof(struct page *);
 
-   

2.6.22-rc3: more section mismatch

2007-06-01 Thread Andrey Borzenkov
Sorry if it was already reported.

  MODPOST vmlinux
WARNING: arch/i386/kernel/built-in.o(.text+0x963a): Section mismatch: 
reference to .init.text:amd_init_mtrr (between 'mtrr_bp_init' 
and 'mtrr_attrib_to_str')
WARNING: arch/i386/kernel/built-in.o(.text+0x963f): Section mismatch: 
reference to .init.text:cyrix_init_mtrr (between 'mtrr_bp_init' 
and 'mtrr_attrib_to_str')
WARNING: arch/i386/kernel/built-in.o(.text+0x9644): Section mismatch: 
reference to .init.text:centaur_init_mtrr (between 'mtrr_bp_init' 
and 'mtrr_attrib_to_str')
WARNING: arch/i386/kernel/built-in.o(.text+0xa6f5): Section mismatch: 
reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa709): Section mismatch: 
reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa730): Section mismatch: 
reference to .init.text: (between 'get_mtrr_state' and 'generic_get_mtrr')
  AS  arch/i386/boot/setup.o

config attached
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc3
# Mon May 28 00:07:26 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
CONFIG_MPENTIUMIII=y
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set

Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Andrew Morton wrote:

> > On Fri, 1 Jun 2007 18:37:46 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> >
> > +#define ZERO_SIZE_PTR ((void *)16)
> 
> Jeremy's point was a good one.  The kernel _does_ use address-comparison
> to determine object-inequality in an unknown but non-zero number of places.
> 
> It is of course unlikely that this will occur in conjunction with zero-sized
> objects, but who knows?

The zero sized objects are always the same and have the same content of 
nothingness. So the kernel would find that they are the same which they 
indeed are. Why could this be a problem?


Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Eric W. Biederman
"Darrick J. Wong" <[EMAIL PROTECTED]> writes:

> On Fri, Jun 01, 2007 at 06:18:32PM -0600, Eric W. Biederman wrote:
>
>> I doubt it.  The practical problem is that cpu_down does not
>> and by design can not call the irq balancing part properly
>> and I haven't yet seen anything to suggest that we don't migrate
>> irq properly.
>> 
>> So I'm guessing it was the decision part.
>
> I'm not using any IRQ balancer, afaik.  As I recall, CONFIG_IRQBALANCE
> is i386-only, and I'm not running the userland irqbalance program
> either.  Just messing around with /proc/irq/*/smp_affinity by hand. :)

This is just getting confusing.

Emmanuel Fust.  Please play with /proc/irq/*/smp_affinity by hand and
confirm that you can move your irqs.  This will confirm it is the decision
part.
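
For anyone trying this by hand: /proc/irq/N/smp_affinity takes a hexadecimal CPU bitmask in which bit n stands for CPU n. A small sketch that formats the mask for a single CPU; the helper name is made up, and it only handles the first 32 CPUs to keep the example short.

```c
#include <stdio.h>

/*
 * Write the hex affinity mask for one CPU into buf, e.g. CPU 2 -> "4".
 * Returns 0 on success, -1 if the CPU number is outside 0..31.
 */
static int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;
    snprintf(buf, len, "%x", 1u << cpu);
    return 0;
}
```

The resulting string is what one would write (as root) into the smp_affinity file of the IRQ in question.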

Darrick.  The cpu hotplug architecture makes it impossible to properly
call irq migration code that backs /proc/irq/*/smp_affinity.  Therefore
the cpu hotplug interface to irq migration is broken by design.  There
are some other bugs in the implementation of migrating irqs off of cpus
as well.  I'm pretty certain that some combination of those problems is
biting you.

Eric


Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Andrew Morton
> On Fri, 1 Jun 2007 18:37:46 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
> wrote:
>
> +#define ZERO_SIZE_PTR ((void *)16)

Jeremy's point was a good one.  The kernel _does_ use address-comparison
to determine object-inequality in an unknown but non-zero number of places.

It is of course unlikely that this will occur in conjunction with zero-sized
objects, but who knows?
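
A small user-space sketch of the concern, assuming an allocator that hands
out one shared sentinel for every zero-sized request. The names
alloc_sentinel() and same_object() are hypothetical and exist only for this
illustration; only the value 16 comes from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same sentinel value as in the patch; it is never dereferenced here. */
#define ZERO_SIZE_PTR ((void *)16)

/* Hypothetical allocator model: all zero-sized requests share one sentinel. */
static void *alloc_sentinel(size_t size)
{
	return size ? malloc(size) : ZERO_SIZE_PTR;
}

/* Hypothetical identity test that compares objects by address. */
static int same_object(const void *a, const void *b)
{
	return a == b;
}
```

With this model, two separate zero-sized allocations compare as the same
object, while two non-zero allocations never do.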


Re: 2.6.22 libata spindown

2007-06-01 Thread Henrique de Moraes Holschuh
On Fri, 01 Jun 2007, Jeff Garzik wrote:
> IIRC, Debian was the one OS that really did need a shutdown utility 
> update, as the message says :)

Actually, editing /etc/init.d/halt is enough.  Find the hddown="-h" and
change it to hddown="".

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Linus Torvalds wrote:

> So when I suggested the uglier
> 
>   if ((unsigned long)x <= 16)
>   return;
> 
> I really did mean to use that ugly cast.. Yours is prettier, but sadly, 
> yours is simply not safe: a signed comparison might end up making _all_ 
> kernel pointers trigger that test.

Maybe we can have a compromise? Let's at least keep the ZERO_SIZE_PTR
reference in there.


SLUB: Make sure that the comparison with ZERO_SIZE_PTR is unsigned

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>


---
 mm/slub.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-06-01 20:00:56.0 -0700
+++ slub/mm/slub.c  2007-06-01 20:04:26.0 -0700
@@ -2338,7 +2338,13 @@ void kfree(const void *x)
struct kmem_cache *s;
struct page *page;
 
-   if (x <= ZERO_SIZE_PTR)
+   /*
+* This has to be an unsigned comparison. According to Linus
+* some gcc versions treat a pointer as a signed entity. Then
+* this comparison would be true for all "negative" pointers
+* (which would cover the whole upper half of the address space).
+*/
+   if ((unsigned long)x <= (unsigned long)ZERO_SIZE_PTR)
return;
 
page = virt_to_head_page(x);


Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Christoph Lameter wrote:
>  
> - if (!x)
> + if (x <= ZERO_SIZE_PTR)
>   return;

Btw, this is _not_ safe.

A number of gcc versions have done signed arithmetic on pointers. It's 
insane and stupid, but it happens, and it so happens to work on 
architectures where the point where the sign changes over is not a valid 
pointer area.

On x86, doing signed arithmetic on pointers is a clear and unambiguous 
_bug_ (because a C object really _can_ start in "positive" space and end 
in "negative" pointer space), but I think some gcc versions did it there 
too.

On some other architectures, like x86-64, the virtual memory around the 
magic switch-over point is not mappable, so a C object cannot validly 
straddle the area where positive overflows into negative, and as such a 
compiler _could_ consider pointers to be signed (although I really don't 
see the point).

So when I suggested the uglier

if ((unsigned long)x <= 16)
return;

I really did mean to use that ugly cast.. Yours is prettier, but sadly, 
yours is simply not safe: a signed comparison might end up making _all_ 
kernel pointers trigger that test.
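
A user-space sketch of the hazard described above, assuming a
two's-complement target where converting an address to a signed integer of
the same width keeps the bit pattern (true for gcc, but
implementation-defined in ISO C). The helper names and ZERO_SIZE_ADDR are
made up for this illustration; only the value 16 comes from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the address behind ZERO_SIZE_PTR ((void *)16). */
#define ZERO_SIZE_ADDR 16

/* The safe form: compare the address as an unsigned quantity. */
static int small_ptr_unsigned(const void *x)
{
	return (uintptr_t)x <= (uintptr_t)ZERO_SIZE_ADDR;
}

/*
 * Models a compiler that compares pointers as signed values: any
 * address with the top bit set looks negative and so passes "<= 16".
 */
static int small_ptr_signed(const void *x)
{
	return (intptr_t)(uintptr_t)x <= (intptr_t)ZERO_SIZE_ADDR;
}
```

For an address in the upper half of the address space, the unsigned helper
rejects it while the signed one accepts it, which is the "all kernel
pointers trigger that test" case.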

Linus


Re: first one disk drops our of raid, then the other - I/O error on same sector. raid bug?

2007-06-01 Thread Neil Brown
On Friday June 1, [EMAIL PROTECTED] wrote:
> This is a debian stable system in production. Kernel 2.6.18.
> I upgraded the kernel from 2.6.8 to 2.6.18 about 2 weeks ago and saw no
> problems until today.
> 
> 
> May 30 12:58:00 bitc kernel: (scsi0:A:1:0): Unexpected busfree in
> Command phase

This error is not indicative of drive problems, but rather of
bus/controller problems.

The fact that identical problems hit both devices supports this.  I
would suggest that you check all cable connections and re-seat the
SCSI card (if it is a separate card).  If the problem persists, look
at replacing hardware.

NeilBrown


Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread John Anthony Kazos Jr.
> > > +  * The behavior for zero sized allocs changes. We no longer
> > > +  * allocate memory but return ZERO_SIZE_PTR.
> > > +  * WARN so that people can review and fix their code.
> > 
> > I don't see why people have so much opposition to zero-size memory 
> > allocations. There's all sorts of situations where you want a resizeable 
> > array that may have zero objects, especially in these days of 
> > hotpluggability.
> 
> In case you have not read the description to the end: This patch does 
> exactly what you want and legitimizes zero size object use. The warning 
> will be removed before 2.6.22 is released.

Ah, sorry then. I just saw this, and remembered a rather contrary 
discussion about it quite a time ago. Excellent then.


Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Darrick J. Wong
On Fri, Jun 01, 2007 at 06:18:32PM -0600, Eric W. Biederman wrote:

> I doubt it.  The practical problem is that cpu_down does not
> and by design can not call the irq balancing part properly
> and I haven't yet seen anything to suggest that we don't migrate
> irq properly.
> 
> So I'm guessing it was the decision part.

I'm not using any IRQ balancer, afaik.  As I recall, CONFIG_IRQBALANCE
is i386-only, and I'm not running the userland irqbalance program
either.  Just messing around with /proc/irq/*/smp_affinity by hand. :)

--D


Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, John Anthony Kazos Jr. wrote:

> > +* The behavior for zero sized allocs changes. We no longer
> > +* allocate memory but return ZERO_SIZE_PTR.
> > +* WARN so that people can review and fix their code.
> 
> I don't see why people have so much opposition to zero-size memory 
> allocations. There's all sorts of situations where you want a resizeable 
> array that may have zero objects, especially in these days of 
> hotpluggability.

In case you have not read the description to the end: This patch does 
exactly what you want and legitimizes zero size object use. The warning 
will be removed before 2.6.22 is released.



first one disk drops our of raid, then the other - I/O error on same sector. raid bug?

2007-06-01 Thread Brad Langhorst
This is a debian stable system in production. Kernel 2.6.18.
I upgraded the kernel from 2.6.8 to 2.6.18 about 2 weeks ago and saw no
problems until today.


May 30 12:58:00 bitc kernel: (scsi0:A:1:0): Unexpected busfree in
Command phase
May 30 12:58:00 bitc kernel: SEQADDR == 0x16c
May 30 12:58:00 bitc kernel: sd 0:0:1:0: SCSI error: return code =
0x0001
May 30 12:58:00 bitc kernel: end_request: I/O error, dev sdb, sector
287322317
May 30 12:58:00 bitc kernel: raid1: Disk failure on sdb2, disabling
device. 
May 30 12:58:00 bitc kernel: ^IOperation continuing on 1 devices
May 30 12:58:01 bitc kernel: sd 0:0:1:0: SCSI error: return code =
0x0001
May 30 12:58:01 bitc kernel: end_request: I/O error, dev sdb, sector
284758725
May 30 12:58:01 bitc kernel: RAID1 conf printout:
May 30 12:58:01 bitc kernel:  --- wd:1 rd:2
May 30 12:58:01 bitc kernel:  disk 0, wo:0, o:1, dev:sda2
May 30 12:58:01 bitc kernel:  disk 1, wo:1, o:0, dev:sdb2
May 30 12:58:01 bitc kernel: RAID1 conf printout:
May 30 12:58:01 bitc kernel:  --- wd:1 rd:2
May 30 12:58:01 bitc kernel:  disk 0, wo:0, o:1, dev:sda2

I can add /dev/sdb back

May 30 17:14:17 bitc kernel: md: md1: sync done.
May 30 17:14:17 bitc kernel: RAID1 conf printout:
May 30 17:14:17 bitc kernel:  --- wd:2 rd:2
May 30 17:14:17 bitc kernel:  disk 0, wo:0, o:1, dev:sda2
May 30 17:14:17 bitc kernel:  disk 1, wo:0, o:1, dev:sdb2

then /dev/sda fell out!

May 31 02:37:06 bitc kernel: (scsi0:A:0:0): Unexpected busfree in
Command phase
May 31 02:37:06 bitc kernel: SEQADDR == 0x16c
May 31 02:37:06 bitc kernel: sd 0:0:0:0: SCSI error: return code =
0x0001
May 31 02:37:06 bitc kernel: end_request: I/O error, dev sda, sector
287322317
May 31 02:37:06 bitc kernel: raid1: Disk failure on sda2, disabling
device. 
May 31 02:37:06 bitc kernel: ^IOperation continuing on 1 devices
May 31 02:37:06 bitc kernel: sd 0:0:0:0: SCSI error: return code =
0x0001
May 31 02:37:06 bitc kernel: end_request: I/O error, dev sda, sector
276325445
May 31 02:37:06 bitc kernel: RAID1 conf printout:
May 31 02:37:06 bitc kernel:  --- wd:1 rd:2
May 31 02:37:06 bitc kernel:  disk 0, wo:1, o:0, dev:sda2
May 31 02:37:06 bitc kernel:  disk 1, wo:0, o:1, dev:sdb2
May 31 02:37:06 bitc kernel: RAID1 conf printout:
May 31 02:37:06 bitc kernel:  --- wd:1 rd:2
May 31 02:37:06 bitc kernel:  disk 1, wo:0, o:1, dev:sdb2

Note that the first error is at the same sector for both disks
287322317(sda) = 287322317(sdb)
but the second one is at a different spot 
276325445(sda) != 284758725(sdb)

it seems unlikely that both disks have exactly the same bad sector 

Maybe a disk is failing, but smart doesn't show a big difference in ECC
errors between the two disks.  Seems wrong that the kernel raid should
first drop one disk, then the other if one disk is really failing.

Is this a bug?

Brad





Re: SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread John Anthony Kazos Jr.
> +  * The behavior for zero sized allocs changes. We no longer
> +  * allocate memory but return ZERO_SIZE_PTR.
> +  * WARN so that people can review and fix their code.

I don't see why people have so much opposition to zero-size memory 
allocations. There's all sorts of situations where you want a resizeable 
array that may have zero objects, especially in these days of 
hotpluggability.

Not only is it simpler (and therefore less likely to be buggy) to write 
code to simply resize to current number of objects, but not having to make 
additional code for checking the special case of count==0 leading to 
different function calls (instead of always reallocating, you might have 
to allocate instead, and instead of reallocating to zero you might have to 
free instead) will very slightly increase the object-text size.

The standard-C behavior of valid zero-size allocation has a very good 
reason.


[PATCH] since the definition of dst_discard_in and dst_discard_out are the same,

2007-06-01 Thread Denis Cheng
they should be merged into one

Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 net/core/dst.c |   17 -
 1 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 764bccb..c6a0587 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -111,13 +111,7 @@ out:
spin_unlock(_lock);
 }
 
-static int dst_discard_in(struct sk_buff *skb)
-{
-   kfree_skb(skb);
-   return 0;
-}
-
-static int dst_discard_out(struct sk_buff *skb)
+static int dst_discard(struct sk_buff *skb)
 {
kfree_skb(skb);
return 0;
@@ -138,8 +132,7 @@ void * dst_alloc(struct dst_ops * ops)
dst->ops = ops;
dst->lastuse = jiffies;
dst->path = dst;
-   dst->input = dst_discard_in;
-   dst->output = dst_discard_out;
+   dst->input = dst->output = dst_discard;
 #if RT_CACHE_DEBUG >= 2
atomic_inc(_total);
 #endif
@@ -153,8 +146,7 @@ static void ___dst_free(struct dst_entry * dst)
   protocol module is unloaded.
 */
if (dst->dev == NULL || !(dst->dev->flags_UP)) {
-   dst->input = dst_discard_in;
-   dst->output = dst_discard_out;
+   dst->input = dst->output = dst_discard;
}
dst->obsolete = 2;
 }
@@ -242,8 +234,7 @@ static inline void dst_ifdown(struct dst_entry *dst, struct 
net_device *dev,
return;
 
if (!unregister) {
-   dst->input = dst_discard_in;
-   dst->output = dst_discard_out;
+   dst->input = dst->output = dst_discard;
} else {
dst->dev = _dev;
dev_hold(_dev);
-- 
1.4.4.2



RE: Dependent CPU core speed reporting not updated with CPUFREQ_SHARED_TYPE_HW?

2007-06-01 Thread Pallipadi, Venkatesh
 

>-Original Message-
>From: Darrick J. Wong [mailto:[EMAIL PROTECTED] 
>Sent: Friday, June 01, 2007 11:44 AM
>To: Pallipadi, Venkatesh
>Cc: linux-kernel@vger.kernel.org
>Subject: Re: Dependent CPU core speed reporting not updated 
>with CPUFREQ_SHARED_TYPE_HW?
>
>On Thu, Mar 29, 2007 at 06:06:22PM -0700, Pallipadi, Venkatesh wrote:
>> thought of
>> making affected CPUs show the dependency in case of hw coord, but
>> retaining the percpu
>> control. But, it seemed complicated change for something that is
>> cosmetic.
>
>Actually, it's not so cosmetic any more.  Our newest servers have a
>power meter that measures power consumption, and I'm writing a program
>to measure the power cost of various cpufreq transitions in order to
>enforce a power cap.  Due to the under-reporting in affected_cpus, the
>app thinks that (taking your example above) CPUs 0 and 2 can be
>controlled independently.  Thus, a p-state transition of (x, x) ->
>(x, x-1) yields no energy saving at all, while (x, x-1) -> (x-1, x-1)
>does.  My program considers the effects of a single CPU's transition
>independently of which CPU it is and without considering what
>frequencies the other CPUs are operating at, which means that it will
>conclude that the cost of increasing speed (or the reward for 
>decreasing
>it) is half of what it is ... sort of.  It's mildly broken as a result,
>though amusingly enough it still seems to work ok.  I suspect that it
>might flail around trying to hit a cap a bit more than it would if
>affected_cpus were more accurate.

Hmmm. How about having a new cpufreq_sysfs entry to say
these CPUs are frequency dependent in hardware.

affected_cpus today has a single cpufreq directory for all affected_cpus
and we coordinate all CPUs in software. To change freq, we will have to
move among all affected_cpus and write an MSR.

Hardware coordination basically tells us that the kernel can control
frequency per CPU, but underneath the hardware will pick the highest
requested frequency among a group of CPUs. Instead of handling this case
like the existing software coordination case above, we can add a new entry
in cpufreq sysfs denoting the hardware-coordinated CPU group.
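
A minimal sketch of that rule, assuming the hardware simply runs the whole
group at the highest frequency any member requested. The function name
effective_khz() and the kHz unit are illustrative choices and are not taken
from the cpufreq code.

```c
#include <stddef.h>

/*
 * Effective frequency of a hardware-coordinated group: the highest
 * frequency requested by any CPU in the group (0 for an empty group).
 */
static unsigned int effective_khz(const unsigned int *requested_khz,
				  size_t ncpus)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < ncpus; i++)
		if (requested_khz[i] > max)
			max = requested_khz[i];
	return max;
}
```

Under this model, lowering only one of two CPUs, (x, x) -> (x, x-1), leaves
the effective frequency unchanged; only (x, x-1) -> (x-1, x-1) lowers it,
which matches the behaviour Darrick reports.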

Though it will be confusing with too many interfaces, I feel this is the
right way to go about here.

Comments? Thoughts?

Thanks,
Venki  


Re: A kexec approach to hibernation

2007-06-01 Thread Jeremy Maitin-Shepard
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

>> But kernel threads also rely on userspace, due to e.g. fuse and usermode
>> helpers.

> Yes, I know that and I think these issues are solvable within the current
> approach.

It seems like it would be very hard to get writing of an image to a
fuse filesystem working under the current scheme.

Trying to image a system while it is running seems fundamentally broken.
As another example, I believe currently although devices are "quiesced"
or stopped while the atomic snapshot is made, they are all then started
again afterward while the image is written to disk.  As a result, the
network drivers will continue acking TCP packets that are received after
the snapshot, but these packets will be lost.

You might claim then that the solution is to simply keep the network
driver quiesced or stopped.  But then it is impossible to write the
image over the network.  The way to get around this problem is to write
the image over the network using a fresh network stack.

>> [snip]
>> 
>> >> > One more thing: How do we restore the system state?
>> >> 
>> >> The "resume kernel" would be loaded at the same address as the "save
>> >> kernel" was loaded (it should probably be the same kernel),
>> 
>> > Well, we'd have to use a relocatable kernel for this purpose, it
>> > seems.
>> 
>> Not necessarily relocatable (although that would be the usual
>> solution).  It just needs to be loaded at a different address than the
>> normal kernel.

> AFAICS, you can't do that with a kernel which is not relocatable (you can load
> it, of course, but will it work then?).

I seem to recall in recent kernel versions support for both a
relocatable kernel and also support for non-relocatable kernels which
load at a non-standard address.

>> If it isn't relocatable, the memory that would be needed by the "save kernel"
>> would have to be reserved at boot. 

> That doesn't seem to be realistic to me.

Okay.  I don't see why there would be a problem with using a relocatable
kernel though.

[snip]

>> >> Presumably it would be most convenient to have the normal boot loader
>> >> load the resume kernel directly at the desired address.  The
>> >> disadvantage is that at the same time the image is written, something
>> >> would have to be done so that the boot loader would know to load the
>> >> resume kernel, rather than the normal kernel.  (E.g. the image writing
>> >> kernel would need to modify the grub config file.)
>> 
>> > No, it can't do that, unless the file is on a 'safe' filesystem
>> 
>> Grub, its configuration, and the kernel used to resume the system had
>> better be on a "safe" filesystem already (i.e. a separate, unmounted
>> before hibernation /boot).

> Currently, you don't need to do that.

Some people get away with it, but fundamentally it is broken to do so.
(The fact that the current software suspend implementations tell the
filesystems to sync to disk increases its chances of working.)  You are
accessing a filesystem that is in an unknown state.  Consider that the
user might make a change to grub.conf, but the kernel caches the write.
If the filesystem containing grub.conf is left mounted, the write might
never reach disk before the system is hibernated.  As a result, when
grub attempts to read it, it doesn't get the expected data.

>> >> This shouldn't be a significant problem in practice.
>> 
>> > I don't agree here.
>> 
>> I think hibernate-script already includes support for modifying grub's
>> configuration.

> Yes.  It does that _before_ the hibernation begins. ;-)

Either way, it doesn't make much difference.  Inside of
hibernate-script, you need logic like:

if /boot is not mounted: mount /boot
make change
umount /boot

If you do it from the "save kernel", you need logic like:
mount /dev/boot-device /boot  (no fstab on "save kernel", most likely)
make change
umount /boot.

[snip]

>> As far as I understand it, the swsusp resume path involves the boot
>> kernel loading the entire image from disk to available memory, then
>> shutting down all the devices, and copying the memory into place, and
>> then jumping to the original kernel, which reinitializes devices and
>> starts tasks running.  This isn't very different from what I was
>> proposing as the alternative anyway, except that: memory is copied once,
>> which is pretty fast, but means that only up to half of the total memory
>> can be saved.

> No that's not correct.  Actually, during the restore we _can_ load much more
> than 50% of RAM, everything needed for that is already in place. :-)

I suppose you do that by using more sophisticated logic to atomically
copy the pages to their final location after loading them from disk.  In
particular, I suppose you must order the page copies carefully to avoid
clobbering pages that have not yet been copied.  Seems reasonable.  In
that case, there is indeed probably no reason to not use that approach
for resuming.

[snip]

>> The whole reason to want to checkpoint filesystems 

Re: tickless timer support on non-x86

2007-06-01 Thread Valdis . Kletnieks
On Fri, 01 Jun 2007 18:20:02 EDT, Andrey Vul said:
> I want to use the tickless timer features in 2.6.21, but
> unfortunately, the dependency for tickless timers is
> GENERIC_CLOCKEVENTS (and tickless is only in arch/i386).
> 
> Any workarounds or solutions for non-x86 people?
> 
> My CPU is AMD Turion ML-34.

If that AMD is running a 32-bit kernel, the code already in the kernel
should work just fine (barring hardware issues like busticated HPET/timer
and so on).  If it doesn't work, we should debug that issue.

If you're running a 64-bit kernel, you might want to check here:

http://www.tglx.de/projects/hrtimers/

The 2.6.22-rc3-hrt2 patch series can be 'quilt'ed onto a -rc3-mm1 tree
with about 20 minutes of work fixing a few trivial rejects (2 or 3 patches
already in -mm, and another 2 or 3 trivial rejects...)




Re: [RFC] tablet buttons driver for fujitsu siemens laptops

2007-06-01 Thread Stephen Hemminger
On Sat, 2 Jun 2007 02:59:33 +0200
Robert Gerlach <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I have written a driver for the tablet buttons of (some?) Fujitsu Siemens 
> tablet notebook. Can someone please review this (I'm a newbie here).
> 
> Other questions: where should the modification button (fn) be handled
> (kernel- or userspace)? This button should work like sticky keys in gnome
> (for one-finger use). Currently, I have a small userspace daemon for this.
> 
> Some models don't have brightness up and down buttons, only a backlight
> on and off button. What event should be reported there?
> 
> Thanks,
> Robert
> 
> 
> #ifdef DEBUG
> #  define debug(m, a...)  printk(KERN_DEBUG MODULENAME ": " m "\n", ##a)
> #else
> #  define debug(m, a...)  do {} while (0)
> #endif
> 
> #define info(m, a...)   printk(KERN_INFO MODULENAME ": " m "\n", ##a)
> #define warn(m, a...)   printk(KERN_WARNING MODULENAME ": " m "\n", ##a)
> #define error(m, a...)  printk(KERN_ERR MODULENAME ": " m "\n", ##a)
>
>

Please don't reinvent
pr_debug
pr_info
pr_warn,...


-- 
Stephen Hemminger <[EMAIL PROTECTED]>



RE: megaraid.c, all kernel versions, problem with multi-luns

2007-06-01 Thread Patro, Sumant

Please check with the server provider if multi-lun is supported with the
adapter you are using.

--Sumant

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Reinaldo
Carvalho
Sent: Wednesday, May 30, 2007 4:10 PM
To: linux-kernel@vger.kernel.org
Subject: megaraid.c, all kernel versions, problem with multi-luns

Hi,

I have a Dell PowerEdge Expandable RAID controller, with a hardware
Raid-5 at Channel 01 running perfectly, and a nCipher Crypter at Channel
02.

This controller doesn't correctly detect devices (e.g. nCipher
Crypter) with multiple LUNs. Only one LUN is detected.

With another controller (e.g. Adaptec 79xx) two LUNs were detected. I
compiled 2.6.8, 2.6.18 and 2.6.21.3 to test the megaraid driver and all
failed to detect two LUNs.

I think that this is a firmware problem, but I'd like to have some
opinions.

I read some docs
(http://www.suse.de/~garloff/linux/scsi-scan/scsi-scanning.html,
http://www.ictp.trieste.it/~radionet/nuc1996/ref/howto-html/scsi-howto-2.html)
and this problem doesn't seem to be simple.

Best regards,

More information with Dell PowerEdge Expandable RAID controller (LSI
Logic MegaRaid):

Attached devices:
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: PE/PV    Model: 1x5 SCSI BP      Rev: 1.0
  Type:   Processor                        ANSI  SCSI revision: 02
Host: scsi0 Channel: 01 Id: 00 Lun: 00
  Vendor: nCipher  Model: Fastness Crypto  Rev: 2*00
  Type:   Processor                        ANSI  SCSI revision: 02
Host: scsi0 Channel: 02 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD 0 RAID5  279G Rev: 522A
  Type:   Direct-Access                    ANSI  SCSI revision: 02


14:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller
4 (rev 06)
Subsystem: Dell PowerEdge Expandable RAID Controller 4e/Di
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping+ SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- SERR- 

14:0e.0 0104: 1028:0013 (rev 06)


Information with Adaptec 79xx or others SCSI controllers:

Host: scsi0 Channel: 01 Id: 00 Lun: 00
  Vendor: nCipher  Model: Fastness Crypto  Rev: 2*00
  Type:   Processor                        ANSI  SCSI revision: 02
Host: scsi0 Channel: 01 Id: 00 Lun: 01
  Vendor: nCipher  Model: Fastness Crypto  Rev: 2*00
  Type:   Processor                        ANSI  SCSI revision: 02


--
Reinaldo Carvalho


RE: LSI MegaRAID problems

2007-06-01 Thread Patro, Sumant

I suspect the errors are coming because of bad disk(s).  
Driver message indicates "reset" completed successfully.

--Sumant

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jules Colding
Sent: Wednesday, May 30, 2007 4:00 AM
To: linux-kernel
Subject: LSI MegaRAID problems

Hi,

I have a "LSI Logic MegaRAID SCSI 320-4x" adapter with an external raid5
array of 5 Seagate ST336754LW and XFS as fs on it. The device in
question is /dev/sdb and the box is a dual Opteron 252.

I've recently started to see this in the log almost whenever I touch the
filesystem:

May 30 12:22:56 omc-2 [ 1120.991356] megaraid: aborting-109150 cmd=28
May 30 12:22:56 omc-2 [ 1120.991366] megaraid abort: 109150:68[255:129], fw owner
May 30 12:22:56 omc-2 [ 1120.991371] megaraid: aborting-109151 cmd=28
May 30 12:22:56 omc-2 [ 1120.991374] megaraid abort: 109151:64[255:129], fw owner
May 30 12:22:56 omc-2 [ 1120.991379] megaraid: 2 outstanding commands. Max wait 300 sec
May 30 12:22:56 omc-2 [ 1120.991382] megaraid mbox: Wait for 2 commands to complete:300
May 30 12:23:01 omc-2 [ 1126.006002] megaraid mbox: Wait for 2 commands to complete:295
May 30 12:23:06 omc-2 [ 1131.020774] megaraid mbox: Wait for 2 commands to complete:290
May 30 12:23:11 omc-2 [ 1136.035548] megaraid mbox: Wait for 2 commands to complete:285
May 30 12:23:16 omc-2 [ 1141.050325] megaraid mbox: Wait for 2 commands to complete:280
May 30 12:23:21 omc-2 [ 1146.065098] megaraid mbox: Wait for 2 commands to complete:275
May 30 12:23:26 omc-2 [ 1151.083870] megaraid mbox: Wait for 0 commands to complete:270
May 30 12:23:26 omc-2 [ 1151.083874] megaraid mbox: reset sequence completed sucessfully
May 30 12:23:26 omc-2 [ 1151.083979] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:26 omc-2 [ 1151.083983] end_request: I/O error, dev sdb, sector 95601663
May 30 12:23:26 omc-2 [ 1151.084124] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:26 omc-2 [ 1151.084128] end_request: I/O error, dev sdb, sector 95601535
May 30 12:23:26 omc-2 [ 1151.084332] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:26 omc-2 [ 1151.084334] end_request: I/O error, dev sdb, sector 95601535
May 30 12:23:27 omc-2 [ 1152.725763] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:27 omc-2 [ 1152.725768] end_request: I/O error, dev sdb, sector 71411967
May 30 12:23:27 omc-2 [ 1152.725816] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:27 omc-2 [ 1152.725818] end_request: I/O error, dev sdb, sector 71411967
May 30 12:23:31 omc-2 [ 1156.578149] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:31 omc-2 [ 1156.578156] end_request: I/O error, dev sdb, sector 143351464
May 30 12:23:31 omc-2 [ 1156.578173] I/O error in filesystem ("sdb1") meta-data dev sdb1 block 0x88b5e69   ("xlog_iodone") error 5 buf count 10752
May 30 12:23:31 omc-2 [ 1156.578178] xfs_force_shutdown(sdb1,0x2) called from line 960 of file fs/xfs/xfs_log.c.  Return address = 0x80398b56
May 30 12:23:31 omc-2 [ 1156.578204] Filesystem "sdb1": Log I/O Error Detected.  Shutting down filesystem: sdb1
May 30 12:23:31 omc-2 [ 1156.578207] Please umount the filesystem, and rectify the problem(s)
May 30 12:23:31 omc-2 [ 1156.578251] sd 0:4:1:0: SCSI error: return code = 0x00040001
May 30 12:23:31 omc-2 [ 1156.578253] end_request: I/O error, dev sdb, sector 63
May 30 12:24:13 omc-2 [ 1198.747915] xfs_force_shutdown(sdb1,0x1) called from line 424 of file fs/xfs/xfs_rw.c.  Return address = 0x803afc2a


One of the drives in the array has been put offline after having seen
media errors. I'm waiting for a replacement but the recurring errors
worry me...

Any help/advice would be greatly appreciated.

Thanks a lot in advance,
  jules


PS: I'm running a distribution kernel, but having seen zero responses on
the gentoo list I dared to write here. The kernel is gentoo-sources
2.6.20-r8.
 



SLUB: Return ZERO_SIZE_PTR for kmalloc(0)

2007-06-01 Thread Christoph Lameter
Instead of returning the smallest available object, return ZERO_SIZE_PTR.

A ZERO_SIZE_PTR can be legitimately used as an object pointer as long
as it is not dereferenced. Dereferencing ZERO_SIZE_PTR causes a
distinctive fault. kfree will handle a ZERO_SIZE_PTR in the same way as
NULL.

This enables functions to transparently use zero-sized objects. For example,
if n = number of objects then the following code snippet will work
whether n = 0 or larger.

	objects = kmalloc(n * sizeof(object), GFP_KERNEL);

	for (i = 0; i < n; i++)
		objects[i].x = y;

	kfree(objects);

In addition to the warning for kmalloc(0) that is already there, this patch
will cause a failure if there is an attempt to access objects in the
zero-sized array.
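
A minimal userspace sketch of the behavior described above, not code from
the patch. Only the ZERO_SIZE_PTR value is taken from the patch;
sketch_alloc(), sketch_free() and fill_objects() are hypothetical stand-ins
built on malloc()/free() so the n = 0 case can be tried outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same sentinel value as the patch; everything else is illustrative. */
#define ZERO_SIZE_PTR ((void *)16)

struct object {
	int x;
};

/* Hypothetical stand-in for kmalloc(): no memory is handed out for size 0. */
static void *sketch_alloc(size_t size)
{
	if (size == 0)
		return ZERO_SIZE_PTR;
	return malloc(size);
}

/* Hypothetical stand-in for kfree(): NULL and ZERO_SIZE_PTR are no-ops. */
static void sketch_free(void *p)
{
	if ((uintptr_t)p <= (uintptr_t)ZERO_SIZE_PTR)
		return;
	free(p);
}

/* Fill n objects; returns the count written, or -1 on allocation failure. */
static int fill_objects(size_t n, int y)
{
	struct object *objects = sketch_alloc(n * sizeof(*objects));
	size_t i;

	if (!objects)
		return -1;	/* genuine failure; not taken for n == 0 */

	for (i = 0; i < n; i++)	/* body never runs when n == 0 */
		objects[i].x = y;

	sketch_free(objects);
	return (int)n;
}
```

The caller needs no special case for n = 0: the loop body never touches the
sentinel, and the free path ignores it.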

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |   29 ++---
 mm/slub.c|   10 +-
 2 files changed, 27 insertions(+), 12 deletions(-)

Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h  2007-06-01 18:08:32.0 -0700
+++ slub/include/linux/slub_def.h   2007-06-01 18:22:56.0 -0700
@@ -74,14 +74,17 @@ extern struct kmem_cache kmalloc_caches[
  */
 static inline int kmalloc_index(size_t size)
 {
+
/*
-* We should return 0 if size == 0 (which would result in the
-* kmalloc caller to get NULL) but we use the smallest object
-* here for legacy reasons. Just issue a warning so that
-* we can discover locations where we do 0 sized allocations.
+* The behavior for zero sized allocs changes. We no longer
+* allocate memory but return ZERO_SIZE_PTR.
+* WARN so that people can review and fix their code.
 */
WARN_ON_ONCE(size == 0);
 
+   if (!size)
+   return 0;
+
if (size > KMALLOC_MAX_SIZE)
return -1;
 
@@ -127,13 +130,25 @@ static inline struct kmem_cache *kmalloc
 #define SLUB_DMA 0
 #endif
 
+
+/*
+ * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
+ *
+ * Dereferencing ZERO_SIZE_PTR will lead to a distinct access fault.
+ *
+ * ZERO_SIZE_PTR can be passed to kfree() in the same way that NULL can.
+ * Both make kfree() a no-op.
+ */
+#define ZERO_SIZE_PTR ((void *)16)
+
+
 static inline void *kmalloc(size_t size, gfp_t flags)
 {
if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return ZERO_SIZE_PTR;
 
return kmem_cache_alloc(s, flags);
} else
@@ -146,7 +161,7 @@ static inline void *kzalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return ZERO_SIZE_PTR;
 
return kmem_cache_zalloc(s, flags);
} else
@@ -162,7 +177,7 @@ static inline void *kmalloc_node(size_t 
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return ZERO_SIZE_PTR;
 
return kmem_cache_alloc_node(s, flags, node);
} else
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-06-01 18:08:32.0 -0700
+++ slub/mm/slub.c  2007-06-01 18:22:56.0 -0700
@@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags
 
if (s)
return slab_alloc(s, flags, -1, __builtin_return_address(0));
-   return NULL;
+   return ZERO_SIZE_PTR;
 }
 EXPORT_SYMBOL(__kmalloc);
 
@@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t 
 
if (s)
return slab_alloc(s, flags, node, __builtin_return_address(0));
-   return NULL;
+   return ZERO_SIZE_PTR;
 }
 EXPORT_SYMBOL(__kmalloc_node);
 #endif
@@ -2338,7 +2338,7 @@ void kfree(const void *x)
struct kmem_cache *s;
struct page *page;
 
-   if (!x)
+   if (x <= ZERO_SIZE_PTR)
return;
 
page = virt_to_head_page(x);
@@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size
struct kmem_cache *s = get_slab(size, gfpflags);
 
if (!s)
-   return NULL;
+   return ZERO_SIZE_PTR;
 
return slab_alloc(s, gfpflags, -1, caller);
 }
@@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t
struct kmem_cache *s = get_slab(size, gfpflags);
 
if (!s)
-   return NULL;
+   return ZERO_SIZE_PTR;
 
return slab_alloc(s, gfpflags, node, caller);
 }

Re: [patch 1/1] document Acked-by:

2007-06-01 Thread Valdis . Kletnieks
On Sat, 02 Jun 2007 00:10:46 +0200, Krzysztof Halasa said:
> "Scott Preece" <[EMAIL PROTECTED]> writes:
> 
> > This is a question worth answering - is it rude to ack/nak a patch if
> > you're not a maintainer or otherwise known-to-be-trusted, or is it OK
> > for anyone to express an opinion? Andrew's patch text seems to imply
> > that it's generally OK.
> 
> Every pair of eyes (or a single one) looking at the patch in question
> is a good thing. I can't imagine why would one want to look at the
> code if he/she can't ack or nak or otherwise comment it.

I'd be the *first* to admit that my kernel-foo isn't perfect, and sometimes I'm
right and sometimes I'm wrong when I review somebody else's code.  I certainly
*hope* that nobody's taking my review as anything more authoritative than "an
actual maintainer might want to look at this".

On the other hand, we don't need a Foo-By: tag for "or otherwise comment".

Phrased differently, if I haven't stuck a "Signed-off-by:" or "Tested-By:"
on it, I'm by default only commenting.  The code submitter can decide I'm right
and fix and resubmit, the maintainer can decide I'm right and toss a NAK. Or
they can both decide I'm full of it and hit the Delete key..




Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, [EMAIL PROTECTED] wrote:

> On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said:
> 
> > #define BADPTR ((void *)16)
> 
> > I bet you'd find *more* problems that way than by returning NULL, and 
> > you'd also avoid the whole problem with "if (!ptr) return -ENOMEM".
> 
> Hmm.. this looks like a good contender for "first usage of #ifndef 
> CONFIG_STABLE"

The warning? Sure but the BADPTR (or ZERO_SIZE_PTR now) will stay. 
It is definitely an error to dereference an object that has no size.



Re: Extending boot protocol & bzImage for paravirt_ops

2007-06-01 Thread Eric W. Biederman
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

> H. Peter Anvin wrote:
>> It would have to, because of the way code32_start is defined to work.
>> We don't get control again after its use as a hook.
>>   
>
> Who uses that hook?  The impression I get is that the execution
> environment for jumping to that pointer is not very well defined at present.

loadlin.

Eric


Re: Intel's response Linux/MTRR/8GB Memory Support / Why doesn't the kernel realize the BIOS has problems and re-map appropriately?

2007-06-01 Thread Jesse Barnes
On Friday, June 1, 2007 6:05:39 Venki Pallipadi wrote:
> On Fri, Jun 01, 2007 at 02:41:57PM -0700, Jesse Barnes wrote:
> > On Friday, June 1, 2007 2:19:43 Andi Kleen wrote:
> > > And normally the MTRRs win, don't they (if I remember the table
> > > correctly) So if the MTRR says UC and PAT disagrees it might not
> > > actually help
> >
> > I just checked, yes the MTRRs win for UC types.  But it sounds like the
> > cases we're talking about are actually situations where there's no MTRR
> > coverage, so the default type is used.  The manual doesn't specifically
> > call out how memory using the default type interacts with PAT, but it may
> > well be that it stays uncached if the default type is uncached.  Again
> > that argues for fixing the MTRR mapping problem in some way.
>
> I feel that having a silent/transparent workaround is not a good idea.
> With that, chances are the BIOS bug will go unnoticed (an error message
> in dmesg may not get noticed either). We should probably just panic at
> boot with a detailed message about the e820/MTRR discrepancy (which can
> be logged as a bug with the BIOS provider) and suggest a temporary
> workaround of "mem=___".

That might be best, short of actually fixing the MTRRs...

Jesse




Re: Intel's response Linux/MTRR/8GB Memory Support / Why doesn't the kernel realize the BIOS has problems and re-map appropriately?

2007-06-01 Thread Venki Pallipadi
On Fri, Jun 01, 2007 at 02:41:57PM -0700, Jesse Barnes wrote:
> On Friday, June 1, 2007 2:19:43 Andi Kleen wrote:
> > And normally the MTRRs win, don't they (if I remember the table correctly)
> > So if the MTRR says UC and PAT disagrees it might not actually help
> 
> I just checked, yes the MTRRs win for UC types.  But it sounds like the cases 
> we're talking about are actually situations where there's no MTRR coverage, 
> so the default type is used.  The manual doesn't specifically call out how 
> memory using the default type interacts with PAT, but it may well be that it 
> stays uncached if the default type is uncached.  Again that argues for fixing 
> the MTRR mapping problem in some way.
> 

I feel that having a silent/transparent workaround is not a good idea. With
that, chances are the BIOS bug will go unnoticed (an error message in dmesg
may not get noticed either). We should probably just panic at boot with a
detailed message about the e820/MTRR discrepancy (which can be logged as
a bug with the BIOS provider) and suggest a temporary workaround of "mem=___".
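
A rough sketch of how such a discrepancy could be detected, under heavy
simplification: RAM regions (as e820 would report them) and cacheable MTRR
coverage are both modeled as plain address ranges, whereas real MTRRs are
base/mask registers with memory types. struct range, first_uncovered() and
example_gap() are hypothetical names, and the addresses are made up.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Return the lowest RAM address that no write-back range covers, or
 * UINT64_MAX if all RAM is covered. Both arrays are assumed to be
 * sorted by start address and non-overlapping.
 */
static uint64_t first_uncovered(const struct range *ram, size_t nram,
				const struct range *wb, size_t nwb)
{
	size_t i, j;

	for (i = 0; i < nram; i++) {
		uint64_t pos = ram[i].start;

		for (j = 0; j < nwb && pos < ram[i].end; j++) {
			if (wb[j].end <= pos)
				continue;	/* range lies below pos */
			if (wb[j].start > pos)
				break;		/* a gap starts at pos */
			pos = wb[j].end;	/* covered up to here */
		}
		if (pos < ram[i].end)
			return pos;
	}
	return UINT64_MAX;
}

/* Made-up machine: RAM up to 0x230000000, coverage up to wb_end. */
static uint64_t example_gap(uint64_t wb_end)
{
	const struct range ram[] = {
		{ 0x0, 0xA0000 },
		{ 0x100000, 0x230000000ULL },
	};
	const struct range wb[] = {
		{ 0x0, wb_end },
	};

	return first_uncovered(ram, 2, wb, 1);
}
```

The returned address is the point at which a boot-time message could tell
the user where coverage ends, and therefore what value to pass to "mem=".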

Thanks,
Venki


Re: Extending boot protocol & bzImage for paravirt_ops

2007-06-01 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> It would have to, because of the way code32_start is defined to work.
> We don't get control again after its use as a hook.
>   

Who uses that hook?  The impression I get is that the execution
environment for jumping to that pointer is not very well defined at present.

J


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Valdis . Kletnieks
On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said:

>   #define BADPTR ((void *)16)

> I bet you'd find *more* problems that way than by returning NULL, and 
> you'd also avoid the whole problem with "if (!ptr) return -ENOMEM".

Hmm.. this looks like a good contender for "first usage of #ifndef 
CONFIG_STABLE"

:)




Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Christoph Lameter wrote:
>
> On Fri, 1 Jun 2007, Andrew Morton wrote:
> 
> > I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there,
> 
> The trouble with the WARN_ON is that it triggers even for code that is 
> okay like noted by Jeremy.

Yes. Sometimes it's just more natural to have

	ptr = kmalloc(size);
	.. use it ..
	free(ptr);

and if a *degenerate* case of size=0 happens, who cares? It should just 
work, as long as we (obviously) don't actually try to access the pointer.

So I don't much like the WARN_ON(size == 0). I think it potentially just 
causes people to write around it, and quite possibly causes the callers to 
write code that is not at all more readable or maintainable!

That's why I'd much rather return BADPTR instead: we'll get an oops for 
buggy code, but we don't penalize the "natural" and good code! So once you 
return BADPTR, there really isn't any good reason for the WARN_ON.
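
To make the "if (!ptr) return -ENOMEM" point concrete, here is a small
userspace sketch, not kernel code. Only the BADPTR value comes from this
thread; alloc_policy(), free_policy() and copy_items() are hypothetical, and
the use_badptr flag exists only to compare the two return conventions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BADPTR ((void *)16)

/* Hypothetical allocator: size 0 yields NULL or BADPTR, per the flag. */
static void *alloc_policy(size_t size, int use_badptr)
{
	if (size == 0)
		return use_badptr ? BADPTR : NULL;
	return malloc(size);
}

/* Hypothetical free: both NULL and BADPTR are ignored. */
static void free_policy(void *p)
{
	if ((uintptr_t)p <= (uintptr_t)BADPTR)
		return;
	free(p);
}

/* The "natural" caller: allocate, use, free, with no size == 0 check. */
static int copy_items(const int *src, size_t n, int use_badptr)
{
	int *ptr = alloc_policy(n * sizeof(*ptr), use_badptr);
	size_t i;

	if (!ptr)
		return -ENOMEM;	/* misfires for n == 0 under the NULL policy */

	for (i = 0; i < n; i++)
		ptr[i] = src[i];

	free_policy(ptr);
	return 0;
}
```

With the NULL convention the degenerate n = 0 call reports a bogus
out-of-memory error; with BADPTR the same caller succeeds unchanged.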

Linus


Re: Extending boot protocol & bzImage for paravirt_ops

2007-06-01 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> Just to clarify:
> 
> My proposal is that we have bzImage structured something like (where
> "|" is concatenation, and "()" is a blob containing stuff):
> 
> bzImage = 16-bit setup | ELF file (decompressor, compressed kernel)
>   
> 
> With the intention that a 32-bit-only bootloader always loads the ELF file
> as-is and just runs it.  Aside from the fact that it's an ELF file,
> there's nothing else about it which really concerns the bootloader,
> since once it's loaded and running, it does all its own setup.  It's not
> clear that code32_start really means much in this case, though I guess
> it could point to the same place as the ELF file's entrypoint.
> 

It would have to, because of the way code32_start is defined to work.
We don't get control again after its use as a hook.

> Whereas you're proposing:
> 
> bzImage = 16-bit setup | decompressor | compressed kernel (ELF file)
>   
> 
> where code32_start points to the decompressor, and some other pointer
> points to the compressed kernel data.  And your intent is that an
> external bootloader could also interpret the compressed kernel image,
> and identify what format it's in and handle it appropriately from
> outside.  Right?

Correct.

> In both cases, it seems to me that we need an extra boot_param pointer
> to point to the offset of the payload blob (ELF file in my case,
> compressed kernel in yours).  Yes?

Indeed.

-hpa



[RFC] tablet buttons driver for fujitsu siemens laptops

2007-06-01 Thread Robert Gerlach
Hi,

I have written a driver for the tablet buttons of (some?) Fujitsu Siemens
tablet notebooks. Can someone please review this? (I'm a newbie here.)

Another question: where should the modifier button (Fn) be handled (kernel or
userspace)? This button should work like sticky keys in GNOME (for
one-finger use). Currently, I have a small userspace daemon for this.

Some models don't have brightness up and down buttons, only a backlight on/off
button. What event should be reported there?

Thanks,
Robert

---
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MODULENAME "fsc_btns"
#define MODULEDESC "Fujitsu Siemens Application Panel Driver for T-Series Lifebooks"
#define MODULEVERS "0.30a"

struct keymap_entry {   /* keymap_entry */
unsigned int mask;
unsigned int code;
};

/* TODO: no brightness/backlight toggle ? */
#define KEY_BACKLIGHTTOGGLE 241

static struct keymap_entry keymap_t4010[] = {
{ 0x0010, KEY_SCROLLDOWN },
{ 0x0020, KEY_SCROLLUP },
{ 0x0040, KEY_DIRECTION },
{ 0x0080, KEY_FN },
{ 0x0100, KEY_BRIGHTNESSUP },
{ 0x0200, KEY_BRIGHTNESSDOWN },
{ 0x0400, KEY_BACKLIGHTTOGGLE },
{ 0x8000, KEY_MENU },
{ 0x0000, 0 },	/* terminator: loops stop at mask == 0 */
};

#define default_keymap keymap_t4010

static struct fscbtns_t {   /* fscbtns_t */
unsigned int interrupt;
unsigned int address;

struct keymap_entry *keymap;
int display_direction;

struct platform_device *pdev;

#ifdef CONFIG_ACPI
struct acpi_device *adev;
#endif

struct input_dev *idev;
char idev_phys[16];
} fscbtns = {

#ifndef CONFIG_ACPI
/* XXX: is this always true ??? */
.interrupt = 5,
.address = 0xfd70,
#endif

.keymap = default_keymap
};

static unsigned int repeat_rate = 16;
static unsigned int repeat_delay = 500;


#ifdef DEBUG
#  define debug(m, a...)	printk(KERN_DEBUG MODULENAME ": " m "\n", ##a)
#else
#  define debug(m, a...)	do {} while (0)
#endif

#define info(m, a...)	printk(KERN_INFO MODULENAME ": " m "\n", ##a)
#define warn(m, a...)	printk(KERN_WARNING MODULENAME ": " m "\n", ##a)
#define error(m, a...)	printk(KERN_ERR MODULENAME ": " m "\n", ##a)


/*** INPUT ***/

static int input_fscbtns_setup(void)
{
struct keymap_entry *key;
struct input_dev *idev;
int error;

snprintf(fscbtns.idev_phys, sizeof(fscbtns.idev_phys),
"%s/input0", 
#ifdef CONFIG_ACPI
acpi_device_hid(fscbtns.adev)
#else
MODULENAME
#endif
);

fscbtns.idev = idev = input_allocate_device();
if(!idev)
return -ENOMEM;

idev->phys = fscbtns.idev_phys;
idev->name = MODULEDESC;
idev->id.bustype = BUS_HOST;
idev->id.vendor  = 0x1734;  /* "Fujitsu Siemens Computer GmbH" from pci.ids */
idev->id.product = 0x0001;
idev->id.version = 0x0101;
idev->cdev.dev = &(fscbtns.pdev->dev);

set_bit(EV_REP, idev->evbit);
set_bit(EV_KEY, idev->evbit);
for(key = fscbtns.keymap; key->mask; key++)
set_bit(key->code, idev->keybit);

set_bit(EV_SW, idev->evbit);
set_bit(SW_TABLET_MODE, idev->swbit);

error = input_register_device(idev);
if(error) {
input_free_device(idev);
return error;
}

return 0;
}

static void input_fscbtns_remove(void)
{
input_unregister_device(fscbtns.idev);
}

static void fscbtns_set_repeat_rate(int delay, int period)
{
fscbtns.idev->rep[REP_DELAY]  = delay;
fscbtns.idev->rep[REP_PERIOD] = period;
}

static void fscbtns_event(void)
{
u8 i;
unsigned int keymask;
unsigned int changed;
static unsigned int prev_keymask = 0;
struct keymap_entry *key;

outb(0xdd, fscbtns.address);
i = inb(fscbtns.address+4) ^ 0xff;
if(i != fscbtns.display_direction) {
debug("display_direction change (%d)", i);
fscbtns.display_direction = i;
input_report_switch(fscbtns.idev, SW_TABLET_MODE, i);
}

outb(0xde, fscbtns.address);
keymask = inb(fscbtns.address+4) ^ 0xff;
outb(0xdf, fscbtns.address);
keymask |= (inb(fscbtns.address+4) ^ 0xff) << 8;

changed = keymask ^ prev_keymask;
debug("keymask: 0x%04x (0x%04x)", keymask, changed);

if(changed) {
for(key = fscbtns.keymap; key->mask; key++)
if(key->mask == changed) {
debug("send %d %s", key->code, (keymask & 
changed ? "pressed" : "released"));
input_report_key(fscbtns.idev, key->code, 

Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> In some cases, sure, and we should remove the warning at some stage.
>
> But it has exposed a couple of bugs and a few weird things.
>   

We just need to scatter better bug powder into the corners of the
allocations.

J


Re: Extending boot protocol & bzImage for paravirt_ops

2007-06-01 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> That's a method of defining the memory space.
>
> I think the current definition is suitable for entering at the 16-bit
> entry point.

I agree.  I'm going to assume that if we're booting all the way up from
real mode, we're either on real hardware, or some environment that's
trying hard to be real hardware.  In that case, there won't necessarily
be much need for the subarch-specific data, but even if there is, it can
be way out of the real-mode address space, and therefore be a non-issue
for 16-bit code.

Just to clarify:

My proposal is that we have bzImage structured something like (where
"|" is concatenation, and "()" is a blob containing stuff):

bzImage = 16-bit setup | ELF file (decompressor, compressed kernel)
  

With the intention that a 32-bit-only bootloader always loads the ELF file
as-is and just runs it.  Aside from the fact that it's an ELF file,
there's nothing else about it which really concerns the bootloader,
since once it's loaded and running, it does all its own setup.  It's not
clear that code32_start really means much in this case, though I guess
it could point to the same place as the ELF file's entrypoint.

Whereas you're proposing:

bzImage = 16-bit setup | decompressor | compressed kernel (ELF file)
  

where code32_start points to the decompressor, and some other pointer
points to the compressed kernel data.  And your intent is that an
external bootloader could also interpret the compressed kernel image,
and identify what format it's in and handle it appropriately from
outside.  Right?

In both cases, it seems to me that we need an extra boot_param pointer
to point to the offset of the payload blob (ELF file in my case,
compressed kernel in yours).  Yes?
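
To illustrate what such an extra pointer would let a bootloader do, here is
a sketch under stated assumptions. struct sketch_boot_hdr and its
payload_offset/payload_length fields are hypothetical names for this
example, not the boot protocol as defined at the time of this thread; only
the 4-byte ELF magic is standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header fields; names chosen for this example only. */
struct sketch_boot_hdr {
	uint32_t code32_start;		/* existing 32-bit entry hook */
	uint32_t payload_offset;	/* assumed: payload offset in image */
	uint32_t payload_length;	/* assumed: payload size in bytes */
};

static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/* Return a pointer to the payload, or NULL if it falls outside the image. */
static const unsigned char *find_payload(const unsigned char *image,
					 size_t image_len,
					 const struct sketch_boot_hdr *hdr)
{
	if (hdr->payload_offset > image_len ||
	    hdr->payload_length > image_len - hdr->payload_offset)
		return NULL;
	return image + hdr->payload_offset;
}

/* Non-zero if the payload starts with the ELF magic bytes. */
static int payload_is_elf(const unsigned char *payload, size_t len)
{
	return len >= sizeof(elf_magic) &&
	       memcmp(payload, elf_magic, sizeof(elf_magic)) == 0;
}

/* 1: ELF payload found, 0: payload is not ELF, -1: out of bounds. */
static int example_payload_check(uint32_t offset)
{
	unsigned char image[16] = { 0 };
	struct sketch_boot_hdr hdr = { 0, offset, 8 };
	const unsigned char *p;

	memcpy(image + 8, elf_magic, sizeof(elf_magic));
	p = find_payload(image, sizeof(image), &hdr);
	if (!p)
		return -1;
	return payload_is_elf(p, hdr.payload_length);
}
```

Either layout can use the same lookup; what differs is whether the bytes at
the payload offset are an ELF file directly or a compressed stream that
contains one.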

J


Re: [patch 1/1] document Acked-by:

2007-06-01 Thread Krzysztof Halasa
"John Anthony Kazos Jr." <[EMAIL PROTECTED]> writes:

> "Acked-by:" does not mean "I like this" but rather "I approve of this".

I'd say it means "I acknowledge it". If you want to express
approval, why not use some sort of "Approved-by"?

> If I put "Acked-by: John..." on a patch of any kind, even trivial, it 
> would look incredibly stupid, because I'm just some guy messing around 
> with the kernel. A tactful response to me doing that from any actual 
> kernel bigwig would be, "I appreciate your enthusiasm, but you are not 
> part of the kernel patch flow." Similarly, a tactful response to me 
> NACKing a patch would be, "I appreciate your concern, but you are in no 
> position to remove a patch from the stream. Your comments will be 
> considered and implemented or countered by an actual maintainer."
>
> This is appropriate.

You seem to know these things very well.
-- 
Krzysztof Halasa


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Andrew Morton
On Fri, 01 Jun 2007 17:43:46 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > As I said, it's specific to the kmalloc(0) problem, and we're fixing that
> > by other means anyway.
> >   
> 
> I guess I'm not getting much traction in my campaign for equal rights
> for zero-sized allocations. They're perfectly reasonable things to have,
> if that's the way the code falls out. The warning is just prejudice.
> 

In some cases, sure, and we should remove the warning at some stage.

But it has exposed a couple of bugs and a few weird things.


[RFC][PATCH] Replacing the /proc//exe symlink code

2007-06-01 Thread Matt Helsley
This patch avoids holding the mmap semaphore while walking VMAs in response to
programs which read or follow the /proc//exe symlink. This also allows
us to merge mmu and nommu proc_exe_link() functions. The costs are holding the
task lock, a separate reference to the executable file stored in the task
struct, and increased code in fork, exec, and exit paths.

Changes:
Clear exe_file field in exit path
Use task_lock() to protect exe_file between write and read paths
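
The locking pattern described above can be sketched in userspace as follows.
This is an analogy only: a pthread mutex plays the role of task_lock(), an
atomic counter stands in for the file reference count, and every sketch_*
name is hypothetical. Unlike the patch, the reader here takes a reference on
the file itself rather than on its mount and dentry, and the old reference
is dropped after unlocking.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_file {
	atomic_int refcount;
};

struct sketch_task {
	pthread_mutex_t lock;		/* plays the role of task_lock() */
	struct sketch_file *exe_file;	/* owns one reference when non-NULL */
};

static void sketch_fput(struct sketch_file *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/* exec path: install newf (adopting the caller's reference), drop the old. */
static void sketch_exec(struct sketch_task *t, struct sketch_file *newf)
{
	struct sketch_file *old;

	pthread_mutex_lock(&t->lock);
	old = t->exe_file;
	t->exe_file = newf;
	pthread_mutex_unlock(&t->lock);

	if (old)
		sketch_fput(old);
}

/* exit path: clear the field so later readers see "no executable". */
static void sketch_exit(struct sketch_task *t)
{
	sketch_exec(t, NULL);
}

/* read path: returns a new reference, or NULL (the -ENOENT case). */
static struct sketch_file *sketch_get_exe(struct sketch_task *t)
{
	struct sketch_file *f;

	pthread_mutex_lock(&t->lock);
	f = t->exe_file;
	if (f)
		atomic_fetch_add(&f->refcount, 1);
	pthread_mutex_unlock(&t->lock);
	return f;
}

/* Walk one exec/read/exit cycle; returns 1 if every step behaved. */
static int example_lifecycle(void)
{
	struct sketch_file file;
	struct sketch_task task = { PTHREAD_MUTEX_INITIALIZER, NULL };
	struct sketch_file *ref;
	int ok = 1;

	atomic_init(&file.refcount, 1);
	sketch_exec(&task, &file);

	ref = sketch_get_exe(&task);
	ok &= (ref == &file && atomic_load(&file.refcount) == 2);
	sketch_fput(ref);

	sketch_exit(&task);
	ok &= (atomic_load(&file.refcount) == 0);
	ok &= (sketch_get_exe(&task) == NULL);
	return ok;
}
```

Because the reader tests and pins exe_file under the same lock that the exit
path uses to clear it, a reader sees either a valid file or NULL, never a
pointer whose last reference has already been dropped.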

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
---

 fs/exec.c |7 +--
 fs/proc/base.c|   21 +
 fs/proc/internal.h|1 -
 fs/proc/task_mmu.c|   34 --
 fs/proc/task_nommu.c  |   34 --
 include/linux/sched.h |3 ++-
 kernel/exit.c |6 ++
 kernel/fork.c |9 -
 8 files changed, 42 insertions(+), 73 deletions(-)

Index: linux-2.6.22-rc2-mm1/include/linux/sched.h
===
--- linux-2.6.22-rc2-mm1.orig/include/linux/sched.h
+++ linux-2.6.22-rc2-mm1/include/linux/sched.h
@@ -988,10 +988,11 @@ struct task_struct {
int oomkilladj; /* OOM kill score adjustment (bit shift). */
char comm[TASK_COMM_LEN]; /* executable name excluding path
 - access with [gs]et_task_comm (which lock
   it with task_lock())
 - initialized normally by flush_old_exec */
+   struct file *exe_file;
 /* file system info */
int link_count, total_link_count;
 #ifdef CONFIG_SYSVIPC
 /* ipc stuff */
struct sysv_sem sysvsem;
@@ -1549,11 +1550,11 @@ static inline int thread_group_empty(str
 
 #define delay_group_leader(p) \
(thread_group_leader(p) && !thread_group_empty(p))
 
 /*
- * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
+ * Protects ->fs, ->files, ->mm, ->group_info, ->comm, ->exe_file, keyring
  * subscriptions and synchronises with wait4().  Also used in procfs.  Also
  * pins the final release of task.io_context.  Also protects ->cpuset.
  *
  * Nests both inside and outside of read_lock(_lock).
  * It must not be nested with write_lock_irq(_lock),
Index: linux-2.6.22-rc2-mm1/fs/exec.c
===
--- linux-2.6.22-rc2-mm1.orig/fs/exec.c
+++ linux-2.6.22-rc2-mm1/fs/exec.c
@@ -1106,12 +1106,15 @@ int search_binary_handler(struct linux_b
read_unlock(_lock);
retval = fn(bprm, regs);
if (retval >= 0) {
put_binfmt(fmt);
allow_write_access(bprm->file);
-   if (bprm->file)
-   fput(bprm->file);
+   task_lock(current);
+   if (current->exe_file)
+   fput(current->exe_file);
+   current->exe_file = bprm->file;
+   task_unlock(current);
bprm->file = NULL;
current->did_exec = 1;
proc_exec_connector(current);
return retval;
}
Index: linux-2.6.22-rc2-mm1/fs/proc/base.c
===
--- linux-2.6.22-rc2-mm1.orig/fs/proc/base.c
+++ linux-2.6.22-rc2-mm1/fs/proc/base.c
@@ -951,10 +951,31 @@ const struct file_operations proc_pid_sc
.write  = sched_write,
.llseek = seq_lseek,
.release= seq_release,
 };
 
+static int proc_exe_link(struct inode *inode, struct dentry **dentry,
+struct vfsmount **mnt)
+{
+   int error = -ENOENT;
+   struct task_struct *task;
+
+   task = get_proc_task(inode);
+   if (!task)
+   return error;
+   task_lock(task);
+   if (!task->exe_file)
+   goto out;
+   *mnt = mntget(task->exe_file->f_path.mnt);
+   *dentry = dget(task->exe_file->f_path.dentry);
+   error = 0;
+out:
+   task_unlock(task);
+   put_task_struct(task);
+   return error;
+}
+
 static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct inode *inode = dentry->d_inode;
int error = -EACCES;
 
Index: linux-2.6.22-rc2-mm1/kernel/exit.c
===
--- linux-2.6.22-rc2-mm1.orig/kernel/exit.c
+++ linux-2.6.22-rc2-mm1/kernel/exit.c
@@ -924,10 +924,16 @@ fastcall void do_exit(long code)
if (unlikely(tsk->audit_context))
audit_free(tsk);
 
taskstats_exit(tsk, group_dead);
 
+   task_lock(tsk);
+   if (tsk->exe_file) {
+   

Re: [PATCH 2.6.21] cramfs: add cramfs Linear XIP

2007-06-01 Thread Jared Hulbert

> The current xip stack relies on having struct page behind the memory
> segment. This causes few impact on memory management, but occupies some
> more memory. The cramfs patch chose to modify copy on write in order to
> deal with vmas that don't have struct page behind.
> So far, Hugh and Linus have shown strong opposition against copy on
> write with no struct page behind. If this implementation is acceptable
> to the them, it seems preferable to me over wasting memory. The xip
> stack should be modified to use this vma flag in that case.

I would rather not :P

We can copy on write without a struct page behind the source today, no?


The existing COW techniques fail on some corner cases.  I'm not up to
speed on the vm code.  I'll try to look into this a little more but it
might be useful if I knew what questions I need to answer so you vm
experts can understand the problem.

Let me give one example.  If you try to debug an XIP application
without this patch, bad things happen.  XIP in this sense is synonymous
with executing directly out of Flash, and in Flash you can't just change
the physical memory to redirect it to the debugger so easily.
Now I don't know exactly why yet, but some applications (not all)
trigger this added vm hack.  I'm not sure exactly why it would get
triggered under normal circumstances.  Why would a read-only map get
written to?


What is insufficient for the XIP code with the current COW?


So I think the problem may have something to do with the nature of the
memory in question.   We are using Flash that is ioremap()'ed to a
usable virtual address.  And yet we go on to try to use it as if it
were plain old system memory, like any RAM page.  We need it to be
presented like any other memory page, only physically read-only.
ioremap() seems to be a hacky way of accomplishing that, but I can't
think of a better way.  On ARM we even had to invent ioremap_cached() to
improve performance.  Thoughts?


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Jeremy Fitzhardinge
Linus Torvalds wrote:
> So for *both* of the above reasons, it's actually stupid to return NULL 
> for a zero-sized allocation. It would be much better to return another 
> pointer that will trap on access. A good candidate might be to return
>
>   #define BADPTR ((void *)16)
>   

I think this is a good idea in principle, but I wonder if there's any
code which assumes that kmalloc(x) != kmalloc(x) for all non-NULL
returns from kmalloc.

J


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> As I said, it's specific to the kmalloc(0) problem, and we're fixing that
> by other means anyway.
>   

I guess I'm not getting much traction in my campaign for equal rights
for zero-sized allocations. They're perfectly reasonable things to have,
if that's the way the code falls out. The warning is just prejudice.

J



Re: [RFC][PATCH] Replacing the /proc//exe symlink code

2007-06-01 Thread Matt Helsley
On Fri, 2007-06-01 at 17:31 -0500, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([EMAIL PROTECTED]):
> > On Wed, 2007-05-30 at 13:09 -0500, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley ([EMAIL PROTECTED]):
> > > > This patch avoids holding the mmap semaphore while walking VMAs in 
> > > > response to
> > > > programs which read or follow the /proc//exe symlink. This 
> > > > also allows us
> > > > to merge mmu and nommu proc_exe_link() functions. The costs are holding 
> > > > a separate
> > > > reference to the executable file stored in the task struct and 
> > > > increased code in
> > > > fork, exec, and exit paths.
> > > > 
> > > > Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
> > > > ---
> > > > 
> > > > Compiled and passed simple tests for regressions when patched against a 
> > > > 2.6.20
> > > > and 2.6.22-rc2-mm1 kernel.
> > > > 
> > > >  fs/exec.c |5 +++--
> > > >  fs/proc/base.c|   20 
> > > >  fs/proc/internal.h|1 -
> > > >  fs/proc/task_mmu.c|   34 --
> > > >  fs/proc/task_nommu.c  |   34 --
> > > >  include/linux/sched.h |1 +
> > > >  kernel/exit.c |2 ++
> > > >  kernel/fork.c |   10 +-
> > > >  8 files changed, 35 insertions(+), 72 deletions(-)
> > 
> > 
> > 
> > > > Index: linux-2.6.22-rc2-mm1/kernel/exit.c
> > > > ===
> > > > --- linux-2.6.22-rc2-mm1.orig/kernel/exit.c
> > > > +++ linux-2.6.22-rc2-mm1/kernel/exit.c
> > > > @@ -924,10 +924,12 @@ fastcall void do_exit(long code)
> > > > if (unlikely(tsk->audit_context))
> > > > audit_free(tsk);
> > > >  
> > > > taskstats_exit(tsk, group_dead);
> > > >  
> > > > +   if (tsk->exe_file)
> > > > +   fput(tsk->exe_file);
> > > 
> > > Hi,
> > > 
> > > just taking a cursory look so I may be missing something, but doesn't
> > > this leave the possibility that right here, with tsk->exe_file being
> > > put, another task would try to look at tsk's /proc/tsk->pid/exe?
> > > 
> > > thanks,
> > > -serge
> > >
> > >   exit_mm(tsk);
> > >
> >   
> > 
> > 
> > Good question. To be precise, I think the problem doesn't exist here but
> > after the exit_mm() because there's a VMA that holds a reference to the
> > same file.
> > 
> > The existing code appears to solve the race between
> > reading/following /proc/tsk->pid/exe and exit_mm() in the exit path by
> > returning -ENOENT for the case where there is no executable VMA with a
> > reference to the file backing it.
> > 
> > So I need to put NULL in the exe_file field and adjust the return value
> > to be -ENOENT instead of -ENOSYS.
> > 
> > Thanks for the review!
> 
> Ok, I had to think about this a bit, but so you're saying you set it to
> NULL in do_exit(), and anyone who has just dereferenced tsk->exe_file
> before the fput in do_exit() should be ok because the vma hasn't yet
> been put?

Yes

> Should the 
>   if (!task->exe_file)
>   goto out;
>   *mnt = mntget(task->exe_file->f_path.mnt);
>   *dentry = dget(task->exe_file->f_path.dentry);
> 
> also go inside a preempt_disable to prevent sleeping and maybe become

It needs some form of protection from concurrent access between write
and read. write happens during exec, fork, and exit. In the fork case
however it's not necessary because the new task isn't visible in /proc
yet and the value of current doesn't change anyway.

>   exef = task->exe_file;  /* to prevent task->exe_file being set
>   to NULL before we've grabbed the path */
>   if (!exef)
>   goto out;
>   get_file(exef);  /* to prevent the mm somehow being put before
>   we've grabbed the path? */
>   *mnt = mntget(exef->f_path.mnt);
>   *dentry = dget(exef->f_path.dentry);
>   fput(exef);  /* ? */
> 
> ?
> 
> Or am I being overly paranoid?

No, you're right. In fact, because readlink() can be initiated by
another task I think disabling preemption really only fixes the problem
on uniprocessor systems. So I was thinking of using task_lock() to
protect exe_file.

Cheers,
-Matt Helsley
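
The reader/exit race discussed above can be modelled in userspace. This is only a sketch under stated assumptions: struct sk_file, struct sk_task and all sk_* functions are hypothetical stand-ins, a pthread mutex plays the role of task_lock(), and a C11 atomic counter plays the role of the file reference count. It is not the code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: just a reference count. */
struct sk_file {
	atomic_int refcount;
};

/* Hypothetical stand-in for task_struct. */
struct sk_task {
	pthread_mutex_t lock;		/* plays the role of task_lock() */
	struct sk_file *exe_file;
};

/* Drop one reference; a real implementation would free at zero. */
static void sk_file_put(struct sk_file *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
}

/*
 * Reader side (readlink on the exe symlink): pin the file while holding
 * the lock, so the exit path cannot drop the last reference in between.
 * A NULL result corresponds to returning -ENOENT.
 */
static struct sk_file *sk_task_get_exe_file(struct sk_task *t)
{
	struct sk_file *f;

	pthread_mutex_lock(&t->lock);
	f = t->exe_file;
	if (f)
		atomic_fetch_add(&f->refcount, 1);
	pthread_mutex_unlock(&t->lock);
	return f;
}

/* Exit side: clear the field under the lock, then drop the task's reference. */
static void sk_task_clear_exe_file(struct sk_task *t)
{
	struct sk_file *f;

	pthread_mutex_lock(&t->lock);
	f = t->exe_file;
	t->exe_file = NULL;
	pthread_mutex_unlock(&t->lock);
	if (f)
		sk_file_put(f);
}
```

Because the reader takes its reference inside the critical section, it works regardless of which CPU the reader runs on, which is the property disabling preemption alone would not give.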



Re: Extending boot protocol & bzImage for paravirt_ops

2007-06-01 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> Well, I think we can safely say that it's something that's only
> meaningful in 32/64-bit mode, so we aren't constrained by the real-mode
> address space.
> 
> One of my goals in this project is to make the boot image, in some way,
> completely define which memory it needs to get started.  That means that
> the boot loader can either place things knowing they'll avoid the boot
> image and/or definitively know that the image is unloadable.
> 
> So I don't think it's strictly necessary to pre-define what memory this
> object can use, since I think it can be safely determined dynamically.
> 

That's a method of defining the memory space.

I think the current definition is suitable for entering at the 16-bit
entry point.

-hpa


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> Well there are architectural problems. We determine the power of two slab 
> at compile time. The object size information is currently not available in 
> the binary :=).
>   

That only applies to allocations with constant sizes. One presumes
nobody is explicitly doing kmalloc(0), so we can use a separate
runtime-computed-size path to do poisoning. (Which is probably 90% of
the problem, since people who kmalloc(sizeof(struct foo)) will generally
stay within bounds without too much effort.)

J


[RFC][PATCH -mm 1/2] PM: Introduce hibernation and suspend notifiers

2007-06-01 Thread Rafael J. Wysocki
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Make it possible to register hibernation and suspend notifiers, so that
subsystems can perform hibernation-related or suspend-related operations that
should not be carried out by device drivers' .suspend() and .resume() routines.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 Documentation/power/notifiers.txt |   50 ++
 include/linux/notifier.h  |6 
 include/linux/suspend.h   |   37 +---
 kernel/power/disk.c   |   16 +---
 kernel/power/main.c   |9 ++
 kernel/power/power.h  |   10 +++
 kernel/power/user.c   |   11 ++--
 7 files changed, 129 insertions(+), 10 deletions(-)

Index: linux-2.6.22-rc3/include/linux/suspend.h
===
--- linux-2.6.22-rc3.orig/include/linux/suspend.h   2007-05-31 00:00:38.0 +0200
+++ linux-2.6.22-rc3/include/linux/suspend.h    2007-06-01 22:55:01.0 +0200
@@ -54,7 +54,8 @@ struct hibernation_ops {
void (*restore_cleanup)(void);
 };
 
-#if defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND)
+#ifdef CONFIG_PM
+#ifdef CONFIG_SOFTWARE_SUSPEND
 /* kernel/power/snapshot.c */
 extern void __register_nosave_region(unsigned long b, unsigned long e, int km);
 static inline void register_nosave_region(unsigned long b, unsigned long e)
@@ -72,7 +73,7 @@ extern unsigned long get_safe_page(gfp_t
 
 extern void hibernation_set_ops(struct hibernation_ops *ops);
 extern int hibernate(void);
-#else
+#else /* CONFIG_SOFTWARE_SUSPEND */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
 static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
@@ -81,7 +82,7 @@ static inline void swsusp_unset_page_fre
 
 static inline void hibernation_set_ops(struct hibernation_ops *ops) {}
 static inline int hibernate(void) { return -ENOSYS; }
-#endif /* defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND) */
+#endif /* CONFIG_SOFTWARE_SUSPEND */
 
 void save_processor_state(void);
 void restore_processor_state(void);
@@ -89,4 +90,34 @@ struct saved_context;
 void __save_processor_state(struct saved_context *ctxt);
 void __restore_processor_state(struct saved_context *ctxt);
 
+/* kernel/power/main.c */
+extern struct blocking_notifier_head pm_chain_head;
+
+static inline int register_pm_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_register(&pm_chain_head, nb);
+}
+
+static inline int unregister_pm_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_unregister(&pm_chain_head, nb);
+}
+
+#define pm_notifier(fn, pri) { \
+   static struct notifier_block fn##_nb =  \
+   { .notifier_call = fn, .priority = pri };   \
+   register_pm_notifier(&fn##_nb); \
+}
+#else /* CONFIG_PM */
+static inline int register_pm_notifier(struct notifier_block *nb) {
+   return 0;
+}
+
+static inline int unregister_pm_notifier(struct notifier_block *nb) {
+   return 0;
+}
+
+#define pm_notifier(fn, pri)   do { (void)(fn); } while (0)
+#endif /* CONFIG_PM */
+
 #endif /* _LINUX_SWSUSP_H */
Index: linux-2.6.22-rc3/kernel/power/power.h
===
--- linux-2.6.22-rc3.orig/kernel/power/power.h  2007-05-31 00:00:38.0 +0200
+++ linux-2.6.22-rc3/kernel/power/power.h   2007-06-01 22:55:01.0 +0200
@@ -173,5 +173,15 @@ extern void swsusp_close(void);
 extern int suspend_enter(suspend_state_t state);
 
 struct timeval;
+/* kernel/power/swsusp.c */
 extern void swsusp_show_speed(struct timeval *, struct timeval *,
unsigned int, char *);
+
+/* kernel/power/main.c */
+extern struct blocking_notifier_head pm_chain_head;
+
+static inline int pm_notifier_call_chain(unsigned long val)
+{
+   return (blocking_notifier_call_chain(&pm_chain_head, val, NULL)
+   == NOTIFY_BAD) ? -EINVAL : 0;
+}
Index: linux-2.6.22-rc3/include/linux/notifier.h
===
--- linux-2.6.22-rc3.orig/include/linux/notifier.h  2007-05-31 00:00:38.0 +0200
+++ linux-2.6.22-rc3/include/linux/notifier.h   2007-06-01 23:01:38.0 +0200
@@ -209,5 +209,11 @@ extern int __srcu_notifier_call_chain(st
 #define CPU_DOWN_FAILED_FROZEN (CPU_DOWN_FAILED | CPU_TASKS_FROZEN)
 #define CPU_DEAD_FROZEN        (CPU_DEAD | CPU_TASKS_FROZEN)
 
+/* Hibernation and suspend events */
+#define PM_HIBERNATION_PREPARE 0x0001 /* Going to hibernate */
+#define PM_POST_HIBERNATION    0x0002 /* Hibernation finished */
+#define PM_SUSPEND_PREPARE     0x0003 /* Going to suspend the system */
+#define PM_POST_SUSPEND        0x0004

[RFC][PATCH -mm 0/2] PM: Hibernation and suspend notifiers (rev. 2)

2007-06-01 Thread Rafael J. Wysocki
Hi,

This is the second revision of the patches that introduce hibernation and
suspend notifiers.

Generally, I have followed Alan's suggestion to use a blocking notifier
chain and Pavel's suggestion to limit the number of events.  Also, I've
dropped the patch to disable the requesting of firmware.

Comments welcome.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
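
For readers new to notifier chains, the mechanism these patches rely on can be sketched in userspace as follows. The event codes are copied from patch 1/2; everything prefixed sk_ (types, return codes, functions, the two example subscribers) is a hypothetical name invented for this sketch, and the locking that makes the kernel's chain "blocking" is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event codes as defined in patch 1/2. */
#define PM_HIBERNATION_PREPARE 0x0001
#define PM_POST_HIBERNATION    0x0002
#define PM_SUSPEND_PREPARE     0x0003
#define PM_POST_SUSPEND        0x0004

/* Hypothetical return codes for this sketch. */
enum sk_notify { SK_NOTIFY_DONE, SK_NOTIFY_OK, SK_NOTIFY_BAD };

struct sk_notifier {
	enum sk_notify (*call)(struct sk_notifier *nb, unsigned long event);
	int priority;			/* higher priority runs first */
	struct sk_notifier *next;
};

struct sk_chain {
	struct sk_notifier *head;
};

/* Insert nb before the first entry with a lower priority. */
static void sk_register(struct sk_chain *c, struct sk_notifier *nb)
{
	struct sk_notifier **pp = &c->head;

	assert(nb->call != NULL);
	while (*pp && (*pp)->priority >= nb->priority)
		pp = &(*pp)->next;
	nb->next = *pp;
	*pp = nb;
}

/* Walk the chain; a subscriber returning SK_NOTIFY_BAD aborts with -EINVAL. */
static int sk_call_chain(struct sk_chain *c, unsigned long event)
{
	struct sk_notifier *nb;

	for (nb = c->head; nb; nb = nb->next)
		if (nb->call(nb, event) == SK_NOTIFY_BAD)
			return -EINVAL;
	return 0;
}

/* Example subscriber: counts every event it is shown. */
static int sk_events_seen;

static enum sk_notify sk_count_events(struct sk_notifier *nb,
				      unsigned long event)
{
	(void)nb;
	(void)event;
	sk_events_seen++;
	return SK_NOTIFY_OK;
}

/* Example subscriber: refuses to let a suspend start. */
static enum sk_notify sk_veto_suspend(struct sk_notifier *nb,
				      unsigned long event)
{
	(void)nb;
	return event == PM_SUSPEND_PREPARE ? SK_NOTIFY_BAD : SK_NOTIFY_DONE;
}
```

With both subscribers registered and the veto at a higher priority, a PM_SUSPEND_PREPARE event is rejected before the counting subscriber ever runs.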



[RFC][PATCH -mm 2/2] PM: Disable usermode helper before hibernation and suspend

2007-06-01 Thread Rafael J. Wysocki
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Use a hibernation and suspend notifier to disable the user mode helper before
a hibernation/suspend and enable it after the operation.
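
The intended behaviour can be illustrated with a small userspace model. This is a sketch only: the event codes are the ones from patch 1/2, while sk_helper_disabled, sk_helper_pm_callback() and sk_run_helper() are hypothetical names, and the model does not start any process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event codes as defined in patch 1/2. */
#define PM_HIBERNATION_PREPARE 0x0001
#define PM_POST_HIBERNATION    0x0002
#define PM_SUSPEND_PREPARE     0x0003
#define PM_POST_SUSPEND        0x0004

/* Set while a hibernation or suspend transition is in progress. */
static int sk_helper_disabled;

/* Returns 1 if the event was handled, 0 if it was not of interest. */
static int sk_helper_pm_callback(unsigned long action)
{
	switch (action) {
	case PM_HIBERNATION_PREPARE:
	case PM_SUSPEND_PREPARE:
		sk_helper_disabled = 1;
		return 1;
	case PM_POST_HIBERNATION:
	case PM_POST_SUSPEND:
		sk_helper_disabled = 0;
		return 1;
	}
	return 0;
}

/* Gate in front of helper creation: refuse with -EBUSY while disabled. */
static int sk_run_helper(const char *path)
{
	assert(path != NULL);
	if (sk_helper_disabled)
		return -EBUSY;
	/* A real implementation would queue the helper for execution here. */
	return 0;
}
```

Callers see -EBUSY only between the PREPARE event and the matching POST event, so anything that can retry later keeps working once the transition is over.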

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/kmod.c |   33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc3/kernel/kmod.c
===
--- linux-2.6.22-rc3.orig/kernel/kmod.c 2007-05-31 00:00:37.0 +0200
+++ linux-2.6.22-rc3/kernel/kmod.c  2007-06-02 00:01:47.0 +0200
@@ -33,6 +33,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 extern int max_threads;
@@ -46,6 +48,14 @@ static struct workqueue_struct *khelper_
 */
 char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
 
+/*
+ * If set, both call_usermodehelper_keys() and call_usermodehelper_pipe() exit
+ * immediately returning -EBUSY.  Used for preventing user land processes from
+ * being created after the user land has been frozen during a system-wide
+ * hibernation or suspend operation.
+ */
+static int usermodehelper_disabled;
+
 /**
  * request_module - try to load a kernel module
  * @fmt: printf style format string for the name of the module
@@ -251,6 +261,24 @@ static void __call_usermodehelper(struct
complete(sub_info->complete);
 }
 
+static int usermodehelper_pm_callback(struct notifier_block *nfb,
+   unsigned long action,
+   void *ignored)
+{
+   switch (action) {
+   case PM_HIBERNATION_PREPARE:
+   case PM_SUSPEND_PREPARE:
+   usermodehelper_disabled = 1;
+   return NOTIFY_OK;
+   case PM_POST_HIBERNATION:
+   case PM_POST_SUSPEND:
+   usermodehelper_disabled = 0;
+   return NOTIFY_OK;
+   }
+
+   return NOTIFY_DONE;
+}
+
 /**
  * call_usermodehelper_keys - start a usermode application
  * @path: pathname for the application
@@ -276,7 +304,7 @@ int call_usermodehelper_keys(char *path,
struct subprocess_info *sub_info;
int retval;
 
-   if (!khelper_wq)
+   if (!khelper_wq || usermodehelper_disabled)
return -EBUSY;
 
if (path[0] == '\0')
@@ -319,7 +347,7 @@ int call_usermodehelper_pipe(char *path,
};
struct file *f;
 
-   if (!khelper_wq)
+   if (!khelper_wq || usermodehelper_disabled)
return -EBUSY;
 
if (path[0] == '\0')
@@ -347,4 +375,5 @@ void __init usermodehelper_init(void)
 {
khelper_wq = create_singlethread_workqueue("khelper");
BUG_ON(!khelper_wq);
+   pm_notifier(usermodehelper_pm_callback, 0);
 }


Re: [patch 1/1] document Acked-by:

2007-06-01 Thread John Anthony Kazos Jr.
> > I think the comment had to do with the concept that ACK/NAK implies
> > authority.  If you're not the maintainer, it's rude to imply that you
> > are.  Obviously, test reports (good or bad!) are always welcome.
> 
> Well, I understand a test is a different thing, an experiment to
> see if the patch works or not, while ack/etc. is just opinion
> of someone who reads the patch without actually using it.
> 
> I think ack/etc doesn't, in any way, imply being the maintainer,
> though it does imply that the "acker" has actually read the code,
> understands it, and believes it's correct (or not, and why).
> 
> If we want to differentiate between "authoritative" and
> "non-authoritative" opinions (and the name and email address
> aren't enough) then I think we need to state that explicitly
> (perhaps something like "Acked-by: FIRST M. LAST , XXX
> subsystem maintainer" would suffice).

"Acked-by:" does not mean "I like this" but rather "I approve of this". 
Someone who is not a maintainer is encouraged to speak of like and 
dislike, in great detail, but has no position at all to approve or 
disapprove of it going in.

If I put "Acked-by: John..." on a patch of any kind, even trivial, it 
would look incredibly stupid, because I'm just some guy messing around 
with the kernel. A tactful response to me doing that from any actual 
kernel bigwig would be, "I appreciate your enthusiasm, but you are not 
part of the kernel patch flow." Similarly, a tactful response to me 
NACKing a patch would be, "I appreciate your concern, but you are in no 
position to remove a patch from the stream. Your comments will be 
considered and implemented or countered by an actual maintainer."

This is appropriate.


Re: Extending boot protocol & bzImage for paravirt_ops

2007-06-01 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> Well, if we define it as a movable object then it has to be treated as
> such.  It's a protocol definition issue.  If we define it opaque, though
> -- or for that matter, if we don't -- we should define what memory it
> can live in, though.  Right now, the only "available" memory we have is
> end of setup to 0xa; the command line is defined to be allocated
> from this memory.
>   

Well, I think we can safely say that it's something that's only
meaningful in 32/64-bit mode, so we aren't constrained by the real-mode
address space.

One of my goals in this project is to make the boot image, in some way,
completely define which memory it needs to get started.  That means that
the boot loader can either place things knowing they'll avoid the boot
image and/or definitively know that the image is unloadable.

So I don't think it's strictly necessary to pre-define what memory this
object can use, since I think it can be safely determined dynamically.

J


Re: A kexec approach to hibernation

2007-06-01 Thread Rafael J. Wysocki
On Saturday, 2 June 2007 01:54, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> > On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:
> 
> [snip]
> 
> >> Just before jumping into the new kernel, with interrupts disabled, the
> >> old kernel could either prepare a data structure that specifies what
> >> pages are allocated, or alternatively simply provide a pointer to the
> >> relevant data structure in the old kernel.
> 
> > But for this purpose the old kernel will actually need to do what is
> > currently done in swsusp while the image is being created (the only
> > difference is that we allocate memory in the process, but that's a detail
> > only).
> 
> Okay, but creating a list of pages should be extremely easy.
> Alternatively, the "save kernel" might be able to read the existing
> data structures directly.
> 
> >> I can't say exactly how this data would be given to the new kernel, but I
> >> can't imagine it being difficult.  (For instance, multiboot headers, the
> >> kernel command line, initrd, or some other mechanism could be used.)
> 
> > Besides, you need to load the new kernel somehow.  If that's to work without
> > problems, that should be done before we switch off devices.
> 
> Well, the new kernel can be loaded at any time,

No.  By reading from a file system, you're modifying its metadata (in
general, of course).

> and would be done in exactly the way kexec loads a kernel.  It would probably
> make sense to load the kernel into memory (but not jump to it) as the very
> first step of hibernation.

I think you'd have to do that.

> >> >> 5. The new kernel loads, and then either kernel space or user space
> >> >> writes the necessary data from the old kernel to disk.
> >> 
> >> > You also need to reinitialize devices needed to write the image.
> >> 
> >> Yes.  That would be done, as normal, when the kernel loads.  Currently
> >> devices are suspended or stopped anyway before the atomic copy, and then
> >> reinitialized to write the image.  In theory, this stopping shouldn't be
> >> needed, and I mentioned that if additional support were added to some
> >> drivers for passing some information about the current state of the
> >> device, the device might only need to be partially shut down before
> >> jumping to the new kernel.  This might allow, for instance, avoiding
> >> spinning down and then up again the disks.
> 
> > Well, I don't quite agree.  I think that for this purpose we'll need
> > devices to be initialized from scratch by the new kernel, so the old
> > kernel should put them into states that allow this to be done.
> 
> I agree that the default behavior should be to completely shut down the
> devices.  Later, special support could be added to select devices to
> allow them to not be fully shut down.
> 
> > We are going to implement something like this anyway, but that's a rather
> > long way to go.
> 
> >> >> 6. The new kernel either powers off or suspends to ram.  If it suspends
> >> >> to ram, then it would need to be able to jump back to the old kernel
> >> >> when it resumes from ram.
> >> 
> >> > What if the user wants to abort the hibernation?
> >> 
> >> This would be handled in effectively the same way as if the user wants
> >> to suspend to ram after writing the image: it would be necessary to jump
> >> back to the old kernel.  This would effectively be handled in the same
> >> way as a resume, except that the copying back of memory would be
> >> avoided.  Presumably the image writing kernel would have devices in
> >> approximately the same state as the image loading kernel, and so the old
> >> kernel needs to be prepared to receive the devices in that state anyway.
> 
> > Please see above.  I don't think that would be easy to arrange for.
> 
> In that case, the devices can indeed be fully shut down, at least
> initially.
> 
> >> >> The advantages of this approach include:
> >> >> 
> >> >> - having a completely functional system (with a completely functional
> >> >> userspace) from which the image is written, without having to worry
> >> >> about messing up the state that is being saved (hell, the user could
> >> >> even do it via an interactive shell on the new kernel);
> >> >> 
> >> >> - no need to worry about trying to use drivers while some processes are
> >> >> frozen;
> >> 
> >> > We're rather worried about running processes when the devices are
> >> > frozen. ;-)
> >> 
> >> The point is, with this kexec approach, essentially no code at all runs
> >> under the old kernel after the very initial steps of the hibernation
> >> have begun, but any code, kernel or user, can run under the new kernel,
> >> because the new kernel provides a completely functional system, while at
> >> the same time not clobbering any of the memory of the old kernel.  In
> >> particular, it will be possible to write the image to a fuse file
> >> system.
> 
> > You need to be cautious here.  You can't touch any filesystems mounted by

Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Andrew Morton wrote:

> I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there,

The trouble with the WARN_ON is that it triggers even for code that is 
okay, as noted by Jeremy. My initial intent with NULL was to allow the 
allocation of a zero-sized object without extra checks. It should only
trigger a failure if something bad was done.

NULL had the problem of confusion with no memory available. I think BADPTR 
is good. If the coding warts are causing trouble then BADPTR will result 
in a failure. Otherwise if the code is doing a kmalloc(0) and not 
dereferencing the pointer (like Paul's fixed code) then we should be fine 
and not issue a warning.

The false positives may be upsetting some people.
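
The BADPTR idea can be sketched in userspace like this. The names ZERO_ALLOC_PTR, sk_alloc(), sk_free() and sk_demo() are hypothetical, and the sentinel value is an assumption chosen for illustration (a non-NULL address inside the normally unmapped first page), not a value taken from this thread.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sentinel: compares unequal to NULL, so "zero bytes
 * requested" is not confused with "out of memory", yet it is expected
 * to fault if anyone dereferences it.
 */
#define ZERO_ALLOC_PTR ((void *)16)

/* A zero-sized request succeeds and yields the sentinel. */
static void *sk_alloc(size_t size)
{
	if (size == 0)
		return ZERO_ALLOC_PTR;
	return malloc(size);	/* NULL still means allocation failure */
}

/* Free accepts NULL and the sentinel without touching the heap. */
static void sk_free(void *p)
{
	if (p == NULL || p == ZERO_ALLOC_PTR)
		return;
	free(p);
}

/* Usage: code that never dereferences the result needs no special case. */
static void sk_demo(void)
{
	void *p = sk_alloc(0);

	assert(p != NULL);	/* the usual failure check still passes */
	sk_free(p);
}
```

Correct callers pay nothing for the zero-sized case, while a caller that writes through the pointer fails loudly instead of corrupting a neighbouring object.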


Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-01 Thread Eric W. Biederman

> As a side note, on my very old SMP machine, 2.6.20 load-balances
> IRQs across CPUs correctly but 2.6.21 does not. I know that the
> in-kernel IRQ load balancer is marked as deprecated and
> somewhat broken, but your report makes me think it
> could be a bug in the IRQ rerouting part in my case too and
> not necessarily in the load-balancer (decision) part.

I doubt it.  The practical problem is that cpu_down does not,
and by design cannot, call the irq balancing part properly,
and I haven't yet seen anything to suggest that we don't migrate
IRQs properly.

So I'm guessing it was the decision part.

Eric


[RFC PATCH 2/2] dmaengine: move channel management to the client

2007-06-01 Thread Dan Williams
This effectively makes channels a shared resource rather than tying them
to a specific client.  dmaengine now assumes that clients will internally
track how many channels they need and dmaengine will learn if the client
cares about a channel at dma_event_callback time.  This also enables a
client to ignore a channel if it does not meet extra client specific
constraints beyond simple base capabilities.

This patch also fixes up the NET_DMA client to use the new mechanism.

Changelog:
* removed DMA_TX_ARRAY_INIT, no longer needed
* dma_client_chan_free -> dma_chan_release: switch to global reference
  counting only at device unregistration time, before it was also happening
  at client unregistration time
* clients now return dma_state_client to dmaengine (ack, dup, nak)
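
The ack/dup/nak handshake and the capability filter described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: all sk_* names and SK_* constants are hypothetical, an unsigned long stands in for the capability bitmap, and the callback only handles a channel becoming available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the patch uses a dma_cap_mask_t bitmap. */
#define SK_CAP_MEMCPY (1UL << 0)
#define SK_CAP_XOR    (1UL << 1)

#define SK_MAX_CHANS 4

enum sk_dma_state { SK_DMA_ACK, SK_DMA_DUP, SK_DMA_NAK };

struct sk_chan {
	unsigned long caps;	/* what the channel's device can do */
	int refs;		/* references taken on behalf of clients */
};

struct sk_client {
	unsigned long want;			/* capabilities the client needs */
	struct sk_chan *chans[SK_MAX_CHANS];	/* the client's own bookkeeping */
	size_t nr;
};

/* True when every capability bit in want is also set on the channel. */
static int sk_chan_satisfies(const struct sk_chan *chan, unsigned long want)
{
	return (chan->caps & want) == want;
}

/* Client side: decide whether to use a channel that became available. */
static enum sk_dma_state sk_client_event(struct sk_client *cl,
					 struct sk_chan *chan)
{
	size_t i;

	for (i = 0; i < cl->nr; i++)
		if (cl->chans[i] == chan)
			return SK_DMA_DUP;	/* already tracked */
	if (cl->nr == SK_MAX_CHANS)
		return SK_DMA_NAK;		/* client has enough channels */
	cl->chans[cl->nr++] = chan;
	return SK_DMA_ACK;
}

/* Core side: filter on capabilities, ask the client, count the reference. */
static enum sk_dma_state sk_offer(struct sk_client *cl, struct sk_chan *chan)
{
	enum sk_dma_state state;

	assert(cl != NULL && chan != NULL);
	if (!sk_chan_satisfies(chan, cl->want))
		return SK_DMA_NAK;
	state = sk_client_event(cl, chan);
	if (state == SK_DMA_ACK)
		chan->refs++;
	return state;
}
```

The core never records which client owns a channel; it only counts references, which is what makes the channel shareable between clients.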

Cc: Chris Leech <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c   |  213 +++--
 drivers/dma/ioatdma.c |1 
 drivers/dma/ioatdma.h |3 -
 include/linux/dmaengine.h |   75 
 net/core/dev.c|  111 ---
 5 files changed, 220 insertions(+), 183 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8a49103..927a8ac 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -37,11 +37,11 @@
  * Each device has a channels list, which runs unlocked but is never modified
  * once the device is registered, it's just setup by the driver.
  *
- * Each client has a channels list, it's only modified under the client->lock
- * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ * Each client is responsible for keeping track of the channels it uses.  See
+ * the definition of dma_event_callback in dmaengine.h.
  *
  * Each device has a kref, which is initialized to 1 when the device is
- * registered. A kref_put is done for each class_device registered.  When the
+ * registered. A kref_get is done for each class_device registered.  When the
  * class_device is released, the coresponding kref_put is done in the release
  * method. Every time one of the device's channels is allocated to a client,
  * a kref_get occurs.  When the channel is freed, the coresponding kref_put
@@ -51,10 +51,12 @@
  * references to finish.
  *
  * Each channel has an open-coded implementation of Rusty Russell's "bigref,"
- * with a kref and a per_cpu local_t.  A single reference is set when on an
- * ADDED event, and removed with a REMOVE event.  Net DMA client takes an
- * extra reference per outstanding transaction.  The relase function does a
- * kref_put on the device. -ChrisL
+ * with a kref and a per_cpu local_t.  A dma_chan_get is called when a client
+ * signals that it wants to use a channel, and dma_chan_put is called when
+ * a channel is removed or a client using it is unregesitered.  A client can
+ * take extra references per outstanding transaction, as is the case with
+ * the NET DMA client.  The release function does a kref_put on the device.
+ * -ChrisL, DanW
  */
 
 #include 
@@ -102,8 +104,18 @@ static ssize_t show_bytes_transferred(struct class_device *cd, char *buf)
 static ssize_t show_in_use(struct class_device *cd, char *buf)
 {
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+   int in_use = 0;
+
+   if (unlikely(chan->slow_ref) && atomic_read(&chan->refcount.refcount) > 1)
+   in_use = 1;
+   else {
+   if (local_read(&(per_cpu_ptr(chan->local,
+   get_cpu())->refcount)) > 0)
+   in_use = 1;
+   put_cpu();
+   }
 
-   return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
+   return sprintf(buf, "%d\n", in_use);
 }
 
 static struct class_device_attribute dma_class_attrs[] = {
@@ -129,42 +141,50 @@ static struct class dma_devclass = {
 
 /* --- client and device registration --- */
 
+#define dma_async_chan_satisfies_mask(chan, mask) __dma_async_chan_satisfies_mask((chan), &(mask))
+static int __dma_async_chan_satisfies_mask(struct dma_chan *chan, dma_cap_mask_t *want)
+{
+   dma_cap_mask_t has;
+
+   bitmap_and(has.bits, want->bits, chan->device->cap_mask.bits, DMA_TX_TYPE_END);
+   return bitmap_equal(want->bits, has.bits, DMA_TX_TYPE_END);
+}
+
 /**
- * dma_client_chan_alloc - try to allocate a channel to a client
+ * dma_client_chan_alloc - try to allocate channels to a client
  * @client: &dma_client
  *
  * Called with dma_list_mutex held.
  */
-static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
+static void dma_client_chan_alloc(struct dma_client *client)
 {
struct dma_device *device;
struct dma_chan *chan;
-   unsigned long flags;
int desc;   /* allocated descriptor count */
+   enum dma_state_client ack;
 
-   /* Find a channel, any DMA engine will do */
-   list_for_each_entry(device, &dma_device_list, global_node) {
+   /* Find a channel 

[RFC PATCH 1/2] dmaengine: add base support for the async_tx api

2007-06-01 Thread Dan Williams
In preparation for the async_tx (dmaengine client) API this patch:
1/ introduces struct dma_async_tx_descriptor as a common field for all
   dmaengine software descriptors.  The primary role of this structure
   is to enable callbacks at transaction completion time, and support
   transaction chains that span multiple channels
2/ converts the device_memcpy_* methods into separate prep, set
   src/dest, and submit stages
3/ adds support for capabilities beyond memcpy (xor, memset, xor zero
   sum, completion interrupts).  place holders for future capabilities
   are also included
4/ converts ioatdma to the new semantics
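
The split into prep, set src/dest, and submit stages from item 2 can be pictured with a userspace model. This is a sketch only: struct sk_tx and the sk_* functions are hypothetical names, the copy is done synchronously with memcpy() where a real engine would run asynchronously, and the cookie is a plain counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical software descriptor shared by all stages. */
struct sk_tx {
	void *dest;
	const void *src;
	size_t len;
	void (*callback)(void *param);	/* optional completion hook */
	void *callback_param;
};

/* Stage 1: prepare a descriptor for a copy of len bytes. */
static struct sk_tx *sk_prep_memcpy(struct sk_tx *slot, size_t len)
{
	*slot = (struct sk_tx){ .len = len };
	return slot;
}

/* Stage 2: attach the source and destination addresses. */
static void sk_set_src(struct sk_tx *tx, const void *src)
{
	tx->src = src;
}

static void sk_set_dest(struct sk_tx *tx, void *dest)
{
	tx->dest = dest;
}

/* Stage 3: submit; returns a positive cookie identifying the transfer. */
static int sk_submit(struct sk_tx *tx)
{
	static int next_cookie = 1;

	assert(tx->src != NULL && tx->dest != NULL);
	memcpy(tx->dest, tx->src, tx->len);
	if (tx->callback)
		tx->callback(tx->callback_param);
	return next_cookie++;
}

/* Example completion callback: flag that the transfer finished. */
static void sk_mark_done(void *param)
{
	*(int *)param = 1;
}
```

Keeping the stages separate lets a caller fill in addresses and a completion callback after the descriptor has been allocated, which a single memcpy-style entry point cannot express.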

Changelog:
* drop dma mapping methods, suggested by Chris Leech
* fix ioat_dma_dependency_added, also caught by Andrew Morton
* fix dma_sync_wait, change from Andrew Morton
* uninline large functions, change from Andrew Morton
* add tx->callback = NULL to dmaengine calls to interoperate with async_tx
  calls
* hookup ioat_tx_submit
* convert channel capabilities to a 'cpumask_t like' bitmap

Cc: Chris Leech <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c   |  182 +
 drivers/dma/ioatdma.c |  248 -
 drivers/dma/ioatdma.h |8 +
 include/linux/dmaengine.h |  245 
 4 files changed, 454 insertions(+), 229 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 322ee29..8a49103 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -59,6 +59,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -66,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_MUTEX(dma_list_mutex);
 static LIST_HEAD(dma_device_list);
@@ -165,6 +167,24 @@ static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
return NULL;
 }
 
+enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
+{
+   enum dma_status status;
+   unsigned long dma_sync_wait_timeout = jiffies + msecs_to_jiffies(5000);
+
+   dma_async_issue_pending(chan);
+   do {
+   status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
+   if (time_after_eq(jiffies, dma_sync_wait_timeout)) {
+   printk(KERN_ERR "dma_sync_wait_timeout!\n");
+   return DMA_ERROR;
+   }
+   } while (status == DMA_IN_PROGRESS);
+
+   return status;
+}
+EXPORT_SYMBOL(dma_sync_wait);
+
 /**
  * dma_chan_cleanup - release a DMA channel's resources
  * @kref: kernel reference structure that contains the DMA channel device
@@ -322,6 +342,28 @@ int dma_async_device_register(struct dma_device *device)
if (!device)
return -ENODEV;
 
+   /* validate device routines */
+   BUG_ON(dma_has_cap(DMA_MEMCPY, device->cap_mask) &&
+   !device->device_prep_dma_memcpy);
+   BUG_ON(dma_has_cap(DMA_XOR, device->cap_mask) &&
+   !device->device_prep_dma_xor);
+   BUG_ON(dma_has_cap(DMA_ZERO_SUM, device->cap_mask) &&
+   !device->device_prep_dma_zero_sum);
+   BUG_ON(dma_has_cap(DMA_MEMSET, device->cap_mask) &&
+   !device->device_prep_dma_memset);
+   BUG_ON(dma_has_cap(DMA_ZERO_SUM, device->cap_mask) &&
+   !device->device_prep_dma_interrupt);
+
+   BUG_ON(!device->device_alloc_chan_resources);
+   BUG_ON(!device->device_free_chan_resources);
+   BUG_ON(!device->device_tx_submit);
+   BUG_ON(!device->device_set_dest);
+   BUG_ON(!device->device_set_src);
+   BUG_ON(!device->device_dependency_added);
+   BUG_ON(!device->device_is_tx_complete);
+   BUG_ON(!device->device_issue_pending);
+   BUG_ON(!device->dev);
+
init_completion(&device->done);
kref_init(&device->refcount);
device->dev_id = id++;
@@ -397,6 +439,146 @@ void dma_async_device_unregister(struct dma_device *device)
 }
 EXPORT_SYMBOL(dma_async_device_unregister);
 
+/**
+ * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * @chan: DMA channel to offload copy to
+ * @dest: destination address (virtual)
+ * @src: source address (virtual)
+ * @len: length
+ *
+ * Both @dest and @src must be mappable to a bus address according to the
+ * DMA mapping API rules for streaming mappings.
+ * Both @dest and @src must stay memory resident (kernel memory or locked
+ * user space pages).
+ */
+dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
+void *dest, void *src, size_t len)
+{
+   struct dma_device *dev = chan->device;
+   struct dma_async_tx_descriptor *tx;
+   dma_addr_t addr;
+   dma_cookie_t cookie;
+   int cpu;
+
+   tx = dev->device_prep_dma_memcpy(chan, len, 0);
+   if (!tx)
+   return -ENOMEM;
+
+   tx->ack = 1;
+   tx->callback = NULL;
+   addr = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);

[RFC PATCH 0/2] dmaengine: preparation for raid acceleration

2007-06-01 Thread Dan Williams
Hello David,

The following two patches are part of the raid acceleration series I
would like to push for 2.6.23 consideration.  I am sending these two
separately for your review for the following reasons: the 'dmaengine'
core initially came in through netdev,  patch #2 makes changes to
net/core/dev.c, and lastly I have an ack on the raid changes but no ack
from the community on the i/oat and dmaengine changes.

  dmaengine: add base support for the async_tx api
  dmaengine: move channel management to the client

Regards,
Dan

the full series is available at:
git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 16:57:20 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Andrew, want to take this patch to -mm to see if it triggers anything?

spose so.

I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there,
because it is exposing some coding warts.  But we should turn it off for
2.6.22 and make it conditional on CONFIG_DEVEL_KERNEL (or whatever it will
be called) later.

The BADPTR thing is a little worrying because it will make
previously-working-by-luck code go oops.  I guess we can live with that.

So we end up with the BADPTR code enabled even in production kernels, in
which case your ((unsigned long)x <= 16) trick is worth doing.
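
A minimal userspace sketch of a warn-once check that can be compiled out, along the lines suggested above. CONFIG_DEVEL_KERNEL is only the placeholder name used in this thread; TOY_WARN_ON_ONCE, warnings_emitted and toy_kmalloc_index are hypothetical names for illustration, and unlike the kernel macro this simplified one is a statement rather than an expression.

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder switch taken from the thread; no such option is assumed to exist. */
#define CONFIG_DEVEL_KERNEL 1

int warnings_emitted;   /* sketch-only counter so the effect is observable */

#if CONFIG_DEVEL_KERNEL
/* Warn the first time cond is true at this call site, then stay silent. */
#define TOY_WARN_ON_ONCE(cond) do {                               \
        static int warned;                                        \
        if ((cond) && !warned) {                                  \
                warned = 1;                                       \
                warnings_emitted++;                               \
                fprintf(stderr, "warning: %s (%s:%d)\n",          \
                        #cond, __FILE__, __LINE__);               \
        }                                                         \
} while (0)
#else
/* Production build: evaluate the condition, emit nothing. */
#define TOY_WARN_ON_ONCE(cond) do { (void)(cond); } while (0)
#endif

/* Stand-in for a size-to-cache-index lookup; the index values are arbitrary. */
int toy_kmalloc_index(size_t size)
{
        TOY_WARN_ON_ONCE(size == 0);
        return size == 0 ? 0 : 1;
}
```

Calling toy_kmalloc_index(0) repeatedly reports the zero-sized request once, which is enough to find the offending caller without flooding the log.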



Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Linus Torvalds wrote:

> > A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in 
> > kmalloc_slab() will trigger.
> 
> Did we use to BUG_ON()? I think that's wrong. There are ways for users to 
> potentially ask the kernel to do big allocations, and the correct response 
> is to say "no can do", not to crash!

There is no way to distinguish that from out of memory. Failing on large 
allocs is what we have always done for kmalloc(). Before 2.6.22 we used to 
fail for allocs > 256k, which was a big nuisance for large NUMA allocs.

The patches in 2.6.22 allow us for the first time to allocate arbitrary 
sized objects up to MAX_ORDER. So we no longer have troubles with large 
NUMA objects.



Re: [2.6.22-rc3][ACPI?] Resume from s2r doesn't work.

2007-06-01 Thread Indan Zupancic
On Sat, June 2, 2007 00:17, Olaf Dietsche wrote:
> It doesn't work. I tried all options "s2ram -f (-s, -p, -m, -r, -a 1, -a 2,
> -a 3)" one after the other.
>
> Since the screen (or any other device) works without problems, when I
> skip acpi_enter_sleep_state(), I don't think it's screen related.

I use "s2ram -f -p -s -a 3", maybe you need some exotic combination too.

Greetings,

Indan




Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Bill Davidsen

Jens Axboe wrote:
> On Thu, May 31 2007, Phillip Susi wrote:
> > Jens Axboe wrote:
> > > No, Stephan is right, the barrier is both an ordering and integrity
> > > constraint. If a driver completes a barrier request before that request
> > > and previously submitted requests are on STABLE storage, then it
> > > violates that principle. Look at the code and the various ordering
> > > options.
> >
> > I am saying that is the wrong thing to do.  Barrier should be about
> > ordering only.  So long as the order they hit the media is maintained,
> > the order the requests are completed in can change.  barrier.txt bears
>
> But you can't guarantee ordering without flushing the data out as well.
> It all depends on the type of cache on the device, of course. If you
> look at the ordinary sata/ide drive with write back caching, you can't
> just issue the requests in order and pray that the drive cache will make
> it to platter.
>
> If you don't have write back caching, or if the cache is battery backed
> and thus guaranteed to never be lost, maintaining order is naturally
> enough.

Do I misread this? If ordering doesn't reach all the way to the platter
then there will be failure modes which result in order not being preserved.
Battery backed cache doesn't prevent failures between the cache and the
platter.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: SELECT() returns 1 But FIONREAD says (Input/output error)

2007-06-01 Thread Uncle George

Robert Hancock wrote:
> It's because you haven't done anything to handle the error which is
> still persisting. Likely the only thing sane you can do in this case is
> close the fd and try to reopen it later.

This seems to be true, but not for the reason you might think.

It appears that if you plug the USB/serial device back into the USB hub
before closing the disconnected /dev/ttyUSB0, the code creates a new
/dev/ttyUSB1. When you then close /dev/ttyUSB0, that device is removed
from the /dev directory.

Now /dev/ttyUSB1 is the device, and /dev/ttyUSB0 has disappeared. This
does not seem proper, since the program now has no way of knowing how
to re-open the GPS device.

I have been informed that this is an approved kernel feature. Is this
supposed to happen, or is it an unintended consequence?



Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Christoph Lameter wrote:
> 
> A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in 
> kmalloc_slab() will trigger.

Did we use to BUG_ON()? I think that's wrong. There are ways for users to 
potentially ask the kernel to do big allocations, and the correct response 
is to say "no can do", not to crash!

> Here is the updated patch. It works fine here:
> 
> SLUB: Return BADPTR instead of warning for kmalloc(0)

Looks fine to me. My only comment is that

> - if (!x)
> + if (!x || x == BADPTR)
>   return;

This could be micro-optimized (again, non-standard, but it should be 
"practically portable") to have just a single test using something like

if ((unsigned long)x <= 16)
return;

but I guess it doesn't really matter much.

I think this is better than what we have now, but I also suspect it's 
*not* something we should try this late in the -rc sequence ;)

Andrew, want to take this patch to -mm to see if it triggers anything?

Linus
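
A minimal userspace sketch of the two forms of the check discussed above, for comparison. The function names are hypothetical; BADPTR is defined as in the patch. The single-compare form assumes that no real object lives at an address of 16 or below and that converting a pointer to an integer yields its address, which holds on the flat-address machines Linux runs on but is not promised by ISO C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel returned for zero-sized allocations, as in the patch. */
#define BADPTR ((void *)16)

/* Two-test form from the patch: NULL or exactly BADPTR. */
bool skip_free_two_tests(const void *x)
{
        return !x || x == BADPTR;
}

/*
 * Single-test form: every address from 0 through 16 is treated as
 * "nothing to free". Note that this also accepts the values 1..15,
 * which the two-test form would pass on to the real free path.
 */
bool skip_free_one_test(const void *x)
{
        return (uintptr_t)x <= 16;
}
```

The two forms agree for NULL, BADPTR and every real object address; they differ only for the small non-pointer values below 16.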


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Bill Davidsen

Neil Brown wrote:
> On Friday June 1, [EMAIL PROTECTED] wrote:
> > On Thu, May 31, 2007 at 02:31:21PM -0400, Phillip Susi wrote:
> > > David Chinner wrote:
> > > > That sounds like a good idea - we can leave the existing
> > > > WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
> > > > behaviour that only guarantees ordering. The filesystem can then
> > > > choose which to use where appropriate
> > >
> > > So what if you want a synchronous write, but DON'T care about the order?
> >
> > submit_bio(WRITE_SYNC, bio);
> >
> > Already there, already used by XFS, JFS and direct I/O.
>
> Are you sure?
>
> You seem to be saying that WRITE_SYNC causes the write to be safe on
> media before the request returns.  That isn't my understanding.
> I think (from comments near the definition and a quick grep through
> the code) that WRITE_SYNC expedites the delivery of the request
> through the elevator, but doesn't do anything special about getting it
> onto the media.

My impression is that the sync will return when the i/o has been
delivered to the device, and will get special treatment by the elevator
code (I looked quickly, more is needed). I'm sure someone will tell me
if I misread this. ;-)


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: A kexec approach to hibernation

2007-06-01 Thread Jeremy Maitin-Shepard
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:

[snip]

>> Just before jumping into the new kernel, with interrupts disabled, the
>> old kernel could either prepare a data structure that specifies what
>> pages are allocated, or alternatively simply provide a pointer to the
>> relevant data structure in the old kernel.

> But for this purpose the old kernel will actually need to do what is currently
> done in swsusp while the image is being created (the only difference is that
> we allocate memory in the process, but that's a detail only).

Okay, but creating a list of pages should be extremely easy.
Alternatively, the "save kernel" might be able to read the existing
data structures directly.

>> I can't say exactly how this data would be given to the new kernel, but I
>> can't imagine it being difficult.  (For instance, multiboot headers, the
>> kernel command line, initrd, or some other mechanism could be used.)

> Besides, you need to load the new kernel somehow.  If that's to work without
> problems, that should be done before we switch off devices.

Well, the new kernel can be loaded at any time, and would be done in
exactly the way kexec loads a kernel.  It would probably make sense to
load the kernel into memory (but not jump to it) as the very first step
of hibernation.

>> >> 5. The new kernel loads, and then either kernel space or user space
>> >> writes the necessary data from the old kernel to disk.
>> 
>> > You also need to reinitialize devices needed to write the image.
>> 
>> Yes.  That would be done, as normal, when the kernel loads.  Currently
>> devices are suspended or stopped anyway before the atomic copy, and then
>> reinitialized to write the image.  In theory, this stopping shouldn't be
>> needed, and I mentioned that if additional support were added to some
>> drivers for passing some information about the current state of the
>> device, the device might only need to be partially shut down before
>> jumping to the new kernel.  This might allow, for instance, avoiding
>> spinning down and then up again the disks.

> Well, I don't quite agree.  I think that for this purpose we'll need devices to
> be initialized from scratch by the new kernel, so the old kernel should put
> them into states that allow this to be done.

I agree that the default behavior should be to completely shut down the
devices.  Later, special support could be added to select devices to
allow them to not be fully shut down.

> We are going to implement something like this anyway, but that's a rather long
> way to go.

>> >> 6. The new kernel either powers off or suspends to ram.  If it suspends
>> >> to ram, then it would need to be able to jump back to the old kernel
>> >> when it resumes from ram.
>> 
>> > What if the user wants to abort the hibernation?
>> 
>> This would be handled in effectively the same way as if the user wants
>> to suspend to ram after writing the image: it would be necessary to jump
>> back to the old kernel.  This would effectively be handled in the same
>> way as a resume, except that the copying back of memory would be
>> avoided.  Presumably the image writing kernel would have devices in
>> approximately the same state as the image loading kernel, and so the old
>> kernel needs to be prepared to receive the devices in that state anyway.

> Please see above.  I don't think that would be easy to arrange for.

In that case, the devices can indeed be fully shut down, at least
initially.

>> >> The advantages of this approach include:
>> >> 
>> >> - having a completely functional system (with a completely functional
>> >> userspace) from which the image is written, without having to worry
>> >> about messing up the state that is being saved (hell, the user could
>> >> even do it via an interactive shell on the new kernel);
>> >> 
>> >> - no need to worry about trying to use drivers while some processes are
>> >> frozen;
>> 
>> > We're rather worried about running processes when the devices are
>> > frozen. ;-)
>> 
>> The point is, with this kexec approach, essentially no code at all runs
>> under the old kernel after the very initial steps of the hibernation
>> have begun, but any code, kernel or user, can run under the new kernel,
>> because the new kernel provides a completely functional system, while at
>> the same time not clobbering any of the memory of the old kernel.  In
>> particular, it will be possible to write the image to a fuse file
>> system.

> You need to be cautious here.  You can't touch any filesystems mounted by
> the old kernel, or they will be corrupted after the restore.

Certainly.  Note that any filesystems that are available to the "save
state" kernel would have been specifically mounted under that kernel.
There isn't any real possibility of confusion over which filesystems are
safe to access.

>> >> - no need for complicated process freezing;
>> 
>> > In fact it's not complicated, at least 

Re: What is the kernel documentation mailing list ?

2007-06-01 Thread Chris Wright
* Jesper Juhl ([EMAIL PROTECTED]) wrote:
> On 01/06/07, Piyush K <[EMAIL PROTECTED]> wrote:
> >Hi,
> >What is the kernel documentation mailing list where I can discuss
> >kernel documentation changes ?
> >Please cc to my email address too.
> >Thanks,
> >PYK
> 
> As far as I know, there is no dedicated list for documentation issues.
> There was a [EMAIL PROTECTED] list at one point but I think it
> died - not 100% sure though.

There is some recent attempt to bring it back to life.

thanks,
-chris


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Linus Torvalds wrote:

> I'm not seeing the path on the freeing side that silently accepts BADPTR.
> 
> It is not ok to dereference BADPTR, but it's obviously ok to _free_ it.

Ok.

> Also, if I read the patch correctly, you _also_ return BADPTR for slabs 
> that are too large. No?  That would be wrong - those need to return NULL 
> for "out of memory".

A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in 
kmalloc_slab() will trigger.

Here is the updated patch. It works fine here:


SLUB: Return BADPTR instead of warning for kmalloc(0)

Remove the WARN_ON_ONCE and simply return BADPTR.

BADPTR can be used legitimately as long as it is not dereferenced.
Can even be freed.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |   18 --
 mm/slub.c                |   10 +-
 2 files changed, 13 insertions(+), 15 deletions(-)

Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h  2007-06-01 16:19:36.0 -0700
+++ slub/include/linux/slub_def.h   2007-06-01 16:43:57.0 -0700
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#define BADPTR ((void *)16)
+
 struct kmem_cache_node {
spinlock_t list_lock;   /* Protect partial list and nr_partial */
unsigned long nr_partial;
@@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[
  */
 static inline int kmalloc_index(size_t size)
 {
-   /*
-* We should return 0 if size == 0 (which would result in the
-* kmalloc caller to get NULL) but we use the smallest object
-* here for legacy reasons. Just issue a warning so that
-* we can discover locations where we do 0 sized allocations.
-*/
-   WARN_ON_ONCE(size == 0);
+
+   if (!size)
+   return 0;
 
if (size > KMALLOC_MAX_SIZE)
return -1;
@@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return kmem_cache_alloc(s, flags);
} else
@@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return kmem_cache_zalloc(s, flags);
} else
@@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t 
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return kmem_cache_alloc_node(s, flags, node);
} else
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-06-01 16:21:00.0 -0700
+++ slub/mm/slub.c  2007-06-01 16:43:27.0 -0700
@@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags
 
if (s)
return slab_alloc(s, flags, -1, __builtin_return_address(0));
-   return NULL;
+   return BADPTR;
 }
 EXPORT_SYMBOL(__kmalloc);
 
@@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t 
 
if (s)
return slab_alloc(s, flags, node, __builtin_return_address(0));
-   return NULL;
+   return BADPTR;
 }
 EXPORT_SYMBOL(__kmalloc_node);
 #endif
@@ -2338,7 +2338,7 @@ void kfree(const void *x)
struct kmem_cache *s;
struct page *page;
 
-   if (!x)
+   if (!x || x == BADPTR)
return;
 
page = virt_to_head_page(x);
@@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size
struct kmem_cache *s = get_slab(size, gfpflags);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return slab_alloc(s, gfpflags, -1, caller);
 }
@@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t
struct kmem_cache *s = get_slab(size, gfpflags);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return slab_alloc(s, gfpflags, node, caller);
 }


Re: megaraid.c, all kernel versions, problem with multi-luns

2007-06-01 Thread Andrew Morton
On Wed, 30 May 2007 20:09:44 -0300
"Reinaldo Carvalho" <[EMAIL PROTECTED]> wrote:

> Hi,

(cc's added) (CONFIG_SCSI_MULTI_LUN is set)

> I have a Dell PowerEdge Expandable RAID controller, with a hardware
> Raid-5 at Channel 01 running perfectly, and a nCipher Crypter at
> Channel 02.
> 
> This controller doesn't correctly detect devices (e.g. nCipher
> Crypter) with multiple LUNs. Only one LUN is detected.
> 
> At another controller (e.g. Adaptec 79xx) two LUNs were detected. I
> compiled 2.6.8, 2.6.18 and 2.6.21.3 to test the megaraid driver and all
> failed to detect two LUNs.
> 
> I think that this is a firmware problem, but I'd like to have some opinions.
> 
> I read some docs
> (http://www.suse.de/~garloff/linux/scsi-scan/scsi-scanning.html,
> http://www.ictp.trieste.it/~radionet/nuc1996/ref/howto-html/scsi-howto-2.html)
> and this problem doesn't seem to be simple.
> 
> Best regards,
> 
> More information with Dell PowerEdge Expandable RAID controller (LSI
> Logic MegaRaid):
> 
> Attached devices:
> Host: scsi0 Channel: 00 Id: 06 Lun: 00
>   Vendor: PE/PV    Model: 1x5 SCSI BP      Rev: 1.0
>   Type:   Processor                        ANSI SCSI revision: 02
> Host: scsi0 Channel: 01 Id: 00 Lun: 00
>   Vendor: nCipher  Model: Fastness Crypto  Rev: 2*00
>   Type:   Processor                        ANSI SCSI revision: 02
> Host: scsi0 Channel: 02 Id: 00 Lun: 00
>   Vendor: MegaRAID Model: LD 0 RAID5  279G Rev: 522A
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> 
> 
> 14:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller
> 4 (rev 06)
> Subsystem: Dell PowerEdge Expandable RAID Controller 4e/Di
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
> ParErr- Stepping+ SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
> >TAbort- SERR-  Latency: 64 (32000ns min), Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 22
> Region 0: Memory at d7ff (32-bit, prefetchable) [size=64K]
> Region 2: Memory at defc (32-bit, non-prefetchable) [size=256K]
> Expansion ROM at df00 [disabled] [size=128K]
> Capabilities: 
> 
> 14:0e.0 0104: 1028:0013 (rev 06)
> 
> 
> Information with Adaptec 79xx or other SCSI controllers:
> 
> Host: scsi0 Channel: 01 Id: 00 Lun: 00
>   Vendor: nCipher  Model: Fastness Crypto  Rev: 2*00
>   Type:   Processor                        ANSI SCSI revision: 02
> Host: scsi0 Channel: 01 Id: 00 Lun: 01
>   Vendor: nCipher  Model: Fastness Crypto  Rev: 2*00
>   Type:   Processor                        ANSI SCSI revision: 02
> 
> 


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Christoph Lameter wrote:
> 
> Something like this? (Not tested yet just for review):

I'm not seeing the path on the freeing side that silently accepts BADPTR.

It is not ok to dereference BADPTR, but it's obviously ok to _free_ it.

Also, if I read the patch correctly, you _also_ return BADPTR for slabs 
that are too large. No?  That would be wrong - those need to return NULL 
for "out of memory".

(But I only looked at the diff, not at the end result, so it may be that 
all the cases where you changed "return NULL" into "return BADPTR" were 
really just the size-zero ones - it just looked suspicious).

Linus
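
A minimal userspace sketch of the behaviour asked for above: a zero-sized request gets BADPTR, an oversized request gets NULL, and the free path silently accepts both. toy_alloc, toy_free and TOY_MAX_SIZE are hypothetical names, and the 32MB cap is only borrowed from the figure mentioned in this thread; this is not the SLUB code.

```c
#include <stddef.h>
#include <stdlib.h>

#define BADPTR       ((void *)16)
#define TOY_MAX_SIZE ((size_t)32 << 20)   /* assumed cap for this sketch */

void *toy_alloc(size_t size)
{
        if (size == 0)
                return BADPTR;  /* fine to hold and to free, never to dereference */
        if (size > TOY_MAX_SIZE)
                return NULL;    /* "no can do": looks like out of memory to the caller */
        return malloc(size);
}

void toy_free(void *x)
{
        if (!x || x == BADPTR)  /* both sentinels are legal arguments */
                return;
        free(x);
}
```

A caller that tests the result against NULL therefore treats a zero-sized allocation as a success and an oversized one as a failure, which is the distinction being asked for.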


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Linus Torvalds wrote:

> No, I don't think you can do it this way.

Ultimately not. But it's worth seeing whether this works.

> At a minimum, you'd need to test that the result is word-aligned. 
> Preferably 8-byte aligned. We literally have stuff that knows about these 
> things and uses the low bits in the pointer to keep extra data.

kmalloc allocations are guaranteed to be aligned to KMALLOC_MINALIGN. I 
can bring that in but it will make the patch less readable.

> Of course, there might be other (even more subtle) cases where we just 
> assume certain alignment, and depend on the fact that we just _happen_ to 
> get it. Who knows..

I tried to get rid of those cases and I hope that work is complete in 
2.6.22.



Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Christoph Lameter wrote:
> 
> Hmmm... We are going rapidly here. This is a patch that I am testing right 
> now. It right-adjusts the object and the patch is manageable:

No, I don't think you can do it this way.

At a minimum, you'd need to test that the result is word-aligned. 
Preferably 8-byte aligned. We literally have stuff that knows about these 
things and uses the low bits in the pointer to keep extra data.

(That said, I didn't check whether we actually kmalloc() the data, but I 
think we do - things like "struct key" etc).

Now, maybe those things always have a 8-byte-aligned size, and it works 
out, but I'd be worried.

Of course, there might be other (even more subtle) cases where we just 
assume certain alignment, and depend on the fact that we just _happen_ to 
get it. Who knows..

Linus
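
A minimal userspace sketch of the kind of code the alignment concern is about: extra data kept in the low bits of a pointer. tag_ptr, untag_ptr and ptr_tag are hypothetical helpers, not kernel APIs. The scheme only works if every pointer handed to it is 8-byte aligned, so an allocator that starts returning less-aligned objects would corrupt either the tag or the address.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)7)   /* low three bits are free if 8-byte aligned */

/* Store a small tag in the low bits of an 8-byte-aligned pointer. */
void *tag_ptr(void *p, unsigned tag)
{
        /* An unaligned pointer would have its address bits overwritten. */
        assert(((uintptr_t)p & TAG_MASK) == 0);
        return (void *)((uintptr_t)p | ((uintptr_t)tag & TAG_MASK));
}

/* Recover the original pointer by clearing the tag bits. */
void *untag_ptr(void *p)
{
        return (void *)((uintptr_t)p & ~TAG_MASK);
}

/* Read back the tag. */
unsigned ptr_tag(void *p)
{
        return (unsigned)((uintptr_t)p & TAG_MASK);
}
```

The assertion in tag_ptr is the check that a right-adjusted, unaligned object would trip; code without such a check would simply misbehave.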


Who is administering the kernel bugzilla?

2007-06-01 Thread Rafael J. Wysocki
Hi,

Can anyone please tell me who's administering the kernel bugzilla now?

I've tried to write to [EMAIL PROTECTED] , but this address seems
to point to nowhere.

The problem is that I'd like to have a new component called
"Hibernation-Suspend" in the kernel bugzilla's "Power Management" category and
bug reports with this component selected to be assigned to me automatically.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [2.6.22-rc3][ACPI?] Resume from s2r doesn't work.

2007-06-01 Thread Nigel Cunningham
Hi.

On Sat, 2007-06-02 at 00:37 +0200, Rafael J. Wysocki wrote:
> On Saturday, 2 June 2007 00:17, Olaf Dietsche wrote:
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> > 
> > > On Friday, 1 June 2007 23:12, Olaf Dietsche wrote:
> > >> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> > >> 
> > >> > On Friday, 1 June 2007 22:27, Olaf Dietsche wrote:
> > >> >> When I resume, everything seems to come up (fan becomes busy, disk and
> > >> >> dvd spin up for a short time),
> > >> >
> > >> > Hmm, what about the screen?
> > >> 
> > >> When the laptop is dead, screen remains black.
> > >> 
> > >> When I skip acpi_enter_sleep_state(), the screen works like everything
> > >> else.
> > >
> > > > I think you should try s2ram (http://en.opensuse.org/s2ram) as the first step.
> > 
> > It doesn't work. I tried all options "s2ram -f (-s, -p, -m, -r, -a 1, -a 2,
> > -a 3)" one after the other.
> > 
> > Since the screen (or any other device) works without problems, when I
> > skip acpi_enter_sleep_state(), I don't think it's screen related.
> 
> No, it might be, actually.  If you skip acpi_enter_sleep_state(), your machine
> doesn't really suspend, so in fact you only confirm that your drivers implement
> .suspend() and .resume() hooks correctly.

Actually, you don't even confirm that. Chips that would be powered down
by entering the sleep state will not be powered down in this scenario,
so failures to properly reinitialise them in resume routines won't be
noticed.

Regards,

Nige


signature.asc
Description: This is a digitally signed message part


Re: 2.6.22-rc3-mm1

2007-06-01 Thread Benjamin Herrenschmidt
On Fri, 2007-06-01 at 14:02 -0700, Andrew Morton wrote:
> 
> 
> Yeah, allmodconfig tends to fall over in a heap on a lot of the
> less-lavishly-maintained architectures.  If any of these are specific to
> -mm then I guess we should fix them up, prevent the kernel from actually
> going backwards.

Some of the latter seem to be related to the lack of CONFIG_PM. It's
not so much a lavish-maintainership issue as the fact that nobody ever
builds the powermac drivers without CONFIG_PM :-) I'll look into fixing
some of these.

As for the ps3 bits, it's a known problem, the ps3 support is still very
much a work in progress.

Cheers,
Ben.




Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Linus Torvalds wrote:

> So for *both* of the above reasons, it's actually stupid to return NULL 
> for a zero-sized allocation. It would be much better to return another 
> pointer that will trap on access. A good candidate might be to return
> 
>   #define BADPTR ((void *)16)

Something like this? (Not tested yet just for review):


SLUB: Return BADPTR instead of warning for kmalloc(0)

Remove the WARN_ON_ONCE and simply return BADPTR.

BADPTR can be used legitimately as long as it is not dereferenced.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |   18 --
 mm/slub.c                |    8 
 2 files changed, 12 insertions(+), 14 deletions(-)

Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h  2007-06-01 16:19:36.0 -0700
+++ slub/include/linux/slub_def.h   2007-06-01 16:24:54.0 -0700
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#define BADPTR ((void *)16)
+
 struct kmem_cache_node {
spinlock_t list_lock;   /* Protect partial list and nr_partial */
unsigned long nr_partial;
@@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[
  */
 static inline int kmalloc_index(size_t size)
 {
-   /*
-* We should return 0 if size == 0 (which would result in the
-* kmalloc caller to get NULL) but we use the smallest object
-* here for legacy reasons. Just issue a warning so that
-* we can discover locations where we do 0 sized allocations.
-*/
-   WARN_ON_ONCE(size == 0);
+
+   if (!size)
+   return 0;
 
if (size > KMALLOC_MAX_SIZE)
return -1;
@@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return kmem_cache_alloc(s, flags);
} else
@@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size,
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return kmem_cache_zalloc(s, flags);
} else
@@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t 
struct kmem_cache *s = kmalloc_slab(size);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return kmem_cache_alloc_node(s, flags, node);
} else
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-06-01 16:21:00.0 -0700
+++ slub/mm/slub.c  2007-06-01 16:27:12.0 -0700
@@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags
 
if (s)
return slab_alloc(s, flags, -1, __builtin_return_address(0));
-   return NULL;
+   return BADPTR;
 }
 EXPORT_SYMBOL(__kmalloc);
 
@@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t 
 
if (s)
return slab_alloc(s, flags, node, __builtin_return_address(0));
-   return NULL;
+   return BADPTR;
 }
 EXPORT_SYMBOL(__kmalloc_node);
 #endif
@@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size
struct kmem_cache *s = get_slab(size, gfpflags);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return slab_alloc(s, gfpflags, -1, caller);
 }
@@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t
struct kmem_cache *s = get_slab(size, gfpflags);
 
if (!s)
-   return NULL;
+   return BADPTR;
 
return slab_alloc(s, gfpflags, node, caller);
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT] Fix NR_syscalls in ARM

2007-06-01 Thread Russell King
On Sat, Jun 02, 2007 at 12:18:40AM +0100, Russell King wrote:
> On Fri, Jun 01, 2007 at 04:10:53PM -0700, Deepak Saxena wrote:
> > The -rt patch adds a NR_syscalls symbol to the arm/unistd.h but
> > it is not the correct value as there are 348 syscalls on ARM
> > and the existing change sets the symbol to 322.
> > 
> > Russell: Why isn't this in mainline? Other arches all seem to have 
> > this symbol already defined.
> 
> The hint is that it isn't in mainline; it's just plainly not required.
> It's also the wrong place to define it; it's not a property that
> unistd.h should concern itself with - it's a property of the kernel's
> branch table for calling the syscalls, and on ARM we calculate that
> number directly from the size of the kernel's branch table.
> 
> It's also not just last_syscall_number+1 since the table is sized to
> make the assembly easy - iow, a number divisible by 4.
> 
> So all in all, NR_syscalls in unistd.h is just utterly wrong.

BTW, it should be pointed out that you've found the exact reason why
putting it in unistd.h is _wrong_.  It's all too easy for it to get
out of sync with updates to the place where it really matters - the
code which bounds-checks the syscall number (that being the assembly
code which indexes the branch table.)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Andrew Morton wrote:
> 
> We could store the size of the allocation in the allocated object?  Just
> add four bytes to the user's request, then pick the appropriate cache based
> on that, then put the user's `size' at the tail of the resulting allocation?

It should be easy enough to do it for _most_ allocations by just doing it 
when there is already "enough slack" to do it (which is likely true most 
of the time).

IOW, if you ask for a 42-byte allocation, and we allocate from a 64-byte 
slab, you get the slab allocation at address X, you don't actually have to 
return "X" at all. Just return "X+8", and then you do:

 - at 32-bit word at X+0 you put the "real length"
 - at 32-bit word at X+4 you put some good redzone marker
 - at 32-bit word at "X + reallen + 8" you put the endzone marker.

And then you say: if the real length was within 12 bytes of the allocation 
length, we just don't do this.

So you wouldn't get any redzoning for those allocations that are exactly 
sized (or close enough) to fit in an allocation block, but I bet *most* 
allocations would get this for free.

And then, if you actually turn on redzoning, you just always add the 12 
bytes to the allocation size (assuming the alignment rules allow you to).

The nice thing about this is that the freeing path already knows where the 
object is *supposed* to start (because it sees the allocation size in the 
slub/slab data structures), so the kfree() path can actually figure out on 
its own whether it is given a "X" or an "X+8" kind of address.

So you don't actually need any extra information. You literally just need 
enough slop in the allocation that you can do this in the first place, so 
there is no cost (except for the cost of checking itself, of course).
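
A minimal userspace sketch of the layout described above, assuming a
64-byte slot. The marker values and the names slot_wrap/slot_check are
made up for illustration; in the allocator the slot base would come from
the slab data structures rather than being passed in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE  64u          /* assumed slab object size */
#define HDR_BYTES  8u           /* real length word + redzone word */
#define TAIL_BYTES 4u           /* endzone word */
#define REDZONE    0xbb5a5a5au  /* hypothetical marker values */
#define ENDZONE    0xcc6b6b6bu

/*
 * Prepare a raw slot ("X") for a request of reallen bytes and return the
 * pointer to hand out: X+8 when there is enough slack, plain X otherwise.
 */
static void *slot_wrap(unsigned char *slot, uint32_t reallen)
{
	uint32_t red = REDZONE, end = ENDZONE;

	if (reallen > SLOT_SIZE - HDR_BYTES - TAIL_BYTES)
		return slot;		/* within 12 bytes: skip the markers */

	memcpy(slot, &reallen, 4);			/* X+0: real length */
	memcpy(slot + 4, &red, 4);			/* X+4: redzone */
	memcpy(slot + HDR_BYTES + reallen, &end, 4);	/* X+reallen+8 */
	return slot + HDR_BYTES;
}

/*
 * Free-side check. The caller knows where the slot starts, so it can tell
 * an "X" pointer from an "X+8" pointer without extra bookkeeping.
 * Returns 0 if the object looks intact, -1 otherwise.
 */
static int slot_check(const unsigned char *slot, const void *ptr)
{
	uint32_t reallen, red, end;

	if (ptr == (const void *)slot)
		return 0;		/* plain object, nothing to verify */
	if (ptr != (const void *)(slot + HDR_BYTES))
		return -1;		/* neither X nor X+8 */

	memcpy(&reallen, slot, 4);
	memcpy(&red, slot + 4, 4);
	if (red != REDZONE || reallen > SLOT_SIZE - HDR_BYTES - TAIL_BYTES)
		return -1;
	memcpy(&end, slot + HDR_BYTES + reallen, 4);
	return end == ENDZONE ? 0 : -1;
}
```

With this, a 42-byte request in a 64-byte slot gets markers, while a
60-byte request is handed the raw slot and goes unchecked.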

Hmm?

Linus


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
On Fri, 1 Jun 2007, Christoph Lameter wrote:

> Hmmm... We are going rapidly here. This is a patch that I am testing right 
> now. It right-adjusts the object and the patch is manageable:

Does not boot, sigh.


[PATCH RT] enable interrupts in do_page_fault for users.

2007-06-01 Thread Steven Rostedt
Ingo,

To prevent scheduling while irqs are disabled, I changed do_page_fault
to force interrupts on.

The problem shows up when a user program faults at an address above
PAGE_OFFSET. My stupid program that I attached to my last email did
just that:

  unsigned long *p = (void*)-1;

  *p = 0xbed;


where (void*)-1 is an address greater than PAGE_OFFSET.  If I changed
that address to (void*)1, the program still segfaulted, but I didn't get
the warning about scheduling while interrupts disabled. I even put in a
print to show if interrupts are disabled in do_page_fault before calling
force_sig_info, and with (void*)-1 they were, and with (void*)1 they
were not disabled.

This patch forces interrupts to always be enabled when entering the user
fault code. Maybe this should also be applied to mainline?

This solves the one issue with scheduling while irqs disabled.

-- Steve

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

Index: linux-2.6.21-rt9/arch/x86_64/mm/fault.c
===
--- linux-2.6.21-rt9.orig/arch/x86_64/mm/fault.c
+++ linux-2.6.21-rt9/arch/x86_64/mm/fault.c
@@ -476,6 +476,10 @@ bad_area:
 bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (error_code & PF_USER) {
+
+   /* it's possible to have interrupts off here */
+   local_irq_enable();
+
if (is_prefetch(regs, address, error_code))
return;
 




Re: [PATCH RT] Fix NR_syscalls in ARM

2007-06-01 Thread Russell King
On Fri, Jun 01, 2007 at 04:10:53PM -0700, Deepak Saxena wrote:
> The -rt patch adds a NR_syscalls symbol to the arm/unistd.h but
> it is not the correct value as there are 348 syscalls on ARM
> and the existing change sets the symbol to 322.
> 
> Russell: Why isn't this in mainline? Other arches all seem to have 
> this symbol already defined.

The hint is that it isn't in mainline; it's just plainly not required.
It's also the wrong place to define it; it's not a property that
unistd.h should concern itself with - it's a property of the kernel's
branch table for calling the syscalls, and on ARM we calculate that
number directly from the size of the kernel's branch table.

It's also not just last_syscall_number+1 since the table is sized to
make the assembly easy - iow, a number divisible by 4.

So all in all, NR_syscalls in unistd.h is just utterly wrong.
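
A rough sketch of the idea in plain C rather than ARM assembly: the
bound used for range checking is derived from the table itself, and the
table is padded to a multiple of 4. The table contents, LAST_SYSCALL_NR
and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

static long sys_ni(void) { return -1; }	/* placeholder handler */

/* Round a count up to the next multiple of 4. */
#define ROUND_UP4(n) (((n) + 3u) & ~3u)

#define LAST_SYSCALL_NR 5u	/* hypothetical highest assigned number */

/*
 * Six real entries padded to eight slots. The padding slots are left NULL
 * here; a real table would point them at a "not implemented" handler.
 */
static const syscall_fn call_table[ROUND_UP4(LAST_SYSCALL_NR + 1u)] = {
	sys_ni, sys_ni, sys_ni, sys_ni, sys_ni, sys_ni,
};

/* The bound comes from the table, so the two cannot drift apart. */
static size_t nr_syscalls(void)
{
	return sizeof(call_table) / sizeof(call_table[0]);
}

static int syscall_in_range(size_t nr)
{
	return nr < nr_syscalls();
}
```

Here the bound is 8, not last_syscall_number+1 = 6, which is the
difference a separately maintained constant would have to track by hand.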

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


Re: Device Driver Etiquette

2007-06-01 Thread Daniel J Blueman

On 1 Jun, 19:40, "Lee Revell" <[EMAIL PROTECTED]> wrote:

On 6/1/07, Matthew Fredrickson <[EMAIL PROTECTED]> wrote:

> is it acceptable (although
> not nice) to simply fix it this way, by disabling irqs while it loads
> the firmware?

I would say to just disable IRQs while loading firmware.  Almost every
server I maintain has some vendor driver which generates a "many lost
ticks!" message on load.  As long as it's only done at module load
time it should be fine.


For anything ~10s or more, you'll probably also need to call the timer
update function to prevent a soft lockup warning from being generated.


Of course the best solution is to just get the driver into mainline.

Lee

--
Daniel J Blueman


Re: 2.6.22-rc3 hibernate(?) fails totally - regression

2007-06-01 Thread Rafael J. Wysocki
On Saturday, 2 June 2007 00:37, David Greaves wrote:
> Rafael J. Wysocki wrote:
> > On Friday, 1 June 2007 23:23, David Greaves wrote:
> >> Not a regression though, it does it in 2.6.21
> >>
> >> If I cause the system to save state to disk then whilst off it no longer
> >> responds to g-wol.
> > 
> > Can you please try with the hibernation and suspend patch series from
> > 
> > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc3/patches/
> > 
> > applied?
> > 
> > Greetings,
> > Rafael
> > 
> 
> Sorry I made a mistake in the report.
> I was still booting 2.6.21.1 - very sorry :(
> 
> The real situation is worse :(

Ouch.
 
> 2.6.22-rc3 (no patches) just hangs on suspend at:
> Suspending consoles
> 
> console switching works but needs a hard reset to reboot.
> 
> 2.6.22-rc3-skge (with Rafael's patches)
> suspends to disk and powers off
> wol doesn't work incidentally
> resume resumes to the exact same place that 2.6.22-rc3 hangs at...
> ie a non-responsive system saying
> Suspending consoles
> 
> Note, in both cases I can switch VTs, the caps/numlock lights respond.

Can you set CONFIG_DISABLE_CONSOLE_SUSPEND in .config and see where exactly it
fails?

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Christoph Lameter
> So a kmalloc(62) would get upped to 66, so we allocate from size-128
> and put the number 62 at bytes 124-127 and we poison bytes 62-123?

Hmmm... We are going rapidly here. This is a patch that I am testing right 
now. It right-adjusts the object and the patch is manageable:



SLUB mm-only: Right align kmalloc objects to trigger overwrite detection

Right align kmalloc objects if they are smaller than the full kmalloc slab
size. This moves the object flush with the end of the slab slot so that
writes / reads past the end of the kmalloc object are easy to detect.
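
The address arithmetic can be sketched in userspace as below. This is
only an illustration under the assumption that every slot is a power of
two in size and aligned to that size; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power-of-two slot (at least 8 bytes) that holds size bytes. */
static size_t slot_size_for(size_t size)
{
	size_t slot = 8;

	while (slot < size)
		slot <<= 1;
	return slot;
}

/* Address handed to the caller: the object ends where the slot ends. */
static uintptr_t right_align(uintptr_t base, size_t size)
{
	assert(size > 0);	/* size 0 would point into the next slot */
	return base + (slot_size_for(size) - size);
}

/* Free side: mask the address back down to the start of its slot. */
static uintptr_t slot_base(uintptr_t addr, size_t slot)
{
	return addr & ~((uintptr_t)slot - 1);
}
```

The mask only recovers the base when the slot size is a power of two, so
a cache of another size, such as the 96 and 192 byte ones, would need a
different way to find the start of the slot.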

Without slub_debug, overwrites will destroy the free pointer of the next
object or the next object itself. Reads will yield garbage that is likely zero.

With slub_debug, redzone checks will be triggered. Reads will read redzone
poison.

This patch is only for checking things out. There are issues:

1. Alignment of kmalloc objects may now be different. In particular
   objects whose size is not a multiple of wordsize may not be word aligned.

2. __kmalloc and kfree need to touch an additional cacheline in
   struct kmem_cache thereby reducing performance.

3. An object allocated via kmalloc may no longer be freed via kmem_cache_free.

So we need to figure out some way to make this configurable. Preferably
runtime configurable.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |   22 +++++++++++++++++++---
 mm/slub.c                |   11 ++++++++---
 2 files changed, 27 insertions(+), 6 deletions(-)

Index: slub/include/linux/slub_def.h
===
--- slub.orig/include/linux/slub_def.h  2007-06-01 15:56:42.0 -0700
+++ slub/include/linux/slub_def.h   2007-06-01 16:00:03.0 -0700
@@ -120,6 +120,19 @@ static inline struct kmem_cache *kmalloc
return _caches[index];
 }
 
+static inline unsigned long kmalloc_size(size_t size)
+{
+   int index = kmalloc_index(size);
+
+   if (index >= KMALLOC_SHIFT_LOW)
+   return 1 << index;
+
+   if (index == 2)
+   return 192;
+   return 96;
+}
+
+
 #ifdef CONFIG_ZONE_DMA
 #define SLUB_DMA __GFP_DMA
 #else
@@ -135,7 +148,8 @@ static inline void *kmalloc(size_t size,
if (!s)
return NULL;
 
-   return kmem_cache_alloc(s, flags);
+   return kmem_cache_alloc(s, flags)
+   + kmalloc_size(size) - size;
} else
return __kmalloc(size, flags);
 }
@@ -148,7 +162,8 @@ static inline void *kzalloc(size_t size,
if (!s)
return NULL;
 
-   return kmem_cache_zalloc(s, flags);
+   return kmem_cache_zalloc(s, flags)
+   + kmalloc_size(size) - size;
} else
return __kzalloc(size, flags);
 }
@@ -159,7 +174,8 @@ extern void *__kmalloc_node(size_t size,
 static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
-   struct kmem_cache *s = kmalloc_slab(size);
+   struct kmem_cache *s = kmalloc_slab(size) +
+   kmalloc_size(size) - size;
 
if (!s)
return NULL;
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-06-01 15:51:05.0 -0700
+++ slub/mm/slub.c  2007-06-01 16:15:21.0 -0700
@@ -2283,9 +2283,10 @@ static struct kmem_cache *get_slab(size_
 void *__kmalloc(size_t size, gfp_t flags)
 {
struct kmem_cache *s = get_slab(size, flags);
+   int offset = size - s->size;
 
if (s)
-   return slab_alloc(s, flags, -1, __builtin_return_address(0));
+   return slab_alloc(s, flags, -1, __builtin_return_address(0)) + offset;
return NULL;
 }
 EXPORT_SYMBOL(__kmalloc);
@@ -2294,9 +2295,10 @@ EXPORT_SYMBOL(__kmalloc);
 void *__kmalloc_node(size_t size, gfp_t flags, int node)
 {
struct kmem_cache *s = get_slab(size, flags);
+   int offset = size - s->size;
 
if (s)
-   return slab_alloc(s, flags, node, __builtin_return_address(0));
+   return slab_alloc(s, flags, node, __builtin_return_address(0)) + offset;
return NULL;
 }
 EXPORT_SYMBOL(__kmalloc_node);
@@ -2337,6 +2339,7 @@ void kfree(const void *x)
 {
struct kmem_cache *s;
struct page *page;
+   unsigned long addr = (unsigned long) x;
 
if (!x)
return;
@@ -2344,7 +2347,9 @@ void kfree(const void *x)
page = virt_to_head_page(x);
s = page->slab;
 
-   slab_free(s, page, (void *)x, __builtin_return_address(0));
+   addr &= ~((unsigned long)s->size - 1);
+
+   slab_free(s, page, (void *)addr, __builtin_return_address(0));
 }
 EXPORT_SYMBOL(kfree);
 

[PATCH RT] Fix NR_syscalls in ARM

2007-06-01 Thread Deepak Saxena

The -rt patch adds a NR_syscalls symbol to the arm/unistd.h but
it is not the correct value as there are 348 syscalls on ARM
and the existing change sets the symbol to 322.

Signed-off-by: Deepak Saxena <[EMAIL PROTECTED]>

---

Russell: Why isn't this in mainline? Other arches all seem to have 
this symbol already defined.

Index: linux-2.6.21/include/asm-arm/unistd.h
===
--- linux-2.6.21.orig/include/asm-arm/unistd.h
+++ linux-2.6.21/include/asm-arm/unistd.h
@@ -375,7 +375,7 @@
 #define __NR_kexec_load (__NR_SYSCALL_BASE+347)
 
 #ifndef __ASSEMBLY__
-#define NR_syscalls (__NR_set_mempolicy + 1 - __NR_SYSCALL_BASE)
+#define NR_syscalls (__NR_kexec_load + 1 - __NR_SYSCALL_BASE)
 #endif
 
 /*

-- 
Deepak Saxena - [EMAIL PROTECTED] - http://www.plexity.net 


Re: 2.6.22-rc1-mm1

2007-06-01 Thread H. Peter Anvin
Andy Whitcroft wrote:
> 
> I think that my debugging says that newsetup got the compressed kernel
> and decompressor into memory ok and execution passed to it normally.
> But I cannot figure out where the corruption is coming from.  I tried
> annotating the gzip decompressor to see if the input and output buffers
> were overlapping at any time and that debug said no (unsure how reliable
> that is).  And yet at some point the output image is munched up.
> 
> One last piece of information.  The decompressor also always seems to
> get to the end of the input stream in exactly the right place without
> reporting any kind of error, that is with exactly 8 bytes left over for
> the length and crc checks.  Which given the context sensitive nature of
> the algorithm tends to imply the input stream was ok for the whole
> duration of the decompress.  Yet the output stream is badly broken.
> 
> Anyone got any wacky suggestions ...
> 

It definitely sounds like a memory clobber of some sort.

Usual suspects, in addition to the input/output buffers you already
looked at, would be the heap and the stack.  Finding where the stack
pointer lives would be my first, instinctive guess.

-hpa


Re: [PATCH 1/1] V4L: stk11xx, add a new webcam driver

2007-06-01 Thread Mauro Carvalho Chehab

> > This seems to be an interesting approach.
> >
> >   
> Interesting but impossible to do for ioctl calls.
> When the application does a ioctl(fd_of_mnt_video0,VIDIOC_G_FMT,) 
> for example, there is no way for the userspace helper to catch this ioctl.
> The application could only open/read from the userspace helper's file 
> /mnt/video0.
> ioctl would still have to be done on the kernel device driver.
> I thought also about a /proc interface for decompression algorithms (a 
> helper would listen on a /proc file and write on another /proc file) but 
> /proc is not designed for that kind of thing.
> A separate library seems to be the simplest solution.

There are some ways for this to work. For example, you may create a
helper device for the daemon driver to bind, even requiring it to have
root permission.

-- 
Cheers,
Mauro



Re: A kexec approach to hibernation

2007-06-01 Thread Rafael J. Wysocki
On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> > On Friday, 1 June 2007 22:39, Jeremy Maitin-Shepard wrote:
> >> I figured I'd throw this idea out, since although it is not perfect, it
> >> has the potential to elegantly solve a lot of issues with hibernate.
> >> 
> >> Just as kexec can now be used to write a crashdump after a kernel panic,
> >> a fresh kexec-loaded kernel (loaded into unused memory) could be used to
> >> write the hibernate image of the existing kernel to disk, and then power
> >> off the system (or suspend to ram, or anything else).  This avoids the
> >> need for the original kernel to jump through hoops to hibernate itself
> >> in place.
> >> 
> >> A hibernate sequence would be approximately as follows:
> >> 
> >> 1. Free some memory if needed or desired, and disable the swap device
> >> if it is going to be used to write the hibernate image.
> 
> > Why to disable it?
> 
> To make sure that the swap data won't get clobbered by the writing of
> the image, if the swap device is to be used to write the hibernate
> image.  Presumably something similar is already done.  In any case this
> is not an important point.
> 
> >> 2. Load the fresh kernel in a chunk of available (possibly
> >> pre-allocated) memory (there must also be enough available memory
> >> for this kernel to use).
> >> 
> >> 3. Disable interrupts and stop all devices.
> 
> > Well, this is one of the hardest parts of hibernation, so no advantage
> > here.
> 
> It seems like support for this is mostly already in place though, and it
> needs to be done for suspend to ram, kexec, and shutdown anyway.
> 
> >> 4. Jump to the new kernel, passing whatever state information will be
> >> needed by it to know how to write the image.
> 
> > How would we know which data to write (more precisely, which data to
> > tell the other kernel to write)?  How do we pass this information to
> > the new kernel?
> 
> Just before jumping into the new kernel, with interrupts disabled, the
> old kernel could either prepare a data structure that specifies what
> pages are allocated, or alternatively simply provide a pointer to the
> relevant data structure in the old kernel.

But for this purpose the old kernel will actually need to do what is currently
done in swsusp while the image is being created (the only difference is that
we allocate memory in the process, but that's a detail only).

> I can't say exactly how this data would be given to the new kernel, but I
> can't imagine it being difficult.  (For instance, multiboot headers, the
> kernel command line, initrd, or some other mechanism could be used.)

Besides, you need to load the new kernel somehow.  If that's to work without
problems, that should be done before we switch off devices.

> >> 5. The new kernel loads, and then either kernel space or user space
> >> writes the necessary data from the old kernel to disk.
> 
> > You also need to reinitialize devices needed to write the image.
> 
> Yes.  That would be done, as normal, when the kernel loads.  Currently
> devices are suspended or stopped anyway before the atomic copy, and then
> reinitialized to write the image.  In theory, this stopping shouldn't be
> needed, and I mentioned that if additional support were added to some
> drivers for passing some information about the current state of the
> device, the device might only need to be partially shut down before
> jumping to the new kernel.  This might allow, for instance, avoiding
> spinning down and then up again the disks.

Well, I don't quite agree.  I think that for this purpose we'll need devices to
be initialized from scratch by the new kernel, so the old kernel should put
them into states that allow this to be done.

We are going to implement something like this anyway, but that's a rather long
way to go.

> >> 6. The new kernel either powers off or suspends to ram.  If it suspends
> >> to ram, then it would need to be able to jump back to the old kernel
> >> when it resumes from ram.
> 
> > What if the user wants to abort the hibernation?
> 
> This would be handled in effectively the same way as if the user wants
> to suspend to ram after writing the image: it would be necessary to jump
> back to the old kernel.  This would effectively be handled in the same
> way as a resume, except that the copying back of memory would be
> avoided.  Presumably the image writing kernel would have devices in
> approximately the same state as the image loading kernel, and so the old
> kernel needs to be prepared to receive the devices in that state anyway.

Please see above.  I don't think that would be easy to arrange for.

> >> The advantages of this approach include:
> >> 
> >> - having a completely functional system (with a completely functional
> >> userspace) from which the image is written, without having to worry
> >> about messing up the state that is being saved (hell, the user could
> >> even do it via an interactive 

Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 15:41:48 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 1 Jun 2007, Andrew Morton wrote:
> 
> > > I should make SLUB put poisoning values in unused areas of a kmalloced 
> > > object?
> > 
> > hm, I hadn't thought of it that way actually.  I was thinking it was
> > specific to kmalloc(0) but as you point out, the situation is
> > generalisable.
> 
> Right it could catch a lot of other bugs as well.
> 
> > Yes, if someone does kmalloc(42) and we satisfy the allocation from the
> > size-64 slab, we should poison and then check the allegedly-unused 22
> > bytes.
> > 
> > Please ;)
> > 
> > (vaguely stunned that we didn't think of doing this years ago).
> 
> Well there are architectural problems. We determine the power of two slab 
> at compile time. The object size information is currently not available in 
> the binary :=).
>  
> > It'll be a large patch, I expect?
> 
> Ummm... Yes. We need to switch off the compile time power of two slab 
> calculation. Then I need to have some way of storing the object size in 
> the metainformation of each object. Changes a lot of function calls.

Oh well.  Don't lose any sleep over it ;)



We could store the size of the allocation in the allocated object?  Just
add four bytes to the user's request, then pick the appropriate cache based
on that, then put the user's `size' at the tail of the resulting allocation?

So a kmalloc(62) would get upped to 66, so we allocate from size-128
and put the number 62 at bytes 124-127 and we poison bytes 62-123?
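
To make the arithmetic concrete, here is a small userspace sketch of
that scheme. It assumes power-of-two caches starting at 32 bytes and a
made-up poison byte; none of the names come from the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_BYTE 0x5a	/* hypothetical poison value */

/* Pick the cache for the request plus the 4-byte size field. */
static size_t cache_size_for(size_t size)
{
	size_t need = size + sizeof(uint32_t);
	size_t slot = 32;

	while (slot < need)
		slot <<= 1;
	return slot;
}

/* Poison the unused bytes and store the requested size in the tail. */
static void tag_object(unsigned char *obj, size_t size)
{
	size_t slot = cache_size_for(size);
	uint32_t len = (uint32_t)size;

	memset(obj + size, POISON_BYTE, slot - sizeof(len) - size);
	memcpy(obj + slot - sizeof(len), &len, sizeof(len));
}

/* Returns 0 if the slack is still poisoned, -1 if something wrote to it. */
static int check_object(const unsigned char *obj, size_t slot)
{
	uint32_t len;
	size_t i;

	memcpy(&len, obj + slot - sizeof(len), sizeof(len));
	if (len > slot - sizeof(len))
		return -1;		/* size field itself was clobbered */
	for (i = len; i < slot - sizeof(len); i++)
		if (obj[i] != POISON_BYTE)
			return -1;
	return 0;
}
```

For a 62-byte request this picks the 128-byte slot, poisons bytes 62-123
and stores 62 in bytes 124-127, matching the numbers above.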


Re: [PATCH 1/1] V4L: stk11xx, add a new webcam driver

2007-06-01 Thread Mauro Carvalho Chehab

> >> + * Copyright (C) Nicolas VIVIEN
> > 
> > It would be interesting to have Nicolas SOB as well, if possible.
> 
> I don't think he even knows about this version of the driver. I got his GPL
> driver, cleaned up -- coding style, v4l1 and v4l2 ioctl conversion to v4l2
> functions, some bug fixes and so on... If you still want him to sign this
> off, I'll try my best to catch him but can't guarantee any results.

It would be nice. I can accept it without his ack, but it would be
better to have it, if possible.
> 
> >> +
> >> +#ifndef CONFIG_STK11XX_DEBUG_STREAM
> >> +#define CONFIG_STK11XX_DEBUG_STREAM   0
> >> +#endif
> >> +
> >> +#if CONFIG_STK11XX_DEBUG_STREAM
> > 
> > I would instead use:
> > #ifdef CONFIG_STK11XX_DEBUG_STREAM
> 
> Hmm, no, I would rather get rid of CONFIG_ thing, it may make things
> unclear, because there is (will be) no option in Kconfig for this, because
> this is the most verbose option for the driver mainly used for algorithms
> debugging.
Seems ok to me.
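
For reference, a small sketch of why the two forms are not
interchangeable once the macro is given a default of 0. The macro name
follows the suggestion to drop the CONFIG_ prefix and is otherwise
hypothetical.

```c
#include <assert.h>

#ifndef STK11XX_DEBUG_STREAM
#define STK11XX_DEBUG_STREAM 0	/* default: stream debugging off */
#endif

/* "#if" tests the value, so a default of 0 keeps the code out. */
static int enabled_by_if(void)
{
#if STK11XX_DEBUG_STREAM
	return 1;
#else
	return 0;
#endif
}

/* "#ifdef" only tests whether the name exists, so 0 still enables it. */
static int enabled_by_ifdef(void)
{
#ifdef STK11XX_DEBUG_STREAM
	return 1;
#else
	return 0;
#endif
}
```

With the default in place the first function returns 0 and the second
returns 1, so switching to #ifdef would also mean dropping the default
define.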

> > We don't do format conversions in kernel. Instead, you should return a
> > proper Bayer Fourcc format (like V4L2_PIX_FMT_SBGGR8).
> 
> 
> Ok, there is a debate about this, I will do the changes after some decision
> will be made.

As you wish.

> > Please use instead the load_firmware routines. It is not a good idea to
> > have firmware inside the kernel. Also, this might raise some legal issues
> > due to licensing models.
> 
> Markus wrote:
> 
> Jiri, are you allowed to include that microcode, did you get any
> information about this from the manufacturer which could allow the
> inclusion?
> The sequences are rather small not putting it into extra firmware
> files would make life much easier for some users, on the other side if
> it raises legal issues Mauro's right with loading it from a file
> 
> 
> This seems to be a reverse engineered driver, I think, all those values are
> intercepted, so there are no licensing issues.

Is that code, or just internal driver configuration data (for example,
maybe some register initialization inside the sensors)? We should take
care to avoid adding material here that someone could complain about
later.
> > 
> > Instead of using all those write, you should consider creating a table
> > of values and use something like:
> > stk11xx_write_regs(dev, table1);
> 
> There is a problem with this approach. There are reads every 3-5 writes and
> this can grow into many small tables.

Maybe you can do this then just for the bigger tables.
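
Something along these lines is what I mean, as a rough sketch only. The
device struct, the register numbers and values, and the helper names are
invented for illustration and are not taken from the stk11xx driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct reg_val {
	uint16_t reg;
	uint8_t val;
};

/* Stand-in for the real device: just records what was written. */
struct fake_dev {
	uint8_t regs[256];
	unsigned int writes;
};

static int dev_write(struct fake_dev *dev, uint16_t reg, uint8_t val)
{
	if (reg >= sizeof(dev->regs))
		return -1;
	dev->regs[reg] = val;
	dev->writes++;
	return 0;
}

/* Write a whole table, stopping at the first failing write. */
static int write_regs(struct fake_dev *dev, const struct reg_val *tbl,
		      size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = dev_write(dev, tbl[i].reg, tbl[i].val);

		if (ret)
			return ret;
	}
	return 0;
}

/* One run of consecutive writes; made-up values. */
static const struct reg_val init_table[] = {
	{ 0x00, 0x24 }, { 0x02, 0x68 }, { 0x03, 0x80 },
};
```

Each run of writes between two reads becomes one table, so only the long
runs are worth converting.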

> > You may also consider writing a separate c file for stk1135. Having a
> > large .c file is not very nice. The better is to split the code into a
> > few parts.
> 
> I don't like many files for one driver and finding little pieces of code
> in each file separately -- 1125 + 1235 will be small pieces. Not considering
> the static functions and warning about unused code. But it's up to you, it's
> your subtree, make a decision.

Your driver has about 3600 lines. We aim to keep newer files to a
maximum of about 1000 lines each (unfortunately, some drivers are
bigger than that), separating the driver into logical pieces. I think
it would be interesting to split it into two or three files.

> 
> >> +static void *stk11xx_rvmalloc(unsigned long size)
> > 
> > Another rvmalloc implementation? You should consider using the one
> > already at kernel.
> 
> What's the name, I can't find it?

There are some rvmalloc implementations in cpia, cpia2, em28xx, ...

What the current drivers are doing is to replace it with vmalloc_32:

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,15)
	if ((buff = rvmalloc(dev->num_frames * imagesize))) {
#else
	if ((buff = vmalloc_32(dev->num_frames * imagesize))) {
#endif

> 
> The rest of comments has been applied, thanks,

You're welcome.

-- 
Cheers,
Mauro



Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning

2007-06-01 Thread Linus Torvalds


On Fri, 1 Jun 2007, Christoph Lameter wrote:
> 
> Right it could catch a lot of other bugs as well.

I'd actually prefer "malloc(0)" to _not_ return NULL, but some known 
(non-NULL) bogus pointer.

Why?

Because it's quite sane to have simple logic like

	ptr = malloc(size);
	if (!ptr)
		return -ENOMEM;

and writing it as

	if (size && !ptr)
		return -ENOMEM;

is just annoying.

Also, NULL is _special_. There are absolutely tons of code in the kernel 
(and elsewhere) that just does something *different* from NULL pointers, 
and that totally breaks the whole notion of "you can allocate a zero-sized 
allocation, you just must not dereference it". If people special-case 
NULL as something else, they won't even go through the normal code-path.

So for *both* of the above reasons, it's actually stupid to return NULL 
for a zero-sized allocation. It would be much better to return another 
pointer that will trap on access. A good candidate might be to return

#define BADPTR ((void *)16)

which is a portable-enough (where "portable-enough" is "against strict 
ANSI C rules, but works in practice on all architectures") way to return 
something that will cause the same page fault behaviour as NULL, but will 
_not_ trigger the "NULL is special" code.

(Of course, you then need to teach "kfree()" to accept it as another 
pointer to be ignored, that's fine).

I bet you'd find *more* problems that way than by returning NULL, and 
you'd also avoid the whole problem with "if (!ptr) return -ENOMEM".
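
A userspace sketch of that behaviour, built on malloc/free rather than
the slab allocator. The wrapper names are made up; only the BADPTR
definition comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define BADPTR ((void *)16)

/* Zero-sized requests get the non-NULL marker instead of NULL. */
static void *alloc_sketch(size_t size)
{
	if (size == 0)
		return BADPTR;
	return malloc(size);
}

/* The free side ignores the marker, just as it ignores NULL. */
static void free_sketch(void *ptr)
{
	if (!ptr || ptr == BADPTR)
		return;
	free(ptr);
}

/* The caller keeps the simple check, with no "size &&" special case. */
static int make_buffer(size_t size, void **out)
{
	void *ptr = alloc_sketch(size);

	if (!ptr)
		return -ENOMEM;
	*out = ptr;
	return 0;
}
```

A zero-sized request now succeeds through the normal code path, and any
attempt to dereference the result would still fault.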

Linus


Re: [PATCH] Introduce O_CLOEXEC (take >2)

2007-06-01 Thread Kyle McMartin
On Fri, Jun 01, 2007 at 03:17:03PM -0400, Byron Stanoszek wrote:
> These are octal values, so you really want to use 01000 instead of
> 0800. :-)
> 

Wow. I am totally a dumbass, I saw a 'x' there. Sigh.

diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h
index 317851f..7089507 100644
--- a/include/asm-parisc/fcntl.h
+++ b/include/asm-parisc/fcntl.h
@@ -14,6 +14,7 @@
 #define O_DSYNC     0100 /* HPUX only */
 #define O_RSYNC     0200 /* HPUX only */
 #define O_NOATIME   0400
+#define O_CLOEXEC   01000 /* set close_on_exec */
 
 #define O_DIRECTORY 0001 /* must be a directory */
 #define O_NOFOLLOW  0200 /* don't follow links */
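
To illustrate the octal point raised above: in C, a constant with a leading 0 is octal and one with a leading 0x is hexadecimal, so the same digits name different bits. A minimal sketch follows; the FLAG_* names and is_single_bit() are hypothetical and are not taken from any kernel header.

```c
/* Hypothetical flag values, for illustration only. */
#define FLAG_OCTAL 01000	/* octal: 1 * 8^3 = 512 (bit 9) */
#define FLAG_HEX   0x800	/* hex:   8 * 16^2 = 2048 (bit 11) */

/* Returns 1 if exactly one bit is set in v, else 0. */
int is_single_bit(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}
```

Writing 0800 without the 'x' does not compile at all, because 8 is not a valid octal digit.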


Re: [3/3] 2.6.22-rc3: known regressions with patches v2

2007-06-01 Thread Antonino Daplas

On 6/1/07, Michal Piotrowski <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> Subject    : tty-related oops in latest kernel(s)
> References : http://lkml.org/lkml/2007/5/27/104
> Submitter  : Tero Roponen <[EMAIL PROTECTED]>
> Handled-By : Antonino A. Daplas <[EMAIL PROTECTED]>
> Patch      : http://lkml.org/lkml/2007/5/31/35
> Status     : patch available


It's not actually a regression, but a long-standing, undetected bug
exposed by slub.

Anyway, the patch is already in Linus' tree.  You can remove this from
your list.

Tony


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:47:35PM -0700, Andrew Morton wrote:
> Right.  I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top
> of Nick's stuff.  Then I repulled your tree and lost it all.  This is
> because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a
> separate fix-rejects-in-git-ocfs2.patch.
> 
> This is all unique-to-akpm stuff which you don't need to worry about ;)

Ok, I am no longer concerned! :)


> > So, which of Nick's patches are we talking about here?
> > 
> > Btw, I know you tend to handle rejects yourself, but if it's a major PITA
> > I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
> > trouble there :)
> 
> Is OK - I'll move Nick's patches back to behind the git trees and it'll all
> come good.

Phew ok. Once again, thanks for all the work you do getting the ocfs2 git
patches into -mm.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]

