Re: crash in kmem_cache_init

2008-01-23 Thread Mel Gorman
On (23/01/08 13:14), Olaf Hering didst pronounce:
> On Wed, Jan 23, Mel Gorman wrote:
> 
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
> 
> It boots with your change.
> 

... Nice one! As the only addition here is debugging output, I can
only assume that the two patches were being booted in isolation instead
of in combination earlier. The two threads have been a little confused
with hand waving, so that can easily happen.

Looking at your log:

> early_node_map[1] active PFN ranges
> 1:0 ->   892928

All memory on node 1

> Online nodes
> o 0
> o 1
> Nodes with regular memory
> o 1
> Current running CPU 0 is associated with node 0
> Current node is 0

The running CPU is associated with node 0, which has no memory. So apart
from the memory being on node 1 instead of node 2, your machine is similar
to the one I had the problem on in terms of memoryless nodes and CPU
configuration.
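
To make the difference between the two node sets concrete, here is a
minimal user-space sketch in C. It is not kernel code: the struct, the
array and the function names are hypothetical, and only the node values
(nodes 0 and 1 online, only node 1 with memory, boot CPU on node 0) come
from the log above. It models what happens to the boot node's list when
the setup loop walks nodes with regular memory instead of online nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 2

/* Hypothetical model of the node states printed in the log. */
struct node_state {
    bool online;         /* listed under "Online nodes" */
    bool normal_memory;  /* listed under "Nodes with regular memory" */
};

/* From the log: nodes 0 and 1 are online, only node 1 has memory. */
static const struct node_state nodes[MAX_NODES] = {
    { .online = true, .normal_memory = false },
    { .online = true, .normal_memory = true  },
};

/* Stand-in for the per-node kmem_list3 structures. */
static int list3_storage[MAX_NODES];

/*
 * Fill nodelists[] the way the setup loops do. With memory_only set the
 * walk mimics for_each_node_state(node, N_NORMAL_MEMORY); otherwise it
 * mimics for_each_online_node(node).
 */
static void setup_nodelists(int *nodelists[MAX_NODES], bool memory_only)
{
    for (int nid = 0; nid < MAX_NODES; nid++) {
        nodelists[nid] = NULL;
        if (!nodes[nid].online)
            continue;
        if (memory_only && !nodes[nid].normal_memory)
            continue;
        nodelists[nid] = &list3_storage[nid];
    }
}

/* True when the node the boot CPU runs on ends up with a list. */
static bool boot_node_has_list(bool memory_only, int boot_node)
{
    int *nodelists[MAX_NODES];

    assert(boot_node >= 0 && boot_node < MAX_NODES);
    setup_nodelists(nodelists, memory_only);
    return nodelists[boot_node] != NULL;
}
```

In this model, boot_node_has_list(true, 0) is false while
boot_node_has_list(false, 0) is true, which matches the observation that
walking online nodes gives the memoryless boot node a list to use.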

> VFS: Cannot open root device "" or unknown-block(0,0)
> Please append a correct "root=" boot option; here are the available partitions:
> Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> Rebooting in 1 seconds..
> 

I see it failed to complete boot, but I'm going to assume this is a
relatively normal command-line, .config or initrd problem and not a
regression of some type.

I'll post a patch suitable for pick-up shortly. The two patches ran in
combination with CONFIG_DEBUG_SLAB and compile-based stress tests without
difficulty, so hopefully there are no new surprises hiding in the corners.

Thanks Olaf.

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: crash in kmem_cache_init

2008-01-23 Thread Olaf Hering
On Wed, Jan 23, Olaf Hering wrote:

> On Wed, Jan 23, Mel Gorman wrote:
> 
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
> 
> It boots with your change.

This version of the patch boots ok for me:
Maybe I made a mistake with earlier patches, no idea.

---
 mm/slab.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
/* Replace the static kmem_list3 structures for the boot cpu */
init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-   for_each_node_state(nid, N_NORMAL_MEMORY) {
+   for_each_online_node(nid) {
init_list(malloc_sizes[INDEX_AC].cs_cachep,
  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct 
 {
int node;
 
-   for_each_node_state(node, N_NORMAL_MEMORY) {
+   for_each_online_node(node) {
cachep->nodelists[node] = &initkmem_list3[index + node];
cachep->nodelists[node]->next_reap = jiffies +
REAPTIMEOUT_LIST3 +
@@ -2099,7 +2099,7 @@ static int __init_refok setup_cpu_cache(
g_cpucache_up = PARTIAL_L3;
} else {
int node;
-   for_each_node_state(node, N_NORMAL_MEMORY) {
+   for_each_online_node(node) {
cachep->nodelists[node] =
kmalloc_node(sizeof(struct kmem_list3),
GFP_KERNEL, node);
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
/* Take the l3 list lock to change the colour_next on this node */
check_irq_off();
l3 = cachep->nodelists[nodeid];
+   if (!l3) {
+   nodeid = numa_node_id();
+   l3 = cachep->nodelists[nodeid];
+   }
+   BUG_ON(!l3);
spin_lock(&l3->list_lock);
 
/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *cache_alloc_node(struct
int x;
 
l3 = cachep->nodelists[nodeid];
+   if (!l3) {
+   nodeid = numa_node_id();
+   l3 = cachep->nodelists[nodeid];
+   }
BUG_ON(!l3);
 
 retry:
@@ -3815,7 +3824,7 @@ static int alloc_kmemlist(struct kmem_ca
struct array_cache *new_shared;
struct array_cache **new_alien = NULL;
 
-   for_each_node_state(node, N_NORMAL_MEMORY) {
+   for_each_online_node(node) {
 
 if (use_alien_caches) {
 new_alien = alloc_alien_cache(node, cachep->limit);
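
The !l3 checks added to cache_grow() and cache_alloc_node() above can be
sketched in user space as follows. This is only a model under stated
assumptions: struct list3, struct cache and pick_node() are hypothetical
names, the running node is passed in as local_node where the kernel calls
numa_node_id(), and the BUG_ON() is represented by a -1 return so the
behaviour can be checked without aborting.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 2

/* Stand-in for struct kmem_list3. */
struct list3 {
    int node;
};

/* Stand-in for the nodelists[] part of struct kmem_cache. */
struct cache {
    struct list3 *nodelists[MAX_NODES];
};

/*
 * Mirror of the added check: use the requested node's list if it exists,
 * otherwise fall back to the list of the node we are running on.
 * Returns the node actually used, or -1 where the patch would hit
 * BUG_ON(!l3).
 */
static int pick_node(const struct cache *c, int nodeid, int local_node)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    assert(local_node >= 0 && local_node < MAX_NODES);

    if (c->nodelists[nodeid] == NULL)
        nodeid = local_node;
    if (c->nodelists[nodeid] == NULL)
        return -1;
    return nodeid;
}
```

Note that the fallback only helps when the running node itself has a
list. If neither the requested node nor the running node has one, the
model returns -1, which corresponds to the BUG_ON() firing.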


Re: crash in kmem_cache_init

2008-01-23 Thread Olaf Hering
On Wed, Jan 23, Mel Gorman wrote:

> Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> following patch against 2.6.24-rc8 please? It contains the debug information
> that helped me figure out what was going wrong on the PPC64 machine here,
> the revert and the !l3 checks (i.e. the two patches that made machines I
> have access to work). Thanks

It boots with your change.


boot: x
Please wait, loading kernel...
Allocated 00a0 bytes for kernel @ 0020
   Elf64 kernel loaded...
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: debug xmon=on panic=1 loglevel=8 
memory layout at init:
  alloc_bottom : 00ac1000
  alloc_top: 1000
  alloc_top_hi : da00
  rmo_top  : 1000
  ram_top  : da00
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done
instantiating rtas at 0x0f6a1000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00cc2000 -> 0x00cc34e4
Device tree struct  0x00cc4000 -> 0x00cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #52 SMP Wed Jan 23 13:05:38 CET 2008
-
ppc64_pft_size= 0x1c
physicalMemorySize= 0xda00
htab_hash_mask= 0x1f
-
Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #52 SMP Wed Jan 23 13:05:38 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1 loglevel=8 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Online nodes
o 0
o 1
Nodes with regular memory
o 1
Current running CPU 0 is associated with node 0
Current node is 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 1
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 2
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 3
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 4
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 5
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 6
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 7
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 8
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 9
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 10
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 11
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 12
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 13
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 14
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 15
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 16
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 17
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 18
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 19
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 20
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 21
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 22
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 23
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 24
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 25
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 26
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 27
 o 

Re: crash in kmem_cache_init

2008-01-23 Thread Mel Gorman
On (23/01/08 08:58), Olaf Hering didst pronounce:
> On Tue, Jan 22, Christoph Lameter wrote:
> 
> > > 0xc00fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > > 2106                            BUG_ON(!cachep->nodelists[node]);
> > > 2107                            kmem_list3_init(cachep->nodelists[node]);
> > > 2108                    }
> > > 2109            }
> > > 2110    }
> > 
> > if (cachep->nodelists[numa_node_id()])
> > return;
> 
> Does not help.
> 

Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
following patch against 2.6.24-rc8 please? It contains the debug information
that helped me figure out what was going wrong on the PPC64 machine here,
the revert and the !l3 checks (i.e. the two patches that made machines I
have access to work). Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-clean/mm/slab.c linux-2.6.24-rc8-015_debug_slab/mm/slab.c
--- linux-2.6.24-rc8-clean/mm/slab.c   2008-01-16 04:22:48.0 +
+++ linux-2.6.24-rc8-015_debug_slab/mm/slab.c   2008-01-23 10:44:36.0 +
@@ -348,6 +348,7 @@ static int slab_early_init = 1;
 
 static void kmem_list3_init(struct kmem_list3 *parent)
 {
+   printk(" o kmem_list3_init\n");
INIT_LIST_HEAD(&parent->slabs_full);
INIT_LIST_HEAD(&parent->slabs_partial);
INIT_LIST_HEAD(&parent->slabs_free);
@@ -1236,6 +1237,7 @@ static int __cpuinit cpuup_prepare(long 
 * kmem_list3 and not this cpu's kmem_list3
 */
 
+   printk("cpuup_prepare %ld\n", cpu);
list_for_each_entry(cachep, &cache_chain, next) {
/*
 * Set up the size64 kmemlist for cpu before we can
@@ -1243,6 +1245,7 @@ static int __cpuinit cpuup_prepare(long 
 * node has not already allocated this
 */
if (!cachep->nodelists[node]) {
+   printk(" o allocing %s %d\n", cachep->name, node);
l3 = kmalloc_node(memsize, GFP_KERNEL, node);
if (!l3)
goto bad;
@@ -1256,6 +1259,7 @@ static int __cpuinit cpuup_prepare(long 
 * protection here.
 */
cachep->nodelists[node] = l3;
+   printk(" o l3 setup\n");
}
 
spin_lock_irq(&cachep->nodelists[node]->list_lock);
@@ -1320,6 +1324,7 @@ static int __cpuinit cpuup_prepare(long 
}
return 0;
 bad:
+   printk(" o bad\n");
cpuup_canceled(cpu);
return -ENOMEM;
 }
@@ -1405,6 +1410,7 @@ static void init_list(struct kmem_cache 
spin_lock_init(&ptr->list_lock);
 
MAKE_ALL_LISTS(cachep, ptr, nodeid);
+   printk("init_list RESETTING %s node %d\n", cachep->name, nodeid);
cachep->nodelists[nodeid] = ptr;
local_irq_enable();
 }
@@ -1427,10 +1433,23 @@ void __init kmem_cache_init(void)
numa_platform = 0;
}
 
+   printk("Online nodes\n");
+   for_each_online_node(node)
+   printk("o %d\n", node);
+   printk("Nodes with regular memory\n");
+   for_each_node_state(node, N_NORMAL_MEMORY)
+   printk("o %d\n", node);
+   printk("Current running CPU %d is associated with node %d\n",
+   smp_processor_id(),
+   cpu_to_node(smp_processor_id()));
+   printk("Current node is %d\n",
+   numa_node_id());
+
for (i = 0; i < NUM_INIT_LISTS; i++) {
kmem_list3_init(&initkmem_list3[i]);
if (i < MAX_NUMNODES)
cache_cache.nodelists[i] = NULL;
+   printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, i);
}
 
/*
@@ -1468,6 +1487,8 @@ void __init kmem_cache_init(void)
cache_cache.colour_off = cache_line_size();
cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
+   printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, node);
+   printk("kmem_cache_init Setting %s initkmem_list3 %d\n", cache_cache.name, node);
 
/*
 * struct kmem_cache size depends on nr_node_ids, which
@@ -1590,7 +1611,7 @@ void __init kmem_cache_init(void)
/* Replace the static kmem_list3 structures for the boot cpu */
init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-   for_each_node_state(nid, N_NORMAL_MEMORY) {
+   for_each_online_node(nid) {
init_list(malloc_sizes[INDEX_AC].cs_cachep,
  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,11 +1989,13 @@ static void __init set_up_list3s(struct 
 {
int node;
 
-   for_each_node_state(node, N_NORMAL_MEMORY) {
+   printk("set_up_list3s %s index %d\n", cachep->name, index);
+   for_each_online_node(node) {
   

Re: crash in kmem_cache_init

2008-01-23 Thread Olaf Hering
On Wed, Jan 23, Pekka Enberg wrote:

> Hi Christoph,
> 
> On Jan 23, 2008 1:18 AM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > My patch is useless (fascinating history of the changelog there though).
> > fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
> > alloc_pages_node() will try to allocate on the current node but fall back
> > to a neighboring node if nothing is there.
> 
> Sure, but I was referring to the scenario where current node _has_
> pages available but no ->nodelists. Olaf, did you try it?

Does not help.


Re: crash in kmem_cache_init

2008-01-23 Thread Pekka Enberg
Hi Christoph,

On Jan 23, 2008 1:18 AM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> My patch is useless (fascinating history of the changelog there though).
> fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
> alloc_pages_node() will try to allocate on the current node but fall back
> to a neighboring node if nothing is there.

Sure, but I was referring to the scenario where current node _has_
pages available but no ->nodelists. Olaf, did you try it?

Pekka
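
The allocation behaviour discussed above (strict node-local allocation
with GFP_THISNODE versus falling back to another node without it) can be
illustrated with a small user-space sketch. Everything here is
hypothetical: free_pages[] and alloc_page_model() are made-up names, the
page counts are invented except that node 0 is empty, and the fallback
order is a plain index walk rather than a real zonelist.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 2

/* Free pages per node; invented values, with node 0 being memoryless. */
static int free_pages[MAX_NODES] = { 0, 4 };

/*
 * Try to take one page from the preferred node. With thisnode set the
 * call fails if that node is empty (the GFP_THISNODE behaviour);
 * otherwise the other nodes are tried in index order as a crude
 * stand-in for the zonelist fallback.
 * Returns the node the page came from, or -1 on failure.
 */
static int alloc_page_model(int preferred, bool thisnode)
{
    assert(preferred >= 0 && preferred < MAX_NODES);

    if (free_pages[preferred] > 0) {
        free_pages[preferred]--;
        return preferred;
    }
    if (thisnode)
        return -1;
    for (int nid = 0; nid < MAX_NODES; nid++) {
        if (free_pages[nid] > 0) {
            free_pages[nid]--;
            return nid;
        }
    }
    return -1;
}
```

In this model a request for node 0 fails with thisnode set and is served
from node 1 without it. The sketch only covers where pages come from; it
says nothing about whether the node has a ->nodelists entry, which is the
separate scenario asked about above.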


Re: crash in kmem_cache_init

2008-01-22 Thread Olaf Hering
On Tue, Jan 22, Christoph Lameter wrote:

> > 0xc00fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > 2106                            BUG_ON(!cachep->nodelists[node]);
> > 2107                            kmem_list3_init(cachep->nodelists[node]);
> > 2108                    }
> > 2109            }
> > 2110    }
> 
> if (cachep->nodelists[numa_node_id()])
>   return;

Does not help.


Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #48 SMP Wed Jan 23 08:54:23 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1  
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k 
data, 1220k bss, 304k init)
Kernel panic - not syncing: kmem_cache_create(): failed to create slab 
`size-32(DMA)'

Rebooting in 1 seconds..

---
 mm/slab.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 	/* Replace the static kmem_list3 structures for the boot cpu */
 	init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-	for_each_node_state(nid, N_NORMAL_MEMORY) {
+	for_each_online_node(nid) {
 		init_list(malloc_sizes[INDEX_AC].cs_cachep,
 			  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct 
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2108,6 +2108,8 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
+	if (!cachep->nodelists[numa_node_id()])
+		return -ENODEV;
 	cachep->nodelists[numa_node_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
@@ -2775,6 +2777,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3324,10 @@ static void *cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
@@ -3815,7 +3826,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 
 		if (use_alien_caches) {
 			new_alien = alloc_alien_cache(node, cachep->limit);


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Christoph Lameter wrote:

> But I doubt that this is it. The fallback logic was added later and it 
> worked fine.

My patch is useless (fascinating history of the changelog there, though).
fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
alloc_pages_node() will try to allocate on the current node but fall back
to a neighboring node if nothing is there.
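
To make the GFP_THISNODE point concrete, here is a minimal userspace sketch,
not kernel code. The names mem_model, alloc_page_model, THISNODE and NO_PAGE
are made up for illustration, and the index-order search is only a stand-in
for the kernel's real zonelist ordering. It shows nothing more than this:
without the flag, a request preferred on a memoryless node is served from
some other node.

```c
#include <assert.h>

#define MAX_NODES 4        /* hypothetical small node count */
#define NO_PAGE   (-1)     /* models a failed allocation */
#define THISNODE  0x1u     /* models GFP_THISNODE: do not leave the node */

/* Free pages per node in this toy model. */
struct mem_model {
	int free_pages[MAX_NODES];
};

/*
 * Returns the node the page came from, or NO_PAGE. Without THISNODE the
 * search continues on the other nodes (here simply in index order, which
 * is a simplification of the kernel's zonelist ordering).
 */
static int alloc_page_model(struct mem_model *m, int preferred,
			    unsigned int flags)
{
	int i;

	assert(preferred >= 0 && preferred < MAX_NODES);
	if (m->free_pages[preferred] > 0) {
		m->free_pages[preferred]--;
		return preferred;
	}
	if (flags & THISNODE)
		return NO_PAGE;
	for (i = 0; i < MAX_NODES; i++) {
		if (m->free_pages[i] > 0) {
			m->free_pages[i]--;
			return i;
		}
	}
	return NO_PAGE;
}
```

With all memory on node 2, as in Mel's log, a THISNODE request for node 0
fails in this model, while the same request without the flag comes back
from node 2.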




Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> Rather it should be 2. I'll admit the physical setup of this machine is
>  less than ideal but clearly it's something that can happen even if
> it's a bad idea.

Ok. Let's hope that Pekka's find does the trick. But this would mean that 
fallback gets memory from node 2 for the page allocator. Then fallback 
alloc is going to try to insert it into the l3 of node 2 which is not 
there yet. So another oops. Sigh.



Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Wed, 23 Jan 2008, Pekka Enberg wrote:

> When we call fallback_alloc() because the current node has ->nodelists set to
> NULL, we end up calling kmem_getpages() with -1 as the node id which is then
> translated to numa_node_id() by alloc_pages_node. But the reason we called
> fallback_alloc() in the first place is because numa_node_id() doesn't have a
> ->nodelist which makes cache_grow() oops.

Right, if nodeid == -1 then we need to call alloc_pages... 
Essentially a revert of 50c85a19e7b3928b5b5188524c44ffcbacdd4e35 from 2005.

But I doubt that this is it. The fallback logic was added later and it 
worked fine.


---
 mm/slab.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slab.c
===
--- linux-2.6.orig/mm/slab.c	2008-01-22 15:05:26.185452369 -0800
+++ linux-2.6/mm/slab.c	2008-01-22 15:05:59.301637009 -0800
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
 


Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (22/01/08 14:57), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> 
> > > > Whether this was a problem fixed in the past or not, it's broken again now
> > > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > > at boot-time that would also fix this problem in a way that doesn't
> > > > affect runtime (like altering cache_grow in my patch does).
> > > 
> > > The dropping of GFP_THISNODE has the same effect as your patch. 
> > 
> > The dropping of it totally? If so, this patch might fix a boot but it'll
> > potentially be a performance regression on NUMA machines that only have
> > nodes with memory, right?
> 
> No, the dropping during early allocations.
> 

We can live with that if the machine otherwise survives during tests.
They are kicked off at the moment with CONFIG_DEBUG_SLAB set but the point
is moot if the patch doesn't work for Olaf. I am still waiting to hear if
the two patches in combination work for him.

> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> > 
> > So node 2 has regular memory but it's trying to use node 0 at a glance.
> > I've attached the patch I used against 2.6.24-rc8. It includes the revert.
> 
> We need the current processor to be attached to a node that has 
> memory. We cannot fall back that early because the structures for the 
> other nodes do not exist yet.
> 

Or bodge it early in the boot process so that a node with memory is
always used.

> > Online nodes
> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> >  o kmem_list3_init
> 
> This needs to be node 2.
> 

Rather it should be 2. I'll admit the physical setup of this machine is
 less than ideal but clearly it's something that can happen even if
it's a bad idea.

> > [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> > [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224
> 
> Fallback during bootstrap.
> 

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: crash in kmem_cache_init

2008-01-22 Thread Pekka Enberg

Hi,

Mel Gorman wrote:

> Faulting instruction address: 0xc03c8c00
> cpu 0x0: Vector: 300 (Data Access) at [c05c3840]
> pc: c03c8c00: __lock_text_start+0x20/0x88
> lr: c00dadec: .cache_grow+0x7c/0x338
> sp: c05c3ac0
>    msr: 80009032
>    dar: 40
>  dsisr: 4000
>   current = 0xc0500f10
>   paca= 0xc0501b80
> pid   = 0, comm = swapper
> enter ? for help
> [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224
> [c05c3cb0] c00db958 .kmem_cache_alloc+0xe0/0x14c
> [c05c3d50] c00d .kmem_cache_create+0x230/0x4cc
> [c05c3e30] c04c05f4 .kmem_cache_init+0x310/0x640
> [c05c3ee0] c049f8d8 .start_kernel+0x304/0x3fc
> [c05c3f90] c0008594 .start_here_common+0x54/0xc0
> 0:mon>


I mentioned this already but received no response (maybe I am missing 
something totally obvious here):


When we call fallback_alloc() because the current node has ->nodelists 
set to NULL, we end up calling kmem_getpages() with -1 as the node id 
which is then translated to numa_node_id() by alloc_pages_node. But the 
reason we called fallback_alloc() in the first place is because 
numa_node_id() doesn't have a ->nodelist which makes cache_grow() oops.


Pekka


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> > > Whether this was a problem fixed in the past or not, it's broken again now
> > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > at boot-time that would also fix this problem in a way that doesn't
> > > affect runtime (like altering cache_grow in my patch does).
> > 
> > The dropping of GFP_THISNODE has the same effect as your patch. 
> 
> The dropping of it totally? If so, this patch might fix a boot but it'll
> potentially be a performance regression on NUMA machines that only have
> nodes with memory, right?

No, the dropping during early allocations.

> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
> 
> So node 2 has regular memory but it's trying to use node 0 at a glance.
> I've attached the patch I used against 2.6.24-rc8. It includes the revert.

We need the current processor to be attached to a node that has 
memory. We cannot fall back that early because the structures for the 
other nodes do not exist yet.

> Online nodes
> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
>  o kmem_list3_init

This needs to be node 2.

> [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224

Fallback during bootstrap.



Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (22/01/08 13:34), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> 
> > > After you reverted the slab memoryless node patch there should be per node
> > > structures created for node 0 unless the node is marked offline. Is it? If
> > > so then you are booting a cpu that is associated with an offline node.
> > > 
> > 
> > I'll roll a patch that prints out the online states before startup and
> > see what it looks like.
> 
> Ok. Great.
> 

The dmesg output is below.


> > 
> > > > Can you see a better solution than this?
> > > 
> > > Well this means that bootstrap will work by introducing foreign objects 
> > > into the per cpu queue (should only hold per cpu objects). They will 
> > > later be consumed and then the queues will contain the right objects so 
> > > the effect of the patch is minimal.
> > > 
> > 
> > By minimal, do you mean that you expect it to break in some other
> > respect later or minimal as in "this is bad but should not have any
> > adverse impact"?
> 
> Should not have any adverse impact after the objects from the cpu queue 
> have been consumed. If the cache_reaper tries to shift objects back 
> from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure 
> you run the tests with full debugging please.
> 

I am not running a full range of tests at the moment. Just getting boot
first. I'll queue up a range of tests to run with DEBUG on now but it'll
be the morning before I have the results.

> > Whether this was a problem fixed in the past or not, it's broken again now
> > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > at boot-time that would also fix this problem in a way that doesn't
> > affect runtime (like altering cache_grow in my patch does).
> 
> The dropping of GFP_THISNODE has the same effect as your patch. 

The dropping of it totally? If so, this patch might fix a boot but it'll
potentially be a performance regression on NUMA machines that only have
nodes with memory, right?

> Objects from another node get into the per cpu queue. And on free we 
> assume that per cpu queue objects are from the local node. If debug is on 
> then we check that with BUG_ONs.
> 

The interesting parts of the dmesg output are

Online nodes
o 0
o 2
Nodes with regular memory
o 2
Current running CPU 0 is associated with node 0
Current node is 0

So node 2 has regular memory but it's trying to use node 0 at a glance.
I've attached the patch I used against 2.6.24-rc8. It includes the revert.

Here is the full output


Please wait, loading kernel...
   Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 0240, size: 1192 Kbytes
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 
loglevel=8 
memory layout at init:
  alloc_bottom : 0252a000
  alloc_top: 0800
  alloc_top_hi : 0001
  rmo_top  : 0800
  ram_top  : 0001
Looking for displays
instantiating rtas at 0x077d9000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0262b000 -> 0x0262c1d3
Device tree struct  0x0262d000 -> 0x02635000
Calling quiesce ...
returning from prom_init
Partition configured for 4 cpus.
Starting Linux PPC64 #1 SMP Tue Jan 22 17:15:48 EST 2008
-
ppc64_pft_size= 0x1a
physicalMemorySize= 0x1
htab_hash_mask= 0x7
-
Linux version 2.6.24-rc8-autokern1 ([EMAIL PROTECTED]) (gcc version 3.4.6 
20060404 (Red Hat 3.4.6-3)) #1 SMP Tue Jan 22 17:15:48 EST 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA 0 ->  1048576
  Normal1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
2:0 ->  1048576
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 1034240
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 
ABAT:1201041303 loglevel=8 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 238.059000 MHz
time_init: processor frequency   = 1904.472000 MHz
clocksource: timebase mult[10cd746] shift[22] registered
clockevent: decrementer mult[3cf1] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 

Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Olaf Hering wrote:

> It crashes now in a different way if the patch below is applied:

Yup no l3 structure for the current node. We are early in bootstrap. You 
could just check if the l3 is there and if not just skip starting the 
reaper? This will be redone later anyways. Not sure if this will solve all 
your issues though. An l3 for the current node that we are booting on 
needs to be created early on for SLAB bootstrap to succeed. AFAICT SLUB 
doesn't care and simply uses whatever the page allocator gives it for the 
cpu slab. We may have gotten there because you only tested with SLUB 
recently and thus changes got in that broke SLAB boot assumptions.


> 0xc00fe018 is in setup_cpu_cache 
> (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106				BUG_ON(!cachep->nodelists[node]);
> 2107				kmem_list3_init(cachep->nodelists[node]);
> 2108			}
> 2109		}
> 2110	}

if (!cachep->nodelists[numa_node_id()])
	return;

> 2111	cachep->nodelists[numa_node_id()]->next_reap =
> 2112			jiffies + REAPTIMEOUT_LIST3 +
> 2113			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115	cpu_cache_get(cachep)->avail = 0;
> 
> 


Re: crash in kmem_cache_init

2008-01-22 Thread Nish Aravamudan
On 1/22/08, Olaf Hering <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 22, Mel Gorman wrote:
>
> > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> > .. Can you please check on your machine if it fixes your problem?
>
> It does not fix or change the nature of the crash.
>
> > Olaf, please confirm whether you need the patch below as well as the
> > revert to make your machine boot.
>
> It crashes now in a different way if the patch below is applied:

Was this with the revert Mel mentioned applied as well? I get the
feeling both patches are needed to fix up the memoryless SLAB issue.

> Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 
> 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008



> early_node_map[1] active PFN ranges
> 1:0 ->   892928



> Unable to handle kernel paging request for data at address 0x0058
> Faulting instruction address: 0xc00fe018
> cpu 0x0: Vector: 300 (Data Access) at [c075bac0]
> pc: c00fe018: .setup_cpu_cache+0x184/0x1f4
> lr: c00fdfa8: .setup_cpu_cache+0x114/0x1f4
> sp: c075bd40
>msr: 80009032
>dar: 58
>  dsisr: 4200
>   current = 0xc0665a50
>   paca= 0xc0666380
> pid   = 0, comm = swapper
> enter ? for help
> [c075bd40] c00fb368 .kmem_cache_create+0x3c0/0x478 
> (unreliable)
> [c075be20] c05e6780 .kmem_cache_init+0x284/0x4f4
> [c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
> [c075bf90] c0008590 .start_here_common+0x60/0xd0
> 0:mon>
>
> 0xc00fe018 is in setup_cpu_cache 
> (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106				BUG_ON(!cachep->nodelists[node]);
> 2107				kmem_list3_init(cachep->nodelists[node]);

I might be barking up the wrong tree, but this block above is supposed
to set up the cachep->nodelists[*] that are used immediately below.
But if the loop wasn't changed from N_NORMAL_MEMORY to N_ONLINE or
whatever, you might get a bad access right below for node 0 that has
no memory, if that's the node we're running on...

> 2108			}
> 2109		}
> 2110	}
> 2111	cachep->nodelists[numa_node_id()]->next_reap =
> 2112			jiffies + REAPTIMEOUT_LIST3 +
> 2113			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115	cpu_cache_get(cachep)->avail = 0;

Thanks,
Nish
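
The difference between the two loop variants discussed above can be shown
with a small userspace model. This is only a sketch under stated assumptions,
not kernel code: node_model, list3_model and current_node_has_l3 are
hypothetical names, and the memory_nodes_only switch stands in for choosing
between for_each_node_state(node, N_NORMAL_MEMORY) and
for_each_online_node(node).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4   /* hypothetical small node count */

/* Stand-in for struct kmem_list3. */
struct list3_model { long next_reap; };

/* Per-node state in this toy model. */
struct node_model {
	int online;       /* node is online */
	int has_memory;   /* node has regular memory (N_NORMAL_MEMORY) */
};

/*
 * Set up per-node lists either for nodes with memory only or for every
 * online node, then report whether the node we are running on ended up
 * with a usable list pointer.
 */
static int current_node_has_l3(const struct node_model nodes[MAX_NODES],
			       struct list3_model storage[MAX_NODES],
			       int current_node, int memory_nodes_only)
{
	struct list3_model *nodelists[MAX_NODES] = { NULL };
	int node;

	assert(current_node >= 0 && current_node < MAX_NODES);
	for (node = 0; node < MAX_NODES; node++) {
		if (!nodes[node].online)
			continue;
		if (memory_nodes_only && !nodes[node].has_memory)
			continue;
		nodelists[node] = &storage[node];
	}
	return nodelists[current_node] != NULL;
}
```

For Olaf's layout (nodes 0 and 1 online, memory only on node 1, CPU on
node 0) the memory-only loop leaves node 0 without lists in this model,
which is the NULL that the next_reap assignment then dereferences.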


Re: crash in kmem_cache_init

2008-01-22 Thread Olaf Hering
On Tue, Jan 22, Mel Gorman wrote:

> http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> .. Can you please check on your machine if it fixes your problem?

It does not fix or change the nature of the crash.

> Olaf, please confirm whether you need the patch below as well as the
> revert to make your machine boot.

It crashes now in a different way if the patch below is applied:

Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 
(prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1  
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k 
data, 1220k bss, 304k init)
Unable to handle kernel paging request for data at address 0x0058
Faulting instruction address: 0xc00fe018
cpu 0x0: Vector: 300 (Data Access) at [c075bac0]
pc: c00fe018: .setup_cpu_cache+0x184/0x1f4
lr: c00fdfa8: .setup_cpu_cache+0x114/0x1f4
sp: c075bd40
   msr: 80009032
   dar: 58
 dsisr: 4200
  current = 0xc0665a50
  paca= 0xc0666380
pid   = 0, comm = swapper
enter ? for help
[c075bd40] c00fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
[c075be20] c05e6780 .kmem_cache_init+0x284/0x4f4
[c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
[c075bf90] c0008590 .start_here_common+0x60/0xd0
0:mon> 

0xc00fe018 is in setup_cpu_cache 
(/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
2106				BUG_ON(!cachep->nodelists[node]);
2107				kmem_list3_init(cachep->nodelists[node]);
2108			}
2109		}
2110	}
2111	cachep->nodelists[numa_node_id()]->next_reap =
2112			jiffies + REAPTIMEOUT_LIST3 +
2113			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
2114
2115	cpu_cache_get(cachep)->avail = 0;



Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> > After you reverted the slab memoryless node patch there should be per node 
> > structures created for node 0 unless the node is marked offline. Is it? If 
> > so then you are booting a cpu that is associated with an offline node. 
> > 
> 
> I'll roll a patch that prints out the online states before startup and
> see what it looks like.

Ok. Great.

> 
> > > Can you see a better solution than this?
> > 
> > Well this means that bootstrap will work by introducing foreign objects 
> > into the per cpu queue (should only hold per cpu objects). They will 
> > later be consumed and then the queues will contain the right objects so 
> > the effect of the patch is minimal.
> > 
> 
> By minimal, do you mean that you expect it to break in some other
> respect later or minimal as in "this is bad but should not have any
> adverse impact"?

Should not have any adverse impact after the objects from the cpu queue 
have been consumed. If the cache_reaper tries to shift objects back 
from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure 
you run the tests with full debugging please.

> Whether this was a problem fixed in the past or not, it's broken again now
> :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> at boot-time that would also fix this problem in a way that doesn't
> affect runtime (like altering cache_grow in my patch does).

The dropping of GFP_THISNODE has the same effect as your patch. 
Objects from another node get into the per cpu queue. And on free we 
assume that per cpu queue objects are from the local node. If debug is on 
then we check that with BUG_ONs.
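
The per cpu queue assumption can be illustrated with a toy model. This is a
userspace sketch only: cpu_queue_model, obj_model and the three helpers are
hypothetical names, and the real array_cache handling in mm/slab.c is far
more involved. It shows a foreign object entering the queue unchecked, being
detectable by a debug-style scan, and disappearing once the queue has been
consumed.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_LEN 8   /* hypothetical per-cpu queue capacity */

/* An object tagged with the node its slab page lives on. */
struct obj_model {
	int node;
};

/* Stand-in for the per-cpu array_cache. */
struct cpu_queue_model {
	int local_node;                      /* node the owning CPU belongs to */
	int avail;                           /* number of queued objects */
	struct obj_model *entry[QUEUE_LEN];
};

/* Queue an object without checking its origin (the bootstrap case). */
static int queue_put(struct cpu_queue_model *q, struct obj_model *o)
{
	if (q->avail == QUEUE_LEN)
		return -1;
	q->entry[q->avail++] = o;
	return 0;
}

/* Count queued objects that violate the "local node only" assumption. */
static int count_foreign(const struct cpu_queue_model *q)
{
	int i, foreign = 0;

	for (i = 0; i < q->avail; i++)
		if (q->entry[i]->node != q->local_node)
			foreign++;
	return foreign;
}

/* Consume one object; allocation hands out whatever is queued. */
static struct obj_model *queue_get(struct cpu_queue_model *q)
{
	assert(q->avail >= 0 && q->avail <= QUEUE_LEN);
	return q->avail ? q->entry[--q->avail] : NULL;
}
```

In this model a debug build would correspond to asserting that
count_foreign() is zero on the free path, which is why the tests need to run
with full debugging before the bootstrap objects can be assumed harmless.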




Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (22/01/08 12:11), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> 
> > Christoph/Pekka, this patch is papering over the problem and something
> > more fundamental may be going wrong. The crash occurs because l3 is NULL
> > and the cache is kmem_cache so this is early in the boot process. It is
> > selecting l3 based on node 2 which is correct in terms of available memory
> > but it initialises the lists on node 0 because that is the node the CPUs are
> > located on. Hence later it uses an uninitialised nodelists entry and BLAM. Relevant
> > parts of the log for seeing the memoryless nodes in relation to CPUs are:
> 
> Would it be possible to run the bootstrap on a cpu that has a 
> node with memory associated to it?

Not in the way the machine is currently configured. All the CPUs appear to
be on a node with no memory. It's best to assume I cannot get the machine
reconfigured (which just hides the bug anyway). Physically, it's thousands
of miles away so I can't do the work. I can get lab support to do the job
but that will take a fair while and at the end of the day, it doesn't tell
us a lot. We know that other PPC64 machines work so it's not a general problem.

> I believe we had the same situation 
> last year when GFP_THISNODE was introduced?
> 

It feels vaguely familiar but I don't recall it in sufficient detail
to recognise if this is the same problem or not.

> After you reverted the slab memoryless node patch there should be per node 
> structures created for node 0 unless the node is marked offline. Is it? If 
> so then you are booting a cpu that is associated with an offline node. 
> 

I'll roll a patch that prints out the online states before startup and
see what it looks like.

> > Can you see a better solution than this?
> 
> Well this means that bootstrap will work by introducing foreign objects 
> into the per cpu queue (should only hold per cpu objects). They will 
> later be consumed and then the queues will contain the right objects so 
> the effect of the patch is minimal.
> 

By minimal, do you mean that you expect it to break in some other
respect later or minimal as in "this is bad but should not have any
adverse impact"?

> I thought we fixed the similar situation last year by dropping 
> GFP_THISNODE for some allocations?
> 

Whether this was a problem fixed in the past or not, it's broken again now
:( . It's possible that there is a __GFP_THISNODE that can be dropped early
at boot-time that would also fix this problem in a way that doesn't
affect runtime (like altering cache_grow in my patch does).

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> Christoph/Pekka, this patch is papering over the problem and something
> more fundamental may be going wrong. The crash occurs because l3 is NULL
> and the cache is kmem_cache so this is early in the boot process. It is
> selecting l3 based on node 2 which is correct in terms of available memory
> but it initialises the lists on node 0 because that is the node the CPUs are
> located on. Hence later it uses an uninitialised nodelists entry and BLAM. Relevant
> parts of the log for seeing the memoryless nodes in relation to CPUs are:

Would it be possible to run the bootstrap on a cpu that has a 
node with memory associated to it? I believe we had the same situation 
last year when GFP_THISNODE was introduced?

After you reverted the slab memoryless node patch there should be per node 
structures created for node 0 unless the node is marked offline. Is it? If 
so then you are booting a cpu that is associated with an offline node. 

> Can you see a better solution than this?

Well this means that bootstrap will work by introducing foreign objects 
into the per cpu queue (should only hold per cpu objects). They will 
later be consumed and then the queues will contain the right objects so 
the effect of the patch is minimal.

I thought we fixed the similar situation last year by dropping 
GFP_THISNODE for some allocations?


Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
> 
> > Could you try this patch?
> 
> Does not help, same crash.
> 

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem
that was fixed for 2.6.24 by reverting a change. This fix can be found at
http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
. Can you please check on your machine if it fixes your problem?

I am 99% sure it will *not* fix your problem because there were two bugs, not
one as previously believed. On two test machines here, this kmem_cache_init
problem still happens even with the revert which fixed a third machine. I
was delayed in testing because these boxen were unavailable from Friday until
yesterday evening (a stellar display of timing). It was missed on TKO because
it was SLAB-specific and those machines were testing SLUB. I found that the
patch below was necessary to fix the problem.

Olaf, please confirm whether you need the patch below as well as the
revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something
more fundamental may be going wrong. The crash occurs because l3 is NULL
and the cache is kmem_cache so this is early in the boot process. It is
selecting l3 based on node 2 which is correct in terms of available memory
but it initialises the lists on node 0 because that is the node the CPUs are
located on. Hence later it uses an uninitialised nodelists entry and BLAM. Relevant
parts of the log for seeing the memoryless nodes in relation to CPUs are:

early_node_map[1] active PFN ranges
2:0 ->  1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:

Can you see a better solution than this?


Recent changes to how slab operates mean a situation can occur on systems
with memoryless nodes whereby the nodeid used when growing the slab does
not map to the correct kmem_list3. The following patch checks the indicated
preferred nodeid and, if it is bogus, uses numa_node_id() instead.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>

--- 
 mm/slab.c |9 +
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 
linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c 
linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c   2008-01-22 
17:46:32.0 +
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c2008-01-22 
18:42:53.0 +
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
/* Take the l3 list lock to change the colour_next on this node */
check_irq_off();
l3 = cachep->nodelists[nodeid];
+   if (!l3) {
+   nodeid = numa_node_id();
+   l3 = cachep->nodelists[nodeid];
+   }
+   BUG_ON(!l3);
spin_lock(&l3->list_lock);
 
/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *cache_alloc_node(struct
int x;
 
l3 = cachep->nodelists[nodeid];
+   if (!l3) {
+   nodeid = numa_node_id();
+   l3 = cachep->nodelists[nodeid];
+   }
BUG_ON(!l3);
 
 retry:


-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
>
> > Could you try this patch?
>
> Does not help, same crash.
>

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem
that was fixed for 2.6.24 by reverting a change. This fix can be found at
http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
. Can you please check on your machine if it fixes your problem?

I am 99.% sure it will *not* fix your problem because there were two bugs, not
one as previously believed. On two test machines here, this kmem_cache_init
problem still happens even with the revert which fixed a third machine. I
was delayed in testing because these boxen were unavailable from Friday until
yesterday evening (a stellar display of timing). It was missed on TKO because
it was SLAB-specific and those machines were testing SLUB. I found that the
patch below was necessary to fix the problem.

Olaf, please confirm whether you need the patch below as well as the
revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something
more fundamental may be going wrong. The crash occurs because l3 is NULL
and the cache is kmem_cache so this is early in the boot process. It is
selecting l3 based on node 2 which is correct in terms of available memory
but it initialises the lists on node 0 because that is the node the CPUs are
located. Hence later it uses an uninitialised nodelists and BLAM. Relevant
parts of the log for seeing the memoryless nodes in relation to CPUs is;

early_node_map[1] active PFN ranges
2:0 ->  1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:

Can you see a better solution than this?


Recent changes to how slab operates mean a situation can occur on systems
with memoryless nodes whereby the nodeid used when growing the slab does
not map to the correct kmem_list3. The following patch adds the necessary
check to the indicated preferred nodeid and if it is bogus, use numa_node_id() 
instead.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>

--- 
 mm/slab.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 
linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c 
linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c   2008-01-22 
17:46:32.0 +
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c2008-01-22 
18:42:53.0 +
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
/* Take the l3 list lock to change the colour_next on this node */
check_irq_off();
l3 = cachep->nodelists[nodeid];
+   if (!l3) {
+   nodeid = numa_node_id();
+   l3 = cachep->nodelists[nodeid];
+   }
+   BUG_ON(!l3);
spin_lock(&l3->list_lock);
 
/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *cache_alloc_node(struct
int x;
 
l3 = cachep->nodelists[nodeid];
+   if (!l3) {
+   nodeid = numa_node_id();
+   l3 = cachep->nodelists[nodeid];
+   }
BUG_ON(!l3);
 
 retry:


-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab
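
The fallback that the patch adds to cache_grow() and cache_alloc_node() can be modelled outside the kernel. The following is only a sketch under stated assumptions: `struct list3`, `MAX_NODES` and `pick_l3()` are hypothetical names, and the `cpu_node` argument stands in for numa_node_id(). It is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3. */
struct list3 {
	int node;
};

/*
 * Pick the per-node list for 'nodeid'. If that node has no list, fall
 * back to the list of the node the CPU is running on ('cpu_node' stands
 * in for numa_node_id()). A NULL result is the case the patch turns
 * into BUG_ON(!l3).
 */
static struct list3 *pick_l3(struct list3 *nodelists[MAX_NODES],
			     int nodeid, int cpu_node)
{
	struct list3 *l3;

	assert(nodeid >= 0 && nodeid < MAX_NODES);
	assert(cpu_node >= 0 && cpu_node < MAX_NODES);

	l3 = nodelists[nodeid];
	if (!l3)
		l3 = nodelists[cpu_node];
	return l3;
}
```

With lists set up only for node 0, a request for node 2 made from a CPU on node 0 resolves to node 0's list, which mirrors the situation described in the log above.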


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> Christoph/Pekka, this patch is papering over the problem and something
> more fundamental may be going wrong. The crash occurs because l3 is NULL
> and the cache is kmem_cache so this is early in the boot process. It is
> selecting l3 based on node 2 which is correct in terms of available memory
> but it initialises the lists on node 0 because that is the node the CPUs are
> located. Hence later it uses an uninitialised nodelists and BLAM. Relevant
> parts of the log for seeing the memoryless nodes in relation to CPUs is;

Would it be possible to run the bootstrap on a cpu that has a 
node with memory associated to it? I believe we had the same situation 
last year when GFP_THISNODE was introduced?

After you reverted the slab memoryless node patch there should be per node 
structures created for node 0 unless the node is marked offline. Is it? If 
so then you are booting a cpu that is associated with an offline node. 

> Can you see a better solution than this?

Well this means that bootstrap will work by introducing foreign objects 
into the per cpu queue (should only hold per cpu objects). They will 
later be consumed and then the queues will contain the right objects so 
the effect of the patch is minimal.

I thought we fixed the similar situation last year by dropping 
GFP_THISNODE for some allocations?
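
The concern about foreign objects in the per cpu queue can be illustrated with a toy model. This is a sketch only: `struct obj` and `count_foreign()` are hypothetical names invented for the illustration, and the real debug checks in mm/slab.c are structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object record: only the node its memory came from. */
struct obj {
	int node;
};

/*
 * Count the entries in a per-CPU queue whose memory does not belong to
 * 'local_node'. A debug build would treat any non-zero result as a bug,
 * since the free path assumes queued objects are node-local.
 */
static size_t count_foreign(const struct obj *queue, size_t len, int local_node)
{
	size_t foreign = 0;

	assert(queue != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		if (queue[i].node != local_node)
			foreign++;
	return foreign;
}
```

Once the foreign entries have been consumed the count drops to zero, which is the sense in which the effect of the patch is described as minimal.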


Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (22/01/08 12:11), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > Christoph/Pekka, this patch is papering over the problem and something
> > more fundamental may be going wrong. The crash occurs because l3 is NULL
> > and the cache is kmem_cache so this is early in the boot process. It is
> > selecting l3 based on node 2 which is correct in terms of available memory
> > but it initialises the lists on node 0 because that is the node the CPUs are
> > located. Hence later it uses an uninitialised nodelists and BLAM. Relevant
> > parts of the log for seeing the memoryless nodes in relation to CPUs is;
>
> Would it be possible to run the bootstrap on a cpu that has a
> node with memory associated to it?

Not in the way the machine is currently configured. All the CPUs appear to
be on a node with no memory. It's best to assume I cannot get the machine
reconfigured (which just hides the bug anyway). Physically, it's thousands
of miles away so I can't do the work. I can get lab support to do the job
but that will take a fair while and at the end of the day, it doesn't tell
us a lot. We know that other PPC64 machines work so it's not a general problem.

> I believe we had the same situation
> last year when GFP_THISNODE was introduced?
>

It feels vaguely familiar but I don't recall the details well enough
to recognise if this is the same problem or not.

> After you reverted the slab memoryless node patch there should be per node
> structures created for node 0 unless the node is marked offline. Is it? If
> so then you are booting a cpu that is associated with an offline node.
>

I'll roll a patch that prints out the online states before startup and
see what it looks like.

> > Can you see a better solution than this?
>
> Well this means that bootstrap will work by introducing foreign objects
> into the per cpu queue (should only hold per cpu objects). They will
> later be consumed and then the queues will contain the right objects so
> the effect of the patch is minimal.
>

By minimal, do you mean that you expect it to break in some other
respect later, or minimal as in this is bad but should not have any
adverse impact?

> I thought we fixed the similar situation last year by dropping
> GFP_THISNODE for some allocations?
>

Whether this was a problem fixed in the past or not, it's broken again now
:( . It's possible that there is a __GFP_THISNODE that can be dropped early
at boot-time that would also fix this problem in a way that doesn't
affect runtime (like altering cache_grow in my patch does).

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> > After you reverted the slab memoryless node patch there should be per node
> > structures created for node 0 unless the node is marked offline. Is it? If
> > so then you are booting a cpu that is associated with an offline node.
> >
>
> I'll roll a patch that prints out the online states before startup and
> see what it looks like.

Ok. Great.

 
> > > Can you see a better solution than this?
> >
> > Well this means that bootstrap will work by introducing foreign objects
> > into the per cpu queue (should only hold per cpu objects). They will
> > later be consumed and then the queues will contain the right objects so
> > the effect of the patch is minimal.
> >
>
> By minimal, do you mean that you expect it to break in some other
> respect later, or minimal as in this is bad but should not have any
> adverse impact?

Should not have any adverse impact after the objects from the cpu queue 
have been consumed. If the cache_reaper tries to shift objects back 
from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure 
you run the tests with full debugging please.

> Whether this was a problem fixed in the past or not, it's broken again now
> :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> at boot-time that would also fix this problem in a way that doesn't
> affect runtime (like altering cache_grow in my patch does).

The dropping of GFP_THISNODE has the same effect as your patch. 
Objects from another node get into the per cpu queue. And on free we 
assume that per cpu queue objects are from the local node. If debug is on 
then we check that with BUG_ONs.




Re: crash in kmem_cache_init

2008-01-22 Thread Olaf Hering
On Tue, Jan 22, Mel Gorman wrote:

> http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> . Can you please check on your machine if it fixes your problem?

It does not fix or change the nature of the crash.

> Olaf, please confirm whether you need the patch below as well as the
> revert to make your machine boot.

It crashes now in a different way if the patch below is applied:

Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 
(prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1  
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k 
data, 1220k bss, 304k init)
Unable to handle kernel paging request for data at address 0x0058
Faulting instruction address: 0xc00fe018
cpu 0x0: Vector: 300 (Data Access) at [c075bac0]
pc: c00fe018: .setup_cpu_cache+0x184/0x1f4
lr: c00fdfa8: .setup_cpu_cache+0x114/0x1f4
sp: c075bd40
   msr: 80009032
   dar: 58
 dsisr: 4200
  current = 0xc0665a50
  paca= 0xc0666380
pid   = 0, comm = swapper
enter ? for help
[c075bd40] c00fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
[c075be20] c05e6780 .kmem_cache_init+0x284/0x4f4
[c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
[c075bf90] c0008590 .start_here_common+0x60/0xd0
0:mon 

0xc00fe018 is in setup_cpu_cache 
(/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
2106                    BUG_ON(!cachep->nodelists[node]);
2107                    kmem_list3_init(cachep->nodelists[node]);
2108                }
2109            }
2110        }
2111        cachep->nodelists[numa_node_id()]->next_reap =
2112                jiffies + REAPTIMEOUT_LIST3 +
2113                ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
2114
2115        cpu_cache_get(cachep)->avail = 0;



Re: crash in kmem_cache_init

2008-01-22 Thread Nish Aravamudan
On 1/22/08, Olaf Hering <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 22, Mel Gorman wrote:
>
> > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> > . Can you please check on your machine if it fixes your problem?
>
> It does not fix or change the nature of the crash.
>
> > Olaf, please confirm whether you need the patch below as well as the
> > revert to make your machine boot.
>
> It crashes now in a different way if the patch below is applied:

Was this with the revert Mel mentioned applied as well? I get the
feeling both patches are needed to fix up the memoryless SLAB issue.

> Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2
> 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008

<snip>

> early_node_map[1] active PFN ranges
> 1:0 ->   892928

<snip>

> Unable to handle kernel paging request for data at address 0x0058
> Faulting instruction address: 0xc00fe018
> cpu 0x0: Vector: 300 (Data Access) at [c075bac0]
> pc: c00fe018: .setup_cpu_cache+0x184/0x1f4
> lr: c00fdfa8: .setup_cpu_cache+0x114/0x1f4
> sp: c075bd40
>    msr: 80009032
>    dar: 58
>  dsisr: 4200
>   current = 0xc0665a50
>   paca= 0xc0666380
> pid   = 0, comm = swapper
> enter ? for help
> [c075bd40] c00fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
> [c075be20] c05e6780 .kmem_cache_init+0x284/0x4f4
> [c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
> [c075bf90] c0008590 .start_here_common+0x60/0xd0
> 0:mon
>
> 0xc00fe018 is in setup_cpu_cache
> (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                    BUG_ON(!cachep->nodelists[node]);
> 2107                    kmem_list3_init(cachep->nodelists[node]);

I might be barking up the wrong tree, but this block above is supposed
to set up the cachep->nodelists[*] that are used immediately below.
But if the loop wasn't changed from N_NORMAL_MEMORY to N_ONLINE or
whatever, you might get a bad access right below for node 0 that has
no memory, if that's the node we're running on...

> 2108                }
> 2109            }
> 2110        }
> 2111        cachep->nodelists[numa_node_id()]->next_reap =
> 2112                jiffies + REAPTIMEOUT_LIST3 +
> 2113                ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115        cpu_cache_get(cachep)->avail = 0;

Thanks,
Nish
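
The suspicion about the set-up loop can be reproduced in a small user-space model. This is a sketch under stated assumptions: the bitmasks stand in for the N_NORMAL_MEMORY and online node states, and `struct cache`, `setup_lists()` and `current_node_ready()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3; only the field used here. */
struct list3 {
	unsigned long next_reap;
};

/* Hypothetical model of a cache's per-node table (cachep->nodelists). */
struct cache {
	struct list3 *nodelists[MAX_NODES];
};

static struct list3 boot_lists[MAX_NODES];

/*
 * Populate nodelists only for nodes whose bit is set in 'mask'.
 * Passing the "has memory" mask models for_each_node_state(N_NORMAL_MEMORY);
 * passing the "online" mask models for_each_online_node().
 */
static void setup_lists(struct cache *c, unsigned int mask)
{
	for (int node = 0; node < MAX_NODES; node++)
		c->nodelists[node] =
			(mask & (1u << node)) ? &boot_lists[node] : NULL;
}

/* Returns 1 if the node the boot CPU runs on has a usable list. */
static int current_node_ready(const struct cache *c, int cpu_node)
{
	assert(cpu_node >= 0 && cpu_node < MAX_NODES);
	return c->nodelists[cpu_node] != NULL;
}
```

With memory only on node 2 and the boot CPU on node 0, iterating over the memory mask leaves node 0 without a list, so the dereference at line 2111 would hit NULL. Iterating over the online mask (nodes 0 and 2) fills both slots in this model.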


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Olaf Hering wrote:

> It crashes now in a different way if the patch below is applied:

Yup, no l3 structure for the current node. We are early in bootstrap. You
could just check if the l3 is there and if not just skip starting the
reaper? This will be redone later anyway. Not sure if this will solve all
your issues though. An l3 for the current node that we are booting on
needs to be created early on for SLAB bootstrap to succeed. AFAICT SLUB
doesn't care and simply uses whatever the page allocator gives it for the
cpu slab. We may have gotten there because you only tested with SLUB
recently and thus changes got in that broke SLAB boot assumptions.


> 0xc00fe018 is in setup_cpu_cache
> (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                    BUG_ON(!cachep->nodelists[node]);
> 2107                    kmem_list3_init(cachep->nodelists[node]);
> 2108                }
> 2109            }
> 2110        }

if (cachep->nodelists[numa_node_id()])
        return;

> 2111        cachep->nodelists[numa_node_id()]->next_reap =
> 2112                jiffies + REAPTIMEOUT_LIST3 +
> 2113                ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115        cpu_cache_get(cachep)->avail = 0;
>
>


Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (22/01/08 13:34), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > > After you reverted the slab memoryless node patch there should be per node
> > > structures created for node 0 unless the node is marked offline. Is it? If
> > > so then you are booting a cpu that is associated with an offline node.
> > >
> >
> > I'll roll a patch that prints out the online states before startup and
> > see what it looks like.
>
> Ok. Great.
>
 

The dmesg output is below.


  
> > > > Can you see a better solution than this?
> > >
> > > Well this means that bootstrap will work by introducing foreign objects
> > > into the per cpu queue (should only hold per cpu objects). They will
> > > later be consumed and then the queues will contain the right objects so
> > > the effect of the patch is minimal.
> > >
> >
> > By minimal, do you mean that you expect it to break in some other
> > respect later, or minimal as in this is bad but should not have any
> > adverse impact?
>
> Should not have any adverse impact after the objects from the cpu queue
> have been consumed. If the cache_reaper tries to shift objects back
> from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure
> you run the tests with full debugging please.
>

I am not running a full range of tests at the moment. Just getting boot
first. I'll queue up a range of tests to run with DEBUG on now but it'll
be the morning before I have the results.

> > Whether this was a problem fixed in the past or not, it's broken again now
> > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > at boot-time that would also fix this problem in a way that doesn't
> > affect runtime (like altering cache_grow in my patch does).
>
> The dropping of GFP_THISNODE has the same effect as your patch.

The dropping of it totally? If so, this patch might fix a boot but it'll
potentially be a performance regression on NUMA machines that only have
nodes with memory, right?

> Objects from another node get into the per cpu queue. And on free we
> assume that per cpu queue objects are from the local node. If debug is on
> then we check that with BUG_ONs.
>

The interesting parts of the dmesg output are

Online nodes
o 0
o 2
Nodes with regular memory
o 2
Current running CPU 0 is associated with node 0
Current node is 0

So node 2 has regular memory but it's trying to use node 0 at a glance.
I've attached the patch I used against 2.6.24-rc8. It includes the revert.

Here is the full output


Please wait, loading kernel...
   Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 0240, size: 1192 Kbytes
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 
loglevel=8 
memory layout at init:
  alloc_bottom : 0252a000
  alloc_top: 0800
  alloc_top_hi : 0001
  rmo_top  : 0800
  ram_top  : 0001
Looking for displays
instantiating rtas at 0x077d9000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0262b000 -> 0x0262c1d3
Device tree struct  0x0262d000 -> 0x02635000
Calling quiesce ...
returning from prom_init
Partition configured for 4 cpus.
Starting Linux PPC64 #1 SMP Tue Jan 22 17:15:48 EST 2008
-
ppc64_pft_size= 0x1a
physicalMemorySize= 0x1
htab_hash_mask= 0x7
-
Linux version 2.6.24-rc8-autokern1 ([EMAIL PROTECTED]) (gcc version 3.4.6 
20060404 (Red Hat 3.4.6-3)) #1 SMP Tue Jan 22 17:15:48 EST 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA 0 ->  1048576
  Normal1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
2:0 ->  1048576
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 1034240
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 
ABAT:1201041303 loglevel=8 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 238.059000 MHz
time_init: processor frequency   = 1904.472000 MHz
clocksource: timebase mult[10cd746] shift[22] registered
clockevent: decrementer mult[3cf1] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 2
Memory: 4105560k/4194304k available (5004k 

Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> > > Whether this was a problem fixed in the past or not, it's broken again now
> > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > at boot-time that would also fix this problem in a way that doesn't
> > > affect runtime (like altering cache_grow in my patch does).
> >
> > The dropping of GFP_THISNODE has the same effect as your patch.
>
> The dropping of it totally? If so, this patch might fix a boot but it'll
> potentially be a performance regression on NUMA machines that only have
> nodes with memory, right?

No, the dropping during early allocations.

> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
>
> So node 2 has regular memory but it's trying to use node 0 at a glance.
> I've attached the patch I used against 2.6.24-rc8. It includes the revert.

We need the current processor to be attached to a node that has 
memory. We cannot fall back that early because the structures for the 
other nodes do not exist yet.

> Online nodes
> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
>  o kmem_list3_init

This needs to be node 2.

> [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224

Fallback during bootstrap.



Re: crash in kmem_cache_init

2008-01-22 Thread Pekka Enberg

Hi,

Mel Gorman wrote:

Faulting instruction address: 0xc03c8c00
cpu 0x0: Vector: 300 (Data Access) at [c05c3840]
pc: c03c8c00: __lock_text_start+0x20/0x88
lr: c00dadec: .cache_grow+0x7c/0x338
sp: c05c3ac0
   msr: 80009032
   dar: 40
 dsisr: 4000
  current = 0xc0500f10
  paca= 0xc0501b80
pid   = 0, comm = swapper
enter ? for help
[c05c3b40] c00dadec .cache_grow+0x7c/0x338
[c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224
[c05c3cb0] c00db958 .kmem_cache_alloc+0xe0/0x14c
[c05c3d50] c00d .kmem_cache_create+0x230/0x4cc
[c05c3e30] c04c05f4 .kmem_cache_init+0x310/0x640
[c05c3ee0] c049f8d8 .start_kernel+0x304/0x3fc
[c05c3f90] c0008594 .start_here_common+0x54/0xc0
0:mon


I mentioned this already but received no response (maybe I am missing 
something totally obvious here):


When we call fallback_alloc() because the current node has ->nodelists 
set to NULL, we end up calling kmem_getpages() with -1 as the node id 
which is then translated to numa_node_id() by alloc_pages_node. But the 
reason we called fallback_alloc() in the first place is because 
numa_node_id() doesn't have a ->nodelists entry, which makes cache_grow() oops.


Pekka
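
The chain described above can be written down as a toy model. This is a sketch of the concern as stated, not of the kernel's actual call graph: `NODE_ANY`, `resolve_node()` and `grow_lookup()` are hypothetical names, and `cpu_node` stands in for numa_node_id().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define NODE_ANY (-1)

/* Hypothetical stand-in for struct kmem_list3. */
struct list3 {
	int unused;
};

/* Models alloc_pages_node() turning a node id of -1 into numa_node_id(). */
static int resolve_node(int nodeid, int cpu_node)
{
	return nodeid == NODE_ANY ? cpu_node : nodeid;
}

/*
 * Models the lookup at the top of cache_grow(). A NULL result is what
 * would be dereferenced when the list lock is taken.
 */
static struct list3 *grow_lookup(struct list3 *nodelists[MAX_NODES],
				 int nodeid, int cpu_node)
{
	int node = resolve_node(nodeid, cpu_node);

	assert(node >= 0 && node < MAX_NODES);
	return nodelists[node];
}
```

In this model, with a list only on node 2 and the CPU on node 0, a request with node id -1 resolves back to node 0 and the lookup returns NULL, which is the circular situation the message points out.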


Re: crash in kmem_cache_init

2008-01-22 Thread Mel Gorman
On (22/01/08 14:57), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > > > Whether this was a problem fixed in the past or not, it's broken again now
> > > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > > at boot-time that would also fix this problem in a way that doesn't
> > > > affect runtime (like altering cache_grow in my patch does).
> > >
> > > The dropping of GFP_THISNODE has the same effect as your patch.
> >
> > The dropping of it totally? If so, this patch might fix a boot but it'll
> > potentially be a performance regression on NUMA machines that only have
> > nodes with memory, right?
>
> No, the dropping during early allocations.
>

We can live with that if the machine otherwise survives during tests.
They are kicked off at the moment with CONFIG_DEBUG_SLAB set but the point
is moot if the patch doesn't work for Olaf. I am still waiting to hear if
the two patches in combination work for him.

> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> >
> > So node 2 has regular memory but it's trying to use node 0 at a glance.
> > I've attached the patch I used against 2.6.24-rc8. It includes the revert.
>
> We need the current processor to be attached to a node that has
> memory. We cannot fall back that early because the structures for the
> other nodes do not exist yet.
>

Or bodge it early in the boot process so that a node with memory is
always used.

> > Online nodes
> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> >  o kmem_list3_init
>
> This needs to be node 2.
>

Rather it should be 2. I'll admit the physical setup of this machine is
less than ideal but clearly it's something that can happen even if
it's a bad idea.

> > [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> > [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224
>
> Fallback during bootstrap.
>

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Wed, 23 Jan 2008, Pekka Enberg wrote:

> When we call fallback_alloc() because the current node has ->nodelists set to
> NULL, we end up calling kmem_getpages() with -1 as the node id which is then
> translated to numa_node_id() by alloc_pages_node. But the reason we called
> fallback_alloc() in the first place is because numa_node_id() doesn't have a
> ->nodelist which makes cache_grow() oops.

Right, if nodeid == -1 then we need to call alloc_pages... 
Essentially a revert of 50c85a19e7b3928b5b5188524c44ffcbacdd4e35 from 2005.

But I doubt that this is it. The fallback logic was added later and it 
worked fine.


---
 mm/slab.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c2008-01-22 15:05:26.185452369 -0800
+++ linux-2.6/mm/slab.c 2008-01-22 15:05:59.301637009 -0800
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
flags |= __GFP_RECLAIMABLE;
 
-   page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+   if (nodeid == -1)
+   page = alloc_pages(flags, cachep->gfporder);
+   else
+   page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
if (!page)
return NULL;
 


Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Mel Gorman wrote:

> Rather it should be 2. I'll admit the physical setup of this machine is
> less than ideal but clearly it's something that can happen even if
> it's a bad idea.

Ok. Let's hope that Pekka's find does the trick. But this would mean that 
fallback gets memory from node 2 for the page allocator. Then fallback 
alloc is going to try to insert it into the l3 of node 2 which is not 
there yet. So another oops. Sigh.



Re: crash in kmem_cache_init

2008-01-22 Thread Christoph Lameter
On Tue, 22 Jan 2008, Christoph Lameter wrote:

> But I doubt that this is it. The fallback logic was added later and it
> worked fine.

My patch is useless (fascinating history of the changelog there though). 
fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that 
alloc_pages_node() will try to allocate on the current node but fall back 
to a neighboring node if nothing is there.




Re: crash in kmem_cache_init

2008-01-22 Thread Olaf Hering
On Tue, Jan 22, Christoph Lameter wrote:

> > 0xc00fe018 is in setup_cpu_cache
> > (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > 2106                    BUG_ON(!cachep->nodelists[node]);
> > 2107                    kmem_list3_init(cachep->nodelists[node]);
> > 2108                }
> > 2109            }
> > 2110        }
>
> if (cachep->nodelists[numa_node_id()])
>         return;

Does not help.


Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 
(prerelease) (SUSE Linux)) #48 SMP Wed Jan 23 08:54:23 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1  
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k 
data, 1220k bss, 304k init)
Kernel panic - not syncing: kmem_cache_create(): failed to create slab 
`size-32(DMA)'

Rebooting in 1 seconds..
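
As a cross-check on the log above: the single active range on node 1 spans 892928 page frames, and the "Memory:" line reports 3571712k in total. Those two numbers agree if the page size is 4 KiB, which is an assumption here since this log does not print the page size. A small sketch of the conversion (`toy_pages_to_kib` is a hypothetical helper, not a kernel function):

```c
#include <assert.h>

/*
 * Convert a page count to KiB for a given page shift
 * (a shift of 12 means 4 KiB pages).
 */
static unsigned long toy_pages_to_kib(unsigned long pages,
				      unsigned int page_shift)
{
	assert(page_shift >= 10);	/* pages below 1 KiB are not handled */
	return pages << (page_shift - 10);
}
```

With a shift of 12, 892928 pages come out as 3571712 KiB, matching the total in the "Memory:" line.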

---
 mm/slab.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 		/* Replace the static kmem_list3 structures for the boot cpu */
 		init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-		for_each_node_state(nid, N_NORMAL_MEMORY) {
+		for_each_online_node(nid) {
 			init_list(malloc_sizes[INDEX_AC].cs_cachep,
 				  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct 
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2108,6 +2108,8 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
+	if (!cachep->nodelists[numa_node_id()])
+		return -ENODEV;
 	cachep->nodelists[numa_node_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
@@ -2775,6 +2777,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3324,10 @@ static void *cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
@@ -3815,7 +3826,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 
                 if (use_alien_caches) {
                         new_alien = alloc_alien_cache(node, cachep->limit);
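
The "!l3" checks added to cache_grow and cache_alloc_node in the patch above follow one pattern: if the requested node has no per-node list, retarget the request at the local node. A userspace sketch of just that pattern, with hypothetical `toy_` names standing in for the slab structures (the real code uses numa_node_id() where this sketch takes `local_node` as a parameter):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 2

struct toy_list3 { int colour_next; };

struct toy_cache {
	struct toy_list3 *nodelists[TOY_MAX_NODES];
};

/*
 * Look up the per-node list for *nodeid. If that node has none, retarget
 * the request at local_node and update *nodeid so the caller continues
 * with the node whose list it actually got. Returns NULL only if the
 * local node has no list either, which is the case the patch turns into
 * BUG_ON(!l3).
 */
static struct toy_list3 *toy_pick_list3(struct toy_cache *cachep,
					int *nodeid, int local_node)
{
	struct toy_list3 *l3;

	assert(*nodeid >= 0 && *nodeid < TOY_MAX_NODES);
	assert(local_node >= 0 && local_node < TOY_MAX_NODES);

	l3 = cachep->nodelists[*nodeid];
	if (!l3) {
		*nodeid = local_node;
		l3 = cachep->nodelists[*nodeid];
	}
	return l3;
}
```

The sketch only shows the lookup; locking and the actual slab growth are left out.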


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Thu, 17 Jan 2008, Olaf Hering wrote:

> On Thu, Jan 17, Olaf Hering wrote:
> 
> > Since -mm boots further, what patch should I try?
> 
> rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.

Sigh. It looks like we need alien cache structures in some cases for nodes 
that have no memory. We must allocate structures for all nodes regardless 
if they have allocatable memory or not.
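
To make the NULL l3 concrete, here is a rough userspace model of the two iteration choices: setting up per-node lists only for nodes with memory versus for every online node. The node masks describe the machine in this thread (nodes 0 and 1 online, memory only on node 1); all `toy_` names are hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

struct toy_list3 { int next_reap; };

struct toy_cache {
	struct toy_list3 *nodelists[TOY_MAX_NODES];
};

static struct toy_list3 toy_init_list3[TOY_MAX_NODES];

/* Bit n set means node n is in that state. */
static const unsigned int toy_online_mask = 0x3;	/* nodes 0 and 1 */
static const unsigned int toy_memory_mask = 0x2;	/* only node 1 */

/* Populate per-node lists for every node whose bit is set in mask. */
static void toy_set_up_list3s(struct toy_cache *c, unsigned int mask)
{
	int node;

	for (node = 0; node < TOY_MAX_NODES; node++) {
		c->nodelists[node] = NULL;
		if (mask & (1u << node))
			c->nodelists[node] = &toy_init_list3[node];
	}
}
```

Iterating over toy_memory_mask leaves nodelists[0] NULL, so any code path that looks up the list of the memoryless node gets a NULL pointer; iterating over toy_online_mask gives every online node a list.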

 


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8
> 
> calls cache_grow with nodeid 1
> > [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4

Okay that makes sense. You have no node 0 with normal memory but the node 
assigned to the executing processor is zero (correct?). Thus it needs to 
fall back to node 1 and that is not possible during bootstrap. You need to 
run kmem_cache_init() on a CPU that belongs to a node with memory.

Or we need to revert the patch which would allocate control 
structures again for all online nodes regardless if they have memory or 
not.

Does reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 change the 
situation? (However, we tried this on the other thread without success).



Re: crash in kmem_cache_init

2008-01-18 Thread Olaf Hering
On Fri, Jan 18, Christoph Lameter wrote:

> Could you try this patch?

Does not help, same crash.


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Fri, 18 Jan 2008, Christoph Lameter wrote:

> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

If !CONFIG_HIGHMEM then

enum node_states {
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,	/* The node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif

So
	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
			                    ^^^ sets N_NORMAL_MEMORY
		check_for_regular_memory(pgdat);
	}
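
The aliasing argument can be checked with a small standalone model. This is a sketch, not the kernel's definitions: the `TOY_` enum mirrors only the shape of the enum quoted above, and `TOY_HIGHMEM` is a hypothetical stand-in for CONFIG_HIGHMEM that is left undefined here:

```c
#include <assert.h>

/*
 * Same shape as the quoted enum: without TOY_HIGHMEM the "high memory"
 * state is just another name for the "normal memory" state.
 */
enum toy_node_states {
	TOY_N_POSSIBLE,
	TOY_N_ONLINE,
	TOY_N_NORMAL_MEMORY,
#ifdef TOY_HIGHMEM
	TOY_N_HIGH_MEMORY,
#else
	TOY_N_HIGH_MEMORY = TOY_N_NORMAL_MEMORY,
#endif
	TOY_NR_NODE_STATES
};

/* One bitmask of node ids per state. */
static unsigned int toy_node_states[TOY_NR_NODE_STATES];

static void toy_node_set_state(int nid, enum toy_node_states state)
{
	toy_node_states[state] |= 1u << nid;
}

static int toy_node_state(int nid, enum toy_node_states state)
{
	return (toy_node_states[state] >> nid) & 1u;
}
```

Built without TOY_HIGHMEM, setting TOY_N_HIGH_MEMORY for node 1 makes toy_node_state(1, TOY_N_NORMAL_MEMORY) report 1, because both names index the same mask.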



Re: crash in kmem_cache_init

2008-01-18 Thread Nish Aravamudan
On 1/18/08, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> Could you try this patch?
>
> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support
> HIGHMEM
>
> It seems that we only scan through zones to set N_NORMAL_MEMORY only if
> CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set
> N_NORMAL_MEMORY
> in the !CONFIG_HIGHMEM case.

I'm testing this exact patch right now on the machine Mel saw the issues with.

Thanks,
Nish


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
Could you try this patch?

Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

It seems that we scan through zones to set N_NORMAL_MEMORY only if
CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY
in the !CONFIG_HIGHMEM case.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2008-01-18 14:08:41.0 -0800
+++ linux-2.6/mm/page_alloc.c   2008-01-18 14:13:34.0 -0800
@@ -3812,7 +3812,6 @@ restart:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -3820,7 +3819,6 @@ static void check_for_regular_memory(pg_
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }
 
 /**
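
For illustration, the effect of running the zone scan unconditionally can be modelled outside the kernel. This is a sketch under assumptions: the `toy_` types are hypothetical simplifications (two zones, a bitmask instead of a nodemask), and the page counts are taken from the boot log in this thread, where all of node 1's memory sits in the DMA zone:

```c
#include <assert.h>

enum toy_zone_type { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_MAX_NR_ZONES };

struct toy_pgdat {
	int node_id;
	unsigned long present_pages[TOY_MAX_NR_ZONES];
};

/* Bit n set means node n has regular (non-highmem) memory. */
static unsigned int toy_normal_memory_mask;

/* Post-patch shape: the scan is no longer guarded by an #ifdef. */
static void toy_check_for_regular_memory(const struct toy_pgdat *pgdat)
{
	int zone_type;

	for (zone_type = 0; zone_type <= TOY_ZONE_NORMAL; zone_type++) {
		if (pgdat->present_pages[zone_type])
			toy_normal_memory_mask |= 1u << pgdat->node_id;
	}
}
```

Feeding it a memoryless node 0 and a node 1 with 892928 DMA pages marks only node 1.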



Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Fri, 18 Jan 2008, Mel Gorman wrote:

> static void check_for_regular_memory(pg_data_t *pgdat)
> {
> #ifdef CONFIG_HIGHMEM
> 	enum zone_type zone_type;
> 
> 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
> 		struct zone *zone = &pgdat->node_zones[zone_type];
> 		if (zone->present_pages)
> 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
> 	}
> #endif
> }
> 
> i.e. go through the other zones and if any of them have memory, set
> N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
> PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
> POWER. That sounds bad.

Argh. We may need to do a

node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY) in the !HIGHMEM case.

> and one of them is in kmem_cache_init(). That seems very significant.
> Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
> being set would cause trouble for slab?

Yes. That results in the per node structures not being created and thus l3 
== NULL. Explains our failures.




Re: crash in kmem_cache_init

2008-01-18 Thread Mel Gorman
On (18/01/08 10:47), Christoph Lameter didst pronounce:
> On Thu, 17 Jan 2008, Olaf Hering wrote:
> 
> > early_node_map[1] active PFN ranges
> > 1:0 ->   892928
> > Could not find start_pfn for node 0
> 
> Corrupted min_pfn?
> 

Doubtful. Node 0 has no memory but it is still being initialised.

Still, I looked closer at what is going on when that message gets
displayed and I see this in free_area_init_nodes()

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
		check_for_regular_memory(pgdat);
	}

This "Any memory on that node" thing is new and it says if there is any
memory on the node, set N_HIGH_MEMORY. Fine I guess, I haven't tracked these
changes closely. It calls check_for_regular_memory() which looks like

static void check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
	enum zone_type zone_type;

	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
		struct zone *zone = &pgdat->node_zones[zone_type];
		if (zone->present_pages)
			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
	}
#endif
}

i.e. go through the other zones and if any of them have memory, set
N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
POWER. That sounds bad.

[EMAIL PROTECTED]:~/git/linux-2.6/mm$ grep -n N_NORMAL_MEMORY slab.c 
1593:   for_each_node_state(nid, N_NORMAL_MEMORY) {
1971:   for_each_node_state(node, N_NORMAL_MEMORY) {
2102:   for_each_node_state(node, N_NORMAL_MEMORY) {
3818:   for_each_node_state(node, N_NORMAL_MEMORY) {

and one of them is in kmem_cache_init(). That seems very significant.
Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
being set would cause trouble for slab?

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Thu, 17 Jan 2008, Olaf Hering wrote:

>   Normal 892928 ->   892928
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
> 1:0 ->   892928
> Could not find start_pfn for node 0

We only have a single node that is node 1? And then we initialize nodes 0 
to 3?

> Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 
> 1324k data, 1220k bss, 304k init)
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 0 
> l3 c05fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 1 
> l3 c05fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 2 
> l3 c05fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 3 
> l3 c05fddf0

???


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Thu, 17 Jan 2008, Olaf Hering wrote:

> early_node_map[1] active PFN ranges
> 1:0 ->   892928
> Could not find start_pfn for node 0

Corrupted min_pfn?



Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8
> 
> calls cache_grow with nodeid 1
> > [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4

Hmmm... fallback_alloc should not be called during bootstrap.



Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Olaf Hering wrote:

> On Thu, Jan 17, Christoph Lameter wrote:
> 
> > On Thu, 17 Jan 2008, Olaf Hering wrote:
> > 
> > > The patch does not help.
> > 
> > Duh. We need to know more about the problem.
> 
> cache_grow is called from 3 places. The third call has cleared l3 for
> some reason.

Typo in debug patch.

calls cache_grow with nodeid 0
> [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
calls cache_grow with nodeid 0
> [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8

calls cache_grow with nodeid 1
> [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4



Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Olaf Hering wrote:
> 
> > The patch does not help.
> 
> Duh. We need to know more about the problem.

cache_grow is called from 3 places. The third call has cleared l3 for
some reason.



Allocated 00a0 bytes for kernel @ 0020
   Elf64 kernel loaded...
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line:  xmon=on sysrq=1 debug panic=1 
memory layout at init:
  alloc_bottom : 00ac1000
  alloc_top: 1000
  alloc_top_hi : da00
  rmo_top  : 1000
  ram_top  : da00
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED], opening ... done
instantiating rtas at 0x0f6a1000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00cc2000 -> 0x00cc34e4
Device tree struct  0x00cc4000 -> 0x00cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #34 SMP Thu Jan 17 22:06:41 CET 2008
-
ppc64_pft_size= 0x1c
physicalMemorySize= 0xda00
htab_hash_mask= 0x1f
-
Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 
(prerelease) (SUSE Linux)) #34 SMP Thu Jan 17 22:06:41 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line:  xmon=on sysrq=1 debug panic=1 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k 
data, 1220k bss, 304k init)
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 0 l3 
c05fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 1 l3 
c05fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 2 l3 
c05fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 3 l3 
c05fddf0
[ cut here ]
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c00f78f4 LR: c00f78e0 CTR: 801af404
REGS: c075b880 TRAP: 0700   Not tainted  (2.6.24-rc8-ppc64)
MSR: 80029032   CR: 2422  XER: 0001
TASK = c0665a50[0] 'swapper' THREAD: c0758000 CPU: 0
GPR00: 0004 c075bb00 c07544c0 0063 
GPR04: 0001 0001   
GPR08:  c06a19a0 c07a84b0 c07a84a8 
GPR12: 4000 c0666380   
GPR16:    4020 
GPR20:  007fbd70 c054f6c8 000492d0 
GPR24:  c06a4fb8 c06a4fb8 c05fdc80 
GPR28:  000412d0 c06e5b80 0004 
NIP [c00f78f4] .cache_grow+0xc8/0x39c
LR [c00f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c075bb00] [c00f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
[c075bc90] [c00f842c] .kmem_cache_alloc+0xd0/0x294
[c075bd40] [c00fb4e8] .kmem_cache_create+0x208/0x478
[c075be20] [c05e670c] .kmem_cache_init+0x218/0x4f4
[c075bee0] [c05bf8ec] .start_kernel+0x2f8/0x3fc
[c075bf90] [c0008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 6000 
381f0001 7c1f07b4 

Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Olaf Hering wrote:

> Since -mm boots further, what patch should I try?

rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.


Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Christoph Lameter wrote:

> > freeing bootmem node 1
> > Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 
> > 1324k data, 1220k bss, 304k init)
> > cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3
> 
> Is there more backtrace information? What function called cache_grow?

I just put a 'if (!l3) return 0;' into cache_grow, the backtrace is the
one from the initial report.
Reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 does not change
anything.


Since -mm boots further, what patch should I try?

The kernel boots on a different p570.
See attached dmesg. huckleberry boots, cranberry crashes.


--- huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:18.510309000 +0100
+++ cranberry.suse.de-2.6.16.57-0.5-ppc64.txt   2008-01-17 20:48:09.425402000 +0100
@@ -1,56 +1,55 @@
 Page orders: linear mapping = 24, others = 12
-Found initrd at 0xc270:0xc2a93000
+Found initrd at 0xc130:0xc16e6c1e
 Partition configured for 8 cpus.
 Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
 -
-ppc64_pft_size= 0x1b
+ppc64_pft_size= 0x1c
 ppc64_interrupt_controller= 0x2
 platform  = 0x101
-physicalMemorySize= 0x15800
+physicalMemorySize= 0xda00
 ppc64_caches.dcache_line_size = 0x80
 ppc64_caches.icache_line_size = 0x80
 htab_address  = 0x
-htab_hash_mask= 0xf
+htab_hash_mask= 0x1f
 -
 [boot]0100 MM Init
 [boot]0100 MM Init Done
 Linux version 2.6.16.57-0.5-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007
 [boot]0012 Setup Arch
-Node 0 Memory: 0x0-0xb000
-Node 1 Memory: 0xb000-0x15800
+Node 0 Memory:
+Node 1 Memory: 0x0-0xda00
 EEH: PCI Enhanced I/O Error Handling Enabled
-PPC64 nvram contains 7168 bytes
+PPC64 nvram contains 8192 bytes
 Using dedicated idle loop
-On node 0 totalpages: 720896
-  DMA zone: 720896 pages, LIFO batch:31
+On node 0 totalpages: 0
+  DMA zone: 0 pages, LIFO batch:0
   DMA32 zone: 0 pages, LIFO batch:0
   Normal zone: 0 pages, LIFO batch:0
   HighMem zone: 0 pages, LIFO batch:0
-On node 1 totalpages: 688128
-  DMA zone: 688128 pages, LIFO batch:31
+On node 1 totalpages: 892928
+  DMA zone: 892928 pages, LIFO batch:31
   DMA32 zone: 0 pages, LIFO batch:0
   Normal zone: 0 pages, LIFO batch:0
   HighMem zone: 0 pages, LIFO batch:0
 [boot]0015 Setup Done
 Built 2 zonelists
-Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW57445Q010-part5  xmon=on sysrq=1 quiet 
+Kernel command line: root=/dev/system/root  xmon=on sysrq=1 quiet 
 [boot]0020 XICS Init
 xics: no ISA interrupt controller
 [boot]0021 XICS Done
 PID hash table entries: 4096 (order: 12, 131072 bytes)
-time_init: decrementer frequency = 207.052000 MHz
-time_init: processor frequency   = 1654.344000 MHz
+time_init: decrementer frequency = 275.07 MHz
+time_init: processor frequency   = 2197.80 MHz
 Console: colour dummy device 80x25
-Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
-Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
-freeing bootmem node 0
+Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
+Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
 freeing bootmem node 1
-Memory: 5524952k/5636096k available (4464k kernel code, 44k reserved, 1992k data, 836k bss, 264k init)
-Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
+Memory: 3494648k/3571712k available (4464k kernel code, 77064k reserved, 1992k data, 836k bss, 264k init)
+Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320)
 Security Framework v1.0.0 initialized
 Mount-cache hash table entries: 256
 checking if image is initramfs... it is
-Freeing initrd memory: 3660k freed
+Freeing initrd memory: 3995k freed
 Processor 1 found.
 Processor 2 found.
 Processor 3 found.
@@ -61,7 +60,7 @@ Processor 7 found.
 Brought up 8 CPUs
 Node 0 CPUs: 0-3
 Node 1 CPUs: 4-7
-migration_cost=41,0,4308
+migration_cost=38,0,3225
 NET: Registered protocol family 16
 PCI: Probing PCI hardware
 IOMMU table initialized, virtual merging enabled
Page orders: linear mapping = 24, others = 12
Found initrd at 0xc270:0xc2a93000
Partition configured for 8 cpus.
Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
-
ppc64_pft_size= 0x1b
ppc64_interrupt_controller= 0x2
platform  = 0x101
physicalMemorySize= 0x15800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf

Re: crash in kmem_cache_init

2008-01-17 Thread Christoph Lameter
Could you try Pekka's suggestion of reverting  
04231b3002ac53f8a64a7bd142fde3fa4b6808c6 ?



Re: crash in kmem_cache_init

2008-01-17 Thread Christoph Lameter
On Thu, 17 Jan 2008, Olaf Hering wrote:

> The patch does not help.

Duh. We need to know more about the problem.

> > --- linux-2.6.orig/mm/slab.c 2008-01-03 12:26:42.0 -0800
> > +++ linux-2.6/mm/slab.c 2008-01-09 15:59:49.0 -0800
> > @@ -2977,7 +2977,10 @@ retry:
> >          }
> >          l3 = cachep->nodelists[node];
> >  
> > -        BUG_ON(ac->avail > 0 || !l3);
> > +        if (!l3)
> > +                return NULL;
> > +
> > +        BUG_ON(ac->avail > 0);
> >          spin_lock(&l3->list_lock);
> >  
> >          /* See if we can refill from the shared array */
> 
> Is this hunk supposed to go into cache_grow()? There is no NULL check
> for l3.

No, it's for cache_alloc_refill. cache_grow should only be called for
nodes that have memory. l3 is always used before cache_grow is called.

> freeing bootmem node 1
> Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 
> 1324k data, 1220k bss, 304k init)
> cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3

Is there more backtrace information? What function called cache_grow?



Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Pekka Enberg wrote:
> 
> > Looks similar to the one discussed on linux-mm ("[BUG] at
> > mm/slab.c:3320" thread). Christoph?
> 
> Right. Try the latest version of the patch to fix it:

The patch does not help.
 
> Index: linux-2.6/mm/slab.c
> ===
> --- linux-2.6.orig/mm/slab.c  2008-01-03 12:26:42.0 -0800
> +++ linux-2.6/mm/slab.c   2008-01-09 15:59:49.0 -0800
> @@ -2977,7 +2977,10 @@ retry:
>          }
>          l3 = cachep->nodelists[node];
>  
> -        BUG_ON(ac->avail > 0 || !l3);
> +        if (!l3)
> +                return NULL;
> +
> +        BUG_ON(ac->avail > 0);
>          spin_lock(&l3->list_lock);
>  
>          /* See if we can refill from the shared array */

Is this hunk supposed to go into cache_grow()? There is no NULL check
for l3.

But if I do that, it does not help:

freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32'

Rebooting in 1 seconds..



Re: crash in kmem_cache_init

2008-01-17 Thread Christoph Lameter
On Thu, 17 Jan 2008, Pekka Enberg wrote:

> Looks similar to the one discussed on linux-mm ("[BUG] at
> mm/slab.c:3320" thread). Christoph?

Right. Try the latest version of the patch to fix it:

Index: linux-2.6/mm/slab.c
===
--- linux-2.6.orig/mm/slab.c 2008-01-03 12:26:42.0 -0800
+++ linux-2.6/mm/slab.c 2008-01-09 15:59:49.0 -0800
@@ -2977,7 +2977,10 @@ retry:
         }
         l3 = cachep->nodelists[node];
 
-        BUG_ON(ac->avail > 0 || !l3);
+        if (!l3)
+                return NULL;
+
+        BUG_ON(ac->avail > 0);
         spin_lock(&l3->list_lock);
 
         /* See if we can refill from the shared array */
@@ -3224,7 +3227,7 @@ static void *alternate_node_alloc(struct
                 nid_alloc = cpuset_mem_spread_node();
         else if (current->mempolicy)
                 nid_alloc = slab_node(current->mempolicy);
-        if (nid_alloc != nid_here)
+        if (nid_alloc != nid_here && node_state(nid_alloc, N_NORMAL_MEMORY))
                 return ____cache_alloc_node(cachep, flags, nid_alloc);
         return NULL;
 }
@@ -3439,8 +3442,14 @@ __do_cache_alloc(struct kmem_cache *cach
          * We may just have run out of memory on the local node.
          * ____cache_alloc_node() knows how to locate memory on other nodes
          */
-        if (!objp)
-                objp = ____cache_alloc_node(cache, flags, numa_node_id());
+        if (!objp) {
+                int node_id = numa_node_id();
+                if (likely(cache->nodelists[node_id])) /* fast path */
+                        objp = ____cache_alloc_node(cache, flags, node_id);
+                else /* this function can do good fallback */
+                        objp = __cache_alloc_node(cache, flags, node_id,
+                                        __builtin_return_address(0));
+        }
 
   out:
         return objp;


Re: crash in kmem_cache_init

2008-01-17 Thread Pekka Enberg
Hi Olaf,

[Adding Christoph as cc.]

On Jan 15, 2008 5:09 PM, Olaf Hering <[EMAIL PROTECTED]> wrote:
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

So that's the "Memoryless nodes: Slab support" patch that I think
caused a similar oops a while ago.

> Unable to handle kernel paging request for data at address 0x0040
> Faulting instruction address: 0xc0437470
> cpu 0x0: Vector: 300 (Data Access) at [c075b830]
> pc: c0437470: ._spin_lock+0x20/0x88
> lr: c00f78a8: .cache_grow+0x7c/0x338
> sp: c075bab0
>    msr: 80009032
>    dar: 40
>  dsisr: 4000
>   current = 0xc0665a50
>   paca= 0xc0666380
> pid   = 0, comm = swapper
> enter ? for help
> [c075bb30] c00f78a8 .cache_grow+0x7c/0x338
> [c075bbf0] c00f7d04 .fallback_alloc+0x1a0/0x1f4
> [c075bca0] c00f8544 .kmem_cache_alloc+0xec/0x150
> [c075bd40] c00fb1c0 .kmem_cache_create+0x208/0x478
> [c075be20] c05e670c .kmem_cache_init+0x218/0x4f4
> [c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
> [c075bf90] c0008590 .start_here_common+0x60/0xd0

Looks similar to the one discussed on linux-mm ("[BUG] at
mm/slab.c:3320" thread). Christoph?


Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Christoph Lameter wrote:

> > freeing bootmem node 1
> > Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 
> > 1324k data, 1220k bss, 304k init)
> > cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3
> 
> Is there more backtrace information? What function called cache_grow?

I just put a 'if (!l3) return 0;' into cache_grow, the backtrace is the
one from the initial report.
Reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 does not change
anything.


Since -mm boots further, what patch should I try?

The kernel boots on a different p570.
See attached dmesg. huckleberry boots, cranberry crashes.


--- huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:18.510309000 +0100
+++ cranberry.suse.de-2.6.16.57-0.5-ppc64.txt   2008-01-17 20:48:09.425402000 +0100
@@ -1,56 +1,55 @@
 Page orders: linear mapping = 24, others = 12
-Found initrd at 0xc270:0xc2a93000
+Found initrd at 0xc130:0xc16e6c1e
 Partition configured for 8 cpus.
 Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
 -
-ppc64_pft_size= 0x1b
+ppc64_pft_size= 0x1c
 ppc64_interrupt_controller= 0x2
 platform  = 0x101
-physicalMemorySize= 0x15800
+physicalMemorySize= 0xda00
 ppc64_caches.dcache_line_size = 0x80
 ppc64_caches.icache_line_size = 0x80
 htab_address  = 0x
-htab_hash_mask= 0xf
+htab_hash_mask= 0x1f
 -
 [boot]0100 MM Init
 [boot]0100 MM Init Done
 Linux version 2.6.16.57-0.5-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007
 [boot]0012 Setup Arch
-Node 0 Memory: 0x0-0xb000
-Node 1 Memory: 0xb000-0x15800
+Node 0 Memory:
+Node 1 Memory: 0x0-0xda00
 EEH: PCI Enhanced I/O Error Handling Enabled
-PPC64 nvram contains 7168 bytes
+PPC64 nvram contains 8192 bytes
 Using dedicated idle loop
-On node 0 totalpages: 720896
-  DMA zone: 720896 pages, LIFO batch:31
+On node 0 totalpages: 0
+  DMA zone: 0 pages, LIFO batch:0
   DMA32 zone: 0 pages, LIFO batch:0
   Normal zone: 0 pages, LIFO batch:0
   HighMem zone: 0 pages, LIFO batch:0
-On node 1 totalpages: 688128
-  DMA zone: 688128 pages, LIFO batch:31
+On node 1 totalpages: 892928
+  DMA zone: 892928 pages, LIFO batch:31
   DMA32 zone: 0 pages, LIFO batch:0
   Normal zone: 0 pages, LIFO batch:0
   HighMem zone: 0 pages, LIFO batch:0
 [boot]0015 Setup Done
 Built 2 zonelists
-Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW57445Q010-part5  xmon=on sysrq=1 quiet 
+Kernel command line: root=/dev/system/root  xmon=on sysrq=1 quiet 
 [boot]0020 XICS Init
 xics: no ISA interrupt controller
 [boot]0021 XICS Done
 PID hash table entries: 4096 (order: 12, 131072 bytes)
-time_init: decrementer frequency = 207.052000 MHz
-time_init: processor frequency   = 1654.344000 MHz
+time_init: decrementer frequency = 275.07 MHz
+time_init: processor frequency   = 2197.80 MHz
 Console: colour dummy device 80x25
-Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
-Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
-freeing bootmem node 0
+Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
+Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
 freeing bootmem node 1
-Memory: 5524952k/5636096k available (4464k kernel code, 44k reserved, 1992k data, 836k bss, 264k init)
-Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
+Memory: 3494648k/3571712k available (4464k kernel code, 77064k reserved, 1992k data, 836k bss, 264k init)
+Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320)
 Security Framework v1.0.0 initialized
 Mount-cache hash table entries: 256
 checking if image is initramfs... it is
-Freeing initrd memory: 3660k freed
+Freeing initrd memory: 3995k freed
 Processor 1 found.
 Processor 2 found.
 Processor 3 found.
@@ -61,7 +60,7 @@ Processor 7 found.
 Brought up 8 CPUs
 Node 0 CPUs: 0-3
 Node 1 CPUs: 4-7
-migration_cost=41,0,4308
+migration_cost=38,0,3225
 NET: Registered protocol family 16
 PCI: Probing PCI hardware
 IOMMU table initialized, virtual merging enabled
Page orders: linear mapping = 24, others = 12
Found initrd at 0xc270:0xc2a93000
Partition configured for 8 cpus.
Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
-
ppc64_pft_size= 0x1b
ppc64_interrupt_controller= 0x2
platform  = 0x101
physicalMemorySize= 0x15800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf
-

Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Olaf Hering wrote:

> Since -mm boots further, what patch should I try?

rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.


Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Olaf Hering wrote:
> 
> > The patch does not help.
> 
> Duh. We need to know more about the problem.

cache_grow is called from 3 places. The third call has cleared l3 for
some reason.



Allocated 00a0 bytes for kernel @ 0020
   Elf64 kernel loaded...
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line:  xmon=on sysrq=1 debug panic=1 
memory layout at init:
  alloc_bottom : 00ac1000
  alloc_top: 1000
  alloc_top_hi : da00
  rmo_top  : 1000
  ram_top  : da00
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done
instantiating rtas at 0x0f6a1000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00cc2000 - 0x00cc34e4
Device tree struct  0x00cc4000 - 0x00cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #34 SMP Thu Jan 17 22:06:41 CET 2008
-
ppc64_pft_size= 0x1c
physicalMemorySize= 0xda00
htab_hash_mask= 0x1f
-
Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #34 SMP Thu Jan 17 22:06:41 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->   892928
  Normal 892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1:0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line:  xmon=on sysrq=1 debug panic=1 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency   = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 0 l3 c05fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 1 l3 c05fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 2 l3 c05fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 3 l3 c05fddf0
[ cut here ]
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c00f78f4 LR: c00f78e0 CTR: 801af404
REGS: c075b880 TRAP: 0700   Not tainted  (2.6.24-rc8-ppc64)
MSR: 80029032 <EE,ME,IR,DR>  CR: 2422  XER: 0001
TASK = c0665a50[0] 'swapper' THREAD: c0758000 CPU: 0
GPR00: 0004 c075bb00 c07544c0 0063 
GPR04: 0001 0001   
GPR08:  c06a19a0 c07a84b0 c07a84a8 
GPR12: 4000 c0666380   
GPR16:    4020 
GPR20:  007fbd70 c054f6c8 000492d0 
GPR24:  c06a4fb8 c06a4fb8 c05fdc80 
GPR28:  000412d0 c06e5b80 0004 
NIP [c00f78f4] .cache_grow+0xc8/0x39c
LR [c00f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c075bb00] [c00f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
[c075bc90] [c00f842c] .kmem_cache_alloc+0xd0/0x294
[c075bd40] [c00fb4e8] .kmem_cache_create+0x208/0x478
[c075be20] [c05e670c] .kmem_cache_init+0x218/0x4f4
[c075bee0] [c05bf8ec] .start_kernel+0x2f8/0x3fc
[c075bf90] [c0008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 6000 
381f0001 7c1f07b4 2f9f0004 

Re: crash in kmem_cache_init

2008-01-17 Thread Olaf Hering
On Thu, Jan 17, Olaf Hering wrote:

> On Thu, Jan 17, Christoph Lameter wrote:
> 
> > On Thu, 17 Jan 2008, Olaf Hering wrote:
> > 
> > > The patch does not help.
> > 
> > Duh. We need to know more about the problem.
> 
> cache_grow is called from 3 places. The third call has cleared l3 for
> some reason.

Typo in debug patch.

calls cache_grow with nodeid 0
 [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
calls cache_grow with nodeid 0
 [c075bbe0] [c00f7f38] .____cache_alloc_node+0x17c/0x1e8

calls cache_grow with nodeid 1
 [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4



Re: crash in kmem_cache_init

2008-01-15 Thread Olaf Hering
On Tue, Jan 15, Olaf Hering wrote:

> 
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
> 
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

2.6.24-rc6-mm1-ppc64 boots past this point, but crashes later.
Likely unrelated to the kmem_cache_init bug:

...
matroxfb: 640x480x8bpp (virtual: 640x26214)
matroxfb: framebuffer at 0x4017800, mapped to 0xd8008008, size 33554432
Console: switching to colour frame buffer device 80x30
fb0: MATROX frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <[EMAIL PROTECTED]>)
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ehci_hcd :c8:01.2: EHCI Host Controller
ehci_hcd :c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd :c8:01.2: irq 85, io mem 0x400a0002000
ehci_hcd :c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
Unable to handle kernel paging request for data at address 0x0050
Faulting instruction address: 0xc00fa1c4
cpu 0x7: Vector: 300 (Data Access) at [c000d82e7a70]
pc: c00fa1c4: .cache_reap+0x74/0x29c
lr: c00fa198: .cache_reap+0x48/0x29c
sp: c000d82e7cf0
   msr: 80009032
   dar: 50
 dsisr: 4000
  current = 0xc000d82d85c0
  paca= 0xc0668e00
pid   = 27, comm = events/7
enter ? for help
[c000d82e7cf0] c070be98 vmstat_update+0x0/0x18 (unreliable)
[c000d82e7da0] c0092994 .run_workqueue+0x120/0x210
[c000d82e7e40] c0093bb8 .worker_thread+0xcc/0xf0
[c000d82e7f00] c0097b70 .kthread+0x78/0xc4
[c000d82e7f90] c002ab74 .kernel_thread+0x4c/0x68
7:mon> 
...

