Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-22 Thread Michal Hocko
On Mon 22-05-17 02:03:21, Guenter Roeck wrote:
> On 05/22/2017 01:45 AM, Michal Hocko wrote:
> >On Sat 20-05-17 09:26:34, Michal Hocko wrote:
> >>On Fri 19-05-17 09:46:23, Guenter Roeck wrote:
> >>>Hi,
> >>>
> >>>my qemu tests of next-20170519 show the following results:
> >>>   total: 122 pass: 30 fail: 92
> >>>
> >>>I won't bother listing all of the failures; they are available at
> >>>http://kerneltests.org/builders. I bisected one (openrisc, because
> >>>it gives me some console output before dying). It points to
> >>>'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.
> >>>
> >>>A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
> >>>32-bit x86 images fail and should provide an easy test case.
> >>
> >>Hmm, this is quite unexpected as the patch is not supposed to change
> >>things much. It just removes the flag and performs the new hash scaling
> >>automatically for all requests which do not have any high limit.
> >>Some of those didn't have HASH_ADAPT before but that shouldn't change
> >>the picture much. The only thing that I can imagine is that what
> >>formerly failed for early memblock allocations is now succeeding and that
> >>depletes the early memory. Do you have any serial console from the boot?
> >
> >OK, I guess I know what is going on here. Adaptive hash scaling is not
> >really suited for 32b. ADAPT_SCALE_BASE is just too large for the word
> >size and so we end up in an endless loop. So the issue has been
> >introduced by the original "mm: adaptive hash table scaling", but my
> >patch made it more visible because the [di]cache hash tables most
> >probably succeeded in the early initialization, which didn't have
> >HASH_ADAPT. The following should fix the hang. I am not yet sure about
> >the maximum size for the scaling, and something even smaller would make
> >sense to me because the kernel address space is just too small for such
> >large hash tables.
> >---
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index a26e19c3e1ff..70c5fc1fb89a 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -7174,11 +7174,15 @@ static unsigned long __init arch_reserved_kernel_pages(void)
> >  /*
> >   * Adaptive scale is meant to reduce sizes of hash tables on large memory
> >   * machines. As memory size is increased the scale is also increased but at
> >- * slower pace.  Starting from ADAPT_SCALE_BASE (64G), every time memory
> >- * quadruples the scale is increased by one, which means the size of hash table
> >- * only doubles, instead of quadrupling as well.
> >+ * slower pace.  Starting from ADAPT_SCALE_BASE (64G on 64b systems and 32M
> >+ * on 32b), every time memory quadruples the scale is increased by one, which
> >+ * means the size of hash table only doubles, instead of quadrupling as well.
> >   */
> >+#if __BITS_PER_LONG == 64
> >  #define ADAPT_SCALE_BASE   (64ul << 30)
> >+#else
> >+#define ADAPT_SCALE_BASE   (32ul << 20)
> >+#endif
> >  #define ADAPT_SCALE_SHIFT  2
> >  #define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT)
> >
> I have seen another patch making it 64ull. Not sure if adaptive scaling
> on 32 bit systems really makes sense; unless there is a clear need I'd rather
> leave it alone.

I've just found out that my incoming email sync hasn't worked since
Friday, so I've missed those follow-up emails. I will double check.
-- 
Michal Hocko
SUSE Labs


Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-22 Thread Guenter Roeck

On 05/22/2017 01:45 AM, Michal Hocko wrote:

On Sat 20-05-17 09:26:34, Michal Hocko wrote:

On Fri 19-05-17 09:46:23, Guenter Roeck wrote:

Hi,

my qemu tests of next-20170519 show the following results:
total: 122 pass: 30 fail: 92

I won't bother listing all of the failures; they are available at
http://kerneltests.org/builders. I bisected one (openrisc, because
it gives me some console output before dying). It points to
'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.

A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
32-bit x86 images fail and should provide an easy test case.


Hmm, this is quite unexpected as the patch is not supposed to change
things much. It just removes the flag and performs the new hash scaling
automatically for all requests which do not have any high limit.
Some of those didn't have HASH_ADAPT before but that shouldn't change
the picture much. The only thing that I can imagine is that what
formerly failed for early memblock allocations is now succeeding and that
depletes the early memory. Do you have any serial console from the boot?


OK, I guess I know what is going on here. Adaptive hash scaling is not
really suited for 32b. ADAPT_SCALE_BASE is just too large for the word
size and so we end up in an endless loop. So the issue has been
introduced by the original "mm: adaptive hash table scaling", but my
patch made it more visible because the [di]cache hash tables most
probably succeeded in the early initialization, which didn't have
HASH_ADAPT. The following should fix the hang. I am not yet sure about
the maximum size for the scaling, and something even smaller would make
sense to me because the kernel address space is just too small for such
large hash tables.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a26e19c3e1ff..70c5fc1fb89a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7174,11 +7174,15 @@ static unsigned long __init arch_reserved_kernel_pages(void)
  /*
   * Adaptive scale is meant to reduce sizes of hash tables on large memory
   * machines. As memory size is increased the scale is also increased but at
- * slower pace.  Starting from ADAPT_SCALE_BASE (64G), every time memory
- * quadruples the scale is increased by one, which means the size of hash table
- * only doubles, instead of quadrupling as well.
+ * slower pace.  Starting from ADAPT_SCALE_BASE (64G on 64b systems and 32M
+ * on 32b), every time memory quadruples the scale is increased by one, which
+ * means the size of hash table only doubles, instead of quadrupling as well.
   */
+#if __BITS_PER_LONG == 64
  #define ADAPT_SCALE_BASE  (64ul << 30)
+#else
+#define ADAPT_SCALE_BASE   (32ul << 20)
+#endif
  #define ADAPT_SCALE_SHIFT 2
  #define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT)
  


I have seen another patch making it 64ull. Not sure if adaptive scaling
on 32 bit systems really makes sense; unless there is a clear need I'd rather
leave it alone.
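
For reference, the 64ull direction keeps the scaling arithmetic in a
64-bit type so the base cannot wrap on 32-bit builds. A rough sketch of
that idea only (not the actual patch; PAGE_SHIFT of 12 is assumed):

#define PAGE_SHIFT          12              /* assumed for illustration */
#define ADAPT_SCALE_BASE    (64ull << 30)   /* 64-bit even on 32b builds */
#define ADAPT_SCALE_SHIFT   2
#define ADAPT_SCALE_NPAGES  (ADAPT_SCALE_BASE >> PAGE_SHIFT)

static int hash_scale(unsigned long numentries)
{
	unsigned long long adapt;
	int scale = 0;

	/* adapt starts at 2^24 and cannot wrap to zero in 64 bits,
	 * so the loop terminates on 32-bit builds as well */
	for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries;
	     adapt <<= ADAPT_SCALE_SHIFT)
		scale++;
	return scale;
}

With adapt held in 64 bits the loop terminates even when unsigned long
is 32 bits wide, though whether the scaling is wanted on such systems
at all remains the open question.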

Guenter




Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-22 Thread Michal Hocko
On Sat 20-05-17 09:26:34, Michal Hocko wrote:
> On Fri 19-05-17 09:46:23, Guenter Roeck wrote:
> > Hi,
> > 
> > my qemu tests of next-20170519 show the following results:
> > total: 122 pass: 30 fail: 92
> > 
> > I won't bother listing all of the failures; they are available at
> > http://kerneltests.org/builders. I bisected one (openrisc, because
> > it gives me some console output before dying). It points to
> > 'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.
> > 
> > A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
> > 32-bit x86 images fail and should provide an easy test case.
> 
> Hmm, this is quite unexpected as the patch is not supposed to change
> things much. It just removes the flag and performs the new hash scaling
> automatically for all requests which do not have any high limit.
> Some of those didn't have HASH_ADAPT before but that shouldn't change
> the picture much. The only thing that I can imagine is that what
> formerly failed for early memblock allocations is now succeeding and that
> depletes the early memory. Do you have any serial console from the boot?

OK, I guess I know what is going on here. Adaptive hash scaling is not
really suited for 32b. ADAPT_SCALE_BASE is just too large for the word
size and so we end up in an endless loop. So the issue has been
introduced by the original "mm: adaptive hash table scaling", but my
patch made it more visible because the [di]cache hash tables most
probably succeeded in the early initialization, which didn't have
HASH_ADAPT. The following should fix the hang. I am not yet sure about
the maximum size for the scaling, and something even smaller would make
sense to me because the kernel address space is just too small for such
large hash tables.
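
To see why the loop cannot terminate, here is a minimal userspace
sketch (an illustration only, assuming a 32-bit unsigned long as on the
failing targets, and PAGE_SHIFT == 12):

#include <stdio.h>

/* constants as in "mm: adaptive hash table scaling" */
#define PAGE_SHIFT          12
#define ADAPT_SCALE_BASE    (64ul << 30)  /* 2^36 wraps to 0 in 32 bits */
#define ADAPT_SCALE_SHIFT   2
#define ADAPT_SCALE_NPAGES  (ADAPT_SCALE_BASE >> PAGE_SHIFT)

int main(void)
{
	unsigned long numentries = 1024;  /* a tiny early-boot hash table */
	unsigned long adapt;
	int scale = 0;

	/* with a 32-bit unsigned long, ADAPT_SCALE_NPAGES is 0, 0 << 2
	 * stays 0, and 0 < numentries always holds: endless loop */
	for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries;
	     adapt <<= ADAPT_SCALE_SHIFT)
		scale++;

	printf("scale = %d\n", scale);  /* never reached on 32 bits */
	return 0;
}

On a 64-bit build the same program terminates immediately with
scale = 0, which matches the observation that only 32-bit kernels hang.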
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a26e19c3e1ff..70c5fc1fb89a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7174,11 +7174,15 @@ static unsigned long __init arch_reserved_kernel_pages(void)
 /*
  * Adaptive scale is meant to reduce sizes of hash tables on large memory
  * machines. As memory size is increased the scale is also increased but at
- * slower pace.  Starting from ADAPT_SCALE_BASE (64G), every time memory
- * quadruples the scale is increased by one, which means the size of hash table
- * only doubles, instead of quadrupling as well.
+ * slower pace.  Starting from ADAPT_SCALE_BASE (64G on 64b systems and 32M
+ * on 32b), every time memory quadruples the scale is increased by one, which
+ * means the size of hash table only doubles, instead of quadrupling as well.
  */
+#if __BITS_PER_LONG == 64
 #define ADAPT_SCALE_BASE   (64ul << 30)
+#else
+#define ADAPT_SCALE_BASE   (32ul << 20)
+#endif
 #define ADAPT_SCALE_SHIFT  2
 #define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT)
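
A quick sanity check of the proposed 32b base (assuming PAGE_SHIFT == 12):

/* ADAPT_SCALE_BASE   = 32ul << 20 = 2^25, which fits in 32 bits
 * ADAPT_SCALE_NPAGES = 2^25 >> 12 = 8192 pages (32M of memory)
 * adapt now starts non-zero, so the scaling loop terminates for any
 * realistic numentries instead of spinning on zero forever.
 */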
 
-- 
Michal Hocko
SUSE Labs


Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-20 Thread Pasha Tatashin

The problem is due to a 32-bit integer overflow involving
ADAPT_SCALE_BASE and adapt, hit from dcache_init_early(). That path was
not enabled before 'mm: drop HASH_ADAPT' but is enabled now, and the
hang should follow right after: "PID hash table entries: 1024 (order: 0, 4096 bytes)"


main()
  pidhash_init();
  vfs_caches_init_early();
    dcache_init_early()
      alloc_large_system_hash("Dentry cache", ...)

for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries;
 adapt <<= ADAPT_SCALE_SHIFT)

numentries is very small, so it should always be smaller than adapt,
and the algorithm should not kick in, but the 32-bit overflow causes
adapt to be smaller than numentries.
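
Concretely, with a 32-bit unsigned long the arithmetic comes out as:

/* ADAPT_SCALE_BASE   = 64ul << 30 = 2^36 mod 2^32 = 0
 * ADAPT_SCALE_NPAGES = 0 >> PAGE_SHIFT = 0
 * so the loop above degenerates to
 *   for (adapt = 0; adapt < numentries; adapt <<= 2)
 * and, since 0 << 2 == 0, adapt never reaches numentries.
 */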


I will send out an updated "mm: Adaptive hash table scaling", with "mm: 
drop HASH_ADAPT" integrated.


Pasha

On 05/20/2017 10:21 AM, Guenter Roeck wrote:

On 05/20/2017 12:26 AM, Michal Hocko wrote:

On Fri 19-05-17 09:46:23, Guenter Roeck wrote:

Hi,

my qemu tests of next-20170519 show the following results:
total: 122 pass: 30 fail: 92

I won't bother listing all of the failures; they are available at
http://kerneltests.org/builders. I bisected one (openrisc, because
it gives me some console output before dying). It points to
'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.

A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
32-bit x86 images fail and should provide an easy test case.


Hmm, this is quite unexpected as the patch is not supposed to change
things much. It just removes the flag and performs the new hash scaling

It may well be that the problem is introduced with an earlier patch and
just enabled by this one.


automatically for all requests which do not have any high limit.
Some of those didn't have HASH_ADAPT before but that shouldn't change
the picture much. The only thing that I can imagine is that what
formerly failed for early memblock allocations is now succeeding and that
depletes the early memory. Do you have any serial console from the boot?



They are all the same. Either nothing or the following. Picking a couple:

metag:

Linux version 4.12.0-rc1-next-20170519 (gro...@jupiter.roeck-us.net) (gcc version 4.2.4 (IMG-1.4.0.300)) #1 Fri May 19 00:50:50 PDT 2017

LNKGET/SET go through cache but CONFIG_METAG_LNKGET_AROUND_CACHE=y
DA present
console [ttyDA1] enabled
OF: fdt: Machine model: toumaz,tz1090
Machine name: Generic Meta
Node 0: start_pfn = 0xb, low = 0xbfff7
Zone ranges:
   Normal   [mem 0xb000-0xbfff6fff]
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0xb000-0xbfff6fff]
Initmem setup node 0 [mem 0xb000-0xbfff6fff]
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65015
Kernel command line: rdinit=/sbin/init doreboot
PID hash table entries: 1024 (order: 0, 4096 bytes)

crisv32:

Linux version 4.12.0-rc1-next-20170519 (gro...@desktop.roeck-us.net) (gcc version 4.9.2 (Buildroot 2015.02-rc1-5-gb13bd8e-dirty) ) #1 Fri May 19 00:52:55 PDT 2017

bootconsole [early0] enabled
Setting up paging and the MMU.
Linux/CRISv32 port on ETRAX FS (C) 2003, 2004 Axis Communications AB
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 4080
Kernel command line: console=ttyS0,115200,N,8 rdinit=/sbin/init
PID hash table entries: 128 (order: -4, 512 bytes)

powerpc:mpc8548cds:

Memory CAM mapping: 256 Mb, residual: 0Mb
Linux version 4.12.0-rc1-next-20170519 (gro...@jupiter.roeck-us.net) (gcc version 4.8.1 (GCC) ) #1 Fri May 19 01:17:29 PDT 2017

Found initrd at 0xc400:0xc4200c00
Using MPC85xx CDS machine description
bootconsole [udbg0] enabled
-
phys_mem_size = 0x1000
dcache_bsize  = 0x20
icache_bsize  = 0x20
cpu_features  = 0x12100460
   possible= 0x12100460
   always  = 0x0010
cpu_user_features = 0x84e08000 0x0800
mmu_features  = 0x00020010
-
mpc85xx_cds_setup_arch()
Could not find FPGA node.
Zone ranges:
   DMA  [mem 0x-0x0fff]
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x-0x0fff]
Initmem setup node 0 [mem 0x-0x0fff]
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
Kernel command line: rdinit=/sbin/init console=ttyS0 console=tty doreboot
PID hash table entries: 1024 (order: 0, 4096 bytes)

Guenter


Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-20 Thread Guenter Roeck

On 05/20/2017 12:26 AM, Michal Hocko wrote:

On Fri 19-05-17 09:46:23, Guenter Roeck wrote:

Hi,

my qemu tests of next-20170519 show the following results:
total: 122 pass: 30 fail: 92

I won't bother listing all of the failures; they are available at
http://kerneltests.org/builders. I bisected one (openrisc, because
it gives me some console output before dying). It points to
'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.

A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
32-bit x86 images fail and should provide an easy test case.


Hmm, this is quite unexpected as the patch is not supposed to change
things much. It just removes the flag and performs the new hash scaling


It may well be that the problem is introduced with an earlier patch and just
enabled by this one.


automatically for all requests which do not have any high limit.
Some of those didn't have HASH_ADAPT before but that shouldn't change
the picture much. The only thing that I can imagine is that what
formerly failed for early memblock allocations is now succeeding and that
depletes the early memory. Do you have any serial console from the boot?



They are all the same. Either nothing or the following. Picking a couple:

metag:

Linux version 4.12.0-rc1-next-20170519 (gro...@jupiter.roeck-us.net) (gcc version 4.2.4 (IMG-1.4.0.300)) #1 Fri May 19 00:50:50 PDT 2017
LNKGET/SET go through cache but CONFIG_METAG_LNKGET_AROUND_CACHE=y
DA present
console [ttyDA1] enabled
OF: fdt: Machine model: toumaz,tz1090
Machine name: Generic Meta
Node 0: start_pfn = 0xb, low = 0xbfff7
Zone ranges:
  Normal   [mem 0xb000-0xbfff6fff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0xb000-0xbfff6fff]
Initmem setup node 0 [mem 0xb000-0xbfff6fff]
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65015
Kernel command line: rdinit=/sbin/init doreboot
PID hash table entries: 1024 (order: 0, 4096 bytes)

crisv32:

Linux version 4.12.0-rc1-next-20170519 (gro...@desktop.roeck-us.net) (gcc version 4.9.2 (Buildroot 2015.02-rc1-5-gb13bd8e-dirty) ) #1 Fri May 19 00:52:55 PDT 2017
bootconsole [early0] enabled
Setting up paging and the MMU.
Linux/CRISv32 port on ETRAX FS (C) 2003, 2004 Axis Communications AB
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 4080
Kernel command line: console=ttyS0,115200,N,8 rdinit=/sbin/init
PID hash table entries: 128 (order: -4, 512 bytes)

powerpc:mpc8548cds:

Memory CAM mapping: 256 Mb, residual: 0Mb
Linux version 4.12.0-rc1-next-20170519 (gro...@jupiter.roeck-us.net) (gcc version 4.8.1 (GCC) ) #1 Fri May 19 01:17:29 PDT 2017
Found initrd at 0xc400:0xc4200c00
Using MPC85xx CDS machine description
bootconsole [udbg0] enabled
-
phys_mem_size = 0x1000
dcache_bsize  = 0x20
icache_bsize  = 0x20
cpu_features  = 0x12100460
  possible= 0x12100460
  always  = 0x0010
cpu_user_features = 0x84e08000 0x0800
mmu_features  = 0x00020010
-
mpc85xx_cds_setup_arch()
Could not find FPGA node.
Zone ranges:
  DMA  [mem 0x-0x0fff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x0fff]
Initmem setup node 0 [mem 0x-0x0fff]
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
Kernel command line: rdinit=/sbin/init console=ttyS0 console=tty doreboot
PID hash table entries: 1024 (order: 0, 4096 bytes)

Guenter


Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-20 Thread Michal Hocko
On Fri 19-05-17 09:46:23, Guenter Roeck wrote:
> Hi,
> 
> my qemu tests of next-20170519 show the following results:
>   total: 122 pass: 30 fail: 92
> 
> I won't bother listing all of the failures; they are available at
> http://kerneltests.org/builders. I bisected one (openrisc, because
> it gives me some console output before dying). It points to
> 'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.
> 
> A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
> 32-bit x86 images fail and should provide an easy test case.

Hmm, this is quite unexpected as the patch is not supposed to change
things much. It just removes the flag and performs the new hash scaling
automatically for all requests which do not have any high limit.
Some of those didn't have HASH_ADAPT before but that shouldn't change
the picture much. The only thing that I can imagine is that what
formerly failed for early memblock allocations is now succeeding and that
depletes the early memory. Do you have any serial console from the boot?
-- 
Michal Hocko
SUSE Labs


Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-19 Thread Kevin Hilman
On Fri, May 19, 2017 at 9:46 AM, Guenter Roeck  wrote:
> Hi,
>
> my qemu tests of next-20170519 show the following results:
> total: 122 pass: 30 fail: 92
>
> I won't bother listing all of the failures; they are available at
> http://kerneltests.org/builders. I bisected one (openrisc, because
> it gives me some console output before dying). It points to
> 'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.
>
> A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
> 32-bit x86 images fail and should provide an easy test case.

32-bit ARM platforms also failing.

I also noticed widespread boot failures in kernelci.org, and bisected
one of them (32-bit ARM, Beaglebone Black) and it pointed at the same
patch.

Kevin

> Guenter
>
> ---
> # bad: [5666af8ae4a18b5ea6758d0c7c42ea765de216d2] Add linux-next specific files for 20170519
> # good: [2ea659a9ef488125eb46da6eb571de5eae5c43f6] Linux 4.12-rc1
> git bisect start 'HEAD' 'v4.12-rc1'
> # good: [c7115270d8cc333801b11e541ddbc43e320a88ef] Merge remote-tracking branch 'drm/drm-next'
> git bisect good c7115270d8cc333801b11e541ddbc43e320a88ef
> # good: [6bf84ee44e057051577be7edf306dc595b8d3c0f] Merge remote-tracking branch 'tip/auto-latest'
> git bisect good 6bf84ee44e057051577be7edf306dc595b8d3c0f
> # good: [8def67a06d65a1b08c87a65a8ef4fd6e71b6745c] Merge remote-tracking branch 'staging/staging-next'
> git bisect good 8def67a06d65a1b08c87a65a8ef4fd6e71b6745c
> # good: [0d538a750eaab91fc3f6dffe4c0e7d98d6252b81] Merge remote-tracking branch 'userns/for-next'
> git bisect good 0d538a750eaab91fc3f6dffe4c0e7d98d6252b81
> # good: [eb64959cd8c405de533122dc72b64d6ca197cee1] powerpc/mm/hugetlb: remove follow_huge_addr for powerpc
> git bisect good eb64959cd8c405de533122dc72b64d6ca197cee1
> # bad: [eb520e759caf124ba1c64e277939ff379d0ca8bd] procfs: fdinfo: extend information about epoll target files
> git bisect bad eb520e759caf124ba1c64e277939ff379d0ca8bd
> # bad: [45f5e427d6326ca1c44cd6897b9939441063fb96] lib/kstrtox.c: use "unsigned int" more
> git bisect bad 45f5e427d6326ca1c44cd6897b9939441063fb96
> # bad: [d75db247b8f204bfa2e6d2b40afcae74f3b4c7fd] mm: drop NULL return check of pte_offset_map_lock()
> git bisect bad d75db247b8f204bfa2e6d2b40afcae74f3b4c7fd
> # good: [d4c9af9111d483efd5f302916639a0e9a626f90f] mm: adaptive hash table scaling
> git bisect good d4c9af9111d483efd5f302916639a0e9a626f90f
> # bad: [90d2d8d8960a1b2ed868ce3bfd91e2ac8d4ff9aa] mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE
> git bisect bad 90d2d8d8960a1b2ed868ce3bfd91e2ac8d4ff9aa
> # bad: [67d0687224a93ef2adae7a2ed10f25b275f2ee91] mm: drop HASH_ADAPT
> git bisect bad 67d0687224a93ef2adae7a2ed10f25b275f2ee91
> # first bad commit: [67d0687224a93ef2adae7a2ed10f25b275f2ee91] mm: drop HASH_ADAPT


Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

2017-05-19 Thread Guenter Roeck
Hi,

my qemu tests of next-20170519 show the following results:
total: 122 pass: 30 fail: 92

I won't bother listing all of the failures; they are available at
http://kerneltests.org/builders. I bisected one (openrisc, because
it gives me some console output before dying). It points to
'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.

A quick glance suggests that 64 bit kernels pass and 32 bit kernels fail.
32-bit x86 images fail and should provide an easy test case.

Guenter

---
# bad: [5666af8ae4a18b5ea6758d0c7c42ea765de216d2] Add linux-next specific files for 20170519
# good: [2ea659a9ef488125eb46da6eb571de5eae5c43f6] Linux 4.12-rc1
git bisect start 'HEAD' 'v4.12-rc1'
# good: [c7115270d8cc333801b11e541ddbc43e320a88ef] Merge remote-tracking branch 'drm/drm-next'
git bisect good c7115270d8cc333801b11e541ddbc43e320a88ef
# good: [6bf84ee44e057051577be7edf306dc595b8d3c0f] Merge remote-tracking branch 'tip/auto-latest'
git bisect good 6bf84ee44e057051577be7edf306dc595b8d3c0f
# good: [8def67a06d65a1b08c87a65a8ef4fd6e71b6745c] Merge remote-tracking branch 'staging/staging-next'
git bisect good 8def67a06d65a1b08c87a65a8ef4fd6e71b6745c
# good: [0d538a750eaab91fc3f6dffe4c0e7d98d6252b81] Merge remote-tracking branch 'userns/for-next'
git bisect good 0d538a750eaab91fc3f6dffe4c0e7d98d6252b81
# good: [eb64959cd8c405de533122dc72b64d6ca197cee1] powerpc/mm/hugetlb: remove follow_huge_addr for powerpc
git bisect good eb64959cd8c405de533122dc72b64d6ca197cee1
# bad: [eb520e759caf124ba1c64e277939ff379d0ca8bd] procfs: fdinfo: extend information about epoll target files
git bisect bad eb520e759caf124ba1c64e277939ff379d0ca8bd
# bad: [45f5e427d6326ca1c44cd6897b9939441063fb96] lib/kstrtox.c: use "unsigned int" more
git bisect bad 45f5e427d6326ca1c44cd6897b9939441063fb96
# bad: [d75db247b8f204bfa2e6d2b40afcae74f3b4c7fd] mm: drop NULL return check of pte_offset_map_lock()
git bisect bad d75db247b8f204bfa2e6d2b40afcae74f3b4c7fd
# good: [d4c9af9111d483efd5f302916639a0e9a626f90f] mm: adaptive hash table scaling
git bisect good d4c9af9111d483efd5f302916639a0e9a626f90f
# bad: [90d2d8d8960a1b2ed868ce3bfd91e2ac8d4ff9aa] mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE
git bisect bad 90d2d8d8960a1b2ed868ce3bfd91e2ac8d4ff9aa
# bad: [67d0687224a93ef2adae7a2ed10f25b275f2ee91] mm: drop HASH_ADAPT
git bisect bad 67d0687224a93ef2adae7a2ed10f25b275f2ee91
# first bad commit: [67d0687224a93ef2adae7a2ed10f25b275f2ee91] mm: drop HASH_ADAPT

