Re: 4.4.3: OOPS when running "stress-ng --sock 5"

2016-03-07 Thread Tetsuo Handa
Holger Schurig wrote:
> So I did an "arm-linux-gnueabihf-objdump -Sgd linux/vmlinux", not sure
> if that helps:
> 
> c00972ec <__rmqueue>:
>  * Do the hard work of removing an element from the buddy allocator.
>  * Call me with the zone->lock already held.
>  */
> static struct page *__rmqueue(struct zone *zone, unsigned int order,
> int migratetype, gfp_t gfp_flags)
> {
> c00972ec:   e1a0c00d    mov     ip, sp
> c00972f0:   e92ddff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
> c00972f4:   e24cb004    sub     fp, ip, #4
> c00972f8:   e24dd024    sub     sp, sp, #36     ; 0x24
> unsigned int current_order;
> struct free_area *area;
> struct page *page;
> 
> /* Find a page of the appropriate size in the preferred list */
> for (current_order = order; current_order < MAX_ORDER; 
> ++current_order) {
> c00972fc:   e351000a    cmp     r1, #10
>  * Do the hard work of removing an element from the buddy allocator.
>  * Call me with the zone->lock already held.
>  */
> 
I tried on x86_64 but I could not reproduce it.
Thus, we need to examine this problem using your environment.

I didn't notice that c00972ec is __rmqueue+0x0.
The actual address to examine is c0097360 (the "pc" register), which is
__rmqueue+0x74.
Please show us the line number and assembly code around c0097360.
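
For reference, the arithmetic behind "c0097360 is __rmqueue+0x74" can be
sketched as below. This is only an illustration: the helper name resolve()
and the regular expression are made up for this note, and the start address
c00972ec is the one reported for __rmqueue in the objdump output quoted above.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" form printed in the oops,
# e.g. "__rmqueue+0x74/0x308".
OOPS_SYM = re.compile(
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def resolve(location, symbol_start):
    """Return (symbol, absolute address) for an oops location string.

    symbol_start is the address of the symbol's first instruction, as
    reported by objdump or System.map for the same kernel build.
    """
    m = OOPS_SYM.search(location)
    if m is None:
        raise ValueError("no symbol+offset/size found in: " + location)
    offset = int(m.group("off"), 16)
    size = int(m.group("size"), 16)
    if offset >= size:
        raise ValueError("offset lies outside the symbol")
    return m.group("sym"), symbol_start + offset


sym, addr = resolve("PC is at __rmqueue+0x74/0x308", 0xC00972EC)
print(sym, hex(addr))
```

The resulting address is the one to pass to the addr2line command already
used in this thread, for example "addr2line -i -e /path/to/vmlinux c0097360".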


Re: 4.4.3: OOPS when running "stress-ng --sock 5"

2016-03-07 Thread Holger Schurig
I rejoiced prematurely: this time it just took much longer until I hit the
segfault. Previously 1 minute, or at most 2 minutes, was enough.

root@ptxc:~# stress-ng --sock 20
stress-ng: info: [359] dispatching hogs: 0 I/O-Sync, 0 CPU, 0 VM-mmap, 0 
HDD-Write, 0 Fork, 0 Context-switch, 0 Pipe, 0 Cache, 20 Socket, 0 Yield, 0 
Fallocate, 0 Flock, 0 Affinity, 0 Timer, 0 Dentry, 0 Urandom, 0 Float, 0 Int, 0 
Semaphore, 0 Open, 0 SigQueue, 0 Poll
[   42.253392] random: nonblocking pool is initialized
[  567.649965] Unable to handle kernel NULL pointer dereference at virtual 
address 0104
[  567.658087] pgd = ee11c000
[  567.660797] [0104] *pgd=3eaf4831, *pte=, *ppte=
[  567.667112] Internal error: Oops: 817 [#1] SMP ARM
[  567.671904] Modules linked in: bnep btusb btrtl btbcm btintel bluetooth 
smsc95xx usbnet usbhid mii imx_sdma flexcan
[  567.682514] CPU: 1 PID: 383 Comm: stress-ng-socke Not tainted 4.4.4PTXC #3
[  567.689390] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  567.695920] task: ed9f9e00 ti: eeaf task.ti: eeaf
[  567.701333] PC is at __rmqueue+0x74/0x308
[  567.705346] LR is at 0x3
[  567.707882] pc : []lr : [<0003>]psr: 60030093
[  567.707882] sp : eeaf1c00  ip : 0200  fp : eeaf1c4c
[  567.719359] r10: efd5f514  r9 : 0008  r8 : 
[  567.724585] r7 : 0003  r6 :   r5 : c051343c  r4 : 0100
[  567.731113] r3 : c05d6e2c  r2 : 006c  r1 : 0200  r0 : 0100
[  567.737643] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
none
[  567.744866] Control: 10c5387d  Table: 3e11c04a  DAC: 0051
[  567.750612] Process stress-ng-socke (pid: 383, stack limit = 0xeeaf0210)
[  567.757314] Stack: (0xeeaf1c00 to 0xeeaf2000)
[  567.761677] 1c00:  c05d3200 ed976880 ed976880 eeaf1c54 c05d6d40 
c03dc720 c03da9e8
[  567.769859] 1c20: 20030013 eeaf1d54 c051343c c0513428 c0513428 6104de4b 
0008 c05d6d40
[  567.778040] 1c40: eeaf1ce4 eeaf1c50 c0097d84 c0097364 0141 0002a602 
ef7bc2c0 0018
[  567.786220] 1c60: c05d77c0 c05d77c0  c05d6d40   
 
[  567.794401] 1c80: 0100 c05d6f50 c05d77c8 c05d6e68 c05d78d5 0128 
0141 020252c0
[  567.802582] 1ca0:  fff8  eeaf1d54 60030013 0003 
 020052c0
[  567.810762] 1cc0: 0003 c05d77c0 ffcb 6104de4b eeaf1e84  
eeaf1d9c eeaf1ce8
[  567.818942] 1ce0: c0098154 c009766c c006cb38 00100010 60ecb9db 40030013 
eeaf1d1c ed976880
[  567.827123] 1d00: ed976880 0004 eacecc00 ed976880 eeaf1d84 eeaf1d20 
c03f4d44 c03f2c30
[  567.835304] 1d20: 0002 ef001c00  024102c0  000346db 
c05b8100 
[  567.843484] 1d40: 0002 ed976c14 0002   c05d77c0 
 c05d6d40
[  567.851664] 1d60:     eeaf1e84 ed9fa2b4 
024000c0 0fb0
[  567.859845] 1d80: ffcb 6104de4b eeaf1e84  eeaf1db4 eeaf1da0 
c03901d4 c0098088
[  567.868025] 1da0: ed976880 ed976880 eeaf1dcc eeaf1db8 c039024c c0390170 
eaf60600 ed976880
[  567.876206] 1dc0: eeaf1e4c eeaf1dd0 c03e83fc c039023c ffcb 0001 
23c09b2d 17c8
[  567.884386] 1de0: 0001 eeaf1e8c c05b8bb4 eeaf 0001  
ed9fa2b4 
[  567.892566] 1e00: 0001 ed976938 ffcb 0c90 c004c6d8 ffcb 
7fff c00473e8
[  567.900747] 1e20: ed983c80 ed976880    eea03780 
eeaf 
[  567.908927] 1e40: eeaf1e6c eeaf1e50 c040ec5c c03e8228   
eeaf1eec 
[  567.917107] 1e60: eeaf1e7c eeaf1e70 c038c308 c040ebd4 eeaf1ed4 eeaf1e80 
c038c3a4 c038c2f8
[  567.925287] 1e80: c0050004   0001 0c90 0fb0 
eeaf1ee4 0001
[  567.933467] 1ea0:    eeaf1f00 afb50401 eea03780 
eeaf1f80 
[  567.941647] 1ec0:  c000fae4 eeaf1f3c eeaf1ed8 c00cfe84 c038c324 
1c40 ef0a4000
[  567.949828] 1ee0: eeaf1f14 bef469bc 1c40 0001  1c40 
eeaf1ee4 0001
[  567.958008] 1f00: eea03780      
 
[  567.966189] 1f20: eea03780 1c40 bef469bc eeaf1f80 eeaf1f4c eeaf1f40 
c00cfedc c00cfe08
[  567.974369] 1f40: eeaf1f7c eeaf1f50 c00d0688 c00cfeb4   
eeaf1f7c eea03780
[  567.982550] 1f60: eea03780 1c40 bef469bc c000fae4 eeaf1fa4 eeaf1f80 
c00d0f64 c00d05fc
[  567.990730] 1f80:   0004 0002a1e8 b6f94598 0004 
 eeaf1fa8
[  567.998910] 1fa0: c000f920 c00d0f24 0004 0002a1e8 0004 bef469bc 
1c40 bef489bc
[  568.007091] 1fc0: 0004 0002a1e8 b6f94598 0004 1c40 018d 
0002a1f0 0003
[  568.015271] 1fe0:  bef468f4 00014a57 b6ecf4d6 40030030 0004 
 
[  568.023447] Backtrace: 
[  568.025916] [] (__rmqueue) from [] 
(get_page_from_freelist+0x724/0x914)
[  568.034267]  r10:c05d6d40 r9:0008 r8:6104de4b r7:c0513428 r6:c0513428 
r5:c051343c
[  568.042164]  r4:eeaf1d54
[  568.044716] [] 

Re: 4.4.3: OOPS when running "stress-ng --sock 5"

2016-03-07 Thread Holger Schurig
I compared my config with imx_v6_v7_defconfig which didn't segfault.

After I turned on CONFIG_SWAP, my segfault vanished.

I had turned off CONFIG_SWAP because my device only has an SD card and eMMC,
so I never intended to create a swap partition, and I thought "why compile
it into the kernel when I never use it?".

But it seems the kernel is unstable with this setting.

So we have a potential denial-of-service in kernels compiled without
CONFIG_SWAP, don't we? At least when it comes to skb handling.

Other memory tests never showed anything weird, and my system has been
running X11 with some Qt applications as well as Java applications for about
a year without trouble, all of that time without CONFIG_SWAP.
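
A comparison like the one described above (own config versus
imx_v6_v7_defconfig) can be sketched roughly as follows. The helper names and
the two tiny config snippets are invented for illustration and are not the
real configs from this system; the kernel tree also ships scripts/diffconfig
for the same job.

```python
import re

# Lines in a kernel .config look like "CONFIG_FOO=y" or
# "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; options that are not set map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Return sorted (name, value_in_a, value_in_b) for differing options."""
    names = sorted(set(a) | set(b))
    return [
        (name, a.get(name, "n"), b.get(name, "n"))
        for name in names
        if a.get(name, "n") != b.get(name, "n")
    ]


# Made-up example input, NOT the actual configs from this thread.
mine = "CONFIG_DEBUG_INFO=y\n# CONFIG_SWAP is not set\n"
reference = "CONFIG_DEBUG_INFO=y\nCONFIG_SWAP=y\n"

for name, old, new in diff_configs(parse_config(mine), parse_config(reference)):
    print(f"{name}: {old} -> {new}")
```

Treating a missing option the same as "is not set" keeps the output short,
at the cost of hiding options that exist in only one of the two kernel
versions.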


Re: 4.4.3: OOPS when running "stress-ng --sock 5"

2016-03-04 Thread Holger Schurig
Tetsuo wrote:

> This might be a mm problem. Please send to linux...@kvack.org .
>
> Before doing so, please identify line number using
>
>  $ addr2line -i -e /path/to/vmlinux c0097288
>
> etc. if built with CONFIG_DEBUG_INFO=y.
> (If CONFIG_DEBUG_INFO=n, please rebuild with CONFIG_DEBUG_INFO=y and try to 
> reproduce.)

Thanks for this hint.

- I have recompiled it with CONFIG_DEBUG_INFO.
- while at it, I switched from 4.4.3 to 4.4.4
- this changed the address c0097288 to c00972ec.
- addr2line says it's in linux-4.4/mm/page_alloc.c:1792

1790: static struct page *__rmqueue(struct zone *zone, unsigned int order,
1791: int migratetype, gfp_t gfp_flags)
1792: {
1793: struct page *page;

So I did an "arm-linux-gnueabihf-objdump -Sgd linux/vmlinux", not sure
if that helps:

c00972ec <__rmqueue>:
 * Do the hard work of removing an element from the buddy allocator.
 * Call me with the zone->lock already held.
 */
static struct page *__rmqueue(struct zone *zone, unsigned int order,
int migratetype, gfp_t gfp_flags)
{
c00972ec:   e1a0c00d    mov     ip, sp
c00972f0:   e92ddff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
c00972f4:   e24cb004    sub     fp, ip, #4
c00972f8:   e24dd024    sub     sp, sp, #36     ; 0x24
unsigned int current_order;
struct free_area *area;
struct page *page;

/* Find a page of the appropriate size in the preferred list */
for (current_order = order; current_order < MAX_ORDER; ++current_order) {
c00972fc:   e351000a    cmp     r1, #10
 * Do the hard work of removing an element from the buddy allocator.
 * Call me with the zone->lock already held.
 */



Here's the new backtrace:

root@ptxc:~# stress-ng --sock 5
stress-ng: info: [374] dispatching hogs: 0 I/O-Sync, 0 CPU, 0 VM-mmap, 0 
HDD-Write, 0 Fork, 0 Context-switch, 0 Pipe, 0 Cache, 10 Socket, 0 Yield, 0 
Fallocate, 0 Flock, 0 Affinity, 0 Timer, 0 Dentry, 0 Urandom, 0 Float, 0 Int, 0 
Semaphore, 0 Open, 0 SigQueue, 0 Poll
Unable to handle kernel NULL pointer dereference at virtual address 0104
pgd = eeb64000
[0104] *pgd=3c596831, *pte=, *ppte=
Internal error: Oops: 817 [#1] SMP ARM
Modules linked in: bnep smsc95xx usbhid usbnet mii imx_sdma flexcan btusb btrtl 
btbcm btintel bluetooth
CPU: 1 PID: 378 Comm: stress-ng-socke Not tainted 4.4.4 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: ee907300 ti: eebc task.ti: eebc
PC is at __rmqueue+0x74/0x308
LR is at 0x3
pc : []lr : [<0003>]psr: 60030093
sp : eebc1c00  ip : 0200  fp : eebc1c4c
r10: efd85114  r9 : 0008  r8 : 
r7 : 0003  r6 :   r5 : c050d068  r4 : 0100
r3 : c05d03ac  r2 : 006c  r1 : 0200  r0 : 0100
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 3eb6404a  DAC: 0051
Process stress-ng-socke (pid: 378, stack limit = 0xeebc0210)
Stack: (0xeebc1c00 to 0xeebc2000)
1c00: 60030113 ef7b6b54 c05d0e09 c05ade00 0003 c05d02c0 c043ab20 c05ade00
1c20: 20030013 eebc1d54 c050d068 c050d054 c050d054 32ea424b 0008 c05d02c0
1c40: eebc1ce4 eebc1c50 c0097d18 c00972f8 0141 0002bbd2 ef7bc380 0018
1c60: c05d0d40 c05b2100 17f9 c043abb0 000a c05d39c0  c05b2080
1c80: 0100 c05d04d0 c05d0d48 c05d03e8 c05d0e55 0128 0141 020252c0
1ca0:  fff8  eebc1d54 60030013 0003 c0061cb8 020052c0
1cc0: 0003 c05d0d40 5580 32ea424b eebc1e84  eebc1d9c eebc1ce8
1ce0: c00980e8 c0097600 eebc1d1c 80100010 c0009440 c005da78 c01d4fa0 20030013
1d00:  eebc1d54 ee30ed80 ee18f380 eebc1d84 eebc1d20 c03ee8e4 c03ec7d0
1d20: eeb976e0 ef001c00  024102c0  000346db c05b2100 
1d40: 0002 ee18f714 0005   c05d0d40  c05d02c0
1d60:     eebc1e84 ee9077b4 024000c0 1180
1d80: 5580 32ea424b eebc1e84  eebc1db4 eebc1da0 c0389d74 c009801c
1da0: ee18f380 ee18f380 eebc1dcc eebc1db8 c0389dec c0389d10 ee37e180 ee18f380
1dc0: eebc1e4c eebc1dd0 c03e1f9c c0389ddc 5580 0001 21bd7366 3590
1de0: 0001 eebc1e8c c05b2bb4 eebc 0001  ee9077b4 
1e00: 0001 ee18f438 5580 08d0 c04357e8 5580 7fff 6ea3
1e20: eebc1eb4 ee18f380    edd370c0 eebc 
1e40: eebc1e6c eebc1e50 c04087fc c03e1dc8 edfc1ea0 ee18f380 eebc1eec 
1e60: eebc1e7c eebc1e70 c0385ea8 c0408774 eebc1ed4 eebc1e80 c0385f44 c0385e98
1e80: c00e62b4   0001 08d0 1180 eebc1ee4 0001
1ea0:    eebc1f00  edd370c0 eebc1f80 
1ec0:  c000fae4 eebc1f3c eebc1ed8 c00c9b90 c0385ec4 1a50 0004
1ee0: eebc1f1c bef549bc 1a50 0001  1a50 eebc1ee4 0001
1f00: edd370c0       

4.4.3: OOPS when running "stress-ng --sock 5"

2016-03-03 Thread Holger Schurig
Hi,

On my system I can reliably reproduce a kernel OOPS when I run stress-ng
("apt-get install stress-ng"). Any help on how to track this down would
be appreciated; networking code is outside of my comfort zone (I'm just
a dilettante at device drivers ...).

It takes only a minute or two to get the OOPS:

root@ptxc:~# stress-ng --sock 5
stress-ng: info: [361] dispatching hogs: 0 I/O-Sync, 0 CPU, 0 VM-mmap, 0 
HDD-Write, 0 Fork, 0 Context-switch, 0 Pipe, 0 Cache, 5 Socket, 0 Yield, 0 
Fallocate, 0 Flock, 0 Affinity, 0 Timer, 0 Dentry, 0 Urandom, 0 Float, 0 Int, 0 
Semaphore, 0 Open, 0 SigQueue, 0 Poll
Unable to handle kernel NULL pointer dereference at virtual address 0104
pgd = ee0d8000
[0104] *pgd=3e17c831, *pte=, *ppte=
Internal error: Oops: 817 [#1] SMP ARM
Modules linked in: bnep smsc95xx usbnet mii usbhid imx_sdma flexcan btusb btrtl 
btbcm btintel bluetooth
CPU: 2 PID: 362 Comm: stress-ng-socke Not tainted 4.4.3 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: eeb30a00 ti: eea0a000 task.ti: eea0a000
PC is at __rmqueue+0x74/0x308
LR is at 0x3
pc : []lr : [<0003>]psr: 60030093
sp : eea0bc08  ip : 0200  fp : eea0bc54
r10: efd80b14  r9 : 0008  r8 : 
r7 : 0003  r6 :   r5 : c050bff8  r4 : 0100
r3 : c05ce36c  r2 : 006c  r1 : 0200  r0 : 0100
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 3e0d804a  DAC: 0051
Process stress-ng-socke (pid: 362, stack limit = 0xeea0a210)
Stack: (0xeea0bc08 to 0xeea0c000)
bc00:    c05ca780 ed93dd80 ed93dd80 eea0bc5c c05ce280
bc20: c03d5838 c03d3b00 c05b04f8 eea0bd5c c050bff8 c050bfe4 c050bfe4 ed93de38
bc40: 0008 c05ce280 eea0bcec eea0bc58 c0097cb4 c0097294 0141 0002c26d
bc60: c03d59c4 0018 c05ced00 c05b0100 acd4 c0439bb0 000a c05d19c0
bc80:  c05b0080 0100 c05ce490 c05ced08 c05ce3a8 c05cee15 0128
bca0: 0141 020252c0  fff8  eea0bd5c 60030013 0003
bcc0: eea0bcf4 020052c0 0003 c05ced00 ffcb ed93de38 eea0be84 
bce0: eea0bda4 eea0bcf0 c0098084 c009759c c006caf8 80100010 0fcfc2fc 40030013
bd00: eea0bd24 ed93dd80 ed93dd80 0004 ed999e00 ed93dd80 eea0bd8c eea0bd28
bd20: c03ee130 c03ebcac 0002 ef001c00  024102c0  000346db
bd40: c05b0100  0002 ed93e114 0005   c05ced00
bd60:  c05ce280     eea0be84 eeb30eb4
bd80: 024000c0 05d0 ffcb ed93de38 eea0be84  eea0bdbc eea0bda8
bda0: c0389650 c0097fb8 ed93dd80 ed93dd80 eea0bdd4 eea0bdc0 c03896c8 c03895ec
bdc0: ed999e00 ed93dd80 eea0be4c eea0bdd8 c03e14d4 c03896b8 ffcb 0014
bde0: 14bf 0001 eeb30eb4 0001 0001  eea0a018 
be00: eeb30eb4 0001 ffcb 0560 c0434ca8 ffcb 7fff 7fff
be20: ed958000 ed93dd80    eea6c000 eea0a000 
be40: eea0be6c eea0be50 c0407cbc c03e131c ee98c1a0 ed93dd80 eea0beec 
be60: eea0be7c eea0be70 c0385784 c0407c34 eea0bed4 eea0be80 c0385820 c0385774
be80: c00e6220   0001 0560 05d0 eea0bee4 0001
bea0:    eea0bf00  eea6c000 eea0bf80 
bec0:  c000fae4 eea0bf3c eea0bed8 c00c9b2c c03857a0 0b30 0004
bee0: eea0bf1c bea359bc 0b30 0001  0b30 eea0bee4 0001
bf00: eea6c000       
bf20: eea6c000 0b30 bea359bc eea0bf80 eea0bf4c eea0bf40 c00c9b84 c00c9ab0
bf40: eea0bf7c eea0bf50 c00ca330 c00c9b5c   eea0bf7c eea6c000
bf60: eea6c000 0b30 bea359bc c000fae4 eea0bfa4 eea0bf80 c00cac0c c00ca2a4
bf80:   0004 0002a1e8 b6f6f140 0004  eea0bfa8
bfa0: c000f920 c00cabcc 0004 0002a1e8 0004 bea359bc 0b30 bea379bc
bfc0: 0004 0002a1e8 b6f6f140 0004 0b30 016f 0002a1f0 0003
bfe0:  bea358f4 00014a57 b6eaa4d6 40030030 0004  
Backtrace: 
[] (__rmqueue) from [] (get_page_from_freelist+0x724/0x914)
 r10:c05ce280 r9:0008 r8:ed93de38 r7:c050bfe4 r6:c050bfe4 r5:c050bff8
 r4:eea0bd5c
[] (get_page_from_freelist) from [] 
(__alloc_pages_nodemask+0xd8/0x898)
 r10: r9:eea0be84 r8:ed93de38 r7:ffcb r6:c05ced00 r5:0003
 r4:020052c0
[] (__alloc_pages_nodemask) from [] 
(skb_page_frag_refill+0x70/0xcc)
 r10: r9:eea0be84 r8:ed93de38 r7:ffcb r6:05d0 r5:024000c0
 r4:eeb30eb4
[] (skb_page_frag_refill) from [] 
(sk_page_frag_refill+0x1c/0x74)
 r5:ed93dd80 r4:ed93dd80
[] (sk_page_frag_refill) from [] (tcp_sendmsg+0x1c4/0xa58)
 r5:ed93dd80 r4:ed999e00
[] (tcp_sendmsg) from [] (inet_sendmsg+0x94/0xc8)
 r10: r9:eea0a000 r8:eea6c000 r7: r6: r5:
 r4:ed93dd80
[] (inet_sendmsg) from [] (sock_sendmsg+0x1c/0x2c)
 r5: r4:eea0beec
[] (sock_sendmsg) from [] (sock_write_iter+0x8c/0xc0)
[] 
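
A possible reading of the register dump above, offered only as a plausibility
check and not as a diagnosis confirmed by anyone in this thread: the fault is
a write to address 0104 while r0/r4 hold 0100 and r1/ip hold 0200. In 4.4-era
kernels include/linux/poison.h defines LIST_POISON1 as 0x100 and LIST_POISON2
as 0x200 (plus POISON_POINTER_DELTA, assumed to be 0 here), and list_del()
stores these into a removed entry. The sketch assumes the archive dropped
leading zeros (0104 read as 0x00000104), 4-byte pointers, and the usual
struct list_head layout with next first and prev second.

```python
# Assumed constants, see the note above; none of these come from the thread.
LIST_POISON1 = 0x100   # value list_del() leaves in entry->next
LIST_POISON2 = 0x200   # value list_del() leaves in entry->prev
POINTER_SIZE = 4       # 32-bit ARM
OFFSET_NEXT = 0        # struct list_head { next; prev; }
OFFSET_PREV = POINTER_SIZE

FSR_WRITE = 1 << 11    # WnR bit in the ARM short-descriptor fault status


def list_del_store_addresses(entry_next, entry_prev):
    """Addresses written when unlinking an entry.

    Unlinking does next->prev = prev and then prev->next = next, so the
    stores go to (next + offset of prev) and (prev + offset of next).
    """
    return entry_next + OFFSET_PREV, entry_prev + OFFSET_NEXT


def decode_fsr(fsr):
    """Split the number printed after 'Oops:' into write flag and status."""
    status = (fsr & 0xF) | ((fsr >> 6) & 0x10)  # FS[3:0] plus FS[4] at bit 10
    return {"write": bool(fsr & FSR_WRITE), "status": status}


# Unlinking an entry whose pointers are already poisoned:
first_store, second_store = list_del_store_addresses(LIST_POISON1, LIST_POISON2)
print(hex(first_store), hex(second_store))
print(decode_fsr(0x817))
```

If these assumptions hold, the first store lands on 0x104, which matches the
reported fault address, and status 7 with the write bit set corresponds to a
page translation fault on a write. That would be consistent with a list entry
being unlinked a second time, but other explanations are possible.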
