c21b48 causes nfs client to hang and kernel BUG at ./include/linux/mm.h:462 (bisected)

2017-07-23 Thread Trevor Cordes
Hi!  I've bisected a bug I'm seeing to:
c21b48cc1bbf2f5af3ef54ada559f7fadf8b508b
net: adjust skb->truesize in ___pskb_trim()

The bug manifests as my NFS4 (TCP) client mount hanging after 5-10s of 
heavy read data transfer, which also produces: kernel BUG at 
./include/linux/mm.h:462 (see full trace at bottom)

I must reboot to regain nfs access.  I can make this bug occur on demand 
by just playing some video from the nfs export and skipping around for 
5-10s or so.

Every 10 or so times this bug occurs the entire system hard freezes and 
even magic SysRq won't work.

This bug started for me in Fedora 24 as soon as 4.11.x was put out.  F24 
4.10.x does not have this bug, and that is what I am currently using. I am 
currently taking the final step of rpmbuilding the F24 src rpm with a 
patch to undo c21b48, just to confirm it for sure.  But it will take all 
day to build, so I thought I'd report my bisect right now.

I've had a RHBZ up for this: 
https://bugzilla.redhat.com/show_bug.cgi?id=1455086

That BZ has some info as to my mount/export options.  My setup *must* be 
different from normal as I see no other person on the net reporting this 
bug.  My mount options might be a little tweaked, I guess; also FYI, this 
box uses a strict iptables setup, if that could matter.  I also use NFS4 
only, with TCP, and strictly-preset ports to be more iptables friendly.

Thanks!

Complete bug trace below (have dozens of these from tests if more are 
needed, but stack trace is the same *every* time):

Jul 23 05:34:14 pog kernel: [  118.789187] page:f662c8761a00 count:0 
mapcount:0 mapping:  (null) index:0x0
Jul 23 05:34:14 pog kernel: [  118.789196] flags: 0x17c000()
Jul 23 05:34:14 pog kernel: [  118.789201] raw: 0017c000 
  
Jul 23 05:34:14 pog kernel: [  118.789204] raw: dead0100 
dead0200  
Jul 23 05:34:14 pog kernel: [  118.789206] page dumped because: 
VM_BUG_ON_PAGE(page_ref_count(page) == 0)
Jul 23 05:34:14 pog kernel: [  118.789218] ------------[ cut here ]------------
Jul 23 05:34:14 pog kernel: [  118.789221] kernel BUG at 
./include/linux/mm.h:462!
Jul 23 05:34:14 pog kernel: [  118.789224] invalid opcode:  [#1] SMP
Jul 23 05:34:14 pog kernel: [  118.789226] Modules linked in: rpcsec_gss_krb5 
nfsv4 xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 dns_resolver af_key nfs 
fscache 8021q garp mrp stp llc cfg80211 rfkill nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netbios_ns 
nf_conntrack_broadcast xt_mac nf_log_ipv4 nf_log_common xt_LOG xt_limit 
xt_conntrack xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_REDIRECT 
nf_nat_redirect iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables asc7621 
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_TCPMSS iptable_mangle iptable_raw 
iptable_security snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic 
snd_hda_intel snd_hda_codec coretemp ppdev iTCO_wdt gpio_ich 
iTCO_vendor_support snd_hda_core kvm_intel snd_hwdep kvm snd_seq
Jul 23 05:34:14 pog kernel: [  118.789288]  snd_seq_device irqbypass joydev 
snd_pcm nfsd pcspkr snd_timer i2c_i801 snd parport_pc acpi_cpufreq tpm_tis 
parport shpchp tpm_tis_core soundcore lpc_ich tpm auth_rpcgss nfs_acl lockd 
grace sunrpc raid1 raid0 firewire_ohci serio_raw firewire_core ata_generic 
pata_acpi crc_itu_t aic7xxx e1000e e1000 scsi_transport_spi ptp pps_core 
pata_marvell floppy
Jul 23 05:34:14 pog kernel: [  118.789327] CPU: 2 PID: 541 Comm: kworker/2:1H 
Not tainted 4.11.0-rc8+ #13
Jul 23 05:34:14 pog kernel: [  118.789330] Hardware name:  
/D975XBX2, BIOS BX97520J.86A.2838.2008.0903.1859 09/03/2008
Jul 23 05:34:14 pog kernel: [  118.789362] Workqueue: xprtiod 
xs_tcp_data_receive_workfn [sunrpc]
Jul 23 05:34:14 pog kernel: [  118.789366] task: 8d2919184880 task.stack: 
b168813a8000
Jul 23 05:34:14 pog kernel: [  118.789372] RIP: 0010:page_frag_free+0x6d/0x80
Jul 23 05:34:14 pog kernel: [  118.789375] RSP: 0018:b168813abd10 EFLAGS: 
00010246
Jul 23 05:34:14 pog kernel: [  118.789378] RAX: 003e RBX: 
f662c88c6d80 RCX: 0006
Jul 23 05:34:14 pog kernel: [  118.789380] RDX:  RSI: 
 RDI: 8d292fd0e0a0
Jul 23 05:34:14 pog kernel: [  118.789383] RBP: b168813abd10 R08: 
000c0ad6 R09: 03f4
Jul 23 05:34:14 pog kernel: [  118.789385] R10: 8d291a556d00 R11: 
b222ac2d R12: 8d291a556d00
Jul 23 05:34:14 pog kernel: [  118.789388] R13: 0008 R14: 
8d291d868340 R15: 2d40
Jul 23 05:34:14 pog kernel: [  118.789391] FS:  () 
GS:8d292fd0() knlGS:
Jul 23 05:34:14 pog kernel: [  118.789394] CS:  0010 DS:  ES:  CR0: 
80050033
Jul 23 05:34:14 pog kernel: [  118.789396] CR2: 7f7597990008 CR3: 
00021e71 CR4: 
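
For anyone comparing several such traces, here is a minimal Python sketch (not part of the original report) that pulls the "kernel BUG at" location out of hard-wrapped syslog text like the above. The function name find_bug_sites is made up for illustration, and the sketch only assumes the message format seen in this trace.

```python
import re

# Matches the "kernel BUG at <file>:<line>!" message seen in the trace above.
BUG_RE = re.compile(r"kernel BUG at\s+(\S+):(\d+)!")


def find_bug_sites(log_text):
    """Return (file, line) pairs for every 'kernel BUG at' message in log_text.

    find_bug_sites is a hypothetical helper name, not an existing tool.
    """
    # Mailed syslog output is often hard-wrapped mid-message, so collapse
    # all whitespace (including newlines) before matching.
    flat = " ".join(log_text.split())
    return [(path, int(line)) for path, line in BUG_RE.findall(flat)]


sample = """Jul 23 05:34:14 pog kernel: [  118.789221] kernel BUG at 
./include/linux/mm.h:462!"""
print(find_bug_sites(sample))
```

Running it over each saved trace and comparing the results is one quick way to check that every crash hits the same assertion.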

Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-02-05 Thread Trevor Cordes
On 2017-02-05 Michal Hocko wrote:
> On Fri 03-02-17 18:36:54, Trevor Cordes wrote:
> > I ran to_test/linus-tree/oom_hickups branch (4.10.0-rc6+) for 50
> > hours and it does NOT have the bug!  No problems at all so far.  
> 
> OK, that is definitely good to know. My other fix ("mm, vmscan:
> consider eligible zones in get_scan_count") was more theoretical than
> bug driven. I would add your
> Tested-by: Trevor Cordes <tre...@tecnopolis.ca>
> 
> unless you have anything against that.

I am happy to be in the tested-by; go ahead.

> > So I think whatever to_test/linus-tree/oom_hickups and since-4.9
> > both have that vanilla 4.10-rc6 does *not* have is indeed the
> > fix.
> > 
> > For my reference, and I know you guys aren't distro-specific, what
> > is the best way to get this fix into Fedora 24 (currently 4.9)?  
> 
> I will send this patch to 4.9+ stable as soon as it hits Linus tree.

That's great news!  It will make everyone on the rhbz happy.  Thank you!


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-02-03 Thread Trevor Cordes
On 2017-02-01 Michal Hocko wrote:
> On Wed 01-02-17 03:29:28, Trevor Cordes wrote:
> > On 2017-01-30 Michal Hocko wrote:  
> [...]
> > > > Testing with vanilla rc6 released just yesterday would be a good
> > > > fit. There are some more fixes sitting on mmotm on top and maybe
> > > > we want some of them in final 4.10. Anyway all those pending
> > > > changes should be merged in the next merge window - aka 4.11  
> > 
> > After 30 hours of running vanilla 4.10.0-rc6, the box started to go
> > bonkers at 3am, so vanilla does not fix the bug :-(  But, the bug
> > hit differently this time, the box just bogged down like crazy and
> > gave really weird top output.  Starting nano would take 10s, then
> > would run full speed, then when saving a file would take 5s.
> > Starting any prog not in cache took equally as long.  
> 
> Could you try with to_test/linus-tree/oom_hickups branch on the same
> git tree? I have cherry-picked "mm, vmscan: consider eligible zones in
> get_scan_count" which might be the missing part.

I ran to_test/linus-tree/oom_hickups branch (4.10.0-rc6+) for 50 hours
and it does NOT have the bug!  No problems at all so far.

So I think whatever to_test/linus-tree/oom_hickups and since-4.9 both
have that vanilla 4.10-rc6 does *not* have is indeed the fix.

For my reference, and I know you guys aren't distro-specific, what is
the best way to get this fix into Fedora 24 (currently 4.9)?  Can it be
backported or made as a patch they can apply to 4.9?  Or 4.10?  If this
fix only goes into 4.11 then I fear we'll never see it in Fedora and us
rhbz guys will not have a stock-Fedora fix for this until F25 or F26.
Again, I'm not trying to force this out of scope, I'm just wondering
about the logistics in these situations.

Once again, thanks to all for your great work and help!  P.S. I'll try
a couple of the other ideas Mel had about ramping the RAM back up, etc.


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-02-01 Thread Trevor Cordes
On 2017-01-30 Michal Hocko wrote:
> On Sun 29-01-17 16:50:03, Trevor Cordes wrote:
> > On 2017-01-25 Michal Hocko wrote:  
> > > On Wed 25-01-17 04:02:46, Trevor Cordes wrote:  
> > > > OK, I patched & compiled mhocko's git tree from the other day
> > > > 4.9.0+. (To confirm, weird, but mhocko's git tree I'm using
> > > > from a couple of weeks ago shows the newest commit (git log) is
> > > > 69973b830859bc6529a7a0468ba0d80ee5117826 "Linux 4.9"?  Let me
> > > > know if I'm doing something wrong, see below.)
> > > 
> > > My fault. I should have noted that you should use since-4.9
> > > branch.  
> > 
> > OK, I have good news.  I compiled your mhocko git tree (properly
> > > this time!) using since-4.9 branch (last commit
> > ca63ff9b11f958efafd8c8fa60fda14baec6149c Jan 25) and the box
> > survived 3 3am's, over 60 hours, and I made sure all the usual oom
> > culprits ran, and I ran extras (finds on the whole tree, extra
> > rdiff-backups) to try to tax it.  Based on my previous criteria I
> > would say your since-4.9 as of the above commit solves my bug, at
> > least over a 3 day test span (which it never survives when the bug
> > is present)!
> > 
> > I tested WITHOUT any cgroup/mem boot options.  I do still have my
> > mem=6G limiter on, though (I've never tested with it off, until I
> > solve the bug with it on, since I've had it on for many months for
> > other reasons).  
> 
> Good news indeed.

Even better, another guy on the rhbz reported the mhocko git tree
since-4.9 solves the bug for him too!  And it ran another night (4+
total) without problems on my box.  Whatever is in since-4.9 fixes it,
as I reported before.

But...

> Testing with vanilla rc6 released just yesterday would be a good fit.
> There are some more fixes sitting on mmotm on top and maybe we want
> some of them in final 4.10. Anyway all those pending changes should
> be merged in the next merge window - aka 4.11

After 30 hours of running vanilla 4.10.0-rc6, the box started to go
bonkers at 3am, so vanilla does not fix the bug :-(  But, the bug hit
differently this time, the box just bogged down like crazy and gave
really weird top output.  Starting nano would take 10s, then would run
full speed, then when saving a file would take 5s.  Starting any prog
not in cache took equally as long.

However, no oom hit.  I waited about 15 minutes and things seemed to
bog more, so I rebooted into since-4.9.  Maybe if I had kept waiting
the box would have oom'd, but I didn't want to take the chance (it's
remote, and I can't reset it).

I did capture a lot of the weird top, meminfo and slabinfo data before
rebooting.  I'll attach the output to this email.  Messages show a
lot of "page allocation stalls" during the bogged-down time.
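
As a rough illustration of how captured meminfo data can be summarised, the Python sketch below parses /proc/meminfo-style text and reports how full low memory is. It assumes a highmem (PAE) kernel, which is where the LowTotal and LowFree fields appear; the helper names and the sample numbers are invented for the example and are not taken from the attached capture.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines that look like "Name:   <integer> [kB]".
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def lowmem_used_fraction(fields):
    """Return the fraction of low memory in use, or None if not reported.

    LowTotal/LowFree are only present on highmem kernels, so None is
    returned when either field is missing.
    """
    total = fields.get("LowTotal")
    free = fields.get("LowFree")
    if not total or free is None:
        return None
    return 1.0 - free / total


# Invented sample values, for illustration only.
sample = """MemTotal:        6100000 kB
LowTotal:         800000 kB
LowFree:           40000 kB
"""
print(lowmem_used_fraction(parse_meminfo(sample)))
```

Sampling this periodically alongside the slabinfo data would show whether lowmem is being squeezed in the run-up to a stall.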

So my hunch at this moment is 4.10.0-rc6 might help alleviate the
problem somewhat, but it's other things you have in since-4.9 that
solve it completely.

Let me know if you need any more testing or some bisecting or
something.  I'll keep on running since-4.9 in the meantime.  Thanks!


4.10.rc6-bogged
Description: Binary data


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-29 Thread Trevor Cordes
On 2017-01-25 Michal Hocko wrote:
> On Wed 25-01-17 04:02:46, Trevor Cordes wrote:
> > OK, I patched & compiled mhocko's git tree from the other day
> > 4.9.0+. (To confirm, weird, but mhocko's git tree I'm using from a
> > couple of weeks ago shows the newest commit (git log) is
> > 69973b830859bc6529a7a0468ba0d80ee5117826 "Linux 4.9"?  Let me know
> > if I'm doing something wrong, see below.)  
> 
> My fault. I should have noted that you should use since-4.9 branch.

OK, I have good news.  I compiled your mhocko git tree (properly this
time!) using since-4.9 branch (last commit
ca63ff9b11f958efafd8c8fa60fda14baec6149c Jan 25) and the box survived 3
3am's, over 60 hours, and I made sure all the usual oom culprits ran,
and I ran extras (finds on the whole tree, extra rdiff-backups) to try
to tax it.  Based on my previous criteria I would say your since-4.9 as
of the above commit solves my bug, at least over a 3 day test span
(which it never survives when the bug is present)!

I tested WITHOUT any cgroup/mem boot options.  I do still have my
mem=6G limiter on, though (I've never tested with it off, until I solve
the bug with it on, since I've had it on for many months for other
reasons).

On 2017-01-27 Michal Hocko wrote:
> OK, that matches the theory that these OOMs are caused by the
> incorrect active list aging fixed by b4536f0c829c ("mm, memcg: fix
> the active list aging for lowmem requests when memcg is enabled")

b4536f0c829c isn't in the since-4.9 I tested above though?  So
something else you did must have fixed it (also)?  I don't think I've
run any tests yet with b4536f0c829c in them?  I think the vanillas I
was doing a couple of weeks ago were before b4536f0c829c, but I can't
be sure.

What do I test next?  Does the since-4.9 stuff get pushed into vanilla
(4.9 hopefully?) so it can find its way into Fedora's stuck F24
kernel?

I want to also note that the RHBZ
https://bugzilla.redhat.com/show_bug.cgi?id=1401012 is garnering more
interest as more people start me-too'ing.  The situation is almost
always the same: large rsync's or similar tree-scan accesses cause oom
on PAE boxes.  However, I wanted to note that many people there reported
that cgroup_disable=memory doesn't fix anything for them, whereas that
always makes the problem go away on my boxes.  Strange.

Thanks Michal and Mel, I really appreciate it!


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-26 Thread Trevor Cordes
On 2017-01-24 Michal Hocko wrote:
> On Sun 22-01-17 18:45:59, Trevor Cordes wrote:
> [...]
> > Also, completely separate from your patch I ran mhocko's 4.9 tree
> > with mem=2G to see if lower ram amount would help, but it didn't.
> > Even with 2G the system oom'd and hung same as usual.  So far the
> > only thing that helps at all was the cgroup_disable=memory option,
> > which makes the problem disappear completely for me.  
> 
> OK, can we reduce the problem space slightly more and could you boot
> with kmem accounting disabled? cgroup.memory=nokmem,nosocket

I ran for 30 hours with cgroup.memory=nokmem,nosocket using vanilla
4.9.0+ and it oom'd during a big rdiff-backup at 9am.  My script was
able to reboot it before it hung.  Only one oom occurred before the
reboot, which is a bit odd; usually there are 5-50.  See attached
messages log (oom6).
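
The reboot script itself is not shown in this thread. Purely as an illustration of the idea, a watchdog could count "invoked oom-killer" lines in the kernel log and decide to reboot once a threshold is reached. In this Python sketch the function names, the threshold and the sample log lines are all hypothetical.

```python
import re

# The kernel logs "<comm> invoked oom-killer: ..." when the OOM killer runs.
OOM_RE = re.compile(r"(\S+) invoked oom-killer")


def count_oom_events(lines):
    """Count log lines that record an OOM-killer invocation."""
    return sum(1 for line in lines if OOM_RE.search(line))


def should_reboot(lines, threshold=1):
    """Return True once at least `threshold` OOM events have been logged.

    The default threshold of 1 is an assumption chosen for the example.
    """
    return count_oom_events(lines) >= threshold


# Hypothetical log excerpt, not taken from the attached oom6 file.
log = [
    "kernel: rdiff-backup invoked oom-killer: gfp_mask=..., order=0",
    "kernel: Out of memory: Kill process 1234 (example) score 5",
    "kernel: some unrelated message",
]
print(count_oom_events(log), should_reboot(log))
```

A low threshold trades a few unnecessary reboots for a better chance of acting before the box hangs completely.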

So, still, only cgroup_disable=memory mitigates this bug (so far).  If
you need me to test cgroup.memory=nokmem,nosocket with your since-4.9
branch specifically, let me know and I'll add it to the to-test list.
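
Since several boot options are in play here (cgroup_disable=memory, cgroup.memory=nokmem,nosocket, mem=), a small Python sketch for checking which memcg-related settings a given kernel command line carries may help keep test runs straight. It works on a plain string such as the contents of /proc/cmdline, ignores quoting, and uses made-up helper names.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def memcg_settings(cmdline):
    """Report which of the memcg options discussed in this thread are set."""
    params = parse_cmdline(cmdline)
    disabled = (params.get("cgroup_disable") or "").split(",")
    memory_opts = (params.get("cgroup.memory") or "").split(",")
    return {
        "memcg_disabled": "memory" in disabled,  # cgroup_disable=memory
        "nokmem": "nokmem" in memory_opts,       # kernel memory accounting off
        "nosocket": "nosocket" in memory_opts,   # socket memory accounting off
    }


# Hypothetical command line for illustration.
print(memcg_settings("ro quiet mem=6G cgroup.memory=nokmem,nosocket"))
```

Recording this output next to each test result makes it easier to tell afterwards which combination of options a given run actually booted with.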

On 2017-01-25 Michal Hocko wrote:
> On Wed 25-01-17 04:02:46, Trevor Cordes wrote:
> > OK, I patched & compiled mhocko's git tree from the other day
> > 4.9.0+. (To confirm, weird, but mhocko's git tree I'm using from a
> > couple of weeks ago shows the newest commit (git log) is
> > 69973b830859bc6529a7a0468ba0d80ee5117826 "Linux 4.9"?  Let me know
> > if I'm doing something wrong, see below.)  
> 
> My fault. I should have noted that you should use since-4.9 branch.

OK, I got it now, I'm retesting the runs I did (with/without the
various patches) on your git tree and will re-report the (correct)
results.  Will take a few days.  Thanks!


oom6
Description: Binary data


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-25 Thread Trevor Cordes
On 2017-01-23 Mel Gorman wrote:
> On Sun, Jan 22, 2017 at 06:45:59PM -0600, Trevor Cordes wrote:
> > On 2017-01-20 Mel Gorman wrote:  
> > > > 
> > > > Thanks for the OOM report. I was expecting it to be a particular
> > > > shape and my expectations were not matched so it took time to
> > > > consider it further. Can you try the cumulative patch below? It
> > > > combines three patches that
> > > > 
> > > > 1. Allow slab shrinking even if the LRU pages are
> > > > unreclaimable in direct reclaim
> > > > 2. Shrinks slab once based on the contents of all memcgs
> > > > instead of shrinking one at a time
> > > > 3. Tries to shrink slabs if the lowmem usage is too high
> > > > 
> > > > Unfortunately it's only boot tested on x86-64 as I didn't get
> > > > the chance to setup an i386 test bed.
> > > > 
> > > 
> > > There was one major flaw in that patch. This version fixes it and
> > > addresses other minor issues. It may still be too aggressive
> > > shrinking slab but worth trying out. Thanks.  
> > 
> > I ran with your patch below and it oom'd on the first night.  It was
> > weird, it didn't hang the system, and my rebooter script started a
> > reboot but the system never got more than half down before it just
> > sat there in a weird state where a local console user could still
> > login but not much was working.  So the patches don't seem to solve
> > the problem.
> > 
> > For the above compile I applied your patches to 4.10.0-rc4+, I hope
> > that's ok.
> >   
> 
> It would be strongly preferred to run them on top of Michal's other
> fixes. The main reason it's preferred is because this OOM differs from
> earlier ones in that it OOM killed from GFP_NOFS|__GFP_NOFAIL context.
> That meant that the slab shrinking could not happen from direct
> reclaim so the balancing from my patches would not occur.  As
> Michal's other patches affect how kswapd behaves, it's important.

OK, I patched & compiled mhocko's git tree from the other day 4.9.0+.
(To confirm, weird, but mhocko's git tree I'm using from a couple of
weeks ago shows the newest commit (git log) is
69973b830859bc6529a7a0468ba0d80ee5117826 "Linux 4.9"?  Let me know if
I'm doing something wrong, see below.)

Anyhow, it oom'd as usual at ~3am, system froze after 20 ooms hit in 7
secs.  So no help there.  Attached is the oom log from the first oom
hit.

On 2017-01-24 Michal Hocko wrote:
> On Sun 22-01-17 18:45:59, Trevor Cordes wrote:
> [...]
> > Also, completely separate from your patch I ran mhocko's 4.9 tree
> > with mem=2G to see if lower ram amount would help, but it didn't.
> > Even with 2G the system oom'd and hung same as usual.  So far the
> > only thing that helps at all was the cgroup_disable=memory option,
> > which makes the problem disappear completely for me.  
> 
> OK, can we reduce the problem space slightly more and could you boot
> with kmem accounting disabled? cgroup.memory=nokmem,nosocket

I will try that right now, I'll use the mhocko git tree without Mel's
emailed patch, and I'll refresh the git tree from origin first (let me
know if that's a bad move).  As usual, I'll report back within 24-48 hours.

Actually, on my tests with mhocko git tree, I'm a bit confused and want
to make sure I'm compiling the right thing.  His tree doesn't seem to
have recent commits?  I did "git fetch origin" and "git reset --hard
origin/master" to refresh the tree just now and the latest commit is
still the one shown above "Linux 4.9"?  Is Michal making changes but
not committing?  How do I ensure I'm compiling the version you guys want
me to test?  ("git log mm/vmscan.c" shows newest commit is Dec 2??)  Am
I supposed to be testing a specific branch?

If I've been testing the wrong branch, this *only* affects my mhocko
tree tests (not the vanilla or fedora-patched tests).  Thankfully I
think I've only done 1 or 2 mhocko tree tests, and I can easily redo
them.  If this turns out to be the case, I'm so sorry for the
confusion, the non-vanilla git tree thing is all new to me.

In any event, I'm still trying the above, and will adjust if necessary
if it's confirmed I'm doing something wrong with the mhocko git tree.
Thanks!


oom5
Description: Binary data


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-22 Thread Trevor Cordes
On 2017-01-20 Mel Gorman wrote:
> > 
> > Thanks for the OOM report. I was expecting it to be a particular
> > shape and my expectations were not matched so it took time to
> > consider it further. Can you try the cumulative patch below? It
> > combines three patches that
> > 
> > 1. Allow slab shrinking even if the LRU pages are unreclaimable in
> >    direct reclaim
> > 2. Shrinks slab once based on the contents of all memcgs
> >    instead of shrinking one at a time
> > 3. Tries to shrink slabs if the lowmem usage is too high
> > 
> > Unfortunately it's only boot tested on x86-64 as I didn't get the
> > chance to set up an i386 test bed.
> >   
> 
> There was one major flaw in that patch. This version fixes it and
> addresses other minor issues. It may still be too aggressive shrinking
> slab but worth trying out. Thanks.

I ran with your patch below and it oom'd on the first night.  It was
weird, it didn't hang the system, and my rebooter script started a
reboot but the system never got more than half down before it just sat
there in a weird state where a local console user could still login but
not much was working.  So the patches don't seem to solve the problem.

For the above compile I applied your patches to 4.10.0-rc4+, I hope
that's ok.

Attached is the first oom from that night.  I include some stuff below
the oom where the kernel is obviously having issues and dumping more
strange output.  I don't think I've seen that before.  That probably
explains the strange state it was left in.

Also, completely separate from your patch I ran mhocko's 4.9 tree with
mem=2G to see if lower ram amount would help, but it didn't.  Even with
2G the system oom'd and hung same as usual.  So far the only thing that
helps at all was the cgroup_disable=memory option, which makes the
problem disappear completely for me.  I added that option to 3 other
boxes I admin with PAE and that plus limiting ram to <4GB gets rid of
the bug.  However, on the RHBZ on this bug I am commenting on, someone
there reports that cgroup_disable=memory doesn't help him at all.

Hopefully the oom attached can help you figure out a next step.  Thanks!

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2281ad310d06..2c735ea24a85 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2318,6 +2318,59 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	}
>  }
>  
> +#ifdef CONFIG_HIGHMEM
> +static void balance_slab_lowmem(struct pglist_data *pgdat,
> +				struct scan_control *sc)
> +{
> +	unsigned long lru_pages = 0;
> +	unsigned long slab_pages = 0;
> +	unsigned long managed_pages = 0;
> +	int zid;
> +
> +	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> +		struct zone *zone = &pgdat->node_zones[zid];
> +
> +		if (!populated_zone(zone) || is_highmem_idx(zid))
> +			continue;
> +
> +		lru_pages += zone_page_state(zone, NR_ZONE_INACTIVE_FILE);
> +		lru_pages += zone_page_state(zone, NR_ZONE_ACTIVE_FILE);
> +		lru_pages += zone_page_state(zone, NR_ZONE_INACTIVE_ANON);
> +		lru_pages += zone_page_state(zone, NR_ZONE_ACTIVE_ANON);
> +		slab_pages += zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> +		slab_pages += zone_page_state(zone, NR_SLAB_UNRECLAIMABLE);
> +	}
> +
> +	/* Do not balance until LRU and slab exceeds 50% of lowmem */
> +	if (lru_pages + slab_pages < (managed_pages >> 1))
> +		return;
> +
> +	/*
> +	 * Shrink reclaimable slabs if the number of lowmem slab pages is
> +	 * over twice the size of LRU pages. Apply pressure relative to
> +	 * the imbalance between LRU and slab pages.
> +	 */
> +	if (slab_pages > lru_pages << 1) {
> +		struct reclaim_state *reclaim_state = current->reclaim_state;
> +		unsigned long exceed = slab_pages - (lru_pages << 1);
> +		int nid = pgdat->node_id;
> +
> +		exceed = min(exceed, slab_pages);
> +		shrink_slab(sc->gfp_mask, nid, NULL, exceed >> 3, slab_pages);
> +		if (reclaim_state) {
> +			sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> +			reclaim_state->reclaimed_slab = 0;
> +		}
> +	}
> +}
> +#else
> +static void balance_slab_lowmem(struct pglist_data *pgdat,
> +				struct scan_control *sc)
> +{
> +	return;
> +}
> +#endif
> +
>  /*
>   * This is a basic per-node page freer.  Used by both kswapd and direct reclaim.
>   */
> @@ -2336,6 +2389,27 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
>  	get_scan_count(lruvec, memcg, sc, nr, lru_pages);
>  
> +	/*
> +	 * If direct reclaiming at elevated priority and the node is
> +	 * unreclaimable then skip LRU reclaim and let kswapd poll it.
> +	 */
> +	if (!current_is_kswapd() &&
> +	    sc->priority != DEF_PRIORITY
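
For anyone following the arithmetic in the quoted balance_slab_lowmem() hunk, here is a rough userspace model of it. This is only a sketch: slab_pressure() is a made-up name, the page counts are passed in as plain numbers instead of being summed from the lowmem zones, and the return value stands for what the hunk hands to shrink_slab() as its fourth argument (exceed >> 3). Note that in the hunk as quoted, managed_pages starts at 0 and no line shown adds to it, so the 50% check compares against zero and never returns early; the model takes it as a parameter so both cases can be tried.

```c
/*
 * Userspace model of the arithmetic in the quoted balance_slab_lowmem()
 * hunk. slab_pressure() is a hypothetical helper, not a kernel function.
 *
 * Returns the value the hunk would pass to shrink_slab() as its fourth
 * argument. Returns 0 when the hunk would return without calling
 * shrink_slab() (a very small imbalance also shifts down to 0).
 */
unsigned long slab_pressure(unsigned long lru_pages,
			    unsigned long slab_pages,
			    unsigned long managed_pages)
{
	unsigned long exceed;

	/* Do not balance until LRU and slab exceed 50% of lowmem. */
	if (lru_pages + slab_pages < (managed_pages >> 1))
		return 0;

	/* Only act when slab is more than twice the size of the LRU. */
	if (slab_pages <= (lru_pages << 1))
		return 0;

	/* Pressure is proportional to how far slab exceeds 2x LRU. */
	exceed = slab_pages - (lru_pages << 1);
	if (exceed > slab_pages)	/* mirrors min(exceed, slab_pages) */
		exceed = slab_pages;

	return exceed >> 3;
}
```

With 100 LRU pages and 1000 slab pages the imbalance is 800 pages, so one eighth of that (100) is the pressure applied; with 100 LRU pages and 150 slab pages nothing is shrunk because slab is not over twice the LRU size.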

Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-19 Thread Trevor Cordes
On 2017-01-19 Michal Hocko wrote:
> On Thu 19-01-17 03:48:50, Trevor Cordes wrote:
> > On 2017-01-17 Michal Hocko wrote:  
> > > On Tue 17-01-17 14:21:14, Mel Gorman wrote:  
> > > > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko
> > > > wrote:
> > > > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > > > [...]
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 532a2a750952..46aac487b89a 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct
> > > > > > zonelist *zonelist, struct scan_control *sc) continue;
> > > > > >  
> > > > > > if (sc->priority != DEF_PRIORITY &&
> > > > > > +   !buffer_heads_over_limit &&
> > > > > > !pgdat_reclaimable(zone->zone_pgdat))
> > > > > > continue;   /* Let
> > > > > > kswapd poll it */
> > > > > 
> > > > > I think we should rather remove pgdat_reclaimable here. This
> > > > > sounds like a wrong layer to decide whether we want to reclaim
> > > > > and how much.   
> > > > 
> > > > I had considered that but it'd also be important to add the
> > > > other 32-bit patches you have posted to see the impact. Because
> > > > of the ratio of LRU pages to slab pages, it may not have an
> > > > impact but it'd need to be eliminated.
> > > 
> > > OK, Trevor you can pull from
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> > > fixes/highmem-node-fixes branch. This contains the current mmotm
> > > tree
> > > + the latest highmem fixes. I also do not expect this would help
> > > much in your case but as Mel said we should rule that out at
> > > least.  
> > 
> > Hi!  The git tree above version oom'd after < 24 hours (3:02am) so
> > it doesn't solve the bug.  If you need an oom message dump let me
> > know.  
> 
> Yes please.

The first oom from that night attached.  Note, the oom wasn't as dire
with your mhocko/4.9.0+ as it usually is with stock 4.8.x: my oom
detector and reboot script was able to do its thing cleanly before the
system became unusable.

I'll await further instructions and test right away.  Maybe I'll try a
few tuning ideas until then.  Thanks!

> > Let me know what to try next, guys, and I'll test it out.
> >   
> > > > Before prototyping such a thing, I'd like to hear the outcome of
> > > > this heavy hack and then add your 32-bit patches onto the list.
> > > > If the problem is still there then I'd next look at taking slab
> > > > pages into account in pgdat_reclaimable() instead of an
> > > > outright removal that has a much wider impact. If that doesn't
> > > > work then I'll prototype a heavy-handed forced slab reclaim
> > > > when lower zones are almost all slab pages.  
> > 
> > I don't think I've tried the "heavy hack" patch yet?  It's not in
> > the mhocko tree I just tried?  Should I try the heavy hack on top
> > of mhocko git or on vanilla or what?
> > 
> > I also want to mention that these PAE boxes suffer from another
> > problem/bug that I've worked around for almost a year now.  For some
> > reason it keeps gnawing at me that it might be related.  The disk
> > I/O goes to pot on this/these PAE boxes after a certain amount of
> > disk writes (like some unknown number of GB, around 10-ish maybe).
> > Like writes go from 500MB/s to 10MB/s!! Reboot and it's magically
> > 500MB/s again.  I detail this here:
> > https://muug.ca/pipermail/roundtable/2016-June/004669.html
> > My fix was to mem=XG where X is <8 (like 4 or 6) to force the PAE
> > kernel to be more sane about highmem choices.  I never filed a bug
> > because I read a ton of stuff saying Linus hates PAE, don't use over
> > 4G, blah blah.  But the other fix is to:
> > set /proc/sys/vm/highmem_is_dirtyable to 1  
> 
> Yes this sounds like a dirty memory throttling and there were some
> changes in that area. I do not remember when exactly.

I think my PAE-slow-IO bug started way back in Fedora 22 (4.0?), hard
to know exactly when as I didn't discover the bug for maybe a year as I
didn't realize IO was the problem right away.  Too late to bisect that
one, I guess.  I guess it's not related so we can ignore my tang

Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-19 Thread Trevor Cordes
On 2017-01-17 Michal Hocko wrote:
> On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:  
> > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > [...]  
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 532a2a750952..46aac487b89a 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist
> > > > *zonelist, struct scan_control *sc) continue;
> > > >  
> > > > if (sc->priority != DEF_PRIORITY &&
> > > > +   !buffer_heads_over_limit &&
> > > > !pgdat_reclaimable(zone->zone_pgdat))
> > > > continue;   /* Let kswapd
> > > > poll it */  
> > > 
> > > I think we should rather remove pgdat_reclaimable here. This
> > > sounds like a wrong layer to decide whether we want to reclaim
> > > and how much. 
> > 
> > I had considered that but it'd also be important to add the other
> > 32-bit patches you have posted to see the impact. Because of the
> > ratio of LRU pages to slab pages, it may not have an impact but
> > it'd need to be eliminated.  
> 
> OK, Trevor you can pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> fixes/highmem-node-fixes branch. This contains the current mmotm tree
> + the latest highmem fixes. I also do not expect this would help much
> in your case but as Mel said we should rule that out at least.

Hi!  The git tree above version oom'd after < 24 hours (3:02am) so
it doesn't solve the bug.  If you need an oom message dump let me know.

Let me know what to try next, guys, and I'll test it out.

> > Before prototyping such a thing, I'd like to hear the outcome of
> > this heavy hack and then add your 32-bit patches onto the list. If
> > the problem is still there then I'd next look at taking slab pages
> > into account in pgdat_reclaimable() instead of an outright removal
> > that has a much wider impact. If that doesn't work then I'll
> > prototype a heavy-handed forced slab reclaim when lower zones are
> > almost all slab pages.

I don't think I've tried the "heavy hack" patch yet?  It's not in the
mhocko tree I just tried?  Should I try the heavy hack on top of mhocko
git or on vanilla or what?
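
For reference, the one-line change in the quoted shrink_zones() hunk can be modelled in userspace roughly as below. This is a sketch under assumptions: direct_reclaim_skips_node() and its with_hack flag are invented for illustration, the booleans stand in for the kernel's buffer_heads_over_limit global and the result of pgdat_reclaimable(), and DEF_PRIORITY is taken to be 12 as in mainline kernels of this era. It only shows when the "Let kswapd poll it" skip is taken, with and without the extra condition.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12	/* assumed value, as in mainline kernels of this era */

/*
 * Hypothetical model of the skip test in the quoted shrink_zones() hunk.
 * Returns true when direct reclaim would "continue" past the node and
 * leave it for kswapd.
 */
bool direct_reclaim_skips_node(int priority,
			       bool buffer_heads_over_limit,
			       bool node_reclaimable,
			       bool with_hack)
{
	/* At default priority the node is never skipped by this test. */
	if (priority == DEF_PRIORITY)
		return false;

	/* The added line: do not skip while buffer heads are over limit. */
	if (with_hack && buffer_heads_over_limit)
		return false;

	/* Original behaviour: skip nodes considered unreclaimable. */
	return !node_reclaimable;
}
```

So the only case that changes is an unreclaimable node at elevated priority while buffer heads are over the limit: without the extra condition it is skipped, with it direct reclaim still scans the node.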

I also want to mention that these PAE boxes suffer from another
problem/bug that I've worked around for almost a year now.  For some
reason it keeps gnawing at me that it might be related.  The disk I/O
goes to pot on this/these PAE boxes after a certain amount of disk
writes (like some unknown number of GB, around 10-ish maybe).  Like
writes go from 500MB/s to 10MB/s!! Reboot and it's magically 500MB/s
again.  I detail this here:
https://muug.ca/pipermail/roundtable/2016-June/004669.html
My fix was to mem=XG where X is <8 (like 4 or 6) to force the PAE
kernel to be more sane about highmem choices.  I never filed a bug
because I read a ton of stuff saying Linus hates PAE, don't use over
4G, blah blah.  But the other fix is to:
set /proc/sys/vm/highmem_is_dirtyable to 1

I'm not bringing this up to get attention to a new bug, I bring this up
because it smells like it might be related.  If something slowly eats
away at the box's vm to the point that I/O gets horribly slow, perhaps
it's related to the slab and high/lowmem issue we have here?  And if
related, it may help to solve the oom bug.  If I'm way off base here,
just ignore my tangent!

The funny thing is I thought mem=XG where X<8 solved the problem, but
it doesn't!  It greatly mitigates it, but I still get subtle slowdown
that gets worse over time (like weeks instead of days).  I now use the
highmem_is_dirtyable on most boxes and that seems to solve it for good
in combo with mem=XG.  Let me note, however, that I have NOT set
highmem_is_dirtyable=1 on the test box I am using for all of this
building/testing, as I wanted the config to stay static while I work
through this oom bug.  (I'm real curious to see if
highmem_is_dirtyable=1 would have any impact on the oom though!)
Thanks!


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-19 Thread Trevor Cordes
On 2017-01-17 Michal Hocko wrote:
> On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:  
> > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > [...]  
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 532a2a750952..46aac487b89a 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist
> > > > *zonelist, struct scan_control *sc) continue;
> > > >  
> > > > if (sc->priority != DEF_PRIORITY &&
> > > > +   !buffer_heads_over_limit &&
> > > > !pgdat_reclaimable(zone->zone_pgdat))
> > > > continue;   /* Let kswapd
> > > > poll it */  
> > > 
> > > I think we should rather remove pgdat_reclaimable here. This
> > > sounds like a wrong layer to decide whether we want to reclaim
> > > and how much. 
> > 
> > I had considered that but it'd also be important to add the other
> > 32-bit patches you have posted to see the impact. Because of the
> > ratio of LRU pages to slab pages, it may not have an impact but
> > it'd need to be eliminated.  
> 
> OK, Trevor you can pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> fixes/highmem-node-fixes branch. This contains the current mmotm tree
> + the latest highmem fixes. I also do not expect this would help much
> in your case but as Mel've said we should rule that out at least.

Hi!  The git tree above version oom'd after < 24 hours (3:02am) so
it doesn't solve the bug.  If you need a oom messages dump let me know.

Let me know what to try next, guys, and I'll test it out.

> > Before prototyping such a thing, I'd like to hear the outcome of
> > this heavy hack and then add your 32-bit patches onto the list. If
> > the problem is still there then I'd next look at taking slab pages
> > into account in pgdat_reclaimable() instead of an outright removal
> > that has a much wider impact. If that doesn't work then I'll
> > prototype a heavy-handed forced slab reclaim when lower zones are
> > almost all slab pages.

I don't think I've tried the "heavy hack" patch yet?  It's not in the
mhocko tree I just tried?  Should I try the heavy hack on top of mhocko
git or on vanilla or what?

I also want to mention that these PAE boxes suffer from another
problem/bug that I've worked around for almost a year now.  For some
reason it keeps gnawing at me that it might be related.  The disk I/O
goes to pot on this/these PAE boxes after a certain amount of disk
writes (like some unknown number of GB, around 10-ish maybe).  Like
writes go from 500MB/s to 10MB/s!! Reboot and it's magically 500MB/s
again.  I detail this here:
https://muug.ca/pipermail/roundtable/2016-June/004669.html
My fix was to mem=XG where X is <8 (like 4 or 6) to force the PAE
kernel to be more sane about highmem choices.  I never filed a bug
because I read a ton of stuff saying Linus hates PAE, don't use over
4G, blah blah.  But the other fix is to:
set /proc/sys/vm/highmem_is_dirtyable to 1
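
For reference, a minimal sketch of applying that second workaround from
a shell.  The function name set_highmem_dirtyable and its optional path
argument are only illustrative (the argument is there so it can be tried
against a scratch file).  The knob itself only exists on 32-bit kernels
built with CONFIG_HIGHMEM, and writing to it needs root:

```shell
# Sketch only: turn on vm.highmem_is_dirtyable if this kernel exposes it.
# $1 is an optional override of the knob's path (an assumption made for
# testing); by default the real procfs file is used.
set_highmem_dirtyable() {
    knob="${1:-/proc/sys/vm/highmem_is_dirtyable}"
    if [ ! -e "$knob" ]; then
        echo "no $knob here (not a highmem kernel?)" >&2
        return 1
    fi
    echo 1 > "$knob" && echo "highmem_is_dirtyable is now $(cat "$knob")"
}

# Usage (as root):  set_highmem_dirtyable
```

To keep the setting across reboots, a "vm.highmem_is_dirtyable = 1" line
in /etc/sysctl.conf (or a file under /etc/sysctl.d/) should do the same
thing at boot.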

I'm not bringing this up to get attention to a new bug, I bring this up
because it smells like it might be related.  If something slowly eats
away at the box's vm to the point that I/O gets horribly slow, perhaps
it's related to the slab and high/lowmem issue we have here?  And if
related, it may help to solve the oom bug.  If I'm way off base here,
just ignore my tangent!

The funny thing is I thought mem=XG where X<8 solved the problem, but
it doesn't!  It greatly mitigates it, but I still get subtle slowdown
that gets worse over time (like weeks instead of days).  I now use the
highmem_is_dirtyable on most boxes and that seems to solve it for good
in combo with mem=XG.  Let me note, however, that I have NOT set
highmem_is_dirtyable=1 on the test box I am using for all of this
building/testing, as I wanted the config to stay static while I work
through this oom bug.  (I'm real curious to see if
highmem_is_dirtyable=1 would have any impact on the oom though!)
Thanks!


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-17 Thread Trevor Cordes
On 2017-01-17 Michal Hocko wrote:
> On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:  
> > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > [...]  
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 532a2a750952..46aac487b89a 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist
> > > > *zonelist, struct scan_control *sc) continue;
> > > >  
> > > > if (sc->priority != DEF_PRIORITY &&
> > > > +   !buffer_heads_over_limit &&
> > > > !pgdat_reclaimable(zone->zone_pgdat))
> > > > continue;   /* Let kswapd
> > > > poll it */  
> > > 
> > > I think we should rather remove pgdat_reclaimable here. This
> > > sounds like a wrong layer to decide whether we want to reclaim
> > > and how much. 
> > 
> > I had considered that but it'd also be important to add the other
> > 32-bit patches you have posted to see the impact. Because of the
> > ratio of LRU pages to slab pages, it may not have an impact but
> > it'd need to be eliminated.  
> 
> OK, Trevor you can pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> fixes/highmem-node-fixes branch. This contains the current mmotm tree
> + the latest highmem fixes. I also do not expect this would help much
> in your case but as Mel've said we should rule that out at least.

OK, ignore my last question re: what to do next.  I am building
this mhocko git tree now per your above instructions and will reboot
into it in a few hours with*out* the cgroup_disable=memory option.
Might take ~50 hours for a result.

I should note that the workload on the box with the bug is mostly as a
file server and iptables firewall/router.  It routes around 8GB (bytes)
a day and handles periodic file server loads.  That's about it.
Everything else that is running is not doing much, and not using much
RAM; except maybe clamav, by far the biggest RAM user.

I don't see this bug on other nearly identical boxes, including:
F24 4.8.15 32-bit (no PAE) 1GB ram P4
F24 4.8.15 32-bit (no PAE) 2GB ram Core2 Quad

However, just noticed for the first time today that one other box is
also seeing this bug (gets an oom message), though with much less
frequency: twice in 2 months since upgrading to 4.8.  However, it
recovers from the oom without a reboot and hasn't hanged (yet).  That
could be because this box does not do as much file serving or I/O as
the one I've been building/testing on. Also, this box is a much older
Pentium-D with 4GB (PAE on).  If it would be helpful to see its oom
log, let me know.  (Scanning all my boxes now, I also found a single oom
on yet another computer with the same story; but this is a Xeon
E3-1220 32-bit with PAE, 4GB.)

So far the commonality seems to be >2GB RAM and PAE on.  Might be
interesting to boot my build/test box with mem=2G and isolate it to
small RAM vs PAE.  "mem=2G" would make a great, easy, immediate
workaround for this problem for me (as cgroup_disable=memory also seems
to do, so far).  Thanks!
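
For anyone wanting to try the same workarounds, a rough sketch of
checking which of the two parameters the running kernel was actually
booted with.  has_boot_param is a made-up helper name, and its optional
second argument (a file to read instead of /proc/cmdline) is there only
so it can be exercised without rebooting:

```shell
# Sketch: succeed if a given parameter is on the kernel command line.
# $1 = parameter to look for, matched as a whole word (e.g. mem=2G)
# $2 = file to read; defaults to /proc/cmdline (override is for testing)
has_boot_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

for p in mem=2G cgroup_disable=memory; do
    if has_boot_param "$p"; then
        echo "$p: set"
    else
        echo "$p: not set"
    fi
done
```

On Fedora, adding a parameter permanently would normally be done with
something like grubby --update-kernel=ALL --args="mem=2G" rather than by
editing the grub config by hand.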


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-17 Thread Trevor Cordes
On 2017-01-16 Mel Gorman wrote:
> > > You can easily check whether this is memcg related by trying to
> > > run the same workload with cgroup_disable=memory kernel command
> > > line parameter. This will put all the memcg specifics out of the
> > > way.  
> > 
> > I will try booting now into cgroup_disable=memory to see if that
> > helps at all.  I'll reply back in 48 hours, or when it oom's,
> > whichever comes first.
> >   
> 
> Thanks.

It has successfully survived 70 hours and two 3am cycles (when it
normally oom's) with your first patch *and* cgroup_disable=memory
grafted on Fedora's 4.8.13.  Since it had never before survived two 3am
cycles, I strongly suspect that cgroup_disable=memory mitigates my bug.

> > Also, should I bother trying the latest git HEAD to see if that
> > solves anything?  Thanks!  
> 
> That's worth trying. If that also fails then could you try the
> following hack to encourage direct reclaim to reclaim slab when
> buffers are over the limit please?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 532a2a750952..46aac487b89a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist
> *zonelist, struct scan_control *sc) continue;
>  
>   if (sc->priority != DEF_PRIORITY &&
> + !buffer_heads_over_limit &&
>   !pgdat_reclaimable(zone->zone_pgdat))
>   continue;   /* Let kswapd poll
> it */ 

What's the next best step?  HEAD?  HEAD + the above patch?  A new
patch?  I'll start a HEAD compile until I hear more.  I assume I should
test without cgroup_disable=memory as that's just a kludge/workaround,
right?

Also, is there a way to spot the slab pressure you are talking about
before oom's occur?  slabinfo?  I suppose I'd be able to see some
counter slowly getting too high or low?  Thanks!
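
To make the question concrete, here is a rough sketch of the sort of
thing I could log every minute until the next oom.  It assumes the Slab
and LowTotal lines of /proc/meminfo (LowTotal only shows up on highmem
kernels) are the right things to watch, which may well be wrong.  The
function names are made up, and the optional file argument exists only
for trying it on a saved copy of meminfo:

```shell
# Print the kB value of one /proc/meminfo field, e.g. "Slab" or "LowFree".
# $1 = field name without the colon, $2 = optional file (default /proc/meminfo)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print slab usage as a whole-number percentage of lowmem.
# Fails (returns 1) if either field is missing, e.g. on a 64-bit kernel.
slab_pct_of_lowmem() {
    slab=$(meminfo_kb Slab "$1")
    low=$(meminfo_kb LowTotal "$1")
    [ -n "$slab" ] && [ -n "$low" ] && [ "$low" -gt 0 ] || return 1
    echo $(( slab * 100 / low ))
}

if pct=$(slab_pct_of_lowmem); then
    echo "slab is ${pct}% of lowmem"
else
    echo "no Slab/LowTotal in /proc/meminfo (not a highmem kernel?)"
fi
```

Run from cron or a "while sleep 60" loop with a timestamp, that would at
least show whether slab creeps up toward the size of lowmem before 3am.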


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-14 Thread Trevor Cordes
On 2017-01-12 Michal Hocko wrote:
> On Wed 11-01-17 16:52:32, Trevor Cordes wrote:
> [...]
> > I'm not sure how I can tell if my bug is because of memcgs so here
> > is a full first oom example (attached).  
> 
> 4.7 kernel doesn't contain 71c799f4982d ("mm: add per-zone lru list
> stat") so the OOM report will not tell us whether the Normal zone
> doesn't age active lists, unfortunately.

I compiled the patch Mel provided into the stock F23 kernel
4.8.13-100.fc23.i686+PAE and it ran for 2 nights.  It didn't oom the
first night, but did the second night.  So the bug persists even with
that patch.  However, it does *seem* a bit "better" since it took 2
nights (usually takes only one, but maybe 10% of the time it does take
two) before oom'ing, *and* it allowed my reboot script to reboot it
cleanly when it saw the oom (which happens only 25% of the time).

I'm attaching the 4.8.13 oom message which should have the memcg info
(71c799f4982d) you are asking for above?  Hopefully?

> You can easily check whether this is memcg related by trying to run
> the same workload with cgroup_disable=memory kernel command line
> parameter. This will put all the memcg specifics out of the way.

I will try booting now into cgroup_disable=memory to see if that helps
at all.  I'll reply back in 48 hours, or when it oom's, whichever comes
first.

Also, should I bother trying the latest git HEAD to see if that solves
anything?  Thanks!


oom2
Description: Binary data


Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-11 Thread Trevor Cordes
On 2017-01-11 Mel Gorman wrote:
> On Wed, Jan 11, 2017 at 12:11:46PM +, Mel Gorman wrote:
> > On Wed, Jan 11, 2017 at 04:32:43AM -0600, Trevor Cordes wrote:  
> > > Hi!  I have bisected a nightly oom-killer flood and crash/hang on
> > > one of the boxes I admin.  It doesn't crash on Fedora 23/24
> > > 4.7.10 kernel but does on any 4.8 Fedora kernel.  I did a vanilla
> > > bisect and the bug is here:
> > > 
> > > commit b2e18757f2c9d1cdd746a882e9878852fdec9501
> > > Author: Mel Gorman <mgor...@techsingularity.net>
> > > Date:   Thu Jul 28 15:45:37 2016 -0700
> > > 
> > > mm, vmscan: begin reclaiming pages on a per-node basis
> > >   
> > 
> > Michal Hocko recently worked on a bug similar to this. Can you test
> > the following patch that is currently queued in Andrew Morton's
> > tree? It applies cleanly to 4.9
> >   
> 
> I should have pointed out that this patch primarily affects memcg but
> the bug report did not include an OOM report and did not describe
> whether memcgs could be involved or not. If memcgs are not involved
> then please post the first full OOM kill.

I will apply your patch tonight and it will take 48 hours to confirm
that it is "good" (<24 hours if it's bad), and I will reply back.

I'm not sure how I can tell if my bug is because of memcgs so here is
a full first oom example (attached).

Thanks for the help!


oom-example
Description: Binary data


mm, vmscan: commit makes PAE kernel crash nightly (bisected)

2017-01-11 Thread Trevor Cordes
Hi!  I have bisected a nightly oom-killer flood and crash/hang on one of
the boxes I admin.  It doesn't crash on Fedora 23/24 4.7.10 kernel but 
does on any 4.8 Fedora kernel.  I did a vanilla bisect and the bug is 
here:

commit b2e18757f2c9d1cdd746a882e9878852fdec9501
Author: Mel Gorman 
Date:   Thu Jul 28 15:45:37 2016 -0700

mm, vmscan: begin reclaiming pages on a per-node basis

I bisected between:
# bad: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
# good: [523d939ef98fd712632d93a5a2b588e477a7565e] Linux 4.7

I have not tried newer than 4.8.13 Fedora kernel, but if someone thinks 
this bug is already fixed in HEAD I could try that next.  It took 3 weeks 
to bisect because the crash only seems to happen in the middle of the
night, and on most nights but not every night.

It does not occur on most of my other boxes, just this one.  The box is a 
bit unique in that it's running 32-bit PAE on a 64-bit capable CPU, and I 
have the memory tuned down to mem=6G in the kernel command line (I think 
it has 16GB actual).  I tuned the RAM down because around 8GB the PAE 
kernel has massive IO speed issues.

It is a relatively new Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz on an 
Intel S1200BTL board.  I will eventually change it to 64-bit Fedora which 
I'm sure will solve this bug, but since there's no easy upgrade path, 
that's on the backburner on this production box.

I'm sure this will be another "PAE sucks, don't use it" issue, but like I 
said, I'm currently stuck with it, and in theory the kernel shouldn't 
crash like this (I'm guessing/hoping).

I think I pinned the trigger down to one (or both) of two big dir scans
(like "find /bigdir-foo") that run at around 3am: a remote box doing
indexing via smbd, and/or rsync or rdiff-backup, which also do big dir
scans.  But when I do "find /" manually I can't trigger the bug.  Very weird.

The commit notes make it sound like the author thought perhaps there could 
be a problem in some scenarios?  I guess I found the scenario.

The only discussion I found on the net regarding this commit is
https://lkml.org/lkml/2016/8/29/154
And perhaps it's somewhat relevant, it's a bit over my head.

I'm available for testing, etc, and can usually rule out a bad kernel 
within 24-hours by just waiting for 3am to roll around.  I also have 
copious logs I can provide and screenshots of the crashes.

The box is extremely lightly loaded, and RAM use is almost always under 
1GB, and swap is 0-20k used most of the time with GB's free.  Everything 
looks great until all of a sudden oom-killer starts running and goes 
through 10-260 iterations before the system just dies.  I wrote a script 
to watch for oom-killer and issue "reboot" immediately, but 80% of the 
time the box will hang before the reboot actually manages to shut down.
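
This is not the actual script, just a sketch of the idea, assuming the
kernel log line contains the usual "invoked oom-killer" text.  The
function names is_oom_line and watch_for_oom are invented for the
example:

```shell
# Return 0 if a kernel log line looks like an oom-killer invocation.
is_oom_line() {
    case "$1" in
        *"invoked oom-killer"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read kernel log lines on stdin and run the given command on the first
# oom-killer line seen.  Returns 1 if stdin ends without a match.
watch_for_oom() {
    while IFS= read -r line; do
        if is_oom_line "$line"; then
            "$@"
            return 0
        fi
    done
    return 1
}

# Usage (as root):
#   dmesg --follow | watch_for_oom reboot
```

Passing the action as arguments makes it easy to swap "reboot" for
something harmless (like "echo") while testing.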

Any information/help I can provide, please just holler.  Thanks!


Re: netfilter regression causes lost pings "operation not permitted"

2016-12-07 Thread Trevor Cordes
On 2016-12-07 Trevor Cordes wrote:
> Bisected down to:
> 870190a9ec9075205c0fa795a09fa931694a3ff1
> 7c9664351980aaa6a4b8837a314360b3a4ad382a

Oh!  I forgot to mention the most important point: iptable_nat module
MUST be loaded for the bug to show up!

modprobe iptable_nat

If you rmmod it, the bug goes away.  Interestingly, the bug occurs even
if you have every iptables table (including -t nat) completely empty
(no rules).  All that is required is iptable_nat simply to be loaded.
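
A quick way to check for that condition before running the test, as a
sketch: module_loaded is a made-up helper that looks the name up in
/proc/modules (the optional second argument is a file to read instead,
for testing).  It will not notice a module that is built into the
kernel rather than loaded:

```shell
# Return 0 if the named module is listed in /proc/modules.
# $1 = module name, $2 = optional file to read (default /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

if module_loaded iptable_nat; then
    echo "iptable_nat is loaded: the bug should be reproducible"
else
    echo "iptable_nat is not loaded: modprobe it first"
fi
```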


netfilter regression causes lost pings "operation not permitted"

2016-12-07 Thread Trevor Cordes
Bisected down to:
870190a9ec9075205c0fa795a09fa931694a3ff1
7c9664351980aaa6a4b8837a314360b3a4ad382a

Hi!  4.8.x caused a script of mine that pings all IPs on my LAN /24 subnet 
in about 0.5s, and nmap doing the same, to error on the send() call with 
"operation not permitted".  This happens after a somewhat random number of 
packets have already been sent.  That number shrinks each time you run the 
script, so the first run you'll get up to around 200 pings, then it goes 
down to 50 pings, before the error.  If you wait, it goes back up to 
around 200 pings.  It almost never completes all 253 of them.

Interestingly, the problem only occurs when you ping different IPs.  If 
you send the same ping count using my script to just one IP, there is no 
bug.

4.7.0 kernels don't have this problem: the pings go out and everything is 
fine no matter how fast you repeat the script.

I bisected the bug to the above commits.  I had to skip 
7c9664351980aaa6a4b8837a314360b3a4ad382a because it wouldn't boot... just 
panic on every try.  So I can't narrow it any closer than within 2 
commits.

You can reproduce this bug in 4.8.8 or newer with:

# change to your LAN subnet
nmap -PE 192.168.100.0/24

Or use my test script, which I will paste below.  (Modify the top lines
to suit your LAN IPs; other netmasks will need more work.)  Sometimes you
have to run the script a few times before the error occurs.

When you see "operation not permitted", that's the symptom.  Boot into 
4.7.10, say, and you don't get any error.

I played with all the sysctls that looked relevant, like: ratelimit, 
per_sec, max, etc.  I modified everything I could find but nothing made 
the problem go away, though I *think* some had a modest effect on how many 
times I could run the script before the error popped up, but even if I 
took them to extreme values the bug never went away.

I'm back to the Fedora defaults now:

#sysctl -a | grep -iP 'icmp|nf_|conntrack|iptable'|grep -viP 'nf_log'
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_msgs_burst = 50
net.ipv4.icmp_msgs_per_sec = 1000
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
net.ipv6.icmp.ratelimit = 1000
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 201
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 262144


Thanks for your help!



TEST SCRIPT:

#!/usr/bin/perl -w
# sorry, cheesy formatting, this is a test case I just slapped together

my $subnet = '192.168.100.';
#my $single = '192.168.101.110';

use Socket;
use Symbol;
use NetAddr::IP::Lite;

sub ICMP_ECHO   ()  { 8 }
sub ICMP_SUBCODE()  { 0 }
sub ICMP_STRUCT ()  { 'C2S3A56' }
sub ICMP_FLAGS  ()  { 0 }
sub ICMP_PORT   ()  { 0 }

$sequence=0;

for $i (2..254) {

$protocol = (getprotobyname('icmp'))[2] or
   die('Cannot get ICMP protocol number by name - ', $!);

$socket = Symbol::gensym;
socket($socket, PF_INET, SOCK_RAW, $protocol) or
die('Cannot create ICMP socket - ', $!);


$sequence = ($sequence+1) & 0xffff;

my $checksum = 0;
my $msg = pack(
ICMP_STRUCT,
ICMP_ECHO,
ICMP_SUBCODE,
$checksum,
$$ & 0xffff,
$sequence,
'0' x 56
);

my $short = int(length($msg) / 2);
$checksum += $_ for unpack "S$short", $msg;
$checksum += ord(substr($msg, -1)) if length($msg) % 2;
$checksum = ($checksum >> 16) + ($checksum & 0xffff);
$checksum = ~(($checksum >> 16) + $checksum) & 0xffff;

$msg = pack(
ICMP_STRUCT,
ICMP_ECHO,

netfilter regression causes lost pings "operation not permitted"

2016-12-07 Thread Trevor Cordes
Bisected down to:
870190a9ec9075205c0fa795a09fa931694a3ff1
7c9664351980aaa6a4b8837a314360b3a4ad382a

Hi!  4.8.x caused a script of mine that pings all IPs on my LAN /24 subnet 
in about 0.5s, and nmap doing the same, to error on the send() call with 
"operation not permitted".  This happens after a somewhat random number of 
packets have already been sent.  That number shrinks each time you run the 
script, so the first run you'll get up to around 200 pings, then it goes 
down to 50 pings, before the error.  If you wait, it goes back up to 
around 200 pings.  It almost never completes all 253 of them.

Interestingly, the problem only occurs when you ping different IPs.  If 
you send the same ping count using my script to just one IP, there is no 
bug.

4.7.0 kernels don't have this problem: the pings go out and everything is 
fine no matter how fast you repeat the script.

I bisected the bug to the above commits.  I had to skip 
7c9664351980aaa6a4b8837a314360b3a4ad382a because it wouldn't boot... just 
panic on every try.  So I can't narrow it any closer than within 2 
commits.

You can reproduce this bug in 4.8.8 or newer with:

# change to your LAN subnet
nmap -PE 192.168.100.0/24

Or use my test script I will paste below.  (Modify the top lines to suit 
your LAN IPs; or more work for different netmasks.)  Sometimes you have to 
run the script a few times before the error occurs.

When you see "operation not permitted", that's the symptom.  Boot into 
4.7.10, say, and you don't get any error.

I played with all the sysctls that looked relevant, like: ratelimit, 
per_sec, max, etc.  I modified everything I could find but nothing made 
the problem go away, though I *think* some had a modest effect on how many 
times I could run the script before the error popped up, but even if I 
took them to extreme values the bug never went away.

I'm back to the Fedora defaults now:

#sysctl -a | grep -iP 'icmp|nf_|conntrack|iptable'|grep -viP 'nf_log'
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_msgs_burst = 50
net.ipv4.icmp_msgs_per_sec = 1000
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
net.ipv6.icmp.ratelimit = 1000
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 201
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 262144


Thanks for your help!



TEST SCRIPT:

#!/usr/bin/perl -w
# sorry, cheesy formatting, this is a test case I just slapped together

my $subnet = '192.168.100.';
#my $single = '192.168.101.110';

use Socket;
use Symbol;
use NetAddr::IP::Lite;

sub ICMP_ECHO   ()  { 8 }
sub ICMP_SUBCODE()  { 0 }
sub ICMP_STRUCT ()  { 'C2S3A56' }
sub ICMP_FLAGS  ()  { 0 }
sub ICMP_PORT   ()  { 0 }

$sequence=0;

for $i (2..254) {

$protocol = (getprotobyname('icmp'))[2] or
   die('Cannot get ICMP protocol number by name - ', $!);

$socket = Symbol::gensym;
socket($socket, PF_INET, SOCK_RAW, $protocol) or
die('Cannot create ICMP socket - ', $!);


$sequence = ($sequence+1) & 0xffff;

my $checksum = 0;
my $msg = pack(
ICMP_STRUCT,
ICMP_ECHO,
ICMP_SUBCODE,
$checksum,
$$ & 0xffff,
$sequence,
'0' x 56
);

my $short = int(length($msg) / 2);
$checksum += $_ for unpack "S$short", $msg;
$checksum += ord(substr($msg, -1)) if length($msg) % 2;
$checksum = ($checksum >> 16) + ($checksum & 0xffff);
$checksum = ~(($checksum >> 16) + $checksum) & 0xffff;

$msg = pack(
ICMP_STRUCT,
ICMP_ECHO,

Re: [PATCH v3] ktime: Fix ktime_divns to do signed division

2015-05-11 Thread Trevor Cordes
On 2015-05-08 John Stultz wrote:
> It was noted that the 32bit implementation of ktime_divns()
> was doing unsigned division and didn't properly handle
> negative values.
[...]

I have compiled, installed and tested (all weekend) the v3 of the patch
against 3.19.5-201.fc21.i686+PAE and it seems to work fine / stable,
and fixes my bug.  I think it's a done deal!  Thanks once again!

> Cc: Nicolas Pitre 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Josh Boyer 
> Cc: One Thousand Gnomes 
> Cc: Trevor Cordes 
> Cc: sta...@vger.kernel.org # 3.17+ for regression
> Tested-by: Trevor Cordes 
> Reported-by: Trevor Cordes 

Tested-by: Trevor Cordes  [runtime test i686-PAE]

> Signed-off-by: John Stultz 
> ---
> 
> New in v3:
> * Fix casting issue Nicolas pointed out
> * Use WARN_ON for 64bit case
> 
>  include/linux/ktime.h | 27 +++
>  kernel/time/hrtimer.c | 11 ---
>  2 files changed, 31 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index 5fc3d10..ab2de1c7 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -166,19 +166,38 @@ static inline bool ktime_before(const ktime_t
> cmp1, const ktime_t cmp2) }
>  
>  #if BITS_PER_LONG < 64
> -extern u64 __ktime_divns(const ktime_t kt, s64 div);
> -static inline u64 ktime_divns(const ktime_t kt, s64 div)
> +extern s64 __ktime_divns(const ktime_t kt, s64 div);
> +static inline s64 ktime_divns(const ktime_t kt, s64 div)
>  {
> + /*
> +  * Negative divisors could cause an inf loop,
> +  * so bug out here.
> +  */
> + BUG_ON(div < 0);
>   if (__builtin_constant_p(div) && !(div >> 32)) {
> - u64 ns = kt.tv64;
> + s64 ns = kt.tv64;
> + int neg = (ns < 0);
> +
> + if (neg)
> + ns = -ns;
>   do_div(ns, div);
> + if (neg)
> + ns = -ns;
>   return ns;
>   } else {
>   return __ktime_divns(kt, div);
>   }
>  }
>  #else /* BITS_PER_LONG < 64 */
> -# define ktime_divns(kt, div)(u64)((kt).tv64 / (div))
> +static inline s64 ktime_divns(const ktime_t kt, s64 div)
> +{
> + /*
> +  * 32-bit implementation cannot handle negative divisors,
> +  * so catch them on 64bit as well.
> +  */
> + WARN_ON(div < 0);
> + return kt.tv64 / div;
> +}
>  #endif
>  
>  static inline s64 ktime_to_us(const ktime_t kt)
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 76d4bd9..c98ce4d 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -266,12 +266,15 @@ lock_hrtimer_base(const struct hrtimer *timer,
> unsigned long *flags) /*
>   * Divide a ktime value by a nanosecond value
>   */
> -u64 __ktime_divns(const ktime_t kt, s64 div)
> +s64 __ktime_divns(const ktime_t kt, s64 div)
>  {
> - u64 dclc;
> - int sft = 0;
> + s64 dclc;
> + int neg, sft = 0;
>  
>   dclc = ktime_to_ns(kt);
> + neg = (dclc < 0);
> + if (neg)
> + dclc = -dclc;
>   /* Make sure the divisor is less than 2^32: */
>   while (div >> 32) {
>   sft++;
> @@ -279,6 +282,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div)
>   }
>   dclc >>= sft;
>   do_div(dclc, (unsigned long) div);
> + if (neg)
> + dclc = -dclc;
>  
>   return dclc;
>  }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ktime: Fix ktime_divns to do signed division

2015-05-02 Thread Trevor Cordes
On 2015-05-01 John Stultz wrote:
> It was noted that the 32bit implementation of ktime_divns
> was doing unsigned division and didn't properly handle
> negative values.
> 
> This patch fixes the problem by checking and preserving
> the sign bit, and then reapplying it if appropriate
> after the division.

I worked your new patch into Fedora 21's latest 3.19.5-200.fc21 kernel
on my test system (they haven't switched to 4.0 yet).  I had to add
Nicolas' 8b618628 commit as well to get this new patch to apply to
3.19.5.

After rebuild and reboot I can confirm this patch fixes my bug: irsend
doesn't hang lircd.  I'll report if anything else weird goes on, but
this all seems pretty straightforward so I doubt I'll have any problems.
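
The sign handling described in the quoted patch summary can be sketched
in userspace C.  This is only an illustration under stated assumptions,
not the kernel implementation: divns_signed is a hypothetical name,
plain 64-bit division stands in for the kernel's do_div(), and the
assert stands in for whatever check the kernel applies to the divisor.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace sketch: divide a signed nanosecond count by a
 * positive divisor by working on the magnitude and reapplying the sign.
 */
int64_t divns_signed(int64_t ns, int64_t div)
{
    int neg = (ns < 0);
    uint64_t mag, q;

    /* Assumption for this sketch: callers only pass positive divisors. */
    assert(div > 0);

    /* Take the magnitude in unsigned arithmetic so INT64_MIN is safe. */
    mag = neg ? 0 - (uint64_t)ns : (uint64_t)ns;

    /* The actual division is unsigned, as in the 32-bit code path. */
    q = mag / (uint64_t)div;

    /* Reapply the sign (assumes two's complement conversion). */
    return neg ? (int64_t)(0 - q) : (int64_t)q;
}
```

For example, divns_signed(-20782699, 1000) yields -20782.  The result
truncates toward zero, so it can differ by one from a conversion that
rounds toward negative infinity.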

> I'll send it out here shortly. If you could give it a spin at your
> leisure, and if it works give me a Tested-by: tag I'd appreciate it!

I'm not quite sure how to give a Tested-by, but from the minimal docs I
found on the net, I am trying below (after your Signed-off-by tag).  If
I need to do something else, please point me in the general direction
of instructions.

> Great work again on chasing this down, and thanks for helping with
> debugging and validating the fix!

Thanks!  I'm really glad this is getting worked out and am happy to
help.  It's a big step forward for me to move past simple bugzilla onto
kernel bisects.  I plan on giving a presentation about all this at my
local Linux user group.  It'll be nice to report the happy ending!  You
guys have been great.  I guess kernel bugzilla isn't the place to get
help: it's here on the LKML.

P.S. Here's the kernel bz link, we/I can close it with the results once
this is all done:
https://bugzilla.kernel.org/show_bug.cgi?id=95431

> Cc: Nicolas Pitre 
> Cc: Thomas Gleixner 
> Cc: Josh Boyer 
> Cc: One Thousand Gnomes 
> Reported-by: Trevor Cordes 
> Signed-off-by: John Stultz 
Tested-by: Trevor Cordes  (runtime test on i686)
> ---
>  include/linux/ktime.h | 12 ++--
>  kernel/time/hrtimer.c | 11 +--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index 5fc3d10..d947263 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -166,12 +166,20 @@ static inline bool ktime_before(const ktime_t
> cmp1, const ktime_t cmp2) }
>  
>  #if BITS_PER_LONG < 64
> -extern u64 __ktime_divns(const ktime_t kt, s64 div);
> +extern s64 __ktime_divns(const ktime_t kt, s64 div);
>  static inline u64 ktime_divns(const ktime_t kt, s64 div)
>  {
>   if (__builtin_constant_p(div) && !(div >> 32)) {
> - u64 ns = kt.tv64;
> + s64 ns = kt.tv64;
> + int neg = 0;
> +
> + if (ns < 0) {
> + neg = 1;
> + ns = -ns;
> + }
>   do_div(ns, div);
> + if (neg)
> + ns = -ns;
>   return ns;
>   } else {
>   return __ktime_divns(kt, div);
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 76d4bd9..4c1b294 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -266,12 +266,17 @@ lock_hrtimer_base(const struct hrtimer *timer,
> unsigned long *flags) /*
>   * Divide a ktime value by a nanosecond value
>   */
> -u64 __ktime_divns(const ktime_t kt, s64 div)
> +s64 __ktime_divns(const ktime_t kt, s64 div)
>  {
> - u64 dclc;
> + s64 dclc;
>   int sft = 0;
> + int neg = 0;
>  
>   dclc = ktime_to_ns(kt);
> + if (dclc < 0) {
> + neg = 1;
> + dclc = -dclc;
> + }
>   /* Make sure the divisor is less than 2^32: */
>   while (div >> 32) {
>   sft++;
> @@ -279,6 +284,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div)
>   }
>   dclc >>= sft;
>   do_div(dclc, (unsigned long) div);
> + if (neg)
> + dclc = -dclc;
>  
>   return dclc;
>  }



Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread Trevor Cordes
On 2015-04-30 John Stultz wrote:
> From your description it does seem like some sort of edge case
> problem w/ the 32bit ktime_divns(), but I don't see it right off, and
> I agree with Alan to do both calculations and print out warn when that
> happens.
> 
> There's also not a ton of users of that function, but ktime_us_delta()
> is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in
> ir_lirc_transmit_ir().
> 
> We should instrument that to see if its calculating negative deltas.

Thanks for looking at this!  I didn't have the know-how to add the
debug code myself (I was scared this fn might be called a zillion times
in other kernel operations and overload the system with debug logging).

> I'll send you a debug patch to do the above.

Got the patch.  Sorry for the delay (takes hours to compile using
rpmbuild on this old P4 box).

I think I have "success" in terms of useful debug output (relevant
lines only):

May  1 04:41:08 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:41:08 piles lircd-0.9.1a[978]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:41:11 piles lircd-0.9.1a[978]: accepted new client on 
/var/run/lirc/lircd
May  1 04:41:11 piles kernel: [   55.265023] JDB: ktime_to_us: -20782699 -> 
divns 18446744073688768 != old method: -20783
May  1 04:44:00 piles lircd-0.9.1a[978]: removed client
May  1 04:44:00 piles lircd-0.9.1a[978]: caught signal
May  1 04:44:00 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:44:00 piles lircd-0.9.1a[1523]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:45:03 piles lircd-0.9.1a[1523]: accepted new client on 
/var/run/lirc/lircd
May  1 04:45:03 piles kernel: [  287.445027] JDB: ktime_to_us: -20599906 -> 
divns 18446744073688951 != old method: -20600
May  1 04:45:37 piles lircd-0.9.1a[1523]: removed client
May  1 04:45:37 piles lircd-0.9.1a[1523]: caught signal
May  1 04:45:37 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:45:37 piles lircd-0.9.1a[1579]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:45:40 piles lircd-0.9.1a[1579]: accepted new client on 
/var/run/lirc/lircd
May  1 04:45:40 piles kernel: [  324.209023] JDB: ktime_to_us: -20443355 -> 
divns 18446744073689108 != old method: -20444
May  1 04:46:12 piles lircd-0.9.1a[1579]: removed client
May  1 04:46:12 piles lircd-0.9.1a[1579]: caught signal
May  1 04:46:12 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:46:12 piles lircd-0.9.1a[1597]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:46:12 piles lircd-0.9.1a[1597]: accepted new client on 
/var/run/lirc/lircd
May  1 04:46:12 piles kernel: [  356.838029] JDB: ktime_to_us: -20157485 -> 
divns 18446744073689394 != old method: -20158

The last 2 or 3 groups of output I could produce on demand by stopping 
mythbackend and running:
systemctl restart lircd.service ; irsend SEND_ONCE dct700 info

Subsequent irsends don't trigger the bug, since (as I found out a
while ago) by that point lircd is "hung", at least for a long while.
Hey!  Maybe lircd is then hung for 18446744073689394 us or ns :-)
If this result is used as a delay timer, a negative value would produce
0 delay, and the huge positive value the "hang".  If the units are us,
I calculate that hang at about 584 years :-)

So it looks like maybe my theory wasn't so wacky: we're dealing
with a caller passing negative numbers (or 32/64 weirdness).  Very
strange as it seems the caller *wants* (or is happy with) negative
numbers!
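
The values in the log lines above are consistent with a negative
nanosecond count being divided as if it were unsigned.  The following is
a minimal userspace sketch of that arithmetic, not the kernel code:
divns_unsigned and usec_to_years are hypothetical helper names, and the
365.25-day year is an assumption made only for the estimate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: reinterpret a signed nanosecond count as
 * unsigned before dividing, which is what an unsigned-only division
 * effectively does with a negative input.
 */
uint64_t divns_unsigned(int64_t ns, uint64_t div)
{
    assert(div != 0);
    /* A negative ns becomes 2^64 + ns here, a huge positive value. */
    return (uint64_t)ns / div;
}

/* Whole years in a microsecond count, assuming 365.25-day years. */
uint64_t usec_to_years(uint64_t usec)
{
    const uint64_t usec_per_year = 31557600ULL * 1000000ULL;

    return usec / usec_per_year;
}
```

With ns = -20782699 and div = 1000, divns_unsigned returns
18446744073688768, the "divns" value in the first JDB line.  Treating
18446744073689394 as microseconds, usec_to_years gives 584 whole years.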

Let me know if you need any more debugging/patch-tests.  But give
me 4+ hours between rpmbuilds (probably my responses will be 24-hr
later).

Thanks a million!


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-29 Thread Trevor Cordes
Sorry for the top-posting; Josh Boyer suggested I re-mail this mail
from last month which didn't get any replies.  I'm still having this
weird kernel bug affecting me and I've bisected it down to like 2-4
lines of code.  (I've thought more about my theory regarding
unsigned/signed below and it's probably wrong, so ignore my
prognosticating.)  Please see my rhbz link near the bottom for the full
details.

Thanks everyone!

On 2015-03-23 Trevor Cordes wrote:
> Hello everyone, this is my first attempt at bisecting a kernel to
> solve a bug.  Please bear with me.
> 
> I have successfully bisected and located a commit that is causing my 
> problem.  Look at commit 166afb64511.
> 
> ktime_to_us returns s64, but the commit changes it so ktime_to_us
> just returns what ktime_divns returns, and ktime_divns returns a
> u64!  If the u64 is big enough, wouldn't it wrap s64 around to a
> negative number?  Or, perhaps if some caller is passing in negative
> ktime_t to begin with it will trigger without having to hit big
> numbers.  With my limited knowledge of C, I am stabbing in the dark
> here.
> 
> That's just my guess as to why this commit causes my problem.  My bug 
> symptom is my previously working MythTV lirc blaster no longer
> reliably sends IR signals.  Using irsend to test I can see irsend is
> just timing out (and only sometimes blasts, usually the first
> attempt).  On good kernels it returns immediately after blasting.
> 
> This little patch (at bottom of email) that puts the code back in
> place and gets rid of the function call fixes the problem for me.  I
> applied this patch to the very latest FC21
> kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is
> gone!  I can once again MythTV.  Hooray.
> 
> I suspect no one else is seeing this because fewer people are running 
> 32-bit now, and perhaps in most code paths the value of the u64 never
> gets above 2^63.  I suspect something in drivers/media (possibly) is
> passing very high or negative values (possibly another bug) to these
> calls.
> 
> Obviously my patch isn't the real solution, the real solution is to
> make the new function calls use a consistent 64-bit type, or figure
> out what in my code path is calling these functions and check it for
> value sanity.
> 
> I've documented the whole process / details of this bug in RHBZ:
> https://bugzilla.redhat.com/show_bug.cgi?id=1200353
> 
> Thanks!
> 
> diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h
> --- a/include/linux/ktime.h   2015-02-08 20:54:22.0 -0600
> +++ b/include/linux/ktime.h   2015-03-23 01:09:43.0 -0500
> @@ -173,12 +173,16 @@
>  
>  static inline s64 ktime_to_us(const ktime_t kt)
>  {
> - return ktime_divns(kt, NSEC_PER_USEC);
> +/*   return ktime_divns(kt, NSEC_PER_USEC); */
> + struct timeval tv = ktime_to_timeval(kt);
> + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
>  }
>  
>  static inline s64 ktime_to_ms(const ktime_t kt)
>  {
> - return ktime_divns(kt, NSEC_PER_MSEC);
> +/*   return ktime_divns(kt, NSEC_PER_MSEC); */
> + struct timeval tv = ktime_to_timeval(kt);
> + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec /
> USEC_PER_MSEC; }
>  
>  static inline s64 ktime_us_delta(const ktime_t later, const ktime_t
> earlier)



memory bug ever since 3.12, oom-killer invoked, computer freezes

2014-07-08 Thread Trevor Cordes
Excuse a novice on his first post to this list.  I have tried to obtain
help elsewhere with no success.

I have been dealing with a bad kernel bug since 3.12 came out.  It is
present in 3.12, 3.13 and 3.14 up to 3.14.8 (Fedora 19 kernel).

What happens is that around the same time every day, on the buggy
kernels, I get dozens of oom-killer messages over about 3-5 minutes;
the system slows to a crawl instantly and usually freezes (numlock no
longer works, etc) within a few minutes.

Using 3.11, the system runs fine, there is no bug.

I think I have isolated the trigger of the problem to a simple
backup-helper script I run nightly at the same time.  I have come to
this conclusion based on the fact I can run in 3.14 for many days with
no problems if I disable my script from running.  As soon as I enable
the script, the bug will hit the subsequent morning at the same time as
usual.  Again, in 3.11 there is no bug even if my script is running.

I have made a RH bugzilla bug for this that contains even more detail:
https://bugzilla.redhat.com/show_bug.cgi?id=1075185

My script looks like this (simplified):
#!/bin/perl
$dirs="/ /mnt/peecee/DATA";
$Ddest="/data/Bak/FindList";
# One shell command: list both trees at low CPU and I/O priority.
system "/bin/nice -n19 /usr/bin/ionice -c2 -n7 -t find $dirs -xdev -ls"
     . " 2>/dev/null > $Ddest/find-list";

Notes: /mnt/peecee is a cifs share (old XP box).  $Ddest is an NFS
mount on my file server.

This script runs in about 1 min when nothing is cached, about 10s when
everything is cached.

I can run this script 200 times over and over again manually for
testing (not via the usual cron) and it does NOT trigger the bug.  It
is only when I enable this script via cron that the bug occurs.

I have captured key /proc files at moments in time before/during the
bug occurring, which may help figure out the problem.  I have attached
those files to the bugzilla linked above.  I can post them here if
required.  I can obtain more/finer results if required.  I can
reproduce this bug "sort of on demand" by enabling my script to run the
following morning.

Known buggy kernels:
3.14.8-100.fc19
3.14.4-100.fc19
3.13.9-100.fc19
3.13.5-103.fc19
3.12.9-201.fc19

Known good kernel:
3.11.10-200.fc19

My kernels are all 32-bit, PAE.

My / is md RAID1.  The disks are 15k UW-SCSI enterprise drives.  The
controller is an Adaptec AIC-7892A U160/m, a 29160 card I believe.  My
kernel is usually tainted by the Nvidia binary video driver, but I can
untaint it for purposes of testing.

I wanted to bisect to help figure this out but cannot do so with the
Fedora tools due to a bug in the 32-bit python libraries.  I don't know
how to bisect the vanilla kernel whilst still incorporating all the
Fedora tweaks without using the Fedora tools.

I did much googling and discovered this thread which sounds very much
related to my problem, though not an exact duplicate:
http://marc.info/?l=linux-mm&m=139267140606805&w=2

