Re: reproducible panic eviction work queue
On 7/22/2015 at 4:14 PM, Nikolay Aleksandrov wrote:

On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:

On 07/22/2015 03:58 PM, Florian Westphal wrote:

Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

Hi,
It looks like it's happening because of the evict_again logic. I think we should also add Florian's first suggestion about simplifying it to the patch and just skip the entry if we can't delete its timer; otherwise we can restart the eviction, see entries that already had their timer stopped by us, and keep restarting for a long time. Here's an updated patch that removes the evict_again logic.

Thanks Nik. I'm afraid this adds a bug when a netns is exiting. Currently, we wait until the timer has finished, but after the change we might destroy the percpu counter while a timer is still executing on another CPU.

I pushed a patch series to https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu counter subtraction until after the queue has been freed.

Frank -- it would be great if you could test with the four patches in that series applied. I'll then add your Tested-by tag to all of them before submitting this. Thanks again for all your help in getting this fixed!

Sure, I didn't think it through, just supplied it for the test. :-) Thanks for fixing it up!

Patches look great; even the INET_FRAG_EVICTED flag will not be accidentally cleared this way. I'll give them a try.

Hi,
I'm currently building a new kernel based on 3.18.19 + patches. One of the patches, however, fails to apply, as we don't have a net/ieee802154/6lowpan/ directory. Modifying the patch to use net/ieee802154/reassembly.c does work without problems. Is this due to the different kernel version or something else? I'll come back to you as soon as I have my first test results.

Thanks,
Frank

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: reproducible panic eviction work queue
I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

On 7/22/2015 at 10:09 AM, Frank Schreuder wrote:

On 7/21/2015 at 8:34 PM, Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug. This patch should fix it. We cannot share one list for the buckets and the evictor: the flag member is subject to race conditions, so the "flags & INET_FRAG_EVICTED" test is not reliable. It would be great if you could confirm that this fixes the problem for you; we'll then make a formal patch submission. Please apply this on a kernel without the previous test patches. Whether you use an affected -stable or net-next kernel shouldn't matter, since those are similar enough. Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
        spinlock_t              lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
        __u8                    flags;
        u16                     max_size;
        struct netns_frags      *net;
+       struct hlist_node       list_evictor;
 };
 
 #define INETFRAGS_HASHSZ        1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
                }
 
                fq->flags |= INET_FRAG_EVICTED;
-               hlist_del(&fq->list);
-               hlist_add_head(&fq->list, &expired);
+               hlist_add_head(&fq->list_evictor, &expired);
                ++evicted;
        }
 
        spin_unlock(&hb->chain_lock);
 
-       hlist_for_each_entry_safe(fq, n, &expired, list)
+       hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
                f->frag_expire((unsigned long) fq);
 
        return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
        struct inet_frag_bucket *hb;
 
        hb = get_frag_bucket_locked(fq, f);
-       if (!(fq->flags & INET_FRAG_EVICTED))
-               hlist_del(&fq->list);
+       hlist_del(&fq->list);
        spin_unlock(&hb->chain_lock);
 }

Hi Florian,

Thanks for the patch! After implementing the patch in our setup we are no longer able to reproduce the kernel panic. Unfortunately the server load increases after 5-10 minutes and the logs are getting spammed with stacktraces. I included a snippet below. Do you have any insights on why this happens, and how we can resolve this?
Thanks,
Frank

Jul 22 09:44:17 dommy0 kernel: [ 360.121516] Modules linked in: parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core evdev button acpi_power_meter processor thermal_sys ext4 crc16 mbcache jbd2 sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid crc32c_intel ata_piix mptsas scsi_transport_sas mptscsih libata mptbase ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe dca ptp bnx2 pps_core mdio
Jul 22 09:44:17 dommy0 kernel: [ 360.121560] CPU: 3 PID: 42 Comm: kworker/3:1 Tainted: GWL 3.18.18-transip-1.6 #1
Jul 22 09:44:17 dommy0 kernel: [ 360.121562] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:44:17 dommy0 kernel: [ 360.121567] Workqueue: events inet_frag_worker
Jul 22 09:44:17 dommy0 kernel: [ 360.121568] task: 880224574490 ti: 8802240a task.ti: 8802240a
Jul 22 09:44:17 dommy0 kernel: [ 360.121570] RIP: 0010:[810c0872] [810c0872] del_timer_sync+0x42/0x60
Jul 22 09:44:17 dommy0 kernel: [ 360.121575] RSP: 0018:8802240a3d48 EFLAGS: 0246
Jul 22 09:44:17 dommy0 kernel: [ 360.121576] RAX: 0200 RBX: RCX:
Jul 22 09:44:17 dommy0 kernel: [ 360.121578] RDX: 88022215ce40 RSI: 0030 RDI: 88022215cdf0
Jul 22 09:44:17 dommy0 kernel: [ 360.121579] RBP: 0003 R08: 880222343c00 R09: 0101
Jul 22 09:44:17 dommy0 kernel: [ 360.121581] R10: R11: 0027 R12: 880222343c00
Jul 22 09:44:17 dommy0 kernel: [ 360.121582] R13: 0101 R14: R15: 0027
Jul 22 09:44:17 dommy0 kernel: [ 360.121584] FS: () GS:88022f26() knlGS:
Jul 22 09:44:17
Re: reproducible panic eviction work queue
On 7/21/2015 at 8:34 PM, Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug. This patch should fix it. We cannot share one list for the buckets and the evictor: the flag member is subject to race conditions, so the "flags & INET_FRAG_EVICTED" test is not reliable. It would be great if you could confirm that this fixes the problem for you; we'll then make a formal patch submission. Please apply this on a kernel without the previous test patches. Whether you use an affected -stable or net-next kernel shouldn't matter, since those are similar enough. Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
        spinlock_t              lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
        __u8                    flags;
        u16                     max_size;
        struct netns_frags      *net;
+       struct hlist_node       list_evictor;
 };
 
 #define INETFRAGS_HASHSZ        1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
                }
 
                fq->flags |= INET_FRAG_EVICTED;
-               hlist_del(&fq->list);
-               hlist_add_head(&fq->list, &expired);
+               hlist_add_head(&fq->list_evictor, &expired);
                ++evicted;
        }
 
        spin_unlock(&hb->chain_lock);
 
-       hlist_for_each_entry_safe(fq, n, &expired, list)
+       hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
                f->frag_expire((unsigned long) fq);
 
        return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
        struct inet_frag_bucket *hb;
 
        hb = get_frag_bucket_locked(fq, f);
-       if (!(fq->flags & INET_FRAG_EVICTED))
-               hlist_del(&fq->list);
+       hlist_del(&fq->list);
        spin_unlock(&hb->chain_lock);
 }

Hi Florian,

Thanks for the patch!
After implementing the patch in our setup we are no longer able to reproduce the kernel panic. Unfortunately the server load increases after 5-10 minutes and the logs are getting spammed with stacktraces. I included a snippet below. Do you have any insights on why this happens, and how we can resolve this?

Thanks,
Frank

Jul 22 09:44:17 dommy0 kernel: [ 360.121516] Modules linked in: parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core evdev button acpi_power_meter processor thermal_sys ext4 crc16 mbcache jbd2 sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid crc32c_intel ata_piix mptsas scsi_transport_sas mptscsih libata mptbase ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe dca ptp bnx2 pps_core mdio
Jul 22 09:44:17 dommy0 kernel: [ 360.121560] CPU: 3 PID: 42 Comm: kworker/3:1 Tainted: GWL 3.18.18-transip-1.6 #1
Jul 22 09:44:17 dommy0 kernel: [ 360.121562] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:44:17 dommy0 kernel: [ 360.121567] Workqueue: events inet_frag_worker
Jul 22 09:44:17 dommy0 kernel: [ 360.121568] task: 880224574490 ti: 8802240a task.ti: 8802240a
Jul 22 09:44:17 dommy0 kernel: [ 360.121570] RIP: 0010:[810c0872] [810c0872] del_timer_sync+0x42/0x60
Jul 22 09:44:17 dommy0 kernel: [ 360.121575] RSP: 0018:8802240a3d48 EFLAGS: 0246
Jul 22 09:44:17 dommy0 kernel: [ 360.121576] RAX: 0200 RBX: RCX:
Jul 22 09:44:17 dommy0 kernel: [ 360.121578] RDX: 88022215ce40 RSI: 0030 RDI: 88022215cdf0
Jul 22 09:44:17 dommy0 kernel: [ 360.121579] RBP: 0003 R08: 880222343c00 R09: 0101
Jul 22 09:44:17 dommy0 kernel: [ 360.121581] R10: R11: 0027 R12: 880222343c00
Jul 22 09:44:17 dommy0 kernel: [ 360.121582] R13: 0101 R14: R15: 0027
Jul 22 09:44:17 dommy0 kernel: [ 360.121584] FS: () GS:88022f26() knlGS:
Jul 22 09:44:17 dommy0 kernel: [ 360.121585] CS: 0010 DS: ES: CR0: 8005003b
Jul 22 09:44:17 dommy0 kernel: [ 360.121587] CR2: 7fb1e9884095 CR3: 00021c084000 CR4: 07e0
Jul 22 09:44:17 dommy0 kernel: [ 360.121588] Stack:
Jul 22 09:44:17 dommy0 kernel: [
Re: reproducible panic eviction work queue
On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

Hi,
It looks like it's happening because of the evict_again logic. I think we should also add Florian's first suggestion about simplifying it to the patch and just skip the entry if we can't delete its timer; otherwise we can restart the eviction, see entries that already had their timer stopped by us, and keep restarting for a long time. Here's an updated patch that removes the evict_again logic.

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index e1300b3dd597..56a3a5685f76 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
        spinlock_t              lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
        __u8                    flags;
        u16                     max_size;
        struct netns_frags      *net;
+       struct hlist_node       list_evictor;
 };
 
 #define INETFRAGS_HASHSZ        1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..aaae37949c14 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -138,27 +138,17 @@ evict_again:
                if (!inet_fragq_should_evict(fq))
                        continue;
 
-               if (!del_timer(&fq->timer)) {
-                       /* q expiring right now thus increment its refcount so
-                        * it won't be freed under us and wait until the timer
-                        * has finished executing then destroy it
-                        */
-                       atomic_inc(&fq->refcnt);
-                       spin_unlock(&hb->chain_lock);
-                       del_timer_sync(&fq->timer);
-                       inet_frag_put(fq, f);
-                       goto evict_again;
-               }
+               if (!del_timer(&fq->timer))
+                       continue;
 
                fq->flags |= INET_FRAG_EVICTED;
-               hlist_del(&fq->list);
-               hlist_add_head(&fq->list, &expired);
+               hlist_add_head(&fq->list_evictor, &expired);
                ++evicted;
        }
 
        spin_unlock(&hb->chain_lock);
 
-       hlist_for_each_entry_safe(fq, n, &expired, list)
+       hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
                f->frag_expire((unsigned long) fq);
 
        return evicted;
@@ -284,8 +274,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
        struct inet_frag_bucket *hb;
 
        hb = get_frag_bucket_locked(fq, f);
-       if (!(fq->flags & INET_FRAG_EVICTED))
-               hlist_del(&fq->list);
+       hlist_del(&fq->list);
        spin_unlock(&hb->chain_lock);
 }
Re: reproducible panic eviction work queue
Hi Nikolay,

Thanks for this patch. I'm no longer able to reproduce this panic on our test environment! The server has been handling 120k fragmented UDP packets per second for over 40 minutes. So far everything is running stable, without stacktraces in the logs. All other panics happened within 5-10 minutes. I will let this test environment run for another day or two. I will inform you as soon as something happens!

Thanks,
Frank

On 7/22/2015 at 11:11 AM, Nikolay Aleksandrov wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

Hi,
It looks like it's happening because of the evict_again logic. I think we should also add Florian's first suggestion about simplifying it to the patch and just skip the entry if we can't delete its timer; otherwise we can restart the eviction, see entries that already had their timer stopped by us, and keep restarting for a long time. Here's an updated patch that removes the evict_again logic.

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index e1300b3dd597..56a3a5685f76 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
        spinlock_t              lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
        __u8                    flags;
        u16                     max_size;
        struct netns_frags      *net;
+       struct hlist_node       list_evictor;
 };
 
 #define INETFRAGS_HASHSZ        1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..aaae37949c14 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -138,27 +138,17 @@ evict_again:
                if (!inet_fragq_should_evict(fq))
                        continue;
 
-               if (!del_timer(&fq->timer)) {
-                       /* q expiring right now thus increment its refcount so
-                        * it won't be freed under us and wait until the timer
-                        * has finished executing then destroy it
-                        */
-                       atomic_inc(&fq->refcnt);
-                       spin_unlock(&hb->chain_lock);
-                       del_timer_sync(&fq->timer);
-                       inet_frag_put(fq, f);
-                       goto evict_again;
-               }
+               if (!del_timer(&fq->timer))
+                       continue;
 
                fq->flags |= INET_FRAG_EVICTED;
-               hlist_del(&fq->list);
-               hlist_add_head(&fq->list, &expired);
+               hlist_add_head(&fq->list_evictor, &expired);
                ++evicted;
        }
 
        spin_unlock(&hb->chain_lock);
 
-       hlist_for_each_entry_safe(fq, n, &expired, list)
+       hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
                f->frag_expire((unsigned long) fq);
 
        return evicted;
@@ -284,8 +274,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
        struct inet_frag_bucket *hb;
 
        hb = get_frag_bucket_locked(fq, f);
-       if (!(fq->flags & INET_FRAG_EVICTED))
-               hlist_del(&fq->list);
+       hlist_del(&fq->list);
        spin_unlock(&hb->chain_lock);
 }

--
TransIP BV
Schipholweg 11E
2316XB Leiden
E: fschreu...@transip.nl
I: https://www.transip.nl
Re: reproducible panic eviction work queue
Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

Hi,
It looks like it's happening because of the evict_again logic. I think we should also add Florian's first suggestion about simplifying it to the patch and just skip the entry if we can't delete its timer; otherwise we can restart the eviction, see entries that already had their timer stopped by us, and keep restarting for a long time. Here's an updated patch that removes the evict_again logic.

Thanks Nik. I'm afraid this adds a bug when a netns is exiting. Currently, we wait until the timer has finished, but after the change we might destroy the percpu counter while a timer is still executing on another CPU.

I pushed a patch series to https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu counter subtraction until after the queue has been freed.

Frank -- it would be great if you could test with the four patches in that series applied. I'll then add your Tested-by tag to all of them before submitting this. Thanks again for all your help in getting this fixed!
Re: reproducible panic eviction work queue
On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:

On 07/22/2015 03:58 PM, Florian Westphal wrote:

Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

Hi,
It looks like it's happening because of the evict_again logic. I think we should also add Florian's first suggestion about simplifying it to the patch and just skip the entry if we can't delete its timer; otherwise we can restart the eviction, see entries that already had their timer stopped by us, and keep restarting for a long time. Here's an updated patch that removes the evict_again logic.

Thanks Nik. I'm afraid this adds a bug when a netns is exiting. Currently, we wait until the timer has finished, but after the change we might destroy the percpu counter while a timer is still executing on another CPU.

I pushed a patch series to https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu counter subtraction until after the queue has been freed.

Frank -- it would be great if you could test with the four patches in that series applied. I'll then add your Tested-by tag to all of them before submitting this. Thanks again for all your help in getting this fixed!

Sure, I didn't think it through, just supplied it for the test. :-) Thanks for fixing it up!

Patches look great; even the INET_FRAG_EVICTED flag will not be accidentally cleared this way. I'll give them a try.
Re: reproducible panic eviction work queue
On 07/22/2015 03:58 PM, Florian Westphal wrote:

Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank

Hi,
It looks like it's happening because of the evict_again logic. I think we should also add Florian's first suggestion about simplifying it to the patch and just skip the entry if we can't delete its timer; otherwise we can restart the eviction, see entries that already had their timer stopped by us, and keep restarting for a long time. Here's an updated patch that removes the evict_again logic.

Thanks Nik. I'm afraid this adds a bug when a netns is exiting. Currently, we wait until the timer has finished, but after the change we might destroy the percpu counter while a timer is still executing on another CPU.

I pushed a patch series to https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu counter subtraction until after the queue has been freed.

Frank -- it would be great if you could test with the four patches in that series applied. I'll then add your Tested-by tag to all of them before submitting this. Thanks again for all your help in getting this fixed!

Sure, I didn't think it through, just supplied it for the test. :-) Thanks for fixing it up!
Re: reproducible panic eviction work queue
Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug. This patch should fix it. We cannot share one list for the buckets and the evictor: the flag member is subject to race conditions, so the "flags & INET_FRAG_EVICTED" test is not reliable. It would be great if you could confirm that this fixes the problem for you; we'll then make a formal patch submission. Please apply this on a kernel without the previous test patches. Whether you use an affected -stable or net-next kernel shouldn't matter, since those are similar enough. Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
        spinlock_t              lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
        __u8                    flags;
        u16                     max_size;
        struct netns_frags      *net;
+       struct hlist_node       list_evictor;
 };
 
 #define INETFRAGS_HASHSZ        1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
                }
 
                fq->flags |= INET_FRAG_EVICTED;
-               hlist_del(&fq->list);
-               hlist_add_head(&fq->list, &expired);
+               hlist_add_head(&fq->list_evictor, &expired);
                ++evicted;
        }
 
        spin_unlock(&hb->chain_lock);
 
-       hlist_for_each_entry_safe(fq, n, &expired, list)
+       hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
                f->frag_expire((unsigned long) fq);
 
        return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
        struct inet_frag_bucket *hb;
 
        hb = get_frag_bucket_locked(fq, f);
-       if (!(fq->flags & INET_FRAG_EVICTED))
-               hlist_del(&fq->list);
+       hlist_del(&fq->list);
        spin_unlock(&hb->chain_lock);
 }
Re: reproducible panic eviction work queue
On 7/20/2015 04:30 PM, Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote:

On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thanks for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also, could you test with a clean current kernel from Linus' tree or Dave's -net? Will do.

These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we've managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won't have access to our test setup till Monday again, so I'll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave's -net tree

Just one of these two would be enough. I couldn't reproduce it here, but I don't have as many machines to test right now and had to improvise with VMs. :-)

I'll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
- Johan

Great, thank you!

I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Yes, although I admit it's clutching at straws. The problem is that I don't see how we can race with the timer, but OTOH I don't see why this needs to play refcnt tricks if we can just skip the entry completely ... The other issue is parallel completion on another cpu, but I don't see how we could trip there either. Do you always get this one crash backtrace from the evictor wq? I'll set up a bigger test machine soon and will also try to reproduce this.
Thanks for reporting!

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
        unsigned int evicted = 0;
        HLIST_HEAD(expired);
 
-evict_again:
        spin_lock(&hb->chain_lock);
 
        hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
                if (!inet_fragq_should_evict(fq))
                        continue;
 
-               if (!del_timer(&fq->timer)) {
-                       /* q expiring right now thus increment its refcount so
-                        * it won't be freed under us and wait until the timer
-                        * has finished executing then destroy it
-                        */
-                       atomic_inc(&fq->refcnt);
-                       spin_unlock(&hb->chain_lock);
-                       del_timer_sync(&fq->timer);
-                       inet_frag_put(fq, f);
-                       goto evict_again;
-               }
+               if (!del_timer(&fq->timer))
+                       continue;
 
                fq->flags |= INET_FRAG_EVICTED;
                hlist_del(&fq->list);
@@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
        int i;
 
        nf->low_thresh = 0;
-       local_bh_disable();
 
 evict_again:
+       local_bh_disable();
        seq = read_seqbegin(&f->rnd_seqlock);
 
        for (i = 0; i < INETFRAGS_HASHSZ ; i++)
                inet_evict_bucket(f, &f->hash[i]);
 
-       if (read_seqretry(&f->rnd_seqlock, seq))
-               goto evict_again;
-
        local_bh_enable();
+       cond_resched();
+
+       if (read_seqretry(&f->rnd_seqlock, seq) ||
+           percpu_counter_sum(&nf->mem))
+               goto evict_again;
 
        percpu_counter_destroy(&nf->mem);
 }
@@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
        hb = get_frag_bucket_locked(fq, f);
        if (!(fq->flags & INET_FRAG_EVICTED))
                hlist_del(&fq->list);
+
+       fq->flags |= INET_FRAG_COMPLETE;
        spin_unlock(&hb->chain_lock);
 }
 
@@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
        if (!(fq->flags & INET_FRAG_COMPLETE)) {
                fq_unlink(fq, f);
                atomic_dec(&fq->refcnt);
-               fq->flags |= INET_FRAG_COMPLETE;
        }
 }
 EXPORT_SYMBOL(inet_frag_kill);

Thanks a lot for your time and the patch. Unfortunately, we are still able to reproduce the panic on kernel 3.18.18 with this patch included.
From all previous tests, the same backtrace occurs. If there is any way we can provide you with more debug information, please let me know.

Thanks a lot,
Frank
Re: reproducible panic eviction work queue
On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote:

On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thanks for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also, could you test with a clean current kernel from Linus' tree or Dave's -net? Will do.

These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we've managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won't have access to our test setup till Monday again, so I'll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave's -net tree

Just one of these two would be enough. I couldn't reproduce it here, but I don't have as many machines to test right now and had to improvise with VMs. :-)

I'll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
- Johan

Great, thank you!

I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Thanks,
Frank

--
TransIP BV
Schipholweg 11E
2316XB Leiden
E: fschreu...@transip.nl
I: https://www.transip.nl
Re: reproducible panic eviction work queue
Frank Schreuder fschreu...@transip.nl wrote:

On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote:

On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thanks for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also, could you test with a clean current kernel from Linus' tree or Dave's -net? Will do.

These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we've managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won't have access to our test setup till Monday again, so I'll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave's -net tree

Just one of these two would be enough. I couldn't reproduce it here, but I don't have as many machines to test right now and had to improvise with VMs. :-)

I'll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)
- Johan

Great, thank you!

I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Yes, although I admit it's clutching at straws. The problem is that I don't see how we can race with the timer, but OTOH I don't see why this needs to play refcnt tricks if we can just skip the entry completely ... The other issue is parallel completion on another cpu, but I don't see how we could trip there either. Do you always get this one crash backtrace from the evictor wq? I'll set up a bigger test machine soon and will also try to reproduce this. Thanks for reporting!
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 	unsigned int evicted = 0;
 	HLIST_HEAD(expired);
 
-evict_again:
 	spin_lock(&hb->chain_lock);
 
 	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
 		hlist_del(&fq->list);
@@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	int i;
 
 	nf->low_thresh = 0;
-	local_bh_disable();
 
 evict_again:
+	local_bh_disable();
 	seq = read_seqbegin(&f->rnd_seqlock);
 
 	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
 		inet_evict_bucket(f, &f->hash[i]);
 
-	if (read_seqretry(&f->rnd_seqlock, seq))
-		goto evict_again;
-
 	local_bh_enable();
+	cond_resched();
+
+	if (read_seqretry(&f->rnd_seqlock, seq) ||
+	    percpu_counter_sum(&nf->mem))
+		goto evict_again;
 
 	percpu_counter_destroy(&nf->mem);
 }
@@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	hb = get_frag_bucket_locked(fq, f);
 	if (!(fq->flags & INET_FRAG_EVICTED))
 		hlist_del(&fq->list);
+
+	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
@@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq_unlink(fq, f);
 		atomic_dec(&fq->refcnt);
-		fq->flags |= INET_FRAG_COMPLETE;
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at
http://vger.kernel.org/majordomo-info.html
Re: reproducable panic eviction work queue
On 07/20/2015 02:47 PM, Frank Schreuder wrote:

On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote:
On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thx for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also could you test with a clean current kernel from Linus' tree or Dave's -net? Will do. These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won’t have access to our test setup till Monday again, so I’ll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave’s -net tree

Just one of these two would be enough. I couldn't reproduce it here but I don't have as many machines to test right now and had to improvise with VMs. :-)

I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)

- Johan

Great, thank you!

I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Thanks,
Frank

Unfortunately I was wrong about my theory because I mixed up qp and qp_in: the new frag doesn't make the chain list if that codepath is hit, so it couldn't mix the flags. I'm still trying (unsuccessfully) to reproduce this, I've tried with up to 4 cores and 4 different pinned irqs but no luck so far. Anyway, I'll keep looking into this and will let you know if I get anywhere.
Re: reproducable panic eviction work queue
Yes, we already found these and they are included in our kernel, but even with these patches we still receive the panic.

- Johan

On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:
On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:

Hey guys,

We’re currently running into a reproducible panic in the eviction work queue code when we pin all our eth* IRQs to different CPU cores (in order to scale our networking performance for our virtual servers). This only occurs in kernels >= 3.17 and is a result of the following change:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24

The race/panic we see seems to be the same as, or similar to:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29

We can confirm that this is directly exposed by the IRQ pinning, since disabling it stops us from being able to reproduce this case :)

How to reproduce: in our test setup we have 4 machines generating UDP packets which are sent to the vulnerable host. These all have an MTU of 100 (for test purposes) and send UDP packets of a size of 256 bytes.
Within half an hour you will see the following panic:

crash> bt
PID: 56  TASK: 885f3d9fc210  CPU: 9  COMMAND: "kworker/9:0"
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
    [exception RIP: inet_evict_bucket+281]
    RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
    RAX: 885f3da03d08  RBX: dead001000a8  RCX: 885f3da03d08
    RDX: 0006  RSI: 885f3da03ce8  RDI: dead001000a8
    RBP: 0002  R8: 0286  R9: 88302f401640
    R10: 8000  R11: 88602ec0c138  R12: 81a8d8c0
    R13: 885f3da03d70  R14:  R15: 881d6efe1a00
    ORIG_RAX:  CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c

We would love to receive your input on this matter.

Thx in advance,
- Johan

Check commits
65ba1f1ec0eff1c25933468e1d238201c0c2cb29
d70127e8a942364de8dd140fe73893efda363293

Also please send your mails in text format, not html, and CC netdev (I did here)
Re: reproducable panic eviction work queue
On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:

Hey guys,

We’re currently running into a reproducible panic in the eviction work queue code when we pin all our eth* IRQs to different CPU cores (in order to scale our networking performance for our virtual servers). This only occurs in kernels >= 3.17 and is a result of the following change:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24

The race/panic we see seems to be the same as, or similar to:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29

We can confirm that this is directly exposed by the IRQ pinning, since disabling it stops us from being able to reproduce this case :)

How to reproduce: in our test setup we have 4 machines generating UDP packets which are sent to the vulnerable host. These all have an MTU of 100 (for test purposes) and send UDP packets of a size of 256 bytes. Within half an hour you will see the following panic:

crash> bt
PID: 56  TASK: 885f3d9fc210  CPU: 9  COMMAND: "kworker/9:0"
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
    [exception RIP: inet_evict_bucket+281]
    RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
    RAX: 885f3da03d08  RBX: dead001000a8  RCX: 885f3da03d08
    RDX: 0006  RSI: 885f3da03ce8  RDI: dead001000a8
    RBP: 0002  R8: 0286  R9: 88302f401640
    R10: 8000  R11: 88602ec0c138  R12: 81a8d8c0
    R13: 885f3da03d70  R14:  R15: 881d6efe1a00
    ORIG_RAX:  CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c

We would love to receive your input on this matter.
Thx in advance,
- Johan

Check commits
65ba1f1ec0eff1c25933468e1d238201c0c2cb29
d70127e8a942364de8dd140fe73893efda363293

Also please send your mails in text format, not html, and CC netdev (I did here)
Re: reproducable panic eviction work queue
On 07/18/2015 12:02 PM, Nikolay Aleksandrov wrote:
On 07/18/2015 11:01 AM, Johan Schuijt wrote:

Yes, we already found these and they are included in our kernel, but even with these patches we still receive the panic.

- Johan

On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:
On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:

Hey guys,

We’re currently running into a reproducible panic in the eviction work queue code when we pin all our eth* IRQs to different CPU cores (in order to scale our networking performance for our virtual servers). This only occurs in kernels >= 3.17 and is a result of the following change:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24

The race/panic we see seems to be the same as, or similar to:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29

We can confirm that this is directly exposed by the IRQ pinning, since disabling it stops us from being able to reproduce this case :)

How to reproduce: in our test setup we have 4 machines generating UDP packets which are sent to the vulnerable host. These all have an MTU of 100 (for test purposes) and send UDP packets of a size of 256 bytes. Within half an hour you will see the following panic:

crash> bt
PID: 56  TASK: 885f3d9fc210  CPU: 9  COMMAND: "kworker/9:0"
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
    [exception RIP: inet_evict_bucket+281]
    RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
    RAX: 885f3da03d08  RBX: dead001000a8  RCX: 885f3da03d08
    RDX: 0006  RSI: 885f3da03ce8  RDI: dead001000a8
    RBP: 0002  R8: 0286  R9: 88302f401640
    R10: 8000  R11: 88602ec0c138  R12: 81a8d8c0
    R13: 885f3da03d70  R14:  R15: 881d6efe1a00
    ORIG_RAX:  CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c

We would love to receive your input on this matter.

Thx in advance,
- Johan

Check commits
65ba1f1ec0eff1c25933468e1d238201c0c2cb29
d70127e8a942364de8dd140fe73893efda363293

Also please send your mails in text format, not html, and CC netdev (I did here)

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Also could you test with a clean current kernel from Linus' tree or Dave's -net? These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

The flags seem to be modified while still linked and we may get the following (theoretical) situation:

CPU 1                                  CPU 2
inet_frag_evictor
  (wait for chainlock)                 spin_lock(chainlock)
                                       unlock(chainlock)
  get lock, set EVICT flag,
  hlist_del etc.
                                       change flags again while qp
                                       is in the evict list

So could you please try the following patch which sets the flag while holding the chain lock:

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..2521ed9c1b52 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -354,8 +354,8 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			spin_unlock(&hb->chain_lock);
 			qp_in->flags |= INET_FRAG_COMPLETE;
+			spin_unlock(&hb->chain_lock);
 			inet_frag_put(qp_in, f);
 			return qp;
 		}
Re: reproducable panic eviction work queue
On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thx for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also could you test with a clean current kernel from Linus' tree or Dave's -net? Will do. These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won’t have access to our test setup till Monday again, so I’ll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave’s -net tree

Just one of these two would be enough. I couldn't reproduce it here but I don't have as many machines to test right now and had to improvise with VMs. :-)

I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)

- Johan

Great, thank you!
Re: reproducable panic eviction work queue
With attachment this time, also not sure whether this is what you were referring to, so let me know if anything else is needed!

- Johan

On 18 Jul 2015, at 17:28, Johan Schuijt-Li jo...@transip.nl wrote:

Thx for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also could you test with a clean current kernel from Linus' tree or Dave's -net? Will do. These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave’s -net tree

I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know!
(Some instructions might be needed)

- Johan

[28732.285611] general protection fault: [#1] SMP
[28732.285665] Modules linked in: vhost_net vhost macvtap macvlan act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack xt_physdev br_netfilter ebt_arp ebt_ip6 ebt_ip ebtable_nat tun rpcsec_gss_krb5 nfsv4 dns_resolver ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_filter ip6_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc bridge 8021q garp mrp stp llc bonding xt_CT xt_DSCP iptable_mangle ipt_REJECT nf_reject_ipv4 xt_pkttype xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner iptable_filter iptable_raw ip_tables x_tables loop joydev hid_generic usbhid hid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ttm crct10dif_pclmul crc32_pclmul
[28732.286421] ghash_clmulni_intel aesni_intel drm_kms_helper drm i2c_algo_bit aes_x86_64 lrw gf128mul dcdbas ipmi_si i2c_core evdev glue_helper ablk_helper tpm_tis mei_me tpm ehci_pci ehci_hcd mei cryptd usbcore iTCO_wdt iTCO_vendor_support ipmi_msghandler lpc_ich mfd_core wmi pcspkr usb_common shpchp sb_edac edac_core acpi_power_meter acpi_pad button processor thermal_sys ext4 crc16 mbcache jbd2 dm_mod sg sd_mod ahci libahci bnx2x libata ptp pps_core mdio crc32c_generic megaraid_sas crc32c_intel scsi_mod libcrc32c
[28732.286955] CPU: 9 PID: 56 Comm: kworker/9:0 Not tainted 3.18.7-transip-2.0 #1
[28732.287023] Hardware name: Dell Inc. PowerEdge M620/0VHRN7, BIOS 2.5.2 02/03/2015
[28732.287096] Workqueue: events inet_frag_worker
[28732.287139] task: 885f3d9fc210 ti: 885f3da0 task.ti: 885f3da0
[28732.287205] RIP: 0010:[81480699] [81480699] inet_evict_bucket+0x119/0x180
[28732.287278] RSP: 0018:885f3da03d58 EFLAGS: 00010292
[28732.287318] RAX: 885f3da03d08 RBX: dead001000a8 RCX: 885f3da03d08
[28732.287362] RDX: 0006 RSI: 885f3da03ce8 RDI: dead001000a8
[28732.287406] RBP: 0002 R08: 0286 R09: 88302f401640
[28732.287450] R10: 8000 R11: 88602ec0c138 R12: 81a8d8c0
[28732.287494] R13: 885f3da03d70 R14: R15: 881d6efe1a00
[28732.287538] FS: () GS:88602f28() knlGS:
[28732.287606] CS: 0010 DS: ES: CR0: 80050033
[28732.287647] CR2: 00b11000 CR3: 004f05b24000 CR4: 000427e0
[28732.287691] Stack:
[28732.287722]  81a905e0 81a905e8 814f4599 881d6efe1a58
[28732.287807]  0246 002e 81a8d8c0 81a918c0
[28732.287891]  02d3 0019 0240 8148075a
[28732.287975] Call Trace:
[28732.288013]  [814f4599] ? _raw_spin_unlock_irqrestore+0x9/0x10
[28732.288056]  [8148075a] ? inet_frag_worker+0x5a/0x250
[28732.288103]  [8107be19] ? process_one_work+0x149/0x3f0
[28732.288146]  [8107c6e3] ? worker_thread+0x63/0x490
[28732.288187]  [8107c680] ? rescuer_thread+0x290/0x290
[28732.288229]  [8108103e] ? kthread+0xce/0xf0
[28732.288269]  [81080f70] ? kthread_create_on_node+0x180/0x180
[28732.288313]  [814f4d7c] ? ret_from_fork+0x7c/0xb0
[28732.288353]  [81080f70] ? kthread_create_on_node+0x180/0x180
[28732.288396] Code: 8b 04 24 66 83 40 08 01 48 8b 7c 24 18 48 85 ff 74 2a 48 83 ef 58 75 13 eb 22 0f 1f 84 00 00 00 00 00 48 83 eb 58 48 89 df 74 11 48 8b 5f 58 41 ff 94 24 70 40 00 00 48 85 db 75 e6 48 83 c4 28
[28732.288827] RIP [81480699] inet_evict_bucket+0x119/0x180
[28732.288873] RSP 885f3da03d58
Re: reproducable panic eviction work queue
Thx for looking into this!

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Of course, please see attached file.

Also could you test with a clean current kernel from Linus' tree or Dave's -net? Will do. These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?

This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2, 4, 8 or 20 cores. I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenarios then:
- Your patch
- Linus' tree
- Dave’s -net tree

I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed)

- Johan
Re: reproducable panic eviction work queue
On 07/18/2015 11:01 AM, Johan Schuijt wrote:

Yes, we already found these and they are included in our kernel, but even with these patches we still receive the panic.

- Johan

On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:
On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:

Hey guys,

We’re currently running into a reproducible panic in the eviction work queue code when we pin all our eth* IRQs to different CPU cores (in order to scale our networking performance for our virtual servers). This only occurs in kernels >= 3.17 and is a result of the following change:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24

The race/panic we see seems to be the same as, or similar to:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29

We can confirm that this is directly exposed by the IRQ pinning, since disabling it stops us from being able to reproduce this case :)

How to reproduce: in our test setup we have 4 machines generating UDP packets which are sent to the vulnerable host. These all have an MTU of 100 (for test purposes) and send UDP packets of a size of 256 bytes.
Within half an hour you will see the following panic:

crash> bt
PID: 56  TASK: 885f3d9fc210  CPU: 9  COMMAND: "kworker/9:0"
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
    [exception RIP: inet_evict_bucket+281]
    RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
    RAX: 885f3da03d08  RBX: dead001000a8  RCX: 885f3da03d08
    RDX: 0006  RSI: 885f3da03ce8  RDI: dead001000a8
    RBP: 0002  R8: 0286  R9: 88302f401640
    R10: 8000  R11: 88602ec0c138  R12: 81a8d8c0
    R13: 885f3da03d70  R14:  R15: 881d6efe1a00
    ORIG_RAX:  CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c

We would love to receive your input on this matter.

Thx in advance,
- Johan

Check commits
65ba1f1ec0eff1c25933468e1d238201c0c2cb29
d70127e8a942364de8dd140fe73893efda363293

Also please send your mails in text format, not html, and CC netdev (I did here)

Thank you for the report, I will try to reproduce this locally. Could you please post the full crash log?

Also could you test with a clean current kernel from Linus' tree or Dave's -net? These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you actively use for receive?