Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder


On 7/22/2015 at 4:14 PM, Nikolay Aleksandrov wrote:

On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:

On 07/22/2015 03:58 PM, Florian Westphal wrote:

Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - 
CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected 
stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank



Hi,
It looks like it's happening because of the evict_again logic. I think we
should also add Florian's first suggestion about simplifying it to the patch
and just skip the entry if we can't delete its timer; otherwise we can
restart the eviction, see entries whose timers we already stopped, and keep
restarting for a long time.
Here's an updated patch that removes the evict_again logic.

Thanks Nik.  I'm afraid this adds a bug when the netns is exiting.

Currently we wait until the timer has finished, but after the change
we might destroy the percpu counter while a timer is still executing on
another cpu.

I pushed a patch series to
https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu
counter subtraction until after the queue has been freed.

Frank -- it would be great if you could test with the four patches in
that series applied.

I'll then add your Tested-by tag to all of them before submitting this.

Thanks again for all your help in getting this fixed!


Sure, I didn't think it through, just supplied it for the test. :-)
Thanks for fixing it up!


Patches look great, even the INET_FRAG_EVICTED flag will not be accidentally 
cleared
this way. I'll give them a try.




Hi,

I'm currently building a new kernel based on 3.18.19 + patches.
One of the patches, however, fails to apply as we don't have a
net/ieee802154/6lowpan/ directory.
Modifying the patch to use net/ieee802154/reassembly.c does work
without problems.

Is this due to the different kernel version or something else?

I'll come back to you as soon as I have my first test results.

Thanks,
Frank



Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft 
lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched 
self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)


Thanks,
Frank


On 7/22/2015 at 10:09 AM, Frank Schreuder wrote:



On 7/21/2015 at 8:34 PM, Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug.  This patch should fix it.

We cannot share the list for buckets and the evictor; the flag member is
subject to race conditions, so the flags & INET_FRAG_EVICTED test is not
reliable.

It would be great if you could confirm that this fixes the problem
for you; we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches. Whether you
use an affected -stable or net-next kernel shouldn't matter, since those are
similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
 		}
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }

Hi Florian,

Thanks for the patch!

After implementing the patch in our setup we are no longer able to
reproduce the kernel panic.
Unfortunately the server load increases after 5-10 minutes and the
logs are getting spammed with stack traces.

I included a snippet below.

Do you have any insights on why this happens, and how we can resolve 
this?


Thanks,
Frank


Jul 22 09:44:17 dommy0 kernel: [  360.121516] Modules linked in: 
parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core 
evdev button acpi_power_meter processor thermal_sys ext4 crc16 mbcache 
jbd2 sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid 
crc32c_intel ata_piix mptsas scsi_transport_sas mptscsih libata 
mptbase ehci_pci scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe 
dca ptp bnx2 pps_core mdio
Jul 22 09:44:17 dommy0 kernel: [  360.121560] CPU: 3 PID: 42 Comm: 
kworker/3:1 Tainted: GWL 3.18.18-transip-1.6 #1
Jul 22 09:44:17 dommy0 kernel: [  360.121562] Hardware name: Dell Inc. 
PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:44:17 dommy0 kernel: [  360.121567] Workqueue: events 
inet_frag_worker
Jul 22 09:44:17 dommy0 kernel: [  360.121568] task: 880224574490 
ti: 8802240a task.ti: 8802240a
Jul 22 09:44:17 dommy0 kernel: [  360.121570] RIP: 
0010:[810c0872]  [810c0872] del_timer_sync+0x42/0x60
Jul 22 09:44:17 dommy0 kernel: [  360.121575] RSP: 
0018:8802240a3d48  EFLAGS: 0246
Jul 22 09:44:17 dommy0 kernel: [  360.121576] RAX: 0200 
RBX:  RCX: 
Jul 22 09:44:17 dommy0 kernel: [  360.121578] RDX: 88022215ce40 
RSI: 0030 RDI: 88022215cdf0
Jul 22 09:44:17 dommy0 kernel: [  360.121579] RBP: 0003 
R08: 880222343c00 R09: 0101
Jul 22 09:44:17 dommy0 kernel: [  360.121581] R10:  
R11: 0027 R12: 880222343c00
Jul 22 09:44:17 dommy0 kernel: [  360.121582] R13: 0101 
R14:  R15: 0027
Jul 22 09:44:17 dommy0 kernel: [  360.121584] FS: 
() GS:88022f26() knlGS:
Jul 22 09:44:17 

Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder



On 7/21/2015 at 8:34 PM, Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug.  This patch should fix it.

We cannot share the list for buckets and the evictor; the flag member is
subject to race conditions, so the flags & INET_FRAG_EVICTED test is not
reliable.

It would be great if you could confirm that this fixes the problem
for you; we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches. Whether you
use an affected -stable or net-next kernel shouldn't matter, since those are
similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
 		}
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
  

Hi Florian,

Thanks for the patch!

After implementing the patch in our setup we are no longer able to
reproduce the kernel panic.
Unfortunately the server load increases after 5-10 minutes and the
logs are getting spammed with stack traces.

I included a snippet below.

Do you have any insights on why this happens, and how we can resolve this?

Thanks,
Frank


Jul 22 09:44:17 dommy0 kernel: [  360.121516] Modules linked in: 
parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill uinput nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop 
coretemp kvm ttm drm_kms_helper iTCO_wdt drm psmouse ipmi_si 
iTCO_vendor_support tpm_tis tpm ipmi_msghandler i2c_algo_bit i2c_core 
i7core_edac dcdbas serio_raw pcspkr wmi lpc_ich edac_core mfd_core evdev 
button acpi_power_meter processor thermal_sys ext4 crc16 mbcache jbd2 
sd_mod sg sr_mod cdrom hid_generic usbhid ata_generic hid crc32c_intel 
ata_piix mptsas scsi_transport_sas mptscsih libata mptbase ehci_pci 
scsi_mod uhci_hcd ehci_hcd usbcore usb_common ixgbe dca ptp bnx2 
pps_core mdio
Jul 22 09:44:17 dommy0 kernel: [  360.121560] CPU: 3 PID: 42 Comm: 
kworker/3:1 Tainted: GWL 3.18.18-transip-1.6 #1
Jul 22 09:44:17 dommy0 kernel: [  360.121562] Hardware name: Dell Inc. 
PowerEdge R410/01V648, BIOS 1.12.0 07/30/2013
Jul 22 09:44:17 dommy0 kernel: [  360.121567] Workqueue: events 
inet_frag_worker
Jul 22 09:44:17 dommy0 kernel: [  360.121568] task: 880224574490 ti: 
8802240a task.ti: 8802240a
Jul 22 09:44:17 dommy0 kernel: [  360.121570] RIP: 
0010:[810c0872]  [810c0872] del_timer_sync+0x42/0x60
Jul 22 09:44:17 dommy0 kernel: [  360.121575] RSP: 
0018:8802240a3d48  EFLAGS: 0246
Jul 22 09:44:17 dommy0 kernel: [  360.121576] RAX: 0200 RBX: 
 RCX: 
Jul 22 09:44:17 dommy0 kernel: [  360.121578] RDX: 88022215ce40 RSI: 
0030 RDI: 88022215cdf0
Jul 22 09:44:17 dommy0 kernel: [  360.121579] RBP: 0003 R08: 
880222343c00 R09: 0101
Jul 22 09:44:17 dommy0 kernel: [  360.121581] R10:  R11: 
0027 R12: 880222343c00
Jul 22 09:44:17 dommy0 kernel: [  360.121582] R13: 0101 R14: 
 R15: 0027
Jul 22 09:44:17 dommy0 kernel: [  360.121584] FS: () 
GS:88022f26() knlGS:
Jul 22 09:44:17 dommy0 kernel: [  360.121585] CS:  0010 DS:  ES: 
 CR0: 8005003b
Jul 22 09:44:17 dommy0 kernel: [  360.121587] CR2: 7fb1e9884095 CR3: 
00021c084000 CR4: 07e0

Jul 22 09:44:17 dommy0 kernel: [  360.121588] Stack:
Jul 22 09:44:17 dommy0 kernel: [  

Re: reproducable panic eviction work queue

2015-07-22 Thread Nikolay Aleksandrov
On 07/22/2015 10:17 AM, Frank Schreuder wrote:
 I got some additional information from syslog:
 
 Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup 
 - CPU#3 stuck for 22s! [kworker/3:1:42]
 Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected 
 stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
 
 Thanks,
 Frank
 
 

Hi,
It looks like it's happening because of the evict_again logic. I think we
should also add Florian's first suggestion about simplifying it to the patch
and just skip the entry if we can't delete its timer; otherwise we can
restart the eviction, see entries whose timers we already stopped, and keep
restarting for a long time.
Here's an updated patch that removes the evict_again logic.


diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index e1300b3dd597..56a3a5685f76 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..aaae37949c14 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -138,27 +138,17 @@ evict_again:
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +274,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 




Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder

Hi Nikolay,

Thanks for this patch. I'm no longer able to reproduce this panic in our
test environment!
The server has been handling 120k fragmented UDP packets per second for
over 40 minutes.
So far everything is running stable, without stack traces in the logs. All
other panics happened within 5-10 minutes.


I will let this test environment run for another day or 2. I will inform 
you as soon as something happens!


Thanks,
Frank



On 7/22/2015 at 11:11 AM, Nikolay Aleksandrov wrote:

On 07/22/2015 10:17 AM, Frank Schreuder wrote:

I got some additional information from syslog:

Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - 
CPU#3 stuck for 22s! [kworker/3:1:42]
Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected 
stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)

Thanks,
Frank



Hi,
It looks like it's happening because of the evict_again logic. I think we
should also add Florian's first suggestion about simplifying it to the patch
and just skip the entry if we can't delete its timer; otherwise we can
restart the eviction, see entries whose timers we already stopped, and keep
restarting for a long time.
Here's an updated patch that removes the evict_again logic.


diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index e1300b3dd597..56a3a5685f76 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..aaae37949c14 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -138,27 +138,17 @@ evict_again:
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +274,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
  





--

TransIP BV

Schipholweg 11E
2316XB Leiden
E: fschreu...@transip.nl
I: https://www.transip.nl



Re: reproducable panic eviction work queue

2015-07-22 Thread Florian Westphal
Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:
 On 07/22/2015 10:17 AM, Frank Schreuder wrote:
  I got some additional information from syslog:
  
  Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft 
  lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
  Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected 
  stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
  
  Thanks,
  Frank
  
  
 
 Hi,
 It looks like it's happening because of the evict_again logic. I think we
 should also add Florian's first suggestion about simplifying it to the patch
 and just skip the entry if we can't delete its timer; otherwise we can
 restart the eviction, see entries whose timers we already stopped, and keep
 restarting for a long time.
 Here's an updated patch that removes the evict_again logic.

Thanks Nik.  I'm afraid this adds a bug when the netns is exiting.

Currently we wait until the timer has finished, but after the change
we might destroy the percpu counter while a timer is still executing on
another cpu.

I pushed a patch series to
https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

It includes this patch with a small change -- deferral of the percpu
counter subtraction until after the queue has been freed.
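
(For illustration only: the sketch below is not the actual patch from that
series, and frag_queue_destroy() is a hypothetical, simplified stand-in for
the 3.18-era queue destruction path; the only point it shows is where the
percpu counter subtraction ends up relative to freeing the queue.)

static void frag_queue_destroy(struct inet_frag_queue *q, struct inet_frags *f)
{
	struct netns_frags *nf = q->net;
	struct sk_buff *fp = q->fragments;
	unsigned int sum_truesize = 0;

	/* free all fragments still sitting on the queue */
	while (fp) {
		struct sk_buff *next = fp->next;

		sum_truesize += fp->truesize;
		kfree_skb(fp);
		fp = next;
	}

	if (f->destructor)
		f->destructor(q);
	kfree(q);

	/* Deferred accounting: subtract from the percpu counter only once the
	 * queue is really gone, so inet_frags_exit_net() can keep looping
	 * while percpu_counter_sum(&nf->mem) != 0 instead of destroying the
	 * counter while a timer may still be running on another cpu.
	 */
	percpu_counter_add(&nf->mem, -(s64)(sum_truesize + f->qsize));
}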

Frank -- it would be great if you could test with the four patches in
that series applied.

I'll then add your Tested-by tag to all of them before submitting this.

Thanks again for all your help in getting this fixed!


Re: reproducable panic eviction work queue

2015-07-22 Thread Nikolay Aleksandrov
On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:
 On 07/22/2015 03:58 PM, Florian Westphal wrote:
 Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:
 On 07/22/2015 10:17 AM, Frank Schreuder wrote:
 I got some additional information from syslog:

 Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft 
 lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
 Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched 
 self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)

 Thanks,
 Frank



 Hi,
 It looks like it's happening because of the evict_again logic. I think we
 should also add Florian's first suggestion about simplifying it to the patch
 and just skip the entry if we can't delete its timer; otherwise we can
 restart the eviction, see entries whose timers we already stopped, and keep
 restarting for a long time.
 Here's an updated patch that removes the evict_again logic.

 Thanks Nik.  I'm afraid this adds a bug when the netns is exiting.

 Currently we wait until the timer has finished, but after the change
 we might destroy the percpu counter while a timer is still executing on
 another cpu.

 I pushed a patch series to
 https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02

 It includes this patch with a small change -- deferral of the percpu
 counter subtraction until after the queue has been freed.

 Frank -- it would be great if you could test with the four patches in
 that series applied.

 I'll then add your Tested-by tag to all of them before submitting this.

 Thanks again for all your help in getting this fixed!

 
 Sure, I didn't think it through, just supplied it for the test. :-)
 Thanks for fixing it up!
 

Patches look great, even the INET_FRAG_EVICTED flag will not be accidentally 
cleared 
this way. I'll give them a try.




Re: reproducable panic eviction work queue

2015-07-22 Thread Nikolay Aleksandrov
On 07/22/2015 03:58 PM, Florian Westphal wrote:
 Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:
 On 07/22/2015 10:17 AM, Frank Schreuder wrote:
 I got some additional information from syslog:

 Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft 
 lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
 Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected 
 stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)

 Thanks,
 Frank



 Hi,
 It looks like it's happening because of the evict_again logic. I think we
 should also add Florian's first suggestion about simplifying it to the patch
 and just skip the entry if we can't delete its timer; otherwise we can
 restart the eviction, see entries whose timers we already stopped, and keep
 restarting for a long time.
 Here's an updated patch that removes the evict_again logic.
 
 Thanks Nik.  I'm afraid this adds a bug when the netns is exiting.

 Currently we wait until the timer has finished, but after the change
 we might destroy the percpu counter while a timer is still executing on
 another cpu.
 
 I pushed a patch series to
 https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02
 
 It includes this patch with a small change -- deferral of the percpu
 counter subtraction until after the queue has been freed.
 
 Frank -- it would be great if you could test with the four patches in
 that series applied.
 
 I'll then add your Tested-by tag to all of them before submitting this.
 
 Thanks again for all your help in getting this fixed!
 

Sure, I didn't think it through, just supplied it for the test. :-)
Thanks for fixing it up!






Re: reproducable panic eviction work queue

2015-07-21 Thread Florian Westphal
Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug.  This patch should fix it.

We cannot share the list for buckets and the evictor; the flag member is
subject to race conditions, so the flags & INET_FRAG_EVICTED test is not
reliable.

It would be great if you could confirm that this fixes the problem
for you; we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches. Whether you
use an affected -stable or net-next kernel shouldn't matter, since those are
similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
 		}
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 


Re: reproducable panic eviction work queue

2015-07-21 Thread Frank Schreuder



On 7/20/2015 04:30 PM Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:

On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thx for your looking into this!


Thank you for the report, I will try to reproduce this locally
Could you please post the full crash log ?

Of course, please see attached file.


Also could you test
with a clean current kernel from Linus' tree or Dave's -net ?

Will do.


These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question how many IRQs do you pin i.e. how many cores
do you actively use for receive ?

This varies a bit across our systems, but we’ve managed to reproduce this with 
IRQs pinned on as many as 2,4,8 or 20 cores.

I won’t have access to our test-setup till Monday again, so I’ll be testing 3 
scenario’s then:
- Your patch

-

- Linux tree
- Dave’s -net tree

Just one of these two would be enough. I couldn't reproduce it here but
I don't have as many machines to test right now and had to improvise with VMs. 
:-)


I’ll make sure to keep you posted on all the results then. We have a kernel 
dump of the panic, so if you need me to extract any data from there just let me 
know! (Some instructions might be needed)

- Johan


Great, thank you!


I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Yes, although I admit it's clutching at straws.

The problem is that I don't see how we can race with the timer, but OTOH
I don't see why this needs to play refcnt tricks if we can just skip
the entry completely ...

The other issue is parallel completion on another cpu, but I don't
see how we could trip there either.

Do you always get this one crash backtrace from the evictor wq?

I'll set up a bigger test machine soon and will also try to reproduce
this.

Thanks for reporting!

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 	unsigned int evicted = 0;
 	HLIST_HEAD(expired);
 
-evict_again:
 	spin_lock(&hb->chain_lock);
 
 	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
 		hlist_del(&fq->list);
@@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	int i;
 
 	nf->low_thresh = 0;
-	local_bh_disable();
 
 evict_again:
+	local_bh_disable();
 	seq = read_seqbegin(&f->rnd_seqlock);
 
 	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
 		inet_evict_bucket(f, &f->hash[i]);
 
-	if (read_seqretry(&f->rnd_seqlock, seq))
-		goto evict_again;
-
 	local_bh_enable();
+	cond_resched();
+
+	if (read_seqretry(&f->rnd_seqlock, seq) ||
+	    percpu_counter_sum(&nf->mem))
+		goto evict_again;
 
 	percpu_counter_destroy(&nf->mem);
 }
@@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	hb = get_frag_bucket_locked(fq, f);
 	if (!(fq->flags & INET_FRAG_EVICTED))
 		hlist_del(&fq->list);
+
+	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
@@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq_unlink(fq, f);
 		atomic_dec(&fq->refcnt);
-		fq->flags |= INET_FRAG_COMPLETE;
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
Thanks a lot for your time and the patch. Unfortunately we are still
able to reproduce the panic on kernel 3.18.18 with this patch included.
As in all previous tests, the same backtrace occurs. If there is any way
we can provide you with more debug information, please let me know.


Thanks a lot,
Frank



Re: reproducable panic eviction work queue

2015-07-20 Thread Frank Schreuder


On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:

On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thx for your looking into this!


Thank you for the report, I will try to reproduce this locally
Could you please post the full crash log ?

Of course, please see attached file.


Also could you test
with a clean current kernel from Linus' tree or Dave's -net ?

Will do.


These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question how many IRQs do you pin i.e. how many cores
do you actively use for receive ?

This varies a bit across our systems, but we’ve managed to reproduce this with 
IRQs pinned on as many as 2,4,8 or 20 cores.

I won’t have access to our test-setup till Monday again, so I’ll be testing 3 
scenario’s then:
- Your patch

-

- Linux tree
- Dave’s -net tree

Just one of these two would be enough. I couldn't reproduce it here but
I don't have as many machines to test right now and had to improvise with VMs. 
:-)


I’ll make sure to keep you posted on all the results then. We have a kernel 
dump of the panic, so if you need me to extract any data from there just let me 
know! (Some instructions might be needed)

- Johan


Great, thank you!


I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Thanks,
Frank

--

TransIP BV

Schipholweg 11E
2316XB Leiden
E: fschreu...@transip.nl
I: https://www.transip.nl



Re: reproducable panic eviction work queue

2015-07-20 Thread Florian Westphal
Frank Schreuder fschreu...@transip.nl wrote:
 
 On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:
 On 07/18/2015 05:28 PM, Johan Schuijt wrote:
 Thx for your looking into this!
 
 Thank you for the report, I will try to reproduce this locally
 Could you please post the full crash log ?
 Of course, please see attached file.
 
 Also could you test
 with a clean current kernel from Linus' tree or Dave's -net ?
 Will do.
 
 These are available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
 respectively.
 
 One last question how many IRQs do you pin i.e. how many cores
 do you actively use for receive ?
 This varies a bit across our systems, but we’ve managed to reproduce this 
 with IRQs pinned on as many as 2,4,8 or 20 cores.
 
 I won’t have access to our test-setup till Monday again, so I’ll be testing 
 3 scenario’s then:
 - Your patch
 -
 - Linux tree
 - Dave’s -net tree
 Just one of these two would be enough. I couldn't reproduce it here but
 I don't have as many machines to test right now and had to improvise with 
 VMs. :-)
 
 I’ll make sure to keep you posted on all the results then. We have a kernel 
 dump of the panic, so if you need me to extract any data from there just 
 let me know! (Some instructions might be needed)
 
 - Johan
 
 Great, thank you!
 
 I'm able to reproduce this panic on the following kernel builds:
 - 3.18.7
 - 3.18.18
 - 3.18.18 + patch from Nikolay Aleksandrov
 - 4.1.0
 
 Would you happen to have any more suggestions we can try?

Yes, although I admit it's clutching at straws.

The problem is that I don't see how we can race with the timer, but OTOH
I don't see why this needs to play refcnt tricks if we can just skip
the entry completely ...

The other issue is parallel completion on another cpu, but I don't
see how we could trip there either.

Do you always get this one crash backtrace from the evictor wq?

I'll set up a bigger test machine soon and will also try to reproduce
this.

Thanks for reporting!

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 	unsigned int evicted = 0;
 	HLIST_HEAD(expired);
 
-evict_again:
 	spin_lock(&hb->chain_lock);
 
 	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
 		hlist_del(&fq->list);
@@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	int i;
 
 	nf->low_thresh = 0;
-	local_bh_disable();
 
 evict_again:
+	local_bh_disable();
 	seq = read_seqbegin(&f->rnd_seqlock);
 
 	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
 		inet_evict_bucket(f, &f->hash[i]);
 
-	if (read_seqretry(&f->rnd_seqlock, seq))
-		goto evict_again;
-
 	local_bh_enable();
+	cond_resched();
+
+	if (read_seqretry(&f->rnd_seqlock, seq) ||
+	    percpu_counter_sum(&nf->mem))
+		goto evict_again;
 
 	percpu_counter_destroy(&nf->mem);
 }
@@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	hb = get_frag_bucket_locked(fq, f);
 	if (!(fq->flags & INET_FRAG_EVICTED))
 		hlist_del(&fq->list);
+
+	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
@@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq_unlink(fq, f);
 		atomic_dec(&fq->refcnt);
-		fq->flags |= INET_FRAG_COMPLETE;
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);


Re: reproducable panic eviction work queue

2015-07-20 Thread Nikolay Aleksandrov
On 07/20/2015 02:47 PM, Frank Schreuder wrote:
 
 On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:
 On 07/18/2015 05:28 PM, Johan Schuijt wrote:
 Thx for your looking into this!

 Thank you for the report, I will try to reproduce this locally
 Could you please post the full crash log ?
 Of course, please see attached file.

 Also could you test
 with a clean current kernel from Linus' tree or Dave's -net ?
 Will do.

 These are available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
 respectively.

 One last question how many IRQs do you pin i.e. how many cores
 do you actively use for receive ?
 This varies a bit across our systems, but we’ve managed to reproduce this 
 with IRQs pinned on as many as 2,4,8 or 20 cores.

 I won’t have access to our test-setup till Monday again, so I’ll be testing 
 3 scenario’s then:
 - Your patch
 -
 - Linux tree
 - Dave’s -net tree
 Just one of these two would be enough. I couldn't reproduce it here but
 I don't have as many machines to test right now and had to improvise with 
 VMs. :-)

 I’ll make sure to keep you posted on all the results then. We have a kernel 
 dump of the panic, so if you need me to extract any data from there just 
 let me know! (Some instructions might be needed)

 - Johan

 Great, thank you!

 I'm able to reproduce this panic on the following kernel builds:
 - 3.18.7
 - 3.18.18
 - 3.18.18 + patch from Nikolay Aleksandrov
 - 4.1.0
 
 Would you happen to have any more suggestions we can try?
 
 Thanks,
 Frank
 

Unfortunately I was wrong about my theory because I mixed up qp and qp_in;
the new frag doesn't make it onto the chain list if that code path is hit,
so it couldn't mix the flags.
I'm still trying (unsuccessfully) to reproduce this; I've tried with up to
4 cores and 4 different pinned IRQs but no luck so far.
Anyway, I'll keep looking into this and will let you know if I get anywhere.



Re: reproducable panic eviction work queue

2015-07-18 Thread Johan Schuijt
Yes, we already found these and they are included in our kernel, but even with
these patches we still receive the panic.

- Johan


 On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:
 
 On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote:
 Hey guys, 
 
 
 We’re currently running into a reproducible panic in the eviction work
 queue code when we pin all our eth* IRQs to different CPU cores (in
 order to scale our networking performance for our virtual servers).
 This only occurs in kernels >= 3.17 and is a result of the following
 change:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
 
 
 The race/panic we see seems to be the same as, or similar to:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
 
 
 We can confirm that this is directly exposed by the IRQ pinning since
 disabling this stops us from being able to reproduce this case :)
 
 
 How to reproduce: in our test setup we have 4 machines generating UDP
 packets which are sent to the vulnerable host. These all have an MTU of
 100 (for test purposes) and send UDP packets of a size of 256 bytes.
 Within half an hour you will see the following panic:
 
 
 crash> bt
 PID: 56 TASK: 885f3d9fc210  CPU: 9   COMMAND: kworker/9:0
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
[exception RIP: inet_evict_bucket+281]
RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
RAX: 885f3da03d08  RBX: dead001000a8  RCX:
 885f3da03d08
RDX: 0006  RSI: 885f3da03ce8  RDI:
 dead001000a8
RBP: 0002   R8: 0286   R9:
 88302f401640
R10: 8000  R11: 88602ec0c138  R12:
 81a8d8c0
R13: 885f3da03d70  R14:   R15:
 881d6efe1a00
ORIG_RAX:   CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c
 
 
 We would love to receive your input on this matter.
 
 
 Thx in advance,
 
 
 - Johan
 
 Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 
 d70127e8a942364de8dd140fe73893efda363293
 
 Also please send your mails in text format, not html, and CC netdev ( I
 did here)
 
 
 
 
 



Re: reproducable panic eviction work queue

2015-07-18 Thread Eric Dumazet
On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote:
 Hey guys, 
 
 
 We’re currently running into a reproducible panic in the eviction work
 queue code when we pin all our eth* IRQs to different CPU cores (in
 order to scale our networking performance for our virtual servers).
 This only occurs in kernels >= 3.17 and is a result of the following
 change:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
 
 
 The race/panic we see seems to be the same as, or similar to:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
 
 
 We can confirm that this is directly exposed by the IRQ pinning since
 disabling this stops us from being able to reproduce this case :)
 
 
 How to reproduce: in our test setup we have 4 machines generating UDP
 packets which are sent to the vulnerable host. These all have an MTU of
 100 (for test purposes) and send UDP packets of a size of 256 bytes.
 Within half an hour you will see the following panic:
 
 
 crash> bt
 PID: 56 TASK: 885f3d9fc210  CPU: 9   COMMAND: kworker/9:0
  #0 [885f3da03b60] machine_kexec at 8104a1f7
  #1 [885f3da03bb0] crash_kexec at 810db187
  #2 [885f3da03c80] oops_end at 81015140
  #3 [885f3da03ca0] general_protection at 814f6c88
 [exception RIP: inet_evict_bucket+281]
 RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
 RAX: 885f3da03d08  RBX: dead001000a8  RCX:
 885f3da03d08
 RDX: 0006  RSI: 885f3da03ce8  RDI:
 dead001000a8
 RBP: 0002   R8: 0286   R9:
 88302f401640
 R10: 8000  R11: 88602ec0c138  R12:
 81a8d8c0
 R13: 885f3da03d70  R14:   R15:
 881d6efe1a00
 ORIG_RAX:   CS: 0010  SS: 0018
  #4 [885f3da03db0] inet_frag_worker at 8148075a
  #5 [885f3da03e10] process_one_work at 8107be19
  #6 [885f3da03e60] worker_thread at 8107c6e3
  #7 [885f3da03ed0] kthread at 8108103e
  #8 [885f3da03f50] ret_from_fork at 814f4d7c
 
 
 We would love to receive your input on this matter.
 
 
 Thx in advance,
 
 
 - Johan

Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 
d70127e8a942364de8dd140fe73893efda363293

Also please send your mails in text format, not html, and CC netdev ( I
did here)
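
(As an aside, a minimal sender that produces the kind of load described in
the report quoted above -- 256-byte UDP payloads toward a host reached over a
small-MTU path, so every datagram arrives fragmented -- might look like the
sketch below. It is an illustration only, not the original test tool; the
target address and port are placeholders.)

/* flood_frags.c: illustration only, not the original test tool.
 * Sends 256-byte UDP datagrams as fast as possible; with a small MTU
 * (the report used MTU 100) each datagram is fragmented on the wire,
 * which is what exercises the inet_frag eviction path on the receiver.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in dst;
	char payload[256];
	int fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);				/* placeholder: discard port */
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);		/* placeholder target */

	memset(payload, 'A', sizeof(payload));

	for (;;) {						/* send as fast as possible */
		if (sendto(fd, payload, sizeof(payload), 0,
			   (struct sockaddr *)&dst, sizeof(dst)) < 0)
			perror("sendto");
	}

	close(fd);						/* not reached */
	return 0;
}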

 
 




Re: reproducable panic eviction work queue

2015-07-18 Thread Nikolay Aleksandrov
On 07/18/2015 12:02 PM, Nikolay Aleksandrov wrote:
 On 07/18/2015 11:01 AM, Johan Schuijt wrote:
  Yes, we already found these and they are included in our kernel, but even with
  these patches we still receive the panic.

 - Johan


 On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:

 On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote:
 Hey guys, 


 We’re currently running into a reproducible panic in the eviction work
 queue code when we pin all our eth* IRQs to different CPU cores (in
 order to scale our networking performance for our virtual servers).
 This only occurs in kernels >= 3.17 and is a result of the following
 change:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24


 The race/panic we see seems to be the same as, or similar to:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=65ba1f1ec0eff1c25933468e1d238201c0c2cb29


 We can confirm that this is directly exposed by the IRQ pinning since
 disabling this stops us from being able to reproduce this case :)


 How to reproduce: in our test setup we have 4 machines generating UDP
 packets which are sent to the vulnerable host. These all have an MTU of
 100 (for test purposes) and send UDP packets of a size of 256 bytes.
 Within half an hour you will see the following panic:


 crash> bt
 PID: 56 TASK: 885f3d9fc210  CPU: 9   COMMAND: kworker/9:0
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
[exception RIP: inet_evict_bucket+281]
RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
RAX: 885f3da03d08  RBX: dead001000a8  RCX:
 885f3da03d08
RDX: 0006  RSI: 885f3da03ce8  RDI:
 dead001000a8
RBP: 0002   R8: 0286   R9:
 88302f401640
R10: 8000  R11: 88602ec0c138  R12:
 81a8d8c0
R13: 885f3da03d70  R14:   R15:
 881d6efe1a00
ORIG_RAX:   CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c


 We would love to receive your input on this matter.


 Thx in advance,


 - Johan

 Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 
 d70127e8a942364de8dd140fe73893efda363293

 Also please send your mails in text format, not html, and CC netdev ( I
 did here)







 
  Thank you for the report, I will try to reproduce this locally.
  Could you please post the full crash log? Also could you test
  with a clean current kernel from Linus' tree or Dave's -net?
 These are available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
 respectively.
 
  One last question: how many IRQs do you pin, i.e. how many cores
  do you actively use for receive?
 

The flags seem to be modified while still linked, and we may get the
following (theoretical) situation:

CPU 1                                        CPU 2
inet_frag_evictor (waits for chainlock)      spin_lock(chainlock)
                                             unlock(chainlock)
gets lock, sets EVICT flag, hlist_del etc.
                                             changes flags again while
                                             qp is in the evict list

So could you please try the following patch which sets the flag while
holding the chain lock:


diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a082e5f..2521ed9c1b52 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -354,8 +354,8 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			spin_unlock(&hb->chain_lock);
 			qp_in->flags |= INET_FRAG_COMPLETE;
+			spin_unlock(&hb->chain_lock);
 			inet_frag_put(qp_in, f);
 			return qp;
 		}


Re: reproducable panic eviction work queue

2015-07-18 Thread Nikolay Aleksandrov
On 07/18/2015 05:28 PM, Johan Schuijt wrote:
 Thx for your looking into this!
 

 Thank you for the report, I will try to reproduce this locally
 Could you please post the full crash log ?
 
 Of course, please see attached file.
 
 Also could you test
 with a clean current kernel from Linus' tree or Dave's -net ?
 
 Will do.
 
 These are available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
 respectively.

 One last question how many IRQs do you pin i.e. how many cores
 do you actively use for receive ?
 
 This varies a bit across our systems, but we’ve managed to reproduce this 
 with IRQs pinned on as many as 2,4,8 or 20 cores.
 
 I won’t have access to our test-setup till Monday again, so I’ll be testing 3 
 scenario’s then:
 - Your patch
-
 - Linux tree
 - Dave’s -net tree
Just one of these two would be enough. I couldn't reproduce it here but
I don't have as many machines to test right now and had to improvise with VMs. 
:-)

 
 I’ll make sure to keep you posted on all the results then. We have a kernel 
 dump of the panic, so if you need me to extract any data from there just let 
 me know! (Some instructions might be needed)
 
 - Johan
 
Great, thank you!



Re: reproducable panic eviction work queue

2015-07-18 Thread Johan Schuijt
With attachment this time; also not sure whether this is what you were
referring to, so let me know if anything else is needed!

- Johan


 On 18 Jul 2015, at 17:28, Johan Schuijt-Li jo...@transip.nl wrote:
 
 Thx for your looking into this!
 
 
 Thank you for the report, I will try to reproduce this locally
 Could you please post the full crash log ?
 
 Of course, please see attached file.
 
 Also could you test
 with a clean current kernel from Linus' tree or Dave's -net ?
 
 Will do.
 
 These are available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
 respectively.
 
 One last question how many IRQs do you pin i.e. how many cores
 do you actively use for receive ?
 
 This varies a bit across our systems, but we’ve managed to reproduce this 
 with IRQs pinned on as many as 2,4,8 or 20 cores.
 
 I won’t have access to our test-setup till Monday again, so I’ll be testing 3 
 scenario’s then:
 - Your patch
 - Linux tree
 - Dave’s -net tree
 
 I’ll make sure to keep you posted on all the results then. We have a kernel 
 dump of the panic, so if you need me to extract any data from there just let 
 me know! (Some instructions might be needed)
 
 - Johan

[28732.285611] general protection fault:  [#1] SMP 
[28732.285665] Modules linked in: vhost_net vhost macvtap macvlan act_police 
cls_u32 sch_ingress cls_fw sch_sfq sch_htb nf_conntrack_ipv6 nf_defrag_ipv6 
xt_conntrack xt_physdev br_netfilter ebt_arp ebt_ip6 ebt_ip ebtable_nat tun 
rpcsec_gss_krb5 nfsv4 dns_resolver ebtable_filter ebtables ip6table_raw 
ip6table_mangle ip6table_filter ip6_tables nfsd auth_rpcgss oid_registry 
nfs_acl nfs lockd grace fscache sunrpc bridge 8021q garp mrp stp llc bonding 
xt_CT xt_DSCP iptable_mangle ipt_REJECT nf_reject_ipv4 xt_pkttype xt_tcpudp 
nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_comment nf_conntrack_ipv4 
nf_defrag_ipv4 xt_state nf_conntrack xt_owner iptable_filter iptable_raw 
ip_tables x_tables loop joydev hid_generic usbhid hid x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm ttm crct10dif_pclmul crc32_pclmul
[28732.286421]  ghash_clmulni_intel aesni_intel drm_kms_helper drm i2c_algo_bit 
aes_x86_64 lrw gf128mul dcdbas ipmi_si i2c_core evdev glue_helper ablk_helper 
tpm_tis mei_me tpm ehci_pci ehci_hcd mei cryptd usbcore iTCO_wdt 
iTCO_vendor_support ipmi_msghandler lpc_ich mfd_core wmi pcspkr usb_common 
shpchp sb_edac edac_core acpi_power_meter acpi_pad button processor thermal_sys 
ext4 crc16 mbcache jbd2 dm_mod sg sd_mod ahci libahci bnx2x libata ptp pps_core 
mdio crc32c_generic megaraid_sas crc32c_intel scsi_mod libcrc32c
[28732.286955] CPU: 9 PID: 56 Comm: kworker/9:0 Not tainted 3.18.7-transip-2.0 
#1
[28732.287023] Hardware name: Dell Inc. PowerEdge M620/0VHRN7, BIOS 2.5.2 
02/03/2015
[28732.287096] Workqueue: events inet_frag_worker
[28732.287139] task: 885f3d9fc210 ti: 885f3da0 task.ti: 
885f3da0
[28732.287205] RIP: 0010:[81480699]  [81480699] 
inet_evict_bucket+0x119/0x180
[28732.287278] RSP: 0018:885f3da03d58  EFLAGS: 00010292
[28732.287318] RAX: 885f3da03d08 RBX: dead001000a8 RCX: 885f3da03d08
[28732.287362] RDX: 0006 RSI: 885f3da03ce8 RDI: dead001000a8
[28732.287406] RBP: 0002 R08: 0286 R09: 88302f401640
[28732.287450] R10: 8000 R11: 88602ec0c138 R12: 81a8d8c0
[28732.287494] R13: 885f3da03d70 R14:  R15: 881d6efe1a00
[28732.287538] FS:  () GS:88602f28() 
knlGS:
[28732.287606] CS:  0010 DS:  ES:  CR0: 80050033
[28732.287647] CR2: 00b11000 CR3: 004f05b24000 CR4: 000427e0
[28732.287691] Stack:
[28732.287722]  81a905e0 81a905e8 814f4599 
881d6efe1a58
[28732.287807]  0246 002e 81a8d8c0 
81a918c0
[28732.287891]  02d3 0019 0240 
8148075a
[28732.287975] Call Trace:
[28732.288013]  [814f4599] ? _raw_spin_unlock_irqrestore+0x9/0x10
[28732.288056]  [8148075a] ? inet_frag_worker+0x5a/0x250
[28732.288103]  [8107be19] ? process_one_work+0x149/0x3f0
[28732.288146]  [8107c6e3] ? worker_thread+0x63/0x490
[28732.288187]  [8107c680] ? rescuer_thread+0x290/0x290
[28732.288229]  [8108103e] ? kthread+0xce/0xf0
[28732.288269]  [81080f70] ? kthread_create_on_node+0x180/0x180
[28732.288313]  [814f4d7c] ? ret_from_fork+0x7c/0xb0
[28732.288353]  [81080f70] ? kthread_create_on_node+0x180/0x180
[28732.288396] Code: 8b 04 24 66 83 40 08 01 48 8b 7c 24 18 48 85 ff 74 2a 48 
83 ef 58 75 13 eb 22 0f 1f 84 00 00 00 00 00 48 83 eb 58 48 89 df 74 11 48 8b 
5f 58 41 ff 94 24 70 40 00 00 48 85 db 75 e6 48 83 c4 28 
[28732.288827] RIP  [81480699] inet_evict_bucket+0x119/0x180
[28732.288873]  RSP 885f3da03d58


Re: reproducable panic eviction work queue

2015-07-18 Thread Johan Schuijt
Thx for your looking into this!

 
 Thank you for the report, I will try to reproduce this locally
 Could you please post the full crash log ?

Of course, please see attached file.

 Also could you test
 with a clean current kernel from Linus' tree or Dave's -net ?

Will do.

 These are available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
 respectively.
 
 One last question how many IRQs do you pin i.e. how many cores
 do you actively use for receive ?

This varies a bit across our systems, but we’ve managed to reproduce this with 
IRQs pinned on as many as 2,4,8 or 20 cores.

I won’t have access to our test setup till Monday again, so I’ll be testing 3
scenarios then:
- Your patch
- Linux tree
- Dave’s -net tree

I’ll make sure to keep you posted on all the results then. We have a kernel 
dump of the panic, so if you need me to extract any data from there just let me 
know! (Some instructions might be needed)

- Johan

Re: reproducable panic eviction work queue

2015-07-18 Thread Nikolay Aleksandrov
On 07/18/2015 11:01 AM, Johan Schuijt wrote:
 Yes, we already found these and they are included in our kernel, but even with
 these patches we still receive the panic.
 
 - Johan
 
 
 On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:

 On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote:
 Hey guys, 


 We’re currently running into a reproducible panic in the eviction work
 queue code when we pin all our eth* IRQs to different CPU cores (in
 order to scale our networking performance for our virtual servers).
 This only occurs in kernels >= 3.17 and is a result of the following
 change:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24


 The race/panic we see seems to be the same as, or similar to:
 https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.yid=65ba1f1ec0eff1c25933468e1d238201c0c2cb29


 We can confirm that this is directly exposed by the IRQ pinning since
 disabling this stops us from being able to reproduce this case :)


 How to reproduce: in our test setup we have 4 machines generating UDP
 packets which are sent to the vulnerable host. These all have an MTU of
 100 (for test purposes) and send UDP packets of a size of 256 bytes.
 Within half an hour you will see the following panic:


 crash> bt
 PID: 56 TASK: 885f3d9fc210  CPU: 9   COMMAND: kworker/9:0
 #0 [885f3da03b60] machine_kexec at 8104a1f7
 #1 [885f3da03bb0] crash_kexec at 810db187
 #2 [885f3da03c80] oops_end at 81015140
 #3 [885f3da03ca0] general_protection at 814f6c88
[exception RIP: inet_evict_bucket+281]
RIP: 81480699  RSP: 885f3da03d58  RFLAGS: 00010292
RAX: 885f3da03d08  RBX: dead001000a8  RCX:
 885f3da03d08
RDX: 0006  RSI: 885f3da03ce8  RDI:
 dead001000a8
RBP: 0002   R8: 0286   R9:
 88302f401640
R10: 8000  R11: 88602ec0c138  R12:
 81a8d8c0
R13: 885f3da03d70  R14:   R15:
 881d6efe1a00
ORIG_RAX:   CS: 0010  SS: 0018
 #4 [885f3da03db0] inet_frag_worker at 8148075a
 #5 [885f3da03e10] process_one_work at 8107be19
 #6 [885f3da03e60] worker_thread at 8107c6e3
 #7 [885f3da03ed0] kthread at 8108103e
 #8 [885f3da03f50] ret_from_fork at 814f4d7c


 We would love to receive your input on this matter.


 Thx in advance,


 - Johan

 Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 
 d70127e8a942364de8dd140fe73893efda363293

 Also please send your mails in text format, not html, and CC netdev ( I
 did here)





 
 

Thank you for the report, I will try to reproduce this locally.
Could you please post the full crash log? Also could you test
with a clean current kernel from Linus' tree or Dave's -net?
These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores
do you actively use for receive?
