Re: [E1000-devel] Memory Corruption with e1000
On 06/05/2013 08:34 PM, Peter LaDow wrote:
> On 6/5/13, Ronciak, John john.ronc...@intel.com wrote:
>> So I have a couple of questions. Does this happen with a non-preemptive
>> kernel? I understand that you probably need to use a preemptive kernel
>> but for testing purposes it would be good to know. We don't always test
>> with preemptive kernels.
>
> Hmmm... If you mean no RT patches, then yes. On a vanilla 3.0.80 kernel.

What about the pre-emption behavior of the kernel? Namely "Processor type and Features -> Preemption Model". Are you using no preemption, or forced preemption?

-PJ

>> When doing the up/down transitions is there system under test? I mean
>> sending and receiving packets? If it is what is the load like? Does
>> changing the load make a difference? Does stopping the network traffic
>> first make a difference in the outcome?
>
> Yes, the load makes a difference. On a silent network (or no link at
> all) this does not occur. Our network is quite busy. It isn't sending
> much (perhaps DHCP discovers and some IPv6 stuff).
>
> Thanks,
> Pete

___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
Re: [E1000-devel] [PATCH] Packet drops/loss with 82579LM - fixed
On 06.06.2013 02:02, Allan, Bruce W wrote:
>> -----Original Message-----
>> From: Allan, Bruce W [mailto:bruce.w.al...@intel.com]
>> Sent: Monday, June 03, 2013 4:28 PM
>> To: Hrvoje Habjanić; e1000-devel@lists.sourceforge.net
>> Subject: Re: [E1000-devel] [PATCH] Packet drops/loss with 82579LM - fixed
>>
>>> -----Original Message-----
>>> From: Hrvoje Habjanić [mailto:hrvoje.habja...@zg.t-com.hr]
>>> Sent: Saturday, June 01, 2013 8:43 AM
>>> To: e1000-devel@lists.sourceforge.net
>>> Subject: Re: [E1000-devel] [PATCH] Packet drops/loss with 82579LM - fixed
>>>
>>> On 01.06.2013 17:06, Hrvoje Habjanić wrote:
>>>> Patch is attached, and it is against 2.3.2 of e1000e driver.
>>>
>>> OK, now patch:
>>>
>>> --- e1000e-2.3.2/src/ich8lan.c	2013-02-28 05:25:36.0 +0100
>>> +++ e1000e-2.3.2.ok/src/ich8lan.c	2013-06-01 12:24:45.391577666 +0200
>>> @@ -2244,6 +2244,8 @@
>>>  		if (ret_val)
>>>  			return ret_val;
>>>  		pm_phy_reg &= ~HV_PM_CTRL_PLL_STOP_IN_K1_GIGA;
>>> +		/* Disable K1 mode when in GIGA mode */
>>> +		pm_phy_reg |= 0x2000;
>>>  		ret_val = e1e_wphy(hw, HV_PM_CTRL, pm_phy_reg);
>>>  		if (ret_val)
>>>  			return ret_val;
>>>
>>> H.
>>
>> There have been a number of issues with K1 on some parts. I will check
>> into whether or not this is already a known issue on 82579LM and/or
>> currently under investigation and get back to you.
>>
>> Thanks,
>> Bruce.
>
> Apparently, this is a known and documented issue. There is a BIOS
> workaround that has been communicated to the various board vendors - you
> should check with your board vendor if there is a BIOS update available
> for your system.

Hi.

I guess this is a known issue, because there is some kind of errata about it. BUT, as you did notice, my motherboard is an Intel DQ67EP, and I do have the latest BIOS installed. And still, my network card is suffering from this bug. So my board vendor does know about the issue but, oddly, they are not able to fix it!??

In any case, I do believe that this will be fixed in BIOSes eventually, BUT until that happens, this fix is necessary.

So, I would ask again that this fix be included in the driver. It need not be enabled by default; being able to enable it via an option would be just fine.

Regards,
H.
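For readers following the register arithmetic in the patch above, here is a minimal sketch of what the two statements do to the 16-bit PHY register value. The numeric value used for HV_PM_CTRL_PLL_STOP_IN_K1_GIGA (0x0100) is an assumption that does not come from this thread, and K1_DISABLE_BIT is a hypothetical name for the patch's bare 0x2000 literal; only the clear-then-set pattern is taken from the diff.

```python
# Assumed value, not stated in the thread; check ich8lan.h before relying on it.
HV_PM_CTRL_PLL_STOP_IN_K1_GIGA = 0x0100
# Hypothetical name for the literal 0x2000 that the patch ORs in.
K1_DISABLE_BIT = 0x2000


def patched_pm_ctrl(pm_phy_reg: int) -> int:
    """Mirror the patched sequence on a 16-bit PHY register value."""
    # Existing driver line: clear the PLL-stop-in-K1 bit.
    pm_phy_reg &= ~HV_PM_CTRL_PLL_STOP_IN_K1_GIGA
    # Line added by the patch: set the bit its comment describes as
    # "Disable K1 mode when in GIGA mode".
    pm_phy_reg |= K1_DISABLE_BIT
    # PHY registers are 16 bits wide, so mask off anything above that.
    return pm_phy_reg & 0xFFFF


if __name__ == "__main__":
    print(hex(patched_pm_ctrl(0x0100)))
```

All other bits of the register pass through unchanged, which is why the patch reads the register first instead of writing a constant.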
Re: [E1000-devel] [PATCH v9 net-next 2/7] net: add low latency socket poll
On 05/06/2013 18:59, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 18:46 +0300, Eliezer Tamir wrote:
>> On 05/06/2013 18:39, Eric Dumazet wrote:
>>> On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote:
>>>> On 05/06/2013 18:21, Eric Dumazet wrote:
>>>>> It would also make sense to give end_time as a parameter, so that the
>>>>> polling() code could really give an end_time for the whole duration of
>>>>> poll().
>>>>>
>>>>> (You then should test can_poll_ll(end_time) _before_ call to
>>>>> ndo_ll_poll())
>>>>
>>>> how would you handle a nonblocking operation in that case?
>>>> I guess if we have a socket option, then we don't need to handle
>>>> non-blocking any differently, since the user specified exactly how
>>>> much time to waste polling. right?
>>>
>>> If the thread already spent 50us in the poll() system call, it for sure
>>> should not call any ndo_ll_poll(). This makes no more sense at this
>>> point.
>>
>> what about a non-blocking read from a socket?
>> Right now we assume this means poll only once since the application will
>> repeat as needed.
>>
>> maybe add a "once" parameter that will cause sk_poll_ll() to ignore end
>> time and only try once?
>
> extern bool __sk_poll_ll(struct sock *sk, cycles_t end);
>
> static inline bool sk_poll_ll(struct sock *sk, bool nonblock)
> {
> 	return __sk_poll_ll(sk, nonblock, ll_end_time());
> }
>
> In the poll() code, we should call ll_end_time() once, even if we poll
> 1000 fds.

Right now we have three uses for sk_poll_ll():

1. blocking read - in this case we loop until:
   a. !skb_queue_empty(&sk->sk_receive_queue), or
   b. !can_poll_ll(end_time)
2. non-blocking read - only try once, ignoring end time.
3. poll/select - for each socket we only try once (nonblock==1); we loop in poll/select until we are lucky or run out of time.

For 1 we want to loop inside sk_poll_ll(), but for 3 we loop in poll/select.

So it seems all we need is for sk_poll_ll() to not call ll_end_time() if nonblock is set (something like cycles_t end_time = nonblock ? 0 : ll_end_time();).

Or we could move the looping out to the calling function in all cases. Does this mean we should push rcu_read_lock_bh() out into the caller as well?

-Eliezer
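The deadline handling discussed above can be modelled outside the kernel. This is a rough sketch in Python, not the kernel code: the function names echo sk_poll_ll() and can_poll_ll() from the thread, but the injectable clock, the microsecond budget, and the poll_many() helper are assumptions introduced purely so the behaviour can be exercised deterministically.

```python
def sk_poll_ll(poll_once, nonblock, now, budget_us=50):
    """Busy-poll one socket; returns (got_data, attempts).

    poll_once: stand-in for one ndo_ll_poll() call; returns True once the
               receive queue is non-empty.
    now:       injectable clock returning positive microseconds (an
               assumption made here so the sketch is testable).
    """
    # Eliezer's suggestion: a non-blocking caller gets a deadline of 0, so
    # the can_poll_ll()-style check below fails right after the first try.
    end_time = 0 if nonblock else now() + budget_us
    attempts = 0
    while True:
        attempts += 1
        if poll_once():
            return True, attempts
        if not now() < end_time:  # i.e. can_poll_ll(end_time) is false
            return False, attempts


def poll_many(pollers, now, budget_us=50):
    """poll()/select() case: one deadline for the whole call.

    Returns the index of the first ready socket, or None on timeout.
    """
    end_time = now() + budget_us  # computed once, however many fds there are
    while True:
        for index, poll_once in enumerate(pollers):
            if poll_once():  # each socket is tried once per pass
                return index
        if not now() < end_time:
            return None
```

In this model the blocking read loops inside sk_poll_ll(), while the poll/select path keeps the loop in the caller and shares a single deadline across all sockets, which are the two placements the message weighs against each other.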
Re: [E1000-devel] Memory Corruption with e1000
On 6/6/13, Peter P Waskiewicz Jr peter.p.waskiewicz...@intel.com wrote:
> What about the pre-emption behavior of the kernel? Namely "Processor
> type and Features -> Preemption Model". Are you using no preemption, or
> forced preemption?

It is PREEMPT_FULL. I'll turn it off and give it a spin.

Thanks,
Pete
Re: [E1000-devel] Memory Corruption with e1000
On Thu, Jun 6, 2013 at 12:30 AM, Peter P Waskiewicz Jr peter.p.waskiewicz...@intel.com wrote:
> What about the pre-emption behavior of the kernel? Namely "Processor
> type and Features -> Preemption Model". Are you using no preemption, or
> forced preemption?

Ok. I've done testing. Yes, we were building with PREEMPT_FULL. I've done some further testing and can re-create the problem on vanilla, non-preempt kernels. See below.

# uname -a
Linux (none) 3.0.80-rt108 #2 Thu Jun 6 16:09:35 UTC 2013 ppc GNU/Linux

And I still get the slab corruption leading up to the kernel panic:

Slab corruption: size-2048 start=ee2b2070, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c0208514](skb_release_data+0xb4/0xc8)
020: 6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00
030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47
040: d9 87 0a f1 0a ea 00 00 00 00 00 00 0a f1 0a ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 09 81 d2 0f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=ee2b2888, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
Slab corruption: size-2048 start=ed401480, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c0208514](skb_release_data+0xb4/0xc8)
020: 6b 6b ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00
030: 45 00 01 3e 3e 1a 00 00 80 11 ca c0 0a ca 0d 42
040: 0a ca 0d ff 00 8a 00 8a 01 2a a5 96 11 0e af 81
050: 0a ca 0d 42 00 8a 01 14 00 00 20 45 42 45 4f 45
060: 45 46 43 45 4c 45 50 45 44 45 49 45 4f 45 43 43
070: 41 43 41 43 41 43 41 43 41 41 41 00 20 46 44 45
Prev obj: start=ed400c68, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
Unable to handle kernel paging request for data at address 0x20454c45
Faulting instruction address: 0xc0062498
Oops: Kernel access of bad area, sig: 11 [#1]
SEL35xx Platform
Modules linked in:
NIP: c0062498 LR: c02084d8 CTR: c000cbbc
REGS: ee85bc60 TRAP: 0300 Not tainted (3.0.80-rt108)
MSR: 9032 EE,ME,IR,DR CR: 24008248 XER:
DAR: 20454c45, DSISR: 2000
TASK = ef3e5830[4616] 'ifconfig' THREAD: ee85a000
GPR00: ee85bd10 ef3e5830 20454c45 2d746baa 05f2 0002
GPR08: c03b14e4 ed7471a8 ee85bcd0 5c26 10087a48 bfe0e41c 10064ae4
GPR16: 10064bc0 bfe0e40c bfe0e3f4 0228 8914 c019a488
GPR24: c019a9cc ed70f4b0 005c ed70f340 ef063120 0001 ee62bd30
NIP [c0062498] put_page+0x0/0x34
LR [c02084d8] skb_release_data+0x78/0xc8
Call Trace:
[ee85bd20] [c020810c] __kfree_skb+0x18/0xbc
[ee85bd30] [c0195734] e1000_clean_rx_ring+0x10c/0x1a4
[ee85bd60] [c01957f4] e1000_clean_all_rx_rings+0x28/0x54
[ee85bd70] [c0198d40] e1000_close+0x30/0xb4
[ee85bd90] [c0212408] __dev_close_many+0xa0/0xe0
[ee85bda0] [c02141a0] __dev_close+0x2c/0x4c
[ee85bdc0] [c0210a58] __dev_change_flags+0xb8/0x140
[ee85bde0] [c0212324] dev_change_flags+0x1c/0x60
[ee85be00] [c0267594] devinet_ioctl+0x2a4/0x700
[ee85be60] [c026839c] inet_ioctl+0xc8/0xfc
[ee85be70] [c02006d4] sock_ioctl+0x260/0x2a0
[ee85be90] [c009145c] vfs_ioctl+0x2c/0x58
[ee85bea0] [c0091bc8] do_vfs_ioctl+0x610/0x698
[ee85bf10] [c0091ca8] sys_ioctl+0x58/0x88
[ee85bf40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
    LR = 0xff359a0
Instruction dump:
419e0018 3c80c006 38630180 38842abc 38a0 4bfffe65 80010014 bbc10008
38210010 7c0803a6 4e800020 4b54 8003 7c691b78 700bc000 41a20008
Kernel panic - not syncing: Fatal exception
Call Trace:
[ee85bb90] [c0007b80] show_stack+0x58/0x154 (unreliable)
[ee85bbd0] [c001c3a8] panic+0xa8/0x1cc
[ee85bc20] [c000b1f0] die+0x178/0x19c
[ee85bc40] [c0011a44] bad_page_fault+0xe8/0xfc
[ee85bc50] [c000eb14] handle_page_fault+0x7c/0x80
--- Exception: 300 at put_page+0x0/0x34
    LR = skb_release_data+0x78/0xc8
[ee85bd10] [] (null) (unreliable)
[ee85bd20] [c020810c] __kfree_skb+0x18/0xbc
[ee85bd30] [c0195734] e1000_clean_rx_ring+0x10c/0x1a4
[ee85bd60] [c01957f4] e1000_clean_all_rx_rings+0x28/0x54
[ee85bd70] [c0198d40] e1000_close+0x30/0xb4
[ee85bd90] [c0212408] __dev_close_many+0xa0/0xe0
[ee85bda0] [c02141a0] __dev_close+0x2c/0x4c
[ee85bdc0] [c0210a58] __dev_change_flags+0xb8/0x140
[ee85bde0] [c0212324] dev_change_flags+0x1c/0x60
[ee85be00] [c0267594] devinet_ioctl+0x2a4/0x700
[ee85be60] [c026839c] inet_ioctl+0xc8/0xfc
[ee85be70] [c02006d4] sock_ioctl+0x260/0x2a0
[ee85be90] [c009145c] vfs_ioctl+0x2c/0x58
[ee85bea0] [c0091bc8] do_vfs_ioctl+0x610/0x698
[ee85bf10] [c0091ca8] sys_ioctl+0x58/0x88
[ee85bf40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
    LR = 0xff359a0

And with a vanilla, no-preempt kernel:

# uname -a
Linux (none) 3.0.80 #5 Thu Jun 6 16:26:15 UTC 2013 ppc GNU/Linux

slab error in
Re: [E1000-devel] Memory Corruption with e1000
On Thu, 6 Jun 2013 09:38:50 -0700 Peter LaDow pet...@gocougs.wsu.edu wrote:
> On Thu, Jun 6, 2013 at 12:30 AM, Peter P Waskiewicz Jr
> peter.p.waskiewicz...@intel.com wrote:
>> What about the pre-emption behavior of the kernel? Namely "Processor
>> type and Features -> Preemption Model". Are you using no preemption, or
>> forced preemption?
>
> Ok. I've done testing. Yes, we were building with PREEMPT_FULL. I've
> done some further testing and can re-create the problem on vanilla,
> non-preempt kernels. See below.
>
> # uname -a
> Linux (none) 3.0.80-rt108 #2 Thu Jun 6 16:09:35 UTC 2013 ppc GNU/Linux
>
> And I still get the slab corruption leading up to the kernel panic:
>
> Slab corruption: size-2048 start=ee2b2070, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [c0208514](skb_release_data+0xb4/0xc8)
> 020: 6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00

That is quite clearly a broadcast; it seems to me it may be a VLAN packet (0x8100), maybe to VLAN 0xf2?

So this means that the receive unit of the e1000 is not being stopped completely (or is restarted by something), but the memory of the DMA buffer (the 2kB allocation) is being freed and then still DMA'd to.

> 030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47
> 040: d9 87 0a f1 0a ea 00 00 00 00 00 00 0a f1 0a ea
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 09 81 d2 0f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Next obj: start=ee2b2888, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Slab corruption: size-2048 start=ed401480, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [c0208514](skb_release_data+0xb4/0xc8)
> 020: 6b 6b ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00
> 030: 45 00 01 3e 3e 1a 00 00 80 11 ca c0 0a ca 0d 42

Same thing here, but this is an IP packet. This is clearly a network adapter putting frames into memory that has been freed.

I will see if someone here can reproduce this issue, but it seems quite clear what is happening; we just need to figure out why.

> 040: 0a ca 0d ff 00 8a 00 8a 01 2a a5 96 11 0e af 81
> 050: 0a ca 0d 42 00 8a 01 14 00 00 20 45 42 45 4f 45
> 060: 45 46 43 45 4c 45 50 45 44 45 49 45 4f 45 43 43
> 070: 41 43 41 43 41 43 41 43 41 41 41 00 20 46 44 45
> Prev obj: start=ed400c68, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
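The reading of the first slab dump given in this reply (a broadcast frame, 0x8100 tag, possibly VLAN 0xf2) can be checked mechanically. The following is an illustrative sketch only: the bytes are copied from offsets 0x20 and 0x30 of the first "Slab corruption" dump in the thread, 0x6b is the kernel's poison value for freed slab memory, and the decode_frame() helper is a hypothetical name introduced here, not part of any tool mentioned in the thread.

```python
import struct

POISON_FREE = 0x6B  # byte the slab debug code writes over freed objects

# Offsets 0x20 and 0x30 of the first "Slab corruption" dump in the thread.
DUMP = bytes.fromhex(
    "6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00 "
    "00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47"
)


def mac(raw):
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_frame(dump):
    """Skip leading poison bytes, then decode the Ethernet header."""
    start = 0
    while start < len(dump) and dump[start] == POISON_FREE:
        start += 1
    frame = dump[start:]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    info = {
        "offset": start,          # where the overwrite begins in the dump
        "dst": mac(frame[0:6]),
        "src": mac(frame[6:12]),
        "ethertype": ethertype,
    }
    if ethertype == 0x8100:       # 802.1Q tag: TCI, then the real EtherType
        tci, inner = struct.unpack("!HH", frame[14:18])
        info["vlan"] = tci & 0x0FFF
        info["inner_ethertype"] = inner
    return info


if __name__ == "__main__":
    print(decode_frame(DUMP))
```

The intact 0x6b bytes before and after the frame are what make the diagnosis plausible: the object had already been poisoned as free when the frame data landed in it.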
Re: [E1000-devel] Memory Corruption with e1000
On Thu, Jun 6, 2013 at 11:23 AM, Ronciak, John john.ronc...@intel.com wrote:
> I agree with Jesse but this driver has been in the field for a very long
> time with no reports like this coming to us. Can you send us the dmesg
> when this is happening? I want to see if there are messages from the
> driver like if the down is being delayed somehow. Or re-enabled.

I stripped out the up/down messages. But yes, there are sometimes up messages. At the end is the complete dmesg output. I've tweaked the script to print whenever the interface is changed. It appears that the slab errors are when the interface comes down:

Bringing eth2 up...
e1000: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Tearing eth2 down...
slab error in verify_redzone_free(): cache `size-2048': memory outside object was overwritten
Call Trace:
[ee275c70] [c0007b80] show_stack+0x58/0x154 (unreliable)
[ee275cb0] [c007bb0c] __slab_error+0x2c/0x3c
[ee275cc0] [c007c0d0] cache_free_debugcheck+0x184/0x274
[ee275cf0] [c007c36c] kfree+0x90/0x10c
[ee275d10] [c02079e4] skb_release_data+0xb4/0xc8
[ee275d20] [c02075dc] __kfree_skb+0x18/0xbc
[ee275d30] [c0194d50] e1000_clean_rx_ring+0x10c/0x1a4
[ee275d60] [c0194e10] e1000_clean_all_rx_rings+0x28/0x54
[ee275d70] [c019835c] e1000_close+0x30/0xb4
[ee275d90] [c02118d8] __dev_close_many+0xa0/0xe0
[ee275da0] [c0213670] __dev_close+0x2c/0x4c
[ee275dc0] [c020ff28] __dev_change_flags+0xb8/0x140
[ee275de0] [c02117f4] dev_change_flags+0x1c/0x60
[ee275e00] [c02669b4] devinet_ioctl+0x2a4/0x700
[ee275e60] [c02677bc] inet_ioctl+0xc8/0xfc
[ee275e70] [c01ffba4] sock_ioctl+0x260/0x2a0
[ee275e90] [c0090a80] vfs_ioctl+0x2c/0x58
[ee275ea0] [c00911ec] do_vfs_ioctl+0x610/0x698
[ee275f10] [c00912cc] sys_ioctl+0x58/0x88
[ee275f40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
    LR = 0xff359a0
ee2a97b8: redzone 1:0x300574f524b4752, redzone 2:0xd84156c5635688c0.
slab error in verify_redzone_free(): cache `size-2048': memory outside object was overwritten
Call Trace:
[ee275c70] [c0007b80] show_stack+0x58/0x154 (unreliable)
[ee275cb0] [c007bb0c] __slab_error+0x2c/0x3c
[ee275cc0] [c007c0d0] cache_free_debugcheck+0x184/0x274
[ee275cf0] [c007c36c] kfree+0x90/0x10c
[ee275d10] [c02079e4] skb_release_data+0xb4/0xc8
[ee275d20] [c02075dc] __kfree_skb+0x18/0xbc
[ee275d30] [c0194d50] e1000_clean_rx_ring+0x10c/0x1a4
[ee275d60] [c0194e10] e1000_clean_all_rx_rings+0x28/0x54
[ee275d70] [c019835c] e1000_close+0x30/0xb4
[ee275d90] [c02118d8] __dev_close_many+0xa0/0xe0
[ee275da0] [c0213670] __dev_close+0x2c/0x4c
[ee275dc0] [c020ff28] __dev_change_flags+0xb8/0x140
[ee275de0] [c02117f4] dev_change_flags+0x1c/0x60
[ee275e00] [c02669b4] devinet_ioctl+0x2a4/0x700
[ee275e60] [c02677bc] inet_ioctl+0xc8/0xfc
[ee275e70] [c01ffba4] sock_ioctl+0x260/0x2a0
[ee275e90] [c0090a80] vfs_ioctl+0x2c/0x58
[ee275ea0] [c00911ec] do_vfs_ioctl+0x610/0x698
[ee275f10] [c00912cc] sys_ioctl+0x58/0x88
[ee275f40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
    LR = 0xff359a0
ee2a8fa0: redzone 1:0xd84156c5635688c0, redzone 2:0x534c4f545c42524f.
Bringing eth2 up...
e1000: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Tearing eth2 down...
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0061c64
Oops: Kernel access of bad area, sig: 11 [#1]
SEL35xx Platform
Modules linked in:
NIP: c0061c64 LR: c02079a8 CTR: c000cbbc
REGS: ee2c3c60 TRAP: 0300 Not tainted (3.0.80)
MSR: 9032 EE,ME,IR,DR CR: 24008248 XER:
DAR: , DSISR: 2000
TASK = ed56dba0[4730] 'ifconfig' THREAD: ee2c2000
GPR00: ee2c3d10 ed56dba0 2e6a2e2a 05f2 0002
GPR08: ef3d8da0 ee6a3428 0800 f04d 10087a48 bfd6bb1c 10064ae4
GPR16: 10064bc0 bfd6bb0c bfd6baf4 0228 8914 c0199aa4
GPR24: c0199fe8 ed70f4b0 0059 ed70f340 ef063120 0001 ee75e818
NIP [c0061c64] put_page+0x0/0x34
LR [c02079a8] skb_release_data+0x78/0xc8
Call Trace:
[ee2c3d20] [c02075dc] __kfree_skb+0x18/0xbc
[ee2c3d30] [c0194d50] e1000_clean_rx_ring+0x10c/0x1a4
[ee2c3d60] [c0194e10] e1000_clean_all_rx_rings+0x28/0x54
[ee2c3d70] [c019835c] e1000_close+0x30/0xb4
[ee2c3d90] [c02118d8] __dev_close_many+0xa0/0xe0
[ee2c3da0] [c0213670] __dev_close+0x2c/0x4c
[ee2c3dc0] [c020ff28] __dev_change_flags+0xb8/0x140
[ee2c3de0] [c02117f4] dev_change_flags+0x1c/0x60
[ee2c3e00] [c02669b4] devinet_ioctl+0x2a4/0x700
[ee2c3e60] [c02677bc] inet_ioctl+0xc8/0xfc
[ee2c3e70] [c01ffba4] sock_ioctl+0x260/0x2a0
[ee2c3e90] [c0090a80] vfs_ioctl+0x2c/0x58
[ee2c3ea0] [c00911ec] do_vfs_ioctl+0x610/0x698
[ee2c3f10] [c00912cc] sys_ioctl+0x58/0x88
[ee2c3f40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
    LR = 0xff359a0
Instruction dump:
[E1000-devel] Cannot set parameters for igb
Here is the relevant information. Help understanding why this does not work according to the README file description for 3.2.10 will be greatly appreciated. The driver module was installed with the distribution, not built from the code on Sourceforge.

Thank you.

-- Don Smith

==

Kernel (RHEL 6):
uname -r
2.6.32-279.11.1.el6.x86_64

Adapter (Intel I350-T2):
ethtool -i p6p1
driver: igb
version: 3.2.10-k
firmware-version: 1.6-3

Modprobe error:
rmmod igb
modprobe igb InterruptThrottleRate=0
FATAL: Error inserting igb (/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/igb/igb.ko): Unknown symbol in module, or unknown parameter (see dmesg)

dmesg:
igb: Unknown parameter `InterruptThrottleRate'

--
F. Donelson Smith (Don)          (919) 962-1884
Research Professor               smit...@cs.unc.edu
Department of Computer Science   www.cs.unc.edu/~smithfd
University of North Carolina at Chapel Hill
Re: [E1000-devel] Memory Corruption with e1000
OK, so a couple of things kind of stand out. What interface is the e1000 on? eth0? That's not being called out, or you filtered it out from the dmesg. Early on eth2 is the e1000 interface but later it's one of the Gianfar interfaces. Can you clear this up for us?

Also, it looks like you have a bonding configuration. What interfaces are being bonded? You also have a Gianfar NIC with 2 interfaces. Is this still happening when no bonding is configured? Does the problem occur when the Gianfar interfaces are down/inactive? I'm just trying to narrow things down a bit. I'd like this to be tried with just the e1000 driver being active to see if it's happening then.

Can you send the entire dmesg? Is it too big to email?

Cheers,
John

> -----Original Message-----
> From: pla...@gmail.com [mailto:pla...@gmail.com] On Behalf Of Peter LaDow
> Sent: Thursday, June 06, 2013 12:40 PM
> To: Ronciak, John
> Cc: Brandeburg, Jesse; Waskiewicz Jr, Peter P; e1000-de...@lists.sourceforge.net
> Subject: Re: [E1000-devel] Memory Corruption with e1000
>
> On Thu, Jun 6, 2013 at 11:23 AM, Ronciak, John john.ronc...@intel.com wrote:
>> I agree with Jesse but this driver has been in the field for a very long
>> time with no reports like this coming to us. Can you send us the dmesg
>> when this is happening? I want to see if there are messages from the
>> driver like if the down is being delayed somehow. Or re-enabled.
>
> I stripped out the up/down messages. But yes, there are sometimes up
> messages. At the end is the complete dmesg output. I've tweaked the
> script to print whenever the interface is changed. It appears that the
> slab errors are when the interface comes down:
>
> Bringing eth2 up...
> e1000: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
> ADDRCONF(NETDEV_UP): eth2: link is not ready
> ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> Tearing eth2 down...
> slab error in verify_redzone_free(): cache `size-2048': memory outside object was overwritten
Re: [E1000-devel] Memory Corruption with e1000
On Thu, Jun 6, 2013 at 1:10 PM, Ronciak, John john.ronc...@intel.com wrote:
> OK so a couple of thing kind of stand out. What interface is the e1000
> on? eth0? That's not being called out or you filtered it out from the
> dmesg. Early on eth2 is the e1000 interface but later it's one of the
> Gianfar interfaces. Can you clear this up for us?

The interfaces do get renamed early in the boot process. We use ifrename to force the e1000 interface to eth2. The gianfar interfaces are on eth0 and eth1.

> Also, it looks like you have a bonding configuration. What interfaces
> are being bonded? You also have a Gianfar NIC with 2 interfaces. Is this
> still happening when no bonding is configured? Does the problem occur
> when the Gianfar interfaces are down/inactive? I'm just trying to narrow
> things down a bit. I'd like this to be tried with just the e1000 driver
> being active to see if it's happening then.

Currently, there is no bonding configured at all. While we do allow bonding, there are currently no bonded interfaces.

I tried the up/down loop with the gianfar devices, and I do not get the failure. They are connected to the same network, and there is no problem.

I shut down the gianfar adapters (eth0 and eth1) and re-ran the up/down loop on the e1000. I still get the same panic.

> Can you send the entire dmesg? Is it too big to email?

That was the entire dmesg output.

Thanks,
Pete
Re: [E1000-devel] Cannot set parameters for igb
Module parameters aren't well-liked in the upstream kernel, and the maintainers won't let them into the in-tree code. The Sourceforge driver is a different matter.

You can get the same setting using ethtool -C ethX rx-usecs 0.

On Thu, 6 Jun 2013, Don Smith wrote:
> Here is the relevant information. Help understanding why this does not
> work according to the README file description for 3.2.10 will be greatly
> appreciated. The driver module was installed with the distribution, not
> built from the code on Sourceforge.
>
> Modprobe error:
> rmmod igb
> modprobe igb InterruptThrottleRate=0
> FATAL: Error inserting igb (/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/igb/igb.ko): Unknown symbol in module, or unknown parameter (see dmesg)
>
> dmesg:
> igb: Unknown parameter `InterruptThrottleRate'

--
Hisashi T Fujinaka - ht...@twofifty.com
BSEE(6/86) + BSChem(3/95) + BAEnglish(8/95) + MSCS(8/03) + $2.50 = latte
Re: [E1000-devel] Cannot set parameters for igb
-----Original Message-----
From: Don Smith [mailto:smit...@cs.unc.edu]
Sent: Thursday, June 06, 2013 12:37 PM
To: e1000-de...@lists.sf.net
Subject: [E1000-devel] Cannot set parameters for igb

Here is the relevant information. Help understanding why this does not work according to the README file description for 3.2.10 will be greatly appreciated. The driver module was installed with the distribution, not built from the code on Sourceforge.

Thank you.
-- Don Smith

==
Kernel (RHEL 6):
uname -r
2.6.32-279.11.1.el6.x86_64

Adapter (Intel I350-T2):
ethtool -i p6p1
driver: igb
version: 3.2.10-k
firmware-version: 1.6-3

Modprobe error:
rmmod igb
modprobe igb InterruptThrottleRate=0
FATAL: Error inserting igb (/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/igb/igb.ko): Unknown symbol in module, or unknown parameter (see dmesg)

dmesg:
igb: Unknown parameter `InterruptThrottleRate'

--
F. Donelson Smith (Don) (919) 962-1884
Research Professor smit...@cs.unc.edu
Department of Computer Science www.cs.unc.edu/~smithfd
University of North Carolina at Chapel Hill

Don,

That would be because the drivers included in the RHEL kernel come from the Linux upstream driver, in which using modprobe parameters is frowned upon. For Linux upstream drivers and their derivatives, you need to use ethtool:

ethtool -C ethx rx-usecs 0

You can see which modprobe parameters are supported by using 'modinfo igb'.

I hope this helps,
Jeff
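The advice above can be sketched as a small check, assuming a POSIX shell and the standard `modinfo -F parm` field selector, which prints one `name:description` entry per supported parameter. The helper names `param_listed` and `suggest_command` and the `IFACE` variable are illustrative, not part of any driver or tool. The script only prints the command that fits the installed igb build; it does not run it.

```shell
#!/bin/sh
# IFACE is a placeholder; p6p1 is the interface name from the report above.
IFACE=${IFACE:-p6p1}

# param_listed NAME: succeeds if NAME appears as a parameter in
# `modinfo -F parm <module>` output supplied on stdin.
param_listed() {
    grep -q "^$1:"
}

# Prints (does not execute) the command suited to the installed igb build.
suggest_command() {
    if modinfo -F parm igb 2>/dev/null | param_listed InterruptThrottleRate; then
        # The installed build exposes the module parameter.
        echo "modprobe igb InterruptThrottleRate=0"
    else
        # Upstream-derived build (or no igb found): use interrupt coalescing.
        echo "ethtool -C $IFACE rx-usecs 0"
    fi
}

suggest_command
```

After applying the ethtool variant, `ethtool -c p6p1` shows the current coalescing values, which is one way to confirm that rx-usecs changed.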
Re: [E1000-devel] Memory Corruption with e1000
Hi Peter,

We have some ideas and are working on a patch for you to try. Since we won't really be able to test it, can you do that if we get it to you? Do you know how to patch a driver? Or should we send you the whole thing (a complete new driver like you would get off of our SF site)?

Cheers,
John

-----Original Message-----
From: pla...@gmail.com [mailto:pla...@gmail.com] On Behalf Of Peter LaDow
Sent: Thursday, June 06, 2013 1:22 PM
To: Ronciak, John
Cc: Brandeburg, Jesse; Waskiewicz Jr, Peter P; e1000-de...@lists.sourceforge.net
Subject: Re: [E1000-devel] Memory Corruption with e1000

On Thu, Jun 6, 2013 at 1:10 PM, Ronciak, John john.ronc...@intel.com wrote:
> OK so a couple of things kind of stand out. What interface is the e1000 on? eth0? That's not being called out or you filtered it out from the dmesg. Early on eth2 is the e1000 interface but later it's one of the Gianfar interfaces. Can you clear this up for us?

The interfaces do get renamed early in the boot process. We use ifrename to force the e1000 interface to eth2. The gianfar interfaces are on eth0 and eth1.

> Also, it looks like you have a bonding configuration. What interfaces are being bonded? You also have a Gianfar NIC with 2 interfaces. Is this still happening when no bonding is configured? Does the problem occur when the Gianfar interfaces are down/inactive? I'm just trying to narrow things down a bit. I'd like this to be tried with just the e1000 driver being active to see if it's happening then.

Currently, there is no bonding configured at all. While we do allow bonding, there are currently no bonded interfaces. I tried the up/down loop with the gianfar devices, and I do not get the failure. They are connected to the same network, and there is no problem. I shut down the gianfar adapters (eth0 and eth1) and re-ran the up/down loop on the e1000. Still get the same panic.

> Can you send the entire dmesg? Is it too big to email?

That was the entire dmesg output.

Thanks,
Pete