Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Peter P Waskiewicz Jr
On 06/05/2013 08:34 PM, Peter LaDow wrote:
 On 6/5/13, Ronciak, John john.ronc...@intel.com wrote:
 So I have a couple of questions.  Does this happen with a non-preemptive
 kernel?  I understand that you probably need to use a preemptive kernel but
 for testing purposes it would be good to know.  We don't always test with
 preemptive kernels.
 Hmmm... If you mean no RT patches, then yes. On a vanilla 3.0.80 kernel.

What about the pre-emption behavior of the kernel?  Namely Processor 
type and Features -> Preemption Model.  Are you using no preemption, or 
forced preemption?

-PJ


 When doing the up/down transitions, is the system under test passing
 traffic?  I mean sending and receiving packets.  If it is, what is the load
 like?  Does changing the load make a difference?  Does stopping the network
 traffic first make a difference in the outcome?
 Yes, the load makes a difference. On a silent network (or no link at
 all) this does not occur. Our network is quite busy. It isn't sending
 much (perhaps DHCP discovers and some IPv6 stuff).

 Thanks,
 Pete

 --
 How ServiceNow helps IT people transform IT departments:
 1. A cloud service to automate IT design, transition and operations
 2. Dashboards that offer high-level views of enterprise services
 3. A single system of record for all IT processes
 http://p.sf.net/sfu/servicenow-d2d-j
 ___
 E1000-devel mailing list
 E1000-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/e1000-devel
 To learn more about Intel® Ethernet, visit
 http://communities.intel.com/community/wired




Re: [E1000-devel] [PATCH] Packet drops/loss with 82579LM - fixed

2013-06-06 Thread Hrvoje Habjanić
On 06.06.2013 02:02, Allan, Bruce W wrote:
 -Original Message-
 From: Allan, Bruce W [mailto:bruce.w.al...@intel.com]
 Sent: Monday, June 03, 2013 4:28 PM
 To: Hrvoje Habjanić; e1000-devel@lists.sourceforge.net
 Subject: Re: [E1000-devel] [PATCH] Packet drops/loss with 82579LM - fixed

 -Original Message-
 From: Hrvoje Habjanić [mailto:hrvoje.habja...@zg.t-com.hr]
 Sent: Saturday, June 01, 2013 8:43 AM
 To: e1000-devel@lists.sourceforge.net
 Subject: Re: [E1000-devel] [PATCH] Packet drops/loss with 82579LM -
 fixed
 On 01.06.2013 17:06, Hrvoje Habjanić wrote:
  Patch is attached, and it is against 2.3.2 of
 e1000e driver.

 OK, now patch:

 --- e1000e-2.3.2/src/ich8lan.c  2013-02-28 05:25:36.0 +0100
 +++ e1000e-2.3.2.ok/src/ich8lan.c  2013-06-01 12:24:45.391577666 +0200
 @@ -2244,6 +2244,8 @@
          if (ret_val)
              return ret_val;
          pm_phy_reg &= ~HV_PM_CTRL_PLL_STOP_IN_K1_GIGA;
 +        /* Disable K1 mode when in GIGA mode */
 +        pm_phy_reg |= 0x2000;
          ret_val = e1e_wphy(hw, HV_PM_CTRL, pm_phy_reg);
          if (ret_val)
              return ret_val;


 H.
 There have been a number of issues with K1 on some parts.  I will check into
 whether or not this is already a known issue on 82579LM and/or currently
 under investigation and get back to you.

 Thanks,
 Bruce.
 Apparently, this is a known and documented issue.  There is a BIOS workaround
 that has been communicated to the various board vendors - you should check
 with your board vendor if there is a BIOS update available for your system.

Hi.

I guess this is a known issue, because there is some kind of erratum
about this.

BUT, as you may have noticed, my motherboard is an Intel DQ67EP, and I do
have the latest BIOS installed. Still, my network card suffers from
this bug.

So my board vendor does know about the issue but, oddly, has not been able
to fix it!?

In any case, I do believe that this will eventually be fixed in BIOS updates,
BUT until that happens, this fix is necessary.

So I would ask again that this fix be included in the driver. It does not
have to be enabled by default; being able to enable it via an option would
be just fine.

Regards,

H.
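
For readers following the patch: the change is a plain read-modify-write on the 16-bit HV_PM_CTRL PHY register. The sketch below reproduces only the bit arithmetic in Python and touches no hardware. The 0x0100 value for HV_PM_CTRL_PLL_STOP_IN_K1_GIGA is an assumption that should be checked against ich8lan.h, and K1_GIGA_DISABLE is a hypothetical name for the bare 0x2000 constant used in the patch.

```python
# Illustrative bit arithmetic only; no PHY access is performed.
HV_PM_CTRL_PLL_STOP_IN_K1_GIGA = 0x0100  # assumed value, verify in ich8lan.h
K1_GIGA_DISABLE = 0x2000                 # hypothetical name for the patch's 0x2000


def patched_pm_ctrl(pm_phy_reg):
    """Return the HV_PM_CTRL value the patched code path would write back."""
    pm_phy_reg &= ~HV_PM_CTRL_PLL_STOP_IN_K1_GIGA  # step already in the driver
    pm_phy_reg |= K1_GIGA_DISABLE                  # step added by the patch
    return pm_phy_reg & 0xFFFF                     # PHY registers are 16 bits wide


if __name__ == "__main__":
    for before in (0x0000, 0x0100, 0xFFFF):
        print("0x%04x -> 0x%04x" % (before, patched_pm_ctrl(before)))
```

If the fix were made optional, as requested above, the OR step would simply be guarded by whatever option the driver exposes.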







Re: [E1000-devel] [PATCH v9 net-next 2/7] net: add low latency socket poll

2013-06-06 Thread Eliezer Tamir
On 05/06/2013 18:59, Eric Dumazet wrote:
 On Wed, 2013-06-05 at 18:46 +0300, Eliezer Tamir wrote:
 On 05/06/2013 18:39, Eric Dumazet wrote:
 On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote:
 On 05/06/2013 18:21, Eric Dumazet wrote:

 It would also make sense to give end_time as a parameter, so that the
 polling code could really give an end_time for the whole duration of
 poll().

 (You then should test can_poll_ll(end_time) _before_ call to
 ndo_ll_poll())

 how would you handle a nonblocking operation in that case?
 I guess if we have a socket option, then we don't need to handle
 non-blocking any differently, since the user specified exactly how much
 time to waste polling. Right?

 If the thread already spent 50us in the poll() system call, it for sure
 should not call any ndo_ll_poll(). This makes no more sense at this
 point.

 what about a non-blocking read from a socket?
 Right now we assume this means poll only once since the application will
 repeat as needed.

 maybe add a once parameter that will cause sk_poll_ll() to ignore end
 time and only try once?

 extern bool __sk_poll_ll(struct sock *sk, bool nonblock, cycles_t end);

 static inline bool sk_poll_ll(struct sock *sk, bool nonblock)
 {
   return __sk_poll_ll(sk, nonblock, ll_end_time());
 }

 In the poll() code, we should call ll_end_time() once, even if we poll
 1000 fds.

Right now we have three uses for sk_poll_ll

1. blocking read - In this case we loop until:
   a) !skb_queue_empty(&sk->sk_receive_queue)
or
   b) !can_poll_ll(end_time)

2. non-blocking read - only try once, ignoring end time.

3. poll/select - for each socket we only try once (nonblock==1),
  we loop in poll/select until we are lucky or run out of time.

For 1 we want to loop inside sk_poll_ll() but for 3 we loop in poll/select.

So it seems all we need is for sk_poll_ll() to not call ll_end_time() if 
nonblock is set.

( something like cycles_t end_time = nonblock ? 0 : ll_end_time(); )

Or we could move the looping out to the calling function in all cases.
Does this mean we should push rcu_read_lock_bh() out into the caller
  as well?

-Eliezer
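
The three cases listed above can be modelled outside the kernel. The following is a toy Python sketch of the control flow only, not the kernel code: sk_poll_ll here is a stand-in with the same name, and FakeClock and make_device are hypothetical test helpers. It shows that setting end_time to 0 when nonblock is set gives exactly one poll attempt, while the blocking case loops until data arrives or the budget runs out.

```python
from collections import deque


class FakeClock:
    """Deterministic stand-in for a cycle counter: each read advances by 1."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        self.now += 1
        return self.now


def make_device(deliver_on):
    """Return a poll function that queues one packet on the Nth poll.

    Passing deliver_on=None models a device that never delivers.
    """
    state = {"polls": 0}

    def poll_once(rx_queue):
        state["polls"] += 1
        if state["polls"] == deliver_on:
            rx_queue.append("pkt")

    return poll_once, state


def sk_poll_ll(rx_queue, poll_once, nonblock, budget, clock):
    """Toy model: poll until the queue is non-empty or time runs out."""
    # Mirrors "end_time = nonblock ? 0 : ll_end_time()": no deadline is
    # computed for the non-blocking case, so the loop body runs only once.
    end_time = 0 if nonblock else clock() + budget
    while True:
        poll_once(rx_queue)
        if rx_queue:
            return True
        if clock() >= end_time:
            return False


if __name__ == "__main__":
    poll, state = make_device(deliver_on=3)
    print(sk_poll_ll(deque(), poll, False, 5, FakeClock()), state["polls"])
```

For poll/select (case 3) the caller would invoke this with nonblock set for each socket and keep its own single deadline, which matches the request to call ll_end_time() only once per poll().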



Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Peter LaDow
On 6/6/13, Peter P Waskiewicz Jr peter.p.waskiewicz...@intel.com wrote:
 What about the pre-emption behavior of the kernel?  Namely Processor
 type and Features -> Preemption Model.  Are you using no preemption, or
 forced preemption?

It is PREEMPT_FULL. I'll turn it off and give it a spin.

Thanks,
Pete
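
One way to confirm which preemption model a kernel was built with is to look at the CONFIG_PREEMPT* lines in its .config. The sketch below is a minimal Python helper for that, assuming the config text is already in hand (for example from the build tree). The SAMPLE text is made up for illustration and is not taken from the system discussed in this thread.

```python
def enabled_preempt_options(config_text):
    """Return the CONFIG_PREEMPT* symbols that are set to 'y'."""
    enabled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and are skipped.
        if line.startswith("CONFIG_PREEMPT") and line.endswith("=y"):
            enabled.append(line[:-2])
    return enabled


# Illustrative input only, not from the reporter's kernel.
SAMPLE = """\
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
"""

if __name__ == "__main__":
    print(enabled_preempt_options(SAMPLE))
```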



Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Peter LaDow
On Thu, Jun 6, 2013 at 12:30 AM, Peter P Waskiewicz Jr
peter.p.waskiewicz...@intel.com wrote:
 What about the pre-emption behavior of the kernel?  Namely Processor type
 and Features -> Preemption Model.  Are you using no preemption, or forced
 preemption?

Ok.  I've done testing.  Yes, we were building with PREEMPT_FULL.
I've done some further testing and can re-create the problem on
vanilla, non-preempt kernels.  See below.

# uname -a
Linux (none) 3.0.80-rt108 #2 Thu Jun 6 16:09:35 UTC 2013 ppc GNU/Linux

And I still get the slab corruption leading up to the kernel panic:

Slab corruption: size-2048 start=ee2b2070, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c0208514](skb_release_data+0xb4/0xc8)
020: 6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00
030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47
040: d9 87 0a f1 0a ea 00 00 00 00 00 00 0a f1 0a ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 09 81 d2 0f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=ee2b2888, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
Slab corruption: size-2048 start=ed401480, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c0208514](skb_release_data+0xb4/0xc8)
020: 6b 6b ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00
030: 45 00 01 3e 3e 1a 00 00 80 11 ca c0 0a ca 0d 42
040: 0a ca 0d ff 00 8a 00 8a 01 2a a5 96 11 0e af 81
050: 0a ca 0d 42 00 8a 01 14 00 00 20 45 42 45 4f 45
060: 45 46 43 45 4c 45 50 45 44 45 49 45 4f 45 43 43
070: 41 43 41 43 41 43 41 43 41 41 41 00 20 46 44 45
Prev obj: start=ed400c68, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
Unable to handle kernel paging request for data at address 0x20454c45
Faulting instruction address: 0xc0062498
Oops: Kernel access of bad area, sig: 11 [#1]
SEL35xx Platform
Modules linked in:
NIP: c0062498 LR: c02084d8 CTR: c000cbbc
REGS: ee85bc60 TRAP: 0300   Not tainted  (3.0.80-rt108)
MSR: 9032 EE,ME,IR,DR  CR: 24008248  XER: 
DAR: 20454c45, DSISR: 2000
TASK = ef3e5830[4616] 'ifconfig' THREAD: ee85a000
GPR00:  ee85bd10 ef3e5830 20454c45 2d746baa 05f2 0002 
GPR08: c03b14e4 ed7471a8 ee85bcd0 5c26  10087a48 bfe0e41c 10064ae4
GPR16: 10064bc0 bfe0e40c  bfe0e3f4 0228  8914 c019a488
GPR24: c019a9cc ed70f4b0 005c ed70f340 ef063120  0001 ee62bd30
NIP [c0062498] put_page+0x0/0x34
LR [c02084d8] skb_release_data+0x78/0xc8
Call Trace:
[ee85bd20] [c020810c] __kfree_skb+0x18/0xbc
[ee85bd30] [c0195734] e1000_clean_rx_ring+0x10c/0x1a4
[ee85bd60] [c01957f4] e1000_clean_all_rx_rings+0x28/0x54
[ee85bd70] [c0198d40] e1000_close+0x30/0xb4
[ee85bd90] [c0212408] __dev_close_many+0xa0/0xe0
[ee85bda0] [c02141a0] __dev_close+0x2c/0x4c
[ee85bdc0] [c0210a58] __dev_change_flags+0xb8/0x140
[ee85bde0] [c0212324] dev_change_flags+0x1c/0x60
[ee85be00] [c0267594] devinet_ioctl+0x2a4/0x700
[ee85be60] [c026839c] inet_ioctl+0xc8/0xfc
[ee85be70] [c02006d4] sock_ioctl+0x260/0x2a0
[ee85be90] [c009145c] vfs_ioctl+0x2c/0x58
[ee85bea0] [c0091bc8] do_vfs_ioctl+0x610/0x698
[ee85bf10] [c0091ca8] sys_ioctl+0x58/0x88
[ee85bf40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
LR = 0xff359a0
Instruction dump:
419e0018 3c80c006 38630180 38842abc 38a0 4bfffe65 80010014 bbc10008
38210010 7c0803a6 4e800020 4b54
8003 7c691b78 700bc000 41a20008
Kernel panic - not syncing: Fatal exception
Call Trace:
[ee85bb90] [c0007b80] show_stack+0x58/0x154 (unreliable)
[ee85bbd0] [c001c3a8] panic+0xa8/0x1cc
[ee85bc20] [c000b1f0] die+0x178/0x19c
[ee85bc40] [c0011a44] bad_page_fault+0xe8/0xfc
[ee85bc50] [c000eb14] handle_page_fault+0x7c/0x80
--- Exception: 300 at put_page+0x0/0x34
LR = skb_release_data+0x78/0xc8
[ee85bd10] []   (null) (unreliable)
[ee85bd20] [c020810c] __kfree_skb+0x18/0xbc
[ee85bd30] [c0195734] e1000_clean_rx_ring+0x10c/0x1a4
[ee85bd60] [c01957f4] e1000_clean_all_rx_rings+0x28/0x54
[ee85bd70] [c0198d40] e1000_close+0x30/0xb4
[ee85bd90] [c0212408] __dev_close_many+0xa0/0xe0
[ee85bda0] [c02141a0] __dev_close+0x2c/0x4c
[ee85bdc0] [c0210a58] __dev_change_flags+0xb8/0x140
[ee85bde0] [c0212324] dev_change_flags+0x1c/0x60
[ee85be00] [c0267594] devinet_ioctl+0x2a4/0x700
[ee85be60] [c026839c] inet_ioctl+0xc8/0xfc
[ee85be70] [c02006d4] sock_ioctl+0x260/0x2a0
[ee85be90] [c009145c] vfs_ioctl+0x2c/0x58
[ee85bea0] [c0091bc8] do_vfs_ioctl+0x610/0x698
[ee85bf10] [c0091ca8] sys_ioctl+0x58/0x88
[ee85bf40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
LR = 0xff359a0

And with a vanilla, no-preempt kernel:

# uname -a
Linux (none) 3.0.80 #5 Thu Jun 6 16:26:15 UTC 2013 ppc GNU/Linux

slab error in 

Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Jesse Brandeburg
On Thu, 6 Jun 2013 09:38:50 -0700
Peter LaDow pet...@gocougs.wsu.edu wrote:

 On Thu, Jun 6, 2013 at 12:30 AM, Peter P Waskiewicz Jr
 peter.p.waskiewicz...@intel.com wrote:
  What about the pre-emption behavior of the kernel?  Namely Processor type
  and Features -> Preemption Model.  Are you using no preemption, or forced
  preemption?
 
 Ok.  I've done testing.  Yes, we were building with PREEMPT_FULL.
 I've done some further testing and can re-create the problem on
 vanilla, non-preempt kernels.  See below.
 
 # uname -a
 Linux (none) 3.0.80-rt108 #2 Thu Jun 6 16:09:35 UTC 2013 ppc GNU/Linux
 
 And I still get the slab corruption leading up to the kernel panic:
 
 Slab corruption: size-2048 start=ee2b2070, len=2048
 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
 Last user: [c0208514](skb_release_data+0xb4/0xc8)
 020: 6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00

That is quite clearly a broadcast frame; it looks to me like a VLAN-tagged
packet (ethertype 0x8100), maybe on VLAN 0xf2?

So this means that the receive unit of the e1000 is not being stopped
completely (or is being restarted by something), and that the DMA buffer
(the 2kB allocation) is being freed and then still DMA'd to.

 030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47
 040: d9 87 0a f1 0a ea 00 00 00 00 00 00 0a f1 0a ea
 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 060: 00 00 09 81 d2 0f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Next obj: start=ee2b2888, len=2048
 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
 Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
 Slab corruption: size-2048 start=ed401480, len=2048
 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
 Last user: [c0208514](skb_release_data+0xb4/0xc8)
 020: 6b 6b ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00
 030: 45 00 01 3e 3e 1a 00 00 80 11 ca c0 0a ca 0d 42

Same thing here, but this is an IP packet.

This is clearly a network adapter putting frames into memory that has
been freed.

I will see if someone here can reproduce this issue. It seems quite
clear what is happening; we just need to figure out why.
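
The reading of the two dumps above can be reproduced mechanically. The Python sketch below parses the quoted dump lines and decodes the Ethernet header that follows the 0x6b bytes (the slab poison value for freed memory). Treating the first non-poison byte as the start of the frame is an assumption made for this illustration, and parse_dump and decode_ethernet are hypothetical helper names.

```python
POISON_FREE = 0x6B  # byte the slab debug code writes into freed objects


def parse_dump(lines):
    """Map each 'OFFSET: b0 b1 ...' dump line to {offset: byte value}."""
    mem = {}
    for line in lines:
        offset, _, data = line.partition(":")
        base = int(offset, 16)
        for i, token in enumerate(data.split()):
            mem[base + i] = int(token, 16)
    return mem


def format_mac(raw):
    return ":".join("%02x" % b for b in raw)


def decode_ethernet(mem):
    """Decode the frame assumed to start at the first non-poison byte."""
    start = min(off for off, byte in mem.items() if byte != POISON_FREE)
    raw = bytes(mem[start + i] for i in range(18))
    ethertype = int.from_bytes(raw[12:14], "big")
    vlan = None
    if ethertype == 0x8100:  # 802.1Q tag: next 2 bytes are the TCI
        vlan = int.from_bytes(raw[14:16], "big") & 0x0FFF
        ethertype = int.from_bytes(raw[16:18], "big")
    return {"dst": format_mac(raw[0:6]), "src": format_mac(raw[6:12]),
            "vlan": vlan, "ethertype": ethertype}


# First two lines of each corrupted object, copied from the report.
ARP_DUMP = ["020: 6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00",
            "030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47"]
IP_DUMP = ["020: 6b 6b ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00",
           "030: 45 00 01 3e 3e 1a 00 00 80 11 ca c0 0a ca 0d 42"]

if __name__ == "__main__":
    print(decode_ethernet(parse_dump(ARP_DUMP)))
    print(decode_ethernet(parse_dump(IP_DUMP)))
```

Both frames decode with a broadcast destination. The first carries an 802.1Q tag with VLAN ID 0xf2 and inner ethertype 0x0806 (ARP); the second is untagged with ethertype 0x0800 (IPv4).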


 

Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Peter LaDow
On Thu, Jun 6, 2013 at 11:23 AM, Ronciak, John john.ronc...@intel.com wrote:
 I agree with Jesse, but this driver has been in the field for a very long time
 with no reports like this coming to us.  Can you send us the dmesg from when
 this is happening?  I want to see whether the driver logs anything suggesting
 that the down is being delayed somehow, or that the interface is being
 re-enabled.

I stripped out the up/down messages.  But yes, there are sometimes up
messages.  At the end is the complete dmesg output.  I've tweaked the
script to print whenever the interface state is changed.  It appears that
the slab errors occur when the interface comes down:

Bringing eth2 up...
e1000: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Tearing eth2 down...
slab error in verify_redzone_free(): cache `size-2048': memory outside
object was overwritten
Call Trace:
[ee275c70] [c0007b80] show_stack+0x58/0x154 (unreliable)
[ee275cb0] [c007bb0c] __slab_error+0x2c/0x3c
[ee275cc0] [c007c0d0] cache_free_debugcheck+0x184/0x274
[ee275cf0] [c007c36c] kfree+0x90/0x10c
[ee275d10] [c02079e4] skb_release_data+0xb4/0xc8
[ee275d20] [c02075dc] __kfree_skb+0x18/0xbc
[ee275d30] [c0194d50] e1000_clean_rx_ring+0x10c/0x1a4
[ee275d60] [c0194e10] e1000_clean_all_rx_rings+0x28/0x54
[ee275d70] [c019835c] e1000_close+0x30/0xb4
[ee275d90] [c02118d8] __dev_close_many+0xa0/0xe0
[ee275da0] [c0213670] __dev_close+0x2c/0x4c
[ee275dc0] [c020ff28] __dev_change_flags+0xb8/0x140
[ee275de0] [c02117f4] dev_change_flags+0x1c/0x60
[ee275e00] [c02669b4] devinet_ioctl+0x2a4/0x700
[ee275e60] [c02677bc] inet_ioctl+0xc8/0xfc
[ee275e70] [c01ffba4] sock_ioctl+0x260/0x2a0
[ee275e90] [c0090a80] vfs_ioctl+0x2c/0x58
[ee275ea0] [c00911ec] do_vfs_ioctl+0x610/0x698
[ee275f10] [c00912cc] sys_ioctl+0x58/0x88
[ee275f40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
LR = 0xff359a0
ee2a97b8: redzone 1:0x300574f524b4752, redzone 2:0xd84156c5635688c0.
slab error in verify_redzone_free(): cache `size-2048': memory outside
object was overwritten
Call Trace:
[ee275c70] [c0007b80] show_stack+0x58/0x154 (unreliable)
[ee275cb0] [c007bb0c] __slab_error+0x2c/0x3c
[ee275cc0] [c007c0d0] cache_free_debugcheck+0x184/0x274
[ee275cf0] [c007c36c] kfree+0x90/0x10c
[ee275d10] [c02079e4] skb_release_data+0xb4/0xc8
[ee275d20] [c02075dc] __kfree_skb+0x18/0xbc
[ee275d30] [c0194d50] e1000_clean_rx_ring+0x10c/0x1a4
[ee275d60] [c0194e10] e1000_clean_all_rx_rings+0x28/0x54
[ee275d70] [c019835c] e1000_close+0x30/0xb4
[ee275d90] [c02118d8] __dev_close_many+0xa0/0xe0
[ee275da0] [c0213670] __dev_close+0x2c/0x4c
[ee275dc0] [c020ff28] __dev_change_flags+0xb8/0x140
[ee275de0] [c02117f4] dev_change_flags+0x1c/0x60
[ee275e00] [c02669b4] devinet_ioctl+0x2a4/0x700
[ee275e60] [c02677bc] inet_ioctl+0xc8/0xfc
[ee275e70] [c01ffba4] sock_ioctl+0x260/0x2a0
[ee275e90] [c0090a80] vfs_ioctl+0x2c/0x58
[ee275ea0] [c00911ec] do_vfs_ioctl+0x610/0x698
[ee275f10] [c00912cc] sys_ioctl+0x58/0x88
[ee275f40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
LR = 0xff359a0
ee2a8fa0: redzone 1:0xd84156c5635688c0, redzone 2:0x534c4f545c42524f.
Bringing eth2 up...
e1000: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Tearing eth2 down...
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0061c64
Oops: Kernel access of bad area, sig: 11 [#1]
SEL35xx Platform
Modules linked in:
NIP: c0061c64 LR: c02079a8 CTR: c000cbbc
REGS: ee2c3c60 TRAP: 0300   Not tainted  (3.0.80)
MSR: 9032 EE,ME,IR,DR  CR: 24008248  XER: 
DAR: , DSISR: 2000
TASK = ed56dba0[4730] 'ifconfig' THREAD: ee2c2000
GPR00:  ee2c3d10 ed56dba0  2e6a2e2a 05f2 0002 
GPR08: ef3d8da0 ee6a3428 0800 f04d  10087a48 bfd6bb1c 10064ae4
GPR16: 10064bc0 bfd6bb0c  bfd6baf4 0228  8914 c0199aa4
GPR24: c0199fe8 ed70f4b0 0059 ed70f340 ef063120  0001 ee75e818
NIP [c0061c64] put_page+0x0/0x34
LR [c02079a8] skb_release_data+0x78/0xc8
Call Trace:
[ee2c3d20] [c02075dc] __kfree_skb+0x18/0xbc
[ee2c3d30] [c0194d50] e1000_clean_rx_ring+0x10c/0x1a4
[ee2c3d60] [c0194e10] e1000_clean_all_rx_rings+0x28/0x54
[ee2c3d70] [c019835c] e1000_close+0x30/0xb4
[ee2c3d90] [c02118d8] __dev_close_many+0xa0/0xe0
[ee2c3da0] [c0213670] __dev_close+0x2c/0x4c
[ee2c3dc0] [c020ff28] __dev_change_flags+0xb8/0x140
[ee2c3de0] [c02117f4] dev_change_flags+0x1c/0x60
[ee2c3e00] [c02669b4] devinet_ioctl+0x2a4/0x700
[ee2c3e60] [c02677bc] inet_ioctl+0xc8/0xfc
[ee2c3e70] [c01ffba4] sock_ioctl+0x260/0x2a0
[ee2c3e90] [c0090a80] vfs_ioctl+0x2c/0x58
[ee2c3ea0] [c00911ec] do_vfs_ioctl+0x610/0x698
[ee2c3f10] [c00912cc] sys_ioctl+0x58/0x88
[ee2c3f40] [c000e674] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff35a3c
LR = 0xff359a0
Instruction dump:

[E1000-devel] Cannot set parameters for igb

2013-06-06 Thread Don Smith
Here is the relevant information.  Help understanding why this does not 
work according to the README file description for 3.2.10 will be greatly 
appreciated.  The driver module was installed with the distribution, not 
built from the code on Sourceforge.
Thank you.

  -- Don Smith
==
Kernel (RHEL 6):
uname -r
2.6.32-279.11.1.el6.x86_64

Adapter (Intel I350-T2):
ethtool -i p6p1
driver: igb
version: 3.2.10-k
firmware-version: 1.6-3

Modprobe error:
rmmod igb
modprobe igb InterruptThrottleRate=0
FATAL: Error inserting igb 
(/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/igb/igb.ko): 
Unknown symbol in module, or unknown parameter (see dmesg)

dmesg:
igb: Unknown parameter `InterruptThrottleRate'

-- 
F. Donelson Smith (Don)  (919) 962-1884
Research Professor   smit...@cs.unc.edu
Department of Computer Science   www.cs.unc.edu/~smithfd
University of North Carolina at Chapel Hill



Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Ronciak, John
OK, so a couple of things stand out.  What interface is the e1000 on?  eth0?
Either that's not being called out or you filtered it out of the dmesg.
Early on, eth2 is the e1000 interface, but later it's one of the Gianfar
interfaces.  Can you clear this up for us?

Also, it looks like you have a bonding configuration.  What interfaces are 
being bonded?  You also have a Gianfar NIC with 2 interfaces.  Is this still 
happening when no bonding is configured?  Does the problem occur when the 
Gianfar interfaces are down/inactive?  I'm just trying to narrow things down a 
bit.  I'd like this to be tried with just the e1000 driver being active to see 
if it's happening then.

Can you send the entire dmesg?  Is it too big to email?

Cheers,
John



Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Peter LaDow
On Thu, Jun 6, 2013 at 1:10 PM, Ronciak, John john.ronc...@intel.com wrote:
 OK so a couple of thing kind of stand out.  What interface is the e1000 on?  
 eth0? That's not being called out or you filtered it out from the dmesg.  
 Early on eth2 is the e1000 interface but later it's one of the Gianfar 
 interfaces.  Can you clear this up for us?

The interfaces do get renamed early in the boot process.  We use
ifrename to force the e1000 interface to eth2.  The gianfar interfaces are
on eth0 and eth1.

 Also, it looks like you have a bonding configuration.  What interfaces are 
 being bonded?  You also have a Gianfar NIC with 2 interfaces.  Is this still 
 happening when no bonding is configured?  Does the problem occur when the 
 Gianfar interfaces are down/inactive?  I'm just trying to narrow things down 
 a bit.  I'd like this to be tried with just the e1000 driver being active to 
 see if it's happening then.

Currently, there is no bonding configured at all.  While we do allow
bonding, there are currently no bonded interfaces.

I tried the up/down loop with the gianfar devices, and I do not get
the failure.  They are connected to the same network, with no problem.

I shut down the gianfar adapters (eth0 and eth1) and re-ran the up/down
loop.  I still get the same panic.

 Can you send the entire dmesg?  Is it too big to email?

That was the entire dmesg output.

Thanks,
Pete
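
The test script itself is not shown in this thread. For anyone trying to reproduce the problem, the sketch below is one guess at what such an up/down loop might look like in Python, assuming ifconfig is the tool used (it is the task named in the oops) and eth2 is the e1000 interface. run_updown_loop and its parameters are hypothetical names, and the command runner is injectable so the loop can be dry-run without touching a real interface.

```python
import subprocess
import time


def run_updown_loop(iface, cycles, delay_s=0.0,
                    run=subprocess.check_call, log=print):
    """Toggle an interface up and down, logging each transition."""
    for _ in range(cycles):
        log("Bringing %s up..." % iface)
        run(["ifconfig", iface, "up"])
        if delay_s > 0:
            time.sleep(delay_s)  # give the link time to come up and see traffic
        log("Tearing %s down..." % iface)
        run(["ifconfig", iface, "down"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_updown_loop("eth2", 2, run=lambda cmd: print("would run:", " ".join(cmd)))
```

Running it for real needs root privileges and, per the reports above, a busy network on the link.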



Re: [E1000-devel] Cannot set parameters for igb

2013-06-06 Thread Hisashi T Fujinaka
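Module parameters aren't well-liked in the kernel, and the maintainers won't
accept them into the in-kernel driver, which is the one your distribution
ships. The Sourceforge driver is a different matter. You can get the same
setting using ethtool -C <interface> rx-usecs 0 (for your adapter,
ethtool -C p6p1 rx-usecs 0).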
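
If the setting needs to be applied from a script, the ethtool invocation can be wrapped as below. This is a minimal Python sketch: coalesce_command and set_rx_usecs are hypothetical helper names, p6p1 is the interface name from the original report, and actually running the command requires root and the ethtool binary.

```python
import subprocess


def coalesce_command(iface, rx_usecs):
    """Build the ethtool command that sets RX interrupt coalescing."""
    return ["ethtool", "-C", iface, "rx-usecs", str(rx_usecs)]


def set_rx_usecs(iface, rx_usecs, run=subprocess.check_call):
    """Apply the setting; 'run' is injectable so callers can dry-run it."""
    run(coalesce_command(iface, rx_usecs))


if __name__ == "__main__":
    print(" ".join(coalesce_command("p6p1", 0)))
```

The current values can be read back with ethtool -c (lower case) on the same interface.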

On Thu, 6 Jun 2013, Don Smith wrote:

 Here is the relevant information.  Help understanding why this does not
 work according to the README file description for 3.2.10 will be greatly
 appreciated.  The driver module was installed with the distribution, not
 built from the code on Sourceforge.
 Thank you.

  -- Don Smith
 ==
 Kernel (RHEL 6):
 uname -r
 2.6.32-279.11.1.el6.x86_64

 Adapter (Intel I350-T2):
 ethtool -i p6p1
 driver: igb
 version: 3.2.10-k
 firmware-version: 1.6-3

 Modprobe error:
 rmmod igb
 modprobe igb InterruptThrottleRate=0
 FATAL: Error inserting igb
 (/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/igb/igb.ko):
 Unknown symbol in module, or unknown parameter (see dmesg)

 dmesg:
 igb: Unknown parameter `InterruptThrottleRate'



-- 
Hisashi T Fujinaka - ht...@twofifty.com
BSEE(6/86) + BSChem(3/95) + BAEnglish(8/95) + MSCS(8/03) + $2.50 = latte



Re: [E1000-devel] Cannot set parameters for igb

2013-06-06 Thread Pieper, Jeffrey E
-----Original Message-----
From: Don Smith [mailto:smit...@cs.unc.edu] 
Sent: Thursday, June 06, 2013 12:37 PM
To: e1000-de...@lists.sf.net
Subject: [E1000-devel] Cannot set parameters for igb

Here is the relevant information.  Help understanding why this does not 
work according to the README file description for 3.2.10 will be greatly 
appreciated.  The driver module was installed with the distribution, not 
built from the code on Sourceforge.
Thank you.

  -- Don Smith
==
Kernel (RHEL 6):
uname -r
2.6.32-279.11.1.el6.x86_64

Adapter (Intel I350-T2):
ethtool -i p6p1
driver: igb
version: 3.2.10-k
firmware-version: 1.6-3

Modprobe error:
rmmod igb
modprobe igb InterruptThrottleRate=0
FATAL: Error inserting igb 
(/lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/igb/igb.ko): 
Unknown symbol in module, or unknown parameter (see dmesg)

dmesg:
igb: Unknown parameter `InterruptThrottleRate'

-- 
F. Donelson Smith (Don)  (919) 962-1884
Research Professor   smit...@cs.unc.edu
Department of Computer Science   www.cs.unc.edu/~smithfd
University of North Carolina at Chapel Hill

Don,

That is because the drivers included in the RHEL kernel come from the 
Linux upstream driver, in which using modprobe parameters is frowned upon. For 
Linux upstream drivers and their derivatives, you need to use ethtool:

ethtool -C ethx rx-usecs 0

You can see which modprobe parameters are supported by using 'modinfo igb'.
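
As a rough sketch of the check described above: the helper names has_param and coalesce_cmd are made up for illustration, and the interface name p6p1 is taken from the original report. The sketch assumes that "modinfo -F parm MODULE" prints one "name:description" line per supported parameter.

```shell
#!/bin/sh
# Sketch: decide whether a setting can be passed as a module parameter,
# or whether ethtool has to be used instead.

# has_param NAME
#   Reads "name:description" lines (as printed by `modinfo -F parm MODULE`)
#   on stdin and succeeds if NAME is one of the listed parameters.
has_param() {
    grep -q "^$1:"
}

# coalesce_cmd IFACE USECS
#   Prints (does not run) the ethtool command that sets rx-usecs on IFACE.
coalesce_cmd() {
    printf 'ethtool -C %s rx-usecs %s\n' "$1" "$2"
}

# Example use on a real system:
#   if modinfo -F parm igb | has_param InterruptThrottleRate; then
#       echo "modprobe igb InterruptThrottleRate=0"
#   else
#       coalesce_cmd p6p1 0
#   fi
```

After applying the setting, "ethtool -c p6p1" shows the current coalescing values so the change can be confirmed.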

I hope this helps,

Jeff




Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Ronciak, John
Hi Peter,

We have some ideas and are working on a patch for you to try.  Since we won't 
really be able to test it ourselves, can you do that if we get it to you?  Do 
you know how to patch a driver?  Or should we send you the whole thing (a 
complete new driver like you would get off of our SF site)?

Cheers,
John


 -----Original Message-----
 From: pla...@gmail.com [mailto:pla...@gmail.com] On Behalf Of Peter
 LaDow
 Sent: Thursday, June 06, 2013 1:22 PM
 To: Ronciak, John
 Cc: Brandeburg, Jesse; Waskiewicz Jr, Peter P; e1000-
 de...@lists.sourceforge.net
 Subject: Re: [E1000-devel] Memory Corruption with e1000
 
 On Thu, Jun 6, 2013 at 1:10 PM, Ronciak, John john.ronc...@intel.com
 wrote:
  OK so a couple of things kind of stand out.  What interface is the
 e1000 on?  eth0? That's not being called out or you filtered it out
 from the dmesg.  Early on eth2 is the e1000 interface but later it's
 one of the Gianfar interfaces.  Can you clear this up for us?
 
 The interfaces do get renamed early in the boot process.  We use
 ifrename to force the e1000 interface to eth2.  The gianfar are on
 eth0 and eth1.
 
  Also, it looks like you have a bonding configuration.  What
 interfaces are being bonded?  You also have a Gianfar NIC with 2
 interfaces.  Is this still happening when no bonding is configured?
 Does the problem occur when the Gianfar interfaces are down/inactive?
 I'm just trying to narrow things down a bit.  I'd like this to be tried
 with just the e1000 driver being active to see if it's happening then.
 
 Currently, there is no bonding configured at all.  While we do allow
 bonding, there are currently no bonded interfaces.
 
 I tried the up/down loop with the gianfar devices, and I do not get the
 failure.  They are connected to the same network, and no problem.
 
 I shut down the gianfar adapters (eth0 and eth1) and re-ran the up/down
 loop.  I still get the same panic.
 
  Can you send the entire dmesg?  Is it too big to email?
 
 That was the entire dmesg output.
 
 Thanks,
 Pete
