Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr

2016-11-09 Thread Cao jin

Thanks Corrina for your info.

I tested my patch, it works for me on kernel 4.9-rc4.
"surprise removal" maybe another issue to solve. This one is enough to 
solve my issue and other one's, could it be accept first?


Cao jin

On 11/09/2016 03:33 AM, Alexander Duyck wrote:

On Tue, Nov 8, 2016 at 10:37 AM, Corinna Vinschen  wrote:

On Nov  8 09:16, Hisashi T Fujinaka wrote:

On Tue, 8 Nov 2016, Corinna Vinschen wrote:

On Nov  8 15:06, Cao jin wrote:

When running as guest, under certain condition, it will oops as following.
writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr
is NULL. While other register access won't oops kernel because they use
wr32/rd32 which have a defense against NULL pointer.
[...]


Incidentally we're just looking for a solution to that problem too.
Do three patches to fix the same problem at rougly the same time already
qualify as freak accident?

FTR, I attached my current patch, which I was planning to submit after
some external testing.

However, all three patches have one thing in common:  They workaround
a somewhat dubious resetting of the hardware address to NULL in case
reading from a register failed.

That makes me wonder if setting the hardware address to NULL in
rd32/igb_rd32 is really such a good idea.  It's performed in a function
which return value is *never* tested for validity in the calling
functions and leads to subsequent crashes since no tests for hw_addr ==
NULL are performed.

Maybe commit 22a8b2915 should be reconsidered?  Isn't there some more
graceful way to handle the "surprise removal"?


Answering this from my home account because, well, work is Outlook.

"Reconsidering" would be great. In fact, revert if if you'd like. I'm
uncertain that the surprise removal code actually works the way I
thought previously and I think I took a lot of it out of my local code.

Unfortuantely I don't have any equipment that I can use to reproduce
surprise removal any longer so that means I wouldn't be able to test
anything. I have to defer to you or Cao Jin.


I'm not too keen to rip out a PCIe NIC under power from my locale
desktop machine, but I think an actual surprise removal is not the
problem.

As described in my git log entry, the error condition in igb_rd32 can be
triggered during a suspend.  The HW has been put into a sleep state but
some register read requests are apparently not guarded against that
situation.  Reading a register in this state returns -1, thus a suspend
is erroneously triggering the "surprise removal" sequence.


The question I would have is what is reading the device when it is in
this state.  The watchdog and any other functions that would read the
device should be disabled.

One possibility could be a race between a call to igb_close and the
igb_suspend function.  We have seen some of those pop up recently on
ixgbe and it looks like igb has the same bug.  We should probably be
using the rtnl_lock to guarantee that netif_device_detach and the call
to __igb_close are completed before igb_close could possibly be called
by the network stack.


Here's a raw idea:

- Note that device is suspended in e1000_hw struct.  Don't trigger
   error sequence in igb_rd32 if so (...and return a 0 value???)


The thing is that a suspended device should not be accessed at all.
If we are accessing it while it is suspended then that is a bug.  If
you could throw a WARN_ON call in igb_rd32 to capture where this is
being triggered that might be useful.


- Otherwise assume it's actually a surprise removal.  In theory that
   should somehow trigger a device removal sequence, kind of like
   calling igb_remove, no?


Well a read of the MMIO region while suspended is more of a surprise
read since there shouldn't be anything going on.  We need to isolate
where that read is coming from and fix it.

Thanks.

- Alex


.








[PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr

2016-11-07 Thread Cao jin
When running as guest, under certain condition, it will oops as following.
writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr
is NULL. While other register access won't oops kernel because they use
wr32/rd32 which have a defense against NULL pointer.

[  141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal)
error received: id=0101
[  141.225523] igb :01:00.1: PCIe Bus Error:
severity=Uncorrected (Fatal), type=Unaccessible,
id=0101(Unregistered Agent ID)
[  141.299442] igb :01:00.1: broadcast error_detected message
[  141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now
detached
[  141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now
detached
[  143.465904] pcieport :00:1c.0: Root Port link has been reset
[  143.465994] igb :01:00.1: broadcast slot_reset message
[  143.466039] igb :01:00.0: enabling device ( -> 0002)
[  144.389078] igb :01:00.1: enabling device ( -> 0002)
[  145.312078] igb :01:00.1: broadcast resume message
[  145.322211] BUG: unable to handle kernel paging request at
3818
[  145.361275] IP: []
igb_configure_tx_ring+0x14d/0x280 [igb]
[  145.400048] PGD 0
[  145.438007] Oops: 0002 [#1] SMP

A similiar issue & solution could be found at:
http://patchwork.ozlabs.org/patch/689592/

Signed-off-by: Cao jin 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index edc9a6a..3f240ac 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3390,7 +3390,7 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
 tdba & 0xULL);
wr32(E1000_TDBAH(reg_idx), tdba >> 32);
 
-   ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
+   ring->tail = adapter->io_addr + E1000_TDT(reg_idx);
wr32(E1000_TDH(reg_idx), 0);
writel(0, ring->tail);
 
@@ -3729,7 +3729,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 ring->count * sizeof(union e1000_adv_rx_desc));
 
/* initialize head and tail */
-   ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
+   ring->tail = adapter->io_addr + E1000_RDT(reg_idx);
wr32(E1000_RDH(reg_idx), 0);
writel(0, ring->tail);
 
-- 
2.1.0





Re: [Intel-wired-lan] [PATCH] igb: drop field "tail" of struct igb_ring

2016-11-07 Thread Cao jin



On 11/08/2016 12:12 PM, Alexander Duyck wrote:



On Monday, November 7, 2016, Cao jin mailto:caoj.f...@cn.fujitsu.com>> wrote:






We removed head because it isn't really accessed very often, it is only
really used for when the ring is configured.  Tail is accessed every
time we add a descriptor to a ring.  The pointer chasing from ring to
netdev to adapter to hw is expensive.  That is one of the rasons why
we cache the pointer to the tail register.


I see. I can submit the patch as you suggested.



Signed-off-by: Cao jin 
---
   drivers/net/ethernet/intel/igb/igb.h  |  1 -
   drivers/net/ethernet/intel/igb/igb_main.c | 16
+---




hw->hw_addr could be alterred to NULL(in igb_rd32), this is why
writel oops the kernel, you give a fine solution.

But from the oops message, we can find, register reading returns all
F's, I also have a question want to consult: when igb device is
reset, would reading register(no matter config space or non-PCIe
configuration registers) during reset returns all F's? (I guess this
is the core of my issue)


An all F's value means the read failed.  The device is likely off of the
bus and the hw_addr may not have been repopulated after the reset.

You might want to check the mailing list as I thought someone had
submitted a patch recently for one of the drivers to repopulate hw_addr
after a reset.



I guess you are saying this one:
  http://patchwork.ozlabs.org/patch/689592/

Seems they have a similar issue with me.


--
Yours Sincerely,

Cao jin




Re: [Intel-wired-lan] [PATCH] igb: drop field "tail" of struct igb_ring

2016-11-07 Thread Cao jin



On 11/08/2016 02:49 AM, Alexander Duyck wrote:

On Mon, Nov 7, 2016 at 4:44 AM, Cao jin  wrote:

Under certain condition, I find guest will oops on writel() in
igb_configure_tx_ring(), because hw->hw_address is NULL. While other
register access won't oops kernel because they use wr32/rd32 which have
a defense against NULL pointer. The oops message are as following:

 [  141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal)
 error received: id=0101
 [  141.225523] igb :01:00.1: PCIe Bus Error: severity=Uncorrected
 (Fatal), type=Unaccessible, id=0101(Unregistered Agent ID)
 [  141.299442] igb :01:00.1: broadcast error_detected message
 [  141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now
 detached
 [  141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now
 detached
 [  143.465904] pcieport :00:1c.0: Root Port link has been reset
 [  143.465994] igb :01:00.1: broadcast slot_reset message
 [  143.466039] igb :01:00.0: enabling device ( -> 0002)
 [  144.389078] igb :01:00.1: enabling device ( -> 0002)
 [  145.312078] igb :01:00.1: broadcast resume message
 [  145.322211] BUG: unable to handle kernel paging request at
 3818
 [  145.361275] IP: [] igb_configure_tx_ring+0x14d/0x280 
[igb]
 [  145.438007] Oops: 0002 [#1] SMP

On the other hand, commit 238ac817 does some optimization which
dropped the field "head". So I think it is time to drop "tail" as well.


There is a bug here, but removing tail isn't the fix.



Yes, totally agree with you. The oops issue just drive me find that 
"tail" may should be dropped as "head", at least won't oops kernel when 
this driver's bug come out.


Couldn't we remove "tail"? Removed "head" while left "tail" untouched 
seems weird, and all the other register's access go via rd32/wr32, 
except "tail", it also seems weird, isn't it?



Signed-off-by: Cao jin 
---
  drivers/net/ethernet/intel/igb/igb.h  |  1 -
  drivers/net/ethernet/intel/igb/igb_main.c | 16 +---
  2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index d11093d..0df06bc 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -247,7 +247,6 @@ struct igb_ring {
 };
 void *desc; /* descriptor ring memory */
 unsigned long flags;/* ring specific flags */
-   void __iomem *tail; /* pointer to ring tail register */
 dma_addr_t dma; /* phys address of the ring */
 unsigned int  size; /* length of desc. ring in bytes */

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index edc9a6a..e177d0e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3390,9 +3390,8 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
  tdba & 0xULL);
 wr32(E1000_TDBAH(reg_idx), tdba >> 32);

-   ring->tail = hw->hw_addr + E1000_TDT(reg_idx);


This line is where the bug is.  This should be adapter->io_addr, not
hw->hw_addr.


hw->hw_addr could be alterred to NULL(in igb_rd32), this is why writel 
oops the kernel, you give a fine solution.


But from the oops message, we can find, register reading returns all 
F's, I also have a question want to consult: when igb device is reset, 
would reading register(no matter config space or non-PCIe configuration 
registers) during reset returns all F's? (I guess this is the core of my 
issue)





 wr32(E1000_TDH(reg_idx), 0);
-   writel(0, ring->tail);
+wr32(E1000_TDT(reg_idx), 0);

 txdctl |= IGB_TX_PTHRESH;
 txdctl |= IGB_TX_HTHRESH << 8;
@@ -3729,9 +3728,8 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
  ring->count * sizeof(union e1000_adv_rx_desc));

 /* initialize head and tail */
-   ring->tail = hw->hw_addr + E1000_RDT(reg_idx);


Same thing here.  It looks like the wrong values where used.


 wr32(E1000_RDH(reg_idx), 0);
-   writel(0, ring->tail);
+   wr32(E1000_RDT(reg_idx), 0);

 /* set descriptor configuration */


Would you prefer to submit the patch for this or should I?  Basically
all you need to do is change the two lines where ring->tail is
populated so that you use adapter->io_addr instead of hw->hw_addr.

Thanks.

- Alex


.



--
Yours Sincerely,

Cao jin




[PATCH] igb: drop field "tail" of struct igb_ring

2016-11-07 Thread Cao jin
Under certain condition, I find guest will oops on writel() in
igb_configure_tx_ring(), because hw->hw_address is NULL. While other
register access won't oops kernel because they use wr32/rd32 which have
a defense against NULL pointer. The oops message are as following:

[  141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal)
error received: id=0101
[  141.225523] igb :01:00.1: PCIe Bus Error: severity=Uncorrected
(Fatal), type=Unaccessible, id=0101(Unregistered Agent ID)
[  141.299442] igb :01:00.1: broadcast error_detected message
[  141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now
detached
[  141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now
detached
[  143.465904] pcieport :00:1c.0: Root Port link has been reset
[  143.465994] igb :01:00.1: broadcast slot_reset message
[  143.466039] igb :01:00.0: enabling device ( -> 0002)
[  144.389078] igb :01:00.1: enabling device ( -> 0002)
[  145.312078] igb :01:00.1: broadcast resume message
[  145.322211] BUG: unable to handle kernel paging request at
3818
[  145.361275] IP: [] igb_configure_tx_ring+0x14d/0x280 
[igb]
[  145.438007] Oops: 0002 [#1] SMP

On the other hand, commit 238ac817 does some optimization which
dropped the field "head". So I think it is time to drop "tail" as well.

Signed-off-by: Cao jin 
---
 drivers/net/ethernet/intel/igb/igb.h  |  1 -
 drivers/net/ethernet/intel/igb/igb_main.c | 16 +---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index d11093d..0df06bc 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -247,7 +247,6 @@ struct igb_ring {
};
void *desc; /* descriptor ring memory */
unsigned long flags;/* ring specific flags */
-   void __iomem *tail; /* pointer to ring tail register */
dma_addr_t dma; /* phys address of the ring */
unsigned int  size; /* length of desc. ring in bytes */
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index edc9a6a..e177d0e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3390,9 +3390,8 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
 tdba & 0xULL);
wr32(E1000_TDBAH(reg_idx), tdba >> 32);
 
-   ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
wr32(E1000_TDH(reg_idx), 0);
-   writel(0, ring->tail);
+wr32(E1000_TDT(reg_idx), 0);
 
txdctl |= IGB_TX_PTHRESH;
txdctl |= IGB_TX_HTHRESH << 8;
@@ -3729,9 +3728,8 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 ring->count * sizeof(union e1000_adv_rx_desc));
 
/* initialize head and tail */
-   ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
wr32(E1000_RDH(reg_idx), 0);
-   writel(0, ring->tail);
+   wr32(E1000_RDT(reg_idx), 0);
 
/* set descriptor configuration */
srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
@@ -5130,6 +5128,8 @@ static void igb_tx_map(struct igb_ring *tx_ring,
u32 tx_flags = first->tx_flags;
u32 cmd_type = igb_tx_cmd_type(skb, tx_flags);
u16 i = tx_ring->next_to_use;
+struct igb_adapter *adapter = netdev_priv(tx_ring->netdev);
+struct e1000_hw *hw = &adapter->hw;
 
tx_desc = IGB_TX_DESC(tx_ring, i);
 
@@ -5223,7 +5223,7 @@ static void igb_tx_map(struct igb_ring *tx_ring,
igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-   writel(i, tx_ring->tail);
+wr32(E1000_TDT(tx_ring->reg_idx), i);
 
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
@@ -6756,7 +6756,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector 
*q_vector, int napi_budget)
"  desc.status  <%x>\n",
tx_ring->queue_index,
rd32(E1000_TDH(tx_ring->reg_idx)),
-   readl(tx_ring->tail),
+   rd32(E1000_TDT(tx_ring->reg_idx)),
tx_ring->next_to_use,
tx_ring->next_to_clean,
tx_buffer->time_stamp,
@@ -7265,6 +7265,8 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 
cleaned_count)
union e1000_adv_rx_desc *rx_desc;
struct ig

[PATCH] igb/e1000: correct register comments

2016-11-02 Thread Cao jin
Signed-off-by: Cao jin 
---
 drivers/net/ethernet/intel/igb/e1000_regs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h 
b/drivers/net/ethernet/intel/igb/e1000_regs.h
index d84afdd..58adbf2 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -320,7 +320,7 @@
 #define E1000_VT_CTL   0x0581C  /* VMDq Control - RW */
 #define E1000_WUC  0x05800  /* Wakeup Control - RW */
 #define E1000_WUFC 0x05808  /* Wakeup Filter Control - RW */
-#define E1000_WUS  0x05810  /* Wakeup Status - RO */
+#define E1000_WUS  0x05810  /* Wakeup Status - R/W1C */
 #define E1000_MANC 0x05820  /* Management Control - RW */
 #define E1000_IPAV 0x05838  /* IP Address Valid - RW */
 #define E1000_WUPL 0x05900  /* Wakeup Packet Length - RW */
-- 
2.1.0