Re: Problem with PHY state machine when using interrupts

2017-07-25 Thread Florian Fainelli
On July 25, 2017 4:41:32 AM PDT, Mason  wrote:
>On 25/07/2017 12:51, Mason wrote:
>
>> Moving the call to phy_stop() down after all the MAC tear down
>> avoids the hang.
>> 
>> As far as I understand, when we are shutting everything down,
>> we don't need the phy_state_machine to run asynchronously.
>> We can run it synchronously one last time after the delayed
>> stuff has been disabled.
>
>Below is my current WIP diff. (It conflates the two issues
>I've been discussing. Splitting the diff required.)
>
>Tested in interrupt mode:
>
># ip link set eth0 up
>[   11.107547] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change UP -> AN
>[   14.530329] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full -
>flow control rx/tx
>[   14.538136] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change AN -> RUNNING
># ip link set eth0 down
>[   23.801018] nb8800 26000.ethernet eth0: Link is Down
># ip link set eth0 up
>[   28.740870] Atheros 8035 ethernet UP26000.nb8800-mii:04: PHY state
>change UP -> AN
>[   31.431528] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full -
>flow control rx/tx
>[   31.439350] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change AN -> RUNNING
>
>Works as expected.

It does indeed, although this probably also contains your change that only logs 
the PHY state machine transitions, which makes me wonder why the UP -> HALTED 
state is not logged?

>
>Tested in polling mode:
>
># ip link set eth0 up
>[   23.001199] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change UP -> AN
>[   24.024315] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change AN -> NOLINK
>[   27.064355] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full -
>flow control rx/tx
>[   27.072156] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change NOLINK -> RUNNING
># ip link set eth0 down
>[   42.134617] nb8800 26000.ethernet eth0: Link is Down
># ip link set eth0 up
>[   48.381185] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change UP -> AN
>[   49.410976] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change AN -> NOLINK
>[   51.437686] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full -
>flow control rx/tx
>[   51.445486] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state
>change NOLINK -> RUNNING
>
>Works as expected.
>
>Also tested on my old board, no regression seen.
>
>Can you confirm that the changes to drivers/net/phy/phy.c
>look reasonable?

The flush is correct, but I am not sure about the explicit state change, in two 
ways:

- in polling mode, we should already be reaching that state with the flush call 
AFAICT
- in interrupt driven mode (phy_interrupt_is_valid or PHY_IGNORE_INTERRUPT) we 
do indeed need to make sure the HALTED state is reached

Not sure why I did not see the interrupt imbalance problem with 
phy_mac_interrupt...

I am out today but will follow up tomorrow. thanks!

-- 
Florian


Re: Problem with PHY state machine when using interrupts

2017-07-25 Thread Mason
On 25/07/2017 12:51, Mason wrote:

> Moving the call to phy_stop() down after all the MAC tear down
> avoids the hang.
> 
> As far as I understand, when we are shutting everything down,
> we don't need the phy_state_machine to run asynchronously.
> We can run it synchronously one last time after the delayed
> stuff has been disabled.

Below is my current WIP diff. (It conflates the two issues
I've been discussing. Splitting the diff required.)

Tested in interrupt mode:

# ip link set eth0 up
[   11.107547] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change UP 
-> AN
[   14.530329] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   14.538136] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change AN 
-> RUNNING
# ip link set eth0 down
[   23.801018] nb8800 26000.ethernet eth0: Link is Down
# ip link set eth0 up
[   28.740870] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change UP 
-> AN
[   31.431528] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   31.439350] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change AN 
-> RUNNING

Works as expected.

Tested in polling mode:

# ip link set eth0 up
[   23.001199] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change UP 
-> AN
[   24.024315] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change AN 
-> NOLINK
[   27.064355] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   27.072156] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change 
NOLINK -> RUNNING
# ip link set eth0 down
[   42.134617] nb8800 26000.ethernet eth0: Link is Down
# ip link set eth0 up
[   48.381185] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change UP 
-> AN
[   49.410976] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change AN 
-> NOLINK
[   51.437686] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   51.445486] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change 
NOLINK -> RUNNING

Works as expected.

Also tested on my old board, no regression seen.

Can you confirm that the changes to drivers/net/phy/phy.c
look reasonable?

Regards.


diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index e94159507847..b5eba7958871 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -41,6 +41,7 @@
 
 static void nb8800_tx_done(struct net_device *dev);
 static int nb8800_dma_stop(struct net_device *dev);
+static int nb8800_init(struct net_device *dev);
 
 static inline u8 nb8800_readb(struct nb8800_priv *priv, int reg)
 {
@@ -957,6 +958,9 @@ static int nb8800_open(struct net_device *dev)
struct phy_device *phydev;
int err;
 
+   nb8800_init(dev);
+   priv->speed = 0;
+
/* clear any pending interrupts */
nb8800_writel(priv, NB8800_RXC_SR, 0xf);
nb8800_writel(priv, NB8800_TXC_SR, 0xf);
@@ -1004,8 +1008,6 @@ static int nb8800_stop(struct net_device *dev)
struct nb8800_priv *priv = netdev_priv(dev);
struct phy_device *phydev = dev->phydev;
 
-   phy_stop(phydev);
-
netif_stop_queue(dev);
napi_disable(>napi);
 
@@ -1013,6 +1015,7 @@ static int nb8800_stop(struct net_device *dev)
nb8800_mac_rx(dev, false);
nb8800_mac_tx(dev, false);
 
+   phy_stop(phydev);
phy_disconnect(phydev);
 
free_irq(dev->irq, dev);
@@ -1350,6 +1353,21 @@ static int nb8800_tango4_init(struct net_device *dev)
 };
 MODULE_DEVICE_TABLE(of, nb8800_dt_ids);
 
+static int nb8800_init(struct net_device *dev)
+{
+#ifndef RESET_IN_PROBE
+   struct nb8800_priv *priv = netdev_priv(dev);
+   const struct nb8800_ops *ops = priv->ops;
+
+   ops->reset(dev);
+   nb8800_hw_init(dev);
+   ops->init(dev);
+   nb8800_update_mac_addr(dev);
+#endif
+
+   return 0;
+}
+
 static int nb8800_probe(struct platform_device *pdev)
 {
const struct of_device_id *match;
@@ -1389,6 +1407,7 @@ static int nb8800_probe(struct platform_device *pdev)
 
priv = netdev_priv(dev);
priv->base = base;
+   priv->ops = ops;
 
priv->phy_mode = of_get_phy_mode(pdev->dev.of_node);
if (priv->phy_mode < 0)
@@ -1407,11 +1426,13 @@ static int nb8800_probe(struct platform_device *pdev)
 
spin_lock_init(>tx_lock);
 
+#ifdef RESET_IN_PROBE
if (ops && ops->reset) {
ret = ops->reset(dev);
if (ret)
goto err_disable_clk;
}
+#endif
 
bus = devm_mdiobus_alloc(>dev);
if (!bus) {
@@ -1454,6 +1475,7 @@ static int nb8800_probe(struct platform_device *pdev)
 
priv->mii_bus = bus;
 
+#ifdef RESET_IN_PROBE
ret = nb8800_hw_init(dev);
if (ret)
goto err_deregister_fixed_link;
@@ -1463,6 +1485,7 @@ static int nb8800_probe(struct platform_device *pdev)
if (ret)
goto 

Re: Problem with PHY state machine when using interrupts

2017-07-25 Thread Mason
On 25/07/2017 00:39, Florian Fainelli wrote:

> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index d0626bf5c540..652e24b53f3f 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -968,6 +968,8 @@ void phy_stop(struct phy_device *phydev)
>  * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()
>  * will not reenable interrupts.
>  */
> +   if (phy_interrupt_is_valid(phydev))
> +   phy_change(phydev);
>  }
>  EXPORT_SYMBOL(phy_stop);
> 

When setting link down:
- kernel logs Unbalanced enable for IRQ 30 (PHY interrupt)
- serial console hangs

# ip link set eth0 up
[  125.800057] ENTER nb8800_tangox_reset
[  125.803790] ++ETH++ gw8  reg=f0026424 val=00
[  125.817446] ++ETH++ gw8  reg=f0026424 val=01
[  125.821768] ++ETH++ gw16 reg=f0026420 val=0050
[  125.826241] ENTER nb8800_hw_init
[  125.829507] ++ETH++ gw8  reg=f0026000 val=1c
[  125.833808] ++ETH++ gw8  reg=f0026001 val=05
[  125.838122] ++ETH++ gw8  reg=f0026004 val=22
[  125.842421] ++ETH++ gw8  reg=f0026008 val=04
[  125.846717] ++ETH++ gw8  reg=f0026014 val=0c
[  125.851015] ++ETH++ gw8  reg=f0026051 val=00
[  125.855310] ++ETH++ gw8  reg=f0026052 val=ff
[  125.859607] ++ETH++ gw8  reg=f0026054 val=40
[  125.863903] ++ETH++ gw8  reg=f0026060 val=00
[  125.868226] ++ETH++ gw8  reg=f0026061 val=c3
[  125.872534] ENTER nb8800_mc_init
[  125.875800] ++ETH++ gw8  reg=f002602e val=00
[  125.880096] ENTER nb8800_tangox_init
[  125.883697] ++ETH++ gw8  reg=f0026400 val=01
[  125.887993] ENTER nb8800_tango4_init
[  125.891592] ENTER nb8800_update_mac_addr
[  125.895538] ++ETH++ gw8  reg=f002606a val=00
[  125.899835] ++ETH++ gw8  reg=f002606b val=16
[  125.904132] ++ETH++ gw8  reg=f002606c val=e8
[  125.908429] ++ETH++ gw8  reg=f002606d val=5e
[  125.912725] ++ETH++ gw8  reg=f002606e val=65
[  125.917022] ++ETH++ gw8  reg=f002606f val=bc
[  125.921319] ++ETH++ gw8  reg=f002603c val=00
[  125.925618] ++ETH++ gw8  reg=f002603d val=16
[  125.929912] ++ETH++ gw8  reg=f002603e val=e8
[  125.934210] ++ETH++ gw8  reg=f002603f val=5e
[  125.938505] ++ETH++ gw8  reg=f0026040 val=65
[  125.942802] ++ETH++ gw8  reg=f0026041 val=bc
[  125.947095] ENTER nb8800_open
[  125.950082] ENTER nb8800_dma_init
[  125.954025] ++ETH++ gw8  reg=f0026004 val=23
[  125.958331] ++ETH++ gw8  reg=f0026000 val=1d
[  126.027848] ENTER nb8800_set_rx_mode
[  126.031451] ENTER nb8800_mc_init
[  126.034702] ++ETH++ gw8  reg=f002602e val=00
[  126.039334] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change UP 
-> AN
[  126.046838] ENTER nb8800_set_rx_mode
[  126.050438] ENTER nb8800_mc_init
[  126.053688] ++ETH++ gw8  reg=f002602e val=00
[  126.057983] ++ETH++ gw8  reg=f0026028 val=01
[  126.062279] ++ETH++ gw8  reg=f0026029 val=00
[  126.066573] ++ETH++ gw8  reg=f002602a val=5e
[  126.070868] ++ETH++ gw8  reg=f002602b val=00
[  126.075162] ++ETH++ gw8  reg=f002602c val=00
[  126.079457] ++ETH++ gw8  reg=f002602d val=01
[  126.083751] ENTER nb8800_mc_init
[  126.086997] ++ETH++ gw8  reg=f002602e val=ff
[  129.426741] ENTER nb8800_link_reconfigure
[  129.430800] PRIV link=0 speed=0 duplex=0
[  129.434753] PHYDEV link=1 speed=1000 duplex=1
[  129.439141] ENTER nb8800_mac_config
[  129.442662] ++ETH++ gw8  reg=f0026050 val=01
[  129.446962] ++ETH++ gw8  reg=f002601c val=ff
[  129.451262] ++ETH++ gw8  reg=f0026044 val=81
[  129.455563] ENTER nb8800_pause_config
[  129.459252] ++ETH++ gw8  reg=f0026004 val=2b
[  129.463558] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[  129.471355] Atheros 8035 ethernet 26000.nb8800-mii:04: PHY state change AN 
-> RUNNING


# ip link set eth0 down
[  170.933454] ENTER nb8800_set_rx_mode
[  170.937067] ENTER nb8800_mc_init
[  170.940318] ++ETH++ gw8  reg=f002602e val=00
[  170.944613] ++ETH++ gw8  reg=f0026028 val=01
[  170.948907] ++ETH++ gw8  reg=f0026029 val=00
[  170.953199] ++ETH++ gw8  reg=f002602a val=5e
[  170.957494] ++ETH++ gw8  reg=f002602b val=00
[  170.961786] ++ETH++ gw8  reg=f002602c val=00
[  170.966079] ++ETH++ gw8  reg=f002602d val=01
[  170.970371] ENTER nb8800_mc_init
[  170.973616] ++ETH++ gw8  reg=f002602e val=ff
[  170.977962] ENTER nb8800_stop
[  170.981210] [ cut here ]
[  170.985860] WARNING: CPU: 0 PID: 964 at kernel/irq/manage.c:498 
__enable_irq+0x44/0x68
[  170.993815] Unbalanced enable for IRQ 30
[  170.997751] Modules linked in:
[  171.000821] CPU: 0 PID: 964 Comm: ip Not tainted 4.13.0-rc1 #154
[  171.006854] Hardware name: Sigma Tango DT
[  171.010896] [] (unwind_backtrace) from [] 
(show_stack+0x10/0x14)
[  171.018686] [] (show_stack) from [] 
(dump_stack+0x78/0x8c)
[  171.025950] [] (dump_stack) from [] (__warn+0xe8/0x100)
[  171.032948] [] (__warn) from [] 
(warn_slowpath_fmt+0x38/0x48)
[  171.040471] [] (warn_slowpath_fmt) from [] 
(__enable_irq+0x44/0x68)
[  171.048519] [] (__enable_irq) from [] 
(enable_irq+0x34/0x6c)
[  171.055956] [] (enable_irq) from [] 
(phy_change+0x98/0x13c)  

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 03:59 PM, Florian Fainelli wrote:
> On 07/24/2017 03:53 PM, Mason wrote:
>> On 25/07/2017 00:36, Florian Fainelli wrote:
>>> On 07/24/2017 02:20 PM, Mason wrote:
 On 24/07/2017 21:53, Florian Fainelli wrote:

> Well now that I see the possible interrupts generated, I indeed don't
> see how you can get a link down notification unless you somehow force
> the link down yourself, which would certainly happen in phy_suspend()
> when we set BMCR.pwrdwn, but that may be too late.
>
> You should still expect the adjust_link() function to be called though
> with PHY_HALTED being set and that takes care of doing phydev->link = 0
> and netif_carrier_off(). If that still does not work, then see whether
> removing the call to phy_stop() does help (it really should).

 The only functions setting phydev->state to PHY_HALTED
 are phy_error() and phy_stop() AFAICT.

 I am aware that when phy_state_machine() handles the
 PHY_HALTED state, it will set phydev->link = 0;
 and call netif_carrier_off() -- because that's where
 I copied that code from.

 My issue is that phy_state_machine() does not run when
 I run 'ip set link dev eth0 down' from the command line.
>>>
>>> Yes, that much is clear, which is why I suggested earlier you try the
>>> patch at the end now.
>>>

 If I'm reading the code right, phy_disconnect() actually
 stops the state machine.

 In interrupt mode, phy_state_machine() doesn't run
 because no interrupt is generated.

 In polling mode, phy_state_machine() doesn't run
 because phy_disconnect() stops the state machine.

 Introducing a sleep before phy_disconnect() gives
 the state machine a chance to run in polling mode,
 but it doesn't feel right, and doesn't fix the
 other mode, which I'm using.
>>>
>>> There are several problems it seems:
>>>
>>> - the PHY state machine cannot move solely based on the PHY generating
>>> interrupts during phy_stop() if none of the changing conditions for the
>>> HW have changed, come to think about it, I doubt any PHY would be
>>> capable of doing something like that
>>>
>>> - there is an expectation from your driver to have adjust_link() run
>>> sometime during ndo_stop() to do something, but why?
>>>
>>> What is special about nb8800 that interrupts should be generated during
>>> ndo_stop(), and why do you think this is going to solve your problem?
>>>

 Looking at bcm_enet_stop() it calls phy_stop() and
 phy_disconnect() just like the nb8800 driver...

 I'm stumped.
>>>
>>> My suggestion of not using phy_stop() was not a good one, but
>>> functionally there is little difference in doing phy_stop() +
>>> phy_disconnect() or just phy_disconnect() alone. What matters is that we
>>> restart the PHY properly when phy_connect() or phy_start() is called.
>>>
>>> What I understand now from your "bug report" is that you want
>>> adjust_link to run with phydev->link = 0 to do something during
>>> ndo_close() but you have not explained why, but I suspect such that upon
>>> ndo_open() things work again.
>>>
>>> What I suspect your bug is, is that the really was no change in link
>>> status, no interrupt was generated because there should not be one, yet
>>> some of what nb8800_stop() does is not properly balanced by
>>> nb8800_open(). How about the following patch:
>>>
>>> diff --git a/drivers/net/ethernet/aurora/nb8800.c
>>> b/drivers/net/ethernet/aurora/nb8800.c
>>> index 041cfb7952f8..b07dea3ab019 100644
>>> --- a/drivers/net/ethernet/aurora/nb8800.c
>>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>>> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
>>> nb8800_mac_rx(dev, true);
>>> nb8800_mac_tx(dev, true);
>>>
>>> +   priv->speed = -1;
>>> +   priv->link = -1;
>>> +   priv->duplex = -1;
>>> +
>>> phydev = of_phy_connect(dev, priv->phy_node,
>>> nb8800_link_reconfigure, 0,
>>> priv->phy_mode);
>>>
>>
>> I will test your two patches ASAP.
>>
>> For the record, I have two (maybe related) issues:
>>
>> 1) After a sequence of
>> ip set link dev eth0 up
>> ip set link dev eth0 down
>> ip set link dev eth0 up
>> RX engine is hosed, thus network connectivity is borked.
>> The work-around is resetting the MAC in ndo_open
>> => mac_init in my proposed patch.
>> This is by far the biggest issue.
>> Also, resetting the MAC in ndo_open will make it easy
>> to implement suspend/resume, as I can just ndo_stop
>> in suspend, and ndo_open in resume.
> 
> OK.
> 
>>
>> 2) The system does not print "Link down" when I run
>> 'ip set link dev eth0 down'
>> This is a symptom of nb8800_link_reconfigure()
>> not being called at all (or being called
>> with phydev->link == priv->link when I tried
>> skipping phy_stop)
> 
> Correct, and there is actually a timing hazard that I could also
> reproduce here in that 

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 03:53 PM, Mason wrote:
> On 25/07/2017 00:36, Florian Fainelli wrote:
>> On 07/24/2017 02:20 PM, Mason wrote:
>>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>>
 Well now that I see the possible interrupts generated, I indeed don't
 see how you can get a link down notification unless you somehow force
 the link down yourself, which would certainly happen in phy_suspend()
 when we set BMCR.pwrdwn, but that may be too late.

 You should still expect the adjust_link() function to be called though
 with PHY_HALTED being set and that takes care of doing phydev->link = 0
 and netif_carrier_off(). If that still does not work, then see whether
 removing the call to phy_stop() does help (it really should).
>>>
>>> The only functions setting phydev->state to PHY_HALTED
>>> are phy_error() and phy_stop() AFAICT.
>>>
>>> I am aware that when phy_state_machine() handles the
>>> PHY_HALTED state, it will set phydev->link = 0;
>>> and call netif_carrier_off() -- because that's where
>>> I copied that code from.
>>>
>>> My issue is that phy_state_machine() does not run when
>>> I run 'ip set link dev eth0 down' from the command line.
>>
>> Yes, that much is clear, which is why I suggested earlier you try the
>> patch at the end now.
>>
>>>
>>> If I'm reading the code right, phy_disconnect() actually
>>> stops the state machine.
>>>
>>> In interrupt mode, phy_state_machine() doesn't run
>>> because no interrupt is generated.
>>>
>>> In polling mode, phy_state_machine() doesn't run
>>> because phy_disconnect() stops the state machine.
>>>
>>> Introducing a sleep before phy_disconnect() gives
>>> the state machine a chance to run in polling mode,
>>> but it doesn't feel right, and doesn't fix the
>>> other mode, which I'm using.
>>
>> There are several problems it seems:
>>
>> - the PHY state machine cannot move solely based on the PHY generating
>> interrupts during phy_stop() if none of the changing conditions for the
>> HW have changed, come to think about it, I doubt any PHY would be
>> capable of doing something like that
>>
>> - there is an expectation from your driver to have adjust_link() run
>> sometime during ndo_stop() to do something, but why?
>>
>> What is special about nb8800 that interrupts should be generated during
>> ndo_stop(), and why do you think this is going to solve your problem?
>>
>>>
>>> Looking at bcm_enet_stop() it calls phy_stop() and
>>> phy_disconnect() just like the nb8800 driver...
>>>
>>> I'm stumped.
>>
>> My suggestion of not using phy_stop() was not a good one, but
>> functionally there is little difference in doing phy_stop() +
>> phy_disconnect() or just phy_disconnect() alone. What matters is that we
>> restart the PHY properly when phy_connect() or phy_start() is called.
>>
>> What I understand now from your "bug report" is that you want
>> adjust_link to run with phydev->link = 0 to do something during
>> ndo_close() but you have not explained why, but I suspect such that upon
>> ndo_open() things work again.
>>
>> What I suspect your bug is, is that the really was no change in link
>> status, no interrupt was generated because there should not be one, yet
>> some of what nb8800_stop() does is not properly balanced by
>> nb8800_open(). How about the following patch:
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index 041cfb7952f8..b07dea3ab019 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
>> nb8800_mac_rx(dev, true);
>> nb8800_mac_tx(dev, true);
>>
>> +   priv->speed = -1;
>> +   priv->link = -1;
>> +   priv->duplex = -1;
>> +
>> phydev = of_phy_connect(dev, priv->phy_node,
>> nb8800_link_reconfigure, 0,
>> priv->phy_mode);
>>
> 
> I will test your two patches ASAP.
> 
> For the record, I have two (maybe related) issues:
> 
> 1) After a sequence of
> ip set link dev eth0 up
> ip set link dev eth0 down
> ip set link dev eth0 up
> RX engine is hosed, thus network connectivity is borked.
> The work-around is resetting the MAC in ndo_open
> => mac_init in my proposed patch.
> This is by far the biggest issue.
> Also, resetting the MAC in ndo_open will make it easy
> to implement suspend/resume, as I can just ndo_stop
> in suspend, and ndo_open in resume.

OK.

> 
> 2) The system does not print "Link down" when I run
> 'ip set link dev eth0 down'
> This is a symptom of nb8800_link_reconfigure()
> not being called at all (or being called
> with phydev->link == priv->link when I tried
> skipping phy_stop)

Correct, and there is actually a timing hazard that I could also
reproduce here in that phy_stop() + phy_disconnect(), will try to cook a
patch fixing that as well, since that seems highly undesirable.

> 
> Again, thanks for your patches and suggestions,
> 

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason
On 25/07/2017 00:36, Florian Fainelli wrote:
> On 07/24/2017 02:20 PM, Mason wrote:
>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>
>>> Well now that I see the possible interrupts generated, I indeed don't
>>> see how you can get a link down notification unless you somehow force
>>> the link down yourself, which would certainly happen in phy_suspend()
>>> when we set BMCR.pwrdwn, but that may be too late.
>>>
>>> You should still expect the adjust_link() function to be called though
>>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>>> and netif_carrier_off(). If that still does not work, then see whether
>>> removing the call to phy_stop() does help (it really should).
>>
>> The only functions setting phydev->state to PHY_HALTED
>> are phy_error() and phy_stop() AFAICT.
>>
>> I am aware that when phy_state_machine() handles the
>> PHY_HALTED state, it will set phydev->link = 0;
>> and call netif_carrier_off() -- because that's where
>> I copied that code from.
>>
>> My issue is that phy_state_machine() does not run when
>> I run 'ip set link dev eth0 down' from the command line.
> 
> Yes, that much is clear, which is why I suggested earlier you try the
> patch at the end now.
> 
>>
>> If I'm reading the code right, phy_disconnect() actually
>> stops the state machine.
>>
>> In interrupt mode, phy_state_machine() doesn't run
>> because no interrupt is generated.
>>
>> In polling mode, phy_state_machine() doesn't run
>> because phy_disconnect() stops the state machine.
>>
>> Introducing a sleep before phy_disconnect() gives
>> the state machine a chance to run in polling mode,
>> but it doesn't feel right, and doesn't fix the
>> other mode, which I'm using.
> 
> There are several problems it seems:
> 
> - the PHY state machine cannot move solely based on the PHY generating
> interrupts during phy_stop() if none of the changing conditions for the
> HW have changed, come to think about it, I doubt any PHY would be
> capable of doing something like that
> 
> - there is an expectation from your driver to have adjust_link() run
> sometime during ndo_stop() to do something, but why?
> 
> What is special about nb8800 that interrupts should be generated during
> ndo_stop(), and why do you think this is going to solve your problem?
> 
>>
>> Looking at bcm_enet_stop() it calls phy_stop() and
>> phy_disconnect() just like the nb8800 driver...
>>
>> I'm stumped.
> 
> My suggestion of not using phy_stop() was not a good one, but
> functionally there is little difference in doing phy_stop() +
> phy_disconnect() or just phy_disconnect() alone. What matters is that we
> restart the PHY properly when phy_connect() or phy_start() is called.
> 
> What I understand now from your "bug report" is that you want
> adjust_link to run with phydev->link = 0 to do something during
> ndo_close() but you have not explained why, but I suspect such that upon
> ndo_open() things work again.
> 
> What I suspect your bug is, is that the really was no change in link
> status, no interrupt was generated because there should not be one, yet
> some of what nb8800_stop() does is not properly balanced by
> nb8800_open(). How about the following patch:
> 
> diff --git a/drivers/net/ethernet/aurora/nb8800.c
> b/drivers/net/ethernet/aurora/nb8800.c
> index 041cfb7952f8..b07dea3ab019 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
> nb8800_mac_rx(dev, true);
> nb8800_mac_tx(dev, true);
> 
> +   priv->speed = -1;
> +   priv->link = -1;
> +   priv->duplex = -1;
> +
> phydev = of_phy_connect(dev, priv->phy_node,
> nb8800_link_reconfigure, 0,
> priv->phy_mode);
> 

I will test your two patches ASAP.

For the record, I have two (maybe related) issues:

1) After a sequence of
ip set link dev eth0 up
ip set link dev eth0 down
ip set link dev eth0 up
RX engine is hosed, thus network connectivity is borked.
The work-around is resetting the MAC in ndo_open
=> mac_init in my proposed patch.
This is by far the biggest issue.
Also, resetting the MAC in ndo_open will make it easy
to implement suspend/resume, as I can just ndo_stop
in suspend, and ndo_open in resume.

2) The system does not print "Link down" when I run
'ip set link dev eth0 down'
This is a symptom of nb8800_link_reconfigure()
not being called at all (or being called
with phydev->link == priv->link when I tried
skipping phy_stop)

Again, thanks for your patches and suggestions,
which I will test in the morning.

I will also try to understand why I didn't have
these issues on the other board...

Regards.


Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 03:36 PM, Florian Fainelli wrote:
> On 07/24/2017 02:20 PM, Mason wrote:
>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>
>>> Well now that I see the possible interrupts generated, I indeed don't
>>> see how you can get a link down notification unless you somehow force
>>> the link down yourself, which would certainly happen in phy_suspend()
>>> when we set BMCR.pwrdwn, but that may be too late.
>>>
>>> You should still expect the adjust_link() function to be called though
>>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>>> and netif_carrier_off(). If that still does not work, then see whether
>>> removing the call to phy_stop() does help (it really should).
>>
>> The only functions setting phydev->state to PHY_HALTED
>> are phy_error() and phy_stop() AFAICT.
>>
>> I am aware that when phy_state_machine() handles the
>> PHY_HALTED state, it will set phydev->link = 0;
>> and call netif_carrier_off() -- because that's where
>> I copied that code from.
>>
>> My issue is that phy_state_machine() does not run when
>> I run 'ip set link dev eth0 down' from the command line.
> 
> Yes, that much is clear, which is why I suggested earlier you try the
> patch at the end now.

This sentence was referring to this patch that I changed mid way through
this email:

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..652e24b53f3f 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -968,6 +968,8 @@ void phy_stop(struct phy_device *phydev)
 * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()
 * will not reenable interrupts.
 */
+   if (phy_interrupt_is_valid(phydev))
+   phy_change(phydev);
 }
 EXPORT_SYMBOL(phy_stop);
-- 
Florian


Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 02:20 PM, Mason wrote:
> On 24/07/2017 21:53, Florian Fainelli wrote:
> 
>> Well now that I see the possible interrupts generated, I indeed don't
>> see how you can get a link down notification unless you somehow force
>> the link down yourself, which would certainly happen in phy_suspend()
>> when we set BMCR.pwrdwn, but that may be too late.
>>
>> You should still expect the adjust_link() function to be called though
>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>> and netif_carrier_off(). If that still does not work, then see whether
>> removing the call to phy_stop() does help (it really should).
> 
> The only functions setting phydev->state to PHY_HALTED
> are phy_error() and phy_stop() AFAICT.
> 
> I am aware that when phy_state_machine() handles the
> PHY_HALTED state, it will set phydev->link = 0;
> and call netif_carrier_off() -- because that's where
> I copied that code from.
> 
> My issue is that phy_state_machine() does not run when
> I run 'ip set link dev eth0 down' from the command line.

Yes, that much is clear, which is why I suggested earlier you try the
patch at the end now.

> 
> If I'm reading the code right, phy_disconnect() actually
> stops the state machine.
> 
> In interrupt mode, phy_state_machine() doesn't run
> because no interrupt is generated.
> 
> In polling mode, phy_state_machine() doesn't run
> because phy_disconnect() stops the state machine.
> 
> Introducing a sleep before phy_disconnect() gives
> the state machine a chance to run in polling mode,
> but it doesn't feel right, and doesn't fix the
> other mode, which I'm using.

There are several problems it seems:

- the PHY state machine cannot move solely based on the PHY generating
interrupts during phy_stop() if none of the changing conditions for the
HW have changed, come to think about it, I doubt any PHY would be
capable of doing something like that

- there is an expectation from your driver to have adjust_link() run
sometime during ndo_stop() to do something, but why?

What is special about nb8800 that interrupts should be generated during
ndo_stop(), and why do you think this is going to solve your problem?

> 
> Looking at bcm_enet_stop() it calls phy_stop() and
> phy_disconnect() just like the nb8800 driver...
> 
> I'm stumped.

My suggestion of not using phy_stop() was not a good one, but
functionally there is little difference in doing phy_stop() +
phy_disconnect() or just phy_disconnect() alone. What matters is that we
restart the PHY properly when phy_connect() or phy_start() is called.

What I understand now from your "bug report" is that you want
adjust_link to run with phydev->link = 0 to do something during
ndo_close() but you have not explained why, but I suspect such that upon
ndo_open() things work again.

What I suspect your bug is, is that the really was no change in link
status, no interrupt was generated because there should not be one, yet
some of what nb8800_stop() does is not properly balanced by
nb8800_open(). How about the following patch:

diff --git a/drivers/net/ethernet/aurora/nb8800.c
b/drivers/net/ethernet/aurora/nb8800.c
index 041cfb7952f8..b07dea3ab019 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
nb8800_mac_rx(dev, true);
nb8800_mac_tx(dev, true);

+   priv->speed = -1;
+   priv->link = -1;
+   priv->duplex = -1;
+
phydev = of_phy_connect(dev, priv->phy_node,
nb8800_link_reconfigure, 0,
priv->phy_mode);
-- 
Florian


Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason
On 24/07/2017 21:53, Florian Fainelli wrote:

> Well now that I see the possible interrupts generated, I indeed don't
> see how you can get a link down notification unless you somehow force
> the link down yourself, which would certainly happen in phy_suspend()
> when we set BMCR.pwrdwn, but that may be too late.
> 
> You should still expect the adjust_link() function to be called though
> with PHY_HALTED being set and that takes care of doing phydev->link = 0
> and netif_carrier_off(). If that still does not work, then see whether
> removing the call to phy_stop() does help (it really should).

The only functions setting phydev->state to PHY_HALTED
are phy_error() and phy_stop() AFAICT.

I am aware that when phy_state_machine() handles the
PHY_HALTED state, it will set phydev->link = 0;
and call netif_carrier_off() -- because that's where
I copied that code from.

My issue is that phy_state_machine() does not run when
I run 'ip set link dev eth0 down' from the command line.

If I'm reading the code right, phy_disconnect() actually
stops the state machine.

In interrupt mode, phy_state_machine() doesn't run
because no interrupt is generated.

In polling mode, phy_state_machine() doesn't run
because phy_disconnect() stops the state machine.

Introducing a sleep before phy_disconnect() gives
the state machine a chance to run in polling mode,
but it doesn't feel right, and doesn't fix the
other mode, which I'm using.

Looking at bcm_enet_stop() it calls phy_stop() and
phy_disconnect() just like the nb8800 driver...

I'm stumped.

Regards.


Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 12:32 PM, Florian Fainelli wrote:
> On 07/24/2017 12:13 PM, Mason wrote:
>> On 24/07/2017 18:49, Florian Fainelli wrote:
>>
>>> On 07/24/2017 08:01 AM, Mason wrote:
>>>
> When I set the link down via 'ip link set eth0 down'
> (as opposed to pulling the Ethernet cable) things don't happen as 
> expected:
>
> The driver's adjust_link() callback is never called, and doesn't
> get a chance make some required changes. And when I set the link
> up again, there is no network connectivity.
>
> I get this problem only if I enable interrupts on my PHY.
> If I use polling, things work as expected.
>
>
> When I set the link down, devinet_ioctl() eventually calls
> ndo_set_rx_mode() and ndo_stop()
>
> In ndo_stop() the driver calls
> phy_stop(phydev);
> which disables interrupts and sets the state to HALTED.
>
> In phy_state_machine()
> the PHY_HALTED case does call the adjust_link() callback:
>
>   if (phydev->link) {
>   phydev->link = 0;
>   netif_carrier_off(phydev->attached_dev);
>   phy_adjust_link(phydev);
>   do_suspend = true;
>   }
>
> But it's not called when I use interrupts...
>
> Perhaps because there are no interrupts generated?
> Or even if there were, they have been turned off by phy_stop?
>
> Basically, it seems like when I use interrupts,
> the phy_state_machine() is not called on link down,
> which breaks the MAC driver's expectations.
>
> Am I barking up the wrong tree?

 FWIW, the patch below solves my issue.
 Basically, we reset the MAC in open(), instead of probe().

 I also had to solve the issue of adjust_link() not being
 called by calling it explicitly in stop() instead of
 relying on phy_stop() to do it indirectly.
>>>
>>> Which is of course absolutely not how it is intended to be used.
>>> phy_stop() does the following:
>>>
>>> - if the PHY was already HALTED do nothing and exit
>>> - if it was not and an interrupt is valid for this PHY:  disable and
>>> clear these interrupts
>>> - set state to PHY_HALTED
>>>
>>> somehow an interrupt should be generated from doing this such that
>>> phy_change(), invoked from phy_interrupt() should have a chance to run
>>> and make the PHY state machine transition properly to PHY_HALTED.
>>
>> I'm totally confused. Are you saying that phy_stop itself
>> should trigger an interrupt, or that the process of setting
>> the link down should generate an interrupt *before* we reach
>> phy_stop?
> 
> My reading of the code, and because I don't actually have a system where
> PHY interrupts proper are used (only polling or PHY_IGNORE INTERRUPT) is
> that, yes, somehow calling phy_stop() should result in a PHY interrupt
> to be generated making the state machine move to PHY_HALTED.
> 
>>
>> I'm also perplex over this synchronous IRQ business.
>> Should I be looking for a way to trigger an IRQ in
>> software in the Atheros PHY?
> 
> No, first understand the problem and what is going on before trying to
> workaround things in the PHY driver, there were questions for you as to
> what state the PHY state machine is left in we need to see that to
> understand how to possibly fix what you are seeing.
> 
>>
>> Before I forget: there is also an issue when using the PHY
>> in polling mode. The ndo_stop callback runs through phy_stop
>> and phy_disconnect too fast for the adjust_link() callback
>> to be called. My patch fixed that too, by calling
>> nb8800_link_reconfigure() explicitly.
> 
> Most, if not all drivers should have this:
> 
> ndo_open() calls phy_connect() or phy_attach() + phy_start() because
> that allows you to properly manage the PHY's power state and the state
> machine, the reciprocal is to have ndo_stop() call phy_disconnect() (and
> just that) which properly waits for the PHY state machine to be fully
> stopped.
> 
> phy_stop() returns immediately but the PHY state machine only gets
> stopped asynchronously at a later time, either with an interrupt or with
> an explicit work queue scheduling. If you call phy_disconnect() right
> after, this cancels the work queue and it may not have run the
> adjust_link callback yet.
> 
>>
>>
>>> So from there can you check a few things:
>>>
>>> - is such an interrupt actually generated?
>>> - if you turn on dynamic debug prints for drivers/net/phy/phy.c where do
>>> we leave the PHY state machine and what state is it in when you call
>>> ifconfig up again?
>>
>> The only interrupts I've ever seen the PHY generate are
>> on plugging/unplugging the Ethernet cable.
>>
>> Looking at the driver and datasheet...
>> http://elixir.free-electrons.com/linux/v4.13-rc2/source/drivers/net/phy/at803x.c#L312
>>  value |= AT803X_INTR_ENABLE_AUTONEG_ERR;
>>  value |= AT803X_INTR_ENABLE_SPEED_CHANGED;
>>  value |= 

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 12:13 PM, Mason wrote:
> On 24/07/2017 18:49, Florian Fainelli wrote:
> 
>> On 07/24/2017 08:01 AM, Mason wrote:
>>
 When I set the link down via 'ip link set eth0 down'
 (as opposed to pulling the Ethernet cable) things don't happen as expected:

 The driver's adjust_link() callback is never called, and doesn't
 get a chance make some required changes. And when I set the link
 up again, there is no network connectivity.

 I get this problem only if I enable interrupts on my PHY.
 If I use polling, things work as expected.


 When I set the link down, devinet_ioctl() eventually calls
 ndo_set_rx_mode() and ndo_stop()

 In ndo_stop() the driver calls
 phy_stop(phydev);
 which disables interrupts and sets the state to HALTED.

 In phy_state_machine()
 the PHY_HALTED case does call the adjust_link() callback:

if (phydev->link) {
phydev->link = 0;
netif_carrier_off(phydev->attached_dev);
phy_adjust_link(phydev);
do_suspend = true;
}

 But it's not called when I use interrupts...

 Perhaps because there are no interrupts generated?
 Or even if there were, they have been turned off by phy_stop?

 Basically, it seems like when I use interrupts,
 the phy_state_machine() is not called on link down,
 which breaks the MAC driver's expectations.

 Am I barking up the wrong tree?
>>>
>>> FWIW, the patch below solves my issue.
>>> Basically, we reset the MAC in open(), instead of probe().
>>>
>>> I also had to solve the issue of adjust_link() not being
>>> called by calling it explicitly in stop() instead of
>>> relying on phy_stop() to do it indirectly.
>>
>> Which is of course absolutely not how it is intended to be used.
>> phy_stop() does the following:
>>
>> - if the PHY was already HALTED do nothing and exit
>> - if it was not and an interrupt is valid for this PHY:  disable and
>> clear these interrupts
>> - set state to PHY_HALTED
>>
>> somehow an interrupt should be generated from doing this such that
>> phy_change(), invoked from phy_interrupt() should have a chance to run
>> and make the PHY state machine transition properly to PHY_HALTED.
> 
> I'm totally confused. Are you saying that phy_stop itself
> should trigger an interrupt, or that the process of setting
> the link down should generate an interrupt *before* we reach
> phy_stop?

My reading of the code, and because I don't actually have a system where
PHY interrupts proper are used (only polling or PHY_IGNORE INTERRUPT) is
that, yes, somehow calling phy_stop() should result in a PHY interrupt
to be generated making the state machine move to PHY_HALTED.

> 
> I'm also perplex over this synchronous IRQ business.
> Should I be looking for a way to trigger an IRQ in
> software in the Atheros PHY?

No, first understand the problem and what is going on before trying to
workaround things in the PHY driver, there were questions for you as to
what state the PHY state machine is left in we need to see that to
understand how to possibly fix what you are seeing.

> 
> Before I forget: there is also an issue when using the PHY
> in polling mode. The ndo_stop callback runs through phy_stop
> and phy_disconnect too fast for the adjust_link() callback
> to be called. My patch fixed that too, by calling
> nb8800_link_reconfigure() explicitly.

Most, if not all drivers should have this:

ndo_open() calls phy_connect() or phy_attach() + phy_start() because
that allows you to properly manage the PHY's power state and the state
machine, the reciprocal is to have ndo_stop() call phy_disconnect() (and
just that) which properly waits for the PHY state machine to be fully
stopped.

phy_stop() returns immediately but the PHY state machine only gets
stopped asynchronously at a later time, either with an interrupt or with
an explicit work queue scheduling. If you call phy_disconnect() right
after, this cancels the work queue and it may not have run the
adjust_link callback yet.

> 
> 
>> So from there can you check a few things:
>>
>> - is such an interrupt actually generated?
>> - if you turn on dynamic debug prints for drivers/net/phy/phy.c where do
>> we leave the PHY state machine and what state is it in when you call
>> ifconfig up again?
> 
> The only interrupts I've ever seen the PHY generate are
> on plugging/unplugging the Ethernet cable.
> 
> Looking at the driver and datasheet...
> http://elixir.free-electrons.com/linux/v4.13-rc2/source/drivers/net/phy/at803x.c#L312
>   value |= AT803X_INTR_ENABLE_AUTONEG_ERR;
>   value |= AT803X_INTR_ENABLE_SPEED_CHANGED;
>   value |= AT803X_INTR_ENABLE_DUPLEX_CHANGED;
>   value |= AT803X_INTR_ENABLE_LINK_FAIL;
>   value |= AT803X_INTR_ENABLE_LINK_SUCCESS;
> 
> And the interrupts reasons supported by the PHY 

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason
On 24/07/2017 18:49, Florian Fainelli wrote:

> On 07/24/2017 08:01 AM, Mason wrote:
>
>>> When I set the link down via 'ip link set eth0 down'
>>> (as opposed to pulling the Ethernet cable) things don't happen as expected:
>>>
>>> The driver's adjust_link() callback is never called, and doesn't
>>> get a chance make some required changes. And when I set the link
>>> up again, there is no network connectivity.
>>>
>>> I get this problem only if I enable interrupts on my PHY.
>>> If I use polling, things work as expected.
>>>
>>>
>>> When I set the link down, devinet_ioctl() eventually calls
>>> ndo_set_rx_mode() and ndo_stop()
>>>
>>> In ndo_stop() the driver calls
>>> phy_stop(phydev);
>>> which disables interrupts and sets the state to HALTED.
>>>
>>> In phy_state_machine()
>>> the PHY_HALTED case does call the adjust_link() callback:
>>>
>>> if (phydev->link) {
>>> phydev->link = 0;
>>> netif_carrier_off(phydev->attached_dev);
>>> phy_adjust_link(phydev);
>>> do_suspend = true;
>>> }
>>>
>>> But it's not called when I use interrupts...
>>>
>>> Perhaps because there are no interrupts generated?
>>> Or even if there were, they have been turned off by phy_stop?
>>>
>>> Basically, it seems like when I use interrupts,
>>> the phy_state_machine() is not called on link down,
>>> which breaks the MAC driver's expectations.
>>>
>>> Am I barking up the wrong tree?
>>
>> FWIW, the patch below solves my issue.
>> Basically, we reset the MAC in open(), instead of probe().
>>
>> I also had to solve the issue of adjust_link() not being
>> called by calling it explicitly in stop() instead of
>> relying on phy_stop() to do it indirectly.
> 
> Which is of course absolutely not how it is intended to be used.
> phy_stop() does the following:
> 
> - if the PHY was already HALTED do nothing and exit
> - if it was not and an interrupt is valid for this PHY:  disable and
> clear these interrupts
> - set state to PHY_HALTED
> 
> somehow an interrupt should be generated from doing this such that
> phy_change(), invoked from phy_interrupt() should have a chance to run
> and make the PHY state machine transition properly to PHY_HALTED.

I'm totally confused. Are you saying that phy_stop itself
should trigger an interrupt, or that the process of setting
the link down should generate an interrupt *before* we reach
phy_stop?

I'm also perplex over this synchronous IRQ business.
Should I be looking for a way to trigger an IRQ in
software in the Atheros PHY?

Before I forget: there is also an issue when using the PHY
in polling mode. The ndo_stop callback runs through phy_stop
and phy_disconnect too fast for the adjust_link() callback
to be called. My patch fixed that too, by calling
nb8800_link_reconfigure() explicitly.


> So from there can you check a few things:
> 
> - is such an interrupt actually generated?
> - if you turn on dynamic debug prints for drivers/net/phy/phy.c where do
> we leave the PHY state machine and what state is it in when you call
> ifconfig up again?

The only interrupts I've ever seen the PHY generate are
on plugging/unplugging the Ethernet cable.

Looking at the driver and datasheet...
http://elixir.free-electrons.com/linux/v4.13-rc2/source/drivers/net/phy/at803x.c#L312
value |= AT803X_INTR_ENABLE_AUTONEG_ERR;
value |= AT803X_INTR_ENABLE_SPEED_CHANGED;
value |= AT803X_INTR_ENABLE_DUPLEX_CHANGED;
value |= AT803X_INTR_ENABLE_LINK_FAIL;
value |= AT803X_INTR_ENABLE_LINK_SUCCESS;

And the interrupts reasons supported by the PHY are:
#define AT803X_INTR_ENABLE_AUTONEG_ERR  BIT(15)
#define AT803X_INTR_ENABLE_SPEED_CHANGEDBIT(14)
#define AT803X_INTR_ENABLE_DUPLEX_CHANGED   BIT(13)
#define AT803X_INTR_ENABLE_PAGE_RECEIVEDBIT(12)
#define AT803X_INTR_ENABLE_LINK_FAILBIT(11)
#define AT803X_INTR_ENABLE_LINK_SUCCESS BIT(10)
#define AT803X_INTR_ENABLE_WIRESPEED_DOWNGRADE  BIT(5)
#define AT803X_INTR_ENABLE_POLARITY_CHANGED BIT(1)
#define AT803X_INTR_ENABLE_WOL  BIT(0)

These all seem to be external reasons (from the peer).

I did enable debug logs in drivers/net/phy/phy.c
to trace the state machine, and it is not called
at all on set link down, so it remains in state
RUNNING (both in polling and interrupt modes).

IIRC, this used to work on the 2-core board, but breaks
on the 4-core board. Could this be some kind of race?

Regards.


Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli
On 07/24/2017 08:01 AM, Mason wrote:
> On 24/07/2017 13:07, Mason wrote:
> 
>> When I set the link down via 'ip link set eth0 down'
>> (as opposed to pulling the Ethernet cable) things don't happen as expected:
>>
>> The driver's adjust_link() callback is never called, and doesn't
>> get a chance make some required changes. And when I set the link
>> up again, there is no network connectivity.
>>
>> I get this problem only if I enable interrupts on my PHY.
>> If I use polling, things work as expected.
>>
>>
>> When I set the link down, devinet_ioctl() eventually calls
>> ndo_set_rx_mode() and ndo_stop()
>>
>> In ndo_stop() the driver calls
>> phy_stop(phydev);
>> which disables interrupts and sets the state to HALTED.
>>
>> In phy_state_machine()
>> the PHY_HALTED case does call the adjust_link() callback:
>>
>>  if (phydev->link) {
>>  phydev->link = 0;
>>  netif_carrier_off(phydev->attached_dev);
>>  phy_adjust_link(phydev);
>>  do_suspend = true;
>>  }
>>
>> But it's not called when I use interrupts...
>>
>> Perhaps because there are no interrupts generated?
>> Or even if there were, they have been turned off by phy_stop?
>>
>> Basically, it seems like when I use interrupts,
>> the phy_state_machine() is not called on link down,
>> which breaks the MAC driver's expectations.
>>
>> Am I barking up the wrong tree?
> 
> FWIW, the patch below solves my issue.
> Basically, we reset the MAC in open(), instead of probe().
> 
> I also had to solve the issue of adjust_link() not being
> called by calling it explicitly in stop() instead of
> relying on phy_stop() to do it indirectly.

Which is of course absolutely not how it is intended to be used.
phy_stop() does the following:

- if the PHY was already HALTED do nothing and exit
- if it was not and an interrupt is valid for this PHY:  disable and
clear these interrupts
- set state to PHY_HALTED

somehow an interrupt should be generated from doing this such that
phy_change(), invoked from phy_interrupt() should have a chance to run
and make the PHY state machine transition properly to PHY_HALTED.

So from there can you check a few things:

- is such an interrupt actually generated?
- if you turn on dynamic debug prints for drivers/net/phy/phy.c where do
we leave the PHY state machine and what state is it in when you call
ifconfig up again?

> 
> With this code, I think it is easy to handle suspend/resume:
> on suspend, I will stop() and on resume, I will start(),
> and everything should work as expected.
> 
> I'd like to hear comments on the patch, so I can turn it
> into a formal submission.



-- 
Florian


Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason
On 24/07/2017 13:07, Mason wrote:

> When I set the link down via 'ip link set eth0 down'
> (as opposed to pulling the Ethernet cable) things don't happen as expected:
> 
> The driver's adjust_link() callback is never called, and doesn't
> get a chance make some required changes. And when I set the link
> up again, there is no network connectivity.
> 
> I get this problem only if I enable interrupts on my PHY.
> If I use polling, things work as expected.
> 
> 
> When I set the link down, devinet_ioctl() eventually calls
> ndo_set_rx_mode() and ndo_stop()
> 
> In ndo_stop() the driver calls
> phy_stop(phydev);
> which disables interrupts and sets the state to HALTED.
> 
> In phy_state_machine()
> the PHY_HALTED case does call the adjust_link() callback:
> 
>   if (phydev->link) {
>   phydev->link = 0;
>   netif_carrier_off(phydev->attached_dev);
>   phy_adjust_link(phydev);
>   do_suspend = true;
>   }
> 
> But it's not called when I use interrupts...
> 
> Perhaps because there are no interrupts generated?
> Or even if there were, they have been turned off by phy_stop?
> 
> Basically, it seems like when I use interrupts,
> the phy_state_machine() is not called on link down,
> which breaks the MAC driver's expectations.
> 
> Am I barking up the wrong tree?

FWIW, the patch below solves my issue.
Basically, we reset the MAC in open(), instead of probe().

I also had to solve the issue of adjust_link() not being
called by calling it explicitly in stop() instead of
relying on phy_stop() to do it indirectly.

With this code, I think it is easy to handle suspend/resume:
on suspend, I will stop() and on resume, I will start(),
and everything should work as expected.

I'd like to hear comments on the patch, so I can turn it
into a formal submission.

Regards.



For the record, here is the debug output printed:

# ip addr add 172.27.64.45/18 brd 172.27.127.255 dev eth0
# ip link set eth0 up
[   10.460952] ENTER nb8800_tangox_reset
[   10.464680] ++ETH++ gw8  reg=f0026424 val=00
[   10.478521] ++ETH++ gw8  reg=f0026424 val=01
[   10.482837] ++ETH++ gw16 reg=f0026420 val=0050
[   10.487325] ENTER nb8800_hw_init
[   10.490571] ++ETH++ gw8  reg=f0026000 val=1c
[   10.494878] ++ETH++ gw8  reg=f0026001 val=05
[   10.499176] ++ETH++ gw8  reg=f0026004 val=22
[   10.503481] ++ETH++ gw8  reg=f0026008 val=04
[   10.50] ++ETH++ gw8  reg=f0026014 val=0c
[   10.512082] ++ETH++ gw8  reg=f0026051 val=00
[   10.516377] ++ETH++ gw8  reg=f0026052 val=ff
[   10.520672] ++ETH++ gw8  reg=f0026054 val=40
[   10.524967] ++ETH++ gw8  reg=f0026060 val=00
[   10.529261] ++ETH++ gw8  reg=f0026061 val=c3
[   10.533555] ENTER nb8800_mc_init
[   10.536801] ++ETH++ gw8  reg=f002602e val=00
[   10.541094] ENTER nb8800_tangox_init
[   10.544690] ++ETH++ gw8  reg=f0026400 val=01
[   10.548985] ENTER nb8800_tango4_init
[   10.552580] ENTER nb8800_update_mac_addr
[   10.556523] ++ETH++ gw8  reg=f002606a val=00
[   10.560818] ++ETH++ gw8  reg=f002606b val=16
[   10.565112] ++ETH++ gw8  reg=f002606c val=e8
[   10.569407] ++ETH++ gw8  reg=f002606d val=5e
[   10.573700] ++ETH++ gw8  reg=f002606e val=65
[   10.577994] ++ETH++ gw8  reg=f002606f val=bc
[   10.582288] ++ETH++ gw8  reg=f002603c val=00
[   10.586582] ++ETH++ gw8  reg=f002603d val=16
[   10.590876] ++ETH++ gw8  reg=f002603e val=e8
[   10.595171] ++ETH++ gw8  reg=f002603f val=5e
[   10.599465] ++ETH++ gw8  reg=f0026040 val=65
[   10.603759] ++ETH++ gw8  reg=f0026041 val=bc
[   10.608051] ENTER nb8800_open
[   10.611034] ENTER nb8800_dma_init
[   10.614951] ++ETH++ gw8  reg=f0026004 val=23
[   10.619255] ++ETH++ gw8  reg=f0026000 val=1d
[   10.688912] ENTER nb8800_set_rx_mode
[   10.692515] ENTER nb8800_mc_init
[   10.695762] ++ETH++ gw8  reg=f002602e val=00
[   10.700384] PHY state change UP -> AN
[   10.704118] ENTER nb8800_set_rx_mode
[   10.707717] ENTER nb8800_mc_init
[   10.710963] ++ETH++ gw8  reg=f002602e val=00
[   10.715257] ++ETH++ gw8  reg=f0026028 val=01
[   10.719550] ++ETH++ gw8  reg=f0026029 val=00
[   10.723843] ++ETH++ gw8  reg=f002602a val=5e
[   10.728135] ++ETH++ gw8  reg=f002602b val=00
[   10.732428] ++ETH++ gw8  reg=f002602c val=00
[   10.736721] ++ETH++ gw8  reg=f002602d val=01
[   10.741013] ENTER nb8800_mc_init
[   10.744258] ++ETH++ gw8  reg=f002602e val=ff

[   14.141948] ENTER nb8800_link_reconfigure
[   14.145988] PRIV link=0 speed=0 duplex=0
[   14.150121] PHYDEV link=1 speed=1000 duplex=1
[   14.154589] ENTER nb8800_mac_config
[   14.158164] ++ETH++ gw8  reg=f0026050 val=01
[   14.162527] ++ETH++ gw8  reg=f002601c val=ff
[   14.166882] ++ETH++ gw8  reg=f0026044 val=81
[   14.171233] ENTER nb8800_pause_config
[   14.174981] ++ETH++ gw8  reg=f0026004 val=2b
[   14.179342] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   14.187193] PHY state change AN -> RUNNING

# ip link set eth0 down
[   21.577737] ENTER nb8800_set_rx_mode
[   

Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason
Hello,

When I set the link down via 'ip link set eth0 down'
(as opposed to pulling the Ethernet cable) things don't happen as expected:

The driver's adjust_link() callback is never called, and doesn't
get a chance make some required changes. And when I set the link
up again, there is no network connectivity.

I get this problem only if I enable interrupts on my PHY.
If I use polling, things work as expected.


When I set the link down, devinet_ioctl() eventually calls
ndo_set_rx_mode() and ndo_stop()

In ndo_stop() the driver calls
phy_stop(phydev);
which disables interrupts and sets the state to HALTED.

In phy_state_machine()
the PHY_HALTED case does call the adjust_link() callback:

if (phydev->link) {
phydev->link = 0;
netif_carrier_off(phydev->attached_dev);
phy_adjust_link(phydev);
do_suspend = true;
}

But it's not called when I use interrupts...

Perhaps because there are no interrupts generated?
Or even if there were, they have been turned off by phy_stop?

Basically, it seems like when I use interrupts,
the phy_state_machine() is not called on link down,
which breaks the MAC driver's expectations.

Am I barking up the wrong tree?

Regards.