Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-11 Thread Chris Clayton



On 11/10/2018 13:23, Maciej S. Szmigiero wrote:
> On 11.10.2018 10:24, Chris Clayton wrote:
>> On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
>>> On 11.10.2018 00:49, Chris Clayton wrote:
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).
>

 This might be more interesting. Through a combination of viewing the 
 output from pr_notice() and the output from
 "ethtool -d", I can see RxConfig with the following values

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

 As I did with 4.18.10 early on in the process, I removed the call to 
 rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
 installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002400e

>>>
>>> Now we can finally see some difference...
>>> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
>>> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
>>> is kind of expected - one can see that the working configuration
>>> post-resume has bit 14 (or 0x4000) set, too.
>>>
>>> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
>>> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
>>>
>>> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
>>> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
>>> change:
>>> --- r8169.c
>>> +++ r8169.c
>>> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>>> case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>>> case RTL_GIGA_MAC_VER_34:
>>> case RTL_GIGA_MAC_VER_35:
>>> +   case RTL_GIGA_MAC_VER_38:
>>> RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>>> break;
>>> case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>>>
>>> This will add RX_MULTI_EN also for your chip model (you need to add back
>>> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>>>
>>
>> That's done the trick. With the above change applied, my network is running
>> fine after a suspend/resume cycle and the ping times are back in the 14-15ms
>> range.
> 
> Nice!
> 
> I will submit a patch, it would be great if you could test it and then
> add a "Tested-by:" tag.
>  

Will do, Maciej.

Thanks for solving this.
>> Chris
> 
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-11 Thread Maciej S. Szmigiero
On 11.10.2018 10:24, Chris Clayton wrote:
> On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
>> On 11.10.2018 00:49, Chris Clayton wrote:
 Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
 writes (under the "default:" label for your NIC model).

>>>
>>> This might be more interesting. Through a combination of viewing the output 
>>> from pr_notice() and the output from
>>> "ethtool -d", I can see RxConfig with the following values
>>>
>>> During boot:    0x00028700
>>> Before suspend: 0x0002870e
>>> During resume:  0x00024000
>>> Post resume:    0x0002870e
>>>
>>> As I did with 4.18.10 early on in the process, I removed the call to 
>>> rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
>>> installed and rebooted. Now I see the following values:
>>>
>>> During boot:    0x00028700
>>> Before suspend: 0x0002870e
>>> During resume:  0x00024000
>>> Post resume:    0x0002400e
>>>
>>
>> Now we can finally see some difference...
>> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
>> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
>> is kind of expected - one can see that the working configuration
>> post-resume has bit 14 (or 0x4000) set, too.
>>
>> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
>> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
>>
>> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
>> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
>> change:
>> --- r8169.c
>> +++ r8169.c
>> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>>  case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>>  case RTL_GIGA_MAC_VER_34:
>>  case RTL_GIGA_MAC_VER_35:
>> +case RTL_GIGA_MAC_VER_38:
>>  RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>>  break;
>>  case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>>
>> This will add RX_MULTI_EN also for your chip model (you need to add back
>> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>>
> 
> That's done the trick. With the above change applied, my network is running
> fine after a suspend/resume cycle and the ping times are back in the 14-15ms
> range.

Nice!

I will submit a patch, it would be great if you could test it and then
add a "Tested-by:" tag.
 
> Chris

Maciej


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-11 Thread Chris Clayton



On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
> On 11.10.2018 00:49, Chris Clayton wrote:
>>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>>> writes (under the "default:" label for your NIC model).
>>>
>>
>> This might be more interesting. Through a combination of viewing the output 
>> from pr_notice() and the output from
>> "ethtool -d", I can see RxConfig with the following values
>>
>>  During boot:    0x00028700
>>  Before suspend: 0x0002870e
>>  During resume:  0x00024000
>>  Post resume:    0x0002870e
>>
>> As I did with 4.18.10 early on in the process, I removed the call to 
>> rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
>> installed and rebooted. Now I see the following values:
>>
>>  During boot:    0x00028700
>>  Before suspend: 0x0002870e
>>  During resume:  0x00024000
>>  Post resume:    0x0002400e
>>
> 
> Now we can finally see some difference...
> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
> is kind of expected - one can see that the working configuration
> post-resume has bit 14 (or 0x4000) set, too.
> 
> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
> 
> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
> change:
> --- r8169.c
> +++ r8169.c
> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>   case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>   case RTL_GIGA_MAC_VER_34:
>   case RTL_GIGA_MAC_VER_35:
> + case RTL_GIGA_MAC_VER_38:
>   RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>   break;
>   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
> 
> This will add RX_MULTI_EN also for your chip model (you need to add back
> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>

That's done the trick. With the above change applied, my network is running
fine after a suspend/resume cycle and the ping times are back in the 14-15ms
range.

Chris

> If this does not help, then I would try other values in the above write:
> 1) RTL_W32(tp, RxConfig, 0x00024000);
> 2) RTL_W32(tp, RxConfig, 0x4000);
> 3) RTL_W32(tp, RxConfig, RX_DMA_BURST);
> 4) RTL_W32(tp, RxConfig, RX128_INT_EN);
> 
>> Chris
> 
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Maciej S. Szmigiero
On 11.10.2018 00:49, Chris Clayton wrote:
>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>> writes (under the "default:" label for your NIC model).
>>
> 
> This might be more interesting. Through a combination of viewing the output 
> from pr_notice() and the output from
> "ethtool -d", I can see RxConfig with the following values
> 
>   During boot:    0x00028700
>   Before suspend: 0x0002870e
>   During resume:  0x00024000
>   Post resume:    0x0002870e
> 
> As I did with 4.18.10 early on in the process, I removed the call to 
> rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the following values:
> 
>   During boot:    0x00028700
>   Before suspend: 0x0002870e
>   During resume:  0x00024000
>   Post resume:    0x0002400e
> 

Now we can finally see some difference...
Besides the missing RX128_INT_EN (bit 15, or 0x8000) and RX_DMA_BURST
(bits 8-10, or 0x700), which rtl_init_rxcfg() would normally set, so their
absence is expected, one can see that the working post-resume configuration
also has bit 14 (or 0x4000) set.

This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.

RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
family as your RTL_GIGA_MAC_VER_38, so can you please try the following
change:
--- r8169.c
+++ r8169.c
@@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
 	case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
+	case RTL_GIGA_MAC_VER_38:
 		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:

This will add RX_MULTI_EN also for your chip model (you need to add back
the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).

If this does not help, then I would try other values in the above write:
1) RTL_W32(tp, RxConfig, 0x00024000);
2) RTL_W32(tp, RxConfig, 0x4000);
3) RTL_W32(tp, RxConfig, RX_DMA_BURST);
4) RTL_W32(tp, RxConfig, RX128_INT_EN);

> Chris

Maciej


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
OK, right kernel/module used this time. Please see findings below.

On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
 On 07.10.2018 21:36, Chris Clayton wrote:
> Hi again,
>
> I didn't think there was anything in 4.19-rc7 to fix this regression, but 
> tried it anyway. I can confirm that the
> regression is still present and my network still fails when, after a 
> resume from suspend (to ram or disk), I open my
> browser or my mail client. In both those cases the failure is almost 
> immediate - e.g. my home page doesn't get displayed
> in the browser. Pinging one of my ISP's name servers doesn't fail quite so 
> quickly but the reported time increases from
> 14-15ms to more than 1000ms.

 You can try comparing chip registers (ethtool -d eth0) in the working
 state (before a suspend) and in the broken state (after a resume).
 Maybe there will be something obvious in the difference.

 The same goes for the PCI configuration (lspci -d :8168 -vv).

>>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>>> and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>>> 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from 
>> rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seem to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip sometimes does not like RxConfig being written before
> TxConfig.
> 

This change made no difference. Networking still dies if I open a browser or 
leave ping running long enough.

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>   /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>   RTL_R8(tp, IntrMask);
>   RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> + pr_notice("RxConfig before init was %.8x\n",
> + (unsigned int)RTL_R32(tp, RxConfig));
>   rtl_init_rxcfg(tp);
>   rtl_set_tx_config_registers(tp);
>  
> 
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).
> 

This might be more interesting. Through a combination of viewing the output
from pr_notice() and the output from "ethtool -d", I can see RxConfig with the
following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

As I did with 4.18.10 early on in the process, I removed the call to
rtl_init_rxcfg() from rtl_hw_start(), then rebuilt, installed and rebooted.
Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002400e

As with 4.18.10, networking now appears to be stable after the resume. Starting 
a browser results in my homepage being
displayed and I've spent a few minutes surfing with no interruptions. 
Similarly, ping runs without stopping. I simply
don't know enough to know what might now be enabled or disabled by this change 
in value, but hopefully it will provide a
clue to someone as to what is going on.

Chris

> Hope this helps,
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
Too late at night to be doing this stuff. Clicked send instead of saving a 
draft. Sorry, please ignore.

On 10/10/2018 23:30, Chris Clayton wrote:
> OK, right kernel/module used this time. Please see findings below.
> 
> On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
>> On 09.10.2018 22:36, Heiner Kallweit wrote:
>>> On 09.10.2018 16:40, Chris Clayton wrote:
 Thanks to Maciej and Heiner for their replies.

 On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
> On 07.10.2018 21:36, Chris Clayton wrote:
>> Hi again,
>>
>> I didn't think there was anything in 4.19-rc7 to fix this regression, 
>> but tried it anyway. I can confirm that the
>> regression is still present and my network still fails when, after a 
>> resume from suspend (to ram or disk), I open my
>> browser or my mail client. In both those cases the failure is almost 
>> immediate - e.g. my home page doesn't get displayed
>> in the browser. Pinging one of my ISPs name servers doesn't fail quite 
>> so quickly but the reported time increases from
>> 14-15ms to more than 1000ms.
>
> You can try comparing chip registers (ethtool -d eth0) in the working
> state (before a suspend) and in the broken state (after a resume).
> Maybe there will be something obvious in the difference.
>
> The same goes for the PCI configuration (lspci -d :8168 -vv).
>
 Maciej suggested comparing the output from lspci -vv for the ethernet 
 device. They are identical.

 Both Maciej and Heiner suggested comparing the output from "ethtool -d" 
 pre and post suspend. Again, they are identical.
 Heiner specifically suggested looking at the RxConfig. The value of that 
 is 0x0002870e both pre and post suspend.

>>> Hmm, this is very weird, especially taking into account that in your 
>>> original
>>> report you state that removing the call to rtl_init_rxcfg() from 
>>> rtl_hw_start()
>>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>>> register values seem to be the same before and after resume. So how can the
>>> chip behave differently?
>>> So far my best guess is that some chip quirk causes it to accept writes to
>>> register RxConfig, but to misinterpret or ignore the written value.
>>> So far your report is the only one (affecting RTL8411), but we don't know
>>> whether other chip versions are affected too.
>>
>> Also, it is interesting that even if one removes a call to
>> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
>> written to moments later by rtl_set_rx_mode().
>>
>> The only chip accesses in the meantime seem to be a write to TxConfig by
>> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
>> to MAR0 earlier in rtl_set_rx_mode().
>>
>> My proposals are:
>> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
>> in rtl_hw_start().
>> Maybe the chip does not like sometimes that RxConfig is written before
>> TxConfig.
>>
> 
> This change made no difference. Networking still dies if I open a browser or 
> leave ping running long enough.
> 
>> 2) Check the original value of RxConfig (after a resume) before
>> rtl_init_rxcfg() overwrites it (compile tested only):
>> --- r8169.c.ori
>> +++ r8169.c
>> @@ -5155,6 +5155,9 @@
>>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>>  	RTL_R8(tp, IntrMask);
>>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
>> +
>> +	pr_notice("RxConfig before init was %.8x\n",
>> +		  (unsigned int)RTL_R32(tp, RxConfig));
>>  	rtl_init_rxcfg(tp);
>>  	rtl_set_tx_config_registers(tp);
>>  
>>
>> This should be the value that you got when you removed the call to
>> rtl_init_rxcfg() for testing.
>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>> writes (under the "default:" label for your NIC model).
> 
> This might be more interesting. Through a combination of viewing the output
> from pr_notice() and the output from "ethtool -d", I can see RxConfig with
> the following values:
> 
>   During boot:    0x00028700
>   Before suspend: 0x0002870e
>   During resume:  0x00024000
>   Post resume:    0x0002870e
> 
> I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the following values:
> 
>   During boot:    0x00028700
>   Before suspend: 0x0002870e
>   During resume:  0x00024000
>   Post resume:    0x0002870e
> 
>>
>> Hope this helps,
>> Maciej
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
Too late at night to be doing this stuff. Clicked send instead of saving a 
draft. Sorry, please ignore.

On 10/10/2018 23:30, Chris Clayton wrote:
> OK, right kernel/module used this time. Please see findings below.
> 
> On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
>> On 09.10.2018 22:36, Heiner Kallweit wrote:
>>> On 09.10.2018 16:40, Chris Clayton wrote:
 Thanks to Maciej and Heiner for their replies.

 On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
> On 07.10.2018 21:36, Chris Clayton wrote:
>> Hi again,
>>
>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>> tried it anyway. I can confirm that the regression is still present and my
>> network still fails when, after a resume from suspend (to RAM or disk), I
>> open my browser or my mail client. In both those cases the failure is almost
>> immediate, e.g. my home page doesn't get displayed in the browser. Pinging
>> one of my ISP's name servers doesn't fail quite so quickly, but the reported
>> time increases from 14-15ms to more than 1000ms.
>
> You can try comparing chip registers (ethtool -d eth0) in the working
> state (before a suspend) and in the broken state (after a resume).
> Maybe there will be something obvious in the difference.
>
> The same goes for the PCI configuration (lspci -d :8168 -vv).
>
 Maciej suggested comparing the output from lspci -vv for the ethernet 
 device. They are identical.

 Both Maciej and Heiner suggested comparing the output from "ethtool -d" 
 pre and post suspend. Again, they are identical.
 Heiner specifically suggested looking at the RxConfig. The value of that 
 is 0x0002870e both pre and post suspend.

>>> Hmm, this is very weird, especially taking into account that in your original
>>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
>>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>>> register values seem to be the same before and after resume. So how can the
>>> chip behave differently?
>>> So far my best guess is that some chip quirk causes it to accept writes to
>>> register RxConfig, but to misinterpret or ignore the written value.
>>> So far your report is the only one (affecting RTL8411), but we don't know
>>> whether other chip versions are affected too.
>>
>> Also, it is interesting that even if one removes a call to
>> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
>> written to moments later by rtl_set_rx_mode().
>>
>> The only chip accesses in the meantime seem to be a write to TxConfig by
>> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
>> to MAR0 earlier in rtl_set_rx_mode().
>>
>> My proposals are:
>> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
>> in rtl_hw_start().
>> Maybe the chip sometimes does not like RxConfig being written before
>> TxConfig.
>>
> 
> This change made no difference. Networking still dies if I open a browser or 
> leave ping running long enough.
> 
>> 2) Check the original value of RxConfig (after a resume) before
>> rtl_init_rxcfg() overwrites it (compile tested only):
>> --- r8169.c.ori
>> +++ r8169.c
>> @@ -5155,6 +5155,9 @@
>>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>>  	RTL_R8(tp, IntrMask);
>>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
>> +
>> +	pr_notice("RxConfig before init was %.8x\n",
>> +		  (unsigned int)RTL_R32(tp, RxConfig));
>>  	rtl_init_rxcfg(tp);
>>  	rtl_set_tx_config_registers(tp);
>>  
>>
>> This should be the value that you got when you removed the call to
>> rtl_init_rxcfg() for testing.
>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>> writes (under the "default:" label for your NIC model).
> 
> This might be more interesting. Through a combination of viewing the output
> from pr_notice() and the output from "ethtool -d", I can see RxConfig with
> the following values:
> 
>   During boot:    0x00028700
>   Before suspend: 0x0002870e
>   During resume:  0x00024000
>   Post resume:    0x0002870e
> 
> I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the following values:
> 
>   During boot:    0x00028700
>   Before suspend: 0x0002870e
>   During resume:  0x00024000
>   Post resume:    0x0002870e
> 
>>
>> Hope this helps,
>> Maciej
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
OK, right kernel/module used this time. Please see findings below.

On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
 On 07.10.2018 21:36, Chris Clayton wrote:
> Hi again,
>
> I didn't think there was anything in 4.19-rc7 to fix this regression, but
> tried it anyway. I can confirm that the regression is still present and my
> network still fails when, after a resume from suspend (to RAM or disk), I
> open my browser or my mail client. In both those cases the failure is almost
> immediate, e.g. my home page doesn't get displayed in the browser. Pinging
> one of my ISP's name servers doesn't fail quite so quickly, but the reported
> time increases from 14-15ms to more than 1000ms.

 You can try comparing chip registers (ethtool -d eth0) in the working
 state (before a suspend) and in the broken state (after a resume).
 Maybe there will be something obvious in the difference.

 The same goes for the PCI configuration (lspci -d :8168 -vv).

>>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>>> and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>>> 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seem to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip sometimes does not like RxConfig being written before
> TxConfig.
> 

This change made no difference. Networking still dies if I open a browser or 
leave ping running long enough.

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>  	RTL_R8(tp, IntrMask);
>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> +	pr_notice("RxConfig before init was %.8x\n",
> +		  (unsigned int)RTL_R32(tp, RxConfig));
>  	rtl_init_rxcfg(tp);
>  	rtl_set_tx_config_registers(tp);
>  
> 
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).

This might be more interesting. Through a combination of viewing the output from
pr_notice() and the output from "ethtool -d", I can see RxConfig with the
following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

> 
> Hope this helps,
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
Sorry, I forgot that editing r8169.c and rebuilding would result in rc7+, so I
tested the wrong kernel/module to get the results I provided below. That,
however, may make the results more interesting because they happened with a
virgin rc7 kernel/module.

I'll test your proposals properly later.

Chris

On 10/10/2018 09:09, Chris Clayton wrote:
> 
> 
> On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
>> On 09.10.2018 22:36, Heiner Kallweit wrote:
>>> On 09.10.2018 16:40, Chris Clayton wrote:
 Thanks to Maciej and Heiner for their replies.

 On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
> On 07.10.2018 21:36, Chris Clayton wrote:
>> Hi again,
>>
>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>> tried it anyway. I can confirm that the regression is still present and my
>> network still fails when, after a resume from suspend (to RAM or disk), I
>> open my browser or my mail client. In both those cases the failure is almost
>> immediate, e.g. my home page doesn't get displayed in the browser. Pinging
>> one of my ISP's name servers doesn't fail quite so quickly, but the reported
>> time increases from 14-15ms to more than 1000ms.
>
> You can try comparing chip registers (ethtool -d eth0) in the working
> state (before a suspend) and in the broken state (after a resume).
> Maybe there will be something obvious in the difference.
>
> The same goes for the PCI configuration (lspci -d :8168 -vv).
>
 Maciej suggested comparing the output from lspci -vv for the ethernet 
 device. They are identical.

 Both Maciej and Heiner suggested comparing the output from "ethtool -d" 
 pre and post suspend. Again, they are identical.
 Heiner specifically suggested looking at the RxConfig. The value of that 
 is 0x0002870e both pre and post suspend.

>>> Hmm, this is very weird, especially taking into account that in your original
>>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
>>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>>> register values seem to be the same before and after resume. So how can the
>>> chip behave differently?
>>> So far my best guess is that some chip quirk causes it to accept writes to
>>> register RxConfig, but to misinterpret or ignore the written value.
>>> So far your report is the only one (affecting RTL8411), but we don't know
>>> whether other chip versions are affected too.
>>
>> Also, it is interesting that even if one removes a call to
>> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
>> written to moments later by rtl_set_rx_mode().
>>
>> The only chip accesses in the meantime seem to be a write to TxConfig by
>> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
>> to MAR0 earlier in rtl_set_rx_mode().
>>
>> My proposals are:
>> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
>> in rtl_hw_start().
>> Maybe the chip sometimes does not like RxConfig being written before
>> TxConfig.
>>
> After testing your first proposal, which made no difference, I found the
> following in the output from dmesg:
> 
> [  761.999468] [ cut here ]
> [  761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [  761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 dev_watchdog+0x1e9/0x1f0
> [  761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_via videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid realtek coretemp snd_hda_intel hwmon snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last unloaded: btintel]
> [  761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
> [  761.999504] Hardware name: Notebook W65_67SZ /W65_67SZ, BIOS 1.03.05 02/26/2014
> [  761.999508] Workqueue: events rtl_task [r8169]
> [  761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0
> [  761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1 81 48 89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00
> [  761.999513] RSP: 0018:88040f803e98 EFLAGS: 00010282
> [  761.999514] RAX:  RBX:  RCX: 0006
> [  761.999516] RDX: 0007 RSI: 0096 RDI: 88040f8153d0
> [  761.999517] RBP: 88040ca9a3b8 R08: 813565f0 R09: 034e
> [  761.999517] R10: 0007 R11:  R12: 88040ca9a39c
> [  761.999518] R13: 88040ca9a000 R14: 

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton



On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
 On 07.10.2018 21:36, Chris Clayton wrote:
> Hi again,
>
> I didn't think there was anything in 4.19-rc7 to fix this regression, but
> tried it anyway. I can confirm that the regression is still present and my
> network still fails when, after a resume from suspend (to RAM or disk), I
> open my browser or my mail client. In both those cases the failure is almost
> immediate, e.g. my home page doesn't get displayed in the browser. Pinging
> one of my ISP's name servers doesn't fail quite so quickly, but the reported
> time increases from 14-15ms to more than 1000ms.

 You can try comparing chip registers (ethtool -d eth0) in the working
 state (before a suspend) and in the broken state (after a resume).
 Maybe there will be something obvious in the difference.

 The same goes for the PCI configuration (lspci -d :8168 -vv).

>>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>>> and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>>> 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seem to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip sometimes does not like RxConfig being written before
> TxConfig.
> 
After testing your first proposal, which made no difference, I found the
following in the output from dmesg:

[  761.999468] [ cut here ]
[  761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 dev_watchdog+0x1e9/0x1f0
[  761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_via videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid realtek coretemp snd_hda_intel hwmon snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last unloaded: btintel]
[  761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
[  761.999504] Hardware name: Notebook W65_67SZ /W65_67SZ, BIOS 1.03.05 02/26/2014
[  761.999508] Workqueue: events rtl_task [r8169]
[  761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0
[  761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1 81 48 89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00
[  761.999513] RSP: 0018:88040f803e98 EFLAGS: 00010282
[  761.999514] RAX:  RBX:  RCX: 0006
[  761.999516] RDX: 0007 RSI: 0096 RDI: 88040f8153d0
[  761.999517] RBP: 88040ca9a3b8 R08: 813565f0 R09: 034e
[  761.999517] R10: 0007 R11:  R12: 88040ca9a39c
[  761.999518] R13: 88040ca9a000 R14: 0001 R15: 8803ea17cc80
[  761.999520] FS:  () GS:88040f80() knlGS:
[  761.999521] CS:  0010 DS:  ES:  CR0: 80050033
[  761.999522] CR2: 7f67280206b8 CR3: 0200a002 CR4: 001606f0
[  761.999523] Call Trace:
[  761.999525]  
[  761.999527]  ? qdisc_reset+0xe0/0xe0
[  761.999529]  ? qdisc_reset+0xe0/0xe0
[  761.999532]  call_timer_fn+0x11/0x70
[  761.999534]  expire_timers+0x8e/0xa0
[  761.999535]  

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton



On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
 On 07.10.2018 21:36, Chris Clayton wrote:
> Hi again,
>
> I didn't think there was anything in 4.19-rc7 to fix this regression, but 
> tried it anyway. I can confirm that the
> regression is still present and my network still fails when, after a 
> resume from suspend (to ram or disk), I open my
> browser or my mail client. In both those cases the failure is almost 
> immediate - e.g. my home page doesn't get displayed
> in the browser. Pinging one of my ISPs name servers doesn't fail quite so 
> quickly but the reported time increases from
> 14-15ms to more than 1000ms.

 You can try comparing chip registers (ethtool -d eth0) in the working
 state (before a suspend) and in the broken state (after a resume).
 Maybe there will be some obvious in the difference.

 The same goes for the PCI configuration (lspci -d :8168 -vv).

>>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>>> and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>>> 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from 
>> rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seems to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip does not like sometimes that RxConfig is written before
> TxConfig.
> 
After testing your first proposal, which made no  difference, I founf the 
following in dmesg in the output from dmesg:

[  761.999468] [ cut here ]
[  761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 
dev_watchdog+0x1e9/0x1f0
[  761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep 
iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE
nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc 
videobuf2_memops snd_hda_codec_via
videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid 
realtek coretemp snd_hda_intel hwmon
snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last 
unloaded: btintel]
[  761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
[  761.999504] Hardware name: Notebook W65_67SZ 
   /W65_67SZ
   , BIOS 1.03.05 02/26/2014
[  761.999508] Workqueue: events rtl_task [r8169]
[  761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0
[  761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1 81 48 89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00
[  761.999513] RSP: 0018:88040f803e98 EFLAGS: 00010282
[  761.999514] RAX:  RBX:  RCX: 0006
[  761.999516] RDX: 0007 RSI: 0096 RDI: 88040f8153d0
[  761.999517] RBP: 88040ca9a3b8 R08: 813565f0 R09: 034e
[  761.999517] R10: 0007 R11:  R12: 88040ca9a39c
[  761.999518] R13: 88040ca9a000 R14: 0001 R15: 8803ea17cc80
[  761.999520] FS:  () GS:88040f80() knlGS:
[  761.999521] CS:  0010 DS:  ES:  CR0: 80050033
[  761.999522] CR2: 7f67280206b8 CR3: 0200a002 CR4: 001606f0
[  761.999523] Call Trace:
[  761.999525]  
[  761.999527]  ? qdisc_reset+0xe0/0xe0
[  761.999529]  ? qdisc_reset+0xe0/0xe0
[  761.999532]  call_timer_fn+0x11/0x70
[  761.999534]  expire_timers+0x8e/0xa0
[  761.999535]  

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Maciej S. Szmigiero
On 09.10.2018 22:36, Heiner Kallweit wrote:
> On 09.10.2018 16:40, Chris Clayton wrote:
>> Thanks to Maciej and Heiner for their replies.
>>
>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>> Hi again,
>>>>
>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>>>> tried it anyway. I can confirm that the regression is still present and my
>>>> network still fails when, after a resume from suspend (to ram or disk), I
>>>> open my browser or my mail client. In both those cases the failure is
>>>> almost immediate - e.g. my home page doesn't get displayed in the browser.
>>>> Pinging one of my ISP's name servers doesn't fail quite so quickly but the
>>>> reported time increases from 14-15ms to more than 1000ms.
>>>
>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>> state (before a suspend) and in the broken state (after a resume).
>>> Maybe there will be something obvious in the difference.
>>>
>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>
>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>> device. They are identical.
>>
>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>> and post suspend. Again, they are identical.
>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>> 0x0002870e both pre and post suspend.
>>
> Hmm, this is very weird, especially taking into account that in your original
> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
> register values seem to be the same before and after resume. So how can the
> chip behave differently?
> So far my best guess is that some chip quirk causes it to accept writes to
> register RxConfig, but to misinterpret or ignore the written value.
> So far your report is the only one (affecting RTL8411), but we don't know
> whether other chip versions are affected too.

Also, it is interesting that even if one removes a call to
rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
written to moments later by rtl_set_rx_mode().

The only chip accesses in the meantime seem to be a write to TxConfig by
rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
to MAR0 earlier in rtl_set_rx_mode().

My proposals are:
1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
in rtl_hw_start().
Maybe the chip sometimes does not like RxConfig being written before
TxConfig.

2) Check the original value of RxConfig (after a resume) before
rtl_init_rxcfg() overwrites it (compile tested only):
--- r8169.c.ori
+++ r8169.c
@@ -5155,6 +5155,9 @@
 	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
 	RTL_R8(tp, IntrMask);
 	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
+
+	pr_notice("RxConfig before init was %.8x\n",
+		  (unsigned int)RTL_R32(tp, RxConfig));
 	rtl_init_rxcfg(tp);
 	rtl_set_tx_config_registers(tp);
 

This should be the value that you got when you removed the call to
rtl_init_rxcfg() for testing.
Now, knowing the "right" value, you can experiment with what rtl_init_rxcfg()
writes (under the "default:" label for your NIC model).

Hope this helps,
Maciej


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Chris Clayton



On 09/10/2018 22:39, Heiner Kallweit wrote:
> On 09.10.2018 16:40, Chris Clayton wrote:
>> Thanks to Maciej and Heiner for their replies.
>>
>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>> Hi again,
>>>>
>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>>>> tried it anyway. I can confirm that the regression is still present and my
>>>> network still fails when, after a resume from suspend (to ram or disk), I
>>>> open my browser or my mail client. In both those cases the failure is
>>>> almost immediate - e.g. my home page doesn't get displayed in the browser.
>>>> Pinging one of my ISP's name servers doesn't fail quite so quickly but the
>>>> reported time increases from 14-15ms to more than 1000ms.
>>>
>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>> state (before a suspend) and in the broken state (after a resume).
>>> Maybe there will be something obvious in the difference.
>>>
>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>
>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>> device. They are identical.
>>
>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>> and post suspend. Again, they are identical.
>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>> 0x0002870e both pre and post suspend.
>>
>> I've attached files I redirected the outputs to.
>>
>> Please don't hesitate to ask for any other information needed to solve this
>> problem. In the meantime, I've now got scripts that stop the network during
>> suspend and restart it during resume. (Those scripts were removed whilst I
>> gathered the diagnostics shown in the attachments.)
>>
> I'd like to check whether it may be a timing issue. The following experimental
> patch adds a PCI commit after writing register ChipCmd. Could you please check
> whether it changes anything?
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 7d3f671e1..f3c359492 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4641,6 +4641,7 @@ static void rtl_hw_start(struct  rtl8169_private *tp)
>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>  	RTL_R8(tp, IntrMask);
>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +	RTL_R8(tp, ChipCmd);
>  	rtl_init_rxcfg(tp);
>  	rtl_set_tx_config_registers(tp);
>  
> 

Sorry, this patch doesn't make any difference - my network still fails. After a
suspend/resume my browsers (chromium and firefox) both fail to open my home page
(https://www.google.co.uk). The ping time for one of my ISP's name servers
increases from 14-15ms to more than 1000ms, although after a few pings it does
reduce. As the screen grab below shows, the network does eventually fail:

$ ping NS1
PING ns1 (90.207.238.97): 56 data bytes
64 bytes from 90.207.238.97: icmp_seq=0 ttl=251 time=1017.289 ms
64 bytes from 90.207.238.97: icmp_seq=1 ttl=251 time=1018.051 ms
64 bytes from 90.207.238.97: icmp_seq=2 ttl=251 time=1015.271 ms
64 bytes from 90.207.238.97: icmp_seq=3 ttl=251 time=1015.495 ms
64 bytes from 90.207.238.97: icmp_seq=6 ttl=251 time=1015.646 ms
64 bytes from 90.207.238.97: icmp_seq=7 ttl=251 time=1022.609 ms
64 bytes from 90.207.238.97: icmp_seq=8 ttl=251 time=1015.612 ms
64 bytes from 90.207.238.97: icmp_seq=10 ttl=251 time=1015.551 ms
64 bytes from 90.207.238.97: icmp_seq=12 ttl=251 time=1015.446 ms
64 bytes from 90.207.238.97: icmp_seq=13 ttl=251 time=1015.657 ms
64 bytes from 90.207.238.97: icmp_seq=14 ttl=251 time=1015.614 ms
64 bytes from 90.207.238.97: icmp_seq=15 ttl=251 time=1015.651 ms
64 bytes from 90.207.238.97: icmp_seq=17 ttl=251 time=1015.459 ms
64 bytes from 90.207.238.97: icmp_seq=18 ttl=251 time=1015.443 ms
64 bytes from 90.207.238.97: icmp_seq=19 ttl=251 time=1015.936 ms
64 bytes from 90.207.238.97: icmp_seq=20 ttl=251 time=1015.681 ms
64 bytes from 90.207.238.97: icmp_seq=22 ttl=251 time=1015.410 ms
64 bytes from 90.207.238.97: icmp_seq=23 ttl=251 time=1015.487 ms
64 bytes from 90.207.238.97: icmp_seq=24 ttl=251 time=1016.169 ms
64 bytes from 90.207.238.97: icmp_seq=25 ttl=251 time=1015.659 ms
64 bytes from 90.207.238.97: icmp_seq=26 ttl=251 time=14.606 ms
64 bytes from 90.207.238.97: icmp_seq=30 ttl=251 time=32.765 ms
64 bytes from 90.207.238.97: icmp_seq=31 ttl=251 time=115.052 ms
64 bytes from 90.207.238.97: icmp_seq=33 ttl=251 time=757.115 ms
64 bytes from 90.207.238.97: icmp_seq=34 ttl=251 time=176.696 ms
64 bytes from 90.207.238.97: icmp_seq=35 ttl=251 time=1017.462 ms
64 bytes from 90.207.238.97: icmp_seq=36 ttl=251 time=16.394 ms
64 bytes from 90.207.238.97: icmp_seq=37 ttl=251 time=20.402 ms
64 bytes from 90.207.238.97: icmp_seq=38 ttl=251 time=37.795 ms
64 bytes from 90.207.238.97: icmp_seq=39 ttl=251 time=141.997 ms
92 bytes from laptop.local.lan 

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Heiner Kallweit
On 09.10.2018 16:40, Chris Clayton wrote:
> Thanks to Maciej and Heiner for their replies.
> 
> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>> On 07.10.2018 21:36, Chris Clayton wrote:
>>> Hi again,
>>>
>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>>> tried it anyway. I can confirm that the regression is still present and my
>>> network still fails when, after a resume from suspend (to ram or disk), I
>>> open my browser or my mail client. In both those cases the failure is
>>> almost immediate - e.g. my home page doesn't get displayed in the browser.
>>> Pinging one of my ISP's name servers doesn't fail quite so quickly but the
>>> reported time increases from 14-15ms to more than 1000ms.
>>
>> You can try comparing chip registers (ethtool -d eth0) in the working
>> state (before a suspend) and in the broken state (after a resume).
>> Maybe there will be something obvious in the difference.
>>
>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>
> Maciej suggested comparing the output from lspci -vv for the ethernet device. 
> They are identical.
> 
> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
> and post suspend. Again, they are identical.
> Heiner specifically suggested looking at the RxConfig. The value of that is 
> 0x0002870e both pre and post suspend.
> 
> I've attached files I redirected the outputs to.
> 
> Please don't hesitate to ask for any other information needed to solve this
> problem. In the meantime, I've now got scripts that stop the network during
> suspend and restart it during resume. (Those scripts were removed whilst I
> gathered the diagnostics shown in the attachments.)
> 
I'd like to check whether it may be a timing issue. The following experimental
patch adds a PCI commit after writing register ChipCmd. Could you please check
whether it changes anything?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7d3f671e1..f3c359492 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4641,6 +4641,7 @@ static void rtl_hw_start(struct  rtl8169_private *tp)
 	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
 	RTL_R8(tp, IntrMask);
 	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
+	RTL_R8(tp, ChipCmd);
 	rtl_init_rxcfg(tp);
 	rtl_set_tx_config_registers(tp);
 
-- 
2.19.1



Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Heiner Kallweit
On 09.10.2018 16:40, Chris Clayton wrote:
> Thanks to Maciej and Heiner for their replies.
> 
> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>> On 07.10.2018 21:36, Chris Clayton wrote:
>>> Hi again,
>>>
>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>>> tried it anyway. I can confirm that the regression is still present and my
>>> network still fails when, after a resume from suspend (to ram or disk), I
>>> open my browser or my mail client. In both those cases the failure is
>>> almost immediate - e.g. my home page doesn't get displayed in the browser.
>>> Pinging one of my ISP's name servers doesn't fail quite so quickly but the
>>> reported time increases from 14-15ms to more than 1000ms.
>>
>> You can try comparing chip registers (ethtool -d eth0) in the working
>> state (before a suspend) and in the broken state (after a resume).
>> Maybe there will be something obvious in the difference.
>>
>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>
> Maciej suggested comparing the output from lspci -vv for the ethernet device. 
> They are identical.
> 
> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
> and post suspend. Again, they are identical.
> Heiner specifically suggested looking at the RxConfig. The value of that is 
> 0x0002870e both pre and post suspend.
> 
Hmm, this is very weird, especially taking into account that in your original
report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
register values seem to be the same before and after resume. So how can the
chip behave differently?
So far my best guess is that some chip quirk causes it to accept writes to
register RxConfig, but to misinterpret or ignore the written value.
So far your report is the only one (affecting RTL8411), but we don't know
whether other chip versions are affected too.
One option could be to call rtl_init_rxcfg() for chip versions <= 06 only
because for them we know that they need this call.


> I've attached files I redirected the outputs to.
> 
> Please don't hesitate to ask for any other information needed to solve this
> problem. In the meantime, I've now got scripts that stop the network during
> suspend and restart it during resume. (Those scripts were removed whilst I
> gathered the diagnostics shown in the attachments.)
> 
> Chris
> 
>>> Chris
>>
>> Maciej
>>



Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Chris Clayton
Thanks to Maciej and Heiner for their replies.

On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
> On 07.10.2018 21:36, Chris Clayton wrote:
>> Hi again,
>>
>> I didn't think there was anything in 4.19-rc7 to fix this regression, but
>> tried it anyway. I can confirm that the regression is still present and my
>> network still fails when, after a resume from suspend (to ram or disk), I
>> open my browser or my mail client. In both those cases the failure is
>> almost immediate - e.g. my home page doesn't get displayed in the browser.
>> Pinging one of my ISP's name servers doesn't fail quite so quickly but the
>> reported time increases from 14-15ms to more than 1000ms.
> 
> You can try comparing chip registers (ethtool -d eth0) in the working
> state (before a suspend) and in the broken state (after a resume).
> Maybe there will be something obvious in the difference.
> 
> The same goes for the PCI configuration (lspci -d :8168 -vv).
> 
Maciej suggested comparing the output from lspci -vv for the ethernet device. 
They are identical.

Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and 
post suspend. Again, they are identical.
Heiner specifically suggested looking at the RxConfig. The value of that is 
0x0002870e both pre and post suspend.

I've attached files I redirected the outputs to.

Please don't hesitate to ask for any other information needed to solve this
problem. In the meantime, I've now got scripts that stop the network during
suspend and restart it during resume. (Those scripts were removed whilst I
gathered the diagnostics shown in the attachments.)

Chris

>> Chris
> 
> Maciej
> 
ethtool -d eth0
===
RealTek RTL8411 registers:

0x00: MAC Address  80:fa:5b:08:d0:3d
0x08: Multicast Address Filter 0x 0x0080
0x10: Dump Tally Counter Command   0x0c2ec000 0x0004
0x20: Tx Normal Priority Ring Addr 0x07a0a000 0x0004
0x28: Tx High Priority Ring Addr   0x 0x
0x30: Flash memory read/write 0x
0x34: Early Rx Byte Count  0
0x36: Early Rx Status   0x00
0x37: Command   0x0c
  Rx on, Tx on
0x3C: Interrupt Mask  0x803f
  SERR LinkChg RxNoBuf TxErr TxOK RxErr RxOK 
0x3E: Interrupt Status0x
  
0x40: Tx Configuration0x4b800f80
0x44: Rx Configuration0x0002870e
0x48: Timer count 0x
0x4C: Missed packet counter 0x00
0x50: EEPROM Command0x10
0x51: Config 0  0x00
0x52: Config 1  0xcf
0x53: Config 2  0x3c
0x54: Config 3  0x60
0x55: Config 4  0x10
0x56: Config 5  0x02
0x58: Timer interrupt 0x
0x5C: Multiple Interrupt Select   0x
0x60: PHY access  0x80040de1
0x64: TBI control and status  0x2701
0x68: TBI Autonegotiation advertisement (ANAR)0xf70c
0x6A: TBI Link partner ability (LPAR) 0x0002
0x6C: PHY status0xeb
0x84: PM wakeup frame 00x 0x
0x8C: PM wakeup frame 10x 0x
0x94: PM wakeup frame 2 (low)  0x 0x
0x9C: PM wakeup frame 2 (high) 0x 0x
0xA4: PM wakeup frame 3 (low)  0x 0x
0xAC: PM wakeup frame 3 (high) 0x 0x
0xB4: PM wakeup frame 4 (low)  0x 0x
0xBC: PM wakeup frame 4 (high) 0x 0x
0xC4: Wakeup frame 0 CRC  0x
0xC6: Wakeup frame 1 CRC  0x
0xC8: Wakeup frame 2 CRC  0x
0xCA: Wakeup frame 3 CRC  0x
0xCC: Wakeup frame 4 CRC  0x
0xDA: RX packet maximum size  0x4000
0xE0: C+ Command  0x20e1
  VLAN de-tagging
  RX checksumming
0xE2: Interrupt Mitigation0x5151
  TxTimer:   5
  TxPackets: 1
  RxTimer:   5
  RxPackets: 1
0xE4: Rx Ring Addr 0x07935000 0x0004
0xEC: Early Tx threshold0x27
0xF0: Func Event  0x0040003f
0xF4: Func Event Mask 0x
0xF8: Func Preset State   0x00031eff
0xFC: Func Force Event0x

lspci -d :8168 -vv
==
pcilib: sysfs_read_vpd: read failed: Input/output error
05:00.2 

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Maciej S. Szmigiero
On 07.10.2018 21:36, Chris Clayton wrote:
> Hi again,
> 
> I didn't think there was anything in 4.19-rc7 to fix this regression, but
> tried it anyway. I can confirm that the regression is still present and my
> network still fails when, after a resume from suspend (to RAM or disk), I
> open my browser or my mail client. In both those cases the failure is almost
> immediate - e.g. my home page doesn't get displayed in the browser. Pinging
> one of my ISP's name servers doesn't fail quite so quickly, but the reported
> time increases from 14-15ms to more than 1000ms.

You can try comparing chip registers (ethtool -d eth0) in the working
state (before a suspend) and in the broken state (after a resume).
Maybe something obvious will show up in the difference.

The same goes for the PCI configuration (lspci -d :8168 -vv).
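
Comparing two such dumps by eye is error-prone. Below is a minimal C sketch. It assumes the register offsets and values have already been copied out of the two "ethtool -d" outputs into arrays. The struct reg_sample type and the diff_regs() helper are hypothetical and are not part of ethtool. For each register whose value changed, the sketch reports the XOR of the old and new values, which shows exactly which bits flipped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for one register read from an "ethtool -d" dump. */
struct reg_sample {
    uint16_t offset;   /* register offset, e.g. 0x44 for Rx Configuration */
    uint32_t value;    /* value printed by ethtool */
};

/*
 * Compare two snapshots taken at the same offsets (e.g. before suspend and
 * after resume). Prints every register whose value differs, together with
 * the XOR of the two values (the bits that flipped). Returns the number of
 * differing registers; entries whose offsets do not line up are skipped.
 */
size_t diff_regs(const struct reg_sample *before,
                 const struct reg_sample *after, size_t n)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        if (before[i].offset != after[i].offset)
            continue;   /* snapshots are not aligned at this index */
        if (before[i].value != after[i].value) {
            printf("0x%02X: 0x%08X -> 0x%08X (flipped bits 0x%08X)\n",
                   (unsigned)before[i].offset,
                   (unsigned)before[i].value,
                   (unsigned)after[i].value,
                   (unsigned)(before[i].value ^ after[i].value));
            changed++;
        }
    }
    return changed;
}
```

Consider the Rx Configuration values reported elsewhere in this thread: 0x0002870e when rtl_init_rxcfg() runs and 0x0002400e when it does not. Given these, the sketch reports flipped bits 0x0000C700, i.e. bits 8-10, 14 and 15.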

> Chris

Maciej


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-07 Thread Chris Clayton
Hi again,

I didn't think there was anything in 4.19-rc7 to fix this regression, but
tried it anyway. I can confirm that the regression is still present and my
network still fails when, after a resume from suspend (to RAM or disk), I open
my browser or my mail client. In both those cases the failure is almost
immediate - e.g. my home page doesn't get displayed in the browser. Pinging one
of my ISP's name servers doesn't fail quite so quickly, but the reported time
increases from 14-15ms to more than 1000ms.

Chris

On 04/10/2018 09:41, Chris Clayton wrote:
> Hi Heiner,
> 
> Here's the reply to your questions. Sorry for the delay.
> 
> On 28/09/2018 23:13, Heiner Kallweit wrote:
>> On 29.09.2018 00:00, Chris Clayton wrote:
>>> Thanks Maciej.
>>>
>>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
 Hi,

> Hi,
>
> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
> network problems after resuming from a
> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>
> The pattern of the problem is that when I first boot, the network is 
> fine. But, after resume from suspend I find that
> the time taken for a ping of one of my ISP's nameservers increases from 
> 14-15ms to more than 1000ms. Moreover, when I
> open a browser (chromium or firefox), it fails to retrieve my home page 
> (https://www.google.co.uk) and pings of the
> nameserver fail with the message "Destination Host Unreachable". Often, I 
> can revive the network by stopping it with
> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169
> module and load it again.

 Please have a look at the following thread:
 https://lkml.org/lkml/2018/9/25/1118

>>>
>>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the
>>> problem is not solved by it. Similarly, I applied Heiner's patch to 4.19,
>>> but again the problem is not solved.
>>>
>> I think we're talking about two different issues here. The issue the fix
>> addresses has no link to suspend/resume.
>>
>> Chris, the lspci output doesn't provide enough detail to determine the exact 
>> chip version.
>> Can you provide the dmesg part with the XID?
> 
> $ dmesg | grep r8169
> [    5.274938] libphy: r8169: probed
> [    5.276563] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
> [    5.278158] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> [    9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-502:00, irq=IGNORE)
> [    9.460876] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
> [   11.005336] r8169 0000:05:00.2 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> 
>> According to your lspci output neither MSI nor MSI-X is active.
>> Do you have to use nomsi for whatever reason?
>>
> 
> No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
> sure that it used to be - I've no idea how
> it got dropped. If I'm not sure about an option, I start by taking the 
> recommendation in the kconfig help. Help on MSI
> has a very clear "say Y". I've re-enabled it now.
> 
> Chris
> 
>> Heiner
>>
 Maciej

>>> Chris
>>>
>>
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-04 Thread Chris Clayton
Hi Heiner,

Here's the reply to your questions. Sorry for the delay.

On 28/09/2018 23:13, Heiner Kallweit wrote:
> On 29.09.2018 00:00, Chris Clayton wrote:
>> Thanks Maciej.
>>
>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>> Hi,
>>>
 Hi,

 I upgraded my kernel to 4.18.10 recently and have since been experiencing 
 network problems after resuming from a
 suspend to RAM or disk. I previously had 4.18.6 and that was OK.

 The pattern of the problem is that when I first boot, the network is fine. 
 But, after resume from suspend I find that
 the time taken for a ping of one of my ISP's nameservers increases from 
 14-15ms to more than 1000ms. Moreover, when I
 open a browser (chromium or firefox), it fails to retrieve my home page 
 (https://www.google.co.uk) and pings of the
 nameserver fail with the message "Destination Host Unreachable". Often, I 
 can revive the network by stopping it with
 /sbin/if{down,up} but sometimes it is necessary to also remove the r8169
 module and load it again.
>>>
>>> Please have a look at the following thread:
>>> https://lkml.org/lkml/2018/9/25/1118
>>>
>>
>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem
>> is not solved by it. Similarly, I applied Heiner's patch to 4.19, but again
>> the problem is not solved.
>>
> I think we're talking about two different issues here. The issue the fix
> addresses has no link to suspend/resume.
> 
> Chris, the lspci output doesn't provide enough detail to determine the exact 
> chip version.
> Can you provide the dmesg part with the XID?

$ dmesg | grep r8169
[    5.274938] libphy: r8169: probed
[    5.276563] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
[    5.278158] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-502:00, irq=IGNORE)
[    9.460876] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
[   11.005336] r8169 0000:05:00.2 eth0: Link is Up - 100Mbps/Full - flow control rx/tx

> According to your lspci output neither MSI nor MSI-X is active.
> Do you have to use nomsi for whatever reason?
> 

No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
sure that it used to be - I've no idea how
it got dropped. If I'm not sure about an option, I start by taking the 
recommendation in the kconfig help. Help on MSI
has a very clear "say Y". I've re-enabled it now.

Chris

> Heiner
> 
>>> Maciej
>>>
>> Chris
>>
> 
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-29 Thread Chris Clayton
Sorry, sent by accident. Note to self - don't attempt email until after second 
cup of coffee.

On 29/09/2018 08:25, Chris Clayton wrote:
> 
> 
> On 28/09/2018 23:13, Heiner Kallweit wrote:
>> On 29.09.2018 00:00, Chris Clayton wrote:
>>> Thanks Maciej.
>>>
>>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
 Hi,

> Hi,
>
> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
> network problems after resuming from a
> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>
> The pattern of the problem is that when I first boot, the network is 
> fine. But, after resume from suspend I find that
> the time taken for a ping of one of my ISP's nameservers increases from 
> 14-15ms to more than 1000ms. Moreover, when I
> open a browser (chromium or firefox), it fails to retrieve my home page 
> (https://www.google.co.uk) and pings of the
> nameserver fail with the message "Destination Host Unreachable". Often, I 
> can revive the network by stopping it with
> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169
> module and load it again.

 Please have a look at the following thread:
 https://lkml.org/lkml/2018/9/25/1118

>>>
>>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the
>>> problem is not solved by it. Similarly, I applied Heiner's patch to 4.19,
>>> but again the problem is not solved.
>>>
>> I think we're talking about two different issues here. The issue the fix
>> addresses has no link to suspend/resume.
>>
>> Chris, the lspci output doesn't provide enough detail to determine the exact 
>> chip version.
>> Can you provide the dmesg part with the XID?

I meant to say that I have now re-enabled MSI in 4.18.7 - the latest stable
series kernel in which eth0 continues to function reliably after a
suspend/resume cycle. The second dmesg output below is taken from that kernel.
The first one was from an up-to-date 4.19 kernel.
> 
> $ dmesg | grep -i r8169
> [    5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [    5.321432] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
> [    5.322892] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 19
> [    5.323786] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> [   10.232077] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
> [   10.235218] r8169 0000:05:00.2 eth0: link down
> [   11.717460] r8169 0000:05:00.2 eth0: link up
> 
> $ dmesg | grep -i r8169
> [    5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [    5.208677] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
> [    5.210066] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
> [    5.210676] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> [   10.456081] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
> [   10.459217] r8169 0000:05:00.2 eth0: link down
> [   10.459880] r8169 0000:05:00.2 eth0: link down
> [   12.015158] r8169 0000:05:00.2 eth0: link up
> 
> 
>> According to your lspci output neither MSI nor MSI-X is active.
>> Do you have to use nomsi for whatever reason?
> 
> No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
> sure that it used to be - I've no idea how
> it got dropped. If I'm not sure about an option, I start by taking the 
> recommendation in the kconfig help. Help on MSI
> has a very clear "say Y".

As I said above, I have re-enabled MSI.
> 
>>
>> Heiner
>>
 Maciej

>>> Chris
>>>
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-29 Thread Chris Clayton



On 28/09/2018 23:13, Heiner Kallweit wrote:
> On 29.09.2018 00:00, Chris Clayton wrote:
>> Thanks Maciej.
>>
>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>> Hi,
>>>
 Hi,

 I upgraded my kernel to 4.18.10 recently and have since been experiencing 
 network problems after resuming from a
 suspend to RAM or disk. I previously had 4.18.6 and that was OK.

 The pattern of the problem is that when I first boot, the network is fine. 
 But, after resume from suspend I find that
 the time taken for a ping of one of my ISP's nameservers increases from 
 14-15ms to more than 1000ms. Moreover, when I
 open a browser (chromium or firefox), it fails to retrieve my home page 
 (https://www.google.co.uk) and pings of the
 nameserver fail with the message "Destination Host Unreachable". Often, I 
 can revive the network by stopping it with
 /sbin/if{down,up} but sometimes it is necessary to also remove the r8169
 module and load it again.
>>>
>>> Please have a look at the following thread:
>>> https://lkml.org/lkml/2018/9/25/1118
>>>
>>
>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem
>> is not solved by it. Similarly, I applied Heiner's patch to 4.19, but again
>> the problem is not solved.
>>
> I think we're talking about two different issues here. The issue the fix
> addresses has no link to suspend/resume.
> 
> Chris, the lspci output doesn't provide enough detail to determine the exact 
> chip version.
> Can you provide the dmesg part with the XID?

$ dmesg | grep -i r8169
[    5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    5.321432] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
[    5.322892] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 19
[    5.323786] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[   10.232077] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
[   10.235218] r8169 0000:05:00.2 eth0: link down
[   11.717460] r8169 0000:05:00.2 eth0: link up

$ dmesg | grep -i r8169
[    5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    5.208677] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
[    5.210066] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
[    5.210676] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[   10.456081] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
[   10.459217] r8169 0000:05:00.2 eth0: link down
[   10.459880] r8169 0000:05:00.2 eth0: link down
[   12.015158] r8169 0000:05:00.2 eth0: link up


> According to your lspci output neither MSI nor MSI-X is active.
> Do you have to use nomsi for whatever reason?

No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
sure that it used to be - I've no idea how
it got dropped. If I'm not sure about an option, I start by taking the 
recommendation in the kconfig help. Help on MSI
has a very clear "say Y".

> 
> Heiner
> 
>>> Maciej
>>>
>> Chris
>>
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-28 Thread Heiner Kallweit
On 29.09.2018 00:00, Chris Clayton wrote:
> Thanks Maciej.
> 
> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>> Hi,
>>
>>> Hi,
>>>
>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
>>> network problems after resuming from a
>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>
>>> The pattern of the problem is that when I first boot, the network is fine. 
>>> But, after resume from suspend I find that
>>> the time taken for a ping of one of my ISP's nameservers increases from 
>>> 14-15ms to more than 1000ms. Moreover, when I
>>> open a browser (chromium or firefox), it fails to retrieve my home page 
>>> (https://www.google.co.uk) and pings of the
>>> nameserver fail with the message "Destination Host Unreachable". Often, I 
>>> can revive the network by stopping it with
>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169
>>> module and load it again.
>>
>> Please have a look at the following thread:
>> https://lkml.org/lkml/2018/9/25/1118
>>
> 
> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem
> is not solved by it. Similarly, I applied Heiner's patch to 4.19, but again
> the problem is not solved.
>
I think we're talking about two different issues here. The issue the fix
addresses has no link to suspend/resume.

Chris, the lspci output doesn't provide enough detail to determine the exact
chip version.
Can you provide the dmesg part with the XID?
According to your lspci output neither MSI nor MSI-X is active.
Do you have to use nomsi for whatever reason?
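Heiner asks for the dmesg line carrying the XID, the value the r8169 driver reads from the chip to identify its version. As a hedged sketch, a small C helper can pull that value out of a log line. It assumes the driver prints the token `XID` followed by a space and a hexadecimal number; `parse_xid` is a hypothetical name, and the sample lines in the test are made up, not taken from Chris's log.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds "XID " in a dmesg line and parses the
 * hexadecimal value that follows it. Returns 1 and stores the value in
 * *xid on success, 0 if there is no XID token or no hex digits follow. */
static int parse_xid(const char *line, unsigned long *xid)
{
    const char *p = strstr(line, "XID ");
    if (p == NULL)
        return 0;
    p += strlen("XID ");

    char *end;
    unsigned long v = strtoul(p, &end, 16); /* stops at ',' or space */
    if (end == p)
        return 0; /* nothing parseable after "XID " */
    *xid = v;
    return 1;
}
```

Turning the extracted value into an RTL_GIGA_MAC_VER_* constant is done by the chip-identification table in r8169.c, which is not reproduced here.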

Heiner

>> Maciej
>>
> Chris
> 



Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-28 Thread Chris Clayton
Thanks Maciej.

On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
> Hi,
> 
>> Hi,
>>
>> I upgraded my kernel to 4.18.10 recently and have since been experiencing
>> network problems after resuming from a suspend to RAM or disk. I previously
>> had 4.18.6 and that was OK.
>>
>> The pattern of the problem is that when I first boot, the network is fine.
>> But after resuming from suspend, the time taken to ping one of my ISP's
>> nameservers increases from 14-15ms to more than 1000ms. Moreover, when I
>> open a browser (chromium or firefox), it fails to retrieve my home page
>> (https://www.google.co.uk) and pings of the nameserver fail with the
>> message "Destination Host Unreachable". Often, I can revive the network by
>> restarting it with /sbin/if{down,up}, but sometimes it is necessary to also
>> remove the r8169 module and load it again.
> 
> Please have a look at the following thread:
> https://lkml.org/lkml/2018/9/25/1118
> 

I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is
not solved by it. Similarly, I applied Heiner's patch to 4.19, but again the
problem is not solved.

> Maciej
> 
Chris


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-28 Thread Maciej S. Szmigiero
Hi,

> Hi,
> 
> I upgraded my kernel to 4.18.10 recently and have since been experiencing
> network problems after resuming from a suspend to RAM or disk. I previously
> had 4.18.6 and that was OK.
>
> The pattern of the problem is that when I first boot, the network is fine.
> But after resuming from suspend, the time taken to ping one of my ISP's
> nameservers increases from 14-15ms to more than 1000ms. Moreover, when I
> open a browser (chromium or firefox), it fails to retrieve my home page
> (https://www.google.co.uk) and pings of the nameserver fail with the
> message "Destination Host Unreachable". Often, I can revive the network by
> restarting it with /sbin/if{down,up}, but sometimes it is necessary to also
> remove the r8169 module and load it again.

Please have a look at the following thread:
https://lkml.org/lkml/2018/9/25/1118

Maciej


