Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-23 Thread Ben Hutchings
On Mon, 2023-04-24 at 07:00 +1000, Rod Webster wrote:

[...]
> I have not kept my .config files as PC's have been reformatted so many
> times. However, my kernels and the steps used to build them are available
> in my google drive. They will show what we have changed. Here is the link
> to the 6.1.0-rt5 kernel
> https://drive.google.com/drive/folders/1jGc6AUYKMPvsSOdWRdvhWeDX1P96tsFQ?usp=sharing
> 
> I will redo this for the final 6.1 kernel and share the config to get in step
> with Debian Bookworm's current state.
[...]
> 

I'm not spotting any particular interesting differences in the config
there, unfortunately.  And from your earlier messages, it sounded like
you didn't have so much of a problem with the Debian packages for Linux
6.1 anyway.

If you still have custom packages of Linux 5.10 where the r8169
driver's network latency is OK, I would like to see those.

Ben.

-- 
Ben Hutchings
It's easier to fight for one's principles than to live up to them.




Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-23 Thread Rod Webster
Thanks. I think I have tried most (if not all) released kernels (from 5.10
to present-day 6.1). I adopted Bullseye a few weeks before it became the
stable branch. Bookworm supports the Realtek R8125 NIC, which I noted this
week is also using the R8169 driver with the latest Bullseye kernel. I
believe it is also affected by this issue but have not benchmarked it.

Linuxcnc was accepted into SID back in January 2022 and I started using the
non-free sid versions from that time. I then started using Bookworm/Testing
once linuxcnc was accepted into it.

I have personally tested on 4-5 USFF PCs with CPUs ranging from the Intel
J1900 and J4115 to an i3. All used Realtek network hardware and all were
affected. All were initially using the R8169 driver. Many other Linuxcnc
users have reported the issue. All of these had hardware covered by Realtek's
official R8168 driver. All of these benefited from installing the Debian
R8168-dkms driver.

Compiling the RT kernel is not new to linuxcnc users as it was required up
until Debian first released linux-image-rt packages. All we have ever
needed to do was to patch the code and make a single change in
menuconfig/xconfig to select the fully preemptible kernel and compile. I
learnt how to build kernel debs when Bookworm was on the 6.0 kernel and
built a 6.1-rt5 version which I shared publicly with other users via Google
Drive. This resolved issues for a lot of users. Another user recently
reported substantial improvement in latency with the 6.3 kernel so two of
us built and tested it with outstanding and near identical results for both
overall latency and network latency.
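
The single menuconfig/xconfig change described above ends up as one line in the kernel's .config, so a build can be checked after the fact. A minimal sketch, assuming the usual .config syntax; the function name and the sample text are made up for illustration and are not taken from any of the kernels discussed in this thread:

```python
import re

def preemption_model(config_text: str) -> str:
    """Return the name of the enabled preemption option in a kernel .config."""
    # Checked from most to least preemptible; the first enabled option wins.
    candidates = ["PREEMPT_RT", "PREEMPT", "PREEMPT_VOLUNTARY", "PREEMPT_NONE"]
    # Only "CONFIG_FOO=y" lines count; "# CONFIG_FOO is not set" is ignored.
    enabled = set(re.findall(r"^CONFIG_(\w+)=y$", config_text, flags=re.MULTILINE))
    for name in candidates:
        if name in enabled:
            return name
    return "unknown"

# Hypothetical excerpt of a .config, for illustration only.
sample = """\
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT_COUNT=y
"""
print(preemption_model(sample))
```

On an installed system the text would normally come from a file such as /boot/config-$(uname -r), if the distribution ships one.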

I have not kept my .config files as the PCs have been reformatted so many
times. However, my kernels and the steps used to build them are available
in my Google Drive. They will show what we have changed. Here is the link
to the 6.1.0-rt5 kernel:
https://drive.google.com/drive/folders/1jGc6AUYKMPvsSOdWRdvhWeDX1P96tsFQ?usp=sharing

I will redo this for the final 6.1 kernel and share the config to get in step
with Debian Bookworm's current state.

Note we use Linuxcnc's latency testing tools to measure latency but
cyclictest produces similar observable results.
Unfortunately, we don't have any portable method to test network latency.
Our hardware reports the maximum time to read and write to it in CPU timer
ticks.
This command may give some insight, but the other device would need to be on
a dedicated point-to-point Ethernet connection (no hub or router):
sudo chrt 99 ping -i .001 -q 10.10.10.10
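
A minimal sketch of how the summary printed by that command could be compared against the 1 ms thread period, assuming the "rtt min/avg/max/mdev" summary line that iputils ping prints. The function names and the sample numbers are made up for illustration; they are not measurements from this report, and a ping round trip is only a rough stand-in for the read/write time our hardware reports.

```python
import re

# Summary line printed by iputils ping at the end of a run.
RTT_LINE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")

def worst_case_rtt_us(ping_output: str) -> float:
    """Return the maximum round-trip time from a ping summary, in microseconds."""
    match = RTT_LINE.search(ping_output)
    if match is None:
        raise ValueError("no rtt summary line found")
    max_ms = float(match.group(3))  # third field is the maximum
    return max_ms * 1000.0

def fits_in_period(ping_output: str, period_us: float = 1000.0) -> bool:
    """True if the worst round trip is shorter than the thread period."""
    return worst_case_rtt_us(ping_output) < period_us

# Hypothetical output, for illustration only.
sample = """\
--- 10.10.10.10 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.105/0.180/0.812/0.041 ms
"""
print(worst_case_rtt_us(sample), fits_in_period(sample))
```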

I hope this covers your questions.

Rod Webster

VMN®

www.vmn.com.au

Ph: 1300 896 832

Mob: +61 435 765 611




On Sun, 23 Apr 2023 at 23:43, Ben Hutchings  wrote:

> Control: retitle -1 linux-image-rt-amd64: High network latency with r8169 driver
> Control: tag -1 moreinfo
>
> On Sun, 2023-04-23 at 09:14 +1000, Rod Webster wrote:
> > Thanks.
> > That is really a disappointing response because:
> > 1. Hardware selected based on Debian 4.x kernels in Buster that operated
> > safely was broken by the 5.10 and above kernels in Bullseye and Bookworm
> > 2. You ask us to report a bug if the R8168-dkms package has to be used so
> > we did, now no interest is shown in actioning the report
> > 3. It does not address the excessive latency in the Debian RT kernel that
> > is not present in the upstream version at kernel.org
> > 4. It has taken a lot of work from a lot of Linuxcnc users to identify the
> > issues before this report could be made.
> >
> > The official ISO release of Linuxcnc is still based on Buster so not many
> > users ventured into the later kernels hence the delay in reporting.
> > Linuxcnc is packaged in Bookworm so the issue will be more prevalent moving
> > forward.
> >
> > I was told by a Debian developer involved in linuxcnc that if there were
> > issues affecting us, they would be fixed. I hope something comes of this.
> [...]
>
> I'm not dismissing this bug report, but I wanted to first make it clear
> that we cannot take any responsibility for safety-critical
> applications.
>
> As to the general issue of network latency:
> - What was the latest Debian packaged kernel version you used?
> - You've said that installing r8168-dkms resolves the issue. Am I
> correct in assuming that when you ran the Debian packaged kernel, the
> r8169 driver was used?
> - Have you tested on any other machines with different network
> hardware?
> - We don't make a lot of changes to the kernel source, but our build
> configuration will be different. Can you confirm exactly which upstream
> release you've tested, and provide the configuration (.config) file you
> used?
>
> Ben.
>
> --
> Ben Hutchings
> It's easier to fight for one's principles than to live up to them.
>


Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-23 Thread Diederik de Haas
On Sunday, 23 April 2023 15:43:01 CEST Ben Hutchings wrote:
> Can you confirm exactly which upstream release you've tested

The initial bug report (which didn't end up on debian-kernel ML) had:

On Tue, 18 Apr 2023 12:12:58 +1000 Rod Webster  wrote:
> We note that RT latency/jitter has significantly improved in the 6.x kernels
> and is better again with the 6.3 kernel compiled from kernel.org sources
> where latency/jitter is on a par with the 4.x kernels found in Buster.

So I'm guessing upstream master (so 6.3-rc7, for example).

me@pc:~/dev/kernel.org/linux$ git log --oneline v6.1..HEAD -- drivers/net/ethernet/realtek/r8169*
33189f0a94b9 r8169: fix RTL8168H and RTL8107E rx crc error
ce870af39558 r8169: reset bus if NIC isn't accessible after tx timeout
a99da46ac01a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
80c0576ef179 r8169: disable ASPM in case of tx timeout
2ea26b4de6f4 Revert "r8169: disable detection of chip version 36"
bb41c13c05c2 r8169: fix dmar pte write access is not set error
ad425666a1f0 r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
42f66a44d837 r8169: enable GRO software interrupt coalescing per default
4b6c6065fca1 r8169: use tp_to_dev instead of open code
eca485d22165 drivers: net: convert to boolean for the mac_managed_pm flag

It looks like some are now part of 6.1.25 too, but not all.
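
A minimal sketch of one way to check which of those commits were backported, assuming the stable tree's convention of citing the mainline commit in the message body ("commit <sha> upstream." or "[ Upstream commit <sha> ]"). The function name is made up, and the sample log uses a placeholder hash built by padding a short hash with zeros; it says nothing about what 6.1.25 really contains.

```python
import re

# The two citation styles commonly used in stable backport commit messages.
UPSTREAM_REF = re.compile(
    r"commit ([0-9a-f]{40}) upstream|\[ Upstream commit ([0-9a-f]{40}) \]"
)

def backported(short_hashes, stable_log_text):
    """Map each short mainline hash to True if the stable log cites it."""
    referenced = [a or b for a, b in UPSTREAM_REF.findall(stable_log_text)]
    return {
        short: any(full.startswith(short) for full in referenced)
        for short in short_hashes
    }

# Placeholder full hash: the short hash padded with zeros, NOT the real one.
fake_full = "33189f0a94b9" + "0" * 28
sample_log = (
    "r8169: fix RTL8168H and RTL8107E rx crc error\n\n"
    f"    [ Upstream commit {fake_full} ]\n"
)
print(backported(["33189f0a94b9", "80c0576ef179"], sample_log))
```

The log text would typically come from something like `git log v6.1..v6.1.25 -- drivers/net/ethernet/realtek/` in a checkout of the stable tree.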

It also looks like realtek is now actually contributing to the upstream kernel
instead of periodically dumping their own code on the internet :-)



Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-23 Thread Ben Hutchings
Control: retitle -1 linux-image-rt-amd64: High network latency with r8169 driver
Control: tag -1 moreinfo

On Sun, 2023-04-23 at 09:14 +1000, Rod Webster wrote:
> Thanks.
> That is really a disappointing response because:
> 1. Hardware selected based on  Debian  4.x kernels in Buster that operated
> safely was broken by the 5.10 and above kernels in Bullseye and Bookworm
> 2. You ask us to report a bug if the R8168-dkms package has to be used so
> we did, now no interest is shown in actioning the report
> 3. It does not address the excessive latency in the Debian RT kernel that
> is not present in the upstream version at kernel.org
> 4. It has taken a lot of work from a lot of Linuxcnc users to identify the
> issues before this report could be made.
> 
> The official ISO release of Linuxcnc is still based on Buster so not many
> users ventured into the later kernels hence the delay in reporting.
> Linuxcnc is packaged in Bookworm so the issue will be more prevalent moving
> forward.
> 
> I was told by a Debian developer involved in linuxcnc that if there were
> issues affecting us, they would be fixed. I hope something comes of this.
[...]

I'm not dismissing this bug report, but I wanted to first make it clear
that we cannot take any responsibility for safety-critical
applications.

As to the general issue of network latency:
- What was the latest Debian packaged kernel version you used?
- You've said that installing r8168-dkms resolves the issue. Am I
correct in assuming that when you ran the Debian packaged kernel, the
r8169 driver was used?
- Have you tested on any other machines with different network
hardware?
- We don't make a lot of changes to the kernel source, but our build
configuration will be different. Can you confirm exactly which upstream
release you've tested, and provide the configuration (.config) file you
used?
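
A minimal sketch of how two .config files could be compared once both are available, assuming the standard .config syntax. The function names and the two sample fragments are hypothetical and are not the Debian or custom configurations from this report. The kernel source tree also ships scripts/diffconfig for the same job.

```python
def parse_config(text):
    """Parse kernel .config text into a {symbol: value} dict."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" means the option is disabled.
            options[line[2:-len(suffix)]] = "n"
    return options

def diff_configs(a_text, b_text):
    """Return {symbol: (value_in_a, value_in_b)} for every differing symbol."""
    a, b = parse_config(a_text), parse_config(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }

# Hypothetical fragments, for illustration only.
config_a = "CONFIG_PREEMPT_RT=y\nCONFIG_HZ_250=y\n# CONFIG_HZ_1000 is not set\n"
config_b = "CONFIG_PREEMPT_RT=y\n# CONFIG_HZ_250 is not set\nCONFIG_HZ_1000=y\n"
for name, (old, new) in diff_configs(config_a, config_b).items():
    print(f"{name}: {old} -> {new}")
```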

Ben.

-- 
Ben Hutchings
It's easier to fight for one's principles than to live up to them.




Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-22 Thread Rod Webster
Thanks.
That is really a disappointing response because:
1. Hardware selected based on  Debian  4.x kernels in Buster that operated
safely was broken by the 5.10 and above kernels in Bullseye and Bookworm
2. You ask us to report a bug if the R8168-dkms package has to be used so
we did, now no interest is shown in actioning the report
3. It does not address the excessive latency in the Debian RT kernel that
is not present in the upstream version at kernel.org
4. It has taken a lot of work from a lot of Linuxcnc users to identify the
issues before this report could be made.

The official ISO release of Linuxcnc is still based on Buster so not many
users ventured into the later kernels hence the delay in reporting.
Linuxcnc is packaged in Bookworm so the issue will be more prevalent moving
forward.

I was told by a Debian developer involved in linuxcnc that if there were
issues affecting us, they would be fixed. I hope something comes of this.


Rod Webster

VMN®

www.vmn.com.au

Ph: 1300 896 832

Mob: +61 435 765 611




On Sun, 23 Apr 2023 at 01:09, Ben Hutchings  wrote:

> On Tue, 18 Apr 2023 12:12:58 +1000 Rod Webster  wrote:
> [...]
> > Linuxcnc uses a 1 ms realtime thread and we regularly see "Error Finishing
> > Read" reported.  This error disables the connection because our 1 ms thread has
> > been overrun. This issue mainly affects Realtek NIC hardware and is of real
> > concern where the motion hardware could be commanding components weighing
> > several thousand pounds.
> [...]
>
> The real-time kernel packages are provided as a convenience for users
> that have non-safety-critical real-time requirements, such as audio
> production.
>
> For safety-critical applications, you must take responsibility (or find
> a supplier who can) for selecting and validating software that meets
> the real-time and other reliability requirements.
>
> As a reminder, "Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to
> the extent permitted by applicable law."
>
> Ben.
>
> --
> Ben Hutchings
> Theory and practice are closer in theory than in practice - John Levine
>


Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-22 Thread Ben Hutchings
On Tue, 18 Apr 2023 12:12:58 +1000 Rod Webster  wrote:
[...]
> Linuxcnc uses a 1 ms realtime thread and we regularly see "Error Finishing
> Read" reported.  This error disables the connection because our 1 ms thread has
> been overrun. This issue mainly affects Realtek NIC hardware and is of real
> concern where the motion hardware could be commanding components weighing
> several thousand pounds.
[...]

The real-time kernel packages are provided as a convenience for users
that have non-safety-critical real-time requirements, such as audio
production.

For safety-critical applications, you must take responsibility (or find
a supplier who can) for selecting and validating software that meets
the real-time and other reliability requirements.

As a reminder, "Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to
the extent permitted by applicable law."

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice - John Levine




Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-18 Thread Rod Webster
Thanks for a prompt response. We used the alpha 2 ISO release. I had
checked sources.list. The two of us working on this were unaware that there
was a difference between non-free-firmware and non-free. non-free-firmware
is not mentioned as a component on the Debian wiki
(https://wiki.debian.org/SourcesList), but the examples for bookworm show
the use of non-free-firmware, so it's no wonder we were confused. Could the
documentation team amend the wiki to clarify the changes and the differences
between the non-free and non-free-firmware components? I'm sure this will be
a source of confusion moving forward.
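
A minimal sketch of the distinction, assuming the one-line sources.list format ("deb [options] URI suite component..."): non-free and non-free-firmware are separate component names, so a line listing only one of them does not enable the other. The function name and the sample line are made up for illustration.

```python
def enabled_components(sources_line):
    """Return the components listed on a one-line-style sources.list entry."""
    tokens = sources_line.split()
    if not tokens or tokens[0] not in ("deb", "deb-src"):
        return []  # comment, blank line, or something else entirely
    rest = tokens[1:]
    # Skip an optional "[option=value ...]" block before the URI.
    if rest and rest[0].startswith("["):
        while rest and not rest[0].endswith("]"):
            rest = rest[1:]
        rest = rest[1:]
    # What remains is: URI, suite, then the components.
    return rest[2:]

# Hypothetical entry, for illustration only.
line = "deb http://deb.debian.org/debian bookworm main non-free-firmware"
components = enabled_components(line)
print(components, "non-free" in components)
```

With an entry like the sample, a package that lives in non-free (as r8168-dkms does) would not be found until non-free is added to the component list.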

In any case, this is a side issue and we await a response from the kernel
team in relation to the latency issue with the R8169 module under
PREEMPT_RT.





On Tue, 18 Apr 2023 at 18:13, Andreas Beckmann  wrote:

> Control: tag -1 - newcomer
> Control: retitle -1 excessive network latency with PREEMPT_RT kernel without the r8168-dkms driver
> Control: reassign -1 src:linux
>
> On 18/04/2023 04.12, Rod Webster wrote:
> > Package: r8168-dkms
>
> > We are linuxcnc users which is packaged in Bookworm. Linuxcnc requires the
> > PREEMPT_RT real time kernel as a prerequisite. We have found excessive latency
> > in the real time environment since Debian moved to the 5.x kernels.  We note
> ...
>
> I'm reassigning this bug to the linux kernel package, as this is not an
> issue in the r8168 driver.
>
> > We are no longer able to locate the R8168-dkms driver in the repositories,
> > despite it being listed as available in package search. We have downloaded a
> > .deb file from the Sid packages to install the correct driver.
>
> >   3. The R8168-dkms driver to continue to be made available in the Bookworm
> > repositories.
>
> The r8168-dkms package is in non-free - do you have that enabled?
>
>
> Andreas
>


Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-18 Thread Andreas Beckmann

Control: tag -1 - newcomer
Control: retitle -1 excessive network latency with PREEMPT_RT kernel without the r8168-dkms driver
Control: reassign -1 src:linux

On 18/04/2023 04.12, Rod Webster wrote:
> Package: r8168-dkms

> We are linuxcnc users which is packaged in Bookworm. Linuxcnc requires the
> PREEMPT_RT real time kernel as a prerequisite. We have found excessive latency
> in the real time environment since Debian moved to the 5.x kernels.  We note
...

I'm reassigning this bug to the linux kernel package, as this is not an
issue in the r8168 driver.

> We are no longer able to locate the R8168-dkms driver in the repositories,
> despite it being listed as available in package search. We have downloaded a
> .deb file from the Sid packages to install the correct driver.

>   3. The R8168-dkms driver to continue to be made available in the Bookworm
> repositories.

The r8168-dkms package is in non-free - do you have that enabled?


Andreas



Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

2023-04-17 Thread Rod Webster
Package: r8168-dkms
Version: r8168-dkms
Severity: important
Tags: upstream newcomer
X-Debbugs-Cc: r...@vmn.com.au

Dear Maintainer,


   * What led up to the situation?
We are users of Linuxcnc, which is packaged in Bookworm. Linuxcnc requires the
PREEMPT_RT real time kernel as a prerequisite. We have found excessive latency
in the real time environment since Debian moved to the 5.x kernels.  We note
that RT latency/jitter has significantly improved in the 6.x kernels and is
better again with the 6.3 kernel compiled from kernel.org sources where
latency/jitter is on a par with the 4.x kernels found in Buster.

We also note that the latency/jitter is significantly improved where a kernel
based on pristine source from kernel.org is used. We are disappointed that
Debian's implementation of the PREEMPT_RT kernel results in significantly less
performance than the upstream sources. In our recent testing, we found the 6.3
kernel gave a 265% improvement in latency/jitter over the default Debian
Bookworm 6.1.x Real time kernel. Similar improvement has also been noted
between Debian's 6.1 kernel and one we compiled from upstream source.

Network latency/jitter when we communicate point to point from a Debian PC to
an Ethernet-connected motion card is another significant issue for us that
was not present in the 4.x kernels. This has not been resolved in the 6.1 to
6.3 kernels.

Linuxcnc uses a 1 ms realtime thread and we regularly see "Error Finishing
Read" reported.  This error disables the connection because our 1 ms thread has
been overrun. This issue mainly affects Realtek NIC hardware and is of real
concern where the motion hardware could be commanding components weighing
several thousand pounds.

   * What exactly did you do (or not do) that was effective (or
 ineffective)?

We have found that installing the R8168-dkms driver with the 6.3 kernel we
compiled has resulted in a 400% improvement in network latency (from approx.
800 usec to about 200 usec) when compared with the default R8169 kernel module
driver.
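
To put those figures in context, here is a minimal sketch of the arithmetic, using the approximate 800 usec and 200 usec values above and the 1 ms thread period. The function name and dictionary keys are made up for illustration.

```python
def latency_change(before_us, after_us, period_us=1000.0):
    """Summarise a latency change relative to a realtime thread period."""
    return {
        # How many times faster the "after" case is.
        "speedup": before_us / after_us,
        # Reduction expressed as a percentage of the original latency.
        "reduction_pct": 100.0 * (before_us - after_us) / before_us,
        # Share of the thread period consumed before and after.
        "share_before_pct": 100.0 * before_us / period_us,
        "share_after_pct": 100.0 * after_us / period_us,
    }

result = latency_change(800.0, 200.0)
for key, value in result.items():
    print(f"{key}: {value}")
```

Going from about 800 usec to about 200 usec is a fourfold speedup, or a 75% reduction, and it takes the network exchange from roughly 80% of the 1 ms period down to roughly 20%.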

We are no longer able to locate the R8168-dkms driver in the repositories,
despite it being listed as available in package search. We have downloaded a
.deb file from the Sid packages to install the correct driver.

The R8168-dkms description says to report use of the driver so the R8169 kernel
module can be updated.

   * What was the outcome of this action?

A combination of the 6.3 kernel and installing the R8168-dkms driver has
resolved our issues. However, this is not something a normal user would expect
or have the skills to do.

   * What outcome did you expect instead?

We expect that Debian Bookworm:
 1. Has acceptable jitter/latency in a PREEMPT_RT real time environment
 2. Correctly supports Realtek NIC devices covered by the R8168-dkms driver
using the default R8169 kernel module as discussed on the Realtek web site.
 3. The R8168-dkms driver to continue to be made available in the Bookworm
repositories.
 4. Does not negatively affect real time performance when benchmarked against
the kernel.org sources.
 5. This issue may require escalation upstream.


-- System Information:
Debian Release: bookworm/sid
  APT prefers testing-security
  APT policy: (500, 'testing-security'), (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-5-rt-amd64 (SMP w/4 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled