[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2023-03-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #115 from Richard Scheffenegger  ---
The stack trace here looks entirely different: the local system's TCP stack is
not involved (as this is a router, forwarded traffic doesn't normally get
processed by the TCP stack, only by the IP layer, as is visible in the stack
trace).

I suggest opening a separate bug for this, and also providing the core dumps if
possible (note that the core dumps may contain sensitive information in the mbuf
chains).

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.

2023-03-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Timothy Pearson  changed:

   What|Removed |Added

 CC||tpearson@raptorengineering.
   ||com

--- Comment #114 from Timothy Pearson  ---
We're still seeing a very similar panic on a FreeBSD 13 router system that
includes this patch in the kernel:

Tracing command kernel pid 0 tid 100012 td 0xfe001ed50720 (CPU 1)
kdb_enter() at kdb_enter+0x37/frame 0xfe001bd3f5b0
vpanic() at vpanic+0x1b0/frame 0xfe001bd3f600
panic() at panic+0x43/frame 0xfe001bd3f660
trap_fatal() at trap_fatal+0x385/frame 0xfe001bd3f6c0
trap_pfault() at trap_pfault+0x4f/frame 0xfe001bd3f720
calltrap() at calltrap+0x8/frame 0xfe001bd3f720
--- trap 0xc, rip = 0x80d3d250, rsp = 0xfe001bd3f7f0, rbp = 0xfe001bd3f850 ---
m_copym() at m_copym+0x30/frame 0xfe001bd3f850
ip_fragment() at ip_fragment+0x24f/frame 0xfe001bd3f8f0
ip_output() at ip_output+0x13d5/frame 0xfe001bd3fa30
ip_forward() at ip_forward+0x3cf/frame 0xfe001bd3fae0
ip_input() at ip_input+0x79e/frame 0xfe001bd3fb70
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfe001bd3fbc0
ether_demux() at ether_demux+0x138/frame 0xfe001bd3fbf0
ether_nh_input() at ether_nh_input+0x355/frame 0xfe001bd3fc50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfe001bd3fca0
ether_input() at ether_input+0x69/frame 0xfe001bd3fd00
iflib_rxeof() at iflib_rxeof+0xc27/frame 0xfe001bd3fe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfe001bd3fe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfe001bd3fec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfe001bd3fef0
fork_exit() at fork_exit+0x7e/frame 0xfe001bd3ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe001bd3ff30
--- trap 0, rip = 0x80c313af, rsp = 0, rbp = 0x3 ---
mi_startup() at mi_startup+0xdf/frame 0x3

Will try the workaround to see if the issue still appears.  Note it can be
several weeks between panics, so we won't know if the workaround is still a
solution until the end of April.

2022-10-12 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Michael Tuexen  changed:

   What|Removed |Added

 Status|In Progress |Closed
 Resolution|--- |FIXED

--- Comment #113 from Michael Tuexen  ---
Closing this as it seems to be fixed. If the problem still exists, please
reopen.

2022-10-12 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Richard Scheffenegger  changed:

   What|Removed |Added

 Status|Open|In Progress

2022-09-30 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Michael Tuexen  changed:

   What|Removed |Added

  Flags|mfc-stable13?   |mfc-stable13+,
   ||mfc-stable12+

2022-09-30 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #112 from Richard Scheffenegger  ---
(In reply to Christos Chatzaras from comment #111)
Yes, this is very likely: the "fix offset" patch addresses the problematic
sequence of events leading to this off-by-one calculation.

MAIN: https://reviews.freebsd.org/rG6d9e911fbadf3b409802a211c1dae9b47cb5a2b8
STABLE-13:
https://reviews.freebsd.org/rG0612d3000b974f31de15c90c77bf43f121fc8656
STABLE-12:
https://reviews.freebsd.org/rG26370413d43bfd65500270ff331ae6bdf0f54133

The change is effectively identical in all variants.

(Without the other two fixes, the exact timing of packets being sent may not
fully conform to TCP specs, but this doesn't materially impact stability or
consistency at all).

2022-09-30 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #111 from Christos Chatzaras  ---
Hello,

Is it possible that this patch fixes the issue in comment #36?

2022-09-30 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #110 from Dmitriy  ---
(In reply to Michael Tuexen from comment #87)
Hello,
After applying the patch named "Fix offset computation", the system is currently
at 12 days of uptime with the same workload that, before the patch, caused a
crash after 5-8 days of uptime. So it seems that was the root cause, and it has
been fixed!
Thank you!

2022-09-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #109 from commit-h...@freebsd.org ---
A commit in branch stable/13 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=0612d3000b974f31de15c90c77bf43f121fc8656

commit 0612d3000b974f31de15c90c77bf43f121fc8656
Author: Michael Tuexen 
AuthorDate: 2022-09-19 10:42:43 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-25 08:54:18 +

tcp: fix computation of offset

Only update the offset if actually retransmitting from the
scoreboard. If not done correctly, this may result in
trying to (re)-transmit data not being in the socket
buffer and therefore resulting in a panic.

PR: 264257
PR: 263445
PR: 260393
Reviewed by: rscheff@
MFC after:  3 days
Sponsored by:   Netflix, Inc.
Differential Revision:  https://reviews.freebsd.org/D36626

(cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #108 from commit-h...@freebsd.org ---
A commit in branch stable/13 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=f9edad0054652e020b8214f61c0e454fd48101a6

commit f9edad0054652e020b8214f61c0e454fd48101a6
Author: Michael Tuexen 
AuthorDate: 2022-09-22 10:12:11 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-25 08:55:41 +

tcp: send ACKs when requested

When doing Limited Transmit send an ACK when needed by the protocol
processing (like sending ACKs with a DSACK block).

PR: 264257
PR: 263445
PR: 260393
Reviewed by: rscheff@
MFC after:  3 days
Sponsored by:   Netflix, Inc.
Differential Revision:  https://reviews.freebsd.org/D36631

(cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #107 from commit-h...@freebsd.org ---
A commit in branch stable/13 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=c1f9a81e7bfe354dfa4f191d5180426f76bc514b

commit c1f9a81e7bfe354dfa4f191d5180426f76bc514b
Author: Richard Scheffenegger 
AuthorDate: 2022-09-22 10:55:25 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-25 08:56:28 +

tcp: fix cwnd restricted SACK retransmission loop

While doing the initial SACK retransmission segment while heavily cwnd
constrained, tcp_output can erroneously send out the entire send buffer
again. This may happen after a retransmission timeout, which resets
snd_nxt to snd_una while the SACK scoreboard is still populated.

Reviewed By: tuexen, #transport
PR: 264257
PR: 263445
PR: 260393
MFC after:  3 days
Sponsored by:   NetApp, Inc.
Differential Revision:  https://reviews.freebsd.org/D36637

(cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #106 from commit-h...@freebsd.org ---
A commit in branch stable/12 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=26370413d43bfd65500270ff331ae6bdf0f54133

commit 26370413d43bfd65500270ff331ae6bdf0f54133
Author: Michael Tuexen 
AuthorDate: 2022-09-19 10:42:43 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-25 08:41:54 +

tcp: fix computation of offset

Only update the offset if actually retransmitting from the
scoreboard. If not done correctly, this may result in
trying to (re)-transmit data not being in the socket
buffer and therefore resulting in a panic.

PR: 264257
PR: 263445
PR: 260393
Reviewed by: rscheff@
MFC after:  3 days
Sponsored by:   Netflix, Inc.
Differential Revision:  https://reviews.freebsd.org/D36626

(cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #105 from commit-h...@freebsd.org ---
A commit in branch stable/12 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=9e69e009c86f259653610f3c337253b79381c7a7

commit 9e69e009c86f259653610f3c337253b79381c7a7
Author: Michael Tuexen 
AuthorDate: 2022-09-22 10:12:11 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-25 08:46:54 +

tcp: send ACKs when requested

When doing Limited Transmit send an ACK when needed by the protocol
processing (like sending ACKs with a DSACK block).

PR: 264257
PR: 263445
PR: 260393
Reviewed by: rscheff@
MFC after:  3 days
Sponsored by:   Netflix, Inc.
Differential Revision:  https://reviews.freebsd.org/D36631

(cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-25 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #104 from commit-h...@freebsd.org ---
A commit in branch stable/12 references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=3651c4f42285644938e2f5bc924ab8c7ed857f83

commit 3651c4f42285644938e2f5bc924ab8c7ed857f83
Author: Richard Scheffenegger 
AuthorDate: 2022-09-22 10:55:25 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-25 08:52:56 +

tcp: fix cwnd restricted SACK retransmission loop

While doing the initial SACK retransmission segment while heavily cwnd
constrained, tcp_output can erroneously send out the entire send buffer
again. This may happen after a retransmission timeout, which resets
snd_nxt to snd_una while the SACK scoreboard is still populated.

Reviewed By: tuexen, #transport
PR: 264257
PR: 263445
PR: 260393
MFC after:  3 days
Sponsored by:   NetApp, Inc.
Differential Revision:  https://reviews.freebsd.org/D36637

(cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #103 from Richard Scheffenegger  ---
(In reply to Marek Zarychta from comment #102)

This setting got enabled by default in D28702
(https://reviews.freebsd.org/rGd1de2b05a001d3d80f633f576f4909c2686dda3d).

Prior to this, RFC6675 (SACK loss recovery) was implemented more fully with
D18985
(https://reviews.freebsd.org/rG3c40e1d52cd86168779cf99dbabe58df465d7e3f).

This combination (enhanced features, combined with a very long-lingering issue
in handling rare corner cases of the interaction between SACK loss recovery and
retransmission timeouts (RTO)) led to the much higher incidence rate, and to the
actual panics reported in these three associated bugs.

As the sequence of events that leads to this issue includes non-responsive
clients or network paths with extreme packet loss rates, this effect is unlikely
to be observed in stable, low-loss environments with responsive clients.

Nevertheless, there is still a point for discussion: whether there should be a
(binary) patch for this, or whether providing a patch and including the fix in
STABLE-13 / STABLE-12 is sufficient. So far, most affected deployments have been
able to include the patches.

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #102 from Marek Zarychta  ---
(In reply to Richard Scheffenegger from comment #99)
I asked because I have never experienced such a panic on the systems I
maintain, and since net.inet.tcp.rfc6675_pipe=1 is set almost everywhere,
I was guessing that maybe it was a remedy.

Thank you for an exhaustive explanation of these intricacies.

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #101 from commit-h...@freebsd.org ---
A commit in branch main references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=a743fc8826fa348b09d219632594c537f8e5690e

commit a743fc8826fa348b09d219632594c537f8e5690e
Author: Richard Scheffenegger 
AuthorDate: 2022-09-22 10:55:25 +
Commit: Richard Scheffenegger 
CommitDate: 2022-09-22 11:28:43 +

tcp: fix cwnd restricted SACK retransmission loop

While doing the initial SACK retransmission segment while heavily cwnd
constrained, tcp_output can erroneously send out the entire send buffer
again. This may happen after a retransmission timeout, which resets
snd_nxt to snd_una while the SACK scoreboard is still populated.

Reviewed By: tuexen, #transport
PR: 264257
PR: 263445
PR: 260393
MFC after:  3 days
Sponsored by:   NetApp, Inc.
Differential Revision:  https://reviews.freebsd.org/D36637

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #100 from commit-h...@freebsd.org ---
A commit in branch main references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe

commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe
Author: Michael Tuexen 
AuthorDate: 2022-09-22 10:12:11 +
Commit: Michael Tuexen 
CommitDate: 2022-09-22 10:12:11 +

tcp: send ACKs when requested

When doing Limited Transmit send an ACK when needed by the protocol
processing (like sending ACKs with a DSACK block).

PR: 264257
PR: 263445
PR: 260393
Reviewed by: rscheff@
MFC after:  3 days
Sponsored by:   Netflix, Inc.
Differential Revision:  https://reviews.freebsd.org/D36631

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #99 from Richard Scheffenegger  ---
(In reply to Marek Zarychta from comment #98)
> It looks like setting "net.inet.tcp.rfc6675_pipe=1" was a workaround, could 
> you confirm, please?

The workaround was "net.inet.tcp.rfc6675_pipe=0" for bug 254244, to disable
"SACK rescue retransmissions".

> This issue looks similar to the already fixed bug 254244.

Similar yes, but not identical.

In bug 254244, it was an oversight on my part how to deal with the final  bit
in the sequence number space when a rescue retransmission is to be done after
the stack has already sent this .

The implicit assumption is that, at the tail end of a TCP session, only a
single  at one specific sequence number exists, as it should be.

However, the complex of bug 260393, bug 263445 and this bug 264257 comes from
an issue which has existed "forever", but never, or only extremely rarely,
manifested itself as a server-side panic. It does have some potential to lead
to erroneous bytes at the tail end of a TCP session, which, depending on the
application using the data, may lead to client-side data inconsistencies.

That issue is that, under specific circumstances, a TCP session will send
multiple  at advancing sequence numbers, "overwriting" the previously
sent  with a (not fully) arbitrary data byte.

In one instance, logging showed up to 6 consecutive s, each at a new
position, but even just two such  at consecutive positions are
problematic.
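The invariant described here can be checked offline over a captured list of transmitted segments. The following is a minimal sketch, assuming a hypothetical `struct seg` record (starting sequence number, payload length, FIN flag) extracted from a packet trace; it relies only on the TCP rule that a FIN occupies the sequence number immediately following the segment's payload. A well-behaved connection reports at most one FIN position, even if the FIN is retransmitted several times; a count above 1 means the end of the stream was announced at more than one sequence number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one transmitted segment (e.g. from a packet trace). */
struct seg {
    uint32_t seq;   /* sequence number of the first payload byte */
    uint32_t len;   /* payload length in bytes */
    int      fin;   /* nonzero if the FIN flag was set */
};

/*
 * Count the distinct sequence numbers at which a FIN was sent.
 * Retransmitting the FIN at the same position does not increase the count.
 * At most 64 distinct positions are tracked, which is enough for a sketch.
 */
static size_t count_fin_positions(const struct seg *segs, size_t n)
{
    uint32_t seen[64];
    size_t nseen = 0;

    for (size_t i = 0; i < n; i++) {
        if (!segs[i].fin)
            continue;
        /* The FIN follows the payload; uint32_t arithmetic wraps like TCP seqs. */
        uint32_t fin_seq = segs[i].seq + segs[i].len;
        size_t j;
        for (j = 0; j < nseen; j++)
            if (seen[j] == fin_seq)
                break;
        if (j == nseen && nseen < sizeof(seen) / sizeof(seen[0]))
            seen[nseen++] = fin_seq;
    }
    return nseen;
}
```

For example, a trace in which a one-byte data segment is sent at the position of an earlier FIN, followed by a new FIN, yields a count of 2 or more.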

This day-zero issue (most likely affecting, or having affected, all BSD-derived
TCP stacks with SACK enabled) is fixed in FreeBSD with
https://reviews.freebsd.org/D36626

While investigating further, two more bugs showed up:
https://reviews.freebsd.org/D36631
https://reviews.freebsd.org/D36637

but these do not corrupt the tcpcb state or lead to erroneous transmission of
different information at the same sequence number; they only affect when and
which segments are sent.

(A regression in MAIN was also found while looking closely at the test cases,
one of which did not fully validate the expected state, but that is not really
directly related to this problem.)

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Marek Zarychta  changed:

   What|Removed |Added

 CC||zarych...@plan-b.pwste.edu.
   ||pl

--- Comment #98 from Marek Zarychta  ---
It looks like setting "net.inet.tcp.rfc6675_pipe=1" was a workaround, could you
confirm, please?
This issue looks similar to the already fixed bug 254244.
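For reference, the current value of this sysctl can also be read from a program with sysctlbyname(3). The following is a minimal sketch, assuming the OID is exported as a plain int (consistent with the 0/1 settings discussed in this thread); the helper name get_rfc6675_pipe() is hypothetical. On systems other than FreeBSD, or on a release where the OID does not exist, it simply returns -1. Changing the value requires root and is not shown.

```c
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Return the current value of net.inet.tcp.rfc6675_pipe (expected 0 or 1),
 * or -1 if it cannot be read (not FreeBSD, or the OID is not present).
 * Assumes the sysctl is exported as an int.
 */
static int get_rfc6675_pipe(void)
{
#ifdef __FreeBSD__
    int value = 0;
    size_t len = sizeof(value);

    /* Read-only query: newp is NULL and newlen is 0. */
    if (sysctlbyname("net.inet.tcp.rfc6675_pipe", &value, &len, NULL, 0) != 0)
        return -1;
    return value;
#else
    return -1;
#endif
}
```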

2022-09-22 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #97 from Dobri Dobrev  ---
(In reply to Michael Tuexen from comment #96)

Last time it took 10-15 minutes; however, I've had cases where it took 24+
hours.
Atm still running w/o a panic.

2022-09-21 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #96 from Michael Tuexen  ---
(In reply to Dobri Dobrev from comment #95)
How long did it take normally until a panic happened?

2022-09-20 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #95 from Dobri Dobrev  ---
24 hours - no panic yet.
Looks promising.

2022-09-19 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #94 from commit-h...@freebsd.org ---
A commit in branch main references this bug:

URL:
https://cgit.FreeBSD.org/src/commit/?id=6d9e911fbadf3b409802a211c1dae9b47cb5a2b8

commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8
Author: Michael Tuexen 
AuthorDate: 2022-09-19 10:42:43 +
Commit: Michael Tuexen 
CommitDate: 2022-09-19 10:49:31 +

tcp: fix computation of offset

Only update the offset if actually retransmitting from the
scoreboard. If not done correctly, this may result in
trying to (re)-transmit data not being in the socket
buffer and therefore resulting in a panic.

PR: 264257
PR: 263445
PR: 260393
Reviewed by: rscheff@
MFC after:  3 days
Sponsored by:   Netflix, Inc.
Differential Revision:  https://reviews.freebsd.org/D36626

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #93 from Michael Tuexen  ---
I also forgot to move a KASSERT. But it shouldn't be critical if you don't
move it...

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Michael Tuexen  changed:

   What|Removed |Added

 Attachment #236664|0   |1
is obsolete||

--- Comment #92 from Michael Tuexen  ---
Created attachment 236667
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=236667=edit
Updated: Fix offset computation (releng/13.1)

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Michael Tuexen  changed:

   What|Removed |Added

 Attachment #236665|0   |1
is obsolete||

--- Comment #91 from Michael Tuexen  ---
Created attachment 23
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=23=edit
Updated: Fix offset computation (stable/13)

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #90 from Michael Tuexen  ---
(In reply to Dobri Dobrev from comment #88)
Please use the patch for stable/13.

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #89 from Michael Tuexen  ---
Created attachment 236665
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=236665=edit
Fix offset computation (stable/13)

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #88 from Dobri Dobrev  ---
Comment on attachment 236664
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=236664
Fix offset computation

The patch doesn't match tcp_output.c in 13-stable.

} else {
    len = ((int32_t)ulmin(cwin,
        SEQ_SUB(p->end, p->rxmit)));
}
off = SEQ_SUB(p->rxmit, tp->snd_una);
KASSERT(off >= 0,("%s: sack block to the left of una : %d",
    __func__, off));
if (len > 0) {
    sack_rxmit = 1;
    sendalot = 1;
    TCPSTAT_INC(tcps_sack_rexmits);
    TCPSTAT_ADD(tcps_sack_rexmit_bytes,
        min(len, tcp_maxseg(tp)));
}
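The difference between the 13-stable excerpt above and the "Fix offset computation" change, as its commit message puts it ("Only update the offset if actually retransmitting from the scoreboard"), can be illustrated with a standalone model. This is a simplified sketch, not kernel code: the function and struct names are hypothetical, seq_sub() stands in for SEQ_SUB(), sack_len() mirrors the len computation in the excerpt, and it assumes that, as in tcp_output(), off already holds snd_nxt - snd_una before the SACK block runs.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Stand-in for SEQ_SUB(): signed distance between two sequence numbers. */
static int32_t seq_sub(tcp_seq a, tcp_seq b)
{
    return (int32_t)(a - b);
}

/* Hypothetical, reduced SACK hole: next byte to retransmit and end of hole. */
struct hole {
    tcp_seq rxmit;
    tcp_seq end;
};

/* Mirrors len = ulmin(cwin, SEQ_SUB(p->end, p->rxmit)); assumes end >= rxmit. */
static int32_t sack_len(uint32_t cwin, const struct hole *p)
{
    uint32_t left = (uint32_t)seq_sub(p->end, p->rxmit);
    return (int32_t)(cwin < left ? cwin : left);
}

/* Pattern in the excerpt: off always points into the hole, even if len <= 0. */
static int32_t off_unconditional(tcp_seq snd_una, tcp_seq snd_nxt,
                                 const struct hole *p, int32_t len)
{
    (void)snd_nxt;  /* the default offset (snd_nxt - snd_una) is overwritten */
    (void)len;
    return seq_sub(p->rxmit, snd_una);
}

/* Pattern described by the fix: only move off when actually retransmitting. */
static int32_t off_if_rxmit(tcp_seq snd_una, tcp_seq snd_nxt,
                            const struct hole *p, int32_t len)
{
    int32_t off = seq_sub(snd_nxt, snd_una);    /* default: new data at snd_nxt */
    if (len > 0)
        off = seq_sub(p->rxmit, snd_una);       /* retransmit from the hole */
    return off;
}
```

In this model, with snd_una = 1000, snd_nxt = 1500, a hole starting at 1200 and cwin = 0 (so len = 0), the unconditional variant yields an offset of 200 while the conditional one yields 500. A segment then sent at snd_nxt would take its payload from the wrong place in the send buffer, which is roughly the kind of mismatch between sequence number and buffer position discussed in this thread; the model does not attempt to reproduce the panic itself.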

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #87 from Michael Tuexen  ---
Hi Dmitriy,

there are several things going unexpectedly, but one that I think is
definitely wrong is the computation of an offset. Can you try the patch "Fix
offset computation" and report whether the kernel still crashes?

2022-09-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #86 from Michael Tuexen  ---
Created attachment 236664
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=236664=edit
Fix offset computation

2022-07-20 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #85 from Richard Scheffenegger  ---
Hi Chad,

Thank you very much for your effort. Are you sure about this panic? In the
backtrace, it looks like an issue happened in

#6 0x80d55ec7 at pfil_run_hooks+0x97
#7 0x8239af37 at bridge_pfil+0x497

(bridging code, not tcp).

In the few instances we have seen in more detail, it looked like a TLS handshake
was being performed when, for one reason or another, there was no response
from the client for an extended period of time (probably 1-2 min) before
ACKs (and SACKs) were seen again.

The IP stats in your case seem pretty clean. We suspected that a missing
adjustment in a local error handling path may have something to do with this,
but after doing some error injection before and after addressing this, the
behavior seems sufficiently different that I'm not convinced that is the
culprit either.

2022-07-20 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #84 from Chad Smith  ---
(In reply to Richard Scheffenegger from comment #80)

I grabbed the netstat -s output from a recent crash file where the system had
been up for an hour before it crashed again, and I'm also posting some
additional info.

We have a test bench set up and are trying to reproduce this behavior on a
LAN. It seems that under perfect network conditions, a massively parallel iperf
run across a bridged interface for 24 hours does not trigger this. We're looking
for ideas on how to simulate internet-like network conditions; short of hooking
the other end of our test bench up to an internet connection in another city, I
am out of ideas. Open to suggestions.

# uname -a
FreeBSD 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212
GENERIC amd64

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x8234a783
stack pointer           = 0x0:0xfe00c45a5a50
frame pointer           = 0x0:0xfe00c45a5a60
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_4)
trap number             = 12
panic: page fault
cpuid = 4
time = 1657212869
KDB: stack backtrace:
#0 0x80c69465 at kdb_backtrace+0x65
#1 0x80c1bb1f at vpanic+0x17f
#2 0x80c1b993 at panic+0x43
#3 0x810afdf5 at trap_fatal+0x385
#4 0x810afe4f at trap_pfault+0x4f
#5 0x81087528 at calltrap+0x8
#6 0x80d55ec7 at pfil_run_hooks+0x97
#7 0x8239af37 at bridge_pfil+0x497
#8 0x8239d5a3 at bridge_forward+0x323
#9 0x8239cef1 at bridge_input+0x4c1
#10 0x80d380fd at ether_nh_input+0x21d
#11 0x80d53089 at netisr_dispatch_src+0xb9
#12 0x80d372d9 at ether_input+0x69
#13 0x80d4f4d7 at iflib_rxeof+0xc27
#14 0x80d49b22 at _task_fn_rx+0x72
#15 0x80c67e9d at gtaskqueue_run_locked+0x15d
#16 0x80c67b12 at gtaskqueue_thread_loop+0xc2
#17 0x80bd8a5e at fork_exit+0x7e


netstat -s

tcp:
144916 packets sent
10016 data packets (765551 bytes)
0 data packets (0 bytes) retransmitted
0 data packets unnecessarily retransmitted
0 resends initiated by MTU discovery
133061 ack-only packets (306 delayed)
0 URG only packets
0 window probe packets
7 window update packets
1832 control packets
275476 packets received
13046 acks (for 767206 bytes)
17 duplicate acks
0 UDP tunneled pkts
0 UDP tunneled pkt cnt with errors
0 acks for unsent data
269667 packets (337208234 bytes) received in-sequence
17 completely duplicate packets (1304 bytes)
0 old duplicate packets
0 packets with some dup. data (0 bytes duped)
13 out-of-order packets (16952 bytes)
0 packets (0 bytes) of data after window
0 window probes
0 window update packets
0 packets received after close
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
0 discarded due to full reassembly queue
177 connection requests
1561 connection accepts
0 bad connection attempts
0 listen queue overflows
0 ignored RSTs in the windows
1661 connections established (including accepts)
6 times used RTT from hostcache
6 times used RTT variance from hostcache
0 times used slow-start threshold from hostcache
1680 connections closed (including 0 drops)
6 connections updated cached RTT on close
6 connections updated cached RTT variance on close
0 connections updated cached ssthresh on close
3 embryonic connections dropped
13046 segments updated rtt (of 9912 attempts)
74 retransmit timeouts
0 connections dropped by rexmit timeout
0 persist timeouts
0 connections dropped by persist timeout
0 Connections (fin_wait_2) dropped because of timeout
0 keepalive timeouts
0 keepalive probes sent
0 connections dropped by keepalive
254 correct ACK header predictions
260118 correct data packet header predictions
1561 syncache entries added
0 retransmitted
0 dupsyn

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-07-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #83 from Michael Tuexen  ---
(In reply to Dmitriy from comment #82)
Thank you very much for the core. Let me have a look; I'll report my findings.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-07-18 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #82 from Dmitriy  ---
(In reply to Michael Tuexen from comment #78)
I just sent the link to the new core file by e-mail. The kernel was built with
"options TCP_BLACKBOX", without any debug/diagnostic patches.
Thank you!

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-07-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #81 from Michael Tuexen  ---
(In reply to Richard Scheffenegger from comment #80)
Yes, errors from ip_output() are logged:
https://cgit.freebsd.org/src/tree/sys/netinet/tcp_output.c?h=stable/13#n1670
This is not done the same way as in RACK or BBR, but we can clean that up later.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-07-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #80 from Richard Scheffenegger  ---
(In reply to Chad Smith from comment #79)

Do the affected servers report IP-layer output errors? Or low mbuf situations?

(netstat -s after a bit of uptime may show increasing IP output error
counters).

Background: since the full packet trace (without timing) is not by itself
sufficient to reproduce the issue, one possibility is that a very infrequently
exercised error-handling path introduces the problematic state, and this may
only happen on quite busy machines where mbuf allocation failures or packet
output errors show up.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-07-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Chad Smith  changed:

   What|Removed |Added

 CC||clearscr...@gmail.com

--- Comment #79 from Chad Smith  ---
Hello all,

On the few servers where we're running 13.1, we see these trap 12 errors too.
The most interesting bit I can contribute is that we only see them on servers
using if_bridge.

My guess is that the servers using if_bridge see a lot more traffic across
the wire and are therefore more likely to experience the packet sequence
required to cause the trap 12 errors. When we rank servers by trap 12 error
occurrence, we see a correlation with bandwidth utilization (more bandwidth
utilization, greater trap 12 error frequency).

Happy to help in any way I can.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-07-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #78 from Michael Tuexen  ---
From the pcap information we were able to reconstruct the packet sequence, but
not the exact timing. Unfortunately, we were not able to reproduce the issue,
and extending the pcap logging seems hard.

So I suggest using BlackBox Logging instead.

Could you:

* Add
  options TCP_BLACKBOX
  to your kernel configuration file and recompile the kernel.
  You can remove the TCPPCAP option, if you want.

* Set the following sysctl variables:
  net.inet.tcp.bb.log_auto_all=1
  net.inet.tcp.bb.log_auto_ratio=1
  net.inet.tcp.bb.log_auto_mode=1
  net.inet.tcp.bb.log_session_limit=200

This should enable BlackBox Logging on all TCP connections and store the last
200 events per TCP connection in the endpoint. Once the system has crashed,
please provide a core. This time we should get not only the packets being
sent/received, but also internal TCP state variables and information about the
retransmission timer. If this information is not enough, we can extend the TCP
BlackBox Logging ad hoc.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-29 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #77 from Christos Chatzaras  ---
(In reply to Richard Scheffenegger from comment #70)

I found this old review, which I believe was never committed:

https://reviews.freebsd.org/D2970

Does this try to fix my panic?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-24 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #76 from Michael Tuexen  ---
(In reply to iron.udjin from comment #75)
Thanks for the data point. I would guess that any unintentional modifications
made by a tunnel or the link layer would look like packet corruption to TCP and
should be handled fine. I would also expect such an issue to have come up on a
much broader scale.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-24 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #75 from iron.ud...@gmail.com ---
(In reply to Christos Chatzaras from comment #73)

I don't think it's related to wireguard, because I had similar panics on a
server that doesn't have wireguard installed or in use.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-24 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #74 from Richard Scheffenegger  ---
The offending TCP sessions we've seen so far all seem to be regular HTTPS
sessions. For (as yet unknown) reasons, the FIN bit occasionally seems to get
accounted for more than once: up to 6 times, according to one of the
logging-patched kernels.

With SACK rescue retransmissions, that can lead to the sequence number of one
or more FIN bits being included in the block that is to be retransmitted, and
the panic happens when the stack tries to retransmit a data byte at the
sequence number of a FIN bit, where no data byte exists...

In your case, even though you didn't have SACK rescue retransmissions enabled,
the client of the offending session appears to have SACKed the 2nd
retransmission of the (empty) FIN packet with a "too high" sequence number,
in effect resulting in the same issue. This should happen much less
frequently than with SACK rescue retransmissions.

At the same time, it appears that double-counting of FIN bits happens quite
frequently, but it is not easy to reproduce. Thus we are currently working on
a patch which exposes this (in INVARIANTS kernels with a panic, in production
kernels by logging that unexpected state and clearing it "on the fly").

See D35446

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-24 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #73 from Christos Chatzaras  ---
I see that iron.udjin's loader.conf has wireguard enabled.

I use wireguard too.

Is this possibly related to wireguard?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-21 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #72 from Christos Chatzaras  ---
(In reply to Richard Scheffenegger from comment #69)

1) net.inet.tcp.rfc6675_pipe is disabled.

2) I sent the vmcore + debug symbols to your e-mail.

3) I have temporarily disabled SACK.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-21 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #71 from Richard Scheffenegger  ---
Created attachment 234833
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234833=edit
symptomatic logging patch

This patch symptomatically handles the case where snd_max grows beyond the FIN
bit, without panicking. It also logs a few tidbits of data to dmesg.
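The patch itself is attachment 234833 and is not reproduced here. As a rough sketch of the idea only, assuming the bound is one sequence number past the last byte still in the send buffer (inferred from the "snd_max ... > so_snd+1 ... adjusting." messages quoted in comment #63; the function and parameter names are hypothetical), the symptomatic fix amounts to clamping snd_max with a wraparound-safe comparison in the style of the kernel's SEQ_GT() macro:

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe "a is after b", modelled on the kernel's SEQ_GT(). */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/*
 * Hypothetical helper, not the actual patch: data bytes occupy
 * [snd_una, snd_una + sb_bytes) and the FIN takes one more sequence
 * number, so snd_max should never exceed snd_una + sb_bytes + 1.
 * Returns 1 if snd_max was adjusted, 0 otherwise.
 */
static int
clamp_snd_max(tcp_seq snd_una, uint32_t sb_bytes, tcp_seq *snd_max)
{
	tcp_seq limit = snd_una + sb_bytes + 1;	/* +1 for the FIN */

	if (SEQ_GT(*snd_max, limit)) {
		printf("snd_max %u > so_snd+1 %u adjusting.\n",
		    (unsigned)*snd_max, (unsigned)limit);
		*snd_max = limit;
		return (1);
	}
	return (0);
}
```

With the values from the core analysed in comment #70 (snd_una 2568161219, 403 bytes in so_snd), the bound is 2568161623, so an snd_max of 2568161624 would be pulled back by one.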

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-21 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #70 from Richard Scheffenegger  ---
(In reply to Christos Chatzaras from comment #68)

From the provided core:

snd_una: 2568161219
snd_nxt: 2568161624
snd_max: 2568161624
snd_fack: 2568161624 (!!)

so_snd: 403

snd_una + so_snd: 2568161622

 *tp->sackhint->nexthole
{start = 2568161219, end = 2568161623, rxmit = 2568161622, ...

This indicates that the remote client SACKed the 2nd FIN bit (which apparently
was sent at sequence number 2568161624, not 2568161623 as expected).

The SACK for this 2nd FIN created a SACK scoreboard hole which included the
sequence number of the 1st FIN bit, which does not exist in the socket send
buffer. Thus, when the stack tries to retransmit this non-existent byte, the
panic happens.
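The arithmetic above can be checked mechanically. In the hypothetical sketch below (not kernel code; rxmit_beyond_data() is a made-up name), data bytes occupy [snd_una, snd_una + sb_bytes), so a retransmission starting at or beyond snd_una + sb_bytes asks for a byte that exists only in sequence space, because of the FIN. That is consistent with the m_copydata(m=0x0, ..., len=1) frame in the backtrace from comment #68, where the copy runs out of mbufs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe "a is at or after b", modelled on the kernel's SEQ_GEQ(). */
#define SEQ_GEQ(a, b) ((int32_t)((a) - (b)) >= 0)

/*
 * Hypothetical check: returns true if retransmitting from 'rxmit' would
 * need a byte that is not in the socket send buffer, i.e. the sequence
 * number of the FIN (or anything after it).
 */
static bool
rxmit_beyond_data(tcp_seq snd_una, uint32_t sb_bytes, tcp_seq rxmit)
{
	tcp_seq data_end = snd_una + sb_bytes;	/* first seq with no data byte */

	return (SEQ_GEQ(rxmit, data_end));
}
```

With the values from the core, snd_una + so_snd = 2568161622 and the hole's rxmit is also 2568161622, so the check returns true.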

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-21 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #69 from Richard Scheffenegger  ---
Are you using net.inet.tcp.rfc6675_pipe=1? While we investigate the root
cause of snd_max growing beyond the largest value it should ever reach,
disabling SACK rescue retransmissions should prevent these panics in the
meantime.

(Note: even though we have one core with the (effectively) full packet trace,
the misbehavior could not yet be reproduced properly.)

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-20 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #68 from Christos Chatzaras  ---
Today I had another crash in a different server.

Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80cae31d
stack pointer   = 0x28:0xfe01141445c0
frame pointer   = 0x28:0xfe0114144630
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_7)
trap number = 12
panic: page fault
cpuid = 7
time = 1655767279
KDB: stack backtrace:
#0 0x80c69465 at kdb_backtrace+0x65
#1 0x80c1bb1f at vpanic+0x17f
#2 0x80c1b993 at panic+0x43
#3 0x810afdf5 at trap_fatal+0x385
#4 0x810afe4f at trap_pfault+0x4f
#5 0x81087528 at calltrap+0x8
#6 0x80de07c9 at tcp_output+0x1339
#7 0x80dd7eed at tcp_do_segment+0x2cfd
#8 0x80dd44b1 at tcp_input_with_port+0xb61
#9 0x80dd515b at tcp_input+0xb
#10 0x80dc691f at ip_input+0x11f
#11 0x80d53089 at netisr_dispatch_src+0xb9
#12 0x80d36ea8 at ether_demux+0x138
#13 0x80d38235 at ether_nh_input+0x355
#14 0x80d53089 at netisr_dispatch_src+0xb9
#15 0x80d372d9 at ether_input+0x69
#16 0x80ddeaa5 at tcp_push_and_replace+0x25
#17 0x80ddd74c at tcp_lro_flush+0x4c
Uptime: 29d3h36m11s
Dumping 4275 out of 65278 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=)
at /usr/src/sys/kern/kern_shutdown.c:399
#2  0x80c1b71c in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:487
#3  0x80c1bb8e in vpanic (fmt=0x811b4fb9 "%s",
ap=) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0x80c1b993 in panic (fmt=)
at /usr/src/sys/kern/kern_shutdown.c:844
#5  0x810afdf5 in trap_fatal (frame=0xfe0114144500, eva=24)
at /usr/src/sys/amd64/amd64/trap.c:944
#6  0x810afe4f in trap_pfault (frame=0xfe0114144500,
usermode=false, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:763
#7  
#8  m_copydata (m=0x0, m@entry=0xf80c219ce500, off=0, len=1,
cp=) at /usr/src/sys/kern/uipc_mbuf.c:659
#9  0x80de07c9 in tcp_output (tp=)
at /usr/src/sys/netinet/tcp_output.c:1081
#10 0x80dd7eed in tcp_do_segment (m=,
th=, so=, tp=0xfe01990a1000,
drop_hdrlen=64, tlen=, iptos=0 '\000')
at /usr/src/sys/netinet/tcp_input.c:2637
#11 0x80dd44b1 in tcp_input_with_port (mp=,
offp=, proto=, port=port@entry=0)
at /usr/src/sys/netinet/tcp_input.c:1400
#12 0x80dd515b in tcp_input (mp=0xf80c219ce500, offp=0x0, proto=1)
at /usr/src/sys/netinet/tcp_input.c:1496
#13 0x80dc691f in ip_input (m=0x0)
at /usr/src/sys/netinet/ip_input.c:839
#14 0x80d53089 in netisr_dispatch_src (proto=1,
source=source@entry=0, m=0xf80e00395400)
at /usr/src/sys/net/netisr.c:1143
#15 0x80d5345f in netisr_dispatch (proto=563930368, m=0x1)
at /usr/src/sys/net/netisr.c:1234
#16 0x80d36ea8 in ether_demux (ifp=ifp@entry=0xf80004659000,
m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0x80d38235 in ether_input_internal (ifp=0xf80004659000, m=0x0)
at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=) at /usr/src/sys/net/if_ethersubr.c:737
#19 0x80d53089 in netisr_dispatch_src (proto=proto@entry=5,
source=source@entry=0, m=m@entry=0xf80e00395400)
at /usr/src/sys/net/netisr.c:1143
#20 0x80d5345f in netisr_dispatch (proto=563930368, proto@entry=5,
m=0x1, m@entry=0xf80e00395400) at /usr/src/sys/net/netisr.c:1234
#21 0x80d372d9 in ether_input (ifp=,
m=0xf80e00395400) at /usr/src/sys/net/if_ethersubr.c:828
#22 0x80ddeaa5 in tcp_push_and_replace (lc=0xf80c219ce500,
lc@entry=0xf80003ef2830, le=le@entry=0xfe0158387690,
m=m@entry=0xf80f2b178300) at /usr/src/sys/netinet/tcp_lro.c:923
#23 0x80ddd74c in tcp_lro_condense (lc=0xf80003ef2830,
le=0xfe0158387690) at /usr/src/sys/netinet/tcp_lro.c:1011
#24 tcp_lro_flush (lc=lc@entry=0xf80003ef2830, le=0xfe0158387690)
at /usr/src/sys/netinet/tcp_lro.c:1374
#25 0x803b in tcp_lro_rx_done (lc=0xf80003ef2830)
at /usr/src/sys/netinet/tcp_lro.c:566
#26 tcp_lro_flush_all (lc=lc@entry=0xf80003ef2830)
at /usr/src/sys/netinet/tcp_lro.c:1532
#27 0x80d4f503 in iflib_rxeof (rxq=,

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-16 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #67 from Michael Tuexen  ---
(In reply to Dmitriy from comment #66)
Thank you very much. The new core file contains the packets handled by the end
point.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-16 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #66 from Dmitriy  ---
(In reply to Michael Tuexen from comment #65)
Done. Please find the link to the new core file in the e-mail.
Thank you!

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #65 from Michael Tuexen  ---
(In reply to Dmitriy from comment #63)
OK, I tested it locally. To get TCPPCAP working, you need to apply the patch
I just added, named "Fix TCPPCAP".

Can you apply that patch, build and install an updated kernel and provide a
kernel dump after that patch has been applied?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #64 from Michael Tuexen  ---
Created attachment 234709
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234709=edit
Fix TCPPCAP

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #63 from Dmitriy  ---
(In reply to Michael Tuexen from comment #62)

The option was added to sysctl.conf and the system rebooted; after the reboot,
it was definitely checked:
sysctl net.inet.tcp.tcp_pcap_packets
net.inet.tcp.tcp_pcap_packets: 10

I'm going to set net.inet.tcp.tcp_pcap_packets=30 to be sure. Waiting for a new core.

Off-topic: in the meantime, the kernel was rebuilt without INVARIANTS so that
it keeps running, logging instead of panicking. I hope the results are helpful
too:

Jun 15 13:55:02 hostname kernel: tcp_output#1587: snd_max 142291 > so_snd+1 142290 adjusting.
Jun 15 13:55:47 hostname kernel: tcp_output#1587: snd_max 142292 > so_snd+1 142290 adjusting.
Jun 15 14:13:29 hostname kernel: tcp_output#1587: snd_max 75162767 > so_snd+1 75162766 adjusting.
Jun 15 14:14:15 hostname kernel: tcp_output#1587: snd_max 75162768 > so_snd+1 75162766 adjusting.
Jun 15 15:06:32 hostname kernel: tcp_output#1587: snd_max 2137549205 > so_snd+1 2137549204 adjusting.
Jun 15 15:15:53 hostname kernel: tcp_output#1587: snd_max 2960832867 > so_snd+1 2960832866 adjusting.
Jun 15 15:16:32 hostname kernel: tcp_output#1587: snd_max 1274542388 > so_snd+1 1274542387 adjusting.
Jun 15 15:16:32 hostname kernel: tcp_output#1587: snd_max 1274542389 > so_snd+1 1274542387 adjusting.
Jun 15 15:30:31 hostname kernel: tcp_output#1587: snd_max 2283463051 > so_snd+1 2283463050 adjusting.
Jun 15 15:35:43 hostname kernel: tcp_output#1587: snd_max 336646110 > so_snd+1 336646109 adjusting.
Jun 15 15:38:35 hostname kernel: tcp_output#1587: snd_max 1227541195 > so_snd+1 1227541194 adjusting.
Jun 15 15:38:35 hostname kernel: tcp_output#1587: snd_max 1227541196 > so_snd+1 1227541194 adjusting.
Jun 15 16:56:46 hostname kernel: tcp_output#1587: snd_max 382752252 > so_snd+1 382752251 adjusting.
Jun 15 16:56:46 hostname kernel: tcp_output#1587: snd_max 382752253 > so_snd+1 382752251 adjusting.
Jun 15 17:08:42 hostname kernel: tcp_output#1587: snd_max 359933951 > so_snd+1 359933950 adjusting.
Jun 15 17:08:42 hostname kernel: tcp_output#1587: snd_max 359933952 > so_snd+1 359933950 adjusting.
Jun 15 17:09:31 hostname kernel: tcp_output#1587: snd_max 205782007 > so_snd+1 205782006 adjusting.
Jun 15 17:09:41 hostname kernel: tcp_output#1587: snd_max 205782008 > so_snd+1 205782006 adjusting.
Jun 15 17:09:53 hostname kernel: tcp_output#1587: snd_max 205782009 > so_snd+1 205782006 adjusting.
Jun 15 17:10:03 hostname kernel: tcp_output#1587: snd_max 205782010 > so_snd+1 205782006 adjusting.
Jun 15 17:22:07 hostname kernel: tcp_output#1587: snd_max 4147118877 > so_snd+1 4147118876 adjusting.
Jun 15 17:36:46 hostname kernel: tcp_output#1587: snd_max 494998134 > so_snd+1 494998133 adjusting.
Jun 15 18:27:14 hostname kernel: tcp_output#1587: snd_max 2387061338 > so_snd+1 2387061337 adjusting.
Jun 15 19:22:28 hostname kernel: tcp_output#1587: snd_max 725541353 > so_snd+1 725541352 adjusting.
Jun 15 19:23:53 hostname kernel: tcp_output#1587: snd_max 607018670 > so_snd+1 607018669 adjusting.
Jun 15 19:24:03 hostname kernel: tcp_output#1587: snd_max 607018671 > so_snd+1 607018669 adjusting.
Jun 15 19:24:14 hostname kernel: tcp_output#1587: snd_max 607018672 > so_snd+1 607018669 adjusting.
Jun 15 19:24:24 hostname kernel: tcp_output#1587: snd_max 607018673 > so_snd+1 607018669 adjusting.
Jun 15 19:24:34 hostname kernel: tcp_output#1587: snd_max 607018674 > so_snd+1 607018669 adjusting.
Jun 15 19:24:44 hostname kernel: tcp_output#1587: snd_max 607018675 > so_snd+1 607018669 adjusting.
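Each message compares snd_max with the bound the debugging patch considers legitimate (labelled so_snd+1), so the difference is the number of extra sequence numbers consumed beyond the FIN. For the connection whose bound is 607018669, for example, the excess climbs from 1 to 6, which is consistent with the "up to 6 times" reported in comment #74. A small, hypothetical helper for tallying this from a saved log, assuming each message sits on a single line in exactly the format shown (the format appears to come from the debugging patch, not from a stock kernel):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one unwrapped message of the form
 *   "... tcp_output#1587: snd_max 607018675 > so_snd+1 607018669 adjusting."
 * and store (snd_max - bound) in *excess, using 32-bit wraparound
 * arithmetic since TCP sequence numbers wrap.
 * Returns 0 on success, -1 if the line does not match.
 */
static int
parse_snd_max_excess(const char *line, int32_t *excess)
{
	const char *p = strstr(line, "snd_max ");
	unsigned long snd_max, bound;

	if (p == NULL)
		return (-1);
	if (sscanf(p, "snd_max %lu > so_snd+1 %lu", &snd_max, &bound) != 2)
		return (-1);
	*excess = (int32_t)((uint32_t)snd_max - (uint32_t)bound);
	return (0);
}
```

Feeding it each line of, for instance, `grep snd_max /var/log/messages` would give the per-message excess; grouping by the bound value then shows how far each connection drifted.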

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #62 from Michael Tuexen  ---
(In reply to Dmitriy from comment #56)
I can confirm that
options TCPPCAP
was enabled in the kernel. However, it seems you did not execute
sudo sysctl net.inet.tcp.tcp_pcap_packets=10
At least, that would explain why no packets were captured.
Can you double check? More than 10 would be useful (like 20 or 30), but I don't
know if that would work for your workload.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #61 from Michael Tuexen  ---
(In reply to Dmitriy from comment #56)
Thank you! I'm trying to recreate the problem locally using packetdrill.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #60 from Richard Scheffenegger  ---
(In reply to Christos Chatzaras from comment #57)

Sorry for being unclear:

A plain (unpatched) 13.1-RELEASE kernel with net.inet.tcp.rfc6675_pipe=0 would
not exhibit a panic.

The patches added here are really meant to gather more information, and to
deliberately panic (when built with INVARIANTS) or retroactively adjust the
incorrect TCP state.

Stable operation in a production environment only needs the standard kernel,
and it's safe to use SACK as long as rfc6675_pipe is not enabled.

(The root cause, i.e. apparent double-accounting for FIN bits and possibly
even sending FIN bits at two different final sequence numbers, would still be
present, but would not cause a crash.)

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #59 from Christos Chatzaras  ---
(In reply to Dmitriy from comment #58)

Yes "0" is the default. So the only workaround is to disable SACK?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #58 from Dmitriy  ---
(In reply to Christos Chatzaras from comment #57)
But net.inet.tcp.rfc6675_pipe was already "0" and was never changed. It is
still "0" when the crash occurs.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #57 from Christos Chatzaras  ---
(In reply to Richard Scheffenegger from comment #55)

You mean that he has to use "net.inet.tcp.rfc6675_pipe=0" for stable
operation, right?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-15 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #56 from Dmitriy  ---
(In reply to Michael Tuexen from comment #54)

You are right, the option was not enabled.
I added these settings and sent new core files in an e-mail to:
rsch...@freebsd.org tue...@freebsd.org
Thank you!

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-14 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #55 from Richard Scheffenegger  ---
Thanks for the core. For stable operation, please use an unpatched kernel, and
without net.inet.tcp.rfc6675_pipe=0

The patched cores confirm that during the very final phases, the stack
increments one of the variables multiple times for the FIN bit, when it should
only have incremented it once. Together with SACK rescue retransmissions
(enabled by the above sysctl), this leads to an attempt to send out
non-existent data.

The TCPPCAP option together with sudo sysctl net.inet.tcp.tcp_pcap_packets=10
would retain the last 10 packets of every session in memory, so that a
reproduction script can be created, since the timing of the client's packets
(relative to the retransmission timeout, persist and keepalive timers)
apparently has something to do with this problem.
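Conceptually, keeping "the last N packets of every session" behaves like a fixed-size ring buffer per connection: once N packets have been recorded, each new one overwrites the oldest, so only the most recent N are present in the core. The sketch below illustrates only that retention policy; it is not the TCPPCAP implementation, and the structure names and record layout (sequence number and payload length) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PCAP_SLOTS 10	/* cf. net.inet.tcp.tcp_pcap_packets=10 */

/* Hypothetical per-packet record: just enough to rebuild a sequence. */
struct pkt_rec {
	uint32_t seq;	/* TCP sequence number of the segment */
	uint32_t len;	/* payload length in bytes */
};

/* Hypothetical per-connection ring of the most recent packets. */
struct pkt_ring {
	struct pkt_rec slot[PCAP_SLOTS];
	size_t next;	/* slot that will be written next */
	size_t count;	/* number of valid entries, <= PCAP_SLOTS */
};

/* Record a packet, overwriting the oldest entry once the ring is full. */
static void
ring_record(struct pkt_ring *r, uint32_t seq, uint32_t len)
{
	r->slot[r->next].seq = seq;
	r->slot[r->next].len = len;
	r->next = (r->next + 1) % PCAP_SLOTS;
	if (r->count < PCAP_SLOTS)
		r->count++;
}

/* Return the i-th oldest retained entry (0 = oldest), or NULL. */
static const struct pkt_rec *
ring_get(const struct pkt_ring *r, size_t i)
{
	size_t oldest;

	if (i >= r->count)
		return (NULL);
	oldest = (r->next + PCAP_SLOTS - r->count) % PCAP_SLOTS;
	return (&r->slot[(oldest + i) % PCAP_SLOTS]);
}
```

With N = 10, a connection that exchanged 25 packets retains only the last 10 (the 16th through the 25th), which is why a larger value such as 20 or 30 gives a longer window of history before the crash.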

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-14 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #54 from Michael Tuexen  ---
(In reply to Dmitriy from comment #53)
Thanks. Did you enable
options TCPPCAP
in the kernel config? It looks like it is not enabled...

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-14 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #53 from Dmitriy  ---
(In reply to Richard Scheffenegger from comment #51)

Sent the link in an e-mail to: rsch...@freebsd.org tue...@freebsd.org

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-14 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #52 from Richard Scheffenegger  ---
Created attachment 234683
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234683=edit
more logging

extended logging (w/ panic/KASSERT)

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-14 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #51 from Richard Scheffenegger  ---
Thanks a lot!

Can you provide that core and the kernel.debug file?

I've extended the logging in a revised patch, if that is easier.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-14 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #50 from Dmitriy  ---
After applying the patch from comment #34
and with
options INVARIANTS
options INVARIANT_SUPPORT
in the kernel, the system panics within 5-40 minutes (tried 3 times, always at
the same place), with the following trace:

Unread portion of the kernel message buffer:
panic: tcp_output: snd_max beyond so_snd
cpuid = 12
time = 1655213044
KDB: stack backtrace:
#0 0x808d8f01 at kdb_backtrace+0x71
#1 0x8086f797 at vpanic+0x227
#2 0x8086f2be at panic+0x4e
#3 0x80ab3551 at tcp_output+0x32a1
#4 0x80aa2722 at tcp_do_segment+0x2e72
#5 0x80a9ec35 at tcp_input_with_port+0x1be5
#6 0x80a9f777 at tcp_input+0x27
#7 0x80a87061 at ip_input+0xdd1
#8 0x80a4023f at netisr_dispatch_src+0x1df
#9 0x80a407a1 at netisr_dispatch+0x21
#10 0x80a11266 at ether_demux+0x306
#11 0x80a13c10 at ether_input_internal+0x9e0
#12 0x80a13221 at ether_nh_input+0xb1
#13 0x80a4023f at netisr_dispatch_src+0x1df
#14 0x80a407a1 at netisr_dispatch+0x21
#15 0x80a11b09 at ether_input+0x1a9
#16 0x80a3a925 at iflib_rxeof+0x895
#17 0x80a2e4e5 at _task_fn_rx+0xd5
Uptime: 43m43s
Dumping 9369 out of 261999 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:399
399		dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:399
#1  0x8086efd3 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:487
#2  0x8086f84f in vpanic (fmt=0x80f390c8 "%s: snd_max beyond
so_snd", ap=0xfe027ac92320) at /usr/src/sys/kern/kern_shutdown.c:920
#3  0x8086f2be in panic (fmt=0x80f390c8 "%s: snd_max beyond
so_snd") at /usr/src/sys/kern/kern_shutdown.c:844
#4  0x80ab3551 in tcp_output (tp=0xfe04709abca8) at
/usr/src/sys/netinet/tcp_output.c:1583
#5  0x80aa2722 in tcp_do_segment (m=0xf801ef8be500,
th=0xf801ef8be57a, so=0xf8061cdc8b10, tp=0xfe04709abca8,
drop_hdrlen=41, tlen=0, iptos=0 '\000') at
/usr/src/sys/netinet/tcp_input.c:2713
#6  0x80a9ec35 in tcp_input_with_port (mp=0xfe027ac929c8,
offp=0xfe027ac92968, proto=6, port=0) at
/usr/src/sys/netinet/tcp_input.c:1400
#7  0x80a9f777 in tcp_input (mp=0xfe027ac929c8,
offp=0xfe027ac92968, proto=6) at /usr/src/sys/netinet/tcp_input.c:1496
#8  0x80a87061 in ip_input (m=0x0) at
/usr/src/sys/netinet/ip_input.c:839
#9  0x80a4023f in netisr_dispatch_src (proto=1, source=0,
m=0xf801ef8be500) at /usr/src/sys/net/netisr.c:1143
#10 0x80a407a1 in netisr_dispatch (proto=1, m=0xf801ef8be500) at
/usr/src/sys/net/netisr.c:1234
#11 0x80a11266 in ether_demux (ifp=0xf820816e3800,
m=0xf801ef8be500) at /usr/src/sys/net/if_ethersubr.c:921
#12 0x80a13c10 in ether_input_internal (ifp=0xf820816e3800,
m=0xf801ef8be500) at /usr/src/sys/net/if_ethersubr.c:707
#13 0x80a13221 in ether_nh_input (m=0xf801ef8be500) at
/usr/src/sys/net/if_ethersubr.c:737
#14 0x80a4023f in netisr_dispatch_src (proto=5, source=0,
m=0xf801ef8be500) at /usr/src/sys/net/netisr.c:1143
#15 0x80a407a1 in netisr_dispatch (proto=5, m=0xf801ef8be500) at
/usr/src/sys/net/netisr.c:1234
#16 0x80a11b09 in ether_input (ifp=0xf8010650a000,
m=0xf801ef8be500) at /usr/src/sys/net/if_ethersubr.c:828
#17 0x80a3a925 in iflib_rxeof (rxq=0xfe01b7551080, budget=16) at
/usr/src/sys/net/iflib.c:3047
#18 0x80a2e4e5 in _task_fn_rx (context=0xfe01b7551080) at
/usr/src/sys/net/iflib.c:3990
#19 0x808d7427 in gtaskqueue_run_locked (queue=0xf80104d7e200) at
/usr/src/sys/kern/subr_gtaskqueue.c:371
#20 0x808d6fad in gtaskqueue_thread_loop (arg=0xfe01b71a7128) at
/usr/src/sys/kern/subr_gtaskqueue.c:547
#21 0x808053f2 in fork_exit (callout=0x808d6f00
, arg=0xfe01b71a7128, frame=0xfe027ac92f40) at
/usr/src/sys/kern/kern_fork.c:1093
#22 
#23 0x8129ea18 in periodic_resettodr_sys_init ()
Backtrace stopped: Cannot access memory at address 0x0
(kgdb) fr 4
#4  0x80ab3551 in tcp_output (tp=0xfe04709abca8) at
/usr/src/sys/netinet/tcp_output.c:1583


1583		KASSERT(SEQ_LEQ(tp->snd_max, top+1),
(kgdb) p tp->snd_max
$1 = 3141897257
(kgdb) p top
$2 = 3141897255
(kgdb)

KTLS is not enabled or used. The adapter is an Intel X710 (if_ixl).
If there is anything else we can help with, please just let me know.
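
The failed KASSERT can be checked by hand. The sketch below reimplements a sequence-number comparison modeled on SEQ_LEQ() from sys/netinet/tcp_seq.h, which uses the sign of the 32-bit difference so the comparison survives wraparound, and plugs in the two values kgdb printed. Based on the panic message ("snd_max beyond so_snd"), it assumes that top marks the end of the data in so_snd, so snd_max may exceed it by at most one sequence number (the FIN). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on SEQ_LEQ() in sys/netinet/tcp_seq.h: a <= b in sequence
 * space if the signed 32-bit difference a - b is not positive. */
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical helper mirroring the KASSERT: snd_max may be at most
 * one past the end of the buffered data, to account for a FIN. */
static int
snd_max_within_sndbuf(uint32_t snd_max, uint32_t top)
{
	return SEQ_LEQ(snd_max, top + 1);
}
```

Here snd_max - top = 3141897257 - 3141897255 = 2, one more than the single sequence number a FIN accounts for, so the assertion fires.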

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-13 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Dmitriy  changed:

   What|Removed |Added

 CC||suppor...@ukr.net

--- Comment #49 from Dmitriy  ---
(In reply to Richard Scheffenegger from comment #37)

Good day to all.
We most likely ran into exactly the same problem, with the same backtrace.
Here is some information that we hope will be of some help.
If there is anything else we can help with, please just let me know.

13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n251001-41ce229505a: Sat Jun 4 19:47:50 EEST 2022

kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0
...
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x806e5569
stack pointer   = 0x28:0xfe027ac53690
frame pointer   = 0x28:0xfe027ac53700
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_2)
trap number = 12
panic: page fault
cpuid = 2
time = 1655099122
KDB: stack backtrace:
#0 0x806a0cc5 at kdb_backtrace+0x65
#1 0x80657a0f at vpanic+0x17f
#2 0x80657883 at panic+0x43
#3 0x80a03837 at trap_fatal+0x387
#4 0x80a0388f at trap_pfault+0x4f
#5 0x809dcbe8 at calltrap+0x8
#6 0x807cbee9 at tcp_output+0x1329
#7 0x807c332b at tcp_do_segment+0x29db
#8 0x807bfc21 at tcp_input_with_port+0xb61
#9 0x807c08bb at tcp_input+0xb
#10 0x807b2118 at ip_input+0x118
#11 0x807890b9 at netisr_dispatch_src+0xb9
#12 0x8076d554 at ether_demux+0x144
#13 0x8076e8b6 at ether_nh_input+0x346
#14 0x807890b9 at netisr_dispatch_src+0xb9
#15 0x8076d979 at ether_input+0x69
#16 0x80780bf1 at _task_fn_rx+0xc31
#17 0x8069f6dd at gtaskqueue_run_locked+0x15d
Uptime: 8d12h42m8s
Dumping 15966 out of 262007 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) f 10
#10 0x807c332b in tcp_do_segment (m=0xf8020c425400,
th=0xf8020c42547a, so=0xf8179dcc1760, tp=0xfe09a5da7950,
drop_hdrlen=41,
tlen=, iptos=0 '\000') at
/usr/src/sys/netinet/tcp_input.c:2637
2637				(void) tp->t_fb->tfb_tcp_output(tp);
(kgdb) p *tp
$12 = {t_inpcb = 0xf81612d9f1f0, t_fb = 0x80ebe670
, t_fb_ptr = 0x0, t_maxseg = 1440, t_logstate = 0, t_port = 0,
t_state = 8, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0,
t_log_state_set = 0, 
  bits_spare = 0, t_flags = 537920116, snd_una = 893181596, snd_max =
893186736, snd_nxt = 893186736, snd_up = 893181596, snd_wnd = 66240, snd_cwnd =
7996, t_peakrate_thr = 0, ts_offset = 0, rfbuf_ts = 736745954, rcv_numsacks =
0, 
  t_tsomax = 0, t_tsomaxsegcount = 0, t_tsomaxsegsize = 0, rcv_nxt =
2834403673, rcv_adv = 2834408281, rcv_wnd = 4608, t_flags2 = 1026, t_srtt = 0,
t_rttvar = 4000, ts_recent = 0, snd_scale = 2 '\002', rcv_scale = 9 '\t', 
  snd_limited = 0 '\000', request_r_scale = 9 '\t', last_ack_sent = 2834403673,
t_rcvtime = 2883811337, rcv_up = 2834403673, t_segqlen = 0, t_segqmbuflen = 0,
t_segq = {tqh_first = 0x0, tqh_last = 0xfe09a5da79e0}, t_in_pkt = 0x0, 
  t_tail_pkt = 0x0, t_timers = 0xfe09a5da7bf8, t_vnet = 0xf801016b1380,
snd_ssthresh = 2880, snd_wl1 = 2834403673, snd_wl2 = 893181596, irs =
2834403154, iss = 893181595, t_acktime = 0, t_sndtime = 2883799567,
ts_recent_age = 0, 
  snd_recover = 893186735, cl4_spare = 0, t_oobflags = 0 '\000', t_iobc = 0
'\000', t_rxtcur = 64000, t_rxtshift = 7, t_rtttime = 2883799567, t_rtseq =
893186734, t_starttime = 2883628642, t_fbyte_in = 2883628647, 
  t_fbyte_out = 2883628648, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0,
t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 0, t_softerror = 0, max_sndwnd
= 66240, snd_cwnd_prev = 23040, snd_ssthresh_prev = 1073725440, 
  snd_recover_prev = 893181595, t_sndzerowin = 0, t_rttupdated = 0,
snd_numholes = 3, t_badrxtwin = 2883629648, snd_holes = {tqh_first =
0xf808705bae80, tqh_last = 0xf8037d9e6110}, snd_fack = 893186735,
sackblks = {{
  start = 2834403672, end = 2834403673}, {start = 0, end = 0}, {start = 0,
end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}},
sackhint = {nexthole = 0xf8037d9e6100, sack_bytes_rexmit = 4250, 
last_sack_ack = 0, delivered_data = 0, sacked_bytes = 888, recover_fs =
5138, prr_delivered = 6636, prr_out = 20102}, t_rttlow = 0, rfbuf_cnt = 517,
tod = 0x0, t_sndrexmitpack = 85, t_rcvoopack = 0, t_toe = 0x0,
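
The snapshot above can be compared against the signature listed in comment #37. The C sketch below does that for a few fields copied from this dump. The struct and function names are hypothetical and are not part of struct tcpcb. Because sb_ccc was not printed, the sketch estimates it as t_rtseq - snd_una. That estimate is itself an assumption, taken from the signature line "t_rtseq == snd_una + so_snd.sb_ccc". In this dump snd_recover (893186735) is one below snd_max, so that item is left out of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCPS_LAST_ACK 8	/* t_state value referred to as LAST_ACK in this thread */

/* Hypothetical snapshot of the struct tcpcb fields used below. */
struct tcb_snapshot {
	int      t_state;
	uint32_t snd_una;
	uint32_t snd_max;
	uint32_t t_rtseq;
	int      t_rxtshift;
};

/* Sequence numbers beyond the buffered data: 1 is expected after a
 * FIN has been sent; 2 would be the suspected double-count. */
static uint32_t
fin_overshoot(const struct tcb_snapshot *tp, uint32_t sb_ccc)
{
	return tp->snd_max - (tp->snd_una + sb_ccc);
}

/* Rough check of the comment #37 signature, with sb_ccc estimated as
 * t_rtseq - snd_una (an assumption taken from the signature itself). */
static bool
matches_signature(const struct tcb_snapshot *tp)
{
	uint32_t sb_ccc = tp->t_rtseq - tp->snd_una;

	return tp->t_state == TCPS_LAST_ACK &&
	    fin_overshoot(tp, sb_ccc) == 2 &&
	    tp->t_rxtshift >= 6 && tp->t_rxtshift <= 12;
}
```

With these values, the estimated sb_ccc is 5138, and snd_max lies 2 sequence numbers beyond the buffered data rather than the expected 1.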

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #48 from Michael Tuexen  ---
(In reply to iron.udjin from comment #47)
Perfect!

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #47 from iron.ud...@gmail.com ---
(In reply to Michael Tuexen from comment #46)

Yes, I did it already. Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #46 from Michael Tuexen  ---
(In reply to iron.udjin from comment #45)
Can you try something like
sudo sysctl net.inet.tcp.tcp_pcap_packets=10
If your system runs out of mbufs, use fewer than 10...

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #45 from iron.ud...@gmail.com ---
(In reply to Richard Scheffenegger from comment #44)

OK, the kernel has been rebuilt. I'll let you know if anything interesting shows
up in dmesg.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #44 from Richard Scheffenegger  ---
(In reply to iron.udjin from comment #38)
Can you also add 

options INVARIANT_SUPPORT

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #43 from Christos Chatzaras  ---
(In reply to Michael Tuexen from comment #42)

Links sent. Thank you.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #42 from Michael Tuexen  ---
(In reply to Christos Chatzaras from comment #39)
Can you also send the link to tue...@freebsd.org or, if that is blocked, to
tue...@fh-muenster.de?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #41 from Richard Scheffenegger  ---
(In reply to Christos Chatzaras from comment #40)

Indeed. Unlike the other cores, the session is still in FIN_WAIT_1 state (6),
not LAST_ACK.

Also, there is an entire chain of 1-byte holes in the SACK scoreboard (which got
retransmitted). There is no sign of a rescue retransmission: snd_fack seems to
have covered snd_max (the FIN bit?), but there is a gap of 1 byte in the
sequence stream:

snd_max is still off from so_snd by 2 instead of the expected 1.

So not quite the same, but maybe the same root cause.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #40 from Christos Chatzaras  ---
(In reply to Richard Scheffenegger from comment #37)

This looks similar to a bug I reported last year:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254725

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #39 from Christos Chatzaras  ---
(In reply to Richard Scheffenegger from comment #37)

Hello Richard,

I tried to e-mail you at rsch...@freebsd.org, but Office 365 blocked it. Can you
e-mail me at ch...@cretaforce.gr so that I can send you the download link?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #38 from iron.ud...@gmail.com ---
(In reply to Richard Scheffenegger from comment #35)


I have a problem rebuilding the kernel: it fails at the linking stage. I just
copied GENERIC to KP and added:

ident   DEBUG
options TCPPCAP
options WITNESS
options INVARIANTS

...and got errors:

linking kernel.full
ld: error: undefined symbol: __mtx_assert
>>> referenced by cam_periph.c:351 (/usr/src/sys/cam/cam_periph.c:351)
>>>   cam_periph.o:(cam_periph_find)
>>> referenced by cam_periph.c:444 (/usr/src/sys/cam/cam_periph.c:444)
>>>   cam_periph.o:(cam_periph_release_locked_buses)
>>> referenced by cam_periph.c:678 (/usr/src/sys/cam/cam_periph.c:678)
>>>   cam_periph.o:(cam_periph_release_locked_buses)
>>> referenced 4252 more times

ld: error: undefined symbol: _sx_assert
>>> referenced by filedesc.h:292 (/usr/src/sys/sys/filedesc.h:292)
>>>   freebsd32_capability.o:(freebsd32_cap_ioctls_get)
>>> referenced by acpi_battery.c:465 (/usr/src/sys/dev/acpica/acpi_battery.c:465)
>>>   acpi_battery.o:(acpi_battery_register)
>>> referenced by acpi_cmbat.c:342 (/usr/src/sys/dev/acpica/acpi_cmbat.c:342)
>>>   acpi_cmbat.o:(acpi_cmbat_get_bix)
>>> referenced 590 more times

ld: error: undefined symbol: __rw_assert
>>> referenced by agp.c:613 (/usr/src/sys/dev/agp/agp.c:613)
>>>   agp.o:(agp_generic_bind_memory)
>>> referenced by tmpfs_subr.c:169 (/usr/src/sys/fs/tmpfs/tmpfs_subr.c:169)
>>>   tmpfs_subr.o:(tmpfs_pager_update_writecount)
>>> referenced by tmpfs_subr.c:185 (/usr/src/sys/fs/tmpfs/tmpfs_subr.c:185)
>>>   tmpfs_subr.o:(tmpfs_pager_release_writecount)
>>> referenced 567 more times

ld: error: undefined symbol: _lockmgr_assert
>>> referenced by msdosfs_fat.c:404 (/usr/src/sys/fs/msdosfs/msdosfs_fat.c:404)
>>>   msdosfs_fat.o:(usemap_free)
>>> referenced by msdosfs_fat.c:762 (/usr/src/sys/fs/msdosfs/msdosfs_fat.c:762)
>>>   msdosfs_fat.o:(clusteralloc)
>>> referenced by msdosfs_fat.c:901 (/usr/src/sys/fs/msdosfs/msdosfs_fat.c:901)
>>>   msdosfs_fat.o:(fillinusemap)
>>> referenced 14 more times

ld: error: undefined symbol: _rm_assert
>>> referenced by kern_rmlock.c:120 (/usr/src/sys/kern/kern_rmlock.c:120)
>>>   kern_rmlock.o:(assert_rm)
>>> referenced by kern_rmlock.c:151 (/usr/src/sys/kern/kern_rmlock.c:151)
>>>   kern_rmlock.o:(unlock_rm)
>>> referenced by kern_rmlock.c:322 (/usr/src/sys/kern/kern_rmlock.c:322)
>>>   kern_rmlock.o:(rm_destroy)
>>> referenced 38 more times

ld: error: undefined symbol: _rangelock_cookie_assert
>>> referenced by uipc_shm.c:743 (/usr/src/sys/kern/uipc_shm.c:743)
>>>   uipc_shm.o:(shm_dotruncate_cookie)
>>> referenced by uipc_shm.c:641 (/usr/src/sys/kern/uipc_shm.c:641)
>>>   uipc_shm.o:(shm_dotruncate_locked)
*** [kernel.full] Error code 1

/etc/make.conf: CFLAGS= -O2 -pipe -march=native

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #37 from Richard Scheffenegger  ---
Can you provide the vmcore (and kernel.debug) files?

Or check the following in kgdb:
f 10
p *tp
p *tp->sackhint.nexthole
p tp->snd_una +  tp->t_inpcb->inp_socket->so_snd.sb_ccc
p/x tp->t_flags
p/x tp->t_flags2
p *tp->t_timers

So far, this seems to be the signature:
t_state == 8 (LAST_ACK)
snd_max == snd_una + so_snd.sb_ccc + 2 (! should be one to account for the FIN)
snd_fack < sackhole.end (rescue retransmission - this is new in fbsd13)
t_rxtshift == 6..12 (many retransmission timeouts, indicating the client
disappeared - temporarily)
t_rtseq == snd_una + so_snd.sb_ccc
snd_recover == snd_max (indicating that the double-accounting for the FIN happened
prior to entering loss recovery, or "many" packets prior to the actual panic)

t_flags = 0x20100274 (TF_CONGRECOVERY, TF_FASTRECOVERY, TF_SACK_PERMIT,
TF_SENTFIN, TF_REQ_SCALE, TF_RCVD_SCALE, TF_NODELAY, TF_ACKNOW)
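
The hex value can be decoded mechanically. The sketch below uses TF_* bit values recalled from FreeBSD 13's sys/netinet/tcp_var.h; treat those values as an assumption and verify them against the source tree. decode_t_flags() is a hypothetical helper. The sketch also shows that the decimal t_flags = 537920116 in comment #49's dump is exactly 0x20100274. Under these values, bit 0x1 (TF_ACKNOW) is clear in 0x20100274, so that value carries seven of the flags listed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* TF_* values as recalled from FreeBSD 13 sys/netinet/tcp_var.h
 * (assumption; only the flags named in this thread are listed). */
static const struct { uint32_t bit; const char *name; } tf_names[] = {
	{ 0x00000001, "TF_ACKNOW" },
	{ 0x00000004, "TF_NODELAY" },
	{ 0x00000010, "TF_SENTFIN" },
	{ 0x00000020, "TF_REQ_SCALE" },
	{ 0x00000040, "TF_RCVD_SCALE" },
	{ 0x00000200, "TF_SACK_PERMIT" },
	{ 0x00100000, "TF_FASTRECOVERY" },
	{ 0x20000000, "TF_CONGRECOVERY" },
};

/* Write the space-separated names of the known flags set in 'flags'
 * into buf (len must be at least 1); return how many were found. */
static int
decode_t_flags(uint32_t flags, char *buf, size_t len)
{
	int n = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tf_names) / sizeof(tf_names[0]); i++) {
		if ((flags & tf_names[i].bit) == 0)
			continue;
		if (n > 0)
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, tf_names[i].name, len - strlen(buf) - 1);
		n++;
	}
	return n;
}
```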

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #36 from Christos Chatzaras  ---
Today I had a kernel panic on a server. Could it be the same bug?


Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80cae31d
stack pointer   = 0x28:0xfe00e00445f0
frame pointer   = 0x28:0xfe00e0044660
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_7)
trap number = 12
panic: page fault
cpuid = 7
time = 1654981222
KDB: stack backtrace:
#0 0x80c69465 at kdb_backtrace+0x65
#1 0x80c1bb1f at vpanic+0x17f
#2 0x80c1b993 at panic+0x43
#3 0x810afdf5 at trap_fatal+0x385
#4 0x810afe4f at trap_pfault+0x4f
#5 0x81087528 at calltrap+0x8
#6 0x80de07c9 at tcp_output+0x1339
#7 0x80dd7eed at tcp_do_segment+0x2cfd
#8 0x80dd44b1 at tcp_input_with_port+0xb61
#9 0x80dd515b at tcp_input+0xb
#10 0x80dc691f at ip_input+0x11f
#11 0x80d53089 at netisr_dispatch_src+0xb9
#12 0x80d36ea8 at ether_demux+0x138
#13 0x80d38235 at ether_nh_input+0x355
#14 0x80d53089 at netisr_dispatch_src+0xb9
#15 0x80d372d9 at ether_input+0x69
#16 0x80ddd9f4 at tcp_lro_flush+0x2f4
#17 0x803b at tcp_lro_flush_all+0x1bb
Uptime: 20d1h1m33s
Dumping 2502 out of 32501 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=)
at /usr/src/sys/kern/kern_shutdown.c:399
#2  0x80c1b71c in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:487
#3  0x80c1bb8e in vpanic (fmt=0x811b4fb9 "%s", 
ap=) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0x80c1b993 in panic (fmt=)
at /usr/src/sys/kern/kern_shutdown.c:844
#5  0x810afdf5 in trap_fatal (frame=0xfe00e0044530, eva=24)
at /usr/src/sys/amd64/amd64/trap.c:944
#6  0x810afe4f in trap_pfault (frame=0xfe00e0044530, 
usermode=false, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:763
#7  
#8  m_copydata (m=0x0, m@entry=0xf80405398400, off=0, len=1, 
cp=) at /usr/src/sys/kern/uipc_mbuf.c:659
#9  0x80de07c9 in tcp_output (tp=)
at /usr/src/sys/netinet/tcp_output.c:1081
#10 0x80dd7eed in tcp_do_segment (m=, 
th=, so=, tp=0xfe013f603000, 
drop_hdrlen=40, tlen=, iptos=0 '\000')
at /usr/src/sys/netinet/tcp_input.c:2637
#11 0x80dd44b1 in tcp_input_with_port (mp=, 
offp=, proto=, port=port@entry=0)
at /usr/src/sys/netinet/tcp_input.c:1400
#12 0x80dd515b in tcp_input (mp=0xf80405398400, offp=0x0, proto=1)
at /usr/src/sys/netinet/tcp_input.c:1496
#13 0x80dc691f in ip_input (m=0x0)
at /usr/src/sys/netinet/ip_input.c:839
#14 0x80d53089 in netisr_dispatch_src (proto=1, 
source=source@entry=0, m=0xf8002f330900)
at /usr/src/sys/net/netisr.c:1143
#15 0x80d5345f in netisr_dispatch (proto=87655424, m=0x1)
at /usr/src/sys/net/netisr.c:1234
#16 0x80d36ea8 in ether_demux (ifp=ifp@entry=0xf80004454000, 
m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0x80d38235 in ether_input_internal (ifp=0xf80004454000, m=0x0)
at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=) at /usr/src/sys/net/if_ethersubr.c:737
#19 0x80d53089 in netisr_dispatch_src (proto=proto@entry=5, 
source=source@entry=0, m=m@entry=0xf8002f330900)
at /usr/src/sys/net/netisr.c:1143
#20 0x80d5345f in netisr_dispatch (proto=87655424, proto@entry=5, 
m=0x1, m@entry=0xf8002f330900) at /usr/src/sys/net/netisr.c:1234
#21 0x80d372d9 in ether_input (ifp=, 
m=0xf8002f330900) at /usr/src/sys/net/if_ethersubr.c:828
#22 0x80ddd9f4 in tcp_lro_flush (lc=lc@entry=0xf80003cf5830, 
le=0xfe0103f3f690) at /usr/src/sys/netinet/tcp_lro.c:1375
#23 0x803b in tcp_lro_rx_done (lc=0xf80003cf5830)
at /usr/src/sys/netinet/tcp_lro.c:566
#24 tcp_lro_flush_all (lc=lc@entry=0xf80003cf5830)
at /usr/src/sys/netinet/tcp_lro.c:1532
#25 0x80d4f503 in iflib_rxeof (rxq=, 
rxq@entry=0xf80003cf5800, budget=)
at /usr/src/sys/net/iflib.c:3058
#26 0x80d49b22 in _task_fn_rx (context=0xf80003cf5800)
at /usr/src/sys/net/iflib.c:3990
#27 0x80c67e9d in gtaskqueue_run_locked (
queue=queue@entry=0xf80003ac2000)
at

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #35 from Richard Scheffenegger  ---
(In reply to iron.udjin from comment #31)
Try the patch attached just now, which should apply cleanly.

Note that if the hypothesis about snd_max occasionally getting incremented twice
holds, you may observe more frequent panics in an INVARIANTS kernel. Prior to
these added checks, that oddity would not materially affect the TCP session.
Only the new SACK rescue retransmission feature (which should also only very
rarely coincide with a FIN bit) exposed this.

In that case, you may want to switch to a kernel without INVARIANTS (which
would still log those occurrences, but not panic), or to the default
13.1-RELEASE kernel with rfc6675_pipe disabled (to disable rescue
retransmissions) for stable operation.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #34 from Richard Scheffenegger  ---
Created attachment 234627
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234627=edit
clean patch against 13.1-RELEASE

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Richard Scheffenegger  changed:

   What|Removed |Added

 Attachment #234626|0   |1
is obsolete||

--- Comment #33 from Richard Scheffenegger  ---
Comment on attachment 234626
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234626
D35446 patch to apply clean against 13.1-RELEASE

missed removing the goto label to make the compiler happy.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #32 from Richard Scheffenegger  ---
Created attachment 234626
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234626=edit
D35446 patch to apply clean against 13.1-RELEASE

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #31 from iron.ud...@gmail.com ---
(In reply to Richard Scheffenegger from comment #27)

The patch failed to apply to the 13.1-RELEASE source code:

|Index: sys/netinet/tcp_sack.c
|===
|--- sys/netinet/tcp_sack.c
|+++ sys/netinet/tcp_sack.c
--
Patching file sys/netinet/tcp_sack.c using Plan A...
Hunk #1 succeeded at 879 with fuzz 1 (offset -19 lines).
Hunk #2 failed at 962.
1 out of 2 hunks failed--saving rejects to sys/netinet/tcp_sack.c.rej
done

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #30 from Michael Tuexen  ---
(In reply to iron.udjin from comment #29)
In addition, I would also suggest
options TCPPCAP
That would allow us to see the last n packets in/out per endpoint in the kernel
dump. Just compiling it in does not enable the logging. To actually enable the
logging, we need to do something like
sudo sysctl net.inet.tcp.tcp_pcap_packets=10
to store the last 10 packets in each direction. We will see whether your system
can handle this, since it uses mbufs.
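
A rough lower bound on the extra mbufs helps judge whether a system can handle this. The sketch below makes a simplifying assumption: one mbuf per retained packet, although larger segments may occupy clusters or chains. It counts n packets in each direction per connection, as described above. The function name is hypothetical. Compare the result with the limits reported by netstat -m.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound on mbufs held by TCPPCAP, assuming one mbuf per retained
 * packet and n packets kept in each of the two directions for every
 * connection.
 */
static uint64_t
tcp_pcap_mbufs_min(uint64_t connections, uint64_t n)
{
	return connections * 2 * n;
}
```

For example, 100,000 connections with tcp_pcap_packets=10 would hold at least 2,000,000 mbufs. Halving n halves that figure.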

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #29 from iron.ud...@gmail.com ---
(In reply to Richard Scheffenegger from comment #27)

Please tell me which options I should enable when I rebuild the kernel.
Anything besides INVARIANTS?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #28 from iron.ud...@gmail.com ---
(In reply to Michael Tuexen from comment #26)

When my website started getting overloaded by thousands of distributed requests
like "GET / HTTP/2.0", I moved the website behind Cloudflare and configured
filters that kept it alive. So there wasn't any direct traffic from clients to
the server. Regarding the traffic pattern: all traffic was TCP/HTTPS on port 443
from Cloudflare. Currently there is no DDoS attack and there are no crashes, so
I guess making a .pcap file now would be useless.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #27 from Richard Scheffenegger  ---
(In reply to iron.udjin from comment #25)

I've prepared a patch against main (as of now, it may need some manual tweaking
to apply to 13.1-RELEASE):
wget https://reviews.freebsd.org/D35446?id=106838=true

If the kernel is built with INVARIANTS, it should panic early on, as soon as an
inconsistency between the socket send buffer and the TCP state variables is
detected, instead of panicking a few packets later, when that inconsistency
results in an invalid pointer access...

If the kernel is built without INVARIANTS, the kernel log buffer (dmesg) should
provide some hints as to when and where the inconsistency first occurred, which
may give more indirect clues. In that case, the kernel would also correct the
inconsistency right away and continue operating.

If the panic was observed during a DDoS, this strengthens the suspicion that
there is a race condition (double-accounting for the FIN bit). However, prior to
the introduction of SACK rescue retransmissions, this never materially affected
TCP operation, because the socket buffer data was used directly to determine
which sequence range to send, rather than the SACK scoreboard data.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #26 from Michael Tuexen  ---
(In reply to iron.udjin from comment #25)
Interesting information. Do you know what the traffic pattern was while the
server was under attack? Any chance you have a .pcap file and are willing to
share it (privately)?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-11 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #25 from iron.ud...@gmail.com ---
(In reply to Richard Scheffenegger from comment #24)

I can build and run a custom kernel, but the problem is that I cannot
reproduce the crash.
The panic occurred when my server was under a DDoS attack. After that, it
ran for almost a month without any problem with net.inet.tcp.rfc6675_pipe=1
and KTLS enabled.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-10 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #24 from Richard Scheffenegger  ---
The current thinking is that SACK rescue retransmissions (in FBSD13 this is
gated by net.inet.tcp.rfc6675_pipe=1) very rarely create an entry which
apparently lies beyond the valid data range.

While a final FIN bit in the sequence space is taken care of under most common
circumstances, it seems that there may be some double-counting of the FIN bit.

In most of the inspected cores, we found:

- TCP state: LAST_ACK (FIN received and also FIN sent)
- SACK loss recovery triggered
- A cumulative ACK before all outstanding data was received
- The remote client "disappears" for a significant amount of time (7 to 12
  retransmission timeouts), but may re-appear just prior.
- snd_max consistently 2 counts above the last data, instead of the expected 1
  (for the FIN bit).
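
The "snd_max 2 counts above the last data" observation can be made concrete
with a little sequence arithmetic. The sketch assumes, hypothetically, that a
rescue retransmission targets the last data byte below snd_max after
subtracting one sequence number for the FIN; last_data_offset() and
copy_in_range() are illustrative helpers, not FreeBSD functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;       /* mirrors FreeBSD's 32-bit tcp_seq */

/*
 * Hypothetical: offset (relative to snd_una) of the last data byte,
 * assuming the FIN occupies exactly one sequence number above the data.
 * Returns -1 if no data byte is outstanding.
 */
static int64_t
last_data_offset(tcp_seq snd_una, tcp_seq snd_max, int fin_sent)
{
	uint32_t span = (uint32_t)(snd_max - snd_una);  /* wraparound-safe */

	if (fin_sent) {
		if (span == 0)
			return (-1);    /* FIN already acknowledged */
		span -= 1;              /* the FIN carries no data byte */
	}
	return ((int64_t)span - 1);
}

/* Does a copy of len bytes at off stay inside a buffer of sb_cc bytes? */
static int
copy_in_range(int64_t off, uint32_t len, uint32_t sb_cc)
{
	return (off >= 0 && (uint64_t)off + len <= sb_cc);
}
```

With 100 data bytes at 5000..5099 and the FIN at 5100, a correct snd_max of
5101 puts the last data byte at offset 99. A double-counted snd_max of 5102
points at offset 100, one byte past the end of the buffer: if the rescue path
worked this way, that is the kind of 1-byte out-of-range request m_copydata()
could trip over.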

However, it is still unclear under what circumstances this double-counting
happens; possibly when the persist timer fires and a few other conditions are
also fulfilled, maybe as a race condition between normal packet processing and
a timer firing.

In short: disabling the RFC 6675 enhanced SACK features (more correct pipe
accounting, rescue retransmissions), i.e. net.inet.tcp.rfc6675_pipe=0, should
avoid the panic, while not addressing the root cause of when/why the FIN bit
is double-accounted...

Would you be willing to run an instrumented kernel, which either panics (full
core dump) or spews out various state when inconsistencies are detected in
this space, while ignoring/addressing them "on the fly" without panicking?

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-10 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #23 from Michael Tuexen  ---
(In reply to iron.udjin from comment #22)
Thanks for providing the information. We are trying to figure out what might be
going on...

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-10 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #22 from iron.ud...@gmail.com ---
(In reply to Dobri Dobrev from comment #21)

Yes, it's most likely not related to KTLS, because I experienced such crashes
on 13.0-STABLE with no KTLS enabled (see #258183).

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-10 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Dobri Dobrev  changed:

   What|Removed |Added

 CC||ddobre...@gmail.com

--- Comment #21 from Dobri Dobrev  ---
(In reply to Michael Tuexen from comment #12)
My workload is also web-related: nginx, grafana, influxdb.
I don't use KTLS, so I doubt it's related, yet I have the same panic.
Last occurrences: on server #1 - 54 days, on server #2 - 154 days.

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-10 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

Michael Tuexen  changed:

   What|Removed |Added

   See Also||https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260393

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-06 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #20 from Michael Tuexen  ---
(In reply to iron.udjin from comment #19)
OK, thanks for the information.

My current thinking is that the problem might be related to using KTLS in a
situation where TCP sends a FIN but still has outstanding data. KTLS is tested
a lot with the RACK stack, which does NOT send a FIN if there is outstanding
data. This could be the problem with len = 1, since the FIN conceptually takes
one byte in the sequence number space, but there is no corresponding byte in
the data stream...
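
The point that the FIN occupies one sequence number but no byte of the data
stream can be illustrated by clipping a sequence range to the bytes actually
present in the send buffer. This is a hypothetical helper, assuming the FIN
sits at snd_una + sb_cc; data_bytes_in_range() is not taken from the base
stack or from RACK.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;       /* mirrors FreeBSD's 32-bit tcp_seq */

/*
 * Hypothetical helper: how many bytes of [start, start + len) can be
 * copied out of a send buffer holding sb_cc bytes starting at snd_una.
 * Sequence numbers at or beyond snd_una + sb_cc (e.g. the FIN) have no
 * data behind them and are clipped off.
 */
static uint32_t
data_bytes_in_range(tcp_seq snd_una, uint32_t sb_cc, tcp_seq start,
    uint32_t len)
{
	uint32_t off = (uint32_t)(start - snd_una);     /* wraparound-safe */

	if (off >= sb_cc)
		return (0);     /* nothing buffered here: FIN, or before snd_una */
	if (len > sb_cc - off)
		len = sb_cc - off;      /* drop the trailing FIN sequence number */
	return (len);
}
```

With 10 data bytes at 1000..1009 and the FIN at 1010, a 1-sequence-number
range starting at 1010 maps to zero data bytes; code that skipped this
clipping would instead ask for one byte that does not exist.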

[Bug 264257] [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659

2022-06-06 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264257

--- Comment #19 from iron.ud...@gmail.com ---
(In reply to Michael Tuexen from comment #18)

ixl0@pci0:33:0:0:   class=0x02 rev=0x02 hdr=0x00 vendor=0x8086
device=0x15ff subvendor=0x1849 subdevice=0x
vendor = 'Intel Corporation'
device = 'Ethernet Controller X710 for 10GBASE-T'
class  = network
subclass   = ethernet

The network card isn't doing hardware TLS; I use the ktls_ocf module.

Currently I don't use HTCP, but I was using it before.

# sysctl net.inet.tcp.cc.algorithm
net.inet.tcp.cc.algorithm: newreno
