Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sunday, December 25, 2011 11:01:33 am Коньков Евгений wrote: Здравствуйте, John. Вы писали 20 декабря 2011 г., 16:52:44: JB On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote: On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg JB Ok, the one use case I was worried about is happening regularly before your JB panic, so that is good. Can you use gdb to figure out which call to JB tcp_output() is actually panic'ing? I wonder if it is this case: JB /* JB * Return any desired output. JB */ JB if (needoutput || (tp-t_flags TF_ACKNOW)) { JB (void) tcp_output(tp); JB /* XXX: Debug */ JB KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), JB (tcp_input: negative window after ACK)); JB And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() decides JB to not do anything. I've updated tcp_negwin_test.patch to not panic if that call JB to tcp_output() doesn't actually send a packet. Please re-test. # uname -a FreeBSD meta-up 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #4: Sat Dec 24 13:59:20 EET 2011 @:/usr/obj/usr/src/sys/KES_KERN_v10 i386 rebooting once per day. Now I compile kernel with debug options. Can you advice me which and where I find debug info when it will reboting next time? so I can help to debug problem Are you using the patch at the URL above (tcp_negwin_test.patch)? If not, can you try applying that patch and seeing if you still get any panics? -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Thu, Dec 29, 2011 at 11:12:59AM -0500, John Baldwin wrote: On Sunday, December 25, 2011 11:01:33 am Коньков Евгений wrote: Здравствуйте, John. Вы писали 20 декабря 2011 г., 16:52:44: JB On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote: On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg JB Ok, the one use case I was worried about is happening regularly before your JB panic, so that is good. Can you use gdb to figure out which call to JB tcp_output() is actually panic'ing? I wonder if it is this case: JB /* JB * Return any desired output. JB */ JB if (needoutput || (tp-t_flags TF_ACKNOW)) { JB (void) tcp_output(tp); JB /* XXX: Debug */ JB KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), JB (tcp_input: negative window after ACK)); JB And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() decides JB to not do anything. I've updated tcp_negwin_test.patch to not panic if that call JB to tcp_output() doesn't actually send a packet. Please re-test. # uname -a FreeBSD meta-up 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #4: Sat Dec 24 13:59:20 EET 2011 @:/usr/obj/usr/src/sys/KES_KERN_v10 i386 rebooting once per day. Now I compile kernel with debug options. Can you advice me which and where I find debug info when it will reboting next time? so I can help to debug problem Are you using the patch at the URL above (tcp_negwin_test.patch)? If not, can you try applying that patch and seeing if you still get any panics? I applied 1.5 days ago, so far now panics and no other messages. I modified the patch a bit to not panic, but print a message when panic was suppose to happen. This box is too valuable for me to panic it too often. Because there were no debug messages I understand that the scenerio didn't happen yet and not that the problem is fixed, right? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp9eZdmwuxI5.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Thursday, December 29, 2011 3:25:02 pm Pawel Jakub Dawidek wrote: On Thu, Dec 29, 2011 at 11:12:59AM -0500, John Baldwin wrote: On Sunday, December 25, 2011 11:01:33 am Коньков Евгений wrote: Здравствуйте, John. Вы писали 20 декабря 2011 г., 16:52:44: JB On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote: On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg JB Ok, the one use case I was worried about is happening regularly before your JB panic, so that is good. Can you use gdb to figure out which call to JB tcp_output() is actually panic'ing? I wonder if it is this case: JB /* JB * Return any desired output. JB */ JB if (needoutput || (tp-t_flags TF_ACKNOW)) { JB (void) tcp_output(tp); JB /* XXX: Debug */ JB KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), JB (tcp_input: negative window after ACK)); JB And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() decides JB to not do anything. I've updated tcp_negwin_test.patch to not panic if that call JB to tcp_output() doesn't actually send a packet. Please re-test. # uname -a FreeBSD meta-up 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #4: Sat Dec 24 13:59:20 EET 2011 @:/usr/obj/usr/src/sys/KES_KERN_v10 i386 rebooting once per day. Now I compile kernel with debug options. Can you advice me which and where I find debug info when it will reboting next time? so I can help to debug problem Are you using the patch at the URL above (tcp_negwin_test.patch)? If not, can you try applying that patch and seeing if you still get any panics? I applied 1.5 days ago, so far now panics and no other messages. I modified the patch a bit to not panic, but print a message when panic was suppose to happen. This box is too valuable for me to panic it too often. Because there were no debug messages I understand that the scenerio didn't happen yet and not that the problem is fixed, right? Yes. It would be best to see the messages logged to be safe. Thanks. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re[2]: 9.0-RC1 panic in tcp_input: negative winow.
Здравствуйте, John. Вы писали 20 декабря 2011 г., 16:52:44: JB On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote: On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg JB Ok, the one use case I was worried about is happening regularly before your JB panic, so that is good. Can you use gdb to figure out which call to JB tcp_output() is actually panic'ing? I wonder if it is this case: JB /* JB * Return any desired output. JB */ JB if (needoutput || (tp-t_flags TF_ACKNOW)) { JB (void) tcp_output(tp); JB /* XXX: Debug */ JB KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), JB (tcp_input: negative window after ACK)); JB And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() decides JB to not do anything. I've updated tcp_negwin_test.patch to not panic if that call JB to tcp_output() doesn't actually send a packet. Please re-test. # uname -a FreeBSD meta-up 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #4: Sat Dec 24 13:59:20 EET 2011 @:/usr/obj/usr/src/sys/KES_KERN_v10 i386 rebooting once per day. Now I compile kernel with debug options. Can you advice me which and where I find debug info when it will reboting next time? so I can help to debug problem -- С уважением, Коньков mailto:kes-...@yandex.ru ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote: On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg Ok, the one use case I was worried about is happening regularly before your panic, so that is good. Can you use gdb to figure out which call to tcp_output() is actually panic'ing? I wonder if it is this case: /* * Return any desired output. */ if (needoutput || (tp-t_flags TF_ACKNOW)) { (void) tcp_output(tp); /* XXX: Debug */ KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), (tcp_input: negative window after ACK)); And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() decides to not do anything. I've updated tcp_negwin_test.patch to not panic if that call to tcp_output() doesn't actually send a packet. Please re-test. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Unfortunately it paniced today. Take a look at: http://people.freebsd.org/~pjd/misc/tcp_panic.jpg -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpdhigg3fyKF.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote: An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. Sorry for the delay, but I'm just rebooting the box that triggered the panic with John's patch. I should know more in a day or two. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgp4LFFZCMTh1.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Monday, October 24, 2011 8:14:22 am John Baldwin wrote: On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv = rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. Also, I'm not sure if this case can leak into the timewait code? If so, we will need to fix this case: /* * Recover last window size sent. */ KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), (tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u, tp, tp-rcv_nxt, tp-rcv_adv)); tw-last_win = (tp-rcv_adv - tp-rcv_nxt) tp-rcv_scale; So that it sets last_win to 0 instead of some really huge value. An update. I've sent Pawel a testing patch to see if my hypothesis is correct (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch). If it is then I intend to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Friday, October 28, 2011 1:46:07 am Pawel Jakub Dawidek wrote: On Fri, Oct 28, 2011 at 11:29:34AM +1100, Lawrence Stewart wrote: On 10/26/11 22:53, John Baldwin wrote: The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.) If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. Pawel, have you been able to confirm John's hypothesis? [...] Yeah, sorry. I moved the debug to the points where we drop the t_inpcb lock and I still see rcv_nxt being greater than rcv_adv: tcp_do_segment:2970 negative window: tp 0xfe00685ee3d0 rcv_nxt 1312878324 rcv_adv 1312878187 Yes, I still expect this. What I want to see is if 'delack' is always true in this case. This is just before the INP_WUNLOCK(tp-t_inpcb) under 'check_delack' label. I see this a lot (it was logged 545 times for 11 different tp pointers during 24h period). tcp_do_segment:3009 negative window: tp 0xfe005cfc6000 rcv_nxt 1442546453 rcv_adv 1442545722 This is just before calling tcp_output(). This one was logged 65 times for 3 different tp pointers. I placed a debug also after tcp_output() call, but it is not logged, so once we return from tcp_output() everything is fine. That is consistent with what I expect then, since in the delack case, tcp_output() isn't called. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On 10/26/11 22:53, John Baldwin wrote: On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote: On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv= rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. I added debug to the places where rcv_adv and rcv_nxt are modified. Here is what happens before the panic occurs: tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 I can send you the full log if you want, I've plenty of messages where rcv_adv rcv_nxt, not all of them trigger this assertion. The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.) If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. Pawel, have you been able to confirm John's hypothesis? What I don't quite get is why we haven't had a lot more reports of this issue... Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Fri, Oct 28, 2011 at 11:29:34AM +1100, Lawrence Stewart wrote: On 10/26/11 22:53, John Baldwin wrote: The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.) If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. Pawel, have you been able to confirm John's hypothesis? [...] Yeah, sorry. I moved the debug to the points where we drop the t_inpcb lock and I still see rcv_nxt being greater than rcv_adv: tcp_do_segment:2970 negative window: tp 0xfe00685ee3d0 rcv_nxt 1312878324 rcv_adv 1312878187 This is just before the INP_WUNLOCK(tp-t_inpcb) under 'check_delack' label. I see this a lot (it was logged 545 times for 11 different tp pointers during 24h period). tcp_do_segment:3009 negative window: tp 0xfe005cfc6000 rcv_nxt 1442546453 rcv_adv 1442545722 This is just before calling tcp_output(). This one was logged 65 times for 3 different tp pointers. I placed a debug also after tcp_output() call, but it is not logged, so once we return from tcp_output() everything is fine. The panic would be triggered 115 times for 5 different tp pointers during that time. I write 'tp pointers' as I'm not 100% sure if the same pointer always represents the same connection or if it is reused. [...] What I don't quite get is why we haven't had a lot more reports of this issue... Maybe because my TCP/IP stack is heavly modified? ...not:) No idea to be honest. Ask Ken to turn on INVARIANTS in 9.0-RC2 and we will see:) -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpN4xRhoFGYE.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv = rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. I added debug to the places where rcv_adv and rcv_nxt are modified. Here is what happens before the panic occurs: tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 I can send you the full log if you want, I've plenty of messages where rcv_adv rcv_nxt, not all of them trigger this assertion. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpjTSToeUp9v.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote: On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv = rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. I added debug to the places where rcv_adv and rcv_nxt are modified. Here is what happens before the panic occurs: tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 I can send you the full log if you want, I've plenty of messages where rcv_adv rcv_nxt, not all of them trigger this assertion. The assertion would be triggered when the next packet arrives (as I said above). Try modifying your debugging output to also log if the ACK is delayed. I suspect it is not delayed until the last one. (Pushing out an ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case of an immediate ACK, rcv_nxt rcv_adv is only a transient condition all under a single lock invocation so never visible to other consumers of the protocol control block.) If that is what you see, then that confirms what I guessed above and I will likely just remove the assertion in tcp_input() and patch the timewait code to handle this case. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On 10/26/11 12:54 AM, Pawel Jakub Dawidek wrote: On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote: On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv= rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. I added debug to the places where rcv_adv and rcv_nxt are modified. Here is what happens before the panic occurs: tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098 tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448 tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221 panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221 I can send you the full log if you want, I've plenty of messages where rcv_adv rcv_nxt, not all of them trigger this assertion. Might be a good place to use the new sifter tool. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. No, the kernel is corrupting things in other places when this is true, so if you are running with INVARIANTS, we want to know about it. Specifically, in several places in TCP we assume that rcv_adv = rcv_nxt, and depend on being able to do 'rcv_adv - rcv_nxt'. In this case, it looks like the difference is consistently less than one frame. I suspect the other end of the connection is sending just beyond the end of the advertised window (it probably assumes it is better to send a full frame if it has that much pending data even though part of it is beyond the window edge vs sending a truncated packet that just fills the window) and that that frame is accepted ok in the header prediction case and it's ACK is delayed, but the next packet to arrive then trips over this assumption. Since 'win' is guaranteed to be non-negative and we explicitly cast 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking for: tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt)); I think we already handle this case ok and perhaps the assertion can just be removed? Not sure if others feel that it warrants a comment to note that this is the case being handled. Also, I'm not sure if this case can leak into the timewait code? If so, we will need to fix this case: /* * Recover last window size sent. */ KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt), (tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u, tp, tp-rcv_nxt, tp-rcv_adv)); tw-last_win = (tp-rcv_adv - tp-rcv_nxt) tp-rcv_scale; So that it sets last_win to 0 instead of some really huge value. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sun, Oct 23, 2011 at 12:35:15PM +1100, Lawrence Stewart wrote: On 10/22/11 19:49, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg ewww that is not good. Can you give us any more information about the machine and what it's doing? Is it terminating TCP connections from the internet at large or only local LAN (i.e. is there likely to be packet loss happening)? Are you doing TSO or LRO? Do you have any non-default tuning in place? It is my local file server. It is doing NFS and AFP over LAN and also downloads files from the internet. It is triggered after few hours. I changed the KASSERT() into printf() and added printing 'win' variable and this is what got logged during the night: 05:16:24 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107827269 rcv_adv 1107826256 win=242 05:16:29 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107833451 rcv_adv 1107832977 win=880 05:16:41 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107849563 rcv_adv 1107848860 win=639 05:20:02 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108108230 rcv_adv 1108107331 win=567 05:24:30 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108433302 rcv_adv 1108432272 win=974 05:24:46 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108450385 rcv_adv 1108450060 win=751 05:26:44 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108574818 rcv_adv 1108573851 win=71 05:28:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108654103 rcv_adv 1108653166 win=0 05:28:43 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108692396 rcv_adv 1108691451 win=0 05:30:06 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108781258 rcv_adv 1108780372 win=235 05:35:05 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109067578 rcv_adv 1109067335 win=663 05:37:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109180403 rcv_adv 1109179411 win=0 05:41:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109428265 rcv_adv 1109427375 win=170 And the systems seems to be fine. I'm happy to test patches, but one round would take 24h. My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpcUNlslP3cT.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: On Sun, Oct 23, 2011 at 12:35:15PM +1100, Lawrence Stewart wrote: On 10/22/11 19:49, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg ewww that is not good. Can you give us any more information about the machine and what it's doing? Is it terminating TCP connections from the internet at large or only local LAN (i.e. is there likely to be packet loss happening)? Are you doing TSO or LRO? Do you have any non-default tuning in place? It is my local file server. It is doing NFS and AFP over LAN and also downloads files from the internet. It is triggered after few hours. I changed the KASSERT() into printf() and added printing 'win' variable and this is what got logged during the night: 05:16:24 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107827269 rcv_adv 1107826256 win=242 05:16:29 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107833451 rcv_adv 1107832977 win=880 05:16:41 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107849563 rcv_adv 1107848860 win=639 05:20:02 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108108230 rcv_adv 1108107331 win=567 05:24:30 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108433302 rcv_adv 1108432272 win=974 05:24:46 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108450385 rcv_adv 1108450060 win=751 05:26:44 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108574818 rcv_adv 1108573851 win=71 05:28:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108654103 rcv_adv 1108653166 win=0 05:28:43 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108692396 rcv_adv 1108691451 win=0 05:30:06 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108781258 rcv_adv 1108780372 win=235 05:35:05 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109067578 rcv_adv 1109067335 win=663 05:37:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109180403 rcv_adv 1109179411 win=0 05:41:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109428265 rcv_adv 1109427375 win=170 And the systems seems to be fine. I'm happy to test patches, but one round would take 24h. My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. pgpKtmVl4jzPD.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote: On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote: My suggestion would be that if we won't be able to fix it before 9.0, we should turn this assertion off, as the system seems to be able to recover. Shipped kernels have all assertions turned off. Yes, I'm aware of that, but many people compile their production kernels with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg. corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing it into a printf, so it will be visible. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpe47qzpCChn.pgp Description: PGP signature
9.0-RC1 panic in tcp_input: negative winow.
The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com pgpo3XZOvEVzL.pgp Description: PGP signature
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sat, 22 Oct 2011, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg I've seen that issue once before in the r222XYZ days -- it's a problem that's been around since the TCP / IP code has been refactored in FreeBSD. Thanks, -Garrett ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On 10/22/11 19:49, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg ewww that is not good. Can you give us any more information about the machine and what it's doing? Is it terminating TCP connections from the internet at large or only local LAN (i.e. is there likely to be packet loss happening)? Are you doing TSO or LRO? Do you have any non-default tuning in place? Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: 9.0-RC1 panic in tcp_input: negative winow.
On Sun, 23 Oct 2011, Lawrence Stewart wrote: On 10/22/11 19:49, Pawel Jakub Dawidek wrote: The panic message says: panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 3718269252 rcv_adv 3718268291 I only have picture of the backtrace: http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg ewww that is not good. Can you give us any more information about the machine and what it's doing? Is it terminating TCP connections from the internet at large or only local LAN (i.e. is there likely to be packet loss happening)? Are you doing TSO or LRO? Do you have any non-default tuning in place? I can't speak for pjd@, but when this issue occured a couple months ago for me, the system I had run into the issue with was doing packet forwarding via ipfw from an internal NAT via ipfw to a corporate network. The system was using bce(4) on the internal and external interface. Thanks, -Garrett ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org