Re: patch for POP client bug #66407, inc long lines (0/2)
Thus said Stephen Gildea on Fri, 22 Nov 2024 15:04:04 -0800: > I think I have found the bug. In traverse(), when the buffer is full, > at about popsbr.c line 656, "slen" is the size of "response", so > setting response[slen] to NUL writes one byte beyond the array. Ok, good find. Now I wonder why I haven't seen this in 2 years of running it. Surely I must have had some line somewhere that caused the buffer to overflow---but then again, maybe this just reveals how uncommon it is for email systems to violate the line length restriction in emails. But I'm also baffled by why running your test-pop against my patch doesn't trigger it here on OpenBSD. Amazingly good catch. And now that I look at my code from what seems an age ago, I'm not sure I like it... too messy. Thanks, Andy
Re: patch for POP client bug #66407, inc long lines (0/2)
>Ok, good find. Now I wonder why I haven't seen this in 2 years of >running it. Surely I must have had some line somewhere that caused the >buffer to overflow---but then again, maybe this just reveals how >uncommon it is for email systems to violate the line length restriction >in emails. It wouldn't surprise me that a SINGLE byte buffer overflow wouldn't necessarily trigger a crash on all systems. It sounds like what happened here was the malloc pool got corrupted and future allocations stepped over a previously-allocated buffer. If the system malloc() implementation ended up doing something like rounding up some allocations to some alignment-based size, you might not see it. --Ken
Re: patch for POP client bug #66407, inc long lines (0/2)
Thus said Stephen Gildea on Fri, 22 Nov 2024 14:18:30 -0800: > Meanwhile, has anyone tried my patch? Does it work for anyone other > than me, its author? As I mentioned in a previous email, I have applied your patch, but only for running through your test-pop. I haven't looked it over to see the difference in your approach, nor have I put it into actual use---haven't had the time. I'll see if I can take a look at it this weekend. As for getting it "working", I may have to backport it to nmh-1.8 just so I can integrate it more easily with what I'm currently running for an actual test. Unless there's a way to have it inc into somewhere other than my current setup. Maybe that's a better way to test it---I'll have to see how feasible that is. Thanks, Andy
Re: patch for POP client bug #66407, inc long lines (0/2)
stephen wrote: > I think I have found the bug. In traverse(), when the buffer > is full, at about popsbr.c line 656, "slen" is the size of > "response", so setting response[slen] to NUL writes one byte > beyond the array. Shocking!! How can these things happen?!?:-) Do you have access to valgrind? It would (should) have found this pretty quickly. And, it might find something else, if that's not the only issue. paul =-- paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 43.6 degrees)
Re: patch for POP client bug #66407, inc long lines (0/2)
I think I have found the bug. In traverse(), when the buffer is full, at about popsbr.c line 656, "slen" is the size of "response", so setting response[slen] to NUL writes one byte beyond the array. < Stephen
Re: patch for POP client bug #66407, inc long lines (0/2)
To answer your last question ... I have been unfortunately too busy to do any testing myself. But I will note this: >#1 0x5556b2dc in memcpy (__len=8192, >__src=, __dest=0x7fff9da0) at >/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29 It is unfortunate that src is optimized out, but that would be coming nsc->ns_inptr so that does track in nsc getting clobbered. >I have worked on debugging this. At some point, the entire nsc >structure becomes garbage. I have not yet pinpointed where memory gets >corrupted. Is it possible to set a watchpoint on some part of the nsc structure? If you ARE overrunning malloc()d data, maybe use a debug malloc library? MacOS X comes with gmalloc which puts a "guard" area at the beginning and end of malloc()d memory which catches when you get overflows. I am not sure what kinds of things are available across Linux distributions. --Ken
Re: patch for POP client bug #66407, inc long lines (0/2)
Here's another backtrace: Incorporating new mail into inbox... Program received signal SIGSEGV, Segmentation fault. __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461 warning: 461../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory (gdb) bt #0 __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461 #1 0x5556b2dc in memcpy (__len=8192, __src=, __dest=0x7fff9da0) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29 #2 netsec_read (nsc=0x555e2d00, buffer=buffer@entry=0x7fff9da0 "uvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopq"..., size=size@entry=8192, errstr=errstr@entry=0x7fff9d80) at sbr/netsec.c:406 #3 0x5556060e in traverse (action=action@entry=0xcd71 , closure=closure@entry=0x7fffbf40, fmt=fmt@entry=0x5557b4c1 "RETR %d") at uip/popsbr.c:592 #4 0x55560e2d in pop_retr (msgno=msgno@entry=1, action=action@entry=0xcd71 , closure=closure@entry=0x7fffbf40) at uip/popsbr.c:560 #5 0xdb7d in main (argc=, argv=) at uip/inc.c:576 Here, in netsec_read, (int)nsc->ns_inbuflen is -6553. I have worked on debugging this. At some point, the entire nsc structure becomes garbage. I have not yet pinpointed where memory gets corrupted. On the way to total understanding, I found one problem unrelated to this memory problem. In traverse(), "len" needs to be signed, or the "if (len < 0)" test cannot detect an error return from netsec_read. Meanwhile, has anyone tried my patch? Does it work for anyone other than me, its author? < Stephen
Re: patch for POP client bug #66407, inc long lines (0/2)
Thus said Stephen Gildea on Wed, 20 Nov 2024 20:03:29 -0800: > I am building and testing with Ubuntu 24.04 on x86_64 hardware. I'm building and testing on OpenBSD 7.5 amd64. Very strange that you get different results. What's even more strange is what the backtrace reveals---it would be interesting to inspect the value of nsc->ns_readfd. Are all the failures the same "bit out of range 0 - FD_SETSIZE", or are there others? What configure options do you have enabled? Perhaps inc on your system is going down a different code path than mine? nmh configuration - nmh version: 1.8+dev host os: x86_64-unknown-openbsd7.5 compiler : cc compiler flags : -g -O2 -Wall -Wextra linker flags : -Qunused-arguments preprocessor flags : source code location : . binary install path: /usr/local/bin libexec install path : /usr/local/libexec/nmh config files install path : /usr/local/etc/nmh man page install path : /usr/local/share/man docs install path : /usr/local/share/doc/nmh RPM build root : ./RPM backup prefix : , transport system : smtp spool default locking type : fcntl default smtp server: localhost SASL support : no TLS support: yes OAuth support : no It seems that if I want to reproduce this I'll have to wrangle up a Linux VM somewhere. At any rate, I'll take a look at your new patch when I get some free time. If it's a better approach it would make sense to take that instead of mine, even if I have been using mine successfully for a while now. I did apply your patch to the same commit from master and ran it through the same iterations of test-pop and there were no issues, but then, I have no issues with my patch either. :-) Andy
Re: patch for POP client bug #66407, inc long lines (0/2)
>*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated Um, yikes. That's ... odd. That looks like it comes from: FD_SET(nsc->ns_readfd, &rfds); in sbr/netsec.c, in netsec_fillread(). Unless there is a file descriptor leak somewhere, I don't understand how nsc->ns_readfd got too big for FD_SETSIZE, unless maybe a buffer overflow overwrite nsc->ns_readfd. That's malloc'd memory, so possible a buffer was overwritten and it clobbered that. If you could figure out what the value of ns_readfd was that would be interesting. --Ken
Re: patch for POP client bug #66407, inc long lines (0/2)
I tried your script, running 100 times with jot. I get passes for small line lengths, and failures above 2*BUFSIZ. I am building and testing with Ubuntu 24.04 on x86_64 hardware. Here's a backtrace of one failure: Incorporating new mail into inbox... *** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated Program received signal SIGABRT, Aborted. __pthread_kill_implementation (no_tid=0, signo=6, threadid=) at ./nptl/pthread_kill.c:44 warning: 44 ./nptl/pthread_kill.c: No such file or directory (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x7744526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x774288ff in __GI_abort () at ./stdlib/abort.c:79 #5 0x774297b6 in __libc_message_impl (fmt=fmt@entry=0x775ce765 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132 #6 0x77536c19 in __GI___fortify_fail (msg=msg@entry=0x775d3988 "bit out of range 0 - FD_SETSIZE on fd_set") at ./debug/fortify_fail.c:24 #7 0x77536715 in __GI___fdelt_chk (d=) at ./debug/fdelt_chk.c:26 #8 0x5556aef6 in netsec_fillread (nsc=nsc@entry=0x555e2e00, errstr=errstr@entry=0x7fff9d00) at sbr/netsec.c:545 #9 0x5556b2bc in netsec_read (nsc=0x555e2e00, buffer=buffer@entry=0x7fff9d20 "uvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopq"..., size=size@entry=8192, errstr=errstr@entry=0x7fff9d00) at sbr/netsec.c:395 #10 0x5556060e in traverse (action=action@entry=0xcd71 , closure=closure@entry=0x7fffbec0, fmt=fmt@entry=0x5557b4c1 "RETR %d") at uip/popsbr.c:592 #11 0x55560e2d in pop_retr (msgno=msgno@entry=1, action=action@entry=0xcd71 , closure=closure@entry=0x7fffbec0) at uip/popsbr.c:560 #12 0xdb7d in main (argc=, argv=) at uip/inc.c:576
Re: patch for POP client bug #66407, inc long lines (0/2)
Thus said Stephen Gildea on Wed, 20 Nov 2024 09:23:11 -0800: > I cloned git at HEAD, applied your patch, applied my patch > 1 of 2 from today's email (so, the new test, not my fix), and > "autoreconf -i && ./configure && make check". I cloned master [ffe12c9a] and applied your test-pop patch and also my nmh-master-take-two.patch patch. I then ran 100 iterations with increasing line lengths: $ time for j in $(jot 100 1000 - 1000); do MH_TEST_LONG_LINE=$j make check-TESTS TESTS=test/inc/test-pop; done | grep -c PASS 100 6m56.07s real 0m49.40s user 4m36.99s system How about 10 tests at larger increment (I had to reduce to 10 because printf is slow at generating so much data): $ time for j in $(jot 10 10 - 10); do MH_TEST_LONG_LINE=$j make check-TESTS TESTS=test/inc/test-pop; done | grep -c PASS 10 7m33.01s real 0m49.42s user 4m56.34s system What about 100 iterations when the buffer is growing and "shifty": $ time for j in $(jot 100 1000 - 999); do MH_TEST_LONG_LINE=$j make check-TESTS TESTS=test/inc/test-pop; done | grep -c PASS 100 6m49.37s real 0m49.18s user 4m32.45s system So I'm not seeing any failures with my patch using your updated test-pop. I'm now curious why you're getting a segfault and I am not. Any chance you can get a backtrace like Ken suggested? Andy
Re: patch for POP client bug #66407, inc long lines (0/2)
Andy Bradford wrote: > Not sure what happened there. Did fakepop crash or something? How > confident are we that fakepop is doing the right thing? fakepop in 1.8 will crash. It is fixed at head. < Stephen
Re: patch for POP client bug #66407, inc long lines (0/2)
Thus said Stephen Gildea on Wed, 20 Nov 2024 09:23:11 -0800: > That might be the difference. My patch WILL NOT WORK with nmh 1.8; it > must be applied to git head. Ok, I guess I'll have to try master. What I get when I run it with nmh-1.8 mostly passes 100%, however, I did see one strange falure: inc: Received EOF on network read' first named test failure: netrc's permissions test FAIL: test/inc/test-pop === 1 of 1 test failed Please report to nmh-workers@nongnu.org === Not sure what happened there. Did fakepop crash or something? How confident are we that fakepop is doing the right thing? That's from netsec.c: rc = read(nsc->ns_readfd, readbuf, readbufsize); if (rc == 0) { netsec_err(errstr, "Received EOF on network read"); return NOTOK; } So I'm not sure what's happening there yet. I then tried increasing the line length and eventually at 100,000 it failed like this: --- 0 ./test/inc/test-pop: test failed, outputs are in .../nmh-1.8/test/testdir/testmessage and .../nmh-1.8/test/testdir/Mail/inbox/11. first named test failure: credentials: file FAIL: test/inc/test-pop === 1 of 1 test failed Please report to nmh-workers@nongnu.org === And while I can find testmessage, there is no evidence of Mail/inbox/11 so I wasn't able to verify what's going on. Also, it seems to report differently each time I run it. Sometimes it says the above, sometimes: first named test failure: netrc's permissions test In all cases, however, it's failing on EOF. I don't see any crash or corruption. I'll try again with the latest later. Andy
Re: patch for POP client bug #66407, inc long lines (0/2)
>Segmentation fault (core dumped) ./test/inc/test-pop: "inc Can we get a backtrace here? I knew we had to go over that buffer handling in the original patch with a fine-toothed comb. --Ken
Re: patch for POP client bug #66407, inc long lines (0/2)
Andy Bradford wrote: > ... I've been running this very patch for > over 2 years now without any corruption or "aborts" on a "live" email > system. I wonder what your test-pop test does differently than real > email servers. That's far more operational experience than I've yet had with my patch. > Specifically, I've been running nmh-1.8 with my patch. That might be the difference. My patch WILL NOT WORK with nmh 1.8; it must be applied to git head. > I would like to learn from what I may have missed that's causing your > failures. Would you mind sharing some steps that I run against my own > system that has my patch and your "test-pop" test? I cloned git at HEAD, applied your patch, applied my patch 1 of 2 from today's email (so, the new test, not my fix), and "autoreconf -i && ./configure && make check". The "make check" failed with, in part, this output: Segmentation fault (core dumped) ./test/inc/test-pop: "inc -user testuser -host 127.0.0.1 -port 64884 -width 80" expected: 'Incorporating new mail into inbox... 11+ 12/17 No Such User Hello<
Re: patch for POP client bug #66407, inc long lines (0/2)
Thus said Stephen Gildea on Tue, 19 Nov 2024 10:03:28 -0800: > I tried out nmh-master-take-two-amb.patch of 22 Nov 2022. Sometimes it > works, and sometimes it seems to read outside the buffer, either > aborting or giving me garbage. It succeeded on a bad message from UPS, > but it failed with my enhanced test-pop test. That's interesting to me since I've been running this very patch for over 2 years now without any corruption or "aborts" on a "live" email system. I wonder what your test-pop test does differently than real email servers. Specifically, I've been running nmh-1.8 with my patch. I would like to learn from what I may have missed that's causing your failures. Would you mind sharing some steps that I run against my own system that has my patch and your "test-pop" test? > There was one aspect of this patch that I didn't like: that popsbr.c's > expanded traverse() duplicates logic from multiline, netsec_readline > and netsec_fillread. Yes, I was not happy about duplicating some of the logic but didn't have the time to rewrite more of it. > Con: Because this change is so deep, I had to re-do the interfaces > several layers up. pop_readline(), multiline(), and callers of > multiline are all affected. This is the other reason why I opted to change it the way I did. The more I got into it, the deeper it got and it started getting quite complicated so I took what seemed to be the simpler approach. Thanks, Andy