Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Andy Bradford
Thus said Stephen Gildea on Fri, 22 Nov 2024 15:04:04 -0800:

> I think I have found the bug.  In traverse(), when the buffer is full,
> at  about popsbr.c  line 656,  "slen" is  the size  of "response",  so
> setting response[slen] to NUL writes one byte beyond the array.

Ok,  good find.  Now I  wonder why  I haven't  seen this  in 2  years of
running it. Surely  I must have had some line  somewhere that caused the
buffer  to  overflow---but  then  again, maybe  this  just  reveals  how
uncommon it is for email systems  to violate the line length restriction
in emails.

But  I'm also  baffled by  why running  your test-pop  against my  patch
doesn't trigger it here on OpenBSD.

Amazingly good catch. And now that I  look at my code from what seems an
age ago, I'm not sure I like it... too messy.

Thanks,

Andy




Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Ken Hornstein
>Ok,  good find.  Now I  wonder why  I haven't  seen this  in 2  years of
>running it. Surely  I must have had some line  somewhere that caused the
>buffer  to  overflow---but  then  again, maybe  this  just  reveals  how
>uncommon it is for email systems  to violate the line length restriction
>in emails.

It wouldn't surprise me that a SINGLE byte buffer overflow wouldn't
necessarily trigger a crash on all systems.  It sounds like what
happened here was the malloc pool got corrupted and future allocations
stepped over a previously-allocated buffer.  If the system malloc()
implementation ended up doing something like rounding up some
allocations to some alignment-based size, you might not see it.

--Ken



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Andy Bradford
Thus said Stephen Gildea on Fri, 22 Nov 2024 14:18:30 -0800:

> Meanwhile, has  anyone tried my patch?  Does it work for  anyone other
> than me, its author?

As I mentioned in a previous email,  I have applied your patch, but only
for running through  your test-pop. I haven't looked it  over to see the
difference in your approach, nor have I put it into actual use---haven't
had the time. I'll  see if I can take a look at  it this weekend. As for
getting it "working", I may have to backport it to nmh-1.8 just so I can
integrate it more  easily with what I'm currently running  for an actual
test.

Unless there's a way to have it inc into somewhere other than my current
setup.  Maybe that's  a better  way to  test it---I'll  have to  see how
feasible that is.

Thanks,

Andy




Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Paul Fox
stephen wrote:
 > I think I have found the bug.  In traverse(), when the buffer
 > is full, at about popsbr.c line 656, "slen" is the size of
 > "response", so setting response[slen] to NUL writes one byte
 > beyond the array.

Shocking!!   How can these things happen?!?:-)

Do you have access to valgrind?  It would (should) have found this
pretty quickly.  And, it might find something else, if that's not
the only issue.

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 43.6 degrees)




Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Stephen Gildea
I think I have found the bug.  In traverse(), when the buffer
is full, at about popsbr.c line 656, "slen" is the size of
"response", so setting response[slen] to NUL writes one byte
beyond the array.

 < Stephen



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Ken Hornstein
To answer your last question ... I have been unfortunately too busy to do
any testing myself.  But I will note this:

>#1 0x5556b2dc in memcpy (__len=8192,
>__src=, __dest=0x7fff9da0) at
>/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29

It is unfortunate that src is optimized out, but that would be coming
nsc->ns_inptr so that does track in nsc getting clobbered.

>I have worked on debugging this.  At some point, the entire nsc
>structure becomes garbage.  I have not yet pinpointed where memory gets
>corrupted.

Is it possible to set a watchpoint on some part of the nsc structure?

If you ARE overrunning malloc()d data, maybe use a debug malloc library?
MacOS X comes with gmalloc which puts a "guard" area at the beginning and
end of malloc()d memory which catches when you get overflows.  I am not
sure what kinds of things are available across Linux distributions.

--Ken



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-22 Thread Stephen Gildea
Here's another backtrace:

Incorporating new mail into inbox...


Program received signal SIGSEGV, Segmentation fault.
__memcpy_avx_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
warning: 461../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No 
such file or directory
(gdb) bt
#0  __memcpy_avx_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
#1  0x5556b2dc in memcpy (__len=8192, __src=, 
__dest=0x7fff9da0) at 
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
#2  netsec_read (nsc=0x555e2d00, 
buffer=buffer@entry=0x7fff9da0 "uvwxyz.. Abcdefghijklmnopqrstuvwxyz.. 
Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. 
Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. 
Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopq"..., size=size@entry=8192, 
errstr=errstr@entry=0x7fff9d80) at sbr/netsec.c:406
#3  0x5556060e in traverse (action=action@entry=0xcd71 
, closure=closure@entry=0x7fffbf40, 
fmt=fmt@entry=0x5557b4c1 "RETR %d")
at uip/popsbr.c:592
#4  0x55560e2d in pop_retr (msgno=msgno@entry=1, 
action=action@entry=0xcd71 , 
closure=closure@entry=0x7fffbf40) at uip/popsbr.c:560
#5  0xdb7d in main (argc=, argv=) at 
uip/inc.c:576


Here, in netsec_read, (int)nsc->ns_inbuflen is -6553.

I have worked on debugging this.  At some point, the entire nsc
structure becomes garbage.  I have not yet pinpointed where memory
gets corrupted.

On the way to total understanding, I found one problem unrelated to
this memory problem.  In traverse(), "len" needs to be signed, or the
"if (len < 0)" test cannot detect an error return from netsec_read.

Meanwhile, has anyone tried my patch?  Does it work for anyone other
than me, its author?

 < Stephen



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-21 Thread Andy Bradford
Thus said Stephen Gildea on Wed, 20 Nov 2024 20:03:29 -0800:

> I am building and testing with Ubuntu 24.04 on x86_64 hardware.

I'm  building  and testing  on  OpenBSD  7.5  amd64. Very  strange  that
you  get  different  results.  What's  even more  strange  is  what  the
backtrace  reveals---it would  be interesting  to inspect  the value  of
nsc->ns_readfd. Are  all the  failures the  same "bit out  of range  0 -
FD_SETSIZE", or are there others?

What configure options  do you have enabled? Perhaps inc  on your system
is going down a different code path than mine?

nmh configuration
-
nmh version: 1.8+dev
host os: x86_64-unknown-openbsd7.5
compiler   : cc
compiler flags :  -g -O2 -Wall -Wextra
linker flags   : -Qunused-arguments 
preprocessor flags :  
source code location   : .
binary install path: /usr/local/bin
libexec install path   : /usr/local/libexec/nmh
config files install path  : /usr/local/etc/nmh
man page install path  : /usr/local/share/man
docs install path  : /usr/local/share/doc/nmh
RPM build root : ./RPM
backup prefix  : ,
transport system   : smtp
spool default locking type : fcntl
default smtp server: localhost
SASL support   : no
TLS support: yes
OAuth support  : no

It seems  that if I  want to  reproduce this I'll  have to wrangle  up a
Linux VM somewhere.

At any  rate, I'll take a  look at your new  patch when I get  some free
time. If it's a better approach it would make sense to take that instead
of mine, even if I have been using mine successfully for a while now.

I did apply your patch to the same commit from master and ran it through
the same  iterations of test-pop and  there were no issues,  but then, I
have no issues with my patch either. :-)

Andy





Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Ken Hornstein
>*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated

Um, yikes.  That's ... odd.  That looks like it comes from:

FD_SET(nsc->ns_readfd, &rfds);

in sbr/netsec.c, in netsec_fillread().  Unless there is a file descriptor
leak somewhere, I don't understand how nsc->ns_readfd got too big for
FD_SETSIZE, unless maybe a buffer overflow overwrite nsc->ns_readfd.
That's malloc'd memory, so possible a buffer was overwritten and it
clobbered that.  If you could figure out what the value of ns_readfd was
that would be interesting.

--Ken



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Stephen Gildea
I tried your script, running 100 times with jot.  I get passes
for small line lengths, and failures above 2*BUFSIZ.

I am building and testing with Ubuntu 24.04 on x86_64 hardware.

Here's a backtrace of one failure:

Incorporating new mail into inbox...

*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=) at 
./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
#3  0x7744526e in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#4  0x774288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x774297b6 in __libc_message_impl (fmt=fmt@entry=0x775ce765 
"*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132
#6  0x77536c19 in __GI___fortify_fail (msg=msg@entry=0x775d3988 
"bit out of range 0 - FD_SETSIZE on fd_set") at ./debug/fortify_fail.c:24
#7  0x77536715 in __GI___fdelt_chk (d=) at 
./debug/fdelt_chk.c:26
#8  0x5556aef6 in netsec_fillread (nsc=nsc@entry=0x555e2e00, 
errstr=errstr@entry=0x7fff9d00) at sbr/netsec.c:545
#9  0x5556b2bc in netsec_read (nsc=0x555e2e00, 
buffer=buffer@entry=0x7fff9d20 "uvwxyz.. Abcdefghijklmnopqrstuvwxyz.. 
Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. 
Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopqrstuvwxyz.. 
Abcdefghijklmnopqrstuvwxyz.. Abcdefghijklmnopq"..., size=size@entry=8192, 
errstr=errstr@entry=0x7fff9d00) at sbr/netsec.c:395
#10 0x5556060e in traverse (action=action@entry=0xcd71 
, closure=closure@entry=0x7fffbec0, 
fmt=fmt@entry=0x5557b4c1 "RETR %d")
at uip/popsbr.c:592
#11 0x55560e2d in pop_retr (msgno=msgno@entry=1, 
action=action@entry=0xcd71 , 
closure=closure@entry=0x7fffbec0) at uip/popsbr.c:560
#12 0xdb7d in main (argc=, argv=) at 
uip/inc.c:576



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Andy Bradford
Thus said Stephen Gildea on Wed, 20 Nov 2024 09:23:11 -0800:

> I cloned git at HEAD, applied your patch, applied my patch
> 1 of 2 from today's email (so, the new test, not my fix), and
> "autoreconf -i && ./configure && make check".

I  cloned master  [ffe12c9a] and  applied your  test-pop patch  and also
my  nmh-master-take-two.patch  patch. I  then  ran  100 iterations  with
increasing line lengths:

$ time for j in $(jot 100 1000 - 1000); do MH_TEST_LONG_LINE=$j make 
check-TESTS TESTS=test/inc/test-pop; done | grep -c PASS 
100
6m56.07s real 0m49.40s user 4m36.99s system


How about 10  tests at larger increment  (I had to reduce  to 10 because
printf is slow at generating so much data):

$ time for j in $(jot 10 10 - 10); do MH_TEST_LONG_LINE=$j make 
check-TESTS TESTS=test/inc/test-pop; done | grep -c PASS 
10
7m33.01s real 0m49.42s user 4m56.34s system


What about 100 iterations when the buffer is growing and "shifty":

$ time for j in $(jot 100 1000 - 999); do MH_TEST_LONG_LINE=$j make check-TESTS 
TESTS=test/inc/test-pop; done | grep -c PASS 
100
6m49.37s real 0m49.18s user 4m32.45s system


So  I'm  not seeing  any  failures  with  my  patch using  your  updated
test-pop. I'm  now curious why you're  getting a segfault and  I am not.
Any chance you can get a backtrace like Ken suggested?

Andy



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Stephen Gildea
Andy Bradford  wrote:

>   Not  sure what  happened  there.  Did fakepop  crash  or something?  How
>   confident are we that fakepop is doing the right thing?

fakepop in 1.8 will crash.  It is fixed at head.

 < Stephen



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Andy Bradford
Thus said Stephen Gildea on Wed, 20 Nov 2024 09:23:11 -0800:

> That might be the difference. My patch  WILL NOT WORK with nmh 1.8; it
> must be applied to git head.

Ok, I  guess I'll  have to try  master. What  I get when  I run  it with
nmh-1.8 mostly passes 100%, however, I did see one strange falure:

inc: Received EOF on network read'
first named test failure: netrc's permissions test
FAIL: test/inc/test-pop
===
1 of 1 test failed
Please report to nmh-workers@nongnu.org
===

Not  sure what  happened  there.  Did fakepop  crash  or something?  How
confident are we that fakepop is doing the right thing?

That's from netsec.c:

rc = read(nsc->ns_readfd, readbuf, readbufsize);

if (rc == 0) {
netsec_err(errstr, "Received EOF on network read");
return NOTOK;
}

So I'm not sure what's happening there yet.

I then  tried increasing the  line length  and eventually at  100,000 it
failed like this:

--- 0 

./test/inc/test-pop: test failed, outputs are in 
.../nmh-1.8/test/testdir/testmessage and .../nmh-1.8/test/testdir/Mail/inbox/11.
first named test failure: credentials: file
FAIL: test/inc/test-pop
===
1 of 1 test failed
Please report to nmh-workers@nongnu.org
===

And while I can find testmessage,  there is no evidence of Mail/inbox/11
so I wasn't able to verify what's going on.

Also, it  seems to report differently  each time I run  it. Sometimes it
says the above, sometimes:

first named test failure: netrc's permissions test

In all  cases, however, it's  failing on EOF. I  don't see any  crash or
corruption.

I'll try again with the latest later.

Andy




Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Ken Hornstein
>Segmentation fault (core dumped) ./test/inc/test-pop: "inc

Can we get a backtrace here?

I knew we had to go over that buffer handling in the original patch with
a fine-toothed comb.

--Ken



Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Stephen Gildea
Andy Bradford wrote:

>   ...  I've been  running this very  patch for
>   over 2  years now without any  corruption or "aborts" on  a "live" email
>   system.  I wonder  what your  test-pop test  does differently  than real
>   email servers.

That's far more operational experience than I've yet had with my patch.

>   Specifically, I've been running nmh-1.8 with my patch.

That might be the difference.  My patch WILL NOT WORK with
nmh 1.8; it must be applied to git head.

>   I would like  to learn from what  I may have missed  that's causing your
>   failures. Would  you mind sharing some  steps that I run  against my own
>   system that has my patch and your "test-pop" test?

I cloned git at HEAD, applied your patch, applied my patch
1 of 2 from today's email (so, the new test, not my fix), and
"autoreconf -i && ./configure && make check".

The "make check" failed with, in part, this output:


Segmentation fault (core dumped)
./test/inc/test-pop: "inc -user testuser -host 127.0.0.1 -port 64884 -width 80" 
expected:
'Incorporating new mail into inbox...

  11+ 12/17 No Such User   Hello<

Re: patch for POP client bug #66407, inc long lines (0/2)

2024-11-20 Thread Andy Bradford
Thus said Stephen Gildea on Tue, 19 Nov 2024 10:03:28 -0800:

> I tried out nmh-master-take-two-amb.patch of 22 Nov 2022. Sometimes it
> works,  and sometimes  it seems  to  read outside  the buffer,  either
> aborting or giving me garbage. It succeeded on a bad message from UPS,
> but it failed with my enhanced test-pop test.

That's interesting  to me since  I've been  running this very  patch for
over 2  years now without any  corruption or "aborts" on  a "live" email
system.  I wonder  what your  test-pop test  does differently  than real
email servers.

Specifically, I've been running nmh-1.8 with my patch.

I would like  to learn from what  I may have missed  that's causing your
failures. Would  you mind sharing some  steps that I run  against my own
system that has my patch and your "test-pop" test?


> There was one aspect of this patch that I didn't like: that popsbr.c's
> expanded traverse()  duplicates logic from  multiline, netsec_readline
> and netsec_fillread.

Yes, I was not happy about duplicating some of the logic but didn't have
the time to rewrite more of it.


> Con: Because  this change is  so deep, I  had to re-do  the interfaces
> several  layers  up.  pop_readline(),   multiline(),  and  callers  of
> multiline are all affected.

This is  the other reason why  I opted to change  it the way I  did. The
more I  got into  it, the  deeper it  got and  it started  getting quite
complicated so I took what seemed to be the simpler approach.

Thanks,

Andy