Re: uniq(1): support arbitrarily long lines

2021-11-01 Thread Scott Cheloha
On Mon, Nov 01, 2021 at 10:27:40AM -0600, Todd C. Miller wrote:
> On Mon, 01 Nov 2021 10:36:08 -0500, Scott Cheloha wrote:
>
> > My own testing here with pathological inputs didn't show that large of
> > a performance difference between fgets(3) and getline(3). There was
> > a difference but it w

Re: uniq(1): support arbitrarily long lines

2021-11-01 Thread Stuart Henderson
On 2021/11/01 10:36, Scott Cheloha wrote:
> How did you generate this input? Is it just ten million lines with a
> single 'z' character? `jot -bz 1000`?

That one was lots of copies of ports/infrastructure/bsd.port.mk catted together.

> Updated patch.
>
> I screwed up. We don't need to fre

Re: uniq(1): support arbitrarily long lines

2021-11-01 Thread Todd C. Miller
On Mon, 01 Nov 2021 10:36:08 -0500, Scott Cheloha wrote:
> My own testing here with pathological inputs didn't show that large of
> a performance difference between fgets(3) and getline(3). There was
> a difference but it was closer to like 5-10%.

With your updated patch I see:

% wc -l /tmp/z

Re: uniq(1): support arbitrarily long lines

2021-11-01 Thread Scott Cheloha
On Mon, Nov 01, 2021 at 09:04:03AM +, Stuart Henderson wrote:
> On 2021/10/31 20:48, Scott Cheloha wrote:
> > In uniq(1), if we use getline(3) instead of fgets(3) we can support
> > arbitrarily long lines.
>
> It works for me, and getting rid of the length restriction is nice.
>
> I don't kno

Re: uniq(1): support arbitrarily long lines

2021-11-01 Thread Stuart Henderson
On 2021/10/31 20:48, Scott Cheloha wrote:
> In uniq(1), if we use getline(3) instead of fgets(3) we can support
> arbitrarily long lines.

It works for me, and getting rid of the length restriction is nice.

I don't know how much of a concern it is, but it's about twice as slow:

$ wc -l /tmp/z
10

uniq(1): support arbitrarily long lines

2021-10-31 Thread Scott Cheloha
In uniq(1), if we use getline(3) instead of fgets(3) we can support arbitrarily long lines. The only potentially confusing thing here is the pointer exchange within the loop. The current code uses fixed buffers so we just do a pointer swap. Easy. The new code uses dynamic buffers so we need to
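
The preview above is cut off before it describes the new exchange, so the following is only a minimal sketch of one way a getline(3)-based loop can keep a "previous" and "current" line. It is not the code from the patch. The function name uniq_stream and all variable names are hypothetical. The one point it illustrates is that getline(3) tracks each buffer's allocated size in a separate size_t, so when the two line pointers are exchanged their sizes have to be exchanged with them.

```c
#define _POSIX_C_SOURCE 200809L	/* expose getline(3) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): copy `in` to `out`, dropping
 * adjacent duplicate lines. Returns 0 on success, -1 on a stream error.
 * Simplifications: lines are compared with strcmp(3), so embedded NULs
 * and a missing final newline are not treated specially.
 */
int
uniq_stream(FILE *in, FILE *out)
{
	char *prev = NULL, *cur = NULL, *tmp;
	size_t prevsize = 0, cursize = 0, tmpsize;
	int rv = 0;

	/* The first line is always printed. */
	if (getline(&prev, &prevsize, in) == -1)
		goto done;
	fputs(prev, out);

	/* getline(3) grows `cur` as needed, so line length is unbounded. */
	while (getline(&cur, &cursize, in) != -1) {
		if (strcmp(cur, prev) == 0)
			continue;	/* adjacent duplicate: skip it */
		fputs(cur, out);

		/*
		 * Exchange the buffers. Each size must travel with its
		 * pointer, otherwise the next getline(3) call would be
		 * told the wrong capacity for the buffer it receives.
		 */
		tmp = prev;
		prev = cur;
		cur = tmp;
		tmpsize = prevsize;
		prevsize = cursize;
		cursize = tmpsize;
	}

done:
	if (ferror(in) || ferror(out))
		rv = -1;
	free(prev);
	free(cur);
	return rv;
}
```

Called as uniq_stream(stdin, stdout), this behaves like a bare uniq with no options. With fixed-size arrays, as in the fgets(3) version described above, only the two pointers would need to change places.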