Hi,

This change causes uniq(1) to compare equal lines twice when run without
`-i', once in strcmp(3) and once again in strcasecmp(3).  In the worst
case, which is also one of the most common, the main loop spends about
half of its time copying buffers and looking for newlines in fgets(3),
and the other half actually comparing those buffers; hence, in practice,
because of this commit, it has now become 25% slower than it was before.

        $ jot -b a -s a 4080 >tmp
        $ cat $(jot -b tmp 4096) >tmp2
        $ cat $(jot -b tmp2 16) >lines

        $ time ./uniq <lines >/dev/null
            0m01.60s real     0m00.80s user     0m00.75s system
        $ time uniq <lines >/dev/null
            0m01.23s real     0m00.47s user     0m00.73s system

Obviously, the relevant condition should have been

        if ((iflag ? strcasecmp : strcmp)(t1, t2) != 0)

instead of awkwardly messing with logical AND and OR.
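
For illustration, here is a minimal sketch of that condition in
isolation.  The names cmp_fn, select_cmp and lines_differ are made up
for this example and do not come from uniq.c; only iflag, t1 and t2
follow the condition quoted above.  The comparison function is chosen
once from the flag, so each pair of lines is compared exactly once
whether or not `-i' is given.

```c
#include <string.h>	/* strcmp(3) */
#include <strings.h>	/* strcasecmp(3) */

/* Hypothetical type for this sketch: both functions share this shape. */
typedef int (*cmp_fn)(const char *, const char *);

/* Pick the comparison function from the -i flag. */
static cmp_fn
select_cmp(int iflag)
{
	return iflag ? strcasecmp : strcmp;
}

/*
 * Return nonzero if the two lines differ under the selected
 * comparison.  Exactly one comparison call is made per invocation.
 */
static int
lines_differ(int iflag, const char *t1, const char *t2)
{
	return select_cmp(iflag)(t1, t2) != 0;
}
```

In a real main loop one would probably call select_cmp() once before
the loop and keep the resulting pointer, rather than evaluating the
flag for every line; that is a design choice of this sketch, not
something taken from the existing code.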

That said, this program does not seem to have been written with
performance in mind in the first place.  In less than an hour I was able
to write a new uniq(1) which, unlike OpenBSD's version, handles
arbitrarily long lines and NUL bytes correctly and parses its arguments
consistently; it has also proved an order of magnitude faster in such
worst cases.  So I guess a 25% slowdown may not appear that important
after all.  Still, I think the code makes little sense as it currently
stands, if only from a logical point of view and regardless of
performance.

Regards,

kshe
