Today it came up (in the context of something Christos did) that shell read loops are horribly, horribly slow. I'd always assumed that this was because shell read loops tend to fork at least once for every iteration (so O(n) times), while a more pipeline-oriented approach tends to fork O(1) times. While this is probably true, it doesn't appear to be the whole story.
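(For contrast, here's a rough sketch of the fork-per-iteration pattern I mean; it's just an illustration, not one of the timed scripts below, and the name script-sh-fork.sh is made up for the example. It pipes each line through sed inside the loop body, so it forks at least once per input line.)

--- script-sh-fork.sh (illustration only) ---
#!/bin/sh
FILE=/usr/share/dict/words
while read v; do
    # command substitution + sed: at least one fork per input line
    v=$(echo "$v" | sed 's/#.*//')
    if [ -z "$v" ]; then
        continue
    fi
    echo $v
done < $FILE > /dev/null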
In the following example the shell read loop doesn't fork, but it's still monumentally slow compared to either sed or awk. (awk is probably a fairer comparison as it executes the exact same logic; sed is specialized for this sort of thing...)

valkyrie% time ./script-sed.sh
8.907u 0.086s 0:09.16 98.0% 0+0k 0+1io 0pf+0w
valkyrie% time ./script-awk.sh
15.968u 0.101s 0:16.07 99.9% 0+0k 0+2io 0pf+0w
valkyrie% time ./script-sh.sh
91.311u 96.339s 3:07.93 99.8% 0+0k 0+2io 0pf+0w

pretty bad! Some of this is doubtless because sh is required for stupid standards reasons to reread the input file on every iteration. I wonder if there's anything else going on... unfortunately I don't have any more time to look into it this weekend, so I figured I'd post what I've got.

--- script-sed.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    sed < $FILE 's/#.*//;/^$/d'
done > /dev/null

--- script-awk.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    awk < $FILE '{
        sub("#.*", "", $0);
        if (NF == 0) {
            next;
        }
        print;
    }'
done > /dev/null

--- script-sh.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    while read v; do
        v=${v%%#*}
        if [ -z "$v" ]; then
            continue
        fi
        echo $v
    done < $FILE
done > /dev/null

------

-- 
David A. Holland
dholl...@netbsd.org