Today it came up (in the context of something Christos did) that shell read loops are horribly, horribly slow. I'd always assumed that this was because shell read loops tend to fork at least once for every iteration (so O(n) times), while a more pipeline-oriented approach tends to fork O(1) times. While this is probably true, it doesn't appear to be the whole story.
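(For contrast, here's a rough sketch of the fork-per-iteration pattern I mean; it's just an illustration, not one of the timed scripts below, and the name script-sh-fork.sh is made up for the example. It pipes each line through sed inside the loop body, so it forks at least once per input line.)

--- script-sh-fork.sh (illustration only) ---
#!/bin/sh
FILE=/usr/share/dict/words
while read v; do
    # command substitution + sed: at least one fork per input line
    v=$(echo "$v" | sed 's/#.*//')
    if [ -z "$v" ]; then
        continue
    fi
    echo $v
done < $FILE > /dev/null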
In the following example the shell read loop doesn't fork, but it's still monumentally slow compared to either sed or awk. (awk is probably a fairer comparison as it executes the exact same logic; sed is specialized for this sort of thing...)

valkyrie% time ./script-sed.sh
8.907u 0.086s 0:09.16 98.0% 0+0k 0+1io 0pf+0w
valkyrie% time ./script-awk.sh
15.968u 0.101s 0:16.07 99.9% 0+0k 0+2io 0pf+0w
valkyrie% time ./script-sh.sh
91.311u 96.339s 3:07.93 99.8% 0+0k 0+2io 0pf+0w

pretty bad! Some of this is doubtless because sh is required for stupid standards reasons to reread the input file on every iteration. I wonder if there's anything else going on... unfortunately I don't have any more time to look into it this weekend, so I figured I'd post what I've got.

--- script-sed.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    sed < $FILE 's/#.*//;/^$/d'
done > /dev/null

--- script-awk.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    awk < $FILE '{
        sub("#.*", "", $0);
        if (NF == 0) {
            next;
        }
        print;
    }'
done > /dev/null

--- script-sh.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    while read v; do
        v=${v%%#*}
        if [ -z "$v" ]; then
            continue
        fi
        echo $v
    done < $FILE
done > /dev/null

------

-- 
David A. Holland
dholl...@netbsd.org