bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)
tag 37093 notabug close 37093 stop Hello, On 2019-08-19 10:44 p.m., Edward Huff wrote: In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. [...] baseline results: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 | tee >(sha256sum>&2) | wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5007 s, 33.0 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 $ First, Try to avoid UTF8 locales (i.e., force a C/POSIX locale with LC_ALL=C) which makes 'wc' much faster. On my computer: With UTF8 locale: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \ | tee >(sha256sum>&2) | time --portability wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 46.5928 s, 23.0 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 real 46.59 user 46.37 sys 0.19 With C locale: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \ | tee >(sha256sum>&2) | LC_ALL=C time --portability wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.60285 s, 125 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 real 8.60 user 5.22 sys 0.26 Second, The "word counting" feature in 'wc' is the main cpu-hog. If you avoid that (i.e. counting only lines, or only characters), 'wc' is even faster (and it automatically ignores UTF8 issues): $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \ | tee >(sha256sum>&2) \ | \time --portability wc -c 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.59429 s, 141 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 1073741824 real 7.59 user 0.10 sys 0.71 Notice that the "real time" wasn't changed much (from 8.6s to 7.59s), but the actual work performed by 'wc' (measured in "user time") is down drastically. Third, If you are comfortable with compiling Coreutils from source, you can build it using optimized hashing function from OpenSSL, like so: ./configure --with-openssl make Then, "sha256sum" will be faster (about 2x fast on my computer). If you don't want to re-compile it, consider using "openssl" directly to calculate the checksum, like so: dd if=/dev/zero count=1K bs=1M | tee >(openssl sha256>&2) | wc -c Fourth, To save few more microseconds, consider using dd with larger block size (bs=) and fewer blocks (count=), e.g.: $ time dd if=/dev/zero of=/dev/null count=1M bs=1K 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.865853 s, 1.2 GB/s real 0m0.868s user 0m0.288s sys 0m0.579s $ time dd if=/dev/zero of=/dev/null count=1K bs=1M 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.0998688 s, 10.8 GB/s real 0m0.102s user 0m0.000s sys 0m0.102s This won't reduce the total time by much, but will result in fewer sys-calls, and less CPU kernel time (at least by a tiny bit). The effect is more noticeable when reading or writing to a physical disk. Lastly, If you use GNU time instead of the shell's built-in 'time' function, you can specify custom output format, and easily show the timing of each program in the pipeline. Example: $ FMT="\n=== CMD: %C ===\nreal %e\tuser %U\tsys %S\n" $ \time -f "$FMT" dd if=/dev/zero count=1M bs=1K \ | \time -f "$FMT" tee >(\time -f "$FMT" sha256sum>&2) \ | \time -f "$FMT" wc -c 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.77339 s, 138 MB/s === CMD: dd if=/dev/zero count=1048576 bs=1024 === real 7.77 user 0.36 sys 1.65 === CMD: tee /dev/fd/63 === real 7.77 user 0.10 sys 1.30 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - === CMD: sha256sum === real 7.77 user 7.47 sys 0.27 1073741824 === CMD: wc -c === real 7.77 user 0.05 sys 0.76 As such, I'm closing this as "not a bug", but discussion can continue by replying to this thread. regards, - assaf
bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)
On 8/20/19 6:44 AM, Edward Huff wrote: > In the demo below, dd uses 0.665s to write 1GiB of zeros. > sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. > wc uses 32.160s to count 1GiB of zeros. > > Linux localhost 5.2.8-200.fc30.x86_64 #1 SMP Sat Aug 10 13:21:39 UTC 2019 > x86_64 x86_64 x86_64 GNU/Linux > coreutils-8.31-2.fc30.x86_64 > dd (coreutils) 8.31 > wc (GNU coreutils) 8.31 > sha256sum (GNU coreutils) 8.31 > > baseline results: > $ dd if=/dev/zero count=$((1024*1024)) bs=1024 | tee >(sha256sum>&2) | wc > 1048576+0 records in > 1048576+0 records out > 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5007 s, 33.0 MB/s > 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - > 0 0 1073741824 > $ > > Demo script: > $ cat wc-at-100pct > #!/bin/bash > set -xv > rm pipe* tee* > { >{ > time dd if=/dev/zero count=$((1024*1024)) bs=1024 >} 2>>pipedd > } | { >tee >( > { > time sha256sum > } >teesha256 2>&1 >) > } | { >{ > time wc >} > pipewc 2>&1 > } > > { >{ > time dd if=/dev/zero count=$((1024*1024)) bs=1024 >} 2>>pipedd > } | { >tee >( > { > time wc > } >teewc 2>&1 >) > } | { >{ > time sha256sum >} > pipesha256sum 2>&1 > } > > head pipe* tee* > $ > > Results: > ./wc-at-100pct > rm pipe* tee* > + rm pipedd pipesha256sum pipewc teesha256 teewc > { >{ > time dd if=/dev/zero count=$((1024*1024)) bs=1024 >} 2>>pipedd > } | { >tee >( > { > time sha256sum > } >teesha256 2>&1 >) > } | { >{ > time wc >} > pipewc 2>&1 > } > + tee /dev/fd/63 > > { >{ > time dd if=/dev/zero count=$((1024*1024)) bs=1024 >} 2>>pipedd > } | { >tee >( > { > time wc > } >teewc 2>&1 >) > } | { >{ > time sha256sum >} > pipesha256sum 2>&1 > } > + tee /dev/fd/63 > > head pipe* tee* > + head pipedd pipesha256sum pipewc teesha256 teewc > ==> pipedd <== > + dd if=/dev/zero count=1048576 bs=1024 > 1048576+0 records in > 1048576+0 records out > 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5495 s, 33.0 MB/s > > real 0m32.550s > user 0m0.665s > sys 0m1.503s > + dd if=/dev/zero count=1048576 bs=1024 > 1048576+0 records in > > ==> pipesha256sum <== > + sha256sum > 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - > > real 0m31.457s > user 0m4.285s > sys 0m0.562s > > ==> pipewc <== > + wc > 0 0 1073741824 > > real 0m32.555s > user 0m32.160s > sys 0m0.247s > > ==> teesha256 <== > ++ sha256sum > 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - > > real 0m32.553s > user 0m4.333s > sys 0m0.704s > > ==> teewc <== > ++ wc > 0 0 1073741824 > > real 0m31.456s > user 0m31.121s > sys 0m0.221s I'm not sure what your report is trying to demonstrate exactly. Here, your script just takes ~5-6 seconds for each pass on a ~5 year-old "Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz". Furthermore, replacing 'time' by 'env time -v' shows that wc(1) is taking less CPU than sha256sum - exactly as one would expect: $ grep CPU * | column -t pipedd: Percent of CPU this job got: 27% pipedd: Percent of CPU this job got: 29% pipesha256sum: Percent of CPU this job got: 99% pipewc: Percent of CPU this job got: 65% teesha256: Percent of CPU this job got: 99% teewc: Percent of CPU this job got: 66% > coreutils-8.31-2.fc30.x86_64 It looks like you are using the Fedora package of coreutils ... which may have some additional patches compared to upstream GNU coreutils. Did you try the upstream version as well? Have a nice day, Berny
bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)
In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. Linux localhost 5.2.8-200.fc30.x86_64 #1 SMP Sat Aug 10 13:21:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux coreutils-8.31-2.fc30.x86_64 dd (coreutils) 8.31 wc (GNU coreutils) 8.31 sha256sum (GNU coreutils) 8.31 baseline results: $ dd if=/dev/zero count=$((1024*1024)) bs=1024 | tee >(sha256sum>&2) | wc 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5007 s, 33.0 MB/s 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - 0 0 1073741824 $ Demo script: $ cat wc-at-100pct #!/bin/bash set -xv rm pipe* tee* { { time dd if=/dev/zero count=$((1024*1024)) bs=1024 } 2>>pipedd } | { tee >( { time sha256sum } >teesha256 2>&1 ) } | { { time wc } > pipewc 2>&1 } { { time dd if=/dev/zero count=$((1024*1024)) bs=1024 } 2>>pipedd } | { tee >( { time wc } >teewc 2>&1 ) } | { { time sha256sum } > pipesha256sum 2>&1 } head pipe* tee* $ Results: ./wc-at-100pct rm pipe* tee* + rm pipedd pipesha256sum pipewc teesha256 teewc { { time dd if=/dev/zero count=$((1024*1024)) bs=1024 } 2>>pipedd } | { tee >( { time sha256sum } >teesha256 2>&1 ) } | { { time wc } > pipewc 2>&1 } + tee /dev/fd/63 { { time dd if=/dev/zero count=$((1024*1024)) bs=1024 } 2>>pipedd } | { tee >( { time wc } >teewc 2>&1 ) } | { { time sha256sum } > pipesha256sum 2>&1 } + tee /dev/fd/63 head pipe* tee* + head pipedd pipesha256sum pipewc teesha256 teewc ==> pipedd <== + dd if=/dev/zero count=1048576 bs=1024 1048576+0 records in 1048576+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5495 s, 33.0 MB/s real 0m32.550s user 0m0.665s sys 0m1.503s + dd if=/dev/zero count=1048576 bs=1024 1048576+0 records in ==> pipesha256sum <== + sha256sum 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - real 0m31.457s user 0m4.285s sys 0m0.562s ==> pipewc <== + wc 0 0 1073741824 real 0m32.555s user 0m32.160s sys 0m0.247s ==> teesha256 <== ++ sha256sum 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 - real 0m32.553s user 0m4.333s sys 0m0.704s ==> teewc <== ++ wc 0 0 1073741824 real 0m31.456s user 0m31.121s sys 0m0.221s