Re: Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2

2017-05-30 Thread Ximin Luo
Ximin Luo:
> Ximin Luo:
>> Package: diffoscope
>> Version: 78
>> Severity: normal
>>
>> Dear Maintainer,
>>
>> diff(1) first reads the contents of one file then the next one:
>>
>> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552
>>
>> This means that if the "files" are actually FIFOs connected to the output of 
>> a
>> process, as they are in many cases in diffoscope, the second process has to 
>> wait
>> for diff(1) to fully read the output of the first process, before it itself 
>> can
>> run. This prevents both processes from running in parallel.
>>
>> An appropriate fix would be to store the output of at least one of the 
>> commands
>> into a temporary file, and have diff(1) read from this instead. This has to 
>> be
>> done carefully however, to make sure that diff(1) doesn't accidentally read 
>> it
>> before the process is finished.
>>
>> [..]
> 
> It seems readelf specifically has weird performance behaviours when running 
> in parallel.
> 
> [..]

I couldn't reproduce the above results on Holger's profitbricks machine, and 
bunk@ couldn't reproduce it either. That is, running the commands in parallel 
*did* produce roughly a 2x speed up.

Also on my local machine I got:

$ ls -laSr /usr/bin/{hot,hokey,darcs}
-rwxr-xr-x 1 root root 20555008 Oct 28  2016 /usr/bin/hot*
-rwxr-xr-x 1 root root 29637664 Oct 28  2016 /usr/bin/hokey*
-rwxr-xr-x 1 root root 37144392 Oct 28  2016 /usr/bin/darcs*

$ f() { taskset --cpu-list $1 objdump -S /usr/bin/hot >/dev/null; }; time ( 
f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m12.445s
user0m12.408s
sys 0m0.024s

real0m7.653s
user0m15.224s
sys 0m0.040s

$ f() { taskset --cpu-list $1 objdump -S /usr/bin/hokey >/dev/null; }; time 
( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m24.998s
user0m24.896s
sys 0m0.064s

real0m21.197s
user0m42.224s
sys 0m0.076s

$ f() { taskset --cpu-list $1 objdump -S /usr/bin/darcs >/dev/null; }; time 
( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m38.652s
user0m38.532s
sys 0m0.064s

real0m34.323s
user1m8.168s
sys 0m0.104s

i.e. the speed-improvement-due-to-parallelism decreases as the size of the 
input increases - but I couldn't reproduce this the profitbricks machine either.

Due to the lack of debugging symbols for binutils (#863728) it's hard for me to 
investigate this further, so I'll pause this for now.

It's probably worth un-reverting e28b540b0b289ce9fda70095160382799d7602a6 
perhaps guarded by a CLI flag; though diffoscope's heavy use of Python-based 
filtering of external commands' output makes this less significant (without 
also trying to optimise how this filtering is done). In the meantime I'm also 
using "--exclude-command '^readelf.*\s--debug-dump=info'" to avoid the longest 
part of ELF processing.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds


Re: Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2

2017-05-30 Thread Ximin Luo
Ximin Luo:
> Package: diffoscope
> Version: 78
> Severity: normal
> 
> Dear Maintainer,
> 
> diff(1) first reads the contents of one file then the next one:
> 
> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552
> 
> This means that if the "files" are actually FIFOs connected to the output of a
> process, as they are in many cases in diffoscope, the second process has to 
> wait
> for diff(1) to fully read the output of the first process, before it itself 
> can
> run. This prevents both processes from running in parallel.
> 
> An appropriate fix would be to store the output of at least one of the 
> commands
> into a temporary file, and have diff(1) read from this instead. This has to be
> done carefully however, to make sure that diff(1) doesn't accidentally read it
> before the process is finished.
> 
> [..]

It seems readelf specifically has weird performance behaviours when running in 
parallel. My initial attempt at fixing this works in the general case, but not 
for specific cases that are important to diffoscope and this scenario.

https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?h=experimental=e28b540b0b289ce9fda70095160382799d7602a6

Taking diffoscope completely out of the equation, we see the following 
behaviours:

$ f() { sleep 3; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m6.010s
user0m0.000s
sys 0m0.000s

real0m3.004s
user0m0.000s
sys 0m0.000s

$ f() { sha256sum /run/shm/elf.$1 >/dev/null; }; time ( f 1; f 2; ); time ( 
f 1 & x=$!; f 2; wait $x; )

real0m0.451s
user0m0.432s
sys 0m0.012s

real0m0.221s
user0m0.428s
sys 0m0.004s

(The above two cases confirm that the shell snippets are correct. 
/run/shm/elf.{1,2} is about 50MB.)

$ f() { taskset $1 readelf --wide --debug-dump=rawline /run/shm/elf.$1 
>/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m17.128s
user0m17.096s
sys 0m0.008s

real0m19.126s
user0m37.256s
sys 0m0.016s

$ f() { taskset $1 readelf --wide --debug-dump=rawline 
/usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug 
>/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m3.330s
user0m3.324s
sys 0m0.000s

real0m1.674s
user0m3.324s
sys 0m0.000s

$ f() { taskset $1 readelf --wide --debug-dump=info 
/usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug 
>/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

real0m18.437s
user0m18.384s
sys 0m0.012s

real0m29.039s
user0m56.196s
sys 0m0.044s

/usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug is 
about 4MB and from libc6-dbg=2.24-10.

So it looks like readelf has issues when running for a long time in parallel. 
(--debug-dump=rawline with a 4MB file sped up when running in parallel.)

However other tools don't seem to have this problem:

$ time ( { echo 1; echo 2; } | xargs -n1 -I '{}' taskset '{}' readelf 
--wide --debug-dump=info 
/usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug 
>/dev/null )

real0m18.730s
user0m18.644s
sys 0m0.032s

$ time ( { echo 1; echo 2; } | parallel --will-cite taskset '{}' readelf 
--wide --debug-dump=info 
/usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug 
>/dev/null )

real0m33.076s
user1m0.844s
sys 0m0.460s

$ time ( { echo 1; echo 2; } | xargs -n1 -I '{}' taskset '{}' sha256sum 
/run/shm/test.iso >/dev/null )

real0m33.302s
user0m32.332s
sys 0m0.932s

$ time ( { echo 1; echo 2; } | parallel --will-cite taskset '{}' sha256sum 
/run/shm/test.iso >/dev/null )

real0m16.843s
user0m32.224s
sys 0m1.272s

Here, /run/shm/test.iso is about 4GB in size, and it does speed up when being 
processed by sha256sum in parallel.

Running strace on the "readelf" calls shows that most syscalls are done at the 
start, then for most of the rest of the execution it's just "write()" syscalls 
and indeed the "sys" times above are negligible. So it seems unlikely that the 
processes are blocking on anything.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds