Attendees: Sakib, Alex, Richard, Saul, Randy

1. Sakib is working on using procpath (1) to generate full-system process summary that is specific to bitbake builds and perhaps even
YP AB.

The basic idea is to take a single run of top and convert it to something easier to quicly understand (see (4) below).


We are interested in procpath since screen scraping the top output
is likely error prone. Richard is concerned that
using python and this module will be quite inefficent compared to what
top does. That's a valid concern but we believe and will prove that
the module and top are both just accessing the same data in /proc.
Stay tuned.


2. Tony and Randy have a work-around of using taskset for the valgrind
races. We will have a patch to use taskset of the list of 'racey'
valgrind tests. We think it has gotten worse when we added 'smp'
to qemu. There are similar VG bug reports from 5
years ago so we may have to carry the work-around for some time.


3. Add logging to runqemu when qemu fails.

a. Run ps (or top) to see what's happening on the host.
b. collect qemu logs
   In this example:

https://autobuilder.yoctoproject.org/typhoon/#/builders/101/builds/2386/steps/12/logs/stdio
  there should be a log file unde:

/home/pokybuild/yocto-worker/qemux86-alt/build/build/tmp/work/qemux86-poky-linux/core-image-sato-sdk/1.0-r0/temp

but  s/temp/testimage/ and logs should be under there.

Note that the logs may not always be there so check and
make that clear with a warning when one is missing.
This now is Sakib's top priority for the week.


4. Regarding a LTP test that fails because mkdir failed...
This is described in an email (2) from Richard with the subject:
   Re: [swat] qemux86-64 ltp kernel panic

Alex is going to use gcore (3) to get a core file for the qemu process
and try to figure out what is crazy in the qemu guest kernel.

Did that work Alex?
Where are the core file and related symbols?

5. As a general remark, we are overloading the systems but
we don't want to scale back too much.

In the cases wehre qemu failed to copy in 120 seconds,
what has generally be going on is lots of builds involving:
   llvm, kea, webkit or two, musl world build.
The load average goes over 300 but as you'd expect most jobs
are sleeping.
We think the biggest step forward on this is the job server that
Trevor is going to start working on that.

There are a few knobs that we can adjust to reduce the maximum load:
- Number of build in parallel - currently 3
- parallel make - currently 16, could drop to 12 but
  dropping below say 8 would reduce the test system throughput too much.
- bb number of threads - number of bitbake tasks.

Many times this results in a RCU stall and things go off the rails.


If anyone else is interested in helping, please let me know.
More next week.


--
# Randy MacLeod
# Wind River Linux


1) https://heptapod.host/saajns/procpath
2) https://lists.yoctoproject.org/g/swat/message/156
3) https://www.linux.org/docs/man1/gcore.html

4)

If I hack up a sed pipeline and run it on the entire latency-full.log, I get:

$ cat /tmp/latency-full.log | grep -v "^NOTE:" | \
  cut -c 69- | cut -d" " -f 1 | grep -v "\[" | \
  sed -e 's|/home/pokybuild/yocto-worker/|/~/|' | \
sed -e 's|/build/build/tmp/work/core2-64-poky-linux/| .../tmp/pklin/... |g' | \ sed -e 's|/recipe-sysroot-native/usr/bin/x86_64-poky-linux/../../libexec/x86_64-poky-linux/gcc/x86_64-poky-linux/| .../gcc-foo/... |' | \ sed -e 's|/build/build/tmp/work/core2-32-poky-linux/| .../pklin32/... |' | \
  grep -v "^$" | \
  sort | uniq -c | sort -n | tail -30

42 /~/meta-oe .../tmp/pklin/... vulkan-cts/1.2.6.0-r0 .../gcc-foo/... 11.1.0/as 42 /~/meta-oe .../tmp/pklin/... vulkan-cts/1.2.6.0-r0/recipe-sysroot-native/usr/bin/x86_64-poky-linux/x86_64-poky-linux-g++
     43 as
     43 top
     44 tar
     45 cmake
     45 ninja
50 /~/reproducible-ubuntu/build/build-st-38234/reproducibleB/tmp/work/qemux86_64-poky-linux/linux-yocto/5.4.119+gitAUTOINC+9e2546ab8d_8997f66300-r0 .../gcc-foo/... 9.3.0/cc1
     58 bitbake-server
     58 /lib/systemd/systemd
     58 (sd-pam)
65 /~/pkgman-non-rpm/build/build/tmp/sysroots-components/x86_64/pseudo-native/usr/bin/pseudo 90 /~/pkgman-non-rpm/build/build/tmp/work/qemux86-poky-linux/linux-yocto/5.4.119+gitAUTOINC+9e2546ab8d_8997f66300-r0/recipe-sysroot-native/usr/bin/i686-poky-linux/../../libexec/i686-poky-linux/gcc/i686-poky-linux/9.3.0/cc1 97 /~/meta-oe .../tmp/pklin/... opengl-es-cts/3.2.7.0-r0 .../gcc-foo/... 11.1.0/cc1plus 98 /~/meta-oe .../tmp/pklin/... opengl-es-cts/3.2.7.0-r0 .../gcc-foo/... 11.1.0/as 102 /~/meta-oe .../tmp/pklin/... opengl-es-cts/3.2.7.0-r0/recipe-sysroot-native/usr/bin/x86_64-poky-linux/x86_64-poky-linux-g++ 112 /~/meta-oe .../tmp/pklin/... mongodb/4.4.5+4.4.6-rc0-r0 .../gcc-foo/... 11.1.0/cc1plus 113 /~/meta-oe .../tmp/pklin/... mongodb/4.4.5+4.4.6-rc0-r0 .../gcc-foo/... 11.1.0/as
    116 /usr/bin/python3
    141 i686-poky-linux-gcc
151 /~/meta-oe .../tmp/pklin/... nodejs/14.16.1-r0 .../gcc-foo/... 11.1.0/cc1plus 153 /~/meta-oe .../tmp/pklin/... nodejs/14.16.1-r0 .../gcc-foo/... 11.1.0/as
    154 sh
    209 x86_64-poky-linux-gcc
    313 x86_64-poky-linux-g++
345 /~/meta-oe/build/build/tmp/sysroots-components/x86_64/pseudo-native/usr/bin/pseudo
    476 /bin/bash
    629 make
   1130 python3
   1331 /bin/sh

the 'one cmd' lines such as, 'make, python3, /bin/sh, tar lines clearly need more context.

There's more work to do cleaning up some 'noise' but does
anyone want anything else in the first version?

Trevor pointed me at:
   https://heptapod.host/saajns/procpath
that we might useful but I haven't played with it yet.

A quick simple filter is likely a better starting point.
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#53740): https://lists.yoctoproject.org/g/yocto/message/53740
Mute This Topic: https://lists.yoctoproject.org/mt/83284473/21656
Group Owner: [email protected]
Unsubscribe: https://lists.yoctoproject.org/g/yocto/unsub 
[[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to