Yocto Technical Team Minutes, Engineering Sync, for December 14, 2021
archive: 
https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit

== disclaimer ==
Best efforts are made to ensure the below is accurate and valid. However,
errors sometimes happen. If any errors or omissions are found, please feel
free to reply to this email with any corrections.

== attendees ==
Trevor Woerner, Stephen Jolley, Bruce Ashfield, Daiane, Jan-Simon Möller,
Jon Mason, Joshua Watt, Saul Wold, Steve Sakoman, Randy MacLeod, Richard
Purdie, Scott Murray, Rephael C, Peter Kjellerstedt, Ross Burton, Michael
Opdenacker, Armin Kuster, Nathan Glimsdale, Ryan Eatmon

== project status ==
- 3.5 M1 (kirkstone) in QA
- 3.1.13 (dunfell) to be built this week
- maintenance for AB, updating SSDs and updating distros, next week (Dec 20-24)
- significant improvements to patch count, some changes might affect other
  layers
- CVE metrics improved for dunfell and master
- rising AB-int issues (new high!)

== discussion ==
RP: looked at more patches last week. removed some patches related to a MIPS
    platform (support for which was also removed from the latest kernel, see
    https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.16-Drops-MIPS-Netlogic).
    these patches were never added to upstream binutils. an SH4 gdb
    patch was also removed, not sure if it even works anymore. if users need
    these things, they can be re-introduced in separate layers,
    but they are not appropriate for oe-core going forward. Ross wins the
    award for most invasive change, and changes most likely to require changes
    in other layers, but these are good changes. seeing good trends: 50 patches
    removed, and 50 patches moved out of pending state. i’m still working
    with upstream gcc to get some patches merged and am still hoping to get
    some libtool patches upstream too.

RP: re: AB-int issues: there is a recurring bitbake selftest issue with the
    runqueue tests that does have a fix ready. also lttng: there appear to
    be a number of tools issues going on but all logged as one bug. upstream
    has fixed some of the original bugs we reported, but there are some other
    things. it all appears to have started around june

RP: AB downtime next week. there’s never a good time to do it. reimaging of
    most of the cluster. Michael has permission to replace all the OSes on all
    the workers (bring in new ones, remove old ones). it’s a good time to
    bring in new distros and get rid of older versions. for all we know the AB
    may never work again! (lol). if you have anything that needs to be
    preserved make sure to let us know.

RP: it also means that if we’re going to have a 3.1.13 release, it has to be
    this week.
SS: i’m ready, there is a small set of patches
RP: so the plan is to get those in, then do the build?
SS: yes. i don’t think there’s anything controversial there at all
RP: there’s a chance the parts might not arrive in time, so the update to
    the AB might get delayed

Randy: if we upgrade all the AB to SSDs then we won’t have a control to see
    how things go, other than historical data? does everyone only use SSD? are
    magnetic disks still important?
SS: i’ve been SSD-only for a couple years now
RP: conversely i only use spinning rust
JPEW: i would expect that the intersection of people doing things as extreme
    as the AB and still using spinning disks is confined to just the AB
Randy: you would be wrong (lol). at least half of WR is still using magnetic
    disks. however we do plan to upgrade.
RP: i understand the desire to have a control, but that would add to the
    maintenance burden. we’ll have to see how it goes. we have 2 performance
    testing workers as well, one is running CentOS 7 and the other is running
    Ubuntu 16.04; those will also need upgrading (we’ve been putting
    it off for too long now). so we might end up with 2 more performance
    workers (that will run in parallel with the existing 2) or the existing
    ones might just get replaced. it’s up to Michael
Randy: what about the ARM worker, any sign of that machine arriving?
RP: there’s talk of it, but getting stuff into the US is not easy
Saul: is anyone talking to Ampere?
RP: the people involved are the ARM people, so they know what they’re doing
Randy: will the ARM worker get an SSD as well?
RP: i think it already has one. if it doesn’t then it will
Ross: the ARM worker is pretty old hardware, unfortunately
RP: we have 2, one is older but bulletproof. the other one is faster but has a
    tendency to report CPU temps that are high

JPEW: i sent an RFC to switch the bitbake-worker to asyncio
RP: i had a look. i hadn’t thought of using asyncio in bitbake-worker
    because generally it is one of the more self-contained bits of bitbake
    that generally actually works and i had wanted to leave it alone. the
    patch adds more lines than it removes. is it an improvement?
JPEW: given what it’s doing, i don’t think it’s going to be more
    efficient. most of the time it sits waiting for things. the big advantage
    would be the maintainability. asyncio is easier to read than the polling
    loop it was doing. the adding of lines might just be my way of writing
    code.
RP: i don’t object to it as such. if *i* had done the conversion then
    i could read that code more easily, however, since i didn’t do the
    conversion, it makes maintainability harder for me. that’s not a
    criticism of the work itself. the diff is too big, maybe easier to just
    look at the updated code
JPEW: yes the diff is worthless. also, we could simplify it even more if we
    slightly changed the protocol between bitbake-worker and bitbake-server.
    would fit better with how asyncio works and what’s already included
    in asyncio (i.e. asyncio already knows how to handle reading text
    line-by-line, but we do a tagged XML thing, which i had to write
    explicitly). if we change it to be more like the hashserv protocol
    (newline-delimited JSON) then that would fit very well with asyncio. that
    would reduce the size
RP: i think the data (that goes over the bitbake-worker to bitbake-server
    link) can have newlines in it, so we’d need an escape mechanism
JPEW: yes, it’s pickled data. it wouldn’t have to be newlines, you could
    split on any character
RP: also, there are some lines removing some multiprocessing locking, is that
    still safe for workers that call into multiprocessing?
JPEW: the lock was never used in bitbake-worker itself, just the child
    processes. so i moved it to the child processes. the child processes have
    a pipe to bitbake-worker and i left the lock in the child process. so if
    they’re multithreaded (or whatever) they still have a lock when they
    write into the parent process. but each of them has a dedicated pipe into
    bitbake-worker parent process, so that doesn’t need locking
RP: yes, fair enough. i need to look at the final code and think about it some
    more
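
sketch (illustrative only, not taken from the RFC): one way the
newline-delimited JSON framing mentioned above could look, with the escaping
concern handled by base64-encoding the pickled payload so that a frame can
never contain a raw newline. the helper names (encode_message, decode_message,
read_messages) and the "event"/"exitcode" tags are assumptions made up for this
example; they are not the actual bitbake-worker protocol.

```python
import asyncio
import base64
import json
import pickle


def encode_message(tag, payload):
    """Frame one message as a single line of JSON (hypothetical format).

    The pickled payload is base64-encoded, so the frame never contains a raw
    newline, whatever the payload itself holds.
    """
    blob = base64.b64encode(pickle.dumps(payload)).decode("ascii")
    return (json.dumps({"tag": tag, "data": blob}) + "\n").encode("utf-8")


def decode_message(line):
    """Reverse encode_message() for one received line."""
    msg = json.loads(line.decode("utf-8"))
    return msg["tag"], pickle.loads(base64.b64decode(msg["data"]))


async def read_messages(reader):
    """Collect (tag, payload) pairs from an asyncio.StreamReader until EOF."""
    messages = []
    while True:
        line = await reader.readline()
        if not line:  # empty bytes means EOF
            break
        messages.append(decode_message(line))
    return messages


async def demo():
    # Feed a StreamReader by hand instead of opening a real pipe or socket.
    reader = asyncio.StreamReader()
    reader.feed_data(encode_message("event", {"log": "line one\nline two"}))
    reader.feed_data(encode_message("exitcode", 0))
    reader.feed_eof()
    return await read_messages(reader)
```

running asyncio.run(demo()) returns the two decoded messages, including the
payload with an embedded newline. note that StreamReader.readline() enforces a
buffer limit (64 KiB by default), so large payloads would need a larger limit,
and that unpickling is only reasonable here because both ends are trusted.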

RP: i’m worried about the bitbake server process (i.e. not the worker but
    the cooker). i have a pile of bugs but the general theme is: someone
    presses Ctrl-C and bitbake is off doing something else and doesn’t
    respond. in general (by design) we tend to defer things off (tasks are run
    by bitbake-worker and not bitbake itself). the trouble is once the parsing
    occurs in sub-processes it can starve the connection handling. i’m
    worried about the threading model (or lack thereof) in our design. there
    are 2 types of commands that can be run against the server: synchronous
    and asynchronous. but if something goes wrong in some of those synchronous
    commands then you can’t even send a stop event to the server. asyncio
    doesn’t necessarily help us with any of this stuff
JPEW: in order for asyncio to help, everything has to be done asynchronously.
    e.g. long-running tasks have to punt it to a thread (if it’s not I/O
    bound)
RP: asyncio probably isn’t going to be the answer here, we might have to
    push some of this out to a separate cooker thread with the server running
    in its own thread and handling the actual UI and commands (etc) separately
JPEW: we’ll probably need a hybrid approach: asyncio for the main loop, and
    long-running stuff in a thread
RP: it’s one of the bigger problems we have with bitbake right now. if
    anyone has any ideas…
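
sketch (illustrative only, not bitbake code): a minimal version of the hybrid
approach mentioned above, with an asyncio main loop that keeps accepting
commands while a long-running job runs in a thread, so a "stop" command is
still handled promptly. the names long_running_job and serve, and the
"ping"/"stop" commands, are assumptions invented for this example.

```python
import asyncio
import threading
import time


def long_running_job(stop_requested):
    """Stand-in for blocking work; polls a flag so it can be interrupted."""
    steps = 0
    while not stop_requested.is_set() and steps < 1000:
        time.sleep(0.01)  # pretend to do one slice of blocking work
        steps += 1
    return steps


async def serve(commands):
    """Main loop: start the job in a thread, then keep handling commands."""
    loop = asyncio.get_running_loop()
    stop_requested = threading.Event()
    # The default executor is a thread pool, so the event loop stays free.
    job = loop.run_in_executor(None, long_running_job, stop_requested)
    handled = []
    while True:
        command = await commands.get()
        handled.append(command)
        if command == "stop":
            stop_requested.set()  # ask the worker thread to wind down
            break
    steps = await job  # wait for the thread to notice the flag and return
    return handled, steps


async def demo():
    commands = asyncio.Queue()
    server = asyncio.ensure_future(serve(commands))
    await commands.put("ping")
    await asyncio.sleep(0.05)  # the job keeps running while the loop is idle
    await commands.put("stop")
    return await server
```

asyncio.run(demo()) returns the handled commands and the number of steps the
job completed before it was stopped. the job has to cooperate by checking the
flag; a thread that never checks it cannot be interrupted this way, and
CPU-bound work in a thread still competes with the main loop for the
interpreter.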

RP: re: meetings over the holidays. i’m guessing we’ll cancel meetings on
    the 28th, and most will be back by the 4th of January? will enough people
    be around for a meeting on the 21st?
<several>: i’ll be around
RP: okay, we’ll cancel the 28th and keep the others

Randy: i heard that someone got the terminal working in phosh? has anyone else
    played with phosh and got it working?
JPEW: yes, mostly working. you can download the daily build
Randy: i’ll give it a try shortly. is it something we’ll keep until after
    3.5? or are we going to rip out sato and replace it with phosh for this
    release?
JPEW: oh no, not this release
RP: that’s a bad idea. we’ll run with sato for the LTS

RP: any other patches in oe-core that we should be doing things with? we
    have some good success cases (e.g. the puzzles app in sato, binutils,
    gdb). tcp-wrappers is appearing on my radar; upstream is dead and we’re
    carrying about 15 patches. also the musl systemd patches need attention
ScottM: the two people who would care are not on this call
Ross: i think there’s been some improvement to systemd accepting musl
    patches
ScottM: maybe alpine would drive this issue, but maybe not
RP: there are 2 sets of issues with systemd and musl: 1) headers issue (which
    i think is relatively work-around-able) and i think systemd is willing to
    negotiate on some of those patches 2) pieces of c library are missing and
    by patching them out causes security holes, therefore we probably won’t
    see systemd accepting those. systemd has made it quite clear that they
    want to rely on those libc features and they’re simply not there in musl
    (as is my understanding)
ScottM: they’re quite vocal about being fine with being very linux-centric
RP: i want to get this done early in the cycle, rather than waiting for the
    week before feature-freeze