Yocto Technical Team Minutes, Engineering Sync, for June 29 2021
archive: https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit

== disclaimer ==
Best efforts are made to ensure the below is accurate and valid. However, errors sometimes happen. If any errors or omissions are found, please feel free to reply to this email with any corrections.

== attendees ==
Trevor Woerner, Stephen Jolley, Steve Sakoman, Joshua Watt, Jon Mason, Tony Tascioglu, Richard Purdie, Scott Murray, Armin Kuster, Paul Barker, Tim Orling, Alejandro H, Bruce Ashfield, Randy MacLeod, Denys Dmytriyenko

== notes ==
- 3.1.9 (dunfell) through QA awaiting release approval, no blockers
- 3.4 m1 (honister) released
- identified an RCU stall hang that’s been causing some of our AB-INT issues. closed a couple of AB-INT bugs as a result, but found some more
- prserv rewrite using asyncio is stuck on AB hangs when tested on larger scale
- ARM-specific ltp hang issue (bug 14460, https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460)
- multiconfig needs simpler test cases
- about 50% of AB issues are ptest-related

== general ==
RP: the RCU fix is awesome, significantly fixed AB-INT issues

PaulB: (summary) prserv code updated to asyncio, it works for me on my home machine, but then we see failures when that code is run on the AB. bitbake hangs, probably in the shutdown path. i have been able to reproduce it at home. works with python3.6 but seems to fail with python3.8 (from buildtools). we’re using asyncio and multiprocessing in various modules and it’s unclear how well they play together. there might be issues wrt which system gets initiated first and when forking occurs
JPEW: buildtools tarball only?
PaulB: i need to check the matrix of what’s running native vs what’s from the buildtools
JPEW: you’re mixing asyncio and multiprocessing?
PaulB: yes
JPEW: doesn’t that do a fork/exec itself? just to be clear: you’re init’ing asyncio first, then forking?
PaulB: yes, that’s what other parts of the code are doing too. maybe we should fork first, then call start_tcp_server(), then start the asyncio?
JPEW: server or client?
PaulB: server, but that’s in bitbake. there are no clear docs on how these things would work together
JPEW: the AB uses hash equiv server but it’s running on its own, so i’m not sure what code paths are used
PaulB: yes, RP and i discussed that and we know that code path is not being used
JPEW: so what happens? what are you seeing?
PaulB: it gets so far through the test suite then the bitbake server stops. we see the keepalive messages but no other output. RP got stuff installed and did some dumps. it looks to be prserver-export functionality related. bitbake has finished and run successfully, but stalls while trying to shut down
RP: for example in one case we saw 57 zombie threads, the 58th is stuck in a client side asyncio call to prserv. we tried killing the prserv, so we’re not sure if it’s hung. then we found it waiting on the socket (which was already closed)
PaulB: and if we sent sigint to the process, it’s not waiting
RP: we had backtrace issues which we’ve fixed. but there’s this hang. some tests failed early with python3.8, but then an oeselftest failed. one of the parsers was stuck in this prserv call
PaulB: we should take a look through prserv.bbclass to see what else is done. we could look at the args used and check for parse completed events
RP: hashserv vs prserv: hashserv is called in its own context but prserv is called from within the parser threads
PaulB: yes, back to the issue of the init-vs-fork timing
JPEW: i would expect that’s an issue. have them init in each thread. are they threads or processes?
RP: can’t remember. i think processes, but not sure
JPEW: that seems something to try. i would guess setting up asyncio then forking wouldn’t work over that boundary. so have each parser thread set up its asyncio separately
PaulB: we can run builds quite happily.
the build works, but then when i try the prserv-export that’s where it falls over
RP: in the parse thread
PaulB: it’s also queried in do_package, and that seems fine
JPEW: i think asyncio in python is an abstraction around some OS primitive, but it’s configurable so it’s possible the one being put in the buildtools tarballs (from wherever it’s being built) isn’t properly set up for the actual machine on which it’s run. if we could dump the config then we’ll probably see that it’s not using the proper backend
PaulB: i think there’s just a linux one and a windows one
JPEW: okay, maybe it’s something else
RP: i think the async init is key
PaulB: i think asyncio has a good reputation of working. on stackoverflow there are other instances/questions of people mixing asyncio with threads and none of them have definitive answers. so there must be caveats that the docs don’t address. most users of asyncio are basing their entire software on it, whereas we’re just using it in one piece and mixing it with everything else. we have some good leads here, i’ll do a writeup and send my latest patches (there’s a new read-only patch)
RP: JPEW, if you could look at the patches, specifically the shutdown paths, that would be great. has anyone else expressed interest?
ScottM: I’ll be taking a look, as part of AGL. it’s on my short list

PaulB: there’s a patch series that Khem has forwarded, python linter fixes, i think we need more discussion on it. ideally we should be testing this with every commit, otherwise we end up with these massive linter patches that mess up the repo history
JPEW: i’m a big fan of automatic linters/formatters, but it has to be automated
TimO: me too. not sure how it’ll work for a large group like this
PaulB: having these flag days is really bad for breaking “git blame” etc
RP: bad implications for LTS.
some changes i like, some i’m less keen on
PaulB: if there’s some agreement, then we could add a linter config file to the project so we’re all using the same thing
RP: we are running the pylint stuff on the AB, i’m blanking on where the config file is
Bruce: i usually do that
RP: we do some of this stuff in oe-core (pylint script) but it was only configured to show errors, and nobody is even looking at those now
Bruce: i looked at the github link, this is a “throw over the wall” patch. there aren’t going to be any updates (“i only do github pull requests, could you please forward this to the list”)
TrevorW: do we have a checkpatch.sh script?
RP: we had something, but nobody is considering/using it
TrevorW: if the patch doesn’t pass, it doesn’t get applied. so it should be up to the submitter to fix
PaulB: the check tools we have aren’t easy to run locally
RP: it should be. if we had something would someone maintain it?
TrevorW: i tried a long time ago to create such a tool but there are significant differences between (for example) the formatting in YP and OE, so how can a tool be created?
RP: yes there are some conflicts, but there is also a lot of agreement, so let’s focus on the agreed-upon things first. i think the only issue is tabs vs spaces
TrevorW: i think there was also an ordering issue
RP: yes, but ordering is not irrelevant. changing the order can change the behaviour, so we can’t enforce ordering
Armin: there is an ordering styleguide
TimO: some linters are too aggressive

RP: JPEW, how did the SBOM plugfest go?
JPEW: it went well. gave us an idea of how compatible we are. i don’t think we’re too far off. i think there’s another one coming up. i think there’s going to be a plugfest every 3 months or so until momentum goes down. i’ll go to the next one. i have some patches, we are compatible but there are some things we can change. it was interesting to see the issues of the community at large.
but we’re lucky because we have all the data (whereas other projects don’t, necessarily). i believe one of our outstanding issues is that license strings need a sync, but that’s for another time. i think our mappings might be bad.
RP: i’d be interested in a list of the ones that aren’t valid
JPEW: i can track that down

RP: i liked the compression patch series, it failed in testing but i think a small tweak will fix it

TrevorW: tomorrow is the OEHH https://www.openembedded.org/wiki/Happy_Hours

Bruce: we’re starting to shape up for -m2. 5.13 kernels added. 5.4 dropped from master but will send rev updates for dunfell for 5.4
RP: so just as we got 5.10 working, we’ll drop it
Bruce: we’ve been testing with -dev a lot (ARM64 needs awk). i think we’ll add 5.13, then let all 3 sit there for a while, then drop 5.10. 5.13 has been tested a lot more than most

TrevorW: has there been a resolution wrt the new operator discussion?
RP: no. i think there are more invasive things that need to be done with some existing operators.
TrevorW: so the new operator is a go? it’s going in?
RP: not sure yet, more experiments needed
ScottM: at the end of the day we’re talking about 1 person’s issue with 1 BSP. is this a generic enough issue to warrant such a move?
RP: i think many people have hit it, but worked around it. so i think there is an architectural problem that needs a wider discussion
ScottM: i think there’s more value in the changes to += and _append than adding a new operator
RP: i think we need both. that’s why i’ve deferred. i need to do more experiments
ScottM: could we do a flag day, or a carry-over for say 1 year? do we have a process for that?
RP: we don’t have a process, the TSC would have to come up with a plan for it. it would be specific for this case
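
The init-vs-fork ordering raised in the prserv discussion above (fork first, then set up asyncio separately in each parser) can be sketched roughly as follows. This is only an illustration and not bitbake or prserv code: fake_prserv_query(), parser_worker() and run_parsers() are hypothetical names, the "query" is a sleep rather than a real socket call, and it assumes the parsers are separate processes (RP was not sure) started with the Unix "fork" method.

```python
import asyncio
import multiprocessing
import os


async def fake_prserv_query(name):
    # Hypothetical stand-in for an async client call to a PR server.
    await asyncio.sleep(0.01)
    return f"{name}:r0"


def parser_worker(name, results):
    # asyncio.run() creates a fresh event loop inside this child process,
    # i.e. after the fork, and closes it when the coroutine finishes, so no
    # event loop state is carried across the fork boundary.
    value = asyncio.run(fake_prserv_query(name))
    results.put((name, value, os.getpid()))


def run_parsers(names):
    # The parent never creates an event loop before forking.
    ctx = multiprocessing.get_context("fork")  # assumption: Unix host
    results = ctx.Queue()
    procs = [ctx.Process(target=parser_worker, args=(n, results)) for n in names]
    for p in procs:
        p.start()
    # Drain the queue before joining so the children can exit cleanly.
    out = [results.get(timeout=10) for _ in procs]
    for p in procs:
        p.join(timeout=10)
    return sorted(out)
```

Calling run_parsers(["a", "b"]) returns one (name, value, pid) tuple per worker, each produced by an event loop that only ever existed in that child. Whether this ordering actually avoids the AB hang is exactly what the discussion left open.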