On Sun, Feb 9, 2020 at 7:27 AM Alex Kiernan <[email protected]> wrote:
>
> On Sun, Feb 9, 2020 at 12:23 AM [email protected]
> <[email protected]> wrote:
> >
> > Hi Richard,
> >
> > > > > Anecdotally, we are running Zeus for nightly builds with three
> > > > > multiconfigs. I cherry-picked your "bitbake: fix2" and "bitbake:
> > > > > fixup" patches and haven't seen any of the BB_UNIHASH errors since.
> > > > > Granted it's only been a week. But before that, hash equiv +
> > > > > multiconfig was unusable due to the BB_UNIHASH errors.
> > > >
> > > > That is a really helpful data point, thanks. I should probably clean up
> > > > those bitbake patches and get them merged then, I couldn't decide if
> > > > they were right or not...
> > > >
> > > I just picked all your pending changes out of master-next into our
> > > local patch queue - will let you know how it looks when it's finished
> > > cooking!
> >
> > There are two small issues I have observed.
> >
> > One is that I occasionally get a lot of non-deterministic metadata errors
> > when BB_CACHE_POLICY = "cache", multiconfig, and hash equiv are enabled.
> > The errors are all on recipes for which SRCREV = "${AUTOREV}". It doesn't
> > always happen, but it did just now when I rebased our "zeus-modified"
> > branch onto the upstream "zeus" branch, to get the changes starting with
> > 7dc72fde6edeb5d6ac6b3832530998afeea67cbc.
> >
> > Two is that the "Initializing tasks" stage sometimes appears stuck at 44%
> > for a couple of minutes. I traced it down to this code in runqueue.py
> > (line 1168 on zeus):
> >
> >     # Iterate over the task list and call into the siggen code
> >     dealtwith = set()
> >     todeal = set(self.runtaskentries)
> >     while len(todeal) > 0:
> >         for tid in todeal.copy():
> >             if len(self.runtaskentries[tid].depends - dealtwith) == 0:
> >                 dealtwith.add(tid)
> >                 todeal.remove(tid)
> >                 self.prepare_task_hash(tid)
> >
> > When I instrument the loop to print out the size of "todeal", I see it
> > decrease very slowly, sometimes by only a couple of entries at a time.
> > I'm guessing this is because prepare_task_hash is contacting the hash
> > equiv server serially here. I'm over my work VPN, which makes things
> > extra slow. Is there an opportunity for batching here?
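[An aside on the batching question above: the quoted loop triggers one
hash-equivalence round trip per task. A minimal sketch of a level-batched
alternative follows, assuming a hypothetical prepare_batch callback that
resolves the unihashes for a whole list of task ids in a single request;
no such batched API exists in zeus-era bitbake, so this is illustrative
only, not the actual fix:]

    def prepare_hashes_batched(runtaskentries, prepare_batch):
        # Peel off dependency "levels": every task whose dependencies are
        # already dealt with can have its hash prepared now, so the hash
        # equivalence server could be queried once per level rather than
        # once per task.
        dealtwith = set()
        todeal = set(runtaskentries)
        while todeal:
            ready = [tid for tid in todeal
                     if not (runtaskentries[tid].depends - dealtwith)]
            dealtwith.update(ready)
            todeal.difference_update(ready)
            # Hypothetical: one network round trip covering len(ready) tasks
            prepare_batch(ready)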
>
> I've a new failure:
>
> 00:20:59.829 Traceback (most recent call last):
> 00:20:59.829   File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/server/process.py", line 278, in ProcessServer.idle_commands(delay=0.1, fds=[<socket.socket fd=6, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>, <socket.socket fd=18, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>, <bb.server.process.ConnectionReader object at 0x7f831b7adb70>]):
> 00:20:59.829        try:
> 00:20:59.829   >    retval = function(self, data, False)
> 00:20:59.829        if retval is False:
> 00:20:59.829   File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/cooker.py", line 1434, in buildTargetsIdle(server=<ProcessServer(ProcessServer-1, started)>, rq=<bb.runqueue.RunQueue object at 0x7f82f5112f98>, abort=False):
> 00:20:59.829        try:
> 00:20:59.829   >    retval = rq.execute_runqueue()
> 00:20:59.829        except runqueue.TaskFailure as exc:
> 00:20:59.829   File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1522, in RunQueue.execute_runqueue():
> 00:20:59.829        try:
> 00:20:59.829   >    return self._execute_runqueue()
> 00:20:59.829        except bb.runqueue.TaskFailure:
> 00:20:59.829   File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1488, in RunQueue._execute_runqueue():
> 00:20:59.829        if self.state is runQueueRunning:
> 00:20:59.829   >    retval = self.rqexe.execute()
> 00:20:59.829
> 00:20:59.829   File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1997, in RunQueueExecute.execute():
> 00:20:59.829        else:
> 00:20:59.829   >    self.sqdata.outrightfail.remove(nexttask)
> 00:20:59.829        if nexttask in self.sqdata.outrightfail:
>
> Just testing locally with:
>
> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
> index 71108eeed752..a94a9bb27ae2 100644
> --- a/bitbake/lib/bb/runqueue.py
> +++ b/bitbake/lib/bb/runqueue.py
> @@ -1994,7 +1994,7 @@ class RunQueueExecute:
>                  self.sq_task_failoutright(nexttask)
>                  return True
>              else:
> -                self.sqdata.outrightfail.remove(nexttask)
> +                self.sqdata.outrightfail.discard(nexttask)
>              if nexttask in self.sqdata.outrightfail:
>                  logger.debug(2, 'No package found, so skipping setscene task %s', nexttask)
>                  self.sq_task_failoutright(nexttask)
>
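[For anyone reading along: the one-line fix quoted above works because
set.remove() raises KeyError when the element is absent, whereas
set.discard() is a no-op, which is exactly what the traceback shows
taking down the idle handler. A standalone illustration; the task names
are made up for the example:]

    outrightfail = {"quilt-native:do_populate_sysroot"}

    # discard() tolerates a missing element
    outrightfail.discard("m4-native:do_populate_sysroot")

    # remove() raises, which is what crashed RunQueueExecute.execute() above
    try:
        outrightfail.remove("m4-native:do_populate_sysroot")
    except KeyError:
        print("remove() raises KeyError for elements not in the set")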
That change has got me a clean build end to end, and a rebuild then
successfully uses the sstate cache. But something is upsetting the sstate
I'm serving back from the Jenkins box to a local build, as I'm getting
different hashes for the same sstate:

akiernan@akiernan-virtual-machine:~/nanohub/build$ find sstate-cache -name '*quilt-native*populate_sysroot*' -ls
2240468   40 -rw-rw-r--  1 akiernan akiernan  39406 Feb  9 13:53 sstate-cache/universal/ff/29/sstate:quilt-native:x86_64-linux:0.66:r0:x86_64:3:ff29b95eb35bba9a4c2e0857372991e6f08c0e9fcb72f76bc2dfbad5d12cade1_populate_sysroot.tgz.siginfo
2241106   56 -rw-rw-r--  1 akiernan akiernan  53302 Feb  9 13:53 sstate-cache/universal/ff/29/sstate:quilt-native:x86_64-linux:0.66:r0:x86_64:3:ff29b95eb35bba9a4c2e0857372991e6f08c0e9fcb72f76bc2dfbad5d12cade1_populate_sysroot.tgz
2634859   40 -rw-rw-r--  1 akiernan akiernan  39387 Feb  9 16:16 sstate-cache/universal/83/30/sstate:quilt-native:x86_64-linux:0.66:r0:x86_64:3:83309dcd3c0c7e2ab03ed24b2a5b8d6bf9e35e7b4c8c27373fd68513c8c2b29e_populate_sysroot.tgz.siginfo
2634858   52 -rw-rw-r--  1 akiernan akiernan  52543 Feb  9 16:16 sstate-cache/universal/83/30/sstate:quilt-native:x86_64-linux:0.66:r0:x86_64:3:83309dcd3c0c7e2ab03ed24b2a5b8d6bf9e35e7b4c8c27373fd68513c8c2b29e_populate_sysroot.tgz

akiernan@akiernan-virtual-machine:~/nanohub/build$ bitbake-diffsigs sstate-cache/universal/ff/29/sstate:quilt-native:x86_64-linux:0.66:r0:x86_64:3:ff29b95eb35bba9a4c2e0857372991e6f08c0e9fcb72f76bc2dfbad5d12cade1_populate_sysroot.tgz.siginfo sstate-cache/universal/83/30/sstate:quilt-native:x86_64-linux:0.66:r0:x86_64:3:83309dcd3c0c7e2ab03ed24b2a5b8d6bf9e35e7b4c8c27373fd68513c8c2b29e_populate_sysroot.tgz.siginfo
NOTE: Starting bitbake server...
akiernan@akiernan-virtual-machine:~/nanohub/build$

Running dumpsig on each file and diffing the output manually, I'm none the
wiser: other than the variables being in a different order in the two
sstate files, they're identical.

--
Alex Kiernan
--
_______________________________________________
Openembedded-core mailing list
[email protected]
http://lists.openembedded.org/mailman/listinfo/openembedded-core
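[The manual dumpsig comparison described in the message above can be
scripted. A rough sketch, assuming bitbake-dumpsig is on PATH and prints
its signature data as text, one item per line, so that sorting normalizes
the ordering difference; the paths are placeholders for the two .siginfo
files shown earlier:]

    import difflib
    import subprocess

    def normalized_dump(siginfo):
        # Run bitbake-dumpsig on the siginfo file; sorting the output
        # lines removes pure ordering differences between the two dumps
        out = subprocess.run(["bitbake-dumpsig", siginfo],
                             capture_output=True, text=True,
                             check=True).stdout
        return sorted(out.splitlines())

    # Placeholder paths standing in for the two .siginfo files above
    a = normalized_dump("sstate-cache/universal/ff/29/sstate...siginfo")
    b = normalized_dump("sstate-cache/universal/83/30/sstate...siginfo")

    # Empty output here means the files differ only in variable ordering
    for line in difflib.unified_diff(a, b, lineterm=""):
        print(line)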
