> -----Original Message----- > From: [email protected] > <[email protected]> On Behalf Of Richard Purdie > Sent: den 15 september 2023 10:41 > To: [email protected]; openembedded-architecture > <[email protected]> > Cc: Philip Balister <[email protected]> > Subject: Re: [bitbake-devel] [PATCH] server/process: Disable the flush() call > in server logging > > On Thu, 2023-09-14 at 17:28 +0100, Richard Purdie via lists.openembedded.org > wrote: > > We've been chasing bitbake timeouts for a while and it was unclear where > > things > > were blocking on IO. It appears the flush() call in server logging can cause > > pauses up to minutes long on systems with slow (spinning) disks that are > > heavily > > loaded with IO. > > > > Since the flush() was added to aid debugging of other timing issues, we > > shouldn't > > need it now and it can be disabled. Leave a comment as a reminder of the > > pain this > > can cause. > > > > Signed-off-by: Richard Purdie <[email protected]> > > --- > > lib/bb/server/process.py | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py > > index 40cb99bc97..94e93bc3a6 100644 > > --- a/lib/bb/server/process.py > > +++ b/lib/bb/server/process.py > > @@ -43,7 +43,8 @@ def currenttime(): > > > > def serverlog(msg): > > print(str(os.getpid()) + " " + currenttime() + " " + msg) > > - sys.stdout.flush() > > + #Seems a flush here triggers filesytem sync like behaviour and long > > hangs in the server > > + #sys.stdout.flush() > > > > # > > # When we have lockfile issues, try and find infomation about which > > process is > > Whilst this patch looks simple, it has all the signs of turning into > weeks of pain after looking at the results from testing. It looks like > we have a race in the system which this flush is masking. > > https://autobuilder.yoctoproject.org/typhoon/#/builders/127/builds/2082/steps/14/logs/stdio > > Looking at just one of the failures there (of the 6), we see this test > pass: > > 1: 20/57 355/537 (0.12s) (0 failed) > (recipetool.RecipetoolTests.test_recipetool_split_pkg_licenses) > > then the next test to run on that build directory is: > > 1: 21/57 359/537 (137.64s) (0 failed) > (signing.Signing.test_signing_packages) > > with an error which boils down to: > > FileNotFoundError: [Errno 2] No such file or directory: > '/tmp/recipetoolqaizvjzf8h' > > from > > run(cmd='git rev-parse --abbrev-ref HEAD', input=None, log=None, > extrafiles=[], **options={'cwd': '/tmp/recipetoolqaizvjzf8h', 'shell': > True}) > > i.e. it is trying to run a command in a directory which doesn't exist. > > Without looking, I know "recipetoolqa" tmpfiles will come from > RecipetoolTests so the previous build directory is being removed and > yet still is cwd for the server. > > I've not looked much into any of the other failures yet. > > Looking at the 5 different selftest failures: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/127/builds/2082/steps/14/logs/stdio > > 2023-09-15 01:56:12,685 - oe-selftest - INFO - RESULTS - > archiver.Archiver.test_archiver_srpm_mode: FAILED (237.00s) > 2023-09-15 01:56:12,685 - oe-selftest - INFO - RESULTS - > overlayfs.OverlayFSEtcRunTimeTests.test_all_required_variables_set: FAILED > (13.25s) > 2023-09-15 01:56:12,685 - oe-selftest - INFO - RESULTS - > prservice.BitbakePrTests.test_pr_service_deb_arch_dep: FAILED (126.30s) > 2023-09-15 01:56:12,685 - oe-selftest - INFO - RESULTS - > signing.Signing.test_signing_packages: FAILED (137.64s) > 2023-09-15 01:56:12,685 - oe-selftest - INFO - RESULTS - > sstatetests.SStateHashSameSigs.test_sstate_32_64_same_hash: FAILED > (541.51s) > 2023-09-15 01:56:12,685 - oe-selftest - INFO - RESULTS - > sstatetests.SStateHashSameSigs3.test_sstate_multilib_or_not_native_samesig > s: FAILED (224.05s) > > https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5710/steps/14/logs/stdio > > 2023-09-14 16:00:20,642 - oe-selftest - INFO - RESULTS - > archiver.Archiver.test_archiver_srpm_mode: FAILED (59.78s) > 2023-09-14 16:00:20,642 - oe-selftest - INFO - RESULTS - > bblogging.BitBakeLogging.test_shell_loggingA: FAILED (15.71s) > 2023-09-14 16:00:20,642 - oe-selftest - INFO - RESULTS - > multiconfig.MultiConfig.test_multiconfig_reparse: FAILED (6.39s) > 2023-09-14 16:00:20,642 - oe-selftest - INFO - RESULTS - > prservice.BitbakePrTests.test_pr_service_rpm_arch_indep: FAILED (0.84s) > > https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/5746/steps/14/logs/stdio > > 2023-09-14 23:33:02,720 - oe-selftest - INFO - RESULTS - > package.VersionOrdering.test_dpkg: ERROR > 2023-09-14 23:33:02,720 - oe-selftest - INFO - RESULTS - > package.VersionOrdering.test_opkg: ERROR > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > package.VersionOrdering.test_rpm: ERROR > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > archiver.Archiver.test_archiver_srpm_mode: FAILED (129.04s) > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > bbtests.BitbakeTests.test_postfile: FAILED (2.06s) > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > bbtests.BitbakeTests.test_prefile: FAILED (1.73s) > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > devtool.DevtoolModifyTests.test_devtool_buildclean: FAILED (265.26s) > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > overlayfs.OverlayFSTests.test_mount_unit_not_set: FAILED (7.83s) > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > prservice.BitbakePrTests.test_import_export_override_db: FAILED (289.85s) > 2023-09-14 23:33:02,721 - oe-selftest - INFO - RESULTS - > sstatetests.SStateHashSameSigs3.test_sstate_multilib_or_not_native_samesig > s: FAILED (111.34s) > > https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/5757/steps/15/logs/stdio > > 2023-09-14 22:55:12,332 - oe-selftest - INFO - RESULTS - > bblogging.BitBakeLogging.test_python_exec_func_python_loggingB: FAILED > (100.76s) > > https://autobuilder.yoctoproject.org/typhoon/#/builders/87/builds/5777/steps/14/logs/stdio > > 2023-09-14 23:00:48,622 - oe-selftest - INFO - RESULTS - > archiver.Archiver.test_archiver_srpm_mode: FAILED (71.69s) > 2023-09-14 23:00:48,622 - oe-selftest - INFO - RESULTS - > prservice.BitbakePrTests.test_pr_service_deb_arch_dep: FAILED (27.14s) > > so the failures are all over the place. > > We also had a wic test failure that when re-ran, didn't reproduce: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/87/builds/5777/steps/14/logs/stdio > > and some toolchain test failures for qemuppc: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/153/builds/529 > > and qemuarm64: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/151/builds/530 > > The question at this point is what do people want me to do. We clearly > have a really nasty bug in here. The patch is "right" and we do need to > fix this. If I merge it, I suspect I'm going to end up having to chase > this down before we can release and I am going to struggle to find the > time to do it and I suspect my sanity will suffer. This does look to be > a significant issue though. > > Cheers, > > Richard
While the problem the patch tried to fix is real, it seems a lot less problematic than the result of applying the patch. So from a release perspective, I would keep the current code and possibly add it as a known bug to the release notes. Then, after the release is out, start the work of solving the fallout from applying the patch, with the intention to have it solved for the first point release of Nanbield. While not ideal, it sounds a lot less stressful than applying the patch at this stage. //Peter
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#1760): https://lists.openembedded.org/g/openembedded-architecture/message/1760 Mute This Topic: https://lists.openembedded.org/mt/101375777/21656 Group Owner: [email protected] Unsubscribe: https://lists.openembedded.org/g/openembedded-architecture/unsub [[email protected]] -=-=-=-=-=-=-=-=-=-=-=-
