Re: qtcreator compilation failure due to memory/disk corruption
Hello,

Michael Kelly wrote on Sun, 04 Jan 2026 20:28:42 +:
> stress-ng: fail: [3947] vm: detected 141733920769 bit errors while
> stressing memory
> stress-ng: fail: [3984] vm: detected 2 bit errors while stressing memory
>
> That's the good news. The bad news is that I suspect the cause is related to
> the handling of the signals which are used to terminate the stress-ng worker
> (oomable child).

That'd still be useful to fix :)

I'm seeing corruption issues with the Haskell GHC compiler, see
https://buildd.debian.org/status/fetch.php?pkg=ghc&arch=hurd-amd64&ver=9.10.3-1&stamp=1767477002&raw=0

I have to disable preemption to get the x86_64 build going, while the
i386 build goes fine.

Samuel
Re: qtcreator compilation failure due to memory/disk corruption
Hi All,

On 01/01/2026 14:52, Samuel Thibault wrote:
> Michael Kelly wrote on Thu, 01 Jan 2026 14:48:19 +:
> > > It's hard to say. The 4 buildds keep building packages all day long, and
> > > I notice such "stray" errors on one of them like every one day or two.
> >
> > That's possibly as rare as the stress-ng bit errors then given that my
> > machine is almost certainly slower than those supporting the buildds.
>
> It may then be simpler to just reproduce it with stress-ng, since then
> you'd know exactly what it was doing, while package installation etc. is
> a mess of things that happen :)
>
> Samuel

I'm making an update on this investigation because, like others, I'm
likely to have less time for looking into this from tomorrow.

I have been successful at adjusting the stress-ng parameters to make the
likelihood of 'bit error' reports close to 100%. A test like the
following, on a 4GB hurd-amd64 virtual machine and also on real hardware
with 4GB, fails for me almost every time:

  # stress-ng -t 20s --metrics --vm 64 --vm-bytes 1800M --vm-method incdec

With errors like:

  stress-ng: fail: [3947] vm: detected 141733920769 bit errors while stressing memory
  stress-ng: fail: [3984] vm: detected 2 bit errors while stressing memory

That's the good news. The bad news is that I suspect the cause is related
to the handling of the signals which are used to terminate the stress-ng
worker (oomable child). The first error reported above has a value which
is nonsense given the size of the memory region being worked on. I added
some debug output to the stress-ng code and there were some extraordinary
things going on which made no sense at all, with stack variables
seemingly changing 'randomly'. It seems suspicious to me that these
things only start occurring after the first signal is delivered to the
process. This all needs a thorough investigation when time permits.

In any case, this same test result does not present when running on
hurd-i386. That test completes perfectly over many tens of iterations.
This indicates that the stress-ng bit errors are not related to the
buildd issues. I've had no luck recreating that issue but will return to
it when time permits.

Regards,

Mike.
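[Editorial sketch, not part of the original message.] The remark that the first count is "nonsense given the size of memory region" can be checked with a little arithmetic. The Python sketch below assumes that `1800M` means 1800 MiB and that, at worst, every bit of a region no larger than `--vm-bytes` is reported wrong once; both are assumptions, as are the helper names `max_bit_errors` and `implausible_reports`, which are hypothetical and not part of stress-ng.

```python
import re

# Matches the failure lines quoted in this thread.
BIT_ERROR_RE = re.compile(
    r"stress-ng: fail:\s+\[(\d+)\] vm: detected (\d+) bit errors"
)


def max_bit_errors(vm_bytes_mib):
    """Assumed upper bound: every bit of the region is wrong exactly once."""
    return vm_bytes_mib * 1024 * 1024 * 8


def implausible_reports(log_text, vm_bytes_mib):
    """Return (pid, count) pairs whose count exceeds the assumed bound."""
    limit = max_bit_errors(vm_bytes_mib)
    return [
        (int(pid), int(count))
        for pid, count in BIT_ERROR_RE.findall(log_text)
        if int(count) > limit
    ]


if __name__ == "__main__":
    sample = (
        "stress-ng: fail: [3947] vm: detected 141733920769 bit errors "
        "while stressing memory\n"
        "stress-ng: fail: [3984] vm: detected 2 bit errors "
        "while stressing memory\n"
    )
    print("assumed bound:", max_bit_errors(1800))
    for pid, count in implausible_reports(sample, 1800):
        # Hex is printed only as a debugging aid.
        print(pid, count, hex(count))
```

Under these assumptions the bound is 15099494400 bits, so only the report from PID 3947 is flagged; the report of 2 bit errors is within range.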
Re: qtcreator compilation failure due to memory/disk corruption
Michael Kelly wrote on Thu, 01 Jan 2026 14:48:19 +:
> > It's hard to say. The 4 buildds keep building packages all day long, and
> > I notice such "stray" errors on one of them like every one day or two.
>
> That's possibly as rare as the stress-ng bit errors then given that my machine
> is almost certainly slower than those supporting the buildds.

It may then be simpler to just reproduce it with stress-ng, since then
you'd know exactly what it was doing, while package installation etc. is
a mess of things that happen :)

Samuel
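[Editorial sketch, not part of the original message.] One way to turn "reproduce it with stress-ng" into a measurable failure rate is to repeat a run and count how many iterations report bit errors. The sketch below is a minimal Python harness under stated assumptions: the command line is the one quoted elsewhere in this thread, a run is counted as failed when its output contains the text "bit errors", and the names `run_once` and `failure_rate` are hypothetical.

```python
import subprocess

# Command line as quoted in this thread; adjust for the machine under test.
STRESS_CMD = [
    "stress-ng", "-t", "20s", "--metrics",
    "--vm", "64", "--vm-bytes", "1800M", "--vm-method", "incdec",
]


def run_once(cmd=STRESS_CMD):
    """Run the command once and return its combined stdout and stderr."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,  # a failing run is data here, not an exception
    )
    return proc.stdout


def failure_rate(iterations, runner=run_once):
    """Fraction of runs whose output mentions bit errors (assumed marker)."""
    failures = sum(1 for _ in range(iterations) if "bit errors" in runner())
    return failures / iterations

# Example (requires stress-ng to be installed):
#   print(failure_rate(10))
```

The `runner` parameter exists so that the counting logic can be exercised without stress-ng, for instance by passing a function that replays saved log output.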
Re: qtcreator compilation failure due to memory/disk corruption
Samuel,

On 01/01/2026 14:03, Samuel Thibault wrote:
> Michael Kelly wrote on Thu, 01 Jan 2026 13:38:10 +:
> Are you uninstalling/installing the dependencies each time?

Yes, it's a fairly minimal base.tgz. The qtcreator build installs 340
packages. I'm not saving the pbuilder chroot so it installs the same way
for each build.

> > I have run with and without mach-defpager but there is evidence to
> > suggest that on my machine swapping is required by the build process with
> > the message 'vm_page warning: unable to recycle any page' being output.
> That message does not mean that swapping is required, on the contrary it
> means that paging out didn't manage to free memory.

My reasoning assumes that external pages can almost all be paged out as
required. The appearance of that message therefore implies to me that
external pageout was not sufficient and internal pageout would have been
necessary to free memory.

> > I have seen stress-ng tests report bit errors which potentially could
> > be from the same cause. I've only seen those errors on 64 bit Hurd.
> That would still be worth investigating.

It would indeed but unfortunately it's a very rare occurrence too. I've
only seen a handful of such reports in months.

> > 4) Are there any packages other than qtcreator that show this issue
> > regularly?
> Yes, various packages do it. For instance:
>
> sysprof:
> https://buildd.debian.org/status/fetch.php?pkg=sysprof&arch=hurd-i386&ver=49.0-4&stamp=1766538245&raw=0
> but I guess it's actually spirv-llvm-translator-20 that got built just
> before that triggered the issue, without symptoms in its own build:
> https://buildd.debian.org/status/fetch.php?pkg=spirv-llvm-translator-20&arch=hurd-i386&ver=20.1.9-1&stamp=1766534212&raw=0
>
> survivor:
> https://buildd.debian.org/status/fetch.php?pkg=survivor&arch=hurd-amd64&ver=1.0.7-6&stamp=1766520613&raw=
> but again, I guess it's actually rustc that triggered the issue:
> https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=hurd-amd64&ver=1.91.1%2Bdfsg1-1~exp3&stamp=1766519978&raw=0

I think perhaps I'll set up a cycle of continuous builds across this set
of packages, including qtcreator.

> > 5) Finally, what is the likelihood of a build failure for qtcreator (ie. 1
> > in N builds approximately) ?
> It's hard to say. The 4 buildds keep building packages all day long, and
> I notice such "stray" errors on one of them like every one day or two.

That's possibly as rare as the stress-ng bit errors then given that my
machine is almost certainly slower than those supporting the buildds.

> > 6) Is eatmydata being used for the build?
> Ah, yes.

I've been using that too.

On 01/01/2026 14:19, Samuel Thibault wrote:
> Note: since your latest changes have made active external page page-out
> way less frequent, it may have made the issue way less frequent. You may
> have to revert that to trigger the issue more often.

I'm using gnumach 2:1.8+git20251228-1 which has some of the recent
changes but not the change to the actual eviction policy. I believe this
version should operate very similarly to any from the last 6 months or
so.

Cheers,

Mike.
Re: qtcreator compilation failure due to memory/disk corruption
Note: since your latest changes have made active external page page-out
way less frequent, it may have made the issue way less frequent. You may
have to revert that to trigger the issue more often.

Samuel
Re: qtcreator compilation failure due to memory/disk corruption
Michael Kelly wrote on Thu, 01 Jan 2026 13:38:10 +:
> On 29/12/2025 22:47, Samuel Thibault wrote:
> > > 3) How to setup the sbuild environment to compile qtcreator package using
> > > that method.
> > It's just a chroot, and packages are installed/removed on each package
> > build.
>
> I'm having no success at recreating this corruption. I'm using pbuilder to
> build the qtcreator package on a 64 bit Hurd virtual machine with 3GB of
> RAM.

Are you uninstalling/installing the dependencies each time?

> I have run with and without mach-defpager but there is evidence to
> suggest that on my machine swapping is required by the build process with
> the message 'vm_page warning: unable to recycle any page' being output.

That message does not mean that swapping is required, on the contrary it
means that paging out didn't manage to free memory.

> I have a few questions:
>
> 1) Does this problem occur on machines other than boralus ?

Yes, the various buildds occasionally have the same kind of issue.

> 2) Have the errors occurred on 32 bit Hurd as well ?

All the same, yes.

> I have seen stress-ng tests report bit errors which potentially could
> be from the same cause. I've only seen those errors on 64 bit Hurd.

That would still be worth investigating.

> 3) I cannot remember if it is possible to run 64 bit without rumpdisk

No, we don't plan to spend the time to make the in-gnumach drivers
64bit-ready.

> 4) Are there any packages other than qtcreator that show this issue
> regularly?

Yes, various packages do it. For instance:

sysprof:
https://buildd.debian.org/status/fetch.php?pkg=sysprof&arch=hurd-i386&ver=49.0-4&stamp=1766538245&raw=0
but I guess it's actually spirv-llvm-translator-20 that got built just
before that triggered the issue, without symptoms in its own build:
https://buildd.debian.org/status/fetch.php?pkg=spirv-llvm-translator-20&arch=hurd-i386&ver=20.1.9-1&stamp=1766534212&raw=0

survivor:
https://buildd.debian.org/status/fetch.php?pkg=survivor&arch=hurd-amd64&ver=1.0.7-6&stamp=1766520613&raw=
but again, I guess it's actually rustc that triggered the issue:
https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=hurd-amd64&ver=1.91.1%2Bdfsg1-1~exp3&stamp=1766519978&raw=0

> 5) Finally, what is the likelihood of a build failure for qtcreator (ie. 1
> in N builds approximately) ?

It's hard to say. The 4 buildds keep building packages all day long, and
I notice such "stray" errors on one of them like every one day or two.

> 6) Is eatmydata being used for the build?

Ah, yes.

(whenever the machine crashes or reboots unexpectedly, I recreate the
chroots entirely)

Samuel
Re: qtcreator compilation failure due to memory/disk corruption
On 01/01/2026 13:38, Michael Kelly wrote:
> 5) Finally, what is the likelihood of a build failure for qtcreator (ie. 1
> in N builds approximately) ?
>
> Mike.

Well, almost finally anyway.

6) Is eatmydata being used for the build?

Mike.
Re: qtcreator compilation failure due to memory/disk corruption
On 29/12/2025 22:47, Samuel Thibault wrote:
> > 3) How to setup the sbuild environment to compile qtcreator package using
> > that method.
> It's just a chroot, and packages are installed/removed on each package
> build.

I'm having no success at recreating this corruption. I'm using pbuilder
to build the qtcreator package on a 64 bit Hurd virtual machine with 3GB
of RAM. I have run with and without mach-defpager but there is evidence
to suggest that on my machine swapping is required by the build process
with the message 'vm_page warning: unable to recycle any page' being
output.

I have a few questions:

1) Does this problem occur on machines other than boralus ?

2) Have the errors occurred on 32 bit Hurd as well ? I have seen
stress-ng tests report bit errors which potentially could be from the
same cause. I've only seen those errors on 64 bit Hurd.

3) I cannot remember if it is possible to run 64 bit without rumpdisk
but presumably that is being used for disk access ?

4) Are there any packages other than qtcreator that show this issue
regularly?

5) Finally, what is the likelihood of a build failure for qtcreator (ie.
1 in N builds approximately) ?

Mike.
Re: qtcreator compilation failure due to memory/disk corruption
Michael Kelly wrote on Mon, 29 Dec 2025 20:22:13 +:
> 1) The amount of memory available on the buildd machines typically (or on
> boralus.sceen.net)

boralus has 3G memory, no swap (not even mach-defpager running, so the
kernel knows it can't swap).

> 2) Whether they are used concurrently for other sbuilds whilst the qtcreator
> sbuild is in operation.

The VM itself only runs the build.

> 3) How to setup the sbuild environment to compile qtcreator package using
> that method.

It's just a chroot, and packages are installed/removed on each package
build.

Samuel
