Re: qtcreator compilation failure due to memory/disk corruption

2026-01-04 Thread Samuel Thibault
Hello,

Michael Kelly, on Sun 04 Jan 2026 20:28:42 +, wrote:
> stress-ng: fail:  [3947] vm: detected 141733920769 bit errors while
> stressing memory
> stress-ng: fail:  [3984] vm: detected 2 bit errors while stressing memory
> 
> That's the good news. The bad news is that I suspect the cause is related to
> the handling of the signals which are used to terminate the stress-ng worker
> (oomable child).

That'd still be useful to fix :)

I'm seeing corruption issues with the haskell ghc compiler, see

https://buildd.debian.org/status/fetch.php?pkg=ghc&arch=hurd-amd64&ver=9.10.3-1&stamp=1767477002&raw=0

I have to disable preemption to get the x86_64 build going, while the
i386 build goes fine.

Samuel



Re: qtcreator compilation failure due to memory/disk corruption

2026-01-04 Thread Michael Kelly

Hi All,

On 01/01/2026 14:52, Samuel Thibault wrote:

> Michael Kelly, on Thu 01 Jan 2026 14:48:19 +, wrote:
> > It's hard to say. The 4 buildds keep building packages all day long, and
> > I notice such "stray" errors on one of them like every one day or two.
> >
> > That's possibly as rare as the stress-ng bit errors then given that my machine
> > is almost certainly slower than those supporting the buildds.
>
> It may then be simpler to just reproduce it with stress-ng, since then
> you'd know exactly what it was doing, while package installation etc. is
> a mess of things that happen :)
>
> Samuel


I'm posting an update on this investigation because, like others, I'm 
likely to have less time to look into this from tomorrow.


I have been successful at adjusting the stress-ng parameters to make the 
likelihood of 'bit error' reports close to 100%. A test like the 
following, on both a 4GB hurd-amd64 virtual machine and 4GB real 
hardware, fails for me almost every time:


# stress-ng -t 20s --metrics --vm 64 --vm-bytes 1800M --vm-method incdec

With errors like:

stress-ng: fail:  [3947] vm: detected 141733920769 bit errors while 
stressing memory

stress-ng: fail:  [3984] vm: detected 2 bit errors while stressing memory

That's the good news. The bad news is that I suspect the cause is 
related to the handling of the signals which are used to terminate the 
stress-ng worker (oomable child). The first error reported above has 
a value which is nonsense given the size of the memory region being worked 
on. I added some debug output to the stress-ng code and saw some 
extraordinary things which made no sense at all, with stack 
variables seemingly changing 'randomly'. It seems suspicious to me that 
these things only start occurring after the first signal is delivered to 
the process. This all needs a thorough investigation when time permits.
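
The implausibility of the first count can be checked with a little arithmetic. The sketch below is not from the thread. It assumes that `1800M` means 1800 MiB and, generously, that a single worker could corrupt every bit of the whole region. The variable names are made up for illustration.

```python
# Sanity check of the first reported count against the test parameters.
# Assumption: stress-ng's "M" suffix means MiB (2**20 bytes).
VM_BYTES = 1800 * 2**20          # --vm-bytes 1800M
REPORTED = 141733920769          # first "bit errors" count in the report

# Upper bound: every single bit of the whole region wrong, in one worker.
max_bits = VM_BYTES * 8

print(max_bits)                  # 15099494400
print(REPORTED > max_bits)       # True
print(hex(REPORTED))             # 0x2100000001
print(REPORTED >> 32, REPORTED & 0xFFFFFFFF)   # 33 1
```

The reported value exceeds even that bound. In hexadecimal it is 0x2100000001: the low 32 bits hold 1 and the high 32 bits hold 0x21. That pattern would be consistent with a 64-bit counter being partly overwritten rather than with genuine bit flips in the tested memory, though this is only an observation about the number, not a diagnosis.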


In any case, the same test does not fail when run on 
hurd-i386. There it completes perfectly over many tens of iterations. 
Since the buildd issues occur on 32 bit Hurd as well, this suggests that 
the stress-ng bit errors are not related to them. I've had no luck 
recreating the buildd issue but will return to it when time permits.
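
To compare failure rates between hurd-amd64 and hurd-i386, the test can be repeated and the reports counted. The following is a minimal sketch, not something used in the thread. The helper names `bit_error_counts`, `run_once` and `failing_runs` are hypothetical, and it assumes stress-ng prints its `fail:` lines in the format quoted above, on either stdout or stderr.

```python
import re
import subprocess

# Matches the "detected N bit errors" part of a stress-ng vm failure line.
BIT_ERRORS = re.compile(r"detected (\d+) bit errors")

# The command line from the report, split into arguments.
CMD = ["stress-ng", "-t", "20s", "--metrics", "--vm", "64",
       "--vm-bytes", "1800M", "--vm-method", "incdec"]


def bit_error_counts(output: str) -> list[int]:
    """Return every bit-error count reported in one run's output."""
    return [int(n) for n in BIT_ERRORS.findall(output)]


def run_once(cmd=CMD) -> list[int]:
    """Run the test once and return the bit-error counts it reported."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Scan both streams, since which one carries the report is an assumption.
    return bit_error_counts(proc.stdout + proc.stderr)


def failing_runs(iterations: int) -> int:
    """Count how many of `iterations` runs reported at least one bit error."""
    return sum(1 for _ in range(iterations) if run_once())
```

Calling `failing_runs(30)` on each architecture would give a rough failure rate to compare. Nothing is executed until one of the run functions is called.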


Regards,

Mike.




Re: qtcreator compilation failure due to memory/disk corruption

2026-01-01 Thread Samuel Thibault
Michael Kelly, on Thu 01 Jan 2026 14:48:19 +, wrote:
> It's hard to say. The 4 buildds keep building packages all day long, and
> I notice such "stray" errors on one of them like every one day or two.
> 
> That's possibly as rare as the stress-ng bit errors then given that my machine
> is almost certainly slower than those supporting the buildds. 

It may then be simpler to just reproduce it with stress-ng, since then
you'd know exactly what it was doing, while package installation etc. is
a mess of things that happen :)

Samuel



Re: qtcreator compilation failure due to memory/disk corruption

2026-01-01 Thread Michael Kelly

Samuel,

On 01/01/2026 14:03, Samuel Thibault wrote:

> Michael Kelly, on Thu 01 Jan 2026 13:38:10 +, wrote:
> Are you uninstalling/installing the dependencies each time?

Yes, it's a fairly minimal base.tgz. The qtcreator build installs 340
packages. I'm not saving the pbuilder chroot so it installs the same way
for each build.

> > I have run with and without mach-defpager but there is evidence to
> > suggest that on my machine swapping is required by the build process with
> > the message 'vm_page warning: unable to recycle any page' being output.
>
> That message does not mean that swapping is required, on the contrary it
> means that paging out didn't manage to free memory.

My reasoning assumes that external pages can almost all be paged out as
required. The appearance of that message therefore implies to me that
external pageout was not sufficient and internal pageout would have been
necessary to free memory.

> > I have seen stress-ng tests report bit errors which potentially could
> > be from the same cause. I've only seen those errors on 64 bit Hurd.
>
> That would still be worth investigating.

It would indeed but unfortunately it's a very rare occurrence too. I've
only seen a handful of such reports in months.

> > 4) Are there any packages other than qtcreator that show this issue
> > regularly?
>
> Yes, various packages do it.
>
> For instance:
>
> sysprof:
> https://buildd.debian.org/status/fetch.php?pkg=sysprof&arch=hurd-i386&ver=49.0-4&stamp=1766538245&raw=0
>
> but I guess it's actually spirv-llvm-translator-20 that got built just
> before that triggered the issue, without symptoms in its own build:
> https://buildd.debian.org/status/fetch.php?pkg=spirv-llvm-translator-20&arch=hurd-i386&ver=20.1.9-1&stamp=1766534212&raw=0
>
> survivor:
> https://buildd.debian.org/status/fetch.php?pkg=survivor&arch=hurd-amd64&ver=1.0.7-6&stamp=1766520613&raw=
>
> but again, I guess it's actually rustc that triggered the issue:
> https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=hurd-amd64&ver=1.91.1%2Bdfsg1-1~exp3&stamp=1766519978&raw=0

I think perhaps I'll set up a cycle of continuous build across this set
of packages including qtcreator.

> > 5) Finally, what is the likelihood of a build failure for qtcreator (ie. 1
> > in N builds approximately) ?
>
> It's hard to say. The 4 buildds keep building packages all day long, and
> I notice such "stray" errors on one of them like every one day or two.

That's possibly as rare as the stress-ng bit errors then given that my
machine is almost certainly slower than those supporting the buildds.

> > 6) Is eatmydata being used for the build?
>
> Ah, yes.

I've been using that too.

On 01/01/2026 14:19, Samuel Thibault wrote:

> Note: since your latest changes have made active external page page-out
> way less frequent, it may have made the issue way less frequent. You may
> have to revert that to trigger the issue more often.


I'm using gnumach 2:1.8+git20251228-1 which has some of the recent 
changes but not the change to the actual eviction policy. I believe this 
version should operate very similarly to any from the last 6 months 
or so.


Cheers,

Mike.


Re: qtcreator compilation failure due to memory/disk corruption

2026-01-01 Thread Samuel Thibault
Note: since your latest changes have made active external page page-out
way less frequent, it may have made the issue way less frequent. You may
have to revert that to trigger the issue more often.

Samuel



Re: qtcreator compilation failure due to memory/disk corruption

2026-01-01 Thread Samuel Thibault
Michael Kelly, on Thu 01 Jan 2026 13:38:10 +, wrote:
> On 29/12/2025 22:47, Samuel Thibault wrote:
> > > 3) How to setup the sbuild environment to compile qtcreator package using
> > > that method.
> > It's just a chroot, and packages are installed/removed on each package
> > build.
> 
> I'm having no success at recreating this corruption. I'm using pbuilder to
> build the qtcreator package on a 64 bit Hurd virtual machine with 3GB of
> RAM.

Are you uninstalling/installing the dependencies each time?

> I have run with and without mach-defpager but there is evidence to
> suggest that on my machine swapping is required by the build process with
> the message 'vm_page warning: unable to recycle any page' being output.

That message does not mean that swapping is required, on the contrary it
means that paging out didn't manage to free memory.

> I have a few questions:
> 
> 1) Does this problem occur on machines other than boralus ?

Yes, the various buildds occasionally have the same kind of issue.

> 2) Have the errors occurred on 32 bit Hurd as well ?

All the same, yes.

> I have seen stress-ng tests report bit errors which potentially could
> be from the same cause. I've only seen those errors on 64 bit Hurd.

That would still be worth investigating.

> 3) I cannot remember if it is possible to run 64 bit without rumpdisk

No, we don't plan to spend the time to make the in-gnumach drivers
64bit-ready.

> 4) Are there any packages other than qtcreator that show this issue
> regularly?

Yes, various packages do it.

For instance:

sysprof:
https://buildd.debian.org/status/fetch.php?pkg=sysprof&arch=hurd-i386&ver=49.0-4&stamp=1766538245&raw=0

but I guess it's actually spirv-llvm-translator-20 that got built just
before that triggered the issue, without symptoms in its own build:
https://buildd.debian.org/status/fetch.php?pkg=spirv-llvm-translator-20&arch=hurd-i386&ver=20.1.9-1&stamp=1766534212&raw=0

survivor:
https://buildd.debian.org/status/fetch.php?pkg=survivor&arch=hurd-amd64&ver=1.0.7-6&stamp=1766520613&raw=

but again, I guess it's actually rustc that triggered the issue:
https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=hurd-amd64&ver=1.91.1%2Bdfsg1-1~exp3&stamp=1766519978&raw=0

> 5) Finally, what is the likelihood of a build failure for qtcreator (ie. 1
> in N builds approximately) ?

It's hard to say. The 4 buildds keep building packages all day long, and
I notice such "stray" errors on one of them like every one day or two.

> 6) Is eatmydata being used for the build?

Ah, yes.

(whenever the machine crashes or reboots unexpectedly, I recreate the
chroots entirely)

Samuel



Re: qtcreator compilation failure due to memory/disk corruption

2026-01-01 Thread Michael Kelly

On 01/01/2026 13:38, Michael Kelly wrote:
> 5) Finally, what is the likelihood of a build failure for qtcreator
> (ie. 1 in N builds approximately) ?
>
> Mike.


Well, almost finally anyway.

6) Is eatmydata being used for the build?

Mike.




Re: qtcreator compilation failure due to memory/disk corruption

2026-01-01 Thread Michael Kelly

On 29/12/2025 22:47, Samuel Thibault wrote:

> > 3) How to setup the sbuild environment to compile qtcreator package using
> > that method.
>
> It's just a chroot, and packages are installed/removed on each package
> build.


I'm having no success at recreating this corruption. I'm using pbuilder 
to build the qtcreator package on a 64 bit Hurd virtual machine with 3GB 
of RAM. I have run with and without mach-defpager but there is evidence 
to suggest that on my machine swapping is required by the build process 
with the message 'vm_page warning: unable to recycle any page' being output.


I have a few questions:

1) Does this problem occur on machines other than boralus ?

2) Have the errors occurred on 32 bit Hurd as well ? I have seen 
stress-ng tests report bit errors which potentially could be from the 
same cause. I've only seen those errors on 64 bit Hurd.


3) I cannot remember if it is possible to run 64 bit without rumpdisk 
but presumably that is being used for disk access ?


4) Are there any packages other than qtcreator that show this issue 
regularly?


5) Finally, what is the likelihood of a build failure for qtcreator (ie. 
1 in N builds approximately) ?


Mike.




Re: qtcreator compilation failure due to memory/disk corruption

2025-12-29 Thread Samuel Thibault
Michael Kelly, on Mon 29 Dec 2025 20:22:13 +, wrote:
> 1) The amount of memory available on the buildd machines typically (or on
> boralus.sceen.net)

boralus has 3G memory, no swap (not even mach-defpager running, so the
kernel knows it can't swap).

> 2) Whether they are used concurrently for other sbuilds whilst the qtcreator
> sbuild is in operation.

The VM itself only runs the build.

> 3) How to setup the sbuild environment to compile qtcreator package using
> that method.

It's just a chroot, and packages are installed/removed on each package
build.

Samuel