Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi,

On 18-04-2024 10:25 p.m., Paul Gevers wrote:
> I'll hopefully do the changes tomorrow. (RL work is a bit busy at the moment.)

The test ran. Unfortunately zfs-test-suite-1 failed.

https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45683824/

4089s Results Summary
4089s PASS 681
4089s FAIL 2
4089s SKIP 3

Seems like we're nearly there. (I made a tiny mistake in that run, as I had 8GB RAM; I have now lowered it to 4GB, which will be the setting until further discussion is warranted.)

Paul
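For anyone comparing runs, the summary lines quoted above could be tallied with something like the following. This is only a sketch: the helper name `parse_results_summary` is made up for illustration, and the assumed line layout (an optional `<seconds>s` prefix followed by `PASS`/`FAIL`/`SKIP` and a count) is taken from the excerpt in this mail, not from any documented log format.

```python
import re

# Hypothetical helper, not part of autopkgtest or the ZFS test suite.
# Assumed layout: optional "<seconds>s " prefix, then PASS/FAIL/SKIP and a count.
SUMMARY_RE = re.compile(r"^(?:\d+s\s+)?(PASS|FAIL|SKIP)\s+(\d+)\s*$")


def parse_results_summary(log_text):
    """Return a dict mapping PASS/FAIL/SKIP to the counts found in log_text."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# The excerpt from this mail, used as sample input.
sample = """4089s Results Summary
4089s PASS 681
4089s FAIL 2
4089s SKIP 3"""

print(parse_results_summary(sample))
```

Lines that do not match, such as the "Results Summary" heading itself, are ignored, so the function can be fed a larger chunk of the log.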
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi,

On 14-04-2024 5:14 a.m., 陈 晟祺 wrote:
> I would have aron to review & upload a new version, then we can test on debci infra and see whether it solves the problem.

I forgot I promised changes to the settings. Without those changes, it doesn't end nicely:
https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45540021/

I'll hopefully do the changes tomorrow. (RL work is a bit busy at the moment.)

Paul
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi,

On 14-04-2024 5:14 a.m., 陈 晟祺 wrote:
> When using non-ramdisk tmpdir (/var/tmp) and some large tests skipped [1], the tests would run with 2 core + 4GB memory + ~10GB disk space. I also tried 2GB / 3GB, and both will be interrupted by OOM killer.

So, let's settle on 2+4 for now. That sounds like a value we could very reasonably support. I'll configure our setup for that.

> I would have aron to review & upload a new version, then we can test on debci infra and see whether it solves the problem.

Ack.

Paul
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Control: tags -1 + pending

Hi,

> On 13 April 2024, at 01:29, 陈 晟祺 wrote:
>
> I am now trying to run tests on 2 core and 4GB memory (and maybe less later).
> If the tester itself does not occupy too much RAM, the real requirement for resources
> is now probably several gigabytes of disk space (currently it’s ~10GB).
>
> I will give more feedback once new results come out.
>

When using non-ramdisk tmpdir (/var/tmp) and some large tests skipped [1], the tests would run with 2 core + 4GB memory + ~10GB disk space. I also tried 2GB / 3GB, and both will be interrupted by OOM killer.

I would have aron to review & upload a new version, then we can test on debci infra and see whether it solves the problem.

[1]: https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/cf8e8afe69a0a8f21768415a08b131f8aa9fdc6a

Thanks,
--
Shengqi Chen
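To make the numbers above concrete, here is a rough sketch of a pre-flight check against the 2 core / 4GB / ~10GB figures. It is not what the package does; the function names and the exact thresholds are assumptions. In particular, the memory cut-off is placed at 3.5 GiB because a VM configured with 4GB typically reports somewhat less than 4 GiB to userspace, and the reports above only establish that 3GB fails and 4GB works.

```python
import os
import shutil

GIB = 1024**3

# Thresholds derived from the thread; the exact values are assumptions.
MIN_CPUS = 2
MIN_MEM_BYTES = int(3.5 * GIB)   # between the failing 3GB and the working 4GB runs
MIN_DISK_BYTES = 10 * GIB        # "~10GB" of free space in the tmpdir


def missing_resources(cpus, mem_bytes, disk_bytes):
    """Return the names of the resources that fall below the thresholds."""
    missing = []
    if cpus < MIN_CPUS:
        missing.append("cpu")
    if mem_bytes < MIN_MEM_BYTES:
        missing.append("memory")
    if disk_bytes < MIN_DISK_BYTES:
        missing.append("disk")
    return missing


def probe(tmpdir="/var/tmp"):
    """Read CPU count, physical memory and free disk space (Linux sysconf names)."""
    cpus = os.cpu_count() or 1
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(tmpdir).free
    return cpus, mem_bytes, disk_bytes


# Example with a 1 CPU / 2G machine; the 20 GiB disk figure is a placeholder.
print(missing_resources(1, 2 * GIB, 20 * GIB))
```

On a real testbed one would call `missing_resources(*probe())` and skip or abort when the list is non-empty. Keeping the comparison separate from the probing makes the thresholds easy to adjust once the debci settings are final.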
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi,

> On 12 April 2024, at 12:48, Paul Gevers wrote:
>
> Hi,
>
> On 12-04-2024 4:42 a.m., 陈 晟祺 wrote:
>> - If I limit the test file size to 1G, quite many tests would fail even with adequate resources
>
> Ack. To be fair, I was more thinking to make current test conditional on the available free disk space. But yeah, that might also lead to issues as the test might be randomly skipped.
>

You got the point. I previously thought that the test files were on disk, but actually they are in tmpfs and consume a huge amount of memory. That’s why the OOM killer kicks in when the tests write large files.

> Good, so 2GB memory is not enough for zfs-linux (I assume you ran this test with 2 cores like I did)

Yes, I always use 2 cores.

>
> I agree we shouldn't spend too much time on squeezing it into the *current* defaults. I'm still somewhat hoping that we could squeeze out a somewhat smaller memory defaults than 8 GB: does 4 GB work (and if so, how long does it take)?
>

I am now trying to run tests on 2 core and 4GB memory (and maybe less later).
If the tester itself does not occupy too much RAM, the real requirement for resources is now probably several gigabytes of disk space (currently it’s ~10GB).

I will give more feedback once new results come out.

Thanks,
--
Shengqi Chen
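Since the tmpfs-versus-disk distinction turned out to matter here, one way to check which filesystem a test directory lives on is to look it up in a `/proc/mounts`-style table. The sketch below is illustrative only: `filesystem_type` is a hypothetical helper, the sample mount table is invented, and the lookup does not resolve symlinks or handle escaped characters in mount points.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains path.

    mounts_text is expected in /proc/mounts layout:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry win over an earlier one for the same mount point.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Invented mount table for illustration; on a real system read /proc/mounts.
sample_mounts = """/dev/vda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

print(filesystem_type("/tmp/zfs-test", sample_mounts))
print(filesystem_type("/var/tmp", sample_mounts))
```

With the sample table, a directory under `/tmp` resolves to `tmpfs` while `/var/tmp` resolves to the root filesystem, which is the situation described above: large test files written to a tmpfs-backed directory count against RAM rather than disk.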
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi,

On 07-04-2024 2:29 p.m., 陈 晟祺 wrote:
> Could you please provide more detailed information on the test settings on ci.d.o.? E.g., CPU type, #cores, memory size, etc.

The host that runs this is an m3-large instance at equinix [1]. We create the qemu image with autopkgtest-build-qemu (default settings as far as I know). From within the testbed:

root@host:~# lscpu
lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  1
  On-line CPU(s) list:   0
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        QEMU
  Model name:            AMD EPYC 7502P 32-Core Processor
    BIOS Model name:     pc-i440fx-7.2 CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          23
    Model:               49
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4990.62
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid arch_capabilities
Virtualization features:
  Virtualization:        AMD-V
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   64 KiB (1 instance)
  L1i:                   64 KiB (1 instance)
  L2:                    512 KiB (1 instance)
  L3:                    16 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT disabled
  Spec rstack overflow:  Vulnerable: Safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

root@host:~# lsmem
lsmem
RANGE      SIZE  STATE REMOVABLE BLOCK
0x-0x7fff    2G online       yes  0-15

Memory block size:       128M
Total online memory:       2G
Total offline memory:      0B

Paul

[1] https://deploy.equinix.com/product/servers/m3-large/
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi,

> On 7 April 2024, at 17:23, Paul Gevers wrote:
>
> Dear maintainer(s),
>
> Your package has an autopkgtest, great. I recently added support for isolation-machine tests on ci.debian.net for amd64 and added your package to the list to use that. However, it fails because the zfs-test-suite test times out after 2:47h (it seems to hang by the looks of the log). Can you please investigate the situation and fix it? I copied some of the output at the bottom of this report.
>

Thanks for your work! I have long waited for the isolation-machine tag to be available.

> The release team has announced [1] that failing autopkgtest on amd64 and arm64 are considered RC in testing, but because machine-isolation support by ci.debian.net is new I have not marked this bug as serious (yet).
>
> Because the test doesn't fail, but tmpfails (might be a bug in autopkgtest), I've reverted the preferred backend for zfs-linux back to lxc until this bug is closed.
>

I am not yet able to reproduce the hang in my local testing environment.
Could you please provide more detailed information on the test settings on ci.d.o.?
E.g., CPU type, #cores, memory size, etc.

Thanks,
Shengqi Chen