Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-21 Thread Paul Gevers

Hi,

On 18-04-2024 10:25 p.m., Paul Gevers wrote:
> I'll hopefully do the changes tomorrow. (RL work is a bit busy at the
> moment.)


The test ran. Unfortunately zfs-test-suite-1 failed.

https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45683824/

4089s Results Summary
4089s PASS   681
4089s FAIL 2
4089s SKIP 3

Seems like we're nearly there.
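
For comparing runs, the summary counts can be pulled out of the log with a few lines of Python. This is only a sketch: the function name parse_results_summary is hypothetical, and it assumes the summary lines keep the "<elapsed>s STATUS count" shape shown above.

```python
import re


def parse_results_summary(log_text):
    """Return a dict such as {'PASS': n, 'FAIL': n, 'SKIP': n}."""
    counts = {}
    # Each summary line is an optional elapsed-time prefix ("4089s"),
    # a status word, and a count.
    pattern = re.compile(r"^(?:\d+s\s+)?(PASS|FAIL|SKIP)\s+(\d+)$")
    for line in log_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


log = """4089s Results Summary
4089s PASS   681
4089s FAIL 2
4089s SKIP 3
"""
summary = parse_results_summary(log)
total = sum(summary.values())
print(summary, total)
```

With the numbers from this run, 2 of 686 tests failed.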

(I made a tiny mistake in that run, as I had 8GB RAM; I have now lowered 
it to 4GB which will be the setting until further discussion is warranted).


Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-18 Thread Paul Gevers

Hi,

On 14-04-2024 5:14 a.m., 陈 晟祺 wrote:

> I will ask aron to review & upload a new version; then we can test on
> the debci infra and see whether it solves the problem.


I forgot I promised changes to the settings. Without those changes, it 
doesn't end nicely:


https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45540021/

I'll hopefully do the changes tomorrow. (RL work is a bit busy at the 
moment.)


Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-14 Thread Paul Gevers

Hi,

On 14-04-2024 5:14 a.m., 陈 晟祺 wrote:

> When using a non-ramdisk tmpdir (/var/tmp) and skipping some large tests [1],
> the tests run with 2 cores + 4GB memory + ~10GB disk space.
> I also tried 2GB and 3GB; both runs were interrupted by the OOM killer.


So, let's settle on 2+4 for now. That sounds like a value we could very 
reasonably support. I'll configure our setup for that.



> I will ask aron to review & upload a new version; then we can test on
> the debci infra and see whether it solves the problem.


Ack.

Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-13 Thread 陈 晟祺
Control: tags -1 + pending

Hi,

> On 13 April 2024, at 01:29, 陈 晟祺 wrote:
> 
> I am now trying to run the tests on 2 cores and 4GB memory (and maybe less later).
> If the tester itself does not occupy too much RAM, the real resource requirement
> is now probably several gigabytes of disk space (currently it is ~10GB).
> 
> I will give more feedback once new results come out.
> 

When using a non-ramdisk tmpdir (/var/tmp) and skipping some large tests [1],
the tests run with 2 cores + 4GB memory + ~10GB disk space.
I also tried 2GB and 3GB; both runs were interrupted by the OOM killer.
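
To reproduce these limits locally, something like the following could build the autopkgtest command line. This is a sketch, not the debci configuration: the helper name and the image file name are hypothetical, and it assumes the --cpus and --ram-size (in MiB) options of the autopkgtest qemu backend.

```python
import shlex


def autopkgtest_qemu_command(package, image, cpus=2, ram_mib=4096):
    """Build an argv list that runs a package's tests in a QEMU testbed."""
    return [
        "autopkgtest", package,
        "--",                       # everything after this goes to the backend
        "qemu",
        f"--cpus={cpus}",           # assumed backend option: number of cores
        f"--ram-size={ram_mib}",    # assumed backend option: RAM in MiB
        image,                      # hypothetical image file name
    ]


cmd = autopkgtest_qemu_command("zfs-linux", "autopkgtest-unstable-amd64.img")
print(shlex.join(cmd))
```

The disk space requirement (~10GB) depends on how the image was built and is not covered by this sketch.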

I will ask aron to review & upload a new version; then we can test on
the debci infra and see whether it solves the problem.

[1]: 
https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/cf8e8afe69a0a8f21768415a08b131f8aa9fdc6a

Thanks,
--
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-12 Thread 陈 晟祺
Hi,

> On 12 April 2024, at 12:48, Paul Gevers wrote:
> 
> Hi,
> 
> On 12-04-2024 4:42 a.m., 陈 晟祺 wrote:
>> - If I limit the test file size to 1G, quite a few tests fail even with
>> adequate resources
> 
> Ack. To be fair, I was more thinking to make current test conditional on the 
> available free disk space. But yeah, that might also lead to issues as the 
> test might be randomly skipped.
> 

You got the point. I previously thought that the test files were on disk,
but they are actually in tmpfs and consume a huge amount of memory.
That is why the OOM killer kicks in when tests write large files.
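
A pre-flight check along these lines could tell whether the test directory is RAM-backed and whether enough space is free. It is only a sketch with hypothetical helper names; it assumes the mount table has the /proc/mounts layout (device, mount point, filesystem type, ...).

```python
import os
import shutil


def filesystem_type(path, mounts_text):
    """Return the fs type of the most specific mount point containing path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest match; on a tie the later entry wins,
            # since it was mounted on top of the earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def has_free_space(path, required_bytes):
    """True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


# Sample mount table; on a real system read /proc/mounts instead.
sample_mounts = """\
/dev/vda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""
print(filesystem_type("/tmp/zfs-test", sample_mounts))
print(filesystem_type("/var/tmp/zfs-test", sample_mounts))
```

With the sample table, a directory under /tmp is reported as tmpfs (so large test files count against RAM), while /var/tmp falls through to the root filesystem.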

> Good, so 2GB memory is not enough for zfs-linux (I assume you ran this test 
> with 2 cores like I did)

Yes, I always use 2 cores. 

> 
> I agree we shouldn't spend too much time on squeezing it into the *current* 
> defaults. I'm still somewhat hoping that we could squeeze out a somewhat 
> smaller memory defaults than 8 GB: does 4 GB work (and if so, how long does 
> it take)?
> 

I am now trying to run the tests on 2 cores and 4GB memory (and maybe less later).
If the tester itself does not occupy too much RAM, the real resource requirement
is now probably several gigabytes of disk space (currently it is ~10GB).

I will give more feedback once new results come out.

Thanks,
--
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-07 Thread Paul Gevers

Hi,

On 07-04-2024 2:29 p.m., 陈 晟祺 wrote:

> Could you please provide more detailed information on the test settings on
> ci.d.o.?
> E.g., CPU type, #cores, memory size, etc.


The host that runs this is an m3-large instance at equinix [1].

We create the qemu image with autopkgtest-build-qemu (default settings 
as far as I know).


From within the testbed:
root@host:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  1
  On-line CPU(s) list:   0
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        QEMU
  Model name:            AMD EPYC 7502P 32-Core Processor
    BIOS Model name:     pc-i440fx-7.2  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          23
    Model:               49
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4990.62
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                         pge mca cmov pat pse36 clflush mmx fxsr sse sse2
                         syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
                         rep_good nopl cpuid extd_apicid tsc_known_freq
                         pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2
                         x2apic movbe popcnt tsc_deadline_timer aes xsave
                         avx f16c rdrand hypervisor lahf_lm cmp_legacy
                         svm cr8_legacy abm sse4a misalignsse
                         3dnowprefetch osvw perfctr_core ssbd ibrs ibpb
                         stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep
                         bmi2 rdseed adx smap clflushopt clwb sha_ni
                         xsaveopt xsavec xgetbv1 clzero xsaveerptr
                         wbnoinvd arat npt lbrv nrip_save tsc_scale
                         vmcb_clean pausefilter pfthreshold
                         v_vmsave_vmload vgif umip rdpid arch_capabilities
Virtualization features:
  Virtualization:        AMD-V
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   64 KiB (1 instance)
  L1i:                   64 KiB (1 instance)
  L2:                    512 KiB (1 instance)
  L3:                    16 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT disabled
  Spec rstack overflow:  Vulnerable: Safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled
                         via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and
                         __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional,
                         STIBP disabled, RSB filling, PBRSB-eIBRS Not
                         affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@host:~# lsmem
RANGE      SIZE  STATE REMOVABLE BLOCK
0x-0x7fff    2G online       yes  0-15

Memory block size:       128M
Total online memory:       2G
Total offline memory:      0B


Paul

[1] https://deploy.equinix.com/product/servers/m3-large/




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-07 Thread 陈 晟祺
Hi,

> On 7 April 2024, at 17:23, Paul Gevers wrote:
> 
> Dear maintainer(s),
> 
> Your package has an autopkgtest, great. I recently added support for 
> isolation-machine tests on ci.debian.net for amd64 and added your package to 
> the list to use that. However, it fails because the zfs-test-suite test times 
> out after 2:47h (it seems to hang by the looks of the log). Can you please 
> investigate the situation and fix it? I copied some of the output at the 
> bottom of this report.
> 

Thanks for your work! I have waited a long time for isolation-machine
support to become available.

> The release team has announced [1] that failing autopkgtest on amd64 and 
> arm64 are considered RC in testing, but because machine-isolation support by 
> ci.debian.net is new I have not marked this bug as serious (yet).
> 
> Because the test doesn't fail, but tmpfails (might be a bug in autopkgtest), 
> I've reverted the preferred backend for zfs-linux back to lxc until this bug 
> is closed.
> 

I am not yet able to reproduce the hang in my local test environment.
Could you please provide more detailed information on the test settings on
ci.d.o.? E.g., CPU type, #cores, memory size, etc.

Thanks,
Shengqi Chen