Bug#1070299: Acknowledgement (gcc-14: Wrong vectorized code generated with -O3, ok without -O.)
This issue turned out not to be a gcc issue. The cause was a badly declared flexible ('zero-length') array at the end of the structure, whose use then relied on undefined behaviour. The declared size (here [4]) was apparently taken into account in the code generation. I do not know of a way to diagnose this kind of use without warning for basically all arrays at the end of structures. Such a diagnostic would be nice, though, as the issue was not straightforward to find. Perhaps a note in the gcc-14 upgrade notes, saying that the compiler now makes (more) use of declared array sizes when optimizing loops, might be useful. Other than that, this bug can be closed.
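A minimal sketch of one way to avoid the undefined behaviour, assuming the array really is meant to hold a run-time number of elements: declare `expo` as a C99 flexible array member (`expo[]`) instead of a fixed `expo[4]`, and allocate the struct with room for `num_blocks` elements. The helper `pexpo_alloc()` is hypothetical. The union wrapper from the original header is dropped here because a flexible array member cannot portably be placed inside a union.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t prime_exp_t;

/* Sketch: expo[] is a C99 flexible array member, so its length is not
   fixed at compile time; indices >= 4 are valid as long as the
   allocation made room for them. */
struct prime_exponents {
    int num_blocks;
    prime_exp_t expo[];
};

/* Hypothetical helper: allocate a struct with n exponent slots. */
static struct prime_exponents *pexpo_alloc(int n)
{
    assert(n >= 0);
    struct prime_exponents *p =
        malloc(sizeof *p + (size_t)n * sizeof p->expo[0]);
    if (p != NULL)
        p->num_blocks = n;
    return p;
}

/* Same loop as in prime_factor.c: keep the element-wise minimum in a. */
void pexpo_keep_min(struct prime_exponents *a,
                    const struct prime_exponents *b)
{
    assert(a->num_blocks <= b->num_blocks);
    for (int i = 0; i < a->num_blocks; i++)
        a->expo[i] = (b->expo[i] < a->expo[i]) ? b->expo[i] : a->expo[i];
}
```

Usage: `pexpo_alloc(5)` followed by filling `expo[0]` through `expo[4]` keeps all five elements inside the allocated object, so the loop no longer depends on accesses past a declared bound.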
Bug#1070299: gcc-14: Wrong vectorized code generated with -O3, ok without -O.
Package: gcc-14
Version: 14-20240429-1
Severity: important
X-Debbugs-Cc: f96h...@chalmers.se

Dear Maintainer,

When compiling the attached code, the pexpo_keep_min() function fails to handle the fifth item in the list if it was compiled with -O3. Compiled without an -O option, it works as expected.

I have looked a bit, but not deeply, at the assembler code, and it looks like the first four items are handled with a vectorized operation. Then the fifth item is 'forgotten'.

The problem also applies to 14-20240330-1. It does not happen with 13.2.0-23.

The offending function is compiled in a separate file, since the issue does not manifest if the printf() is in the same compilation unit.

The expected output is
100 200 300 400 500
The bad output is
100 200 300 400 5

Best regards,
Håkan

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-20-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_FIRMWARE_WORKAROUND, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages gcc-14 depends on:
ii  binutils                 2.42-4
ii  gcc-14-base              14-20240429-1
ii  gcc-14-x86-64-linux-gnu  14-20240429-1

Versions of packages gcc-14 recommends:
ii  libc6-dev  2.37-19

Versions of packages gcc-14 suggests:
pn  gcc-14-doc
pn  gcc-14-locales
pn  gcc-14-multilib

-- no debconf information

*** prime_factor.c
#include "prime_factor.h"

void pexpo_keep_min(struct prime_exponents *a,
                    const struct prime_exponents *b)
{
  int i;

  for (i = 0; i < a->num_blocks; i++)
    {
      a->expo[i] = (b->expo[i] < a->expo[i]) ?
        b->expo[i] : a->expo[i];
    }
}

*** prime_factor.h
#include <stdint.h>

typedef int32_t prime_exp_t;

struct prime_exponents
{
  int num_blocks;
  union
  {
    prime_exp_t expo[4];
  };
};

void pexpo_keep_min(struct prime_exponents *keep_fpf,
                    const struct prime_exponents *in_fpf);

*** test.c
#include "prime_factor.h"
#include <stdio.h>

int test(struct prime_exponents *a, struct prime_exponents *b)
{
  a->num_blocks = 5;
  a->expo[0] = 1;
  a->expo[1] = 2;
  a->expo[2] = 3;
  a->expo[3] = 4;
  a->expo[4] = 5;    /* index 4 is beyond the declared expo[4] */

  b->num_blocks = 5;
  b->expo[0] = 100;
  b->expo[1] = 200;
  b->expo[2] = 300;
  b->expo[3] = 400;
  b->expo[4] = 500;  /* index 4 is beyond the declared expo[4] */

  pexpo_keep_min(a, b);

  printf ("%d %d %d %d %d\n",
          a->expo[0], a->expo[1], a->expo[2], a->expo[3], a->expo[4]);
}

int main()
{
  int area1[16];
  int area2[16];

  test((struct prime_exponents *) area1, (struct prime_exponents *) area2);
  return 0;
}

*** run.sh
gcc-14 prime_factor.c -c -o pf-O3.o -O3
gcc-14 prime_factor.c -c -o pf.o
gcc-14 test.c -c -o test.o -O3
gcc-14 test.o pf-O3.o -o test-O3
gcc-14 test.o pf.o -o test
./test-O3
./test
Bug#1024620: dirvish-expire need to mark image as being removed during removal
Package: dirvish
Version: 1.2.1-2.1
Severity: normal
X-Debbugs-Cc: f96h...@chalmers.se

Dear Maintainer,

As far as I can see, the removal of an image during expiry consists of two steps:

1. Remove the /tree/ directory in the image.
2. Remove the image directory.

However, if e.g. the machine is rebooted during step 1, we are left with an incomplete tree, but a user (or dirvish itself) who just reads the status file of the image directory will still see what looks like a successful image.

To avoid that, it would help to add a first step:

0. Append 'Status: Removing' or something similar to the status file, so that the image is no longer mistaken for a good one.

I have not come up with a sequence of events where the normal dirvish scripts would make mistakes due to this, since they should never try to delete the latest good image. But e.g. a user who is low on disk space may be trying to remove images manually, and then it would be helpful if the status file has been updated before any removal, so that manual inspection of existing images does not assume a half-removed image is a good one.

(Note: the issue was also reported upstream, https://lists.dirvish.org/pipermail/dirvish/2022-November/003353.html, which I guess was reporting it in the wrong order.)
-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-19-cloud-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dirvish depends on:
ii  libtime-parsedate-perl  2015.103-3
ii  libtime-period-perl     1.25-1
ii  perl                    5.32.1-4+deb11u2
ii  rsync                   3.2.3-4+deb11u1

Versions of packages dirvish recommends:
ii  ssh  1:8.4p1-5+deb11u1

dirvish suggests no packages.

-- Configuration Files:
/etc/cron.d/dirvish changed [not included]
/etc/dirvish/dirvish-cronjob changed [not included]

-- no debconf information
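A minimal C sketch of the proposed step 0 and of a reader that honours it, assuming the status file is plain text in which the last 'Status:' line wins. The function names are hypothetical, and dirvish itself is written in Perl, so this only illustrates the ordering: mark the image and fsync() the marker first, delete afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Step 0 (proposed): append a marker before step 1 starts, and fsync() it
   so the marker survives a reboot in the middle of the tree removal. */
static int mark_removing(const char *status_path)
{
    assert(status_path != NULL);
    FILE *f = fopen(status_path, "a");
    if (f == NULL)
        return -1;
    int ok = fputs("Status: Removing\n", f) != EOF
             && fflush(f) == 0
             && fsync(fileno(f)) == 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Reader side (assumed convention): the last "Status:" line wins, so a
   half-removed image is no longer reported as a good one. */
static int image_looks_good(const char *status_path)
{
    assert(status_path != NULL);
    FILE *f = fopen(status_path, "r");
    if (f == NULL)
        return 0;
    char line[256], last[256] = "";
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\n")] = '\0';      /* drop trailing newline */
        if (strncmp(line, "Status:", 7) == 0)
            strcpy(last, line);                /* same buffer size, fits */
    }
    fclose(f);
    return strcmp(last, "Status: success") == 0;
}
```

Steps 1 and 2 (removing tree/ and then the image directory) would only start after mark_removing() has returned 0.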
Bug#1024615: dirvish-expire deletes newest image when all good images expired
Package: dirvish
Version: 1.2.1-2.1
Severity: normal
X-Debbugs-Cc: f96h...@chalmers.se

Dear Maintainer,

It looks like the sort order in sub imsort() at the end of dirvish-expire.pl, despite the comment, makes it remove the newest images in case all good images are expired.

When all good images are expired, no image will set $unexpired{} during findop(). Then imsort() orders the images by increasing creation time, and thus the oldest images are handled first during the removal loop

    for $expire (sort {imsort()} @expires)

(Note: in upstream, that is

    for $expire (sort(imsort @expires))

I've tried both, same result.)

Therefore the oldest image is not deleted (No unexpired good images) and sets $unexpired{}. Then it proceeds by removing all newer images, since $unexpired{} is set.

My situation:

Image          Expire               Status
20220731_0457  2022-08-28 04:57:30  success
20221113_1745  2022-11-16 17:45:19  incomplete (empty creation date)
20221119_0901  2022-11-22 09:01:09  incomplete (not expired)
20221119_1745  2022-11-20 17:45:27  success

When I run 'dirvish-expire --no-run' it suggests the following:

"
Expiring images as of 2022-11-20 23:01:56

VAULT:BRANCH      IMAGE          CREATED           EXPIRED
cannot expire nr_data1:default:20221113_1745 No unexpired good images
cannot expire nr_data1:default:20220731_0457 No unexpired good images
nr_data1:default  20221119_1745  2022-11-20 21:35  +3 days == 2022-11-20 17:45
"

I.e. it would delete the new good image and keep the old one.
If the creation date sort order is reversed from

    || $$a{created} cmp $$b{created};

to

    || $$b{created} cmp $$a{created};

I get:

"
VAULT:BRANCH      IMAGE          CREATED           EXPIRED
cannot expire nr_data1:default:20221119_1745 No unexpired good images
nr_data1:default  20220731_0457  2022-07-31 05:07  +28 days == 2022-08-28 04:57
nr_data1:default  20221113_1745                    +3 days == 2022-11-16 17:45
"

The warning just above imsort() of course makes me a bit worried whether I am right:

    ## WARNING: don't mess with the sort order, it is needed so that if
    ## WARNING: all images are expired the newest will be retained.

but the system has now repeatedly removed some new good copies.

The reason it ended up doing this is that the backup takes more time than the expiration period of the new images. The system was also offline for a while, which explains the long times and the rather old date of the previous good image.

(Note: the issue was also reported upstream, https://lists.dirvish.org/pipermail/dirvish/2022-November/003354.html, which I guess was reporting it in the wrong order.)

Best regards,
Håkan

-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-19-cloud-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dirvish depends on:
ii  libtime-parsedate-perl  2015.103-3
ii  libtime-period-perl     1.25-1
ii  perl                    5.32.1-4+deb11u2
ii  rsync                   3.2.3-4+deb11u1

Versions of packages dirvish recommends:
ii  ssh  1:8.4p1-5+deb11u1

dirvish suggests no packages.

-- Configuration Files:
/etc/cron.d/dirvish changed [not included]
/etc/dirvish/dirvish-cronjob changed [not included]

-- no debconf information
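To illustrate why the order matters, here is a simplified C model of the removal loop. It assumes, based on the two --no-run outputs in the report, that every candidate is retained with 'No unexpired good images' until the first good image has been retained, after which the rest are removed. The struct and function names are hypothetical; dirvish-expire is Perl, and its real imsort() also compares other keys before the creation date, which this sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of one expiry candidate. */
struct image {
    const char *name;     /* image directory name */
    const char *created;  /* "YYYY-MM-DD HH:MM", or "" if unknown */
    int good;             /* 1 if the status says success */
    int retained;         /* output: 1 = kept, 0 = removed */
};

/* Newest first, i.e. the reversed order proposed in the report
   ($$b{created} cmp $$a{created}). Fixed-format timestamps compare
   correctly as strings; an empty date sorts last. */
static int newest_first(const void *pa, const void *pb)
{
    const struct image *a = pa, *b = pb;
    return strcmp(b->created, a->created);
}

/* Simplified loop: until a good image has been kept, nothing is removed;
   keeping the first good image sets the flag (like $unexpired{}). */
static void plan_expiry(struct image *img, size_t n)
{
    assert(img != NULL);
    qsort(img, n, sizeof img[0], newest_first);
    int have_good = 0;
    for (size_t i = 0; i < n; i++) {
        if (!have_good) {
            img[i].retained = 1;   /* "No unexpired good images" */
            if (img[i].good)
                have_good = 1;
        } else {
            img[i].retained = 0;
        }
    }
}
```

With the original ascending comparator, strcmp(a->created, b->created), the same loop would keep 20221113_1745 (the empty date sorts first) and 20220731_0457, and remove 20221119_1745, which matches the first --no-run output in the report.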
Bug#982459: mdadm examine corrupts host ext4
On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:
> I can't see a difference that should matter from userspace. I have
> stared a bit at the kernel code... there have been quite some changes
> and fixes in this area. Which kernel version were you running when
> testing this? Could you retry on something >= 5.9? I.e. some version
> with patch 08fc1ab6d748ab1a690fd483f41e2938984ce353.

Dear Chris,

I believe that I was running 5.10 (bullseye). It looks like 5.18 (from backports) does not show the issue (i.e. it works)!

Some more details. I have now tried again:

host: linux-image-5.10.0-16-amd64 5.10.127-2
      mdadm 4.2-1~bpo11+1
chroot: mdadm 4.1-11

This time I did get some dmesg BUG output as well (attached). The backtrace was not the same on the two occurrences. I also noticed that the BUG: report in dmesg does not appear directly when running 'mdadm --examine --scan --config=partitions'. It rather occurs when some activity happens on the host filesystem, e.g. a 'touch /root/a' command.

host: linux-image-5.18.0-0.bpo.1-amd64 5.18.2-1~bpo11+1
(I did not re-install anything else, except upgrading zfs, also from backports, since the plain bullseye version would not compile with 5.18.)
This does not exhibit the problem.

I have tried with both kernels several times, and it was repeatable that 5.10 got stuck while 5.18 does not show any issues.

Reminder: to trigger the issue, /dev/ should not be mounted in the chroot. With /dev/ mounted, 5.10 also works.
Best regards,
Håkan

[mån aug 1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0010
[mån aug 1 15:53:08 2022] #PF: supervisor read access in kernel mode
[mån aug 1 15:53:08 2022] #PF: error_code(0x) - not-present page
[mån aug 1 15:53:08 2022] PGD 0 P4D 0
[mån aug 1 15:53:08 2022] Oops: [#1] SMP PTI
[mån aug 1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P OE 5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug 1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug 1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug 1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug 1 15:53:08 2022] RSP: 0018:ae27c059fd60 EFLAGS: 00010246
[mån aug 1 15:53:08 2022] RAX: RBX: 9d1b94505480 RCX: 9d1bc52e5e38
[mån aug 1 15:53:08 2022] RDX: 9d1bc13782d8 RSI: 0c14 RDI: c096feb0
[mån aug 1 15:53:08 2022] RBP: 9d1bc52e5e38 R08: 9d1be04d5230 R09: 0001
[mån aug 1 15:53:08 2022] R10: 9d1bc985f000 R11: 001d R12: 9d1bc13782d8
[mån aug 1 15:53:08 2022] R13: 9d1be04d5000 R14: 0c14 R15: 9d1bc13782d8
[mån aug 1 15:53:08 2022] FS: 7fed5ecb1840() GS:9d1cd7c8() knlGS:
[mån aug 1 15:53:08 2022] CS: 0010 DS: ES: CR0: 80050033
[mån aug 1 15:53:08 2022] CR2: 0010 CR3: 0001a46d8000 CR4: 06e0
[mån aug 1 15:53:08 2022] Call Trace:
[mån aug 1 15:53:08 2022] ext4_orphan_del+0x23f/0x290 [ext4]
[mån aug 1 15:53:08 2022] ext4_evict_inode+0x31f/0x630 [ext4]
[mån aug 1 15:53:08 2022] evict+0xd1/0x1a0
[mån aug 1 15:53:08 2022] __dentry_kill+0xe4/0x180
[mån aug 1 15:53:08 2022] dput+0x149/0x2f0
[mån aug 1 15:53:08 2022] __fput+0xe4/0x240
[mån aug 1 15:53:08 2022] task_work_run+0x65/0xa0
[mån aug 1 15:53:08 2022] exit_to_user_mode_prepare+0x111/0x120
[mån aug 1 15:53:08 2022] syscall_exit_to_user_mode+0x28/0x140
[mån aug 1 15:53:08 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug 1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77
[mån aug 1 15:53:08 2022] Code: 44 00 00 48 8b 15 19 a1 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 e9 a0 0c 00 f7 d8 64 89 02 b8
[mån aug 1 15:53:08 2022] RSP: 002b:7ffd50452818 EFLAGS: 0202 ORIG_RAX: 0003
[mån aug 1 15:53:08 2022] RAX: RBX: 55dab4578910 RCX: 7fed5eea2d77
[mån aug 1 15:53:08 2022] RDX: 7fed5ef6e8a0 RSI: RDI: 0006
[mån aug 1 15:53:08 2022] RBP: R08: R09: 7fed5ef6dbe0
[mån aug 1 15:53:08 2022] R10: 006f R11: 0202 R12: 7fed5ef6f4a0
[mån aug 1 15:53:08 2022] R13: R14: R15: 0001
[mån aug 1 15:53:08 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack
Bug#982459:
Hi,

I believe that I have been hit by this bug too.

What has happened for me is that the machine in question 'almost' locks up, with a read-only /, such that most commands to debug further never complete because they wait for filesystem action. It then requires a reboot. 'dmesg' has worked, and then shows ext4-related issues. However, they were not recorded to /var/log. I generally do not find any corruption on the filesystem itself when running fsck afterwards.

On the machine I have a number of chroot Debian installations of different releases. By pure chance I found that 'update-initramfs' was the trigger for the system hangs. I could then repeatably trigger the issue again. (Before this, it would happen as part of system maintenance (unattended upgrades in the chroots), and so just spuriously hang the machine.)

In my case, the chroot installations live on a ZFS filesystem, but the host system itself is on MD raid1 (multiple filesystems: /, /usr/, /var/). I have had /proc mounted in the chroots, but had forgotten /dev. After mounting /dev (and /dev/pts) in the chroots, the issue has not happened again.

The issue was present when the host system ran Buster. I then upgraded to Bullseye ~2 weeks ago, hoping it would be resolved, but the issue was still present after the upgrade. Only after that upgrade did I find the update-initramfs trigger. I am running with sysvinit, both on the host and in the chroots.

Currently, I do not have hands-on access to the system, so I cannot inspect or reboot it reliably. I should be able to do some further tests in a few weeks.

Best regards,
Håkan
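The workaround described in this thread depends on /dev actually being mounted inside each chroot before tools such as update-initramfs or mdadm run there. A minimal C sketch of a pre-flight check, assuming a mount point can be recognized by its device number differing from its parent's (a common heuristic that does not detect a bind mount from the same filesystem). `is_mount_point()` is hypothetical and not part of mdadm or any Debian tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path looks like a mount point, 0 if not, -1 on error.
   Heuristic: a mount point lives on a different device than its parent,
   or is its own parent (the root directory). A bind mount of a directory
   from the same filesystem is NOT detected by this check. */
static int is_mount_point(const char *path)
{
    assert(path != NULL);
    char parent[4096];
    struct stat self, up;

    int n = snprintf(parent, sizeof parent, "%s/..", path);
    if (n < 0 || (size_t)n >= sizeof parent)
        return -1;
    if (stat(path, &self) != 0 || stat(parent, &up) != 0)
        return -1;
    if (self.st_dev != up.st_dev)
        return 1;
    return self.st_ino == up.st_ino ? 1 : 0;
}
```

For example, a wrapper script's helper could refuse to enter a chroot unless `is_mount_point()` returns 1 for the chroot's dev directory (e.g. a hypothetical /srv/chroot/sid/dev).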