Bug#1070299: Acknowledgement (gcc-14: Wrong vectorized code generated with -O3, ok without -O.)

2024-05-04 Thread Håkan T Johansson



This issue turned out not to be a gcc issue, but a badly declared 
flexible array (the 'zero-length array' idiom) at the end of the structure: 
accessing it beyond its declared size relied on undefined behaviour.  The 
declared size (here [4]) was then apparently taken into account in the 
code generation.
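For reference, here is a minimal sketch of the usual fix, assuming C99 or later. `expo` is declared as a flexible array member with no size, and the structure is allocated with room for `num_blocks` elements. The helper `pexpo_alloc()` is hypothetical and not part of the attached code. The union wrapper from the attachment is dropped, since ISO C does not allow a flexible array member inside a union.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t prime_exp_t;

struct prime_exponents
{
  int num_blocks;
  prime_exp_t expo[];   /* flexible array member: no declared size */
};

/* Hypothetical helper: allocate the header plus room for n exponents. */
struct prime_exponents *pexpo_alloc(int n)
{
  assert(n >= 0);
  struct prime_exponents *p =
    malloc(sizeof *p + (size_t) n * sizeof p->expo[0]);
  if (p != NULL)
    p->num_blocks = n;
  return p;
}

void pexpo_keep_min(struct prime_exponents *a,
                    const struct prime_exponents *b)
{
  /* expo[] has no declared bound, so the array type gives the
     optimizer no fixed upper limit for i. */
  for (int i = 0; i < a->num_blocks; i++)
    a->expo[i] = (b->expo[i] < a->expo[i]) ? b->expo[i] : a->expo[i];
}
```

Callers then obtain the structure from `pexpo_alloc(5)` instead of casting a plain `int area[16]` buffer, as `main()` in test.c does. That cast is itself questionable under C's effective-type rules.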


I do not know of a way to diagnose this kind of use without warning for 
basically all arrays at the end of structures.  Such a diagnostic would 
be nice, though, as the issue was not straightforward to find.


Perhaps a note in the gcc-14 upgrade notes that the compiler now uses 
declared array sizes (more) to influence loop execution might be useful.


Except for that, this bug can be closed.



Bug#1070299: gcc-14: Wrong vectorized code generated with -O3, ok without -O.

2024-05-03 Thread Håkan T Johansson
Package: gcc-14
Version: 14-20240429-1
Severity: important
X-Debbugs-Cc: f96h...@chalmers.se

Dear Maintainer,

When compiling the attached code with -O3, the pexpo_keep_min() function
fails to handle the fifth item in the list.

Compiled without an -O option, it works as expected.

I have looked a bit, but not deeply, at the assembler code, and it looks
like the first four items are handled with a vectorized operation.
Then the fifth item is 'forgotten'.

The problem also applies to 14-20240330-1.  It does not happen with 13.2.0-23.

The offending function is compiled in a separate file, since the issue
does not manifest if the printf() call is in the same compilation unit.

The expected output is

100 200 300 400 500

The bad output is

100 200 300 400 5

Best regards,
Håkan

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-20-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_FIRMWARE_WORKAROUND, 
TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages gcc-14 depends on:
ii  binutils 2.42-4
ii  gcc-14-base  14-20240429-1
ii  gcc-14-x86-64-linux-gnu  14-20240429-1

Versions of packages gcc-14 recommends:
ii  libc6-dev  2.37-19

Versions of packages gcc-14 suggests:
pn  gcc-14-doc   
pn  gcc-14-locales   
pn  gcc-14-multilib  

-- no debconf information

*** prime_factor.c
#include "prime_factor.h"

void pexpo_keep_min(struct prime_exponents *a,
                    const struct prime_exponents *b)
{
  int i;

  for (i = 0; i < a->num_blocks; i++)
    {
      a->expo[i] =
        (b->expo[i] < a->expo[i]) ?
        b->expo[i] : a->expo[i];
    }
}

*** prime_factor.h
#include <stdint.h>

typedef int32_t   prime_exp_t;

struct prime_exponents
{
  int num_blocks;
  union
  {
    prime_exp_t  expo[4];
  };
};

void pexpo_keep_min(struct prime_exponents *keep_fpf,
                    const struct prime_exponents *in_fpf);

*** test.c
#include "prime_factor.h"
#include <stdio.h>

int test(struct prime_exponents *a, struct prime_exponents *b)
{
  a->num_blocks = 5;
  a->expo[0] = 1;
  a->expo[1] = 2;
  a->expo[2] = 3;
  a->expo[3] = 4;
  a->expo[4] = 5;

  b->num_blocks = 5;
  b->expo[0] = 100;
  b->expo[1] = 200;
  b->expo[2] = 300;
  b->expo[3] = 400;
  b->expo[4] = 500;

  pexpo_keep_min(a, b);

  printf ("%d %d %d %d %d\n",
          a->expo[0],
          a->expo[1],
          a->expo[2],
          a->expo[3],
          a->expo[4]);
}

int main()
{
  int area1[16];
  int area2[16];

  test((struct prime_exponents *) area1,
       (struct prime_exponents *) area2);

  return 0;
}

*** run.sh

gcc-14 prime_factor.c -c -o pf-O3.o -O3
gcc-14 prime_factor.c -c -o pf.o
gcc-14 test.c -c -o test.o -O3
gcc-14 test.o pf-O3.o -o test-O3
gcc-14 test.o pf.o -o test

./test-O3
./test


Bug#1024620: dirvish-expire need to mark image as being removed during removal

2022-11-22 Thread Håkan T Johansson
Package: dirvish
Version: 1.2.1-2.1
Severity: normal
X-Debbugs-Cc: f96h...@chalmers.se

Dear Maintainer,

As far as I can see, the removal of an image during expiry consists of 
two steps:

1. Remove the /tree/ directory of the image.
2. Remove the image directory.

However, if e.g. the machine is rebooted during step 1, we are left 
with an incomplete tree, yet a user (or dirvish itself) who just reads 
the status file in the image directory will still see what looks like a 
successful image.

To avoid that, an additional first step could help:

0. Append 'Status: Removing' or something similar to the status file, so 
that the image is no longer mistaken for a good one.
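A minimal C sketch of step 0 follows. It assumes the status is kept as `Status: ...` lines in a plain-text file; dirvish's actual file name and format may differ, and the functions `mark_image_removing()` and `image_is_good()` are hypothetical. The marker is forced to disk with `fsync()` before any deletion starts, so it survives a reboot during step 1. A reader treats the image as good only if the last `Status:` line says `success`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical step 0: append a marker and force it to disk
   before anything is removed. Returns 0 on success, -1 on error. */
int mark_image_removing(const char *status_path)
{
  assert(status_path != NULL);
  FILE *f = fopen(status_path, "a");
  if (f == NULL)
    return -1;
  int ok = fputs("Status: Removing\n", f) >= 0
           && fflush(f) == 0
           && fsync(fileno(f)) == 0;
  if (fclose(f) != 0)
    ok = 0;
  return ok ? 0 : -1;
}

/* Hypothetical reader: the image counts as good only if the LAST
   "Status:" line in the file says "success". */
int image_is_good(const char *status_path)
{
  FILE *f = fopen(status_path, "r");
  if (f == NULL)
    return 0;
  char line[256];
  int good = 0;
  while (fgets(line, sizeof line, f) != NULL) {
    if (strncmp(line, "Status:", 7) == 0) {
      const char *v = line + 7;
      while (*v == ' ' || *v == '\t')
        v++;
      good = strncmp(v, "success", 7) == 0;
    }
  }
  fclose(f);
  return good;
}
```

Appending a line, rather than rewriting the file, keeps the original success record for inspection while still making the final status unambiguous.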

I have not come up with a sequence of events where the normal dirvish 
scripts would make mistakes due to this, since they should never try to 
delete the latest good image.

But e.g. a user who is low on disk space may be removing images 
manually, and then it would help if the status file had already been 
updated before any removal, so that manual inspection of the existing 
images does not take a half-removed image for a good one.

(Note: the issue was also reported upstream, at
https://lists.dirvish.org/pipermail/dirvish/2022-November/003353.html
I guess I reported in the wrong order.)


-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-19-cloud-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dirvish depends on:
ii  libtime-parsedate-perl  2015.103-3
ii  libtime-period-perl 1.25-1
ii  perl    5.32.1-4+deb11u2
ii  rsync   3.2.3-4+deb11u1

Versions of packages dirvish recommends:
ii  ssh  1:8.4p1-5+deb11u1

dirvish suggests no packages.

-- Configuration Files:
/etc/cron.d/dirvish changed [not included]
/etc/dirvish/dirvish-cronjob changed [not included]

-- no debconf information



Bug#1024615: dirvish-expire deletes newest image when all good images expired

2022-11-22 Thread Håkan T Johansson
Package: dirvish
Version: 1.2.1-2.1
Severity: normal
X-Debbugs-Cc: f96h...@chalmers.se

Dear Maintainer,

It looks like, despite the comment, the sort order in sub imsort() at 
the end of dirvish-expire.pl makes it remove the newest images when all 
good images are expired.

When all good images are expired, no image will set $unexpired{} during 
findop().

Then imsort() orders the images by increasing creation time, and thus the 
oldest images are handled first during the removal loop:

 for $expire (sort {imsort()} @expires)

Note: upstream, the loop is written as follows (I have tried both, with the same result):

 for $expire (sort(imsort @expires))

Therefore the oldest image is not deleted ("No unexpired good images"), 
and it sets $unexpired{}.  The loop then proceeds to remove all newer 
images, since $unexpired{} is now set.

My situation:

Image          Expire                Status
20220731_0457  2022-08-28 04:57:30   success
20221113_1745  2022-11-16 17:45:19   incomplete (empty creation date)
20221119_0901  2022-11-22 09:01:09   incomplete (not expired)
20221119_1745  2022-11-20 17:45:27   success

When I run 'dirvish-expire --no-run' it suggests the following:

"
Expiring images as of 2022-11-20 23:01:56

VAULT:BRANCH      IMAGE   CREATED   EXPIRED
cannot expire nr_data1:default:20221113_1745 No unexpired good images
cannot expire nr_data1:default:20220731_0457 No unexpired good images
nr_data1:default 20221119_1745   2022-11-20 21:35  +3 days == 2022-11-20 17:45
"

I.e. it would delete the new good image and keep the old one.

If the creation date sort order is reversed from

|| $$a{created} cmp $$b{created};

to

|| $$b{created} cmp $$a{created};

I get:

"
VAULT:BRANCH      IMAGE   CREATED   EXPIRED
cannot expire nr_data1:default:20221119_1745 No unexpired good images
nr_data1:default 20220731_0457   2022-07-31 05:07  +28 days == 2022-08-28 04:57
nr_data1:default 20221113_1745 +3 days == 2022-11-16 17:45
"
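The effect of the sort direction can be reproduced with a simplified model in C. This is not dirvish's Perl code. `struct image`, `plan_expiry()` and the two comparators are hypothetical. The model keeps only the rule described above: while no unexpired good image is known, an expired image is kept ("No unexpired good images"), and keeping a good image sets the flag so that everything after it is removed. Creation times are compared as strings, so an empty creation date sorts first in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct image {
  const char *name;
  const char *created;  /* "YYYY-MM-DD HH:MM", or "" if unknown */
  int good;             /* 1 if the image completed successfully */
};

/* Ascending creation time, as the report describes imsort() today. */
int oldest_first(const void *pa, const void *pb)
{
  const struct image *a = pa, *b = pb;
  return strcmp(a->created, b->created);
}

/* Reversed order, as proposed in the report. */
int newest_first(const void *pa, const void *pb)
{
  return oldest_first(pb, pa);
}

/* Simplified model of the removal loop: sort the expired images, then
   set keep[i] = 1 for images that are retained, 0 for removed ones. */
void plan_expiry(struct image *imgs, size_t n, int *keep,
                 int (*cmp)(const void *, const void *))
{
  int have_unexpired_good = 0;  /* all good images are expired */

  assert(keep != NULL || n == 0);
  qsort(imgs, n, sizeof imgs[0], cmp);
  for (size_t i = 0; i < n; i++) {
    if (!have_unexpired_good) {
      keep[i] = 1;              /* "No unexpired good images" */
      if (imgs[i].good)
        have_unexpired_good = 1;
    } else {
      keep[i] = 0;              /* would be removed */
    }
  }
}
```

With the three expired images from the listings (20220731_0457 good, 20221113_1745 with an empty creation date, 20221119_1745 good), ascending order keeps the first two and removes 20221119_1745, as in the first listing. Descending order keeps only 20221119_1745, as in the second.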

The warning just above imsort() of course makes me a bit unsure whether 
I am right:

## WARNING:  don't mess with the sort order, it is needed so that if
## WARNING:  all images are expired the newest will be retained.

but the system has now repeatedly removed some new good copies.  It 
ended up doing this because the backup takes longer than the expiration 
period of the new images.  The system was also offline for a while, 
which explains the long gaps and the age of the previous good image.

(Note: the issue was also reported upstream, at
https://lists.dirvish.org/pipermail/dirvish/2022-November/003354.html
I guess I reported in the wrong order.)

Best regards,
Håkan


-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-19-cloud-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dirvish depends on:
ii  libtime-parsedate-perl  2015.103-3
ii  libtime-period-perl 1.25-1
ii  perl    5.32.1-4+deb11u2
ii  rsync   3.2.3-4+deb11u1

Versions of packages dirvish recommends:
ii  ssh  1:8.4p1-5+deb11u1

dirvish suggests no packages.

-- Configuration Files:
/etc/cron.d/dirvish changed [not included]
/etc/dirvish/dirvish-cronjob changed [not included]

-- no debconf information


Bug#982459: mdadm examine corrupts host ext4

2022-08-01 Thread Håkan T Johansson


On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:


I can't see a difference that should matter from userspace.

I have stared a bit at the kernel code... there have been quite some
changes and fixes in this area. Which kernel version were you
running when testing this?

Could you retry on something >= 5.9? I.e. some version with patch
   08fc1ab6d748ab1a690fd483f41e2938984ce353.


Dear Chris,

I believe that I was running 5.10 (bullseye).

It looks like 5.18 (from backports) does not show the issue!  (i.e. it works)

Some more details:

I have now tried again:

host:
  linux-image-5.10.0-16-amd64   5.10.127-2
  mdadm 4.2-1~bpo11+1
chroot:
  mdadm 4.1-11

  Some more details:

  This time I did get some dmesg BUG output as well (attached).
  The backtrace does not seem to be the same on the two occurrences.

  I also noticed that the BUG: report in dmesg does not appear immediately
  when running 'mdadm --examine --scan --config=partitions'.  It rather
  occurs when there is some activity on the host filesystem, e.g.
  a 'touch /root/a' command.

host:
  linux-image-5.18.0-0.bpo.1-amd64  5.18.2-1~bpo11+1

  (I did not reinstall anything else, except upgrading zfs, also from
  backports, since the plain bullseye version would not compile against 5.18.)

  Does not exhibit the problem.

I have tried with both kernels several times, and it was repeatable that 
5.10 got stuck while 5.18 does not show issues.


Reminder: to get the issue, /dev/ should not be mounted in the chroot.
With /dev/ mounted, 5.10 also works.

Best regards,
Håkan

[mån aug  1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0010
[mån aug  1 15:53:08 2022] #PF: supervisor read access in kernel mode
[mån aug  1 15:53:08 2022] #PF: error_code(0x) - not-present page
[mån aug  1 15:53:08 2022] PGD 0 P4D 0
[mån aug  1 15:53:08 2022] Oops:  [#1] SMP PTI
[mån aug  1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P   OE 5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug  1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug  1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug  1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug  1 15:53:08 2022] RSP: 0018:ae27c059fd60 EFLAGS: 00010246
[mån aug  1 15:53:08 2022] RAX:  RBX: 9d1b94505480 RCX: 9d1bc52e5e38
[mån aug  1 15:53:08 2022] RDX: 9d1bc13782d8 RSI: 0c14 RDI: c096feb0
[mån aug  1 15:53:08 2022] RBP: 9d1bc52e5e38 R08: 9d1be04d5230 R09: 0001
[mån aug  1 15:53:08 2022] R10: 9d1bc985f000 R11: 001d R12: 9d1bc13782d8
[mån aug  1 15:53:08 2022] R13: 9d1be04d5000 R14: 0c14 R15: 9d1bc13782d8
[mån aug  1 15:53:08 2022] FS:  7fed5ecb1840() GS:9d1cd7c8() knlGS:
[mån aug  1 15:53:08 2022] CS:  0010 DS:  ES:  CR0: 80050033
[mån aug  1 15:53:08 2022] CR2: 0010 CR3: 0001a46d8000 CR4: 06e0
[mån aug  1 15:53:08 2022] Call Trace:
[mån aug  1 15:53:08 2022]  ext4_orphan_del+0x23f/0x290 [ext4]
[mån aug  1 15:53:08 2022]  ext4_evict_inode+0x31f/0x630 [ext4]
[mån aug  1 15:53:08 2022]  evict+0xd1/0x1a0
[mån aug  1 15:53:08 2022]  __dentry_kill+0xe4/0x180
[mån aug  1 15:53:08 2022]  dput+0x149/0x2f0
[mån aug  1 15:53:08 2022]  __fput+0xe4/0x240
[mån aug  1 15:53:08 2022]  task_work_run+0x65/0xa0
[mån aug  1 15:53:08 2022]  exit_to_user_mode_prepare+0x111/0x120
[mån aug  1 15:53:08 2022]  syscall_exit_to_user_mode+0x28/0x140
[mån aug  1 15:53:08 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug  1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77
[mån aug  1 15:53:08 2022] Code: 44 00 00 48 8b 15 19 a1 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 e9 a0 0c 00 f7 d8 64 89 02 b8
[mån aug  1 15:53:08 2022] RSP: 002b:7ffd50452818 EFLAGS: 0202 ORIG_RAX: 0003
[mån aug  1 15:53:08 2022] RAX:  RBX: 55dab4578910 RCX: 7fed5eea2d77
[mån aug  1 15:53:08 2022] RDX: 7fed5ef6e8a0 RSI:  RDI: 0006
[mån aug  1 15:53:08 2022] RBP:  R08:  R09: 7fed5ef6dbe0
[mån aug  1 15:53:08 2022] R10: 006f R11: 0202 R12: 7fed5ef6f4a0
[mån aug  1 15:53:08 2022] R13:  R14:  R15: 0001
[mån aug  1 15:53:08 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack

Bug#982459:

2021-08-15 Thread Håkan T Johansson


Hi,

I believe that I have been hit by this bug too.

What has happened for me is that the machine in question 'almost' locks 
up, with a read-only /, and most commands for further debugging never 
complete because they are waiting for filesystem activity.  It then 
requires a reboot.


'dmesg' still worked, and showed ext4-related issues.  However, they 
were not recorded in /var/log.  I generally do not find any corruption on 
the filesystem itself when running fsck afterwards.


On the machine I have a number of chroot Debian installations of different 
releases.  By pure chance I found that 'update-initramfs' was the trigger 
for the system hangs, and I could then trigger the issue repeatably.
(Before this, it happened as part of system maintenance (unattended 
upgrades in the chroots), so the machine just hung seemingly at random.)


In my case, the chroot installations live on a ZFS filesystem, but the 
host system itself is on MD RAID1 (multiple arrays: /, /usr/, /var/).


I have had /proc mounted in the chroots, but had forgotten /dev.  After 
mounting /dev (and /dev/pts) in the chroots, the issue has not happened 
again.
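Since the hang only occurred with /dev not mounted inside the chroot, a pre-flight check before entering a chroot may help. Below is a minimal C sketch. It assumes the common heuristic that a directory is a mount point if its device number differs from its parent's, or if it is its own parent (as / is). The function `is_mountpoint()` is hypothetical. The heuristic misses bind mounts from the same filesystem, but a devtmpfs /dev bind-mounted onto a ZFS chroot has a different device number and would be detected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if path looks like a mount point, 0 if not,
   -1 on error.  A mount point either lives on a different device than
   its parent directory or is its own parent (as "/" is). */
int is_mountpoint(const char *path)
{
  struct stat self, parent;
  char parent_path[4096];
  int len;

  assert(path != NULL);
  len = snprintf(parent_path, sizeof parent_path, "%s/..", path);
  if (len < 0 || (size_t) len >= sizeof parent_path)
    return -1;                          /* encoding error or too long */
  if (stat(path, &self) != 0 || stat(parent_path, &parent) != 0)
    return -1;                          /* missing or inaccessible */
  if (self.st_dev != parent.st_dev)
    return 1;                           /* different filesystem */
  return self.st_ino == parent.st_ino;  /* only "/" is its own parent */
}
```

For example, a maintenance script could refuse to run `chroot` into a tree unless `is_mountpoint()` returns 1 for that tree's `dev` subdirectory.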


The issue occurred while the host system ran Buster.  I then upgraded to 
Bullseye about 2 weeks ago, hoping it would be resolved, but the issue was 
still present after the upgrade.  Only after that upgrade did I find the 
update-initramfs trigger.


I am running with sysvinit, both on host and chroots.

Currently, I do not have hands-on access to the system, so I cannot 
inspect or reboot it reliably.  I should be able to do further tests in 
a few weeks.


Best regards,
Håkan