Bug#1063915: mirror submission for debian.mirrors.ovh.net
On 3/22/24 17:42, Adam D. Barratt wrote:
> Control: tags -1 + moreinfo
>
> On Wed, 2024-02-14 at 20:03 +, OVHcloud wrote:
> > Site: debian.mirrors.ovh.net
> > Archive-architecture: ALL amd64 arm64 armel armhf hurd-i386 hurd-amd64 i386 mips mips64el mipsel powerpc ppc64el riscv64 s390x
> > Archive-http: /debian/
> > Maintainer: OVHcloud
> > Country: FR France
> > Location: Anycast (Gravelines, Roubaix and Strasbourg)

Hi Adam,

First, let me explain in a bit more detail how requests are handled. Our DNS A record for debian.mirrors.ovh.net points to a single anycast IP address announced from four locations. Once a request arrives, the following happens:

  Load balancer (anycast IP) in Beauharnois (Canada) → caches in Beauharnois → backend in Canada on cache miss
  Load balancer (anycast IP) in Gravelines (France)  → caches in Gravelines  → backend in France on cache miss
  Load balancer (anycast IP) in Roubaix (France)     → caches in Roubaix     → backend in France on cache miss
  Load balancer (anycast IP) in Strasbourg (France)  → caches in Strasbourg  → backend in France on cache miss

> I know there was some discussion on IRC, so apologies if I'm rehashing here, but:
>
> - are the individual backends exposed in any way?

Not publicly, but if you were to give me a source IP address, I could provide you with an rsync account that has access to both our backends. We already do this for other distributions.

> - how do you ensure that the backends are in sync with each other?

We take ZFS snapshots after each sync and send them to the backends. Once the latest snapshot has reached both backends, they start pointing to it; this last operation is performed on both simultaneously. After that, we compile a list of changed URLs and issue a series of HTTP PURGE requests to every cache server, also simultaneously. We also have monitoring in place to detect discrepancies between the backends for all our mirrors.
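[Editor's note] The "compile a list of changed URLs and issue PURGE requests" step can be sketched roughly as below. This is a hypothetical illustration, not OVH's actual tooling; the paths and the commented-out cache host are made up. It diffs two sorted file listings (before and after a sync) to derive the URLs to invalidate:

```shell
#!/bin/sh
# Hypothetical sketch of the "changed URLs" step: diff the mirror's
# file listing from before and after a sync, emit one PURGE per path.
old=$(mktemp); new=$(mktemp)
printf '%s\n' dists/stable/Release pool/main/a.deb                 | sort > "$old"
printf '%s\n' dists/stable/Release pool/main/a.deb pool/main/b.deb | sort > "$new"
# comm -3 prints lines present in only one of the two sorted lists,
# i.e. files added or removed (with checksummed listings, also changed).
comm -3 "$old" "$new" | tr -d '\t' | while read -r path; do
    echo "PURGE /debian/$path"
    # a real setup might send: curl -X PURGE "https://cache-host/debian/$path"
done
rm -f "$old" "$new"
```

Running both purge passes from the same generated list on every cache at once is what keeps the invalidation window small, as described above.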
> - what are the chances of users seeing inconsistent state if they hit different backends which aren't at the same stage of updating?

I think the chances are pretty low, as we try to run all phases of the sync simultaneously. The most sensitive part would be the cache invalidation process, which might purge some URLs at slightly different times. All I can say is that, despite the high number of servers already relying on our Debian mirror, we have never heard of errors caused by this.

> Regards,
>
> Adam

I hope this answers all your questions.

Have a nice day,
Louis
Bug#1040349: ipmiutil: postinst script fails when "ipmiutil sel" works but "ipmiutil sensor" fails
Package: ipmiutil
Version: 3.1.8-4

Hi,

In some cases (Intel S2600STB boards), ipmiutil fails to install the first time:

  Setting up ipmiutil (3.1.8-4) ...
  SDR record 61 is malformed, length 12 is less than minimum 12
  0061 GetSDR error -25, rlen = 10
  dpkg: error processing package ipmiutil (--configure):
   installed ipmiutil package post-installation script subprocess returned error exit status 231
  Processing triggers for man-db (2.11.2-2) ...
  Errors were encountered while processing:
   ipmiutil
  E: Sub-process /usr/bin/dpkg returned an error code (1)

The second time around, it works because /var/lib/ipmiutil/sensor_out.txt now exists. The issue can be reproduced by removing /var/lib/ipmiutil/sensor_out.txt and running the postinst script again (with -x added here to highlight the problem):

  # dpkg-reconfigure ipmiutil
  + sbindir=/usr/bin
  + vardir=/var/lib/ipmiutil
  + sensorout=/var/lib/ipmiutil/sensor_out.txt
  + mkdir -p /var/lib/ipmiutil
  + IPMIcmd=true
  + /usr/bin/ipmiutil sel -v
  + true
  + [ ! -f /var/lib/ipmiutil/sensor_out.txt ]
  + /usr/bin/ipmiutil sensor -q
  SDR record 61 is malformed, length 12 is less than minimum 12
  0061 GetSDR error -25, rlen = 10

The issue here is that "ipmiutil sel" works but "ipmiutil sensor" does not:

https://jff.email/cgit/ipmiutil.git/tree/debian/postinst?h=debian/3.1.9-1#n16
https://jff.email/cgit/ipmiutil.git/tree/debian/postinst?h=debian/3.1.9-1#n22

The fact that a failing "ipmiutil sensor" command prevents the installation is problematic. Would it be possible to make this failure non-fatal?

https://bugs.launchpad.net/ubuntu/+source/ipmiutil/+bug/1786562 seems to be the same issue: it concerns the same error message on an Intel S2600STB.

PS: maybe there is also an upstream ipmiutil bug to report here? "length 12 is less than minimum 12" sounds like an off-by-one error.

Kind regards,
Louis
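[Editor's note] The requested change boils down to the standard "|| true" idiom, which keeps a failing command from aborting a postinst that runs under "set -e". A minimal sketch; the failing_cmd stub is hypothetical and stands in for the "ipmiutil sensor -q" call that exits 231 on this hardware:

```shell
#!/bin/sh
set -e
# Stub standing in for "ipmiutil sensor -q", which fails on some boards.
failing_cmd() { return 231; }

sensorout=$(mktemp -u)   # stand-in for /var/lib/ipmiutil/sensor_out.txt
if [ ! -f "$sensorout" ]; then
    # "|| true" makes the failure non-fatal: the script keeps going
    # even under "set -e", leaving the file empty instead of aborting.
    failing_cmd > "$sensorout" || true
fi
echo "postinst continues"
```

Without the "|| true", the non-zero exit status propagates out of the postinst and dpkg records the configure failure shown above.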
Bug#1038818: grub-pc: Empty device (??? MB) in list during postinst with ZFS on / or /boot
Package: grub-pc
Version: 2.06-13

Hi,

When ZFS is used for the / or /boot partition on legacy-boot servers (with vdevs using more than one disk/partition), grub-pc's postinst shows this during the configure phase:

  ┌────────────────┤ Configuring grub-pc ├────────────────┐
  │ GRUB install devices:                                 │
  │                                                       │
  │   [*] /dev/sda (240057 MB; SAMSUNG_MZ7LM240HMHQ-5)    │
  │   [*] /dev/sdb (240057 MB; SAMSUNG_MZ7LM240HMHQ-5)    │
  │   [ ]  (??? MB; ???)                                  │
  │   [ ] /dev/md2 (1071 MB; md2)                         │
  │                                                       │
  └───────────────────────────────────────────────────────┘

along with errors:

  Unknown device "/dev/": No such device

To reproduce, on a legacy boot server:

* Create a zpool with a mirror vdev using two partitions.
* Create a dataset and use it as /.
* Install Debian Bookworm on the server.
* Run "dpkg-reconfigure grub-pc".

On my test server:

  root@localhost:~# df -hT
  Filesystem     Type      Size  Used Avail Use% Mounted on
  udev           devtmpfs  7.8G     0  7.8G   0% /dev
  tmpfs          tmpfs     1.6G  820K  1.6G   1% /run
  zp0/zd0        zfs       216G  1.5G  214G   1% /
  tmpfs          tmpfs     7.8G     0  7.8G   0% /dev/shm
  tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
  /dev/md2       ext4      988M   66M  855M   8% /boot
  tmpfs          tmpfs     1.6G     0  1.6G   0% /run/user/1000

  root@localhost:~# lsblk -f
  NAME    FSTYPE            FSVER            LABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
  sda
  ├─sda1
  ├─sda2  linux_raid_member 1.2              md2       371c03a8-59c4-5eb7-7fda-f6f5497ae463
  │ └─md2 ext4              1.0              boot      197e5606-e75e-4809-87f9-13fbc42ed36d  854.7M     7% /boot
  ├─sda3  zfs_member        5000             zp0       8324519610909055574
  ├─sda4  swap              1                swap-sda4 08d6106d-f50d-4391-8515-5b2b8ffa1d6b                [SWAP]
  └─sda5  iso9660           Joliet Extension config-2  2023-06-16-18-40-55-00
  sdb
  ├─sdb1
  ├─sdb2  linux_raid_member 1.2              md2       371c03a8-59c4-5eb7-7fda-f6f5497ae463
  │ └─md2 ext4              1.0              boot      197e5606-e75e-4809-87f9-13fbc42ed36d  854.7M     7% /boot
  ├─sdb3  zfs_member        5000             zp0       8324519610909055574
  └─sdb4  swap              1                swap-sdb4 55b30849-9949-4da0-9738-43e04b40a220                [SWAP]

  root@localhost:~# zfs list
  NAME      USED  AVAIL  REFER  MOUNTPOINT
  zp0      1.42G   214G    24K  none
  zp0/zd0  1.42G   214G  1.42G  /

  root@localhost:~# zpool list
  NAME  SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
  zp0   222G  1.42G  221G        -         -    0%   0%  1.00x  ONLINE  -

  root@localhost:~# zpool status
    pool: zp0
   state: ONLINE
  config:

          NAME        STATE     READ WRITE CKSUM
          zp0         ONLINE       0     0     0
            mirror-0  ONLINE       0     0     0
              sda3    ONLINE       0     0     0
              sdb3    ONLINE       0     0     0

  errors: No known data errors

  root@localhost:~# cat /etc/fstab
  UUID=197e5606-e75e-4809-87f9-13fbc42ed36d /boot ext4 defaults 0 0
  UUID=08d6106d-f50d-4391-8515-5b2b8ffa1d6b swap swap defaults 0 0
  UUID=55b30849-9949-4da0-9738-43e04b40a220 swap swap defaults 0 0

  root@localhost:~# dpkg-reconfigure grub-pc
  # With set -x / set +x added around line 281 of /var/lib/dpkg/info/grub-pc.postinst
  +++ grub-probe -t device /
  ++ partition='/dev/sda3 /dev/sdb3'
  ++ set +x
  +++ grub-probe -t device /boot
  ++ partition=/dev/md2
  ++ set +x
  +++ grub-probe -t device /boot/grub
  ++ partition=/dev/md2
  ++ set +x
  Unknown device "/dev/": No such device
  Unknown device "/dev/": No such device
  Unknown device "/dev/": No such device
  Unknown device "/dev/": No such device
  grub-pc: Running grub-install ...
  Installing for i386-pc platform.
  Installation finished. No error reported.
  grub-install success for /dev/sda
  Installing for i386-pc platform.
  Installation finished. No error reported.
  grub-install success for /dev/sdb
  Generating grub configuration file ...
  Found linux image: /boot/vmlinuz-6.1.0-9-amd64
  Found initrd image: /boot/initrd.img-6.1.0-9-amd64
  done

My understanding is that the following happens:

* usable_partitions is called:
  https://salsa.debian.org/grub-team/grub/-/blob/debian/2.06-13/debian/postinst.in#L275
* partition="$(grub-probe -t device "$path" || true)" is called:
  https://salsa.debian.org/grub-team/grub/-/blob/debian/2.06-13/debian/postinst.in#L281
* It returns this:

    # grub-probe -t device /
    /dev/sda3 /dev/sdb3

* partition_id="$(device_to_id "$partition" || true)"
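[Editor's note] If that reading is right, the root cause is that grub-probe prints several devices on one line for a multi-disk vdev, while the postinst hands the whole string to device_to_id as if it were a single path. A hypothetical sketch of the fix (the grub-probe output is simulated and the device_to_id stub is made up, standing in for the helper in the real postinst):

```shell
#!/bin/sh
# Simulated "grub-probe -t device /" output for a two-disk ZFS mirror.
partition="/dev/sda3 /dev/sdb3"

# Hypothetical stand-in for the postinst's device_to_id helper.
device_to_id() { echo "id-for-$1"; }

# Iterating over the (space-separated) devices instead of treating the
# whole string as one path avoids the bogus lookup that produces the
# empty "(??? MB; ???)" entry and the 'Unknown device "/dev/"' errors.
for part in $partition; do
    device_to_id "$part"
done
```

With the loop, each mirror member gets its own id, so both disks would appear individually in the device list instead of one garbled entry.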
Bug#1003528: zfsutils-linux: datasets with mountpoint=legacy defined in fstab prevent the system from booting
Package: zfsutils-linux
Version: 2.0.3-9
Severity: important

Hi,

Currently, creating a ZFS dataset with mountpoint=legacy and adding it to /etc/fstab as auto causes the system to hang at boot, because the mount is attempted before the pool has been imported or the ZFS module loaded.

How to reproduce:

  apt install zfsutils-linux
  modprobe zfs
  zpool create tank sdb
  zfs set mountpoint=legacy tank
  echo "tank /mnt zfs defaults 0 0" >> /etc/fstab
  reboot

The boot process will hang because /mnt did not mount properly. "journalctl -b -u mnt.mount" shows the following (I don't know why the dates aren't sorted):

  Jan 11 11:41:16 localhost mount[702]: The ZFS modules are not loaded.
  Jan 11 11:41:16 localhost mount[702]: Try running '/sbin/modprobe zfs' as root to load them.
  Jan 11 11:41:13 localhost systemd[1]: Mounting /mnt...
  Jan 11 11:41:14 localhost systemd[1]: mnt.mount: Mount process exited, code=exited, status=2/INVALIDARGUMENT
  Jan 11 11:41:14 localhost systemd[1]: mnt.mount: Failed with result 'exit-code'.
  Jan 11 11:41:14 localhost systemd[1]: Failed to mount /mnt.

"journalctl -b -u zfs-load-module.service" shows that the module was loaded afterwards:

  Jan 11 11:41:19 localhost systemd[1]: Starting Install ZFS kernel module...
  Jan 11 11:41:19 localhost systemd[1]: Finished Install ZFS kernel module.

The same goes for "journalctl -b -u zfs-import-cache.service":

  Jan 11 11:41:19 localhost systemd[1]: Starting Import ZFS pools by cache file...
  Jan 11 11:41:19 localhost systemd[1]: Finished Import ZFS pools by cache file.

I worked around the problem by making sure zfs-import.target (used by zfs-import-{cache,scan}.service) is active before mounts are attempted. Contents of /etc/systemd/system/zfs-import.target.d/override.conf:

  [Unit]
  Before=local-fs-pre.target

Should Debian edit /lib/systemd/system/zfs-import.target to include this? Should I report this bug upstream? Are there any dependency loop risks I might have overlooked?
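[Editor's note] For reference, a per-mount alternative, not fully tested here: systemd's fstab generator accepts dependency options directly in /etc/fstab via x-systemd.* mount options (see systemd.mount(5)), so a single entry could declare its need for the import target without touching the unit files:

```
# /etc/fstab -- hypothetical per-entry variant of the workaround above:
tank  /mnt  zfs  defaults,x-systemd.requires=zfs-import.target  0  0
```

This scopes the ordering to the one legacy mount, which may sidestep the dependency-loop question raised above, at the cost of repeating the option on every such fstab line.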
Kind regards,
Louis
Bug#974563: Security update of pacemaker
On 28/12/2020 00:24, Markus Koschany wrote:
> Hello,
>
> I have prepared a new security update of pacemaker, the latest version in the 1.1.x series. The update will fix CVE-2018-16877, CVE-2018-16878 and CVE-2020-25654. I would appreciate it if you could test this version before it is uploaded to stretch-security again. You can find all Debian packages at https://people.debian.org/~apo/lts/pacemaker/ including the source package if you prefer to compile pacemaker from source.
>
> If I don't get any negative feedback I intend to upload pacemaker 1.1.24-0+deb9u1 on 06.01.2021.
>
> Regards,
> Markus

Hi Markus,

Thanks for letting us know beforehand. I have installed version 1.1.24-0+deb9u0 and it seems to work fine.

Kind regards,
Louis
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On 17/11/2020 18:41, Alejandro Taboada wrote:
> Thank you Markus,
>
> I just updated to deb9u2 and it works fine. Let me know when you have new updates and I can test this thing.
>
> Regards,
> Alejandro
>
>> On 17 Nov 2020, at 05:16, Markus Koschany wrote:
>>
>> Control: severity -1 normal
>>
>> On Monday, 16.11.2020, at 09:22 -0300, Alejandro Taboada wrote:
>>> Hi Markus,
>>>
>>> Sorry for the delay. With this patch, it works when it is applied to only one node. The services restart and the arm resources are up. The problem appears again when I install the patch on a second node; then the resources stopped again.
>>
>> Hello Alejandro,
>>
>> thanks for your feedback. At the moment I cannot reproduce the problem, hence I have reverted the patch and uploaded a new revision, 1.1.16-1+deb9u2, of pacemaker to stretch-security which restores the old behavior. The regression tests shipped with pacemaker also don't report anything unusual. I will keep this bug report open for discussions and work on another update. This time I intend to upgrade pacemaker to the latest upstream release in the 1.1.x branch, which is currently 1.1.24~rc1. This one also includes fixes for CVE-2018-16878 and CVE-2018-16877. I expect no big changes in terms of existing features, but I will send new packages for testing before I upload a new upstream release.
>>
>> Regards,
>>
>> Markus
>
> I can confirm that 1.1.16-1+deb9u2 works as expected, thanks for the fix.

Kind regards,
Louis
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On 13/11/2020 12:23, Alejandro Taboada wrote:
> Maybe Corosync is not using peer communication? Could you check somehow the packet source address... if it's from localhost, just allow it; otherwise, check permissions. I know it's not ideal, but it would solve a lot of production issues in the meanwhile.
>
>> On 12 Nov 2020, at 23:20, Alejandro Taboada wrote:
>>

I'm not sure I understand what we need to look for. Aren't they communicating via UNIX sockets in the abstract namespace (@cib_rw@, @attrd@, etc.)? That's what I see when I strace calls to "crm resource cleanup ", which also fails with the patched version.
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Package: pacemaker
Version: 1.1.16-1+deb9u1
Severity: grave
X-Debbugs-CC: a...@debian.org

Hi,

I am running corosync 2.4.2-3+deb9u1 with pacemaker, and the last run of unattended-upgrades broke the cluster (downgrading pacemaker to 1.1.16-1 fixed it immediately). The logs contain a lot of warnings that seem to point to a permission problem, such as "Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd". I am not using ACLs, so the patch should not impact my system.

Here is an excerpt from the logs after the upgrade:

  Nov 12 06:26:05 cluster-1 crmd[20868]:  notice: State transition S_PENDING -> S_NOT_DC
  Nov 12 06:26:05 cluster-1 crmd[20868]:  notice: State transition S_NOT_DC -> S_PENDING
  Nov 12 06:26:05 cluster-1 attrd[20866]:  notice: Defaulting to uname -n for the local corosync node name
  Nov 12 06:26:05 cluster-1 crmd[20868]:  notice: State transition S_PENDING -> S_NOT_DC
  Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
  Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
  Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
  Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
  Nov 12 06:26:06 cluster-1 crmd[20868]:   error: Could not add resource service to LRM cluster-1
  Nov 12 06:26:06 cluster-1 crmd[20868]:   error: Invalid resource definition for service
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
  Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Resource service no longer exists in the lrmd
  Nov 12 06:26:06 cluster-1 crmd[20868]:   error: Result of probe operation for service on cluster-1: Error
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Input I_FAIL received in state S_NOT_DC from get_lrm_resource
  Nov 12 06:26:06 cluster-1 crmd[20868]:  notice: State transition S_NOT_DC -> S_RECOVERY
  Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Fast-tracking shutdown in response to errors
  Nov 12 06:26:06 cluster-1 crmd[20868]:   error: Input I_TERMINATE received in state S_RECOVERY from do_recover
  Nov 12 06:26:06 cluster-1 crmd[20868]:  notice: Disconnected from the LRM
  Nov 12 06:26:06 cluster-1 crmd[20868]:  notice: Disconnected from Corosync
  Nov 12 06:26:06 cluster-1 crmd[20868]:   error: Could not recover from internal error
  Nov 12 06:26:06 cluster-1 pacemakerd[20857]:   error: The crmd process (20868) exited: Generic Pacemaker error (201)
  Nov 12 06:26:06 cluster-1 pacemakerd[20857]:  notice: Respawning failed child process: crmd

My corosync.conf is quite standard:

  totem {
          version: 2
          cluster_name: debian
          token: 0
          token_retransmits_before_loss_const: 10
          clear_node_high_bit: yes
          crypto_cipher: aes256
          crypto_hash: sha256
          interface {
                  ringnumber: 0
                  bindnetaddr: xxx
                  mcastaddr: yyy
                  mcastport: 5405
                  ttl: 1
          }
  }

  logging {
          fileline: off
          to_stderr: yes
          to_logfile: yes
          logfile: /var/log/corosync/corosync.log
          to_syslog: yes
          syslog_facility: daemon
          debug: off
          timestamp: on
          logger_subsys {
                  subsys: QUORUM
                  debug: off
          }
  }

  quorum {
          provider: corosync_votequorum
          expected_votes: 2
  }

So is my crm configuration:

  node xxx: cluster-1 \
          attributes standby=off
  node xxx: cluster-2 \
          attributes standby=off
  primitive service systemd:service \
          meta failure-timeout=30 \
          op monitor interval=5 on-fail=restart timeout=15s
  primitive vip-1 IPaddr2 \
          params ip=xxx cidr_netmask=32 \
          op monitor interval=10s
  primitive vip-2 IPaddr2 \
          params ip=xxx cidr_netmask=32 \
          op monitor interval=10s
  clone clone_service service
  colocation service_vip-1 inf: vip-1 clone_service
  colocation service_vip-2 inf: vip-2 clone_service
  order kot_before_vip-1 inf: clone_service vip-1
  order kot_before_vip-2 inf: clone_service vip-2
  location prefer-cluster1-vip-1 vip-1 1: cluster-1
  location prefer-cluster2-vip-2 vip-2 1: cluster-2
  property cib-bootstrap-options: \
          have-watchdog=false \
          dc-version=1.1.16-94ff4df \
          cluster-infrastructure=corosync \
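[Editor's note] Since the report says downgrading to 1.1.16-1 fixes the cluster immediately, a hypothetical interim measure while the bug is open would be an apt preferences pin to keep unattended-upgrades from reinstalling the regressed deb9u1 build. The file name and priority below are illustrative, not a recommendation from the thread:

```
# /etc/apt/preferences.d/pacemaker-hold -- illustrative pin only;
# remember to remove it once a fixed security update is available.
Package: pacemaker pacemaker-*
Pin: version 1.1.16-1
Pin-Priority: 1001
```

A priority above 1000 permits the downgrade and keeps the pinned version installed even though the archive carries a higher one (see apt_preferences(5)).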