Bug#1063915: mirror submission for debian.mirrors.ovh.net

2024-03-22 Thread Louis Sautier

On 3/22/24 17:42, Adam D. Barratt wrote:

Control: tags -1 + moreinfo

On Wed, 2024-02-14 at 20:03 +, OVHcloud wrote:

Site: debian.mirrors.ovh.net
Archive-architecture: ALL amd64 arm64 armel armhf hurd-i386 hurd-amd64 i386 mips mips64el mipsel powerpc ppc64el riscv64 s390x
Archive-http: /debian/
Maintainer: OVHcloud
Country: FR France
Location: Anycast (Gravelines, Roubaix and Strasbourg)


Hi Adam,

First, let me just explain in a bit more detail how requests are 
handled. Our DNS A record for debian.mirrors.ovh.net only points to one 
anycast IP address with 4 locations. Once it receives a request, the 
following happens:


Load balancer (anycast IP) in Beauharnois (Canada) → caches in Beauharnois → backend in Canada if cache miss

Load balancer (anycast IP) in Gravelines (France) → caches in Gravelines → backend in France if cache miss

Load balancer (anycast IP) in Roubaix (France) → caches in Roubaix → backend in France if cache miss

Load balancer (anycast IP) in Strasbourg (France) → caches in Strasbourg → backend in France if cache miss



I know there was some discussion on IRC, so apologies if I'm rehashing
here, but:

- are the individual backends exposed in any way?
Not publicly, but if you were to give me a source IP address, I could 
provide you with an rsync account that has access to both our backends. 
We already do this for other distributions.

- how do you ensure that the backends are in sync with each other?


We take ZFS snapshots after each sync and we send these to the backends. 
Once the latest snapshot has been sent to both backends, they start 
pointing to it. This last operation is performed simultaneously on both.
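That snapshot-and-replicate flow can be sketched roughly as follows (dataset names, backend hostnames, and the snapshot-switch step are all assumptions for illustration; OVHcloud's actual tooling is not shown in this report):

```shell
# Illustrative sketch only: dataset and host names are made up.
DATASET="tank/mirrors/debian"
PREV="$DATASET@last-sync"
SNAP="$DATASET@sync-$(date +%Y%m%d-%H%M%S)"

# Snapshot the freshly synced mirror tree.
zfs snapshot "$SNAP"

# Ship the incremental stream to every backend first...
for backend in backend-ca backend-fr; do
    zfs send -i "$PREV" "$SNAP" | ssh "$backend" zfs receive "$DATASET"
done

# ...and only once both have it, switch both backends to the new
# snapshot simultaneously (the switch mechanism itself is not
# described in the report).
```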


After this, we compile a list of changed URLs and issue a series of HTTP 
PURGE requests to every cache server, simultaneously too.
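Cache invalidation of this kind is commonly done with one HTTP PURGE request per changed URL against every cache node; a sketch (cache hostnames and the URL-list file are illustrative assumptions):

```shell
# For each changed path, purge it from every cache node in parallel
# (hostnames and changed-urls.txt are illustrative).
while read -r path; do
    for cache in cache-gra cache-rbx cache-sbg cache-bhs; do
        curl -s -X PURGE "http://$cache$path" > /dev/null &
    done
done < changed-urls.txt
wait
```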


We also have monitoring in place to detect discrepancies between the 
backends for all our mirrors.



- what are the chances of users seeing inconsistent state if they hit
different backends which aren't at the same stage of updating?


I think the chances are pretty low as we try to run all phases of the 
sync simultaneously. The most sensitive part would be the cache 
invalidation process which might purge some URLs at slightly different 
times.


All I can say is that, despite the high number of servers already 
relying on our Debian mirror, we have never heard of errors caused by this.



Regards,

Adam


I hope this answers all your questions.


Have a nice day,


Louis



Bug#1040349: ipmiutil: postinst script fails when "ipmiutil sel" works but "ipmiutil sensor" fails

2023-07-04 Thread Louis Sautier

Package: ipmiutil
Version: 3.1.8-4

Hi,

In some cases (Intel S2600STB boards), ipmiutil fails to install the 
first time:

Setting up ipmiutil (3.1.8-4) ...
SDR record 61 is malformed, length 12 is less than minimum 12
0061 GetSDR error -25, rlen = 10
dpkg: error processing package ipmiutil (--configure):
 installed ipmiutil package post-installation script subprocess 
returned error exit status 231

Processing triggers for man-db (2.11.2-2) ...
Errors were encountered while processing:
 ipmiutil
E: Sub-process /usr/bin/dpkg returned an error code (1)

The second time around, it works because 
/var/lib/ipmiutil/sensor_out.txt now exists.


The issue can be reproduced by removing /var/lib/ipmiutil/sensor_out.txt 
and running the postinst script again (with -x added here to highlight 
the problem):

# dpkg-reconfigure ipmiutil
+ sbindir=/usr/bin
+ vardir=/var/lib/ipmiutil
+ sensorout=/var/lib/ipmiutil/sensor_out.txt
+ mkdir -p /var/lib/ipmiutil
+ IPMIcmd=true
+ /usr/bin/ipmiutil sel -v
+ true
+ [ ! -f /var/lib/ipmiutil/sensor_out.txt ]
+ /usr/bin/ipmiutil sensor -q
SDR record 61 is malformed, length 12 is less than minimum 12
0061 GetSDR error -25, rlen = 10

The issue here is that "ipmiutil sel" works but "ipmiutil sensor" does not.
https://jff.email/cgit/ipmiutil.git/tree/debian/postinst?h=debian/3.1.9-1#n16
https://jff.email/cgit/ipmiutil.git/tree/debian/postinst?h=debian/3.1.9-1#n22

The fact that a failing "ipmiutil sensor" command prevents the 
installation is problematic. Would it please be possible to make this 
failure non-fatal?
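One way to make it non-fatal, sketched from the trace above (this is only a suggestion, not the maintainer's actual patch), is to tolerate a failing sensor probe:

```shell
# Postinst-style sketch (variable names taken from the trace above).
sbindir=/usr/bin
vardir=/var/lib/ipmiutil
sensorout=$vardir/sensor_out.txt
mkdir -p "$vardir"

if [ ! -f "$sensorout" ]; then
    # "|| true" keeps a malformed SDR from failing the whole install;
    # a later run can still regenerate the file.
    "$sbindir/ipmiutil" sensor -q > "$sensorout" || true
fi
```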


https://bugs.launchpad.net/ubuntu/+source/ipmiutil/+bug/1786562 seems to 
be the same issue, it concerns the same error message with an Intel 
S2600STB.



PS: maybe there is also an upstream ipmiutil bug to report here? "length 
12 is less than minimum 12" sounds like an off-by-one error.
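The self-contradictory message can be reproduced with a trivial shell sketch: if the code compares with "less than or equal" (or computes the minimum one too high) while the message says "less than", a record of exactly the minimum length gets rejected. Purely illustrative; the real ipmiutil check is not shown here:

```shell
# Illustrative only: a <= comparison paired with "less than" wording
# produces the contradictory message seen in the log.
min=12
len=12
if [ "$len" -le "$min" ]; then
    echo "length $len is less than minimum $min"
fi
```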


Kind regards,

Louis



Bug#1038818: grub-pc: Empty device (??? MB) in list during postinst with ZFS on / or /boot

2023-06-21 Thread Louis Sautier

Package: grub-pc
Version: 2.06-13

Hi,
When ZFS is used for the / or /boot partitions on legacy boot servers 
(with vdevs using more than one disk/partition), grub-pc's postinst 
shows this during the configure phase:


    ┌──────────────┤ Configuring grub-pc ├───────────────┐
    │ GRUB install devices:                              │
    │                                                    │
    │  [*] /dev/sda (240057 MB; SAMSUNG_MZ7LM240HMHQ-5)  │
    │  [*] /dev/sdb (240057 MB; SAMSUNG_MZ7LM240HMHQ-5)  │
    │  [ ] (??? MB; ???)                                 │
    │  [ ] /dev/md2 (1071 MB; md2)                       │
    │                                                    │
    │                                                    │
    └────────────────────────────────────────────────────┘

Along with errors:

Unknown device "/dev/": No such device


To reproduce, on a legacy boot server:
* Create a zpool with a mirror vdev using two partitions.
* Create a dataset and use it as /.
* Install Debian Bookworm on the server.
* Run "dpkg-reconfigure grub-pc".

On my test server:
root@localhost:~# df -hT
Filesystem Type  Size  Used Avail Use% Mounted on
udev   devtmpfs  7.8G 0  7.8G   0% /dev
tmpfs  tmpfs 1.6G  820K  1.6G   1% /run
zp0/zd0    zfs   216G  1.5G  214G   1% /
tmpfs  tmpfs 7.8G 0  7.8G   0% /dev/shm
tmpfs  tmpfs 5.0M 0  5.0M   0% /run/lock
/dev/md2   ext4  988M   66M  855M   8% /boot
tmpfs  tmpfs 1.6G 0  1.6G   0% /run/user/1000
root@localhost:~# lsblk -f
NAME    FSTYPE            FSVER            LABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1
├─sda2  linux_raid_member 1.2              md2       371c03a8-59c4-5eb7-7fda-f6f5497ae463
│ └─md2 ext4              1.0              boot      197e5606-e75e-4809-87f9-13fbc42ed36d  854.7M     7% /boot
├─sda3  zfs_member        5000             zp0       8324519610909055574
├─sda4  swap              1                swap-sda4 08d6106d-f50d-4391-8515-5b2b8ffa1d6b                [SWAP]
└─sda5  iso9660           Joliet Extension config-2  2023-06-16-18-40-55-00
sdb
├─sdb1
├─sdb2  linux_raid_member 1.2              md2       371c03a8-59c4-5eb7-7fda-f6f5497ae463
│ └─md2 ext4              1.0              boot      197e5606-e75e-4809-87f9-13fbc42ed36d  854.7M     7% /boot
├─sdb3  zfs_member        5000             zp0       8324519610909055574
└─sdb4  swap              1                swap-sdb4 55b30849-9949-4da0-9738-43e04b40a220                [SWAP]
root@localhost:~# zfs list
NAME  USED  AVAIL REFER  MOUNTPOINT
zp0  1.42G   214G   24K  none
zp0/zd0  1.42G   214G 1.42G  /
root@localhost:~# zpool list
NAME   SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zp0    222G  1.42G  221G        -         -    0%   0%  1.00x  ONLINE  -

root@localhost:~# zpool status
  pool: zp0
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zp0         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors
root@localhost:~# cat /etc/fstab
UUID=197e5606-e75e-4809-87f9-13fbc42ed36d    /boot    ext4    defaults    0    0
UUID=08d6106d-f50d-4391-8515-5b2b8ffa1d6b    swap     swap    defaults    0    0
UUID=55b30849-9949-4da0-9738-43e04b40a220    swap     swap    defaults    0    0

root@localhost:~# dpkg-reconfigure grub-pc
# With set -x / set +x added around line 281 of 
/var/lib/dpkg/info/grub-pc.postinst

+++ grub-probe -t device /
++ partition='/dev/sda3
/dev/sdb3'
++ set +x
+++ grub-probe -t device /boot
++ partition=/dev/md2
++ set +x
+++ grub-probe -t device /boot/grub
++ partition=/dev/md2
++ set +x
Unknown device "/dev/": No such device
Unknown device "/dev/": No such device
Unknown device "/dev/": No such device
Unknown device "/dev/": No such device
grub-pc: Running grub-install ...
Installing for i386-pc platform.

Installation finished. No error reported.
  grub-install success for /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
  grub-install success for /dev/sdb
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.1.0-9-amd64
Found initrd image: /boot/initrd.img-6.1.0-9-amd64
done

My understanding is that the following happens:

* usable_partitions is called: 
https://salsa.debian.org/grub-team/grub/-/blob/debian/2.06-13/debian/postinst.in#L275
* partition="$(grub-probe -t device "$path" || true)" is called: 
https://salsa.debian.org/grub-team/grub/-/blob/debian/2.06-13/debian/postinst.in#L281

* It returns this:
  # grub-probe -t device /
  /dev/sda3
  /dev/sdb3
* partition_id="$(device_to_id "$partition" || true)" 
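The problem with a multi-line $partition value can be demonstrated in isolation (a sketch using the values from the trace above; the real postinst helpers are not reproduced here):

```shell
# Simulate what "grub-probe -t device /" returns on a two-disk ZFS
# mirror (values copied from the trace above).
partition='/dev/sda3
/dev/sdb3'

# Any later step that assumes $partition holds a single device path
# is now operating on two lines at once:
printf 'lines in $partition: %s\n' "$(printf '%s\n' "$partition" | wc -l)"
```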

Bug#1003528: zfsutils-linux: datasets with mountpoint=legacy defined in fstab prevent the system from booting

2022-01-11 Thread Louis Sautier

Package: zfsutils-linux
Version: 2.0.3-9
Severity: important

Hi,
Currently, creating a ZFS dataset with mountpoint=legacy and adding it to 
/etc/fstab as auto causes the system to hang at boot because the mount is 
attempted before the pool has been imported or the ZFS module loaded.
How to reproduce:
apt install zfsutils-linux
modprobe zfs
zpool create tank sdb
zfs set mountpoint=legacy tank
echo "tank /mnt zfs defaults 0 0" >> /etc/fstab
reboot

The boot process will hang because /mnt did not mount properly. "journalctl -b -u 
mnt.mount" shows the following (I don't know why the dates aren't sorted):
Jan 11 11:41:16 localhost mount[702]: The ZFS modules are not loaded.
Jan 11 11:41:16 localhost mount[702]: Try running '/sbin/modprobe zfs' as root 
to load them.
Jan 11 11:41:13 localhost systemd[1]: Mounting /mnt...
Jan 11 11:41:14 localhost systemd[1]: mnt.mount: Mount process exited, 
code=exited, status=2/INVALIDARGUMENT
Jan 11 11:41:14 localhost systemd[1]: mnt.mount: Failed with result 'exit-code'.
Jan 11 11:41:14 localhost systemd[1]: Failed to mount /mnt.

"journalctl -b -u zfs-load-module.service" shows that the module was loaded 
afterwards:
Jan 11 11:41:19 localhost systemd[1]: Starting Install ZFS kernel module...
Jan 11 11:41:19 localhost systemd[1]: Finished Install ZFS kernel module.

The same goes for "journalctl -b -u zfs-import-cache.service":
Jan 11 11:41:19 localhost systemd[1]: Starting Import ZFS pools by cache file...
Jan 11 11:41:19 localhost systemd[1]: Finished Import ZFS pools by cache file.


I worked around the problem by making sure the zfs-import.target (used by 
zfs-import-{cache,scan}.service) is active before mounts are attempted.
Contents of /etc/systemd/system/zfs-import.target.d/override.conf
[Unit]
Before=local-fs-pre.target
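The drop-in above can be put in place with something like this (paths exactly as in the report; a daemon-reload is needed for systemd to pick it up):

```shell
# Install the zfs-import.target ordering override described above.
mkdir -p /etc/systemd/system/zfs-import.target.d
cat > /etc/systemd/system/zfs-import.target.d/override.conf <<'EOF'
[Unit]
Before=local-fs-pre.target
EOF
systemctl daemon-reload
```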

Should Debian edit /lib/systemd/system/zfs-import.target to include this? 
Should I report this bug upstream? Are there any dependency loop risks I might 
have overlooked?
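For completeness, a per-mount alternative (a sketch relying on systemd's standard x-systemd.requires= fstab option rather than a global ordering change) would be to declare the dependency in the fstab entry itself:

```shell
# /etc/fstab entry that orders this mount after ZFS pool import
# (x-systemd.requires= is a standard systemd mount option).
tank /mnt zfs defaults,x-systemd.requires=zfs-import.target 0 0
```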

Kind regards,

Louis





Bug#974563: Security update of pacemaker

2020-12-28 Thread Louis Sautier

On 28/12/2020 00:24, Markus Koschany wrote:

Hello,

I have prepared a new security update of pacemaker, the latest version in the
1.1.x series. The update will fix CVE-2018-16877, CVE-2018-16878 and CVE-2020-
25654. I would appreciate it if you could test this version before it is
uploaded to stretch-security again. You can find all Debian packages at

https://people.debian.org/~apo/lts/pacemaker/

including the source package if you prefer to compile pacemaker from source.

If I don't get any negative feedback I intend to upload pacemaker 1.1.24-
0+deb9u1 on 06.01.2021.

Regards,

Markus


Hi Markus,
Thanks for letting us know beforehand. I have installed version 
1.1.24-0+deb9u0 and it seems to work fine.


Kind regards,

Louis





Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-18 Thread Louis Sautier
On 17/11/2020 18:41, Alejandro Taboada wrote:
> Thank you Markus,
> 
> I just updated deb9u2 and works fine. Let me know when you have new updates 
> and I can test this thing.
> 
> Regards,
> Alejandro
> 
>> On 17 Nov 2020, at 05:16, Markus Koschany  wrote:
>>
>> Control: severity -1 normal
>>
>> Am Montag, den 16.11.2020, 09:22 -0300 schrieb Alejandro Taboada:
>>> Hi Markus,
>>>
>>> Sorry for the delay. With this patch, it works when it is applied to only 1 node.
>>> The services restart and the resources are up.
>>> The problem appears again when I install the patch on a 2nd node. Then the
>>> resources stopped again.
>>
>> Hello Alejandro,
>>
>> thanks for your feedback. At the moment I cannot reproduce the problem hence 
>> I
>> have reverted the patch and uploaded a new revision, 1.1.16-1+deb9u2, of
>> pacemaker to stretch-security which restores the old behavior. The regression
>> tests shipped with pacemaker also don't report anything unusual. I will keep
>> this bug report open for discussions and work on another update. This time I
>> intend to upgrade pacemaker to the latest upstream release in the 1.1.x 
>> branch
>> which is currently 1.1.24~rc1. This one also includes fixes for 
>> CVE-2018-16878
>> and CVE-2018-16877. I expect no big changes in terms of existing features 
>> but I
>> will send new packages for testing before I upload a new upstream release. 
>>
>> Regards,
>>
>> Markus
> 
> 
I can confirm that 1.1.16-1+deb9u2 works as expected, thanks for the fix.

Kind regards,

Louis





Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-13 Thread Louis Sautier
On 13/11/2020 12:23, Alejandro Taboada wrote:
> Maybe Corosync is not using peer communication? Could you check somehow the 
> packet source address... if it's from localhost just allow, otherwise check 
> permissions.
> I know it's not ideal but it will solve a lot of production issues in the 
> meantime.
> 
> 
>> On 12 Nov 2020, at 23:20, Alejandro Taboada  
>> wrote:
>>
>> 
> 
> 
I'm not sure I understand what we need to look for.

Aren't they communicating via UNIX sockets from abstract namespaces
(@cib_rw@, @attrd@, etc.) ? That's what I see when I strace calls to
"crm resource cleanup " which also fails with the patched version.
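For reference, those abstract-namespace sockets can be observed with something like the following (a sketch; the resource name is a placeholder):

```shell
# Trace the IPC connections a crm command makes; abstract UNIX sockets
# appear in strace output with a sun_path beginning with "@".
strace -f -e trace=connect crm resource cleanup RESOURCE 2>&1 \
    | grep sun_path
```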





Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-12 Thread Louis Sautier
Package: pacemaker
Version: 1.1.16-1+deb9u1
Severity: grave
X-Debbugs-CC: a...@debian.org

Hi,
I am running corosync 2.4.2-3+deb9u1 with pacemaker and the last run of
unattended-upgrades broke the cluster (downgrading pacemaker to 1.1.16-1
fixed it immediately).
The logs contain a lot of warnings that seem to point to a permission
problem, such as "Rejecting IPC request 'lrmd_rsc_info' from
unprivileged client crmd". I am not using ACLs so the patch should not
impact my system.

Here is an excerpt from the logs after the upgrade:
Nov 12 06:26:05 cluster-1 crmd[20868]:   notice: State transition
S_PENDING -> S_NOT_DC
Nov 12 06:26:05 cluster-1 crmd[20868]:   notice: State transition
S_NOT_DC -> S_PENDING
Nov 12 06:26:05 cluster-1 attrd[20866]:   notice: Defaulting to uname -n
for the local corosync node name
Nov 12 06:26:05 cluster-1 crmd[20868]:   notice: State transition
S_PENDING -> S_NOT_DC
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request
'lrmd_rsc_register' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]:error: Could not add resource
service to LRM cluster-1
Nov 12 06:26:06 cluster-1 crmd[20868]:error: Invalid resource
definition for service
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: Resource service no
longer exists in the lrmd
Nov 12 06:26:06 cluster-1 crmd[20868]:error: Result of probe
operation for service on cluster-1: Error
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: Input I_FAIL received
in state S_NOT_DC from get_lrm_resource
Nov 12 06:26:06 cluster-1 crmd[20868]:   notice: State transition
S_NOT_DC -> S_RECOVERY
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: Fast-tracking shutdown
in response to errors
Nov 12 06:26:06 cluster-1 crmd[20868]:error: Input I_TERMINATE
received in state S_RECOVERY from do_recover
Nov 12 06:26:06 cluster-1 crmd[20868]:   notice: Disconnected from the LRM
Nov 12 06:26:06 cluster-1 crmd[20868]:   notice: Disconnected from Corosync
Nov 12 06:26:06 cluster-1 crmd[20868]:error: Could not recover from
internal error
Nov 12 06:26:06 cluster-1 pacemakerd[20857]:error: The crmd process
(20868) exited: Generic Pacemaker error (201)
Nov 12 06:26:06 cluster-1 pacemakerd[20857]:   notice: Respawning failed
child process: crmd

My corosync.conf is quite standard:
totem {
    version: 2
    cluster_name: debian
    token: 0
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: aes256
    crypto_hash: sha256
    interface {
        ringnumber: 0
        bindnetaddr: xxx
        mcastaddr: yyy
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}

So is my crm configuration:
node xxx: cluster-1 \
    attributes standby=off
node xxx: cluster-2 \
    attributes standby=off
primitive service systemd:service \
    meta failure-timeout=30 \
    op monitor interval=5 on-fail=restart timeout=15s
primitive vip-1 IPaddr2 \
    params ip=xxx cidr_netmask=32 \
    op monitor interval=10s
primitive vip-2 IPaddr2 \
    params ip=xxx cidr_netmask=32 \
    op monitor interval=10s
clone clone_service service
colocation service_vip-1 inf: vip-1 clone_service
colocation service_vip-2 inf: vip-2 clone_service
order kot_before_vip-1 inf: clone_service vip-1
order kot_before_vip-2 inf: clone_service vip-2
location prefer-cluster1-vip-1 vip-1 1: cluster-1
location prefer-cluster2-vip-2 vip-2 1: cluster-2
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.16-94ff4df \
    cluster-infrastructure=corosync \