As the symptom was not easily reproducible in the original operating environment,
the following verification was performed in a Juju/MAAS test deployment.
# juju status
Model  Controller    Cloud/Region  Version  SLA          Timestamp
ceph   maas-default  maas/default  2.9.52   unsupported  12:10:29Z

App       Version  Status  Scale  Charm     Channel        Rev  Exposed  Message
ceph-fs   17.2.7   active      1  ceph-fs   quincy/stable  194  no       Unit is ready
ceph-mon  17.2.7   active      1  ceph-mon  quincy/stable  388  no       Unit is ready and clustered
ceph-osd  17.2.7   active      3  ceph-osd  quincy/stable  753  no       Unit is ready (1 OSD)
ubuntu             active      3  ubuntu    stable          26  no

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-fs/0*   active    idle   7        10.0.0.142             Unit is ready
ceph-mon/0*  active    idle   3        10.0.0.138             Unit is ready and clustered
ceph-osd/0   active    idle   0        10.0.0.128             Unit is ready (1 OSD)
ceph-osd/1*  active    idle   1        10.0.0.129             Unit is ready (1 OSD)
ceph-osd/2   active    idle   2        10.0.0.130             Unit is ready (1 OSD)
ubuntu/0*    active    idle   4        10.0.0.139
ubuntu/1     active    idle   5        10.0.0.140
ubuntu/2     active    idle   6        10.0.0.141

Machine  State    Address     Inst id  Series  AZ       Message
0        started  10.0.0.128  node-28  jammy   default  Deployed
1        started  10.0.0.129  node-29  jammy   default  Deployed
2        started  10.0.0.130  node-30  jammy   default  Deployed
3        started  10.0.0.138  node-38  jammy   default  Deployed
4        started  10.0.0.139  node-39  jammy   default  Deployed
5        started  10.0.0.140  node-40  jammy   default  Deployed
6        started  10.0.0.141  node-41  jammy   default  Deployed
7        started  10.0.0.142  node-42  jammy   default  Deployed
The ceph-mon/0 unit (node-38) was running both the Ceph monitor and an NFS-Ganesha server exporting CephFS.
On ubuntu/0,1,2, that export was mounted at /mnt/nfs.
root@node-39:/mnt/nfs# mount
...
10.0.0.138:/cephfs on /mnt/nfs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.139,local_lock=none,addr=10.0.0.138)
root@node-39:/mnt/nfs# ls
stress.sh stressdir test
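For reference, a mount along these lines on each client reproduces the options shown above (the exact invocation used was not recorded, so treat it as an illustrative sketch):

# on each of ubuntu/0,1,2
sudo mount -t nfs4 -o vers=4.1,proto=tcp,hard 10.0.0.138:/cephfs /mnt/nfs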
After running a large number of metadata operations (touch, chmod, and
others; see the stress script in [1]), ubuntu/0,1,2 became unresponsive
within a few minutes. `ls` no longer returned under /mnt/nfs on any of
the nodes, and I had to reboot all three client nodes.
I then upgraded Ceph using the proposed packages. Because ceph-mon and
NFS-Ganesha were running on the same node, I upgraded all Ceph
components together.
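The upgrade was driven with apt from jammy-proposed. The exact commands were not captured, but the following sketch reflects the general approach on each node (the package list and the service restart are assumptions based on the dpkg output below):

echo 'deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe' | \
  sudo tee /etc/apt/sources.list.d/jammy-proposed.list
sudo apt-get update
sudo apt-get install --only-upgrade ceph ceph-base ceph-common ceph-mds ceph-mgr \
  ceph-mon ceph-osd ceph-volume libcephfs2 python3-cephfs
# on node-38, restart Ganesha so it loads the updated libcephfs2 (service name assumed)
sudo systemctl restart nfs-ganesha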
I then ran the same script for over 15 minutes, and it worked without
issues.
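The script [1] was driven roughly as follows on the clients (the timing here is illustrative; the directory listing above shows it saved as stress.sh on the mount):

cd /mnt/nfs
./stress.sh init      # create the seed file and 20000 hardlinks
./stress.sh run       # start the background workers
sleep 900             # leave the workload running for ~15 minutes
./stress.sh status    # confirm the workers are still alive
./stress.sh stop && ./stress.sh clean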
I believe this can be considered a proper verification.
ceph-mon (node-38):
ubuntu@node-38:~$ dpkg -l | grep ceph
ii  ceph                    17.2.9-0ubuntu0.22.04.1  amd64  distributed storage and file system
ii  ceph-base               17.2.9-0ubuntu0.22.04.1  amd64  common ceph daemon libraries and management tools
ii  ceph-common             17.2.9-0ubuntu0.22.04.1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mds                17.2.9-0ubuntu0.22.04.1  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                17.2.9-0ubuntu0.22.04.1  amd64  manager for the ceph distributed file system
ii  ceph-mgr-modules-core   17.2.9-0ubuntu0.22.04.1  all    ceph manager modules which are always enabled
ii  ceph-mon                17.2.9-0ubuntu0.22.04.1  amd64  monitor server for the ceph storage system
ii  ceph-osd                17.2.9-0ubuntu0.22.04.1  amd64  OSD server for the ceph storage system
ii  ceph-volume             17.2.9-0ubuntu0.22.04.1  all    tool to facilidate OSD deployment
ii  libcephfs2              17.2.9-0ubuntu0.22.04.1  amd64  Ceph distributed file system client library
ii  libsqlite3-mod-ceph     17.2.9-0ubuntu0.22.04.1  amd64  SQLite3 VFS for Ceph
ii  nfs-ganesha-ceph:amd64  3.5-1ubuntu1             amd64  nfs-ganesha fsal ceph libraries
ii  python3-ceph-argparse   17.2.9-0ubuntu0.22.04.1  amd64  Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common     17.2.9-0ubuntu0.22.04.1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs          17.2.9-0ubuntu0.22.04.1  amd64  Python 3 libraries for the Ceph libcephfs library
ceph-osd/0,1,2 (node-28 shown):
ubuntu@node-28:~$ dpkg -l | grep ceph
ii  ceph                    17.2.9-0ubuntu0.22.04.1  amd64  distributed storage and file system
ii  ceph-base               17.2.9-0ubuntu0.22.04.1  amd64  common ceph daemon libraries and management tools
ii  ceph-common             17.2.9-0ubuntu0.22.04.1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mds                17.2.9-0ubuntu0.22.04.1  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                17.2.9-0ubuntu0.22.04.1  amd64  manager for the ceph distributed file system
ii  ceph-mgr-modules-core   17.2.9-0ubuntu0.22.04.1  all    ceph manager modules which are always enabled
ii  ceph-mon                17.2.9-0ubuntu0.22.04.1  amd64  monitor server for the ceph storage system
ii  ceph-osd                17.2.9-0ubuntu0.22.04.1  amd64  OSD server for the ceph storage system
ii  ceph-volume             17.2.9-0ubuntu0.22.04.1  all    tool to facilidate OSD deployment
ii  libcephfs2              17.2.9-0ubuntu0.22.04.1  amd64  Ceph distributed file system client library
ii  libsqlite3-mod-ceph     17.2.9-0ubuntu0.22.04.1  amd64  SQLite3 VFS for Ceph
ii  python3-ceph-argparse   17.2.9-0ubuntu0.22.04.1  amd64  Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common     17.2.9-0ubuntu0.22.04.1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs          17.2.9-0ubuntu0.22.04.1  amd64  Python 3 libraries for the Ceph libcephfs library
ceph-fs (node-42):
ubuntu@node-42:~$ dpkg -l | grep ceph
ii  ceph-base               17.2.9-0ubuntu0.22.04.1  amd64  common ceph daemon libraries and management tools
ii  ceph-common             17.2.9-0ubuntu0.22.04.1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mds                17.2.9-0ubuntu0.22.04.1  amd64  metadata server for the ceph distributed file system
ii  libcephfs2              17.2.9-0ubuntu0.22.04.1  amd64  Ceph distributed file system client library
ii  python3-ceph-argparse   17.2.9-0ubuntu0.22.04.1  amd64  Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common     17.2.9-0ubuntu0.22.04.1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs          17.2.9-0ubuntu0.22.04.1  amd64  Python 3 libraries for the Ceph libcephfs library
root@node-38:/home/ubuntu# ceph status
  cluster:
    id:     29be4eca-99eb-11f0-a698-b14e1333ce91
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum node-38 (age 13m)
    mgr: node-38(active, since 13m)
    mds: 1/1 daemons up
    osd: 3 osds: 3 up (since 11m), 3 in (since 3h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 113 pgs
    objects: 693 objects, 192 MiB
    usage:   4.3 GiB used, 131 GiB / 135 GiB avail
    pgs:     113 active+clean

  io:
    client: 2.0 MiB/s wr, 0 op/s rd, 173 op/s wr
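The HEALTH_WARN above is unrelated to this bug; one of the pools simply has no application tag. If desired, it can be cleared with a command along these lines (the pool name is a placeholder, not taken from this output):

ceph osd pool application enable <pool-name> cephfs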
[1]
# script
#!/usr/bin/env bash
set -euo pipefail
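# Purpose: stress a file on the NFS-mounted CephFS export by hammering a single
# "seed" file (and thousands of hardlinks to it) with concurrent writes,
# truncates, chmod, mtime and xattr updates, to exercise inode/attribute
# synchronisation under load.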
# config
: "${MOUNT:=/mnt/nfs}"
: "${STRESSDIR:=${MOUNT}/stressdir}"
: "${N_LINKS:=20000}"
ts() { date '+%F %T'; }
log() { printf '%s %s\n' "$(ts)" "$*"; }
need_cmds() {
command -v dd >/dev/null || { echo "dd missing"; exit 1; }
command -v truncate >/dev/null || { echo "truncate missing"; exit 1; }
command -v shuf >/dev/null || { echo "shuf missing"; exit 1; }
command -v setfattr >/dev/null || true
}
init() {
log "[init] STRESSDIR=$STRESSDIR N_LINKS=$N_LINKS"
mkdir -p "$STRESSDIR"
cd "$STRESSDIR"
# cleanup
pkill -f "dd if=/dev/zero of=${STRESSDIR}/seed" 2>/dev/null || true
pkill -f "truncate -s 0 ${STRESSDIR}/seed" 2>/dev/null || true
find . -maxdepth 1 -type f -name 'h_*' -print0 | xargs -0 -r rm -f
: > seed
log "[init] creating $N_LINKS hardlinks..."
for ((i=1;i<=N_LINKS;i++)); do
ln seed "h_${i}" 2>/dev/null || true
done
log "[init] done"
ulimit -c unlimited || true
}
run() {
cd "$STRESSDIR"
log "[run] starting workers"
# write stress
( while true; do
dd if=/dev/zero of=seed bs=1M count=64 oflag=direct conv=notrunc status=none 2>/dev/null || true
done ) &
# size stress
( while true; do
truncate -s 0 seed 2>/dev/null || true
truncate -s 104857600 seed 2>/dev/null || true
done ) &
# perm stress
( while true; do
chmod 0600 seed 2>/dev/null || true
chmod 0644 seed 2>/dev/null || true
done ) &
# mtime stress
( while true; do
touch -m seed 2>/dev/null || true
done ) &
# xattr stress
if command -v setfattr >/dev/null; then
( while true; do
setfattr -n user.t -v "$(date +%s%N)" seed 2>/dev/null || true
done ) &
fi
# link stress
for j in $(seq 1 64); do
(
while true; do
for k in $(shuf -i 1-"$N_LINKS" -n 1000); do
touch -m "h_${k}" 2>/dev/null || true
chmod 0600 "h_${k}" 2>/dev/null || true
chmod 0644 "h_${k}" 2>/dev/null || true
if command -v setfattr >/dev/null; then
setfattr -n user.t -v "$RANDOM" "h_${k}" 2>/dev/null || true
fi
done
done
) &
done
log "[run] workers started (pids: $(jobs -p | xargs echo))"
}
stop() {
pkill -f "dd if=/dev/zero of=${STRESSDIR}/seed" 2>/dev/null || true
pkill -f "truncate -s 0 ${STRESSDIR}/seed" 2>/dev/null || true
pkill -f "chmod 0600 seed" 2>/dev/null || true
pkill -f "chmod 0644 seed" 2>/dev/null || true
pkill -f "touch -m seed" 2>/dev/null || true
pkill -f "setfattr -n user.t" 2>/dev/null || true
pkill -f "shuf -i 1-${N_LINKS} -n 1000" 2>/dev/null || true
log "[stop] workers stopped"
}
clean() {
stop
cd "$STRESSDIR" 2>/dev/null || exit 0
find . -maxdepth 1 -type f -name 'h_*' -print0 | xargs -0 -r rm -f
rm -f seed
log "[clean] done"
}
status() {
echo "STRESSDIR=$STRESSDIR"
echo "Running PIDs:"; pgrep -fa 'dd if=/dev/zero|truncate -s|touch -m|chmod 06|setfattr|shuf -i' || true
echo "Status: $(date '+%F %T')"
}
CMD="${1:-}"
case "$CMD" in
init) need_cmds; init ;;
run) run ;;
stop) stop ;;
clean) clean ;;
status) status ;;
*)
echo "Usage: $0 [init|run|stop|clean|status]"
echo " (env) MOUNT=$MOUNT N_LINKS=$N_LINKS STRESSDIR=$STRESSDIR"
exit 1
;;
esac
** Tags removed: verification-needed verification-needed-jammy
** Tags added: verification-done verification-done-jammy
Bug: https://bugs.launchpad.net/bugs/2078906
Title: Prevent race condition when printing Inode in ll_sync_inode