As the symptom was not easily reproducible in the original operating environment,
the following verification was performed in a Juju/MAAS test deployment.
# juju status
Model  Controller    Cloud/Region  Version  SLA          Timestamp
ceph   maas-default  maas/default  2.9.52   unsupported  12:10:29Z

App       Version  Status  Scale  Charm     Channel        Rev  Exposed  Message
ceph-fs   17.2.7   active      1  ceph-fs   quincy/stable  194  no       Unit is ready
ceph-mon  17.2.7   active      1  ceph-mon  quincy/stable  388  no       Unit is ready and clustered
ceph-osd  17.2.7   active      3  ceph-osd  quincy/stable  753  no       Unit is ready (1 OSD)
ubuntu             active      3  ubuntu    stable          26  no

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-fs/0*   active    idle   7        10.0.0.142             Unit is ready
ceph-mon/0*  active    idle   3        10.0.0.138             Unit is ready and clustered
ceph-osd/0   active    idle   0        10.0.0.128             Unit is ready (1 OSD)
ceph-osd/1*  active    idle   1        10.0.0.129             Unit is ready (1 OSD)
ceph-osd/2   active    idle   2        10.0.0.130             Unit is ready (1 OSD)
ubuntu/0*    active    idle   4        10.0.0.139
ubuntu/1     active    idle   5        10.0.0.140
ubuntu/2     active    idle   6        10.0.0.141

Machine  State    Address     Inst id  Series  AZ       Message
0        started  10.0.0.128  node-28  jammy   default  Deployed
1        started  10.0.0.129  node-29  jammy   default  Deployed
2        started  10.0.0.130  node-30  jammy   default  Deployed
3        started  10.0.0.138  node-38  jammy   default  Deployed
4        started  10.0.0.139  node-39  jammy   default  Deployed
5        started  10.0.0.140  node-40  jammy   default  Deployed
6        started  10.0.0.141  node-41  jammy   default  Deployed
7        started  10.0.0.142  node-42  jammy   default  Deployed
The ceph-mon/0 unit (node-38) was running both the Ceph monitor and an NFS-Ganesha server exporting CephFS.
On ubuntu/0,1,2, that export was mounted at /mnt/nfs.
root@node-39:/mnt/nfs# mount
...
10.0.0.138:/cephfs on /mnt/nfs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.139,local_lock=none,addr=10.0.0.138)
root@node-39:/mnt/nfs# ls
stress.sh stressdir test
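For reference, a mount along these lines on each client reproduces the options shown above (the exact invocation used was not recorded, so treat it as an illustrative sketch):

# on each of ubuntu/0,1,2
sudo mount -t nfs4 -o vers=4.1,proto=tcp,hard 10.0.0.138:/cephfs /mnt/nfs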
After running a large number of metadata operations (touch, chmod, and
others; see the stress script in [1]), ubuntu/0,1,2 became unresponsive
within a few minutes. `ls` no longer returned under /mnt/nfs on any of
the nodes, and I had to reboot all three client nodes.
I then upgraded Ceph using the proposed packages. Because ceph-mon and
NFS-Ganesha were running on the same node, I upgraded all Ceph
components together.
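The upgrade was driven with apt from jammy-proposed. The exact commands were not captured, but the following sketch reflects the general approach on each node (the package list and the service restart are assumptions based on the dpkg output below):

echo 'deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe' | \
  sudo tee /etc/apt/sources.list.d/jammy-proposed.list
sudo apt-get update
sudo apt-get install --only-upgrade ceph ceph-base ceph-common ceph-mds ceph-mgr \
  ceph-mon ceph-osd ceph-volume libcephfs2 python3-cephfs
# on node-38, restart Ganesha so it loads the updated libcephfs2 (service name assumed)
sudo systemctl restart nfs-ganesha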
I then ran the same script for over 15 minutes, and it worked without
issues.
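The script [1] was driven roughly as follows on the clients (the timing here is illustrative; the directory listing above shows it saved as stress.sh on the mount):

cd /mnt/nfs
./stress.sh init      # create the seed file and 20000 hardlinks
./stress.sh run       # start the background workers
sleep 900             # leave the workload running for ~15 minutes
./stress.sh status    # confirm the workers are still alive
./stress.sh stop && ./stress.sh clean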
I believe this can be considered a proper verification.
ceph-mon (node-38):
ubuntu@node-38:~$ dpkg -l | grep ceph
ii  ceph                    17.2.9-0ubuntu0.22.04.1  amd64  distributed storage and file system
ii  ceph-base               17.2.9-0ubuntu0.22.04.1  amd64  common ceph daemon libraries and management tools
ii  ceph-common             17.2.9-0ubuntu0.22.04.1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mds                17.2.9-0ubuntu0.22.04.1  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                17.2.9-0ubuntu0.22.04.1  amd64  manager for the ceph distributed file system
ii  ceph-mgr-modules-core   17.2.9-0ubuntu0.22.04.1  all    ceph manager modules which are always enabled
ii  ceph-mon                17.2.9-0ubuntu0.22.04.1  amd64  monitor server for the ceph storage system
ii  ceph-osd                17.2.9-0ubuntu0.22.04.1  amd64  OSD server for the ceph storage system
ii  ceph-volume             17.2.9-0ubuntu0.22.04.1  all    tool to facilidate OSD deployment
ii  libcephfs2              17.2.9-0ubuntu0.22.04.1  amd64  Ceph distributed file system client library
ii  libsqlite3-mod-ceph     17.2.9-0ubuntu0.22.04.1  amd64  SQLite3 VFS for Ceph
ii  nfs-ganesha-ceph:amd64  3.5-1ubuntu1             amd64  nfs-ganesha fsal ceph libraries
ii  python3-ceph-argparse   17.2.9-0ubuntu0.22.04.1  amd64  Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common     17.2.9-0ubuntu0.22.04.1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs          17.2.9-0ubuntu0.22.04.1  amd64  Python 3 libraries for the Ceph libcephfs library
ceph-osd/0,1,2 (node-28 shown):
ubuntu@node-28:~$ dpkg -l | grep ceph
ii  ceph                    17.2.9-0ubuntu0.22.04.1  amd64  distributed storage and file system
ii  ceph-base               17.2.9-0ubuntu0.22.04.1  amd64  common ceph daemon libraries and management tools
ii  ceph-common             17.2.9-0ubuntu0.22.04.1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mds                17.2.9-0ubuntu0.22.04.1  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                17.2.9-0ubuntu0.22.04.1  amd64  manager for the ceph distributed file system
ii  ceph-mgr-modules-core   17.2.9-0ubuntu0.22.04.1  all    ceph manager modules which are always enabled
ii  ceph-mon                17.2.9-0ubuntu0.22.04.1  amd64  monitor server for the ceph storage system
ii  ceph-osd                17.2.9-0ubuntu0.22.04.1  amd64  OSD server for the ceph storage system
ii  ceph-volume             17.2.9-0ubuntu0.22.04.1  all    tool to facilidate OSD deployment
ii  libcephfs2              17.2.9-0ubuntu0.22.04.1  amd64  Ceph distributed file system client library
ii  libsqlite3-mod-ceph     17.2.9-0ubuntu0.22.04.1  amd64  SQLite3 VFS for Ceph
ii  python3-ceph-argparse   17.2.9-0ubuntu0.22.04.1  amd64  Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common     17.2.9-0ubuntu0.22.04.1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs          17.2.9-0ubuntu0.22.04.1  amd64  Python 3 libraries for the Ceph libcephfs library
ceph-fs (node-42):
ubuntu@node-42:~$ dpkg -l | grep ceph
ii  ceph-base               17.2.9-0ubuntu0.22.04.1  amd64  common ceph daemon libraries and management tools
ii  ceph-common             17.2.9-0ubuntu0.22.04.1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mds                17.2.9-0ubuntu0.22.04.1  amd64  metadata server for the ceph distributed file system
ii  libcephfs2              17.2.9-0ubuntu0.22.04.1  amd64  Ceph distributed file system client library
ii  python3-ceph-argparse   17.2.9-0ubuntu0.22.04.1  amd64  Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common     17.2.9-0ubuntu0.22.04.1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs          17.2.9-0ubuntu0.22.04.1  amd64  Python 3 libraries for the Ceph libcephfs library
root@node-38:/home/ubuntu# ceph status
  cluster:
    id:     29be4eca-99eb-11f0-a698-b14e1333ce91
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum node-38 (age 13m)
    mgr: node-38(active, since 13m)
    mds: 1/1 daemons up
    osd: 3 osds: 3 up (since 11m), 3 in (since 3h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 113 pgs
    objects: 693 objects, 192 MiB
    usage:   4.3 GiB used, 131 GiB / 135 GiB avail
    pgs:     113 active+clean

  io:
    client: 2.0 MiB/s wr, 0 op/s rd, 173 op/s wr
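The HEALTH_WARN above is unrelated to this bug; one of the pools simply has no application tag. If desired, it can be cleared with a command along these lines (the pool name is a placeholder, not taken from this output):

ceph osd pool application enable <pool-name> cephfs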
[1]
# script
#!/usr/bin/env bash
set -euo pipefail
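# Purpose: stress a file on the NFS-mounted CephFS export by hammering a single
# "seed" file (and thousands of hardlinks to it) with concurrent writes,
# truncates, chmod, mtime and xattr updates, to exercise inode/attribute
# synchronisation under load.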
# config
: "${MOUNT:=/mnt/nfs}"
: "${STRESSDIR:=${MOUNT}/stressdir}"
: "${N_LINKS:=20000}"
ts() { date '+%F %T'; }
log() { printf '%s %s\n' "$(ts)" "$*"; }
need_cmds() {
command -v dd >/dev/null || { echo "dd missing"; exit 1; }
command -v truncate >/dev/null || { echo "truncate missing"; exit 1; }
command -v shuf >/dev/null || { echo "shuf missing"; exit 1; }
command -v setfattr >/dev/null || true
}
init() {
log "[init] STRESSDIR=$STRESSDIR N_LINKS=$N_LINKS"
mkdir -p "$STRESSDIR"
cd "$STRESSDIR"
# cleanup
pkill -f "dd if=/dev/zero of=${STRESSDIR}/seed" 2>/dev/null || true
pkill -f "truncate -s 0 ${STRESSDIR}/seed" 2>/dev/null || true
find . -maxdepth 1 -type f -name 'h_*' -print0 | xargs -0 -r rm -f
: > seed
log "[init] creating $N_LINKS hardlinks..."
for ((i=1;i<=N_LINKS;i++)); do
ln seed "h_${i}" 2>/dev/null || true
done
log "[init] done"
ulimit -c unlimited || true
}
run() {
cd "$STRESSDIR"
log "[run] starting workers"
# write stress
( while true; do
dd if=/dev/zero of=seed bs=1M count=64 oflag=direct conv=notrunc status=none 2>/dev/null || true
done ) &
# size stress
( while true; do
truncate -s 0 seed 2>/dev/null || true
truncate -s 104857600 seed 2>/dev/null || true
done ) &
# perm stress
( while true; do
chmod 0600 seed 2>/dev/null || true
chmod 0644 seed 2>/dev/null || true
done ) &
# mtime stress
( while true; do
touch -m seed 2>/dev/null || true
done ) &
# xattr stress
if command -v setfattr >/dev/null; then
( while true; do
setfattr -n user.t -v "$(date +%s%N)" seed 2>/dev/null || true
done ) &
fi
# link stress
for j in $(seq 1 64); do
(
while true; do
for k in $(shuf -i 1-"$N_LINKS" -n 1000); do
touch -m "h_${k}" 2>/dev/null || true
chmod 0600 "h_${k}" 2>/dev/null || true
chmod 0644 "h_${k}" 2>/dev/null || true
if command -v setfattr >/dev/null; then
setfattr -n user.t -v "$RANDOM" "h_${k}" 2>/dev/null || true
fi
done
done
) &
done
log "[run] workers started (pids: $(jobs -p | xargs echo))"
}
stop() {
pkill -f "dd if=/dev/zero of=${STRESSDIR}/seed" 2>/dev/null || true
pkill -f "truncate -s 0 ${STRESSDIR}/seed" 2>/dev/null || true
pkill -f "chmod 0600 seed" 2>/dev/null || true
pkill -f "chmod 0644 seed" 2>/dev/null || true
pkill -f "touch -m seed" 2>/dev/null || true
pkill -f "setfattr -n user.t" 2>/dev/null || true
pkill -f "shuf -i 1-${N_LINKS} -n 1000" 2>/dev/null || true
log "[stop] workers stopped"
}
clean() {
stop
cd "$STRESSDIR" 2>/dev/null || exit 0
find . -maxdepth 1 -type f -name 'h_*' -print0 | xargs -0 -r rm -f
rm -f seed
log "[clean] done"
}
status() {
echo "STRESSDIR=$STRESSDIR"
echo "Running PIDs:"; pgrep -fa 'dd if=/dev/zero|truncate -s|touch -m|chmod 06|setfattr|shuf -i' || true
echo "Status: $(date '+%F %T')"
}
CMD="${1:-}"
case "$CMD" in
init) need_cmds; init ;;
run) run ;;
stop) stop ;;
clean) clean ;;
status) status ;;
*)
echo "Usage: $0 [init|run|stop|clean|status]"
echo " (env) MOUNT=$MOUNT N_LINKS=$N_LINKS STRESSDIR=$STRESSDIR"
exit 1
;;
esac
** Tags removed: verification-needed verification-needed-jammy
** Tags added: verification-done verification-done-jammy
Bug: https://bugs.launchpad.net/bugs/2078906
Title: Prevent race condition when printing Inode in ll_sync_inode