> git clone https://git.launchpad.net/ubuntu/+source/ceph
> cd ceph

> git grep -n MDSMap::decode
src/mds/FSMap.cc:1086:   * Insert INLINE; see comment in MDSMap::decode.
src/mds/MDSMap.cc:836:void MDSMap::decode(bufferlist::const_iterator& p)

So we're interested in src/mds/MDSMap.cc (assuming the file was not
renamed and the function was not moved between the two versions).

Let's get the file at two different revisions, extract the
MDSMap::decode() function from both, and compare them to see what changed.

> git tag | grep 19.2.0-0ubuntu0.24.04.1
applied/19.2.0-0ubuntu0.24.04.1
import/19.2.0-0ubuntu0.24.04.1
> git show applied/19.2.0-0ubuntu0.24.04.1:src/mds/MDSMap.cc > /tmp/MDSMap.cc.new

The old version is 19.2.0~git20240301.4c76c50-0ubuntu6. Git tag names
cannot contain "~", so the closest tag (by name) in the repo is
applied/19.2.0_git20240301.4c76c50-0ubuntu6:

> git show applied/19.2.0_git20240301.4c76c50-0ubuntu6:src/mds/MDSMap.cc > /tmp/MDSMap.cc.old
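Diffing the whole files works, but the comparison can also be narrowed down to the one function first. As a rough illustration, here is a minimal C++ sketch that finds a function definition by a piece of its signature text and copies it up to the matching closing brace. The helper names extract_function and read_file are hypothetical. The brace counting is naive: it assumes braces inside strings, character literals and comments are absent or balanced, which is often good enough for a quick look but not for arbitrary C++.

```cpp
#include <cstddef>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: return the first function definition whose header
// contains `signature`, from the start of that line up to the matching '}'.
// Naive brace counting: braces in strings, char literals or comments are
// NOT treated specially.
std::optional<std::string> extract_function(const std::string& src,
                                            const std::string& signature) {
  std::size_t pos = src.find(signature);
  if (pos == std::string::npos) return std::nullopt;

  // Start at the beginning of the line containing the signature.
  std::size_t line_start = src.rfind('\n', pos);
  line_start = (line_start == std::string::npos) ? 0 : line_start + 1;

  // The body starts at the first '{' after the signature.
  std::size_t open = src.find('{', pos);
  if (open == std::string::npos) return std::nullopt;

  int depth = 0;
  for (std::size_t i = open; i < src.size(); ++i) {
    if (src[i] == '{') {
      ++depth;
    } else if (src[i] == '}' && --depth == 0) {
      return src.substr(line_start, i - line_start + 1);
    }
  }
  return std::nullopt;  // unbalanced braces
}

// Hypothetical helper: read a whole file into a string
// (returns an empty string if the file cannot be opened).
std::string read_file(const std::string& path) {
  std::ifstream in(path);
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}
```

For example, extract_function(read_file("/tmp/MDSMap.cc.old"), "void MDSMap::decode(") should return the old definition (the signature text is taken from the git grep output above). Writing the old and new results to two files and running diff -u on them limits the output to decode() only.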

Running diff on the two files shows that both the encode and decode
functions changed. This is the relevant part for the decode function:

> diff -u /tmp/MDSMap.cc.old /tmp/MDSMap.cc.new
...
@@ -852,7 +863,8 @@
     decode(cas_pool, p);
   }
 
-  // kclient ignores everything from here
+  // kclient skips most of what's below
+  // see fs/ceph/mdsmap.c for current decoding
   __u16 ev = 1;
   if (struct_v >= 2)
     decode(ev, p);
@@ -949,11 +961,16 @@
   }
 
   if (ev >= 17) {
-    decode(max_xattr_size, p);
+    decode(bal_rank_mask, p);
   }
 
   if (ev >= 18) {
-    decode(bal_rank_mask, p);
+    decode(max_xattr_size, p);
+  }
+
+  if (ev >= 19) {
+    decode(qdb_cluster_leader, p);
+    decode(qdb_cluster_members, p);
   }
 
   /* All MDS since at least v14.0.0 understand INLINE */

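We see that both the order and the number of fields changed in
decode(): the old code decodes max_xattr_size for ev >= 17 and
bal_rank_mask for ev >= 18, while the new code swaps the two and also
adds qdb_cluster_leader and qdb_cluster_members for ev >= 19. There also
doesn't seem to be any error handling for the case when the encoded data
is not in the expected format.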
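To see why swapping two fields under the same ev value is dangerous, here is a self-contained C++ sketch. It does not use Ceph's encoding library. It only mimics the relevant detail, under stated assumptions: integers are encoded little-endian, a std::string is encoded as a 32-bit length followed by its bytes, and max_xattr_size is a 64-bit integer. It keeps only ev and the two swapped fields; everything else in the real MDSMap encoding is omitted. A map encoded with the old layout and ev = 18 is then decoded with the new layout, which reads bal_rank_mask first and so takes the low 4 bytes of max_xattr_size as a string length. The values 65536 and "-1" are hypothetical examples.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a bufferlist: NOT Ceph's API.
using Bytes = std::vector<std::uint8_t>;

// Little-endian integer encoding (assumed to mirror the on-wire format).
template <typename T>
void put_int(Bytes& out, T v) {
  for (std::size_t i = 0; i < sizeof(T); ++i)
    out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// String encoded as a 32-bit length followed by the raw bytes (assumption).
void put_string(Bytes& out, const std::string& s) {
  put_int<std::uint32_t>(out, static_cast<std::uint32_t>(s.size()));
  out.insert(out.end(), s.begin(), s.end());
}

// Reader that throws when asked for more bytes than remain,
// similar in spirit to running off the end of a buffer.
struct Reader {
  const Bytes& buf;
  std::size_t pos = 0;

  void need(std::size_t n) const {
    if (n > buf.size() - pos) throw std::out_of_range("end of buffer");
  }
  template <typename T>
  T get_int() {
    need(sizeof(T));
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
      v |= static_cast<T>(buf[pos++]) << (8 * i);
    return v;
  }
  std::string get_string() {
    std::uint32_t len = get_int<std::uint32_t>();
    need(len);  // a garbage length fails here
    std::string s(buf.begin() + pos, buf.begin() + pos + len);
    pos += len;
    return s;
  }
};

struct Fields {
  std::uint64_t max_xattr_size = 0;
  std::string bal_rank_mask;
};

// Old layout with ev == 18: max_xattr_size, then bal_rank_mask.
Bytes encode_old_ev18(std::uint64_t max_xattr_size, const std::string& mask) {
  Bytes out;
  put_int<std::uint16_t>(out, 18);  // ev
  put_int<std::uint64_t>(out, max_xattr_size);
  put_string(out, mask);
  return out;
}

// Old decoder: ev >= 17 -> max_xattr_size, ev >= 18 -> bal_rank_mask.
Fields decode_old(const Bytes& in) {
  Reader r{in};
  Fields f;
  auto ev = r.get_int<std::uint16_t>();
  if (ev >= 17) f.max_xattr_size = r.get_int<std::uint64_t>();
  if (ev >= 18) f.bal_rank_mask = r.get_string();
  return f;
}

// New decoder: ev >= 17 -> bal_rank_mask, ev >= 18 -> max_xattr_size.
Fields decode_new(const Bytes& in) {
  Reader r{in};
  Fields f;
  auto ev = r.get_int<std::uint16_t>();
  if (ev >= 17) f.bal_rank_mask = r.get_string();
  if (ev >= 18) f.max_xattr_size = r.get_int<std::uint64_t>();
  return f;
}
```

With encode_old_ev18(65536, "-1"), decode_old() round-trips the values. decode_new() reads a string length of 65536 from the bytes of max_xattr_size and throws, because only 10 bytes remain. With other values the new decoder may not throw at all. It may instead silently produce wrong fields and misalign everything decoded after them. In the real code an uncaught read past the end of the buffer would presumably surface as an exception from the bufferlist iterator. This is only a hypothesis that still has to be confirmed in the binary.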

Now let's explore the binary to see where exactly in MDSMap::decode()
the panic happens.

We have the ceph-mon binary extracted earlier. We can load it in gdb,
which lets us disassemble the functions. We can also load the debug info
and put the source tree in the right place to get better symbols and
source references.

> gdb ./usr/bin/ceph-mon
...
This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
...
(gdb) start
Downloading source file /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc
Temporary breakpoint 1 at 0x32c670: file /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc, line 250.
...
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdf98)
    at /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc:250
warning: 250    /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc: No such file or directory
(gdb)

Now we know that gdb is looking for the source tree in
/usr/src/ceph-19.2.0-0ubuntu0.24.04.1/. Let's put the tree there. You
may first need to enable source packages by adding "deb-src" after "deb"
on the "Types:" line in /etc/apt/sources.list.d/ubuntu.sources, so that
it reads "Types: deb deb-src":

> cd /usr/src/
> sudo apt source ceph

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2089565

Title:
  MON and MDS crash upgrading  CEPH  on ubuntu 24.04 LTS

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2089565/+subscriptions


-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
