Public bug reported:

It is impossible to run a ceph cluster built of nodes with different architectures, specifically when the monitor node runs on an s390 box and at least one OSD node runs on an x86_64 box. The same holds vice versa: when the monitor node is deployed on x86_64 and an OSD node on s390.

How to reproduce:

Build a hybrid ceph cluster. Bootstrap a monitor node on an s390 machine and a number of OSD nodes. At least one of the OSD nodes should be built on a box of a different architecture (x86_64). Start a ceph-manager and a monitor daemon on the monitor node. Then start the OSD daemon on the x86_64 OSD node. The OSD daemon crashes.

The problem was investigated with the following hardware, and with software of the same version on all nodes:

Monitor node: s390
OSD node: x86_64

ceph git repository with the top at cda8c83330ed35e47266feb35e4e3f960ad10c40 (octopus) plus a separately applied patch at commit 31da17378b712542e915adbf4084e0212b8bb61

The immediate reason for the crash was identified as a checksum mismatch at ./src/osd/OSD.cc:7915 (OSD::handle_osd_map()).

To find the reason for the checksum mismatch, we took snapshots of the serialized OSD maps sent by the monitor node and received by the OSD node in the following way (all steps are in chronological order):

1. Start a monitor (on the s390 box)

   gdb /usr/bin/ceph-mon
   (gdb) b ./src/osd/OSDMap.cc:3064 (OSDMap::encode())
   (gdb) run -f --cluster ceph --id node1 --setuser > log

2. Start the OSD daemon (on the x86 box)

   gdb ceph-osd
   (gdb) b ./src/osd/OSD.cc:7915 (OSD::handle_osd_map())
   (gdb) run -f --no-mon-config --cluster ceph --id 2 > log

3. Monitor breaks:

   (gdb) p crc
   $1 = 4088430296
   (gdb) p bl.hexdump(cout, false)

   Output stored in the file log.4088430296-mon

   (gdb) c

4. OSD breaks:

   (gdb) p inc.full_crc
   $1 = 4088430296
   (gdb) p fbl.hexdump(cout, false)

   Output stored in the file log.4088430296-osd

   (gdb) c
   [...]
   Segmentation fault (because of the checksum mismatch at ./src/osd/OSD.cc:7915)

Then the snapshots were compared, with the following result:

diff -u log.4088430296-mon log.4088430296-osd
--- log.4088430296-mon 2020-08-31 14:11:47.235992942 +0000
+++ log.4088430296-osd 2020-08-31 09:13:23.093178289 +0000
@@ -226,5 +226,5 @@
 00000e10 94 ac 12 0f 15 00 00 00 00 00 00 00 00 01 01 01 |................|
 00000e20 1c 00 00 00 01 00 00 00 3a 04 00 00 10 00 00 00 |........:.......|
 00000e30 02 00 1a 95 ac 12 0f 15 00 00 00 00 00 00 00 00 |................|
-00000e40 3f 59 99 9a 3f 73 33 33 3f 66 66 66 0a 0f 00 00 |?Y..?s33?fff....|
-00000e50 00 00 00 00 00 00 00 00 00 00 00 00 02 02 |..............|
\ No newline at end of file
+00000e40 bf 80 00 00 bf 80 00 00 bf 80 00 00 0a 0f 00 00 |................|
+00000e50 00 00 00 00 00 00 00 00 00 00 2c 10 5b 13 |..........,.[.|
\ No newline at end of file

Comparing the snapshots, we can see that the 12 bytes at offset 00000e40 and the last 4 bytes at offset (00000e50 + 10) differ.

I identified the last 34 bytes, starting at offset 00000e40, as follows:

                             mon (s390)     osd (x86)

  nearfull_ratio             3f 59 99 9a    bf 80 00 00
  full_ratio                 3f 73 33 33    bf 80 00 00
  backfillfull_ratio         3f 66 66 66    bf 80 00 00
  require_min_compat_client  0a             0a
  require_osd_release        0f             0f
  removed_snaps_queue        00 00 00 00    00 00 00 00
  crush_node_flags           00 00 00 00    00 00 00 00
  device_class_flags         00 00 00 00    00 00 00 00
  crc                        00 00 02 02    2c 10 5b 13

-------------

  field                      size  type

  nearfull_ratio             4     float
  full_ratio                 4     float
  backfillfull_ratio         4     float
  require_min_compat_client  1     ceph_release_t
  require_osd_release        1     ceph_release_t
  removed_snaps_queue        8     mempool::osdmap::map<int64_t, snap_interval_set_t>
  crush_node_flags           4     mempool::osdmap::map<int32_t,uint32_t>
  device_class_flags         4     mempool::osdmap::map<int32_t,uint32_t>
  crc                        4     uint32_t

--------------

The difference in the following fields is the concern:

. nearfull_ratio
. full_ratio
. backfillfull_ratio

On x86 we can see bf 80 00 00 in all three fields.
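To check how the differing byte sequences read as IEEE 754 single-precision values, they can be unpacked under both byte orders. The following is only a minimal sketch using Python's standard struct module. The helper name decode_f32 and the variable names are illustrative assumptions and are not part of ceph; the byte values are copied from the hexdump above.

```python
import struct

# Byte sequences copied from the hexdump at offset 00000e40.
# The dictionary and helper names are illustrative, not ceph identifiers.
mon_fields = {
    "nearfull_ratio": "3f 59 99 9a",
    "full_ratio": "3f 73 33 33",
    "backfillfull_ratio": "3f 66 66 66",
}
osd_bytes = "bf 80 00 00"


def decode_f32(hex_bytes: str, byte_order: str) -> float:
    """Decode 4 bytes as an IEEE 754 float; byte_order is '>' (big) or '<' (little)."""
    return struct.unpack(byte_order + "f", bytes.fromhex(hex_bytes))[0]


# Monitor (s390) bytes, read in big-endian order.
for name, raw in mon_fields.items():
    print(f"mon {name:<20} big-endian: {decode_f32(raw, '>'):.2f}")

# OSD (x86) bytes, read in both orders for comparison.
print("osd big-endian:   ", decode_f32(osd_bytes, ">"))
print("osd little-endian:", decode_f32(osd_bytes, "<"))

# Packing -1.0 with an explicit byte order shows both possible layouts.
print(struct.pack(">f", -1.0).hex(" "))  # bf 80 00 00
print(struct.pack("<f", -1.0).hex(" "))  # 00 00 80 bf
```

Read in big-endian order, the monitor bytes come out as roughly 0.85, 0.95 and 0.90, and bf 80 00 00 comes out as exactly -1.0. Read in little-endian order, bf 80 00 00 is only a tiny denormal number, so it is not a plausible little-endian encoding of a ratio.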
It is interesting that those bytes happen to be the **big-endian** encoding of the floating-point value -1, even though we are on a little-endian system here. This suggests that the underlying bug is that floating-point values are not byte-swapped appropriately when they are encoded into a buffer on a big-endian machine.

Checking the sources at src/include/encoding.h:

  ...
  WRITE_RAW_ENCODER(float)
  WRITE_RAW_ENCODER(double)
  ...

confirms that assumption. Indeed, the "RAW_ENCODER" just dumps the bytes without any conversion. At the very least, an endian byte swap is needed here.

The fixup from Ulrich Weigand implements the endian swap when encoding/decoding floating-point types and fixes the problem.

Sent to upstream:
https://github.com/ceph/ceph/pull/36992

As mentioned above, the fix for this issue landed upstream at PR:
https://github.com/ceph/ceph/pull/36992

which was backported to the Octopus (15.2.x) release at PR:
https://github.com/ceph/ceph/pull/37032

This backported patch appears to apply cleanly to ceph-15.2.3 in the focal-updates git tree at:
https://git.launchpad.net/ubuntu/+source/ceph/log/?h=applied/ubuntu/focal-updates

Please apply the backported patch to this tree. Thanks.

** Affects: ceph (Ubuntu)
     Importance: Undecided
     Assignee: Skipper Bug Screeners (skipper-screen-team)
     Status: New

** Tags: architecture-s39064 bugnameltc-188046 severity-high targetmilestone-inin2004

** Tags added: architecture-s39064 bugnameltc-188046 severity-high targetmilestone-inin2004

** Changed in: ubuntu
     Assignee: (unassigned) => Skipper Bug Screeners (skipper-screen-team)

** Package changed: ubuntu => ceph (Ubuntu)

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1900691

Title:
  [Ubuntu 20.04] ceph: include/encoding: Fix encode/decode of float types on big-endian systems

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1900691/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs