** Description changed:

- This issue in the Ceph tracker has been encountered repeatedly with significant adverse effects on Ceph 12.2.11/12 in Bionic:
- https://tracker.ceph.com/issues/38454
+ [Impact]
+ Cancelling large S3/Swift object puts may result in garbage collection entries with zero-length chains. Rados gateway garbage collection does not efficiently process and clean up these zero-length chains.

- This PR is the likely candidate for backporting to correct the issue:
- https://github.com/ceph/ceph/pull/26601
+ A large number of zero-length chains will result in rgw processes
+ quickly spinning through the garbage collection lists doing very little
+ work. This can result in abnormally high CPU utilization and op
+ workloads.
+
+ [Test Case]
+ Disable garbage collection:
+ `juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "false"}}'`
+
+ Repeatedly kill 256MB object put requests for randomized object names:
+ `for i in {0..1000}; do f=$(mktemp); fallocate -l 256M "$f"; s3cmd put "$f" s3://test_bucket & pid=$!; sleep $((RANDOM % 3)); kill $pid; rm "$f"; done`
+
+ Capture omap detail. Verify zero-length chains were created:
+ `for i in $(seq 0 $(( ${RGW_GC_MAX_OBJS:-32} - 1 ))); do rados -p default.rgw.log --namespace gc listomapvals gc.$i; done`
+
+ Raise radosgw debug levels, and enable garbage collection:
+ `juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "true"}}' loglevel=20`
+
+ Verify zero-length chains are processed correctly by inspecting radosgw
+ logs.
+
+ [Regression Potential]
+ {Pending} Back-port still needs to be accepted upstream. Need complete fix to assess regression potential.
+
+ [Other Information]
+ This issue has been reported upstream [0] and was fixed in Nautilus alongside a number of other garbage collection issues/enhancements in PR #26601 [1]:
+ * adds additional logging to make future debugging easier
+ * resolves bug where the truncated flag was not always set correctly in gc_iterate_entries
+ * resolves bug where marker in RGWGC::process was not advanced
+ * resolves bug in which gc entries with a zero-length chain were not trimmed
+ * resolves bug where same gc entry tag was added to list for deletion multiple times
+
+ These fixes were slated for back-port into Luminous and Mimic, but the
+ Luminous work was not completed because of a required dependency: AIO GC
+ [2]. This dependency has been resolved upstream, and is pending SRU
+ verification in Ubuntu packages [3].
+
+ [0] https://tracker.ceph.com/issues/38454
+ [1] https://github.com/ceph/ceph/pull/26601
+ [2] https://tracker.ceph.com/issues/23223
+ [3] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1838858
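
When working through the omap step of the test case, a per-shard entry count can make before/after comparison easier than reading raw `listomapvals` dumps. The following is a minimal sketch, not part of the proposed fix. It assumes the gc shard objects are named gc.0 through gc.(N-1), with N defaulting to 32, and that they live in pool default.rgw.log under namespace gc as in the test case. The function names `gc_shard_names` and `gc_omap_counts` are hypothetical helpers introduced here for illustration. Note that it counts gc entries per shard only; it does not decode chain lengths.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Ceph.

# Print the gc shard object names gc.0 .. gc.(N-1).
# $1: number of shards (assumed default: 32).
gc_shard_names() {
    _max="${1:-32}"
    _i=0
    while [ "$_i" -lt "$_max" ]; do
        echo "gc.$_i"
        _i=$((_i + 1))
    done
}

# Print "<shard> <omap key count>" for each gc shard.
# $1: pool (assumed default: default.rgw.log), $2: number of shards.
gc_omap_counts() {
    _pool="${1:-default.rgw.log}"
    for _shard in $(gc_shard_names "${2:-32}"); do
        # listomapkeys prints one key per line; count the lines.
        _n=$(rados -p "$_pool" --namespace gc listomapkeys "$_shard" | wc -l | tr -d ' ')
        echo "$_shard $_n"
    done
}
```

Running `gc_omap_counts` once after the kill loop and again after garbage collection has been re-enabled should show whether the per-shard entry counts shrink.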

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Summary changed:

- Need backport of 0-length gc chain fixes to Luminous
+ Backport of zero-length gc chain fixes to Luminous

https://bugs.launchpad.net/bugs/1843085

Title:
  Backport of zero-length gc chain fixes to Luminous