Hi all,

I've been observing some strange behavior with my object storage cluster
running Nautilus 14.2.4. We currently have around 1800 buckets (a small
percentage of which are actively used), with a total of 13.86M
objects. We have 20 RGWs right now, 10 for regular S3 access, and 10 for
static sites.

When calling $(radosgw-admin bucket stats), it normally comes back within a
few seconds, usually less than five. This returns stats for all buckets in
the cluster, which we use for accounting.
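For context on the accounting side, here is a minimal sketch of how the output could be totalled. It assumes the command prints a JSON list of per-bucket entries, each with a "usage" map whose categories (for example "rgw.main") carry a "num_objects" field; those field names may differ between releases, and total_objects is a hypothetical helper name, not part of any Ceph tooling.

```python
import json


def total_objects(stats_json):
    """Sum num_objects over every usage category of every bucket.

    Assumes stats_json is a JSON list of bucket entries shaped like
    {"bucket": "...", "usage": {"rgw.main": {"num_objects": N}}}.
    Buckets with no "usage" key (e.g. empty buckets) count as zero.
    """
    buckets = json.loads(stats_json)
    total = 0
    for bucket in buckets:
        # Every category is included, so multipart metadata entries
        # (if reported as their own category) are counted too.
        for category in bucket.get("usage", {}).values():
            total += category.get("num_objects", 0)
    return total
```

The input string would be the captured stdout of the bucket stats call; filtering to a single category is a one-line change if only regular objects should be counted.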

The strange behavior: lately we've been observing a gradual increase in
runtime for bucket stats, which in extreme cases can take almost 10 minutes
to return. Things start out fine, and over the course of the week the
runtime climbs from a few seconds to almost 10 minutes. Restarting all
of the S3 RGWs seems to fix this problem immediately: once the radosgw
processes are restarted, the runtime for bucket stats drops to 3 seconds.
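
For anyone who wants to track the same drift on their own cluster, here is a rough sketch of how the runtime could be sampled over time. The helper names (time_command, format_sample, log_sample) and the idea of appending to a log file are illustrative assumptions; the only thing taken from our setup is the radosgw-admin bucket stats call itself.

```python
import subprocess
import time
from datetime import datetime, timezone


def time_command(cmd):
    """Run cmd, discard its output, and return (elapsed_seconds, return_code)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit is recorded, not raised
    )
    return time.monotonic() - start, result.returncode


def format_sample(elapsed, returncode, when=None):
    """Render one measurement as a single timestamped log line."""
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()} elapsed={elapsed:.2f}s rc={returncode}"


def log_sample(cmd, log_path):
    """Time cmd once and append the result to log_path."""
    elapsed, returncode = time_command(cmd)
    line = format_sample(elapsed, returncode)
    with open(log_path, "a") as fh:
        fh.write(line + "\n")
    return line
```

Calling log_sample(["radosgw-admin", "bucket", "stats"], some_log_path) from cron every hour or so would give a series that shows whether the slowdown is linear, stepwise, or tied to a particular event such as an RGW restart.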

This is odd behavior, and I've found nothing so far that would indicate why
this is happening. There is nothing suspicious in the RGW logs, although a
message about aborted multipart uploads is in there:

2019-12-02 13:12:52.882 7faa7018f700 0 abort_bucket_multiparts WARNING :
aborted 8553000 incomplete multipart uploads

Otherwise, things look normal. Memory usage is low, CPU load is relatively
low and flat, and the cluster itself is not under heavy load.

Has anyone run into this before?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
