Re: [ceph-users] Living with huge bucket sizes
Hello all, I work on the same team as Tyler, and I can provide more info here. The cluster is indeed an RGW cluster with many small (~100 KB) objects, similar to your use case, Bryan. However, we have this particular bucket set up as a blind bucket ("index_type": 1), since we wanted to avoid the index bottleneck to begin with (we didn't need the listing feature). Would bucket sharding still be a problem for blind buckets?

Mark, would setting logging to 20 give any insight into what the threads are doing?

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
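For anyone wondering how a blind bucket like ours is configured: on Jewel and later, the index type can be set per placement target, roughly as sketched below. The placement-id and zone name are example values, not from this thread, and exact flags may vary by release.

```shell
# Sketch only: create a placement target whose buckets keep no index.
# "Blind" (indexless) buckets cannot be listed, but they avoid the
# bucket index bottleneck entirely.
radosgw-admin zone placement add \
    --rgw-zone=default \
    --placement-id=indexless-placement \
    --index-type=indexless

# Optionally make it the default for new buckets in this zone.
radosgw-admin zonegroup placement default \
    --placement-id=indexless-placement
```

The gateways need a restart after the zone change for new buckets to pick up the placement target.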
Re: [ceph-users] Living with huge bucket sizes
Bryan,

I just went through this myself, also on Hammer, since offline bucket index resharding was backported. I had three buckets with more than 10 million objects each, one of them with 30 million. I was experiencing the typical blocked-request issue during scrubs whenever the placement group containing the bucket index got hit.

I solved it in two steps. First, I added an SSD-only pool and moved the bucket index to this new SSD pool. This is an online operation. After that was complete, I scheduled some downtime (we run a highly available consumer-facing website) and made a plan to reshard the bucket indexes. I did some tests with buckets containing 100,000 test objects and found the performance to be satisfactory.

Once my maintenance window hit and I had stopped all access to RGW, I was able to reshard all my bucket indexes in 20 minutes. I can't remember exact numbers, but I believe I did a 20+ million object bucket in about 5 minutes. It was extremely fast, but again, I had moved my bucket indexes to a pool of fast enterprise SSDs (three hosts, one SSD per host, Samsung 3.84 TB PM863a for what it's worth). Once I finished this, all my Ceph performance issues disappeared. I'll slowly upgrade my cluster with the end goal of moving to the more efficient BlueStore, but I no longer feel the rush.

Last detail: I used 100 shards per bucket, which seems to be a good compromise.

Cullen
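Cullen's choice of 100 shards lines up with the commonly cited guideline of keeping each bucket index shard under roughly 100,000 objects. A quick sketch of that arithmetic (the 100k-per-shard target is a general rule of thumb, not a figure from this thread):

```python
# Rule-of-thumb shard count: keep each bucket index shard below a
# target number of objects (~100k is commonly cited for RGW).
import math

def shard_count(num_objects, objects_per_shard=100_000):
    """Smallest shard count that keeps shards under the target size."""
    return max(1, math.ceil(num_objects / objects_per_shard))

# A 20-million-object bucket would want ~200 shards at 100k/shard.
print(shard_count(20_000_000))   # 200
# Cullen's 100 shards puts that bucket at ~200k objects per shard,
# still a large improvement over a single unsharded index object.
print(20_000_000 // 100)         # 200000
```

At 100 shards, scrubbing any one index object touches far less data, which is consistent with the blocked-request problem disappearing.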
Re: [ceph-users] Living with huge bucket sizes
On Fri, Jun 9, 2017 at 2:21 AM, Dan van der Ster <d...@vanderster.com> wrote:
> Hi Bryan,
>
> On Fri, Jun 9, 2017 at 1:55 AM, Bryan Stillwell <bstillw...@godaddy.com> wrote:
>> This has come up quite a few times before, but since I was only working with
>> RBD before I didn't pay too close attention to the conversation. I'm looking
>> for the best way to handle existing clusters that have buckets with a large
>> number of objects (>20 million) in them. The cluster I'm doing tests on is
>> currently running Hammer (0.94.10), so if things got better in Jewel I would
>> love to hear about it!
>> ...
>> Has anyone found a good solution for this for existing large buckets? I
>> know sharding is the solution going forward, but AFAIK it can't be done
>> on existing buckets yet (although the dynamic resharding work mentioned
>> on today's performance call sounds promising).
>
> I haven't tried it myself, but 0.94.10 should have the (offline)
> resharding feature. From the release notes:

Right. We did add automatic dynamic resharding to Luminous, but offline resharding should be enough.

>> * In RADOS Gateway, it is now possible to reshard an existing bucket's index
>>   using an off-line tool.
>>
>>   Usage:
>>
>>   $ radosgw-admin bucket reshard --bucket=<bucket_name> --num_shards=<num_shards>
>>
>>   This will create a new linked bucket instance that points to the newly created
>>   index objects. The old bucket instance still exists, and currently it's up to
>>   the user to manually remove the old bucket index objects. (Note that bucket
>>   resharding currently requires that all IO (especially writes) to the specific
>>   bucket is quiesced.)

Once resharding is done, use the radosgw-admin bi purge command to remove the old bucket indexes.

Yehuda
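Putting the reshard and purge steps together, the offline workflow looks roughly like the sketch below. The bucket name and shard count are example values, and the old bucket-id placeholder must be filled in from the pre-reshard stats output.

```shell
# Sketch of the offline reshard workflow described above.
# "mybucket" and 100 shards are example values, not from the thread.

# 1. Quiesce all IO (especially writes) to the bucket.

# 2. Record the current bucket instance id; the old index objects
#    keep this id after resharding.
radosgw-admin bucket stats --bucket=mybucket | grep '"id"'

# 3. Reshard the bucket index. This creates a new linked bucket
#    instance pointing at the new index objects.
radosgw-admin bucket reshard --bucket=mybucket --num_shards=100

# 4. Remove the old index objects, using the id recorded in step 2.
radosgw-admin bi purge --bucket=mybucket --bucket-id=<old_bucket_id>

# 5. Resume IO to the bucket.
```

Skipping step 4 leaves the old, unsharded index objects behind in the index pool, which is why Yehuda calls out bi purge explicitly.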
Re: [ceph-users] Living with huge bucket sizes
Hi Bryan,

On Fri, Jun 9, 2017 at 1:55 AM, Bryan Stillwell <bstillw...@godaddy.com> wrote:
> This has come up quite a few times before, but since I was only working with
> RBD before I didn't pay too close attention to the conversation. I'm looking
> for the best way to handle existing clusters that have buckets with a large
> number of objects (>20 million) in them. The cluster I'm doing tests on is
> currently running Hammer (0.94.10), so if things got better in Jewel I would
> love to hear about it!
> ...
> Has anyone found a good solution for this for existing large buckets? I
> know sharding is the solution going forward, but AFAIK it can't be done
> on existing buckets yet (although the dynamic resharding work mentioned
> on today's performance call sounds promising).

I haven't tried it myself, but 0.94.10 should have the (offline) resharding feature. From the release notes:

> * In RADOS Gateway, it is now possible to reshard an existing bucket's index
>   using an off-line tool.
>
>   Usage:
>
>   $ radosgw-admin bucket reshard --bucket=<bucket_name> --num_shards=<num_shards>
>
>   This will create a new linked bucket instance that points to the newly created
>   index objects. The old bucket instance still exists, and currently it's up to
>   the user to manually remove the old bucket index objects. (Note that bucket
>   resharding currently requires that all IO (especially writes) to the specific
>   bucket is quiesced.)

--
Dan