Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-17 Thread Iain Buclaw
On Mon, 8 Apr 2019 at 10:33, Iain Buclaw  wrote:
>
> On Mon, 8 Apr 2019 at 05:01, Matt Benjamin  wrote:
> >
> > Hi Christian,
> >
> > Dynamic bucket-index sharding for multi-site setups is being worked
> > on, and will land in the N release cycle.
> >
>
> What about removing orphaned shards on the master?  Are the existing
> tools able to work with that?
>
> On the secondaries, it is no problem to proxy_pass all requests to the
> master whilst all rgw pools are destroyed and recreated.
>
> I would have thought, however, that manually removing the known
> orphaned indexes would be safe, side-stepping the annoying job of
> having to force-degrade the working service.
>

So is there no workaround for the sync master being bombarded with 404 requests?

I can see object entries in rgw.bucket.index and rgw.log; is there
anywhere else it gets the list of things to sync from?  Is it encoded
somewhere in omap?
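
For what it's worth, on a default multi-site setup the data sync entries appear to live as omap keys on `data_log.<n>` objects in the `<zone>.rgw.log` pool -- the pool and object naming here are assumptions about default configurations, not something confirmed in this thread.  A rough sketch for seeing which shards still carry entries:

```shell
# Hedged sketch: list each data_log shard object in an RGW log pool with
# its omap key count, largest first.  The "data_log." name prefix and the
# "<zone>.rgw.log" pool layout are assumptions about default setups.
data_log_entry_counts() {
    local pool=$1 obj count
    rados -p "$pool" ls |
        grep '^data_log\.' |
        while read -r obj; do
            # one omap key per pending sync log entry (assumption)
            count=$(rados -p "$pool" listomapkeys "$obj" | wc -l)
            printf '%d\t%s\n' "$count" "$obj"
        done |
        sort -rn
}
```

Run as e.g. `data_log_entry_counts secondary1.rgw.log`; shards still referencing deleted buckets should stand out by their counts.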


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-08 Thread Iain Buclaw
On Mon, 8 Apr 2019 at 05:01, Matt Benjamin  wrote:
>
> Hi Christian,
>
> Dynamic bucket-index sharding for multi-site setups is being worked
> on, and will land in the N release cycle.
>

What about removing orphaned shards on the master?  Are the existing
tools able to work with that?

On the secondaries, it is no problem to proxy_pass all requests to the
master whilst all rgw pools are destroyed and recreated.
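
For anyone wanting to replicate that failover, a minimal sketch of the kind of nginx fragment involved -- the hostnames, port, and headers are placeholders and assumptions, not the actual configuration used here:

```nginx
# Hedged sketch: temporarily proxy all S3 traffic hitting a secondary's
# endpoint through to the master zone's RGW while the secondary's pools
# are destroyed and recreated.  Names and ports are placeholders.
server {
    listen 80;
    server_name secondary1.example.com;

    location / {
        proxy_pass http://master-rgw.example.com:7480;
        # preserve the Host header so S3 signature checks still match
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```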

I would have thought, however, that manually removing the known
orphaned indexes would be safe, side-stepping the annoying job of
having to force-degrade the working service.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-07 Thread Matt Benjamin
Hi Christian,

Dynamic bucket-index sharding for multi-site setups is being worked
on, and will land in the N release cycle.

regards,

Matt

On Sun, Apr 7, 2019 at 6:59 PM Christian Balzer  wrote:
>
> On Fri, 5 Apr 2019 11:42:28 -0400 Casey Bodley wrote:
>
> > Hi Iain,
> >
> > Resharding is not supported in multisite. The issue is that the master zone
> > needs to be authoritative for all metadata. If bucket reshard commands run
> > on the secondary zone, they create new bucket instance metadata that the
> > master zone never sees, so replication can't reconcile those changes.
> >
>
> Unless the above should read "dynamic resharding..." this is in clear
> contrast to the Red Hat documentation that Iain cited.
>
> But given how costly manual resharding is, including service
> interruption, that's not really an option for most people either.
>
> Looks like Ceph is out of the race for multi-PB use case here, unless
> multi-site and dynamic resharding are less than 6 months away.
>
> Regards,
>
> Christian
>
> > The 'stale-instances rm' command is not safe to run in multisite because it
> > can misidentify as 'stale' some bucket instances that were deleted on the
> > master zone, where data sync on the secondary zone hasn't yet finished
> > deleting all of the objects it contained. Deleting these bucket instances
> > and their associated bucket index objects would leave any remaining objects
> > behind as orphans and leak storage capacity.
> >
> > On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw  wrote:
> >
> > > On Wed, 3 Apr 2019 at 09:41, Iain Buclaw  wrote:
> > > >
> > > > On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:
> > > > >
> > > > >
> > > > > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > > > > 7511
> > > > >
> > > > > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > > > > 3509
> > > > >
> > > > > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > > > > 3801
> > > > >
> > > >
> > > > Documentation is a horrid mess around the subject on multi-site
> > > resharding
> > > >
> > > >
> > > http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
> > > >
> > > >
> > > https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> > > > (Manual Resharding)
> > > >
> > > >
> > > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
> > > >
> > > > All disagree with each other over the correct process to reshard
> > > > indexes in multi-site.  Worse, none of them seem to work correctly
> > > > anyway.
> > > >
> > > > Changelog of 13.2.5 looked promising up until the sentence: "These
> > > > commands should not be used on a multisite setup as the stale
> > > > instances may be unlikely to be from a reshard and can have
> > > > consequences".
> > > >
> > > > http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
> > > >
> > >
> > > The stale-instances feature only correctly identifies one stale shard.
> > >
> > > # radosgw-admin reshard stale-instances list
> > > [
> > > "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
> > > ]
> > >
> > > I can confirm this is one of the orphaned index objects.
> > >
> > > # rados -p .rgw.buckets.index ls | grep
> > > 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
> > >
> > > I would assume then that, contrary to what the documentation says,
> > > it's safe to run 'reshard stale-instances rm' on a multi-site setup.
> > >
> > > However, it is quite telling that the author of this feature doesn't
> > > trust what they have written to work correctly.
> > >
> > > There are still thousands of stale index objects that 'stale-instances
> > > list' didn't pick up though.  But it appears that radosgw-admin only
> > > looks at 'metadata list bucket' data, and not what is physically
> > > inside the pool.
> > >
> > > --
> > > Iain Buclaw
> > >
> > > *(p < e ? p++ : p) = (c & 0x0f) + '0';
> > >
>
>
> --
> Christian 

Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-07 Thread Christian Balzer
On Fri, 5 Apr 2019 11:42:28 -0400 Casey Bodley wrote:

> Hi Iain,
> 
> Resharding is not supported in multisite. The issue is that the master zone
> needs to be authoritative for all metadata. If bucket reshard commands run
> on the secondary zone, they create new bucket instance metadata that the
> master zone never sees, so replication can't reconcile those changes.
> 

Unless the above should read "dynamic resharding..." this is in clear
contrast to the Red Hat documentation that Iain cited.

But given how costly manual resharding is, including service
interruption, that's not really an option for most people either.

Looks like Ceph is out of the race for multi-PB use case here, unless
multi-site and dynamic resharding are less than 6 months away.

Regards,

Christian

> The 'stale-instances rm' command is not safe to run in multisite because it
> can misidentify as 'stale' some bucket instances that were deleted on the
> master zone, where data sync on the secondary zone hasn't yet finished
> deleting all of the objects it contained. Deleting these bucket instances
> and their associated bucket index objects would leave any remaining objects
> behind as orphans and leak storage capacity.
> 
> On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw  wrote:
> 
> > On Wed, 3 Apr 2019 at 09:41, Iain Buclaw  wrote:  
> > >
> > > On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:  
> > > >
> > > >
> > > > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > > > 7511
> > > >
> > > > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > > > 3509
> > > >
> > > > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > > > 3801
> > > >  
> > >
> > > Documentation is a horrid mess around the subject on multi-site  
> > resharding  
> > >
> > >  
> > http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
> >   
> > >
> > >  
> > https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> >   
> > > (Manual Resharding)
> > >
> > >  
> > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
> >   
> > >
> > > All disagree with each other over the correct process to reshard
> > > indexes in multi-site.  Worse, none of them seem to work correctly
> > > anyway.
> > >
> > > Changelog of 13.2.5 looked promising up until the sentence: "These
> > > commands should not be used on a multisite setup as the stale
> > > instances may be unlikely to be from a reshard and can have
> > > consequences".
> > >
> > > http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
> > >  
> >
> > The stale-instances feature only correctly identifies one stale shard.
> >
> > # radosgw-admin reshard stale-instances list
> > [
> > "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
> > ]
> >
> > I can confirm this is one of the orphaned index objects.
> >
> > # rados -p .rgw.buckets.index ls | grep
> > 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
> >
> > I would assume then that, contrary to what the documentation says,
> > it's safe to run 'reshard stale-instances rm' on a multi-site setup.
> >
> > However, it is quite telling that the author of this feature doesn't
> > trust what they have written to work correctly.
> >
> > There are still thousands of stale index objects that 'stale-instances
> > list' didn't pick up though.  But it appears that radosgw-admin only
> > looks at 'metadata list bucket' data, and not what is physically
> > inside the pool.
> >
> > --
> > Iain Buclaw
> >
> > *(p < e ? p++ : p) = (c & 0x0f) + '0';
> >  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-06 Thread Iain Buclaw
On Fri, 5 Apr 2019 at 17:42, Casey Bodley  wrote:
>
> Hi Iain,
>
> Resharding is not supported in multisite. The issue is that the master zone 
> needs to be authoritative for all metadata. If bucket reshard commands run on 
> the secondary zone, they create new bucket instance metadata that the master 
> zone never sees, so replication can't reconcile those changes.
>
> The 'stale-instances rm' command is not safe to run in multisite because it 
> can misidentify as 'stale' some bucket instances that were deleted on the 
> master zone, where data sync on the secondary zone hasn't yet finished 
> deleting all of the objects it contained. Deleting these bucket instances and 
> their associated bucket index objects would leave any remaining objects 
> behind as orphans and leak storage capacity.
>

So that documentation should be removed then?

Currently the master is receiving dozens of replication requests every
second, most of which are 404s.  I can't see where the secondaries are
getting information about non-existent buckets from, other than the
thousands of stale index objects already pointed out.

The number of buckets is fixed in my cluster, so there's no chance of
misidentifying anything.  They're dead index objects, and they have
been causing large omap problems.  It looks like the corrective action
is to hand-delete these and restart radosgw, but no one else who has
investigated this has confirmed it.
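
A cautious sketch of that hand-delete step -- an assumption about a reasonable procedure, not anything documented or endorsed: keep a dump of the shard's omap before removing it, and refuse anything that isn't a `.dir.*` index object.

```shell
# Hedged sketch: back up and remove one candidate stale bucket index
# object.  Not an official procedure; restoring would mean replaying the
# saved omap dump by hand with 'rados' omap operations.
remove_stale_index() {
    local pool=$1 obj=$2
    case $obj in
        .dir.*) ;;                      # only bucket index shards
        *) echo "refusing: '$obj' is not a .dir.* index object" >&2
           return 1 ;;
    esac
    # keep a restorable copy of the omap entries before deleting
    rados -p "$pool" listomapvals "$obj" > "backup.${obj}.omap" &&
        rados -p "$pool" rm "$obj"
}
```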

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-05 Thread Casey Bodley
Hi Iain,

Resharding is not supported in multisite. The issue is that the master zone
needs to be authoritative for all metadata. If bucket reshard commands run
on the secondary zone, they create new bucket instance metadata that the
master zone never sees, so replication can't reconcile those changes.

The 'stale-instances rm' command is not safe to run in multisite because it
can misidentify as 'stale' some bucket instances that were deleted on the
master zone, where data sync on the secondary zone hasn't yet finished
deleting all of the objects it contained. Deleting these bucket instances
and their associated bucket index objects would leave any remaining objects
behind as orphans and leak storage capacity.
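
Given that risk, one might at least gate any cleanup on sync having fully caught up.  A hedged helper that inspects saved `radosgw-admin sync status` output -- the exact phrases it greps for are assumptions about the status text and may vary between releases:

```shell
# Hedged sketch: succeed only if a saved 'radosgw-admin sync status'
# report looks fully caught up -- no shards behind, nothing recovering.
# The matched phrases are assumptions about the command's output format.
sync_caught_up() {
    local status_file=$1
    grep -q 'data is caught up with source' "$status_file" &&
        ! grep -Eq 'behind shards|recovering shards' "$status_file"
}
```

Usage might look like `radosgw-admin sync status > /tmp/status && sync_caught_up /tmp/status && echo "probably safe to inspect stale instances"`.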

On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw  wrote:

> On Wed, 3 Apr 2019 at 09:41, Iain Buclaw  wrote:
> >
> > On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:
> > >
> > >
> > > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > > 7511
> > >
> > > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > > 3509
> > >
> > > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > > 3801
> > >
> >
> > Documentation is a horrid mess around the subject on multi-site
> resharding
> >
> >
> http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
> >
> >
> https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> > (Manual Resharding)
> >
> >
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
> >
> > All disagree with each other over the correct process to reshard
> > indexes in multi-site.  Worse, none of them seem to work correctly
> > anyway.
> >
> > Changelog of 13.2.5 looked promising up until the sentence: "These
> > commands should not be used on a multisite setup as the stale
> > instances may be unlikely to be from a reshard and can have
> > consequences".
> >
> > http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
> >
>
> The stale-instances feature only correctly identifies one stale shard.
>
> # radosgw-admin reshard stale-instances list
> [
> "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
> ]
>
> I can confirm this is one of the orphaned index objects.
>
> # rados -p .rgw.buckets.index ls | grep
> 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
>
> I would assume then that, contrary to what the documentation says,
> it's safe to run 'reshard stale-instances rm' on a multi-site setup.
>
> However, it is quite telling that the author of this feature doesn't
> trust what they have written to work correctly.
>
> There are still thousands of stale index objects that 'stale-instances
> list' didn't pick up though.  But it appears that radosgw-admin only
> looks at 'metadata list bucket' data, and not what is physically
> inside the pool.
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';
>


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-04 Thread Iain Buclaw
On Wed, 3 Apr 2019 at 09:41, Iain Buclaw  wrote:
>
> On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:
> >
> >
> > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > 7511
> >
> > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > 3509
> >
> > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > 3801
> >
>
> Documentation is a horrid mess around the subject on multi-site resharding
>
> http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
>
> https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> (Manual Resharding)
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
>
> All disagree with each other over the correct process to reshard
> indexes in multi-site.  Worse, none of them seem to work correctly
> anyway.
>
> Changelog of 13.2.5 looked promising up until the sentence: "These
> commands should not be used on a multisite setup as the stale
> instances may be unlikely to be from a reshard and can have
> consequences".
>
> http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
>

The stale-instances feature only correctly identifies one stale shard.

# radosgw-admin reshard stale-instances list
[
"mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
]

I can confirm this is one of the orphaned index objects.

# rados -p .rgw.buckets.index ls | grep
0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8

I would assume then that, contrary to what the documentation says,
it's safe to run 'reshard stale-instances rm' on a multi-site setup.

However, it is quite telling that the author of this feature doesn't
trust what they have written to work correctly.

There are still thousands of stale index objects that 'stale-instances
list' didn't pick up though.  But it appears that radosgw-admin only
looks at 'metadata list bucket' data, and not what is physically
inside the pool.
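
The missing comparison -- what is physically in the index pool versus what bucket-instance metadata knows about -- can be sketched with plain text tools.  The object naming (`.dir.<marker>.<shard>` for sharded indexes) and the metadata list format are assumptions, and this only prints candidates; it deletes nothing:

```shell
# Hedged sketch: given a file with 'rados -p <pool> ls' output and a file
# with bucket instance entries ("<bucket>:<marker>" per line), print the
# bucket markers that have index objects in the pool but no metadata.
# Assumes sharded index objects named ".dir.<marker>.<shard>".
list_stale_markers() {
    local pool_ls=$1 meta_list=$2
    # ".dir.<marker>.<shard>" -> "<marker>"
    sed -e 's/^\.dir\.//' -e 's/\.[0-9]*$//' "$pool_ls" | sort -u > /tmp/markers.pool
    # "<bucket>:<marker>" -> "<marker>"
    sed -e 's/^.*://' "$meta_list" | sort -u > /tmp/markers.meta
    # present in the pool, absent from metadata
    comm -23 /tmp/markers.pool /tmp/markers.meta
}
```

Fed with e.g. `rados -p master.rgw.buckets.index ls > pool.txt` and `radosgw-admin metadata list bucket.instance | jq -r '.[]' > meta.txt` (the jq step is an assumption about the JSON list output).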

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-03 Thread Iain Buclaw
On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:
>
>
> # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> 7511
>
> # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> 3509
>
> # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> 3801
>

Documentation is a horrid mess around the subject on multi-site resharding

http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding

https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
(Manual Resharding)

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw

All disagree with each other over the correct process to reshard
indexes in multi-site.  Worse, none of them seem to work correctly
anyway.

Changelog of 13.2.5 looked promising up until the sentence: "These
commands should not be used on a multisite setup as the stale
instances may be unlikely to be from a reshard and can have
consequences".

http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-03 Thread Iain Buclaw
On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:
>
> On Tue, 19 Feb 2019 at 10:05, Iain Buclaw  wrote:
> >
> > On Tue, 19 Feb 2019 at 09:59, Iain Buclaw  wrote:
> > >
> > > On Wed, 6 Feb 2019 at 09:28, Iain Buclaw  wrote:
> > > >
> > > > On Tue, 5 Feb 2019 at 10:04, Iain Buclaw  wrote:
> > > > >
> > > > > On Tue, 5 Feb 2019 at 09:46, Iain Buclaw  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Following the update of one secondary site from 12.2.8 to
> > > > > > 12.2.11, the following warning has come up.
> > > > > >
> > > > > > HEALTH_WARN 1 large omap objects
> > > > > > LARGE_OMAP_OBJECTS 1 large omap objects
> > > > > > 1 large objects found in pool '.rgw.buckets.index'
> > > > > > Search the cluster log for 'Large omap object found' for more 
> > > > > > details.
> > > > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > > Is this the reason why resharding hasn't propagated?
> > > > > >
> > > > >
> > > > > Furthermore, it looks like the index is in fact broken on the
> > > > > secondaries.
> > > > >
> > > > > On the master:
> > > > >
> > > > > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > > > > {
> > > > > "type": "plain",
> > > > > "idx": "myobject",
> > > > > "entry": {
> > > > > "name": "myobject",
> > > > > "instance": "",
> > > > > "ver": {
> > > > > "pool": 28,
> > > > > "epoch": 8848
> > > > > },
> > > > > "locator": "",
> > > > > "exists": "true",
> > > > > "meta": {
> > > > > "category": 1,
> > > > > "size": 9200,
> > > > > "mtime": "2018-03-27 21:12:56.612172Z",
> > > > > "etag": "c365c324cda944d2c3b687c0785be735",
> > > > > "owner": "mybucket",
> > > > > "owner_display_name": "Bucket User",
> > > > > "content_type": "application/octet-stream",
> > > > > "accounted_size": 9194,
> > > > > "user_data": ""
> > > > > },
> > > > > "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> > > > > "flags": 0,
> > > > > "pending_map": [],
> > > > > "versioned_epoch": 0
> > > > > }
> > > > > }
> > > > >
> > > > >
> > > > > On the secondaries:
> > > > >
> > > > > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > > > > ERROR: bi_get(): (2) No such file or directory
> > > > >
> > > > > How does one go about rectifying this mess?
> > > > >
> > > >
> > > > A random blog post, in a language I don't understand, seems to allude
> > > > to using 'radosgw-admin bi put' to restore backed-up indexes, but not
> > > > under what circumstances you would use such a command.
> > > >
> > > > https://cloud.tencent.com/developer/article/1032854
> > > >
> > > > Would this be safe to run on secondaries?
> > > >
> > >
> > > I removed the bucket on the secondaries and scheduled a new sync.
> > > However, this gets stuck at some point and radosgw complains about:
> > >
> > > data sync: WARNING: skipping data log entry for missing bucket
> > > mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21
> > >
> > > Despairing that RGW can't even do a simple job right, I removed the
> > > problematic bucket on the master, but there are now hundreds of shard
> > > objects inside the index pool, all of which look to be orphaned, and
> > > the warnings about the missing bucket continue to appear on the
> > > secondaries.  In some cases there's an object on the secondary that
> > > doesn't exist on the master.
> > >
> > > All the while, ceph is still complaining about large omap files.
> > >
> > > $ ceph daemon mon.ceph-mon-1 config get
> > > osd_deep_scrub_large_omap_object_value_sum_threshold
> > > {
> > > "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824"
> > > }
> > >
> > > It seems implausible that the cluster is still complaining about this
> > > when the largest omap contains 71405 entries.
> > >
> > >
> > > I can't run bi purge or metadata rm on the unreferenced entries
> > > because the bucket itself is no more.  Can I remove objects from the
> > > index pool using 'rados rm' ?
> > >
> >
> > Possibly related
> >
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031350.html
> >
>
> # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> 7511
>
> # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> 3509
>
> # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> 3801
>

I guess no one uses multisite RadosGW in production, then.  I'm
absolutely flabbergasted that RGW can't do what should be a simple job
without breaking.

I'm sure I'll have better luck writing a synchronisation tool myself,
or using minio+syncthing.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-02-19 Thread Iain Buclaw
On Tue, 19 Feb 2019 at 10:05, Iain Buclaw  wrote:
>
> On Tue, 19 Feb 2019 at 09:59, Iain Buclaw  wrote:
> >
> > On Wed, 6 Feb 2019 at 09:28, Iain Buclaw  wrote:
> > >
> > > On Tue, 5 Feb 2019 at 10:04, Iain Buclaw  wrote:
> > > >
> > > > On Tue, 5 Feb 2019 at 09:46, Iain Buclaw  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Following the update of one secondary site from 12.2.8 to 12.2.11, the
> > > > > following warning has come up.
> > > > >
> > > > > HEALTH_WARN 1 large omap objects
> > > > > LARGE_OMAP_OBJECTS 1 large omap objects
> > > > > 1 large objects found in pool '.rgw.buckets.index'
> > > > > Search the cluster log for 'Large omap object found' for more 
> > > > > details.
> > > > >
> > > >
> > > > [...]
> > > >
> > > > > Is this the reason why resharding hasn't propagated?
> > > > >
> > > >
> > > > Furthermore, it looks like the index is in fact broken on the
> > > > secondaries.
> > > >
> > > > On the master:
> > > >
> > > > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > > > {
> > > > "type": "plain",
> > > > "idx": "myobject",
> > > > "entry": {
> > > > "name": "myobject",
> > > > "instance": "",
> > > > "ver": {
> > > > "pool": 28,
> > > > "epoch": 8848
> > > > },
> > > > "locator": "",
> > > > "exists": "true",
> > > > "meta": {
> > > > "category": 1,
> > > > "size": 9200,
> > > > "mtime": "2018-03-27 21:12:56.612172Z",
> > > > "etag": "c365c324cda944d2c3b687c0785be735",
> > > > "owner": "mybucket",
> > > > "owner_display_name": "Bucket User",
> > > > "content_type": "application/octet-stream",
> > > > "accounted_size": 9194,
> > > > "user_data": ""
> > > > },
> > > > "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> > > > "flags": 0,
> > > > "pending_map": [],
> > > > "versioned_epoch": 0
> > > > }
> > > > }
> > > >
> > > >
> > > > On the secondaries:
> > > >
> > > > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > > > ERROR: bi_get(): (2) No such file or directory
> > > >
> > > > How does one go about rectifying this mess?
> > > >
> > >
> > > A random blog post, in a language I don't understand, seems to allude
> > > to using 'radosgw-admin bi put' to restore backed-up indexes, but not
> > > under what circumstances you would use such a command.
> > >
> > > https://cloud.tencent.com/developer/article/1032854
> > >
> > > Would this be safe to run on secondaries?
> > >
> >
> > I removed the bucket on the secondaries and scheduled a new sync.
> > However, this gets stuck at some point and radosgw complains about:
> >
> > data sync: WARNING: skipping data log entry for missing bucket
> > mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21
> >
> > Despairing that RGW can't even do a simple job right, I removed the
> > problematic bucket on the master, but there are now hundreds of shard
> > objects inside the index pool, all of which look to be orphaned, and
> > the warnings about the missing bucket continue to appear on the
> > secondaries.  In some cases there's an object on the secondary that
> > doesn't exist on the master.
> >
> > All the while, ceph is still complaining about large omap files.
> >
> > $ ceph daemon mon.ceph-mon-1 config get
> > osd_deep_scrub_large_omap_object_value_sum_threshold
> > {
> > "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824"
> > }
> >
> > It seems implausible that the cluster is still complaining about this
> > when the largest omap contains 71405 entries.
> >
> >
> > I can't run bi purge or metadata rm on the unreferenced entries
> > because the bucket itself is no more.  Can I remove objects from the
> > index pool using 'rados rm' ?
> >
>
> Possibly related
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031350.html
>

# ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
7511

# ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
3509

# ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
3801

I believe the correct phrase in Italian would be "che pasticcio" ("what a mess").

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-02-19 Thread Iain Buclaw
On Tue, 19 Feb 2019 at 09:59, Iain Buclaw  wrote:
>
> On Wed, 6 Feb 2019 at 09:28, Iain Buclaw  wrote:
> >
> > On Tue, 5 Feb 2019 at 10:04, Iain Buclaw  wrote:
> > >
> > > On Tue, 5 Feb 2019 at 09:46, Iain Buclaw  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Following the update of one secondary site from 12.2.8 to 12.2.11, the
> > > > following warning has come up.
> > > >
> > > > HEALTH_WARN 1 large omap objects
> > > > LARGE_OMAP_OBJECTS 1 large omap objects
> > > > 1 large objects found in pool '.rgw.buckets.index'
> > > > Search the cluster log for 'Large omap object found' for more 
> > > > details.
> > > >
> > >
> > > [...]
> > >
> > > > Is this the reason why resharding hasn't propagated?
> > > >
> > >
> > > Furthermore, it looks like the index is in fact broken on the secondaries.
> > >
> > > On the master:
> > >
> > > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > > {
> > > "type": "plain",
> > > "idx": "myobject",
> > > "entry": {
> > > "name": "myobject",
> > > "instance": "",
> > > "ver": {
> > > "pool": 28,
> > > "epoch": 8848
> > > },
> > > "locator": "",
> > > "exists": "true",
> > > "meta": {
> > > "category": 1,
> > > "size": 9200,
> > > "mtime": "2018-03-27 21:12:56.612172Z",
> > > "etag": "c365c324cda944d2c3b687c0785be735",
> > > "owner": "mybucket",
> > > "owner_display_name": "Bucket User",
> > > "content_type": "application/octet-stream",
> > > "accounted_size": 9194,
> > > "user_data": ""
> > > },
> > > "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> > > "flags": 0,
> > > "pending_map": [],
> > > "versioned_epoch": 0
> > > }
> > > }
> > >
> > >
> > > On the secondaries:
> > >
> > > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > > ERROR: bi_get(): (2) No such file or directory
> > >
> > > How does one go about rectifying this mess?
> > >
> >
> > Random blog in language I don't understand seems to allude to using
> > radosgw-admin bi put to restore backed up indexes, but not under what
> > circumstances you would use such a command.
> >
> > https://cloud.tencent.com/developer/article/1032854
> >
> > Would this be safe to run on secondaries?
> >
>
> Removed the bucket on the secondaries and scheduled new sync.  However
> this gets stuck at some point and radosgw is complaining about:
>
> data sync: WARNING: skipping data log entry for missing bucket
> mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21
>
> Hopeless that RGW can't even do a simple job right, I removed the
> problematic bucket on the master, but now there are hundreds of
> shard objects inside the index pool, all look to be orphaned, and
> still the warnings for missing bucket continue to happen on the
> secondaries.  In some cases there's an object on the secondary that
> doesn't exist on the master.
>
> All the while, ceph is still complaining about large omap files.
>
> $ ceph daemon mon.ceph-mon-1 config get
> osd_deep_scrub_large_omap_object_value_sum_threshold
> {
> "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824"
> }
>
> It seems implausible that the cluster is still complaining about this
> when the largest omap contains 71405 entries.
>
>
> I can't run bi purge or metadata rm on the unreferenced entries
> because the bucket itself no longer exists.  Can I remove objects from
> the index pool using 'rados rm'?
>

Possibly related

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031350.html

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-02-19 Thread Iain Buclaw
On Wed, 6 Feb 2019 at 09:28, Iain Buclaw  wrote:
>
> On Tue, 5 Feb 2019 at 10:04, Iain Buclaw  wrote:
> >
> > On Tue, 5 Feb 2019 at 09:46, Iain Buclaw  wrote:
> > >
> > > Hi,
> > >
> > > Following the update of one secondary site from 12.2.8 to 12.2.11, the
> > > following warning has come up.
> > >
> > > HEALTH_WARN 1 large omap objects
> > > LARGE_OMAP_OBJECTS 1 large omap objects
> > > 1 large objects found in pool '.rgw.buckets.index'
> > > Search the cluster log for 'Large omap object found' for more details.
> > >
> >
> > [...]
> >
> > > Is this the reason why resharding hasn't propagated?
> > >
> >
> > Furthermore, it looks like the index is in fact broken on the secondaries.
> >
> > On the master:
> >
> > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > {
> > "type": "plain",
> > "idx": "myobject",
> > "entry": {
> > "name": "myobject",
> > "instance": "",
> > "ver": {
> > "pool": 28,
> > "epoch": 8848
> > },
> > "locator": "",
> > "exists": "true",
> > "meta": {
> > "category": 1,
> > "size": 9200,
> > "mtime": "2018-03-27 21:12:56.612172Z",
> > "etag": "c365c324cda944d2c3b687c0785be735",
> > "owner": "mybucket",
> > "owner_display_name": "Bucket User",
> > "content_type": "application/octet-stream",
> > "accounted_size": 9194,
> > "user_data": ""
> > },
> > "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> > "flags": 0,
> > "pending_map": [],
> > "versioned_epoch": 0
> > }
> > }
> >
> >
> > On the secondaries:
> >
> > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > ERROR: bi_get(): (2) No such file or directory
> >
> > How does one go about rectifying this mess?
> >
>
> Random blog in language I don't understand seems to allude to using
> radosgw-admin bi put to restore backed up indexes, but not under what
> circumstances you would use such a command.
>
> https://cloud.tencent.com/developer/article/1032854
>
> Would this be safe to run on secondaries?
>

Removed the bucket on the secondaries and scheduled a new sync.  However
this gets stuck at some point and radosgw is complaining about:

data sync: WARNING: skipping data log entry for missing bucket
mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21

Hopeless that RGW can't even do a simple job right, I removed the
problematic bucket on the master, but now there are hundreds of
shard objects inside the index pool, all of which look to be orphaned,
and the warnings about the missing bucket continue to appear on the
secondaries.  In some cases there's an object on the secondary that
doesn't exist on the master.

All the while, ceph is still complaining about large omap files.

$ ceph daemon mon.ceph-mon-1 config get
osd_deep_scrub_large_omap_object_value_sum_threshold
{
"osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824"
}

It seems implausible that the cluster is still complaining about this
when the largest omap contains 71405 entries.


I can't run bi purge or metadata rm on the unreferenced entries
because the bucket itself no longer exists.  Can I remove objects from
the index pool using 'rados rm'?
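Not an authoritative answer, but if the 'rados rm' route is taken, it seems prudent to only generate the commands for later review rather than delete directly. A sketch, where maybe_rm_cmd is a hypothetical helper and "unreferenced" is assumed to mean absent from 'radosgw-admin metadata list bucket.instance' output:

```shell
#!/bin/sh
# Sketch only: emit (never execute) "rados rm" commands for index objects
# whose instance id no longer appears in bucket.instance metadata.
# maybe_rm_cmd OBJECT INSTANCES POOL
maybe_rm_cmd() {
    obj=$1; instances=$2; pool=$3
    id=${obj#.dir.}
    # substring match, so "mybucket:<id>" metadata lines match the bare id
    if printf '%s\n' "$instances" | grep -qF "$id"; then
        printf '# %s still referenced, skipping\n' "$obj"
    else
        printf 'rados -p %s rm %s\n' "$pool" "$obj"
    fi
}
```

The emitted commands can then be inspected and run by hand, which at least bounds the damage if the assumption about what is orphaned turns out wrong.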

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-02-06 Thread Iain Buclaw
On Tue, 5 Feb 2019 at 10:04, Iain Buclaw  wrote:
>
> On Tue, 5 Feb 2019 at 09:46, Iain Buclaw  wrote:
> >
> > Hi,
> >
> > Following the update of one secondary site from 12.2.8 to 12.2.11, the
> > following warning has come up.
> >
> > HEALTH_WARN 1 large omap objects
> > LARGE_OMAP_OBJECTS 1 large omap objects
> > 1 large objects found in pool '.rgw.buckets.index'
> > Search the cluster log for 'Large omap object found' for more details.
> >
>
> [...]
>
> > Is this the reason why resharding hasn't propagated?
> >
>
> Furthermore, it looks like the index is in fact broken on the secondaries.
>
> On the master:
>
> # radosgw-admin bi get --bucket=mybucket --object=myobject
> {
> "type": "plain",
> "idx": "myobject",
> "entry": {
> "name": "myobject",
> "instance": "",
> "ver": {
> "pool": 28,
> "epoch": 8848
> },
> "locator": "",
> "exists": "true",
> "meta": {
> "category": 1,
> "size": 9200,
> "mtime": "2018-03-27 21:12:56.612172Z",
> "etag": "c365c324cda944d2c3b687c0785be735",
> "owner": "mybucket",
> "owner_display_name": "Bucket User",
> "content_type": "application/octet-stream",
> "accounted_size": 9194,
> "user_data": ""
> },
> "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> "flags": 0,
> "pending_map": [],
> "versioned_epoch": 0
> }
> }
>
>
> On the secondaries:
>
> # radosgw-admin bi get --bucket=mybucket --object=myobject
> ERROR: bi_get(): (2) No such file or directory
>
> How does one go about rectifying this mess?
>

A blog post in a language I don't understand seems to allude to using
radosgw-admin bi put to restore backed-up indexes, but doesn't say under
what circumstances you would use such a command.

https://cloud.tencent.com/developer/article/1032854

Would this be safe to run on secondaries?

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-02-05 Thread Iain Buclaw
On Tue, 5 Feb 2019 at 09:46, Iain Buclaw  wrote:
>
> Hi,
>
> Following the update of one secondary site from 12.2.8 to 12.2.11, the
> following warning has come up.
>
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool '.rgw.buckets.index'
> Search the cluster log for 'Large omap object found' for more details.
>

[...]

> Is this the reason why resharding hasn't propagated?
>

Furthermore, it looks like the index is in fact broken on the secondaries.

On the master:

# radosgw-admin bi get --bucket=mybucket --object=myobject
{
"type": "plain",
"idx": "myobject",
"entry": {
"name": "myobject",
"instance": "",
"ver": {
"pool": 28,
"epoch": 8848
},
"locator": "",
"exists": "true",
"meta": {
"category": 1,
"size": 9200,
"mtime": "2018-03-27 21:12:56.612172Z",
"etag": "c365c324cda944d2c3b687c0785be735",
"owner": "mybucket",
"owner_display_name": "Bucket User",
"content_type": "application/octet-stream",
"accounted_size": 9194,
"user_data": ""
},
"tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
}


On the secondaries:

# radosgw-admin bi get --bucket=mybucket --object=myobject
ERROR: bi_get(): (2) No such file or directory

How does one go about rectifying this mess?

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


[ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-02-05 Thread Iain Buclaw
Hi,

Following the update of one secondary site from 12.2.8 to 12.2.11, the
following warning has come up.

HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

listomapkeys confirms this.

.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36605032.1: 2828737

And there's "bucket_index_max_shards = 0" in the multisite configuration map.

Have run radosgw-admin reshard on all buckets, setting the shard count
to 12.  Likewise, set bucket_index_max_shards = 12 in the maps and
committed the period.  This was followed by a bi purge of the old
bucket index.
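For what it's worth, the sequence just described can be reconstructed roughly as below (a sketch assuming the manual reshard path on Luminous; it only prints the commands for review, it does not run them, and <old-instance-id> is a placeholder):

```shell
#!/bin/sh
# Assumed reconstruction of the reshard steps described above.
# Prints the radosgw-admin commands rather than executing them.
reshard_plan() {
    bucket=$1
cat <<EOF
radosgw-admin reshard add --bucket=$bucket --num-shards=12
radosgw-admin reshard process
radosgw-admin bi purge --bucket=$bucket --bucket-id=<old-instance-id>
EOF
}

reshard_plan mybucket
```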

I can see all the new index objects have been synced to all
secondaries; however, they are all empty.

On the master:

.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.11: 70517
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.10: 69940
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.3: 69992
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.0: 70184
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.6: 70276
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.2: 69695
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.4: 70251
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.7: 69916
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.5: 69677
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.1: 70569
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.9: 70151
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.8: 70312

On the secondaries:

.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.11: 72
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.10: 90
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.3: 42
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36605032.1: 2828737
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.0: 33
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.6: 51
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.2: 51
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.4: 54
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.7: 69
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.5: 60
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.1: 48
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.9: 60
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1.8: 66

Have run bucket sync, metadata sync, and data sync; nothing changes.

How would you synchronize the bucket index from master to the
secondaries?  Is it safe to remove the old index on the secondaries?

I have noticed that both the 366... and 908... ids show up here:

# radosgw-admin metadata list bucket.instance
[
"mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1",
"mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36605032.1"
]

# radosgw-admin metadata get bucket:mybucket
{
"key": "bucket:mybucket",
"ver": {
"tag": "_RTDLJ2lyzp0KcHkL_hE4t3Z",
"ver": 2
},
"mtime": "2019-02-04 14:03:47.830500Z",
"data": {
"bucket": {
"name": "mybucket",
"marker": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36605032.1",
"bucket_id": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.90887297.1",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"owner": "mybucket",
"creation_time": "2018-03-27 19:05:22.776182Z",
"linked": "true",
"has_bucket_info": "false"
}
}

Is this the reason why resharding hasn't propagated?
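That marker/bucket_id mismatch is what a reshard leaves behind: "marker" still points at the old 36605032 instance while "bucket_id" points at the new 90887297 one. A throwaway helper to pull the stale id out of that output (stale_instance is an assumption for illustration, not a radosgw-admin feature):

```shell
#!/bin/sh
# Print the pre-reshard (stale) instance id from the output of
# "radosgw-admin metadata get bucket:NAME" when marker != bucket_id.
stale_instance() {
    json=$1
    marker=$(printf '%s\n' "$json" | sed -n 's/.*"marker": "\([^"]*\)".*/\1/p')
    current=$(printf '%s\n' "$json" | sed -n 's/.*"bucket_id": "\([^"]*\)".*/\1/p')
    if [ -n "$marker" ] && [ "$marker" != "$current" ]; then
        printf '%s\n' "$marker"
    fi
}
```

If that stale instance is what still pins the old 2.8M-key index object, 'radosgw-admin bi purge --bucket=mybucket --bucket-id=<stale-id>' is the documented way to drop it, though whether that is safe on a multi-site secondary is exactly what's unclear here.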

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';