Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?
On 02/19/2018 09:49 PM, Robin H. Johnson wrote:
>> When I read the bucket instance metadata back again, it still reads
>> "placement_rule": "" so I wonder if the bucket_info change is really
>> taking effect.
> So it never showed the new placement_rule if you did a get after the put?

I think not. It's odd; it returned an empty list once, then reverted to
producing a file not found error. Hard to explain or understand that!

>> A quick debug session seems to show it still querying the wrong pool
>> (100) for the index, so it seems that my attempt to update the
>> bucket_info is either failing or incorrect!
> Did you run a local build w/ the linked patch? I think that would have
> more effect than

I did just build a local copy of 12.2.2 with the patch - and it does
seem to fix it.

Thanks!

Graham
-- 
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?
On Mon, Feb 19, 2018 at 07:57:18PM -0600, Graham Allan wrote:
> Sorry to send another long followup, but actually... I'm not sure how to
> change the placement_rule for a bucket - or at least what I tried does
> not seem to work. Using a different (more disposable) bucket, my attempt
> went like this::
[snip]
> first created a new placement rule "old-placement" in both the zonegroup
> and zone commit new period - this looks ok.
...
> I edit "placement_rule": to change "" -> "old-placement" and write it
> back using:
>
> > radosgw-admin metadata put bucket.instance:boto-demo-100:default.2170793.10
> > < boto-demo-100.json
>
> Now when I run "radosgw-admin bucket list --bucket=boto-demo-100" I am
> getting an empty list, though I'm pretty sure the bucket contains some
> objects.
>
> When I read the bucket instance metadata back again, it still reads
> "placement_rule": "" so I wonder if the bucket_info change is really
> taking effect.
So it never showed the new placement_rule if you did a get after the put?

> A quick debug session seems to show it still querying the wrong pool
> (100) for the index, so it seems that my attempt to update the
> bucket_info is either failing or incorrect!
Did you run a local build w/ the linked patch? I think that would have
more effect than

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?
Sorry to send another long followup, but actually... I'm not sure how to
change the placement_rule for a bucket - or at least what I tried does
not seem to work. Using a different (more disposable) bucket, my attempt
went like this:

First I created a new placement rule "old-placement" in both the
zonegroup and zone, then committed a new period - this looks ok.

    {
        "key": "old-placement",
        "val": {
            "index_pool": ".rgw.buckets",
            "data_pool": ".rgw.buckets",
            "data_extra_pool": "",
            "index_type": 0,
            "compression": ""
        }
    }

The current metadata for the test bucket looks like this:

    {
        "key": "bucket.instance:boto-demo-100:default.2170793.10",
        "ver": {
            "tag": "_BFMtrwjsyFbhU-65IielO3q",
            "ver": 652
        },
        "mtime": "2018-02-14 13:46:37.993218Z",
        "data": {
            "bucket_info": {
                "bucket": {
                    "name": "boto-demo-100",
                    "marker": "default.2170793.10",
                    "bucket_id": "default.2170793.10",
                    "tenant": "",
                    "explicit_placement": {
                        "data_pool": ".rgw.buckets",
                        "data_extra_pool": "",
                        "index_pool": ".rgw.buckets"
                    }
                },
                "creation_time": "0.00",
                "owner": "xx",
                "flags": 2,
                "zonegroup": "default",
                "placement_rule": "",
                "has_instance_obj": "true",
                "quota": {
                    "enabled": false,
                    "check_on_raw": false,
                    "max_size": -1024,
                    "max_size_kb": 0,
                    "max_objects": -1
                },
                "num_shards": 32,
                "bi_shard_hash_type": 0,
                "requester_pays": "false",
                "has_website": "false",
                "swift_versioning": "false",
                "swift_ver_location": "",
                "index_type": 0,
                "mdsearch_config": [],
                "reshard_status": 0,
                "new_bucket_instance_id": ""
            },
            "attrs": [
                {
                    "key": "user.rgw.acl",
                    "val": "AgKFAwIXBgAAAGJseW5jaAkAAABCZW4gTHluY2gDA2IBAQYAAABibHluY2gPAQYAAABibHluY2gEAzcCAgQABgAAAGJseW5jagIEDwkAAABCZW4gTHluY2gAAA=="
                },
                {
                    "key": "user.rgw.idtag",
                    "val": ""
                }
            ]
        }
    }

I edit "placement_rule": to change "" -> "old-placement" and write it
back using:

    radosgw-admin metadata put bucket.instance:boto-demo-100:default.2170793.10 < boto-demo-100.json

Now when I run "radosgw-admin bucket list --bucket=boto-demo-100" I am
getting an empty list, though I'm pretty sure the bucket contains some
objects.

When I read the bucket instance metadata back again, it still reads
"placement_rule": "" so I wonder if the bucket_info change is really
taking effect.

A quick debug session seems to show it still querying the wrong pool
(100) for the index, so it seems that my attempt to update the
bucket_info is either failing or incorrect!

Graham

On 02/19/2018 04:00 PM, Graham Allan wrote:
[snip]
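
The edit-and-read-back step described in this message can be sketched in
shell. This is only an illustration under stated assumptions: the helper
names set_placement_rule and placement_rule_of are hypothetical (they are
not radosgw-admin subcommands), they treat the metadata as plain text
rather than parsing JSON, and they assume the rule name contains no "/"
or "&" characters. As the thread notes, on an unpatched build the value
read back may still be empty even after the put.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "placement_rule" field of bucket.instance
# metadata. They work on JSON text via sed; they do not validate the JSON.

# Rewrite the value of "placement_rule" in JSON read from stdin.
# Assumption: $1 contains no '/' or '&' (both are special to sed).
set_placement_rule() {
  local rule=$1
  sed 's/\("placement_rule"[[:space:]]*:[[:space:]]*\)"[^"]*"/\1"'"$rule"'"/'
}

# Print the value of the first "placement_rule" field found on stdin.
# Prints an empty line when the field is present but empty.
placement_rule_of() {
  sed -n 's/.*"placement_rule"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage sketch (not run here; needs a live cluster):
#   key=bucket.instance:boto-demo-100:default.2170793.10
#   radosgw-admin metadata get "$key" > boto-demo-100.json
#   set_placement_rule old-placement < boto-demo-100.json > boto-demo-100.new.json
#   radosgw-admin metadata put "$key" < boto-demo-100.new.json
#   radosgw-admin metadata get "$key" | placement_rule_of
```

Comparing the last command's output with the intended rule name gives a
quick yes/no answer to "did the put take effect?" without reading the
whole JSON document by eye.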
Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?
Thanks Robin,

Of the two issues, this seems to me like it must be #22928.

Since the majority of index entries for this bucket are in the
.rgw.buckets pool, but newer entries have been created in
.rgw.buckets.index, it's clearly failing to use the explicit placement
pool - and with the index data split across two pools I don't see how
resharding could correct this.

I can get the object names from the "new" (incorrect) indexes with
something like:

    for i in `rados -p .rgw.buckets.index ls - | grep "default.2049236.2"`; do
        rados -p .rgw.buckets.index listomapkeys $i | grep "^[a-zA-Z0-9]"
    done

Fortunately there are only ~20 in this bucket (my grep is just a stupid
way to skip what I assume are the multipart parts, which have a
non-ascii first char).

... and these files are downloadable, at least using s3cmd (minio client
fails, it seems to try and check the index first).

Once I have these newer files downloaded, then to restore access to the
older index I like the suggestion in the issue tracker to create a new
placement target in the zone, and modify the bucket's placement rule to
match. It seems like it might be safer than copying objects from one
index pool to the other (though the latter certainly sounds faster and
easier!)

From a quick check, I suspect I probably have 40 or so other buckets
with this problem... will need to check them more closely.

Actually it looks like a lot of the affected buckets were created around
10/2016 - I suspect the placement policies were incorrect for a short
time due to confusion over the hammer->jewel upgrade (the
realm/period/zonegroup/zone conversion didn't really go smoothly!)

On 02/16/2018 11:39 PM, Robin H. Johnson wrote:
> On Fri, Feb 16, 2018 at 07:06:21PM -0600, Graham Allan wrote:
> [snip great debugging]
>
> This seems similar to two open issues, could be either of them depending
> on how old that bucket is.
> http://tracker.ceph.com/issues/22756
> http://tracker.ceph.com/issues/22928
>
> - I have a mitigation posted to 22756.
> - There's a PR posted for 22928, but it'll probably only be in v12.2.4.

-- 
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
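
The one-liner in this message can be wrapped in a small function so the
pool and bucket marker are parameters and object names are quoted. This
is a sketch only: the function names list_index_keys and
filter_plain_keys are hypothetical, and the only rados calls used are the
two from the message (ls and listomapkeys). The filter keeps keys whose
first character is an ASCII letter or digit, which is the same heuristic
as the original grep.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the loop quoted in the message above.

# Keep only keys that start with an ASCII letter or digit; the message
# uses this to skip entries assumed to be multipart bookkeeping keys.
# "|| true" keeps an empty result from being treated as a failure.
filter_plain_keys() {
  grep '^[a-zA-Z0-9]' || true
}

# List the plain object names held in every index object of pool $1
# whose name contains the bucket marker $2.
list_index_keys() {
  local pool=$1 marker=$2 obj
  rados -p "$pool" ls - | grep -F -- "$marker" | while IFS= read -r obj; do
    rados -p "$pool" listomapkeys "$obj" | filter_plain_keys
  done
}

# Usage sketch (not run here; needs a live cluster):
#   list_index_keys .rgw.buckets.index default.2049236.2
```

Reading object names line by line instead of word-splitting the output of
backticks avoids surprises if an index object name ever contains
whitespace or glob characters.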
Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?
On Fri, Feb 16, 2018 at 07:06:21PM -0600, Graham Allan wrote:
[snip great debugging]

This seems similar to two open issues, could be either of them depending
on how old that bucket is.
http://tracker.ceph.com/issues/22756
http://tracker.ceph.com/issues/22928

- I have a mitigation posted to 22756.
- There's a PR posted for 22928, but it'll probably only be in v12.2.4.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136