[ceph-users] Rocksdb as omap db backend on jewel 10.2.10
Hi, does anyone know if rocksdb is ready for use as the omap db on jewel? According to the release notes of RH Ceph 2.4, "RocksDB is enabled as an option to replace levelDB", and they even have a solution on how to convert the leveldb omap db to rocksdb (https://access.redhat.com/solutions/3210951), which mentions you need at least 10.2.7. However, to be able to use it on upstream ceph 10.2.10, you still need to set "enable experimental unrecoverable data corrupting features = rocksdb" in ceph.conf, and on every "ceph health" or other command you get the warning "the following dangerous and experimental features are enabled: rocksdb".

We have a lot of performance issues due to very large omap db's and would love to find out if switching to rocksdb would help there. Does anyone have any experience with this (good or bad)?

regards, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
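For reference, a minimal ceph.conf sketch of what enabling this on upstream jewel appears to require. The "filestore omap backend" option name is my assumption based on the Red Hat solution and should be verified against your build:

[osd]
# required on upstream 10.2.x to allow the experimental backend
enable experimental unrecoverable data corrupting features = rocksdb
# assumed option for selecting rocksdb instead of leveldb as the omap db
filestore omap backend = rocksdb

New OSDs created with this set would use rocksdb from the start; converting existing OSDs would still need the leveldb-to-rocksdb migration described in the linked solution.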
Re: [ceph-users] Luminous, RGW bucket resharding
On 11-12-17 16:23, Orit Wasserman wrote: > On Mon, Dec 11, 2017 at 4:58 PM, Sam Wouters <s...@ericom.be> wrote: >> Hi Orrit, >> >> >> On 04-12-17 18:57, Orit Wasserman wrote: >>> Hi Andreas, >>> >>> On Mon, Dec 4, 2017 at 11:26 AM, Andreas Calminder >>> <andreas.calmin...@klarna.com> wrote: >>>> Hello, >>>> With release 12.2.2 dynamic resharding bucket index has been disabled >>>> when running a multisite environment >>>> (http://tracker.ceph.com/issues/21725). Does this mean that resharding >>>> of bucket indexes shouldn't be done at all, manually, while running >>>> multisite as there's a risk of corruption? >>>> >>> You will need to stop the sync on the bucket before doing the >>> resharding and start it again after the resharding completes. >>> It will start a full sync on the bucket (it doesn't mean we copy the >>> objects but we go over on all of them to check if the need to be >>> synced). >>> We will automate this as part of the reshard admin command in the next >>> Luminous release. >> Does this also apply to Jewel? Stop sync and restart after resharding. >> (I don't know if there is any way to disable sync for a specific bucket.) >> > In Jewel we only support offline bucket resharding, you have to stop > both zones gateways before resharding. > Do: > Execute the resharding radosgw-admin command. > Run full sync on the bucket using: radosgw-admin bucket sync init on the > bucket. > Start the gateways. > > This should work but I have not tried it ... > Regards, > Orit Is it necessary to really stop the gateways? We tend to block all traffic to the bucket being resharded with the use of ACLs in the haproxy in front, to avoid downtime for non related buckets. Would a: - restart gws with sync thread disabled - block traffic to bucket - reshard - unblock traffic - bucket sync init - restart gws with sync enabled work as well? r, Sam >> r, >> Sam >>>> Also, as dynamic bucket resharding was/is the main motivator moving to >>>> Luminous (for me at least) is dynamic reshardning something that is >>>> planned to be fixed for multisite environments later in the Luminous >>>> life-cycle or will it be left disabled forever? >>>> >>> We are planning to enable it in Luminous time. >>> >>> Regards, >>> Orit >>> >>>> Thanks! >>>> /andreas >>>> ___ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
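A rough sketch of the sequence discussed above, assuming the jewel/luminous offline reshard command and that the sync threads can be disabled per gateway with "rgw run sync thread" (both are assumptions to verify in your own environment):

# in ceph.conf on the gateways, before the restart that disables sync:
#   [client.radosgw.<instance>]
#   rgw run sync thread = false
$ radosgw-admin bucket reshard --bucket=<bucket> --num-shards=<n>
$ radosgw-admin bucket sync init --bucket=<bucket>
# then remove the option again (or set it back to true) and restart the gateways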
Re: [ceph-users] Luminous, RGW bucket resharding
Hi Orrit, On 04-12-17 18:57, Orit Wasserman wrote: > Hi Andreas, > > On Mon, Dec 4, 2017 at 11:26 AM, Andreas Calminder >wrote: >> Hello, >> With release 12.2.2 dynamic resharding bucket index has been disabled >> when running a multisite environment >> (http://tracker.ceph.com/issues/21725). Does this mean that resharding >> of bucket indexes shouldn't be done at all, manually, while running >> multisite as there's a risk of corruption? >> > You will need to stop the sync on the bucket before doing the > resharding and start it again after the resharding completes. > It will start a full sync on the bucket (it doesn't mean we copy the > objects but we go over on all of them to check if the need to be > synced). > We will automate this as part of the reshard admin command in the next > Luminous release. Does this also apply to Jewel? Stop sync and restart after resharding. (I don't know if there is any way to disable sync for a specific bucket.) r, Sam >> Also, as dynamic bucket resharding was/is the main motivator moving to >> Luminous (for me at least) is dynamic reshardning something that is >> planned to be fixed for multisite environments later in the Luminous >> life-cycle or will it be left disabled forever? >> > We are planning to enable it in Luminous time. > > Regards, > Orit > >> Thanks! >> /andreas >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
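As an aside, for anyone who wants to be sure dynamic resharding stays off on a multisite luminous cluster in the meantime, the option below should do it (option name as I understand it from the luminous docs; please correct me if I'm wrong):

[client.rgw.<instance>]
rgw dynamic resharding = false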
[ceph-users] jewel - radosgw-admin bucket limit check broken?
Hi, I wanted to test the new feature that checks the existing buckets for optimal index sharding. According to the docs this should be as simple as "radosgw-admin -n client.xxx bucket limit check", with an optional param for printing only buckets over or nearing the limit. When I invoke this, however, I simply get the error output "unrecognized arg limit" and the usage line "usage: radosgw-admin <cmd> [options...]" followed by the help output.

Tested this with 10.2.8 and 10.2.9; other radosgw-admin commands work fine. I've looked through the open issues but don't seem to find this in the tracker. Simple bug, or am I completely missing something?

r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
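For completeness, this is the invocation I would expect once the subcommand is actually available; the --warnings-only flag for printing only buckets near or over the limit is an assumption on my part from the docs:

$ radosgw-admin -n client.xxx bucket limit check
$ radosgw-admin -n client.xxx bucket limit check --warnings-only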
[ceph-users] Recovering rgw index pool with large omap size
Hi list, we need to recover an index pool distributed over 4 ssd based osd's. We needed to kick out one of the OSDs because it was blocking all rgw access due to leveldb compacting. Since then we've restarted the OSD with "leveldb compact on mount = true" and the noup flag set, running the leveldb compact offline, but the index pg's are now running in degraded mode. The goal is to make the recovery as fast as possible during a small maintenance window and/or with minimal client impact.

The cluster is running jewel 10.2.7 (recently upgraded from hammer) and has ongoing backfill operations (from changing the tunables). We have some buckets with a large amount of objects in them. Bucket index re-sharding would be needed, but we don't have the opportunity to do that right now.

Plan so far:
* set global I/O scheduling priority to 7 (lowest)
* set index pool osd's specifics:
  - set recovery prio to highest (63)
  - set client prio to lowest (1)
  - increase recovery threads to 2
  - set disk thread prio to highest (0)
  - limit omap entries per chunk for recovery to 32k (64k seems to give timeouts)
* unset noup flag to let the misbehaving OSD kick in and start recovery

Any further ideas, experience or remarks would be very much appreciated...

r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
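A sketch of how I plan to inject these on the index-pool OSDs; the exact option names (especially the disk thread and omap chunk ones) are my own mapping of the plan above and should be double-checked before use:

$ ceph tell osd.<id> injectargs '--osd-recovery-op-priority 63 --osd-client-op-priority 1'
$ ceph tell osd.<id> injectargs '--osd-recovery-threads 2 --osd-disk-thread-ioprio-priority 0'
$ ceph tell osd.<id> injectargs '--osd-recovery-max-omap-entries-per-chunk 32768'
$ ceph osd unset noup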
Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?
Yes, don't know exactly since which release it was introduced, but in latest jewel and beyond there is: Please use pool level options recovery_priority and recovery_op_priority for enabling pool level recovery priority feature: Raw # ceph osd pool set default.rgw.buckets.index recovery_priority 5 # ceph osd pool set default.rgw.buckets.index recovery_op_priority 5 Recovery value 5 will help because the default is 3 in jewel release, use below command to check if both options are set properly Is there a way to prioritize specific pools during recovery? I know > there are issues open for it, but I wasn't aware it was implemented yet... > > Regards, > Logan > > - On Jun 20, 2017, at 8:20 AM, Sam Wouters <s...@ericom.be> wrote: > > Hi, > > Are they all in the same pool? Otherwise you could prioritize pool > recovery. > If not, maybe you can play with the osd max backfills number, no > idea if it accepts a value of 0 to actually disable it for > specific OSDs. > > r, > Sam > > On 20-06-17 14:44, Richard Hesketh wrote: > > Is there a way, either by individual PG or by OSD, I can prioritise > backfill/recovery on a set of PGs which are currently particularly important > to me? > > For context, I am replacing disks in a 5-node Jewel cluster, on a > node-by-node basis - mark out the OSDs on a node, wait for them to clear, > replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've > done my first node, but the significant CRUSH map changes means most of my > data is moving. I only currently care about the PGs on my next set of OSDs to > replace - the other remapped PGs I don't care about settling because they're > only going to end up moving around again after I do the next set of disks. I > do want the PGs specifically on the OSDs I am about to replace to backfill > because I don't want to compromise data integrity by downing them while they > host active PGs. If I could specifically prioritise the backfill on those > PGs/OSDs, I could get on with replacing disks without worrying about causing > degraded PGs. > > I'm in a situation right now where there is merely a couple of dozen > PGs on the disks I want to replace, which are all remapped and waiting to > backfill - but there are 2200 other PGs also waiting to backfill because > they've moved around too, and it's extremely frustating to be sat waiting to > see when the ones I care about will finally be handled so I can get on with > replacing those disks. > > Rich > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
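To verify the values took effect, something along these lines should work; if "ceph osd pool get" doesn't know the key on a given release, the pool line in "ceph osd dump" also shows it:

$ ceph osd pool get default.rgw.buckets.index recovery_priority
$ ceph osd pool get default.rgw.buckets.index recovery_op_priority
$ ceph osd dump | grep default.rgw.buckets.index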
Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?
Hi, Are they all in the same pool? Otherwise you could prioritize pool recovery. If not, maybe you can play with the osd max backfills number, no idea if it accepts a value of 0 to actually disable it for specific OSDs. r, Sam On 20-06-17 14:44, Richard Hesketh wrote: > Is there a way, either by individual PG or by OSD, I can prioritise > backfill/recovery on a set of PGs which are currently particularly important > to me? > > For context, I am replacing disks in a 5-node Jewel cluster, on a > node-by-node basis - mark out the OSDs on a node, wait for them to clear, > replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've > done my first node, but the significant CRUSH map changes means most of my > data is moving. I only currently care about the PGs on my next set of OSDs to > replace - the other remapped PGs I don't care about settling because they're > only going to end up moving around again after I do the next set of disks. I > do want the PGs specifically on the OSDs I am about to replace to backfill > because I don't want to compromise data integrity by downing them while they > host active PGs. If I could specifically prioritise the backfill on those > PGs/OSDs, I could get on with replacing disks without worrying about causing > degraded PGs. > > I'm in a situation right now where there is merely a couple of dozen PGs on > the disks I want to replace, which are all remapped and waiting to backfill - > but there are 2200 other PGs also waiting to backfill because they've moved > around too, and it's extremely frustating to be sat waiting to see when the > ones I care about will finally be handled so I can get on with replacing > those disks. > > Rich > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
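For the max backfills idea, a per-OSD injection sketch (I haven't tested whether a value of 0 is accepted; 1 is the usual floor):

$ ceph tell osd.<id> injectargs '--osd-max-backfills 1'
$ ceph tell osd.\* injectargs '--osd-max-backfills 1'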
Re: [ceph-users] jewel - rgw blocked on deep-scrub of bucket index pg
Hi, On 06-05-17 20:08, Wido den Hollander wrote: >> Op 6 mei 2017 om 9:55 schreef Christian Balzer <ch...@gol.com>: >> >> >> >> Hello, >> >> On Sat, 6 May 2017 09:25:15 +0200 (CEST) Wido den Hollander wrote: >> >>>> Op 5 mei 2017 om 10:33 schreef Sam Wouters <s...@ericom.be>: >>>> >>>> >>>> Hi, >>>> >>>> we have a small cluster running on jewel 10.2.7; NL-SAS disks only, osd >>>> data and journal co located on the disks; main purpose rgw secondary zone. >>>> >>>> Since the upgrade to jewel, whenever a deep scrub starts on one of the >>>> rgw index pool pg's, slow requests start piling up and rgw requests are >>>> blocked after some hours. >>>> The deep-scrub doesn't seem to finish (still running after +11 hours) >>>> and only escape I found so far is a restart of the primary osd holding >>>> the pg. >>>> >>>> Maybe important to know, we have some large rgw buckets regarding >>>> #objects (+ 3 million) with only index sharding of 8. >>>> >>>> scrub related settings: >>>> osd scrub sleep = 0.1 >>> Try removing this line, it can block threads under Jewel. I also found the bug report (#19497) yesterday, so indeed removed the sleep and manually started the deep-scrub. I didn't had time to check the result until now. After almost 26 hours the deep-scrub operation finished (2017-05-05 10:57:08 -> 2017-05-06 12:29:05), however during the scrubbing frequent timeouts and complete rgw downtime for various periods of time occurred. Our primary cluster is still running hammer, and on there the index pools are on ssd's, but this still raises concerns for after the planned upgrade of that one... Thanks a lot for the help! r, Sam >>> >> I'd really REALLY wish that would get fixed properly, as in the original >> functionality restored. > Afaik new work is being done on this. There was a recent thread on the > ceph-users or devel (can't find it) that new code is out there to fix this. > > Wido > >> Because as we've learned entrusting everything into internal Ceph queues >> with priorities isn't working as expected in all cases. >> >> For a second, very distant option, turn it into a NOP for the time being. >> As it stands now, it's another self-made, Jewel introduced bug... >> >> Christian >> >>> See how that works out. >>> >>> Wido >>> >>>> osd scrub during recovery = False >>>> osd scrub priority = 1 >>>> osd deep scrub stride = 1048576 >>>> osd scrub chunk min = 1 >>>> osd scrub chunk max = 1 >>>> >>>> Any help on debugging / resolving would be very much appreciated... >>>> >>>> regards, >>>> Sam >>>> >>>> ___ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> >> -- >> Christian BalzerNetwork/Systems Engineer >> ch...@gol.comGlobal OnLine Japan/Rakuten Communications >> http://www.gol.com/ > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
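For anyone hitting the same thing, the steps I used were roughly the following (injectargs avoids a restart, but the ceph.conf line should be removed as well so the sleep doesn't come back):

$ ceph tell osd.\* injectargs '--osd-scrub-sleep 0'
$ ceph pg deep-scrub <pgid>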
[ceph-users] jewel - rgw blocked on deep-scrub of bucket index pg
Hi, we have a small cluster running on jewel 10.2.7; NL-SAS disks only, osd data and journal co-located on the disks; main purpose rgw secondary zone.

Since the upgrade to jewel, whenever a deep scrub starts on one of the rgw index pool pg's, slow requests start piling up and rgw requests are blocked after some hours. The deep-scrub doesn't seem to finish (still running after 11+ hours) and the only escape I've found so far is a restart of the primary osd holding the pg.

Maybe important to know: we have some large rgw buckets in terms of #objects (3+ million) with an index sharding of only 8.

scrub related settings:
osd scrub sleep = 0.1
osd scrub during recovery = False
osd scrub priority = 1
osd deep scrub stride = 1048576
osd scrub chunk min = 1
osd scrub chunk max = 1

Any help on debugging / resolving would be very much appreciated...

regards, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
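In case it helps anyone debugging the same, this is roughly how I find which index pg is currently deep-scrubbing and which OSD is its primary (the grep pattern may need adjusting to your pg states):

$ ceph pg dump pgs | grep scrubbing+deep
$ ceph pg map <pgid>
$ ceph daemon osd.<primary-id> config show | grep scrub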
Re: [ceph-users] Antw: Re: Best practices for extending a ceph cluster with minimal client impact data movement
Hi, >>> Now, add the OSDs to the cluster, but NOT to the CRUSHMap. >>> >>> When all the OSDs are online, inject a new CRUSHMap where you add the new >>> OSDs to the data placement. >>> >>> $ ceph osd setcrushmap -i >>> >>> The OSDs will now start to migrate data, but this is throttled by the max >>> recovery and backfill settings. I was wondering how exactly you accomplish that? Can you do this with a "ceph-deploy create" with "noin" or "noup" flags set, or does one need to follow the manual steps of adding an osd? r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
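One way I can think of to get OSDs up without them appearing in the CRUSH map is the option below, set before creating/starting the new OSDs, so they don't register a CRUSH location on start; whether this matches the intended procedure is an assumption on my part:

[osd]
osd crush update on start = false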
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
On 23-04-16 18:17, Yehuda Sadeh-Weinraub wrote: > On Sat, Apr 23, 2016 at 6:22 AM, Richard Chan wrote: >> Hi Cephers, >> >> I upgraded to Jewel and noted the massive radosgw multisite rework >> in the release notes. >> >> Can Jewel radosgw be configured to present existing Hammer buckets? >> On a test system, jewel didn't recognise my Hammer buckets; >> >> Hammer used pools .rgw.* >> Jewel created by default: .rgw.root and default.rgw* >> >> > Yes, jewel should be able to read hammer buckets. If it detects that > there's an old config, it should migrate the existing setup into the new > config. It seems that something didn't work as expected here. One way > to fix it would be to create a new zone and set its pools to point at > the old config's pools. We'll need to figure out what went wrong > though.

Hi, I'm also wondering about the correct upgrade procedure for the radosgw's, especially in a multi-gateway setup in a federated config. If you say the existing setup should migrate, is it ok then to have hammer and jewel radosgw's co-exist (for a short time)? We have, for example, multiple radosgw instances behind an haproxy. Can they be upgraded one at a time or do they all need to be stopped before starting the first jewel radosgw? Does the ceph.conf file need to be adapted to the jewel config, e.g. changing "rgw region root pool" into "rgw zonegroup root pool"? Before or after the upgrade?

Concerning data replication: I understand the radosgw-agent is deprecated in jewel and the replication is done by the radosgw's themselves. Is this also automatically enabled or does this need to be started / configured somehow?

thanks in advance, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
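Related to the suggestion of pointing a new zone at the old pools, I assume the mechanics would be roughly as below (dump the zone, edit the pool names in the json so they point at the old .rgw.* pools, then load it back and commit):

$ radosgw-admin zone get --rgw-zone=default > zone.json
$ radosgw-admin zone set --rgw-zone=default --infile zone.json
$ radosgw-admin period update --commit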
Re: [ceph-users] Troubleshooting rgw bucket list
Thanks! Playing around with max_keys in bucket listing retrieval actually gives me results or not, this gives me a way to list the content until the bug is fixed. Is it possible somehow to copy the objects to a new bucket (with versioning disabled), and rename the current one? I don't think the latter is possible through the api but maybe there is some hidden way ;-) Could you also take a minute to confirm another versioning related bug I posted: http://tracker.ceph.com/issues/12819 If you could give me some pointers to contribute, I don't mind digging into code, I will gladly do so. r, Sam On 01-09-15 22:37, Yehuda Sadeh-Weinraub wrote: Yeah, I'm able to reproduce the issue. It is related to the fact that you have a bunch of delete markers in the bucket, as it triggers some bug there. I opened a new ceph issue for this one: http://tracker.ceph.com/issues/12913 Thanks, Yehuda On Tue, Sep 1, 2015 at 11:39 AM, Sam Wouters <s...@ericom.be> wrote: Sorry, forgot to mention: - yes, filtered by thread - the "is not valid" line occurred when performing the bucket --check - when doing a bucket listing, I also get an "is not valid", but on a different object: 7fe4f1d5b700 20 cls/rgw/cls_rgw.cc:460: entry abc_econtract/data/6scbrrlo4vttk72melewizj6n3[] is not valid bilog entry for this object similar to the one below r, Sam On 01-09-15 20:30, Sam Wouters wrote: Hi, see inline On 01-09-15 20:14, Yehuda Sadeh-Weinraub wrote: I assume you filtered the log by thread? I don't see the response messages. For the bucket check you can run radosgw-admin with --log-to-stderr. nothing is logged to the console when I do that Can you also set 'debug objclass = 20' on the osds? You can do it by: $ ceph tell osd.\* injectargs --debug-objclass 20 this continuously prints "20 cls/rgw/cls_rgw.cc:460: entry abc_econtract/data/6smuz2ysavvxbygng34tgusyse[] is not valid" on osd.0 Also, it'd be interesting to get the following: $ radosgw-admin bi list --bucket= --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5 this gives me an empty array: [ ] but we did a trim of the bilog a while ago cause a lot entries regarding objects that were already removed from the bucket kept on syncing with the sync agent, causing a lot of delete_markers at the replication site. The object in the error above from the osd log, gives the following: # radosgw-admin --log-to-stderr -n client.radosgw.be-east-1 bi list --bucket=aws-cmis-prod --object=abc_econtract/data/6smuz2ysavvxbygng34tgusyse [ { "type": "plain", "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "entry": { "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "instance": "", "ver": { "pool": -1, "epoch": 0 }, "locator": "", "exists": "false", "meta": { "category": 0, "size": 0, "mtime": "0.00", "etag": "", "owner": "", "owner_display_name": "", "content_type": "", "accounted_size": 0 }, "tag": "", "flags": 8, "pending_map": [], "versioned_epoch": 0 } }, { "type": "plain", "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uv913\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu", "entry": { "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu", "ver": { "pool": 23, "epoch": 9680 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 103410, "mtime": "2015-08-07 17:57:32.00Z", "etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a", "owner": "aws-cmis-prod", "owner_display_name": "AWS-CMIS prod user", "content_type": "application\/pdf", "accounted_size": 103410 }, "tag": "be-east.34319.452037
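For reference, the server-side equivalent I've been trying in order to page through the listing while the bug is open; I'm assuming --max-entries is honoured by the bucket list subcommand, and the value that still returns results seems to vary:

$ radosgw-admin -n client.radosgw.be-east-1 bucket list --bucket=aws-cmis-prod --max-entries=500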
Re: [ceph-users] Troubleshooting rgw bucket list
Hi, see inline On 01-09-15 20:14, Yehuda Sadeh-Weinraub wrote: > I assume you filtered the log by thread? I don't see the response > messages. For the bucket check you can run radosgw-admin with > --log-to-stderr. nothing is logged to the console when I do that > > Can you also set 'debug objclass = 20' on the osds? You can do it by: > > $ ceph tell osd.\* injectargs --debug-objclass 20 this continuously prints "20 cls/rgw/cls_rgw.cc:460: entry abc_econtract/data/6smuz2ysavvxbygng34tgusyse[] is not valid" on osd.0 > > Also, it'd be interesting to get the following: > > $ radosgw-admin bi list --bucket= > --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5 this gives me an empty array: [ ] but we did a trim of the bilog a while ago cause a lot entries regarding objects that were already removed from the bucket kept on syncing with the sync agent, causing a lot of delete_markers at the replication site. The object in the error above from the osd log, gives the following: # radosgw-admin --log-to-stderr -n client.radosgw.be-east-1 bi list --bucket=aws-cmis-prod --object=abc_econtract/data/6smuz2ysavvxbygng34tgusyse [ { "type": "plain", "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "entry": { "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "instance": "", "ver": { "pool": -1, "epoch": 0 }, "locator": "", "exists": "false", "meta": { "category": 0, "size": 0, "mtime": "0.00", "etag": "", "owner": "", "owner_display_name": "", "content_type": "", "accounted_size": 0 }, "tag": "", "flags": 8, "pending_map": [], "versioned_epoch": 0 } }, { "type": "plain", "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uv913\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu", "entry": { "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu", "ver": { "pool": 23, "epoch": 9680 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 103410, "mtime": "2015-08-07 17:57:32.00Z", "etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a", "owner": "aws-cmis-prod", "owner_display_name": "AWS-CMIS prod user", "content_type": "application\/pdf", "accounted_size": 103410 }, "tag": "be-east.34319.4520377", "flags": 3, "pending_map": [], "versioned_epoch": 2 } }, { "type": "instance", "idx": "�1000_abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu", "entry": { "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu", "ver": { "pool": 23, "epoch": 9680 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 103410, "mtime": "2015-08-07 17:57:32.00Z", "etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a", "owner": "aws-cmis-prod", "owner_display_name": "AWS-CMIS prod user", "content_type": "application\/pdf", "accounted_size": 103410 }, "tag": "be-east.34319.4520377", "flags": 3, "pendi
Re: [ceph-users] Troubleshooting rgw bucket list
not sure where I can find the logs for the bucket check, I can't really filter them out in the radosgw log. -Sam On 01-09-15 19:25, Sam Wouters wrote: > It looks like it, this is what shows in the logs after bumping the debug > and requesting a bucket list. > > 2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list > aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) > start > abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] > num_entries 1 > 2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from > .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 > 2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state: > rctx=0x7fccb17c84d0 > obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 > state=0x7fcde01a4060 s->prefetch_data=0 > 2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get: > name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit > 2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was > set empty > 2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get: > name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit > 2015-09-01 17:14:53.008675 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> > 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874 > ... > .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 > ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 > > On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote: >> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the >> operations (bucket listing and bucket check) go into some kind of >> infinite loop? >> >> Yehuda >> >> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <s...@ericom.be> wrote: >>> Hi, I've started the bucket --check --fix on friday evening and it's >>> still running. 'ceph -s' shows the cluster health as OK, I don't know if >>> there is anything else I could check? Is there a way of finding out if >>> its actually doing something? >>> >>> We only have this issue on the one bucket with versioning enabled, I >>> can't get rid of the feeling it has something todo with that. The >>> "underscore bug" is also still present on that bucket >>> (http://tracker.ceph.com/issues/12819). Not sure if thats related in any >>> way. >>> Are there any alternatives, as for example copy all the objects into a >>> new bucket without versioning? Simple way would be to list the objects >>> and copy them to a new bucket, but bucket listing is not working so... >>> >>> -Sam >>> >>> >>> On 31-08-15 10:47, Gregory Farnum wrote: >>>> This generally shouldn't be a problem at your bucket sizes. Have you >>>> checked that the cluster is actually in a healthy state? The sleeping >>>> locks are normal but should be getting woken up; if they aren't it >>>> means the object access isn't working for some reason. A down PG or >>>> something would be the simplest explanation. >>>> -Greg >>>> >>>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <s...@ericom.be> wrote: >>>>> Ok, maybe I'm to impatient. It would be great if there were some verbose >>>>> or progress logging of the radosgw-admin tool. >>>>> I will start a check and let it run over the weekend. >>>>> >>>>> tnx, >>>>> Sam >>>>> >>>>> On 28-08-15 18:16, Sam Wouters wrote: >>>>>> Hi, >>>>>> >>>>>> this bucket only has 13389 objects, so the index size shouldn't be a >>>>>> problem. Also, on the same cluster we have an other bucket with 1200543 >>>>>> objects (but no versioning configured), which has no issues. 
>>>>>> >>>>>> when we run a radosgw-admin bucket --check (--fix), nothing seems to be >>>>>> happening. Putting an strace on the process shows a lot of lines like >>>>>> these: >>>>>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL >>>>>> >>>>>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL >>>>>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 >>>>>> [pid 99385] <... futex resumed> ) = -1 EAGAIN (Resource >>>>>> temporarily unavailable) >>>>>> [pid 99371] <... futex resumed> ) = 0 >>>>>> >>>>>> but no errors in the ceph logs or health warnings. >>>>>>
Re: [ceph-users] Troubleshooting rgw bucket list
Sorry, forgot to mention: - yes, filtered by thread - the "is not valid" line occurred when performing the bucket --check - when doing a bucket listing, I also get an "is not valid", but on a different object: 7fe4f1d5b700 20 cls/rgw/cls_rgw.cc:460: entry abc_econtract/data/6scbrrlo4vttk72melewizj6n3[] is not valid bilog entry for this object similar to the one below r, Sam On 01-09-15 20:30, Sam Wouters wrote: > Hi, > > see inline > > On 01-09-15 20:14, Yehuda Sadeh-Weinraub wrote: >> I assume you filtered the log by thread? I don't see the response >> messages. For the bucket check you can run radosgw-admin with >> --log-to-stderr. > nothing is logged to the console when I do that >> Can you also set 'debug objclass = 20' on the osds? You can do it by: >> >> $ ceph tell osd.\* injectargs --debug-objclass 20 > this continuously prints "20 cls/rgw/cls_rgw.cc:460: entry > abc_econtract/data/6smuz2ysavvxbygng34tgusyse[] is not valid" on osd.0 >> Also, it'd be interesting to get the following: >> >> $ radosgw-admin bi list --bucket= >> --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5 > this gives me an empty array: > [ > ] > but we did a trim of the bilog a while ago cause a lot entries regarding > objects that were already removed from the bucket kept on syncing with > the sync agent, causing a lot of delete_markers at the replication site. > > The object in the error above from the osd log, gives the following: > # radosgw-admin --log-to-stderr -n client.radosgw.be-east-1 bi list > --bucket=aws-cmis-prod > --object=abc_econtract/data/6smuz2ysavvxbygng34tgusyse > [ > { > "type": "plain", > "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", > "entry": { > "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", > "instance": "", > "ver": { > "pool": -1, > "epoch": 0 > }, > "locator": "", > "exists": "false", > "meta": { > "category": 0, > "size": 0, > "mtime": "0.00", > "etag": "", > "owner": "", > "owner_display_name": "", > "content_type": "", > "accounted_size": 0 > }, > "tag": "", > "flags": 8, > "pending_map": [], > "versioned_epoch": 0 > } > }, > { > "type": "plain", > "idx": > "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uv913\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu", > "entry": { > "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", > "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu", > "ver": { > "pool": 23, > "epoch": 9680 > }, > "locator": "", > "exists": "true", > "meta": { > "category": 1, > "size": 103410, > "mtime": "2015-08-07 17:57:32.00Z", > "etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a", > "owner": "aws-cmis-prod", > "owner_display_name": "AWS-CMIS prod user", > "content_type": "application\/pdf", > "accounted_size": 103410 > }, > "tag": "be-east.34319.4520377", > "flags": 3, > "pending_map": [], > "versioned_epoch": 2 > } > }, > { > "type": "instance", > "idx": > "�1000_abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu", > "entry": { > "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse", > "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu&quo
Re: [ceph-users] Troubleshooting rgw bucket list
It looks like it, this is what shows in the logs after bumping the debug and requesting a bucket list. 2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1 2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0 2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty 2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.008675 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 2015-09-01 17:14:53.009136 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1 2015-09-01 17:14:53.009146 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 2015-09-01 17:14:53.009153 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0 2015-09-01 17:14:53.009158 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.009163 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty 2015-09-01 17:14:53.009165 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.009189 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435876 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 2015-09-01 17:14:53.009629 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1 2015-09-01 17:14:53.009638 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 2015-09-01 17:14:53.009645 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0 2015-09-01 17:14:53.009651 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.009655 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty 2015-09-01 17:14:53.009657 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.009681 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435878 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 2015-09-01 17:14:53.010139 7fccb17ca700 10 cls_bucket_list 
aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1 2015-09-01 17:14:53.010149 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 2015-09-01 17:14:53.010156 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0 2015-09-01 17:14:53.010161 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.010166 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty 2015-09-01 17:14:53.010168 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit 2015-09-01 17:14:53.010192 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435880 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote: > Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the > operations (bucket listing and bucket check) go into some kind of > infinite loop? > > Yehuda > > On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <s...@ericom.be> wrote: >> Hi, I've started the bucket --check --fix on friday evening and it's >
Re: [ceph-users] Troubleshooting rgw bucket list
Hi, I've started the bucket --check --fix on friday evening and it's still running. 'ceph -s' shows the cluster health as OK, I don't know if there is anything else I could check? Is there a way of finding out if its actually doing something? We only have this issue on the one bucket with versioning enabled, I can't get rid of the feeling it has something todo with that. The "underscore bug" is also still present on that bucket (http://tracker.ceph.com/issues/12819). Not sure if thats related in any way. Are there any alternatives, as for example copy all the objects into a new bucket without versioning? Simple way would be to list the objects and copy them to a new bucket, but bucket listing is not working so... -Sam On 31-08-15 10:47, Gregory Farnum wrote: > This generally shouldn't be a problem at your bucket sizes. Have you > checked that the cluster is actually in a healthy state? The sleeping > locks are normal but should be getting woken up; if they aren't it > means the object access isn't working for some reason. A down PG or > something would be the simplest explanation. > -Greg > > On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <s...@ericom.be> wrote: >> Ok, maybe I'm to impatient. It would be great if there were some verbose >> or progress logging of the radosgw-admin tool. >> I will start a check and let it run over the weekend. >> >> tnx, >> Sam >> >> On 28-08-15 18:16, Sam Wouters wrote: >>> Hi, >>> >>> this bucket only has 13389 objects, so the index size shouldn't be a >>> problem. Also, on the same cluster we have an other bucket with 1200543 >>> objects (but no versioning configured), which has no issues. >>> >>> when we run a radosgw-admin bucket --check (--fix), nothing seems to be >>> happening. Putting an strace on the process shows a lot of lines like these: >>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL >>> >>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL >>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 >>> [pid 99385] <... futex resumed> ) = -1 EAGAIN (Resource >>> temporarily unavailable) >>> [pid 99371] <... futex resumed> ) = 0 >>> >>> but no errors in the ceph logs or health warnings. >>> >>> r, >>> Sam >>> >>> On 28-08-15 17:49, Ben Hines wrote: >>>> How many objects in the bucket? >>>> >>>> RGW has problems with index size once number of objects gets into the >>>> 90+ level. The buckets need to be recreated with 'sharded bucket >>>> indexes' on: >>>> >>>> rgw override bucket index max shards = 23 >>>> >>>> You could also try repairing the index with: >>>> >>>> radosgw-admin bucket check --fix --bucket= >>>> >>>> -Ben >>>> >>>> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters <s...@ericom.be> wrote: >>>>> Hi, >>>>> >>>>> we have a rgw bucket (with versioning) where PUT and GET operations for >>>>> specific objects succeed, but retrieving an object list fails. >>>>> Using python-boto, after a timeout just gives us an 500 internal error; >>>>> radosgw-admin just hangs. >>>>> Also a radosgw-admin bucket check just seems to hang... >>>>> >>>>> ceph version is 0.94.3 but this also was happening with 0.94.2, we >>>>> quietly hoped upgrading would fix but it didn't... 
>>>>> >>>>> r, >>>>> Sam >>>>> ___ >>>>> ceph-users mailing list >>>>> ceph-users@lists.ceph.com >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] rgw 0.94.3: objects starting with underscore in bucket with versioning enabled are not retrievable
Hi, we had an issue in our ceph clusters and are able to reproduce this in our labo cluster, just upgraded to hammer 0.94.3. Steps to reproduce: 1) create bucket test2 2) put _test object - lists and retrieves ok 3) enable versioning on test 4) put _test2 object - lists, but get fails with ERROR: ErrorNoSuchKey; object _test is still retrievable 5) disable versioning (bucket.configure_versioning(False)) 6) put _test3 object - lists ok, retrieves OK (still errorNoSuchKey on object _test2) - Does anyone know if this is a known bug or should I open a tracker? - we're fine to disable versioning for now, but we should find a way to retrieve or _objects uploaded with versioning support enabled. Or be able to rename/delete them... Any help or pointers would be much appreciated. Running the fix-tool doesn't show any errors, or doesn't fix anything: radosgw-admin -n client.radosgw.be-south-1 bucket check --check-head-obj-locator --bucket=test2 { bucket: test2, check_objects: [ { key: { type: head, name: _test, instance: }, oid: be03-south.7213293.1___test, locator: be03-south.7213293.1__test, needs_fixing: false, status: ok }, { key: { type: tail, name: _test, instance: }, needs_fixing: false, status: ok }, { key: { type: head, name: _test2, instance: NiKP46KSCHJAVQbnkoGv.RLfuYobP7B }, oid: be03-south.7213293.1__:NiKP46KSCHJAVQbnkoGv.RLfuYobP7B__test2, locator: be03-south.7213293.1__test2, needs_fixing: false, status: ok }, { key: { type: tail, name: _test2, instance: NiKP46KSCHJAVQbnkoGv.RLfuYobP7B }, needs_fixing: false, status: ok }, { key: { type: head, name: _test3, instance: }, oid: be03-south.7213293.1___test3, locator: be03-south.7213293.1__test3, needs_fixing: false, status: ok }, { key: { type: tail, name: _test3, instance: }, needs_fixing: false, status: ok } ] } regards, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Troubleshooting rgw bucket list
Hi, we have a rgw bucket (with versioning) where PUT and GET operations for specific objects succeed, but retrieving an object list fails. Using python-boto, after a timeout just gives us an 500 internal error; radosgw-admin just hangs. Also a radosgw-admin bucket check just seems to hang... ceph version is 0.94.3 but this also was happening with 0.94.2, we quietly hoped upgrading would fix but it didn't... r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
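A sketch of how one might look at the bucket index directly while the admin commands hang: get the bucket id from bucket stats, then list the omap keys of the index object (pool and object naming per our hammer layout; adjust to your zone):

$ radosgw-admin bucket stats --bucket=<bucket>
$ rados -p .rgw.buckets.index listomapkeys .dir.<bucket-id> | wc -l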
Re: [ceph-users] Troubleshooting rgw bucket list
Hi, this bucket only has 13389 objects, so the index size shouldn't be a problem. Also, on the same cluster we have an other bucket with 1200543 objects (but no versioning configured), which has no issues. when we run a radosgw-admin bucket --check (--fix), nothing seems to be happening. Putting an strace on the process shows a lot of lines like these: [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL unfinished ... [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL unfinished ... [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 unfinished ... [pid 99385] ... futex resumed ) = -1 EAGAIN (Resource temporarily unavailable) [pid 99371] ... futex resumed ) = 0 but no errors in the ceph logs or health warnings. r, Sam On 28-08-15 17:49, Ben Hines wrote: How many objects in the bucket? RGW has problems with index size once number of objects gets into the 90+ level. The buckets need to be recreated with 'sharded bucket indexes' on: rgw override bucket index max shards = 23 You could also try repairing the index with: radosgw-admin bucket check --fix --bucket=bucketname -Ben On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters s...@ericom.be wrote: Hi, we have a rgw bucket (with versioning) where PUT and GET operations for specific objects succeed, but retrieving an object list fails. Using python-boto, after a timeout just gives us an 500 internal error; radosgw-admin just hangs. Also a radosgw-admin bucket check just seems to hang... ceph version is 0.94.3 but this also was happening with 0.94.2, we quietly hoped upgrading would fix but it didn't... r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Troubleshooting rgw bucket list
Ok, maybe I'm to impatient. It would be great if there were some verbose or progress logging of the radosgw-admin tool. I will start a check and let it run over the weekend. tnx, Sam On 28-08-15 18:16, Sam Wouters wrote: Hi, this bucket only has 13389 objects, so the index size shouldn't be a problem. Also, on the same cluster we have an other bucket with 1200543 objects (but no versioning configured), which has no issues. when we run a radosgw-admin bucket --check (--fix), nothing seems to be happening. Putting an strace on the process shows a lot of lines like these: [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL unfinished ... [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL unfinished ... [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 unfinished ... [pid 99385] ... futex resumed ) = -1 EAGAIN (Resource temporarily unavailable) [pid 99371] ... futex resumed ) = 0 but no errors in the ceph logs or health warnings. r, Sam On 28-08-15 17:49, Ben Hines wrote: How many objects in the bucket? RGW has problems with index size once number of objects gets into the 90+ level. The buckets need to be recreated with 'sharded bucket indexes' on: rgw override bucket index max shards = 23 You could also try repairing the index with: radosgw-admin bucket check --fix --bucket=bucketname -Ben On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters s...@ericom.be wrote: Hi, we have a rgw bucket (with versioning) where PUT and GET operations for specific objects succeed, but retrieving an object list fails. Using python-boto, after a timeout just gives us an 500 internal error; radosgw-admin just hangs. Also a radosgw-admin bucket check just seems to hang... ceph version is 0.94.3 but this also was happening with 0.94.2, we quietly hoped upgrading would fix but it didn't... r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw hanging - blocking rgw.bucket_list ops
tried removing, but no luck: rados -p .be-east.rgw.buckets rm be-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity error removing .be-east.rgw.bucketsbe-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity: (2) anyone? On 21-08-15 13:06, Sam Wouters wrote: I suspect these to be the cause: rados ls -p .be-east.rgw.buckets | grep sanitybe-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity be-east.5436.1__sanity be-east.5436.1__:2vBijaGnVQF4Q0IjZPeyZSKeUmBGn9X__sanity be-east.5436.1__sanity be-east.5436.1__:4JTCVFxB1qoDWPu1nhuMDuZ3QNPaq5n__sanity be-east.5436.1__sanity be-east.5436.1__:9jFwd8xvqJMdrqZuM8Au4mi9M62ikyo__sanity be-east.5436.1__sanity be-east.5436.1__:BlfbGYGvLi92QPSiabT2mP7OeuETz0P__sanity be-east.5436.1__sanity be-east.5436.1__:MigpcpJKkan7Po6vBsQsSD.hEIRWuim__sanity be-east.5436.1__sanity be-east.5436.1__:QDTxD5p0AmVlPW4v8OPU3vtDLzenj4y__sanity be-east.5436.1__sanity be-east.5436.1__:S43EiNAk5hOkzgfbOynbOZOuLtUv0SB__sanity be-east.5436.1__sanity be-east.5436.1__:UKlOVMQBQnlK20BHJPyvnG6m.2ogBRW__sanity be-east.5436.1__sanity be-east.5436.1__:kkb6muzJgREie6XftdEJdFHxR2MaFeB__sanity be-east.5436.1__sanity be-east.5436.1__:oqPhWzFDSQ-sNPtppsl1tPjoryaHNZY__sanity be-east.5436.1__sanity be-east.5436.1__:pLhygPGKf3uw7C7OxSJNCw8rQEMOw5l__sanity be-east.5436.1__sanity be-east.5436.1__:tO1Nf3S2WOfmcnKVPv0tMeXbwa5JR36__sanity be-east.5436.1__sanity be-east.5436.1__:ye4oRwDDh1cGckbMbIo56nQvM7OEyPM__sanity be-east.5436.1__sanity be-east.5436.1___sanitybe-east.5436.1__sanity would it be save and/or help to remove those with rados rm, and try an bucket check --fix --check-objects? On 21-08-15 11:28, Sam Wouters wrote: Hi, We are running hammer 0.94.2 and have an increasing amount of heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had timed out after 600 messages in our radosgw logs, with radosgw eventually stalling. A restart of the radosgw helps for a few minutes, but after that it hangs again. ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requests shows call rgw.bucket_list ops. No new bucket lists are requested, so those ops seem to stay there. Anyone any idea how to get rid of those. Restart of the affecting osd didn't help neither. I'm not sure if its related, but we have an object called _sanity in the bucket where the listing was performed on. I know there is some bug with objects starting with _. Any help would be much appreciated. r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
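Since those entries live in the bucket index omap rather than as objects in the data pool, I wonder whether removing the offending keys directly from the index object would work instead; completely untested on my side, so treat this as a guess rather than a procedure:

$ rados -p .be-east.rgw.buckets.index listomapkeys .dir.be-east.5436.1 | grep sanity
$ rados -p .be-east.rgw.buckets.index rmomapkey .dir.be-east.5436.1 '<key>'
$ radosgw-admin bucket check --fix --bucket=<bucket>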
Re: [ceph-users] radosgw hanging - blocking rgw.bucket_list ops
I suspect these to be the cause: rados ls -p .be-east.rgw.buckets | grep sanitybe-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity be-east.5436.1__sanity be-east.5436.1__:2vBijaGnVQF4Q0IjZPeyZSKeUmBGn9X__sanity be-east.5436.1__sanity be-east.5436.1__:4JTCVFxB1qoDWPu1nhuMDuZ3QNPaq5n__sanity be-east.5436.1__sanity be-east.5436.1__:9jFwd8xvqJMdrqZuM8Au4mi9M62ikyo__sanity be-east.5436.1__sanity be-east.5436.1__:BlfbGYGvLi92QPSiabT2mP7OeuETz0P__sanity be-east.5436.1__sanity be-east.5436.1__:MigpcpJKkan7Po6vBsQsSD.hEIRWuim__sanity be-east.5436.1__sanity be-east.5436.1__:QDTxD5p0AmVlPW4v8OPU3vtDLzenj4y__sanity be-east.5436.1__sanity be-east.5436.1__:S43EiNAk5hOkzgfbOynbOZOuLtUv0SB__sanity be-east.5436.1__sanity be-east.5436.1__:UKlOVMQBQnlK20BHJPyvnG6m.2ogBRW__sanity be-east.5436.1__sanity be-east.5436.1__:kkb6muzJgREie6XftdEJdFHxR2MaFeB__sanity be-east.5436.1__sanity be-east.5436.1__:oqPhWzFDSQ-sNPtppsl1tPjoryaHNZY__sanity be-east.5436.1__sanity be-east.5436.1__:pLhygPGKf3uw7C7OxSJNCw8rQEMOw5l__sanity be-east.5436.1__sanity be-east.5436.1__:tO1Nf3S2WOfmcnKVPv0tMeXbwa5JR36__sanity be-east.5436.1__sanity be-east.5436.1__:ye4oRwDDh1cGckbMbIo56nQvM7OEyPM__sanity be-east.5436.1__sanity be-east.5436.1___sanitybe-east.5436.1__sanity would it be save and/or help to remove those with rados rm, and try an bucket check --fix --check-objects? On 21-08-15 11:28, Sam Wouters wrote: Hi, We are running hammer 0.94.2 and have an increasing amount of heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had timed out after 600 messages in our radosgw logs, with radosgw eventually stalling. A restart of the radosgw helps for a few minutes, but after that it hangs again. ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requests shows call rgw.bucket_list ops. No new bucket lists are requested, so those ops seem to stay there. Anyone any idea how to get rid of those. Restart of the affecting osd didn't help neither. I'm not sure if its related, but we have an object called _sanity in the bucket where the listing was performed on. I know there is some bug with objects starting with _. Any help would be much appreciated. r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] radosgw hanging - blocking rgw.bucket_list ops
Hi, we are running hammer 0.94.2 and have an increasing number of "heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had timed out after 600" messages in our radosgw logs, with radosgw eventually stalling. A restart of the radosgw helps for a few minutes, but after that it hangs again.

"ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requests" shows "call rgw.bucket_list" ops. No new bucket lists are requested, so those ops seem to stay there. Does anyone have an idea how to get rid of those? Restarting the affected osd didn't help either.

I'm not sure if it's related, but we have an object called "_sanity" in the bucket the listing was performed on. I know there is some bug with objects starting with "_".

Any help would be much appreciated.

r, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
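For debugging, besides objecter_requests on the rgw admin socket, dumping the in-flight and historic ops on the OSD that holds the index pg might show where the bucket_list op is stuck; a sketch:

$ ceph daemon osd.<id> dump_ops_in_flight
$ ceph daemon osd.<id> dump_historic_ops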
[ceph-users] radosgw-agent keeps syncing most active bucket - ignoring others
Hi, from the docs of radosgw-agent and some items on this list, I understood that the max-entries argument was there to prevent a very active bucket from keeping the other buckets from being synced. In our agent logs, however, we saw a lot of "bucket instance bla has 1000 entries after bla" messages, and the agent kept on syncing that active bucket.

Looking at the code, in class DataWorkerIncremental, it looks like the agent loops fetching log entries from the bucket until it receives fewer entries than max_entries. Is this intended behaviour? I would expect it to just pass the max_entries log entries for processing and increase the marker.

Is there any other way to make sure less active buckets are synced frequently? We've tried increasing num-workers, but this only affects the first pass.

Thanks, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
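For completeness, this is roughly how we invoke the agent; the flag names are as I understand them from the radosgw-agent docs and may not all apply to your version:

$ radosgw-agent -c /etc/ceph/radosgw-agent/default.conf --num-workers 4 --max-entries 1000 --incremental-sync-delay 30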
Re: [ceph-users] radosgw-agent keeps syncing most active bucket - ignoring others
Hmm, looks like intended behaviour: SNIP CommitDate: Mon Mar 3 06:08:42 2014 -0800 worker: process all bucket instance log entries at once Currently if there are more than max_entries in a single bucket instance's log, only max_entries of those will be processed, and the bucket instance will not be examined again until it is modified again. To keep it simple, get the entire log of entries to be updated and process them all at once. This means one busy shard may block others from syncing, but multiple instances of radosgw-agent can be run to circumvent that issue. With only one instance, users can be sure everything is synced when an incremental sync completes with no errors. /SNIP However, this brings us to a new issue. After starting a second agent, one of the agents is busy syncing the busy shard and the other agent synced correctly all of the other buckets. So far, so good. But, since a few of them are almost static, it looks like it started syncing those in a second run from the beginning all over again. As versioning was enabled on those buckets after they were created and with already objects and removed objects in there, it seems like the agent is copying those unversioned objects to versioned ones, creating a lot of delete markers and multiple versions in the secondary zone. Anyone any idea how to handle this correctly. I've already did a cleanup some weeks ago, but if the agent is going to keep on restarting the sync from the beginning, I'll have to cleanup every time. regards, Sam On 18-08-15 09:36, Sam Wouters wrote: Hi, from the doc of radosgw-agent and some items in this list, I understood that the max-entries argument was there to prevent a very active bucket to keep the other buckets from keeping synced. In our agent logs however we saw a lot of bucket instance bla has 1000 entries after bla, and the agent kept on syncing that active bucket. Looking at the code, in class DataWorkerIncremental, it looks like the agent loops in fetching log entries from the bucket until it receives less entries then the max_entries. Is this intended behaviour? I would suspect it to just pass the max_entries log entries for processing and increase the marker. Is there any other way to make sure less active buckets are frequently synced? We've tried increasing num-workers, but this only has affect the first pass. Thanks, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Radosgw-agent with version enabled bucket - duplicate objects
Hi, I've upgraded from hammer 0.94.1 to 0.94.2 and to radosgw-agent-1.2.2-0.el7.centos.noarch from 1.2.1, and after a restart of the agent (with versioned set to true), I noticed duplicate objects in a versioning-enabled bucket on the replication site.

For example, on the source side:
object Key: metadatab/e58438be260f48dd8d7b7855
version_id: null (old object before versioning was enabled on the bucket)

On the replication side:
object Key: metadatab/e58438be260f48dd8d7b7855
version_id 1: rZ1f4LtbeDSx6O8Nsz.m28MNamraPFd
version_id 2: null

When restarting the agent without the --versioned param, it seems like it does a full sync again, and now I'm getting three objects for every source object. I have no idea how to get the zones back into sync (without duplicate objects) and how to prevent this from happening again, so any help would be much appreciated.

regards, Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy
]# cat ceph.repo [Ceph] name=Ceph packages for $basearch baseurl=http://ceph.com/rpm-giant/el7/$basearch enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 [Ceph-noarch] name=Ceph noarch packages baseurl=http://ceph.com/rpm-giant/el7/noarch enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 [ceph-source] name=Ceph source packages baseurl=http://ceph.com/rpm-giant/el7/SRPMS enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 When i visit this directory http://ceph.com/rpm-giant/el7 , i can see multiple versions of python-ceph i.e. python-ceph-0.86-0.el7.centos.x86_64 python-ceph-0.87-0.el7.centos.x86_64 python-ceph-0.87-1.el7.centos.x86_64 *This is the reason , yum is getting confused to install the latest available version python-ceph-0.87-1.el7.centos.x86_64. This issue looks like yum priority plugin and RPM obsolete.* http://tracker.ceph.com/issues/10476 [root@rgw-node1 yum.repos.d]# cat /etc/yum/pluginconf.d/priorities.conf [main] enabled = 1 check_obsoletes = 1 [root@rgw-node1 yum.repos.d]# [root@rgw-node1 yum.repos.d]# [root@rgw-node1 yum.repos.d]# uname -r 3.10.0-229.1.2.el7.x86_64 [root@rgw-node1 yum.repos.d]# cat /etc/redhat-release CentOS Linux release 7.1.1503 (Core) [root@rgw-node1 yum.repos.d]# However it worked *fine 1 week back* on CentOS 7.0 [root@ceph-node1 ceph]# uname -r 3.10.0-123.20.1.el7.x86_64 [root@ceph-node1 ceph]# cat /etc/redhat-release CentOS Linux release 7.0.1406 (Core) [root@ceph-node1 ceph]# Any fix to this is highly appreciated. Regards VS ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Sam Wouters Ericom Computers -- *Ericom Computers* Tiensestraat 178 3000 Leuven Tel : +32 (0) 16 23 77 55 Fax : +32 (0) 16 23 48 05 Ericom Website http://www.ericom.be * http://www.ericom.be* ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com