Re: [ceph-users] Some long running ops may lock osd
Looking further, I guess what I tried to tell was a simplified version of the sharded thread pools released in Giant. Is it possible for that to be backported to Firefly?

On Tue, Mar 3, 2015 at 9:33 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:
[full message below, quoted thread snipped]
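P.S. To make the sharded-thread-pool idea concrete, here is a rough toy model in Python of what I am suggesting (my own illustration, not the actual Giant implementation): ops are hashed by pg into independent shards, each with its own queue and worker, so a slow op on one pg can only stall the shard it hashes to.

import queue
import threading
import time

NUM_SHARDS = 4

class ShardedOpQueue:
    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [queue.Queue() for _ in range(num_shards)]

    def enqueue_op(self, pg_id, op):
        # All ops for one pg land in the same shard, preserving per-pg ordering.
        self.shards[hash(pg_id) % len(self.shards)].put(op)

    def worker(self, shard_idx):
        q = self.shards[shard_idx]
        while True:
            op = q.get()
            op()  # a long op here stalls only this shard's queue
            q.task_done()

sq = ShardedOpQueue()
for i in range(NUM_SHARDS):
    threading.Thread(target=sq.worker, args=(i,), daemon=True).start()

sq.enqueue_op("10.c91", lambda: time.sleep(3))   # slow pg, stalls only one shard
sq.enqueue_op("3.7a", lambda: print("fast op"))  # unaffected unless it hashes to the same shard
time.sleep(0.1)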
Re: [ceph-users] Some long running ops may lock osd
Thank you folks for bringing that up. I had some questions about sharding. We'd like blind buckets too; at least it's on the roadmap. For the current sharded implementation, what are the final details? Is the number of shards defined per bucket or globally? Is there a way to split current indexes into shards?

On the other hand, what I'd like to point out here is not necessarily specific to large bucket indexes. The problem is the mechanism around thread pools. Any request may require locks on a pg, and this should not block requests for other pgs. I'm no expert, but the threads might be able to requeue requests bound for a locked pg and process others for other pgs. Or maybe a thread-per-pg design was possible. Because, you know, it is somewhat OK not to be able to do anything for a locked resource; then you can go and improve your processing or your locks. But it's a whole different problem when a locked pg blocks requests for a few hundred other pgs in other pools for no good reason.

On Tue, Mar 3, 2015 at 5:43 AM, Ben Hines bhi...@gmail.com wrote:

Blind-bucket would be perfect for us, as we don't need to list the objects. We only need to list the bucket when doing a bucket deletion. If we could clean out/delete all objects in a bucket (without iterating/listing them) that would be ideal.

On Mon, Mar 2, 2015 at 7:34 PM, GuangYang yguan...@outlook.com wrote:

We have had good experience so far keeping each bucket under 0.5 million objects, by client-side sharding. But I think it would be nice if you could test at your scale, with your hardware configuration, as well as your expectations for tail latency. Generally the bucket sharding should help, both for write throughput and for stalls during recovery/scrubbing, but it comes with a price - with X shards per bucket, listing/trimming becomes X times as heavy from the OSD load's point of view. There was discussion to implement: 1) blind bucket (for use cases where bucket listing is not needed); 2) unordered listing, which could improve the problem I mentioned above. They are on the roadmap...

Thanks, Guang

From: bhi...@gmail.com
Date: Mon, 2 Mar 2015 18:13:25 -0800
To: erdem.agao...@gmail.com
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Some long running ops may lock osd

We're seeing a lot of this as well (as I mentioned to Sage at SCALE). Is there a rule of thumb at all for how big it is safe to let an RGW bucket get? Also, is this theoretically resolved by the new bucket-sharding feature in the latest dev release?

-Ben

On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:
[full message below, quoted thread snipped]
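P.S. For anyone curious what client-side sharding as Guang describes might look like, here is a minimal sketch of the idea as I understand it (shard count and bucket naming are made up): hash every object key to one of X real buckets, so no single bucket index grows uncomfortably large.

import hashlib

NUM_SHARDS = 16
BUCKET_PREFIX = "mybucket-shard-"  # hypothetical naming scheme

def bucket_for_key(key):
    # Stable hash so a given key always maps to the same shard bucket.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return "%s%d" % (BUCKET_PREFIX, h % NUM_SHARDS)

print(bucket_for_key("data/avatars/492/492923.jpg"))

The price Guang mentions falls straight out of this: a full listing now has to merge X per-bucket listings.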
[ceph-users] Some long running ops may lock osd
Hi all, especially devs, We have recently pinpointed one of the causes of slow requests in our cluster. It seems deep-scrubs on pg's that contain the index file for a large radosgw bucket lock up the osds. Increasing op threads and/or disk threads helps a little bit, but we would need to increase them beyond reason to completely get rid of the problem. A somewhat similar (and more severe) version of the issue occurs when we call listomapkeys for the index file, and since the logs for deep-scrubbing were much harder to read, this inspection is based on listomapkeys.

In this example osd.121 is the primary of pg 10.c91, which contains the file .dir.5926.3 in the .rgw.buckets pool. The OSD has 2 op threads. The bucket contains ~500k objects. A standard listomapkeys call takes about 3 seconds:

time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
real 0m2.983s
user 0m0.760s
sys 0m0.148s

In order to lock the osd we request 2 of them simultaneously, with something like:

rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
sleep 1
rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &

'debug_osd=30' logs show the flow like:

At t0 some thread enqueue_op's my omap-get-keys request. Op-Thread A locks pg 10.c91, dequeue_op's it and starts reading ~500k keys. Op-Thread B responds to several other requests during that 1 second sleep; they're generally extremely fast subops on other pgs.

At t1 (about a second later) my second omap-get-keys request gets enqueue_op'ed, but it does not start, probably because of the lock held by Op-Thread A. After that point other threads enqueue_op other requests on other pgs too, but none of them starts processing, which is the point where I consider the osd locked.

At t2 (about another second later) my first omap-get-keys request is finished. Op-Thread B locks pg 10.c91, dequeue_op's my second request and starts reading ~500k keys again. Op-Thread A continues to process the requests enqueued between t1 and t2.

It seems Op-Thread B was waiting on the lock held by Op-Thread A when it could have processed other requests for other pg's just fine. My guess is that a somewhat larger version of this scenario happens in deep-scrubbing, e.g. on the pg containing the index for a bucket of 20M objects. A disk/op thread starts reading through the omap, which will take, say, 60 seconds. During the first seconds, requests for other pgs pass just fine. But over those 60 seconds there are bound to be other requests for the same pg, especially since it holds the index file. Each of those requests ties up another disk/op thread, to the point where there are no free threads left to process requests for any pg, causing slow requests.

So first of all, thanks if you made it this far, and sorry for the involved mail; I'm exploring the problem as I go. Now, is the deep-scrubbing situation I tried to theorize even possible? If not, can you point us to where to look further? We are currently running 0.72.2 and know about the newer ioprio settings in Firefly and such. We are planning to upgrade in a few weeks, but I don't think those options will help us in any way. Am I correct? Are there any other improvements that we are not aware of?

Regards,

-- erdem agaoglu
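P.S. For the curious, the lock-up described above can be modeled with a toy script (my own illustration, not OSD code): two op threads pulling from one shared queue, with one lock per pg. Once both threads pile onto pg 10.c91, the op for the other pg just sits in the queue, which is exactly the 'locked osd' symptom.

import queue
import threading
import time
from collections import defaultdict

opq = queue.Queue()
pg_locks = defaultdict(threading.Lock)

def op_thread(name):
    while True:
        pg, op_name, duration = opq.get()
        with pg_locks[pg]:  # blocks if the other thread holds this pg
            print(name, "running", op_name, "on", pg)
            time.sleep(duration)

for n in ("A", "B"):
    threading.Thread(target=op_thread, args=("Op-Thread " + n,), daemon=True).start()

opq.put(("pg 10.c91", "omap-get-keys #1", 3))
time.sleep(1)
opq.put(("pg 10.c91", "omap-get-keys #2", 3))  # parks the second thread on the pg lock
opq.put(("pg 3.7a", "fast subop", 0.01))       # now stuck in the queue behind it
time.sleep(8)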
Re: [ceph-users] Some long running ops may lock osd
Hi Gregory, We are not using listomapkeys that way, or in any way to be precise. I used it here just to reproduce the behavior/issue. What I am really interested in is whether deep-scrubbing actually mitigates the problem and/or whether there is something that can be further improved. Or I guess we should go upgrade now and hope for the best :)

On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com wrote:

On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:
[full message above, quote snipped]

This is all basically correct; it's one of the reasons you don't want to let individual buckets get too large. That said, I'm a little confused about why you're running listomapkeys that way. RGW throttles itself by getting only a certain number of entries at a time (1000?) and any system you're building should do the same. That would reduce the frequency of any issues, and I *think* that scrubbing has some mitigating factors to help (although maybe not; it's been a while since I looked at any of that stuff). Although I just realized that my vague memory of deep scrubbing working better might be based on improvements that only got in for Firefly... not sure.

-Greg

-- erdem agaoglu
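Following up on Greg's point about throttling, chunked listing is easy to imitate client-side. A sketch - fetch_omap_keys below is a hypothetical stand-in for whatever start_after/max_return style omap call your bindings expose, assumed to return up to max_return keys strictly after start_after:

CHUNK = 1000

def list_omap_keys(fetch_omap_keys, obj):
    last = ""
    while True:
        keys = fetch_omap_keys(obj, start_after=last, max_return=CHUNK)
        if not keys:
            return
        for k in keys:
            yield k
        # Each chunk is a short op, so the pg lock is released between
        # iterations and other requests for the pg can interleave.
        last = keys[-1]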
Re: [ceph-users] qemu-1.4.2 rbd-fixed ubuntu packages
We are running Ubuntu 12.04 and Folsom. Compiling qemu 1.5 only caused random complaints on the libvirt end, something like 'qemu query-commands not found'. Upgrading libvirt to 1.0.5 fixed that, but then we had problems attaching rbd disks:

could not open disk image rbd:vols/volume-foo:id=volumes:key=bar:auth_supported=cephx\\;none: Operation not supported

I don't know if it's something in our setup, but the only thing we could do to fix it was to patch libvirt where it appends ':auth_supported=cephx\\;none' and remove those slashes. I guess somewhere around the rbd/libvirt upgrades those slashes started to become a problem. But as I said, I'm not sure.

On Wed, May 29, 2013 at 6:50 PM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:

Hi, Can I assume I am safe without this patch if I don't use any rbd cache?

Sent from my iPhone

On 2013-5-29, at 16:00, Alex Bligh a...@alex.org.uk wrote:

On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:

for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5, it didn't work nicely with libvirt) which includes important fixes to RBD for Ubuntu 12.04 AMD64. If you want to save some time, I can share the packages with you. Drop me a line if you're interested.

The issue Wolfgang is referring to is here: http://tracker.ceph.com/issues/3737
And the actual patch to QEMU is here: http://patchwork.ozlabs.org/patch/232489/

I'd be interested in whether the raring version (1.4.0+dfsg-1expubuntu4) contains this (unchecked as yet).

-- Alex Bligh

-- erdem agaoglu
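P.S. In case it helps someone hitting the same error, this is roughly the transformation our libvirt patch makes, shown in Python (illustrative only; the exact escaping rules depend on your qemu/libvirt versions):

# The disk spec libvirt generated vs. what our patched build passes to qemu.
# The only change is collapsing the doubled backslashes before the ';'.
generated = r"rbd:vols/volume-foo:id=volumes:key=bar:auth_supported=cephx\\;none"
fixed = generated.replace("\\\\;", "\\;")
print(fixed)  # rbd:vols/volume-foo:id=volumes:key=bar:auth_supported=cephx\;none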
[ceph-users] radosgw with nginx
Hi all, We are trying to run radosgw with nginx. We found an example at https://gist.github.com/guilhem/4964818 and changed our nginx.conf like below:

http {
  server {
    listen 0.0.0.0:80;
    server_name _;
    access_log off;
    location / {
      fastcgi_pass_header Authorization;
      fastcgi_pass_request_headers on;
      include fastcgi_params;
      fastcgi_keep_conn on;
      fastcgi_pass unix:/tmp/radosgw.sock;
    }
  }
}

But the simplest test gives the following error:

# curl -v http://x.x.x.x/bucket/test.jpg
* About to connect() to x.x.x.x port 80 (#0)
*   Trying x.x.x.x ... connected
> GET /bucket/test.jpg HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: x.x.x.x
> Accept: */*
< HTTP/1.1 400
< Server: nginx/1.1.19
< Date: Thu, 23 May 2013 15:34:05 GMT
< Content-Type: application/json
< Content-Length: 26
< Connection: keep-alive
< Accept-Ranges: bytes
* Connection #0 to host x.x.x.x left intact
* Closing connection #0
{"Code":"InvalidArgument"}

radosgw logs show these:

2013-05-23 08:34:31.074037 7f0739c33780 20 enqueued request req=0x1e78870
2013-05-23 08:34:31.074044 7f0739c33780 20 RGWWQ:
2013-05-23 08:34:31.074045 7f0739c33780 20 req: 0x1e78870
2013-05-23 08:34:31.074047 7f0739c33780 10 allocated request req=0x1ec6490
2013-05-23 08:34:31.074084 7f0720ce8700 20 dequeued request req=0x1e78870
2013-05-23 08:34:31.074093 7f0720ce8700 20 RGWWQ: empty
2013-05-23 08:34:31.074098 7f0720ce8700 1 ====== starting new request req=0x1e78870 ======
2013-05-23 08:34:31.074140 7f0720ce8700 2 req 4:0.42initializing
2013-05-23 08:34:31.074174 7f0720ce8700 5 nothing to log for operation
2013-05-23 08:34:31.074178 7f0720ce8700 2 req 4:0.80::GET /bucket/test.jpg::http status=400
2013-05-23 08:34:31.074192 7f0720ce8700 1 ====== req done req=0x1e78870 http_status=400 ======

Normally we would expect a well-formed 403 (because the request doesn't have an Authorization header), but we get a 400 and cannot figure out why. Thanks in advance.

-- erdem agaoglu
Re: [ceph-users] Fixing a rgw bucket index
We ended up directly importing our original files into another bucket. Now we're cleaning out the files in the broken bucket. Thanks for all the help.

On Mon, Apr 8, 2013 at 10:27 PM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

There seems to be an open issue at s3cmd: https://github.com/s3tools/s3cmd/issues/37. I'll try with other tools.

On Mon, Apr 8, 2013 at 9:26 PM, Yehuda Sadeh yeh...@inktank.com wrote:

This one fails because copying an object onto itself only works if replacing its attrs (X_AMZ_METADATA_DIRECTIVE=REPLACE).

On Mon, Apr 8, 2013 at 10:35 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

This is the log grepped with the relevant thread id. It shows the 400 in the last lines, but nothing seems odd besides that. http://pastebin.com/xWCYmnXV Thanks for your interest.

On Mon, Apr 8, 2013 at 8:21 PM, Yehuda Sadeh yeh...@inktank.com wrote:

Each bucket has a unique prefix which you can get by doing radosgw-admin bucket stats on that bucket. You can grep for that prefix in 'rados ls -p .rgw.buckets'. Do you have any rgw log showing why you get the InvalidRequest response? Can you also add 'debug ms = 1' for the log? Thanks

On Mon, Apr 8, 2013 at 10:12 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

Just tried that file:

$ s3cmd mv s3://imgiz/data/avatars/492/492923.jpg s3://imgiz/data/avatars/492/492923.jpg
ERROR: S3 error: 400 (InvalidRequest)

More verbose output shows that the sign-headers was 'PUT\n\n\n\nx-amz-copy-source:/imgiz/data/avatars/492/492923.jpg\nx-amz-date:Mon, 08 Apr 2013 16:59:30 +0000\nx-amz-metadata-directive:COPY\n/imgiz/data/avatars/492/492923.jpg'

But I guess it doesn't work even if the index is correct; I get the same response on a clean bucket too. We might try that, but we don't have a file list. I guess it's possible with 'rados ls | grep | sed'?

On Mon, Apr 8, 2013 at 7:53 PM, Yehuda Sadeh yeh...@inktank.com wrote:

Can you try copying one of these objects to itself? Would that work and/or change the index entry? Another option would be to try copying all the objects to a different bucket.

On Mon, Apr 8, 2013 at 9:48 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

The omap header and all other omap attributes were destroyed. I copied another index over the destroyed one to get a somewhat valid header, and it seems intact. After a 'check --fix':

# rados -p .rgw.buckets getomapheader .dir.4470.1
header (49 bytes) :
0000 : 03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00 : ..+.
0010 : 00 7d 7a 3f 6e 01 00 00 00 00 d0 00 7e 01 00 00 : .}z?n...~...
0020 : 00 bb f5 01 00 00 00 00 00 00 00 00 00 00 00 00 :
0030 : 00 : .

Rados shows the objects are there:

# rados ls -p .rgw.buckets | grep 4470.1_data/avatars
4470.1_data/avatars/11047/11047823_20101211154308.jpg
4470.1_data/avatars/106/106976-orig
4470.1_data/avatars/492/492923.jpg
4470.1_data/avatars/275/275479.jpg
...

And I am able to GET them:

$ s3cmd get s3://imgiz/data/avatars/492/492923.jpg
s3://imgiz/data/avatars/492/492923.jpg -> ./492923.jpg  [1 of 1]
 3587 of 3587   100% in    0s    93.40 kB/s  done

But unable to list them:

$ s3cmd ls s3://imgiz/data/avatars
NOTHING

My initial expectation was that 'bucket check --fix --check-objects' would actually read the files like 'rados ls' does and recreate the missing omap keys, but it doesn't seem to do that.

Now a simple check says:

# radosgw-admin bucket check -b imgiz
{ "existing_header": { "usage": { "rgw.main": { "size_kb": 6000607,
          "size_kb_actual": 6258740,
          "num_objects": 128443}}},
  "calculated_header": { "usage": { "rgw.main": { "size_kb": 6000607,
          "size_kb_actual": 6258740,
          "num_objects": 128443}}}}

But I know we have more than 128k objects.

On Mon, Apr 8, 2013 at 7:17 PM, Yehuda Sadeh yeh...@inktank.com wrote:

We'll need more info about the current state. Was just the omap header destroyed, or does it still exist? What does the header contain now? Are you able to actually access objects in that bucket, but just fail to list them?

On Mon, Apr 8, 2013 at 8:34 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

Hi again, I managed to change the file with some other bucket's index. --check-objects --fix worked, but my hopes have failed as it didn't actually read through the files or fix anything. Any suggestions?

On Thu, Apr 4, 2013 at 5:56 PM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

Hi all, After a major failure, and after getting our cluster health back to OK (with some help from Inktank folks, thanks), we found out that we have managed to corrupt one of our bucket indices. As far as I can track it, we are missing the omap header on that specific index, so
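As a sketch of the 'rados ls | grep | sed' idea from earlier in the thread: recover the object names by stripping the bucket's internal prefix (4470.1_ here, taken from radosgw-admin bucket stats) off the raw rados listing. Illustrative Python, assuming the default .rgw.buckets layout:

import subprocess

PREFIX = "4470.1_"  # bucket prefix from 'radosgw-admin bucket stats'

out = subprocess.check_output(["rados", "ls", "-p", ".rgw.buckets"])
for line in out.decode().splitlines():
    if line.startswith(PREFIX):
        print(line[len(PREFIX):])  # e.g. data/avatars/492/492923.jpg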
Re: [ceph-users] Adding OSD sometimes suspends cluster
Thanks Sam, I'll provide details if it keeps happening.

On Thu, Apr 4, 2013 at 4:01 PM, Sam Lang sl...@inktank.com wrote:

Hi Erdem, This is likely a bug. We've created a ticket to keep track: http://tracker.ceph.com/issues/4645. -slang [inktank dev | http://www.inktank.com | http://www.ceph.com]

On Mon, Apr 1, 2013 at 3:18 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

In addition, I was able to extract some logs from the last time the active/peering problem happened: http://pastebin.com/BakFREFP It ends with me restarting the OSD.

On Mon, Apr 1, 2013 at 10:23 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:

Hi all, We are currently in the process of enlarging our Bobtail cluster by adding OSDs. We have 12 disks per machine and we are creating one OSD per disk, adding them one by one as recommended. The only thing we don't do is start with a small weight and increase it slowly; weights are all 1. In this scenario both rbd and radosgw are unable to respond only in the first two minutes after adding a new OSD. After that small hiccup, we have some pgs like active+remapped+wait_backfill, active+remapped+backfilling, active+recovery_wait+remapped, active+degraded+remapped+backfilling, and everything works OK. After a few hours of backfilling and recovery, all pgs become active+clean and we add another OSD.

But sometimes that small hiccup takes longer than a few minutes. At those times the status shows some pgs stuck in active and some stuck in peering. When we look at the pg dump we see that all those active or peering pgs are on the same 2 OSDs and are unable to move forward. At this stage rbd works poorly and radosgw is completely stalled. Only after restarting one of those 2 OSDs do the pgs start to backfill and the clients continue with their operations. Since this is a live cluster we don't want to wait too long and usually go restart the OSD in a hurry. That's why I cannot currently provide status or pg query outputs. We have some logs, but I don't know what to look for or whether they are verbose enough. Can this be any kind of a known issue? If not, where should I look to get any idea about what's happening when it occurs? Thanks in advance

-- erdem agaoglu
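P.S. The 'start with a small weight and increase it slowly' recommendation we skipped can be scripted along these lines (a rough sketch; osd id, step size and polling interval are arbitrary):

import subprocess
import time

OSD = "osd.12"  # hypothetical newly added osd
TARGET, STEP = 1.0, 0.2

w = 0.0
while w < TARGET:
    w = min(w + STEP, TARGET)
    # Raise the crush weight one step, then wait for the cluster to settle.
    subprocess.check_call(["ceph", "osd", "crush", "reweight", OSD, str(w)])
    while b"HEALTH_OK" not in subprocess.check_output(["ceph", "health"]):
        time.sleep(60)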