Re: [ceph-users] Some long running ops may lock osd

2015-03-03 Thread Erdem Agaoglu
Looking further, i guess what i was trying to describe is a simplified version
of the sharded thread pools released in Giant. Is it possible for that to be
backported to Firefly?

On Tue, Mar 3, 2015 at 9:33 AM, Erdem Agaoglu erdem.agao...@gmail.com
wrote:

 Thank you folks for bringing that up. I had some questions about sharding.
 We'd like blind buckets too; at least it's on the roadmap. For the current
 sharded implementation, what are the final details? Is the number of shards
 defined per bucket or globally? Is there a way to split current indexes
 into shards?

 On the other hand, what i'd like to point out here is not necessarily
 large-bucket-index specific. The problem is the mechanism around thread
 pools. Any request may require locks on a pg, and this should not block the
 requests for other pgs. I'm no expert, but the threads might be able to
 requeue requests destined for a locked pg and process others for other pgs. Or
 maybe a thread-per-pg design would be possible. Because, you know, it is
 somewhat OK not to be able to do anything for a locked resource; then you
 can go and improve your processing or your locks. But it's a whole
 different problem when a locked pg blocks requests for a few hundred other
 pgs in other pools for no good reason.

 On Tue, Mar 3, 2015 at 5:43 AM, Ben Hines bhi...@gmail.com wrote:

 Blind-bucket would be perfect for us, as we don't need to list the
 objects.

 We only need to list the bucket when doing a bucket deletion. If we
 could clean out/delete all objects in a bucket (without
 iterating/listing them) that would be ideal..

 On Mon, Mar 2, 2015 at 7:34 PM, GuangYang yguan...@outlook.com wrote:
  We have had good experience so far keeping each bucket under 0.5
 million objects, by client-side sharding. But I think it would be nice if you
 could test at your scale, with your hardware configuration, as well as with
 your expectation of tail latency.
 
  Generally the bucket sharding should help, both for write throughput
 and *stalls with recovering/scrubbing*, but it comes with a price: with X
 shards for each bucket, the listing/trimming would be X times as
 expensive from the OSD load's point of view. There was discussion to
 implement: 1) blind buckets (for use cases where bucket listing is not
 needed); 2) un-ordered listing, which could mitigate the problem I mentioned
 above. They are on the roadmap...
 
  Thanks,
  Guang
 
 
  
  From: bhi...@gmail.com
  Date: Mon, 2 Mar 2015 18:13:25 -0800
  To: erdem.agao...@gmail.com
  CC: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Some long running ops may lock osd
 
  We're seeing a lot of this as well. (as i mentioned to sage at
  SCALE..) Is there a rule of thumb at all for how big it's safe to let an
  RGW bucket get?
 
  Also, is this theoretically resolved by the new bucket-sharding
  feature in the latest dev release?
 
  -Ben
 
  On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com wrote:
  Hi Gregory,
 
  We are not using listomapkeys that way or in any way to be precise. I
 used
  it here just to reproduce the behavior/issue.
 
  What i am really interested in is whether deep-scrubbing actually mitigates
  the problem and/or whether there is something that can be further improved.
 
  Or i guess we should go upgrade now and hope for the best :)
 
  On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com
 wrote:
 
  On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com
  wrote:
  Hi all, especially devs,
 
  We have recently pinpointed one of the causes of slow requests in
 our
  cluster. It seems deep-scrubs on pg's that contain the index file
 for a
  large radosgw bucket lock the osds. Increasing op threads and/or disk
  threads
  helps a little bit, but we need to increase them beyond reason in
 order
  to
  completely get rid of the problem. A somewhat similar (and more
 severe)
  version of the issue occurs when we call listomapkeys for the index
  file,
  and since the logs for deep-scrubbing were much harder to read, this
  inspection
  was based on listomapkeys.
 
  In this example osd.121 is the primary of pg 10.c91 which contains
 file
  .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket
 contains
  ~500k objects. A standard listomapkeys call takes about 3 seconds.
 
  time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
  real 0m2.983s
  user 0m0.760s
  sys 0m0.148s
 
  In order to lock the osd we request 2 of them simultaneously with
  something
  like:
 
  rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
  sleep 1
  rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
 
  'debug_osd=30' logs show the flow like:
 
  At t0 some thread enqueue_op's my omap-get-keys request.
  Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading
 ~500k
  keys.
  Op-Thread B responds to several other requests during that 1 second
  sleep.
  They're generally extremely fast subops on other pgs.
  At t1 (about a second later) my

Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Erdem Agaoglu
Thank you folks for bringing that up. I had some questions about sharding.
We'd like blind buckets too; at least it's on the roadmap. For the current
sharded implementation, what are the final details? Is the number of shards
defined per bucket or globally? Is there a way to split current indexes
into shards?
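
For context, the only knob i could find so far is the global rgw option below,
set under the gateway's client section in ceph.conf (the section name and shard
count here are just an example, not our real config), and as far as i can tell
it only affects newly created buckets, which is exactly why i'm asking whether
existing indexes can be split:

[client.radosgw.gateway]
    rgw override bucket index max shards = 8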

On the other hand, what i'd like to point out here is not necessarily
large-bucket-index specific. The problem is the mechanism around thread
pools. Any request may require locks on a pg, and this should not block the
requests for other pgs. I'm no expert, but the threads might be able to
requeue requests destined for a locked pg and process others for other pgs. Or
maybe a thread-per-pg design would be possible. Because, you know, it is
somewhat OK not to be able to do anything for a locked resource; then you
can go and improve your processing or your locks. But it's a whole
different problem when a locked pg blocks requests for a few hundred other
pgs in other pools for no good reason.

On Tue, Mar 3, 2015 at 5:43 AM, Ben Hines bhi...@gmail.com wrote:

 Blind-bucket would be perfect for us, as we don't need to list the objects.

 We only need to list the bucket when doing a bucket deletion. If we
 could clean out/delete all objects in a bucket (without
 iterating/listing them) that would be ideal..

 On Mon, Mar 2, 2015 at 7:34 PM, GuangYang yguan...@outlook.com wrote:
  We have had good experience so far keeping each bucket under 0.5
 million objects, by client-side sharding. But I think it would be nice if you
 could test at your scale, with your hardware configuration, as well as with
 your expectation of tail latency.
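
If i understand the client-side sharding correctly, it is basically hashing the
key into one of N fixed buckets. A rough sketch of what we imagine, with
made-up bucket names and shard count:

key="images/12345.jpg"
nshards=16
shard=$(( 0x$(printf '%s' "$key" | md5sum | cut -c1-4) % nshards ))
s3cmd put ./12345.jpg "s3://mybucket-${shard}/${key}"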
 
  Generally the bucket sharding should help, both for write throughput
 and *stalls with recovering/scrubbing*, but it comes with a price: with X
 shards for each bucket, the listing/trimming would be X times as
 expensive from the OSD load's point of view. There was discussion to
 implement: 1) blind buckets (for use cases where bucket listing is not
 needed); 2) un-ordered listing, which could mitigate the problem I mentioned
 above. They are on the roadmap...
 
  Thanks,
  Guang
 
 
  
  From: bhi...@gmail.com
  Date: Mon, 2 Mar 2015 18:13:25 -0800
  To: erdem.agao...@gmail.com
  CC: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Some long running ops may lock osd
 
  We're seeing a lot of this as well. (as i mentioned to sage at
  SCALE..) Is there a rule of thumb at all for how big it's safe to let an
  RGW bucket get?
 
  Also, is this theoretically resolved by the new bucket-sharding
  feature in the latest dev release?
 
  -Ben
 
  On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu erdem.agao...@gmail.com
 wrote:
  Hi Gregory,
 
  We are not using listomapkeys that way or in any way to be precise. I
 used
  it here just to reproduce the behavior/issue.
 
  What i am really interested in is whether deep-scrubbing actually mitigates
  the problem and/or whether there is something that can be further improved.
 
  Or i guess we should go upgrade now and hope for the best :)
 
  On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com
 wrote:
 
  On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com
  wrote:
  Hi all, especially devs,
 
  We have recently pinpointed one of the causes of slow requests in our
  cluster. It seems deep-scrubs on pg's that contain the index file
 for a
  large radosgw bucket lock the osds. Increasing op threads and/or disk
  threads
  helps a little bit, but we need to increase them beyond reason in
 order
  to
  completely get rid of the problem. A somewhat similar (and more
 severe)
  version of the issue occurs when we call listomapkeys for the index
  file,
  and since the logs for deep-scrubbing were much harder to read, this
  inspection
  was based on listomapkeys.
 
  In this example osd.121 is the primary of pg 10.c91 which contains
 file
  .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket
 contains
  ~500k objects. A standard listomapkeys call takes about 3 seconds.
 
  time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
  real 0m2.983s
  user 0m0.760s
  sys 0m0.148s
 
  In order to lock the osd we request 2 of them simultaneously with
  something
  like:
 
  rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
  sleep 1
  rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
 
  'debug_osd=30' logs show the flow like:
 
  At t0 some thread enqueue_op's my omap-get-keys request.
  Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading
 ~500k
  keys.
  Op-Thread B responds to several other requests during that 1 second
  sleep.
  They're generally extremely fast subops on other pgs.
  At t1 (about a second later) my second omap-get-keys request gets
  enqueue_op'ed. But it does not start probably because of the lock
 held
  by
  Thread A.
  After that point other threads enqueue_op other requests on other pgs
  too
  but none of them starts processing, in which i consider

[ceph-users] Some long running ops may lock osd

2015-03-02 Thread Erdem Agaoglu
Hi all, especially devs,

We have recently pinpointed one of the causes of slow requests in our
cluster. It seems deep-scrubs on pg's that contain the index file for a
large radosgw bucket lock the osds. Increasing op threads and/or disk
threads helps a little bit, but we would need to increase them beyond reason in
order to completely get rid of the problem. A somewhat similar (and more
severe) version of the issue occurs when we call listomapkeys for the index
file, and since the logs for deep-scrubbing were much harder to read, this
inspection was based on listomapkeys.
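
For reference, we bump them either with injectargs at runtime or with the
matching [osd] entries in ceph.conf plus a restart; the values below are just
examples, and a restart may still be needed for the thread pools to fully
resize:

ceph tell osd.\* injectargs '--osd_op_threads 8 --osd_disk_threads 2'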

In this example osd.121 is the primary of pg 10.c91, which contains the file
.dir.5926.3 in the .rgw.buckets pool. The OSD has 2 op threads. The bucket
contains ~500k objects. A standard listomapkeys call takes about 3 seconds.

time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
real 0m2.983s
user 0m0.760s
sys 0m0.148s

In order to lock the osd we request 2 of them simultaneously with something
like:

rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
sleep 1
rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &

'debug_osd=30' logs show the flow like:

At t0 some thread enqueue_op's my omap-get-keys request.
Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
keys.
Op-Thread B responds to several other requests during that 1 second sleep.
They're generally extremely fast subops on other pgs.
At t1 (about a second later) my second omap-get-keys request gets
enqueue_op'ed, but it does not start, probably because of the lock held by
Op-Thread A.
After that point other threads enqueue_op other requests on other pgs too,
but none of them starts processing, at which point i consider the osd locked.
At t2 (about another second later) my first omap-get-keys request is
finished.
Op-Thread B locks pg 10.c91 and dequeue_op's my second request and starts
reading ~500k keys again.
Op-Thread A continues to process the requests enqueued in t1-t2.

It seems Op-Thread B is waiting on the lock held by Op-Thread A even though it
could be processing requests for other pg's just fine.
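
The blocked requests are also visible over the admin socket while the test
runs, with their age and current state (default socket path assumed):

ceph --admin-daemon /var/run/ceph/ceph-osd.121.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.121.asok dump_historic_ops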

My guess is that a somewhat larger version of this scenario happens in
deep-scrubbing, like on the pg containing the index for a bucket of 20M
objects. A disk/op thread starts reading through the omap, which will take,
say, 60 seconds. During the first seconds, other requests for other pgs pass
just fine. But within those 60 seconds there are bound to be other requests
for the same pg, especially since it holds the index file. Each of these
requests locks another disk/op thread, to the point where there are no free
threads left to process any requests for any pg, causing slow requests.

So first of all, thanks if you made it this far, and sorry for the involved
mail; i'm exploring the problem as i go.
Now, is that deep-scrubbing situation i tried to theorize even possible? If
not, can you point us to where we should look further?
We are currently running 0.72.2 and know about the newer ioprio settings in
Firefly and such. While we are planning to upgrade in a few weeks, i don't
think those options will help us in any way. Am i correct?
Are there any other improvements that we are not aware of?
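
For reference, the ioprio settings i mean are the ones below; as far as i know
they only have an effect with the CFQ disk scheduler, and the values are just
an example:

ceph tell osd.\* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'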

Regards,


-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Erdem Agaoglu
Hi Gregory,

We are not using listomapkeys that way or in any way to be precise. I used
it here just to reproduce the behavior/issue.

What i am really interested in is whether deep-scrubbing actually mitigates the
problem and/or whether there is something that can be further improved.

Or i guess we should go upgrade now and hope for the best :)

On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com wrote:

 On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu erdem.agao...@gmail.com
 wrote:
  Hi all, especially devs,
 
  We have recently pinpointed one of the causes of slow requests in our
  cluster. It seems deep-scrubs on pg's that contain the index file for a
  large radosgw bucket lock the osds. Increasing op threads and/or disk
 threads
  helps a little bit, but we need to increase them beyond reason in order
 to
  completely get rid of the problem. A somewhat similar (and more severe)
  version of the issue occurs when we call listomapkeys for the index file,
  and since the logs for deep-scrubbing were much harder to read, this
 inspection
  was based on listomapkeys.
 
  In this example osd.121 is the primary of pg 10.c91 which contains file
  .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
  ~500k objects. A standard listomapkeys call takes about 3 seconds.
 
  time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
  real 0m2.983s
  user 0m0.760s
  sys 0m0.148s
 
  In order to lock the osd we request 2 of them simultaneously with
 something
  like:
 
  rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
  sleep 1
  rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
 
  'debug_osd=30' logs show the flow like:
 
  At t0 some thread enqueue_op's my omap-get-keys request.
  Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
  keys.
  Op-Thread B responds to several other requests during that 1 second
 sleep.
  They're generally extremely fast subops on other pgs.
  At t1 (about a second later) my second omap-get-keys request gets
  enqueue_op'ed. But it does not start probably because of the lock held by
  Thread A.
  After that point other threads enqueue_op other requests on other pgs too
  but none of them starts processing, at which point i consider the osd
 locked.
  At t2 (about another second later) my first omap-get-keys request is
  finished.
  Op-Thread B locks pg 10.c91 and dequeue_op's my second request and starts
  reading ~500k keys again.
  Op-Thread A continues to process the requests enqueued in t1-t2.
 
  It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
 can
  process other requests for other pg's just fine.
 
  My guess is a somewhat larger scenario happens in deep-scrubbing, like on
  the pg containing index for the bucket of 20M objects. A disk/op thread
  starts reading through the omap which will take say 60 seconds. During
 the
  first seconds, other requests for other pgs pass just fine. But in 60
  seconds there are bound to be other requests for the same pg, especially
  since it holds the index file. Each of these requests locks another
 disk/op
  thread to the point where there are no free threads left to process any
  requests for any pg. Causing slow-requests.
 
  So first of all thanks if you can make it here, and sorry for the
 involved
  mail, i'm exploring the problem as i go.
  Now, is that deep-scrubbing situation i tried to theorize even possible?
  If not, can you point us to where we should look further?
  We are currently running 0.72.2 and know about newer ioprio settings in
  Firefly and such. While we are planning to upgrade in a few weeks, i
  don't think those options will help us in any way. Am i correct?
  Are there any other improvements that we are not aware of?

 This is all basically correct; it's one of the reasons you don't want
 to let individual buckets get too large.

 That said, I'm a little confused about why you're running listomapkeys
 that way. RGW throttles itself by getting only a certain number of
 entries at a time (1000?) and any system you're also building should
 do the same. That would reduce the frequency of any issues, and I
 *think* that scrubbing has some mitigating factors to help (although
 maybe not; it's been a while since I looked at any of that stuff).

 Although I just realized that my vague memory of deep scrubbing
 working better might be based on improvements that only got in for
 firefly...not sure.
 -Greg




-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.2 rbd-fixed ubuntu packages

2013-05-29 Thread Erdem Agaoglu
We are running ubuntu 12.04 and Folsom. Compiling qemu 1.5 only caused
random complaints about 'qemu query-commands not found' or something like that
on the libvirt end. Upgrading libvirt to 1.0.5 fixed it. But that had some
problems with attaching rbd disks:

could not open disk
image rbd:vols/volume-foo:id=volumes:key=bar:auth_supported=cephx\\;none:
Operation not supported

I don't know if that's something with our setup, but the only thing we could do
to fix it was to patch libvirt where it appends
':auth_supported=cephx\\;none' and remove those slashes. I guess somewhere
around the rbd/libvirt upgrades those slashes started to become a problem. But
as i said, i'm not sure.
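
One way to test the disk string outside libvirt, with made-up pool/image names
and assuming auth is picked up from /etc/ceph/ceph.conf:

qemu-img info "rbd:vols/volume-foo:id=volumes:conf=/etc/ceph/ceph.conf"

If that works but the same string with the escaped ':auth_supported=cephx\\;none'
appended fails, the escaping itself is the culprit.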


On Wed, May 29, 2013 at 6:50 PM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:

 Hi,
  Can I assume I am safe without this patch if I don't use any rbd
  cache?

 Sent from my iPhone

 On 2013-5-29, at 16:00, Alex Bligh a...@alex.org.uk wrote:

 
  On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:
 
  for anybody who's interested, I've packaged the latest qemu-1.4.2 (not
 1.5, it didn't work nicely with libvirt) which includes important fixes to
 RBD for ubuntu 12.04 AMD64. If you want to save some time, I can share the
 packages with you. drop me a line if you're interested.
 
 
  The issue Wolfgang is referring to is here:
  http://tracker.ceph.com/issues/3737
 
  And the actual patch to QEMU is here:
  http://patchwork.ozlabs.org/patch/232489/
 
  I'd be interested in whether the raring version (1.4.0+dfsg-1expubuntu4)
 contains this (unchecked as yet).
 
  --
  Alex Bligh
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw with nginx

2013-05-23 Thread Erdem Agaoglu
Hi all,

We are trying to run radosgw with nginx.
We've found an example, https://gist.github.com/guilhem/4964818,
and changed our nginx.conf like below:

http {
    server {
        listen 0.0.0.0:80;
        server_name _;
        access_log off;
        location / {
            fastcgi_pass_header Authorization;
            fastcgi_pass_request_headers on;
            include fastcgi_params;
            fastcgi_keep_conn on;
            fastcgi_pass unix:/tmp/radosgw.sock;
        }
    }
}
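
We check and reload it with the usual, in case that matters:

nginx -t && service nginx reload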

But the simplest test gives the following error:

# curl -v http://x.x.x.x/bucket/test.jpg
* About to connect() to x.x.x.x port 80 (#0)
*   Trying x.x.x.x ... connected
> GET /bucket/test.jpg HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0
OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: x.x.x.x
> Accept: */*
>
< HTTP/1.1 400
< Server: nginx/1.1.19
< Date: Thu, 23 May 2013 15:34:05 GMT
< Content-Type: application/json
< Content-Length: 26
< Connection: keep-alive
< Accept-Ranges: bytes
<
* Connection #0 to host x.x.x.x left intact
* Closing connection #0
{"Code":"InvalidArgument"}

radosgw logs show these:

2013-05-23 08:34:31.074037 7f0739c33780 20 enqueued request req=0x1e78870
2013-05-23 08:34:31.074044 7f0739c33780 20 RGWWQ:
2013-05-23 08:34:31.074045 7f0739c33780 20 req: 0x1e78870
2013-05-23 08:34:31.074047 7f0739c33780 10 allocated request req=0x1ec6490
2013-05-23 08:34:31.074084 7f0720ce8700 20 dequeued request req=0x1e78870
2013-05-23 08:34:31.074093 7f0720ce8700 20 RGWWQ: empty
2013-05-23 08:34:31.074098 7f0720ce8700  1 == starting new request
req=0x1e78870 =
2013-05-23 08:34:31.074140 7f0720ce8700  2 req 4:0.42initializing
2013-05-23 08:34:31.074174 7f0720ce8700  5 nothing to log for operation
2013-05-23 08:34:31.074178 7f0720ce8700  2 req 4:0.80::GET
/bucket/test.jpg::http status=400
2013-05-23 08:34:31.074192 7f0720ce8700  1 == req done req=0x1e78870
http_status=400 ==


Normally we would expect a well-formed 403 (because the request doesn't have an
Authorization header), but we get a 400 and cannot figure out why.
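
To rule nginx out we may also try talking to the socket directly with cgi-fcgi
from the fcgi tools; a rough sketch, and the parameter list below is surely
incomplete, but getting any response at all would at least show the socket and
radosgw itself are fine:

REQUEST_METHOD=GET SCRIPT_NAME=/ REQUEST_URI=/ QUERY_STRING= \
  cgi-fcgi -bind -connect /tmp/radosgw.sock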

Thanks in advance.

-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixing a rgw bucket index

2013-04-09 Thread Erdem Agaoglu
We ended up directly importing our original files to another bucket. Now
we're cleaning the files in the broken bucket.

Thanks for all the help.


On Mon, Apr 8, 2013 at 10:27 PM, Erdem Agaoglu erdem.agao...@gmail.comwrote:

 There seems to be an open issue at s3cmd
 https://github.com/s3tools/s3cmd/issues/37. I'll try with other tools


 On Mon, Apr 8, 2013 at 9:26 PM, Yehuda Sadeh yeh...@inktank.com wrote:

  This one fails because copying an object onto itself only works when
  replacing its attrs (X_AMZ_METADATA_DIRECTIVE=REPLACE).

 On Mon, Apr 8, 2013 at 10:35 AM, Erdem Agaoglu erdem.agao...@gmail.com
 wrote:
   This is the log grepped for the relevant thread id. It shows a 400 in the
  last lines but nothing seems odd besides that.
  http://pastebin.com/xWCYmnXV
 
  Thanks for your interest.
 
 
  On Mon, Apr 8, 2013 at 8:21 PM, Yehuda Sadeh yeh...@inktank.com
 wrote:
 
  Each bucket has a unique prefix which you can get by doing
 radosgw-admin
  bucket stats on that bucket. You can grep that prefix in 'rados ls -p
  .rgw.buckets'.
 
  Do you have any rgw log showing why you get the Invalid Request
 response?
  Can you also add 'debug ms = 1' for the log?
 
  Thanks
 
 
  On Mon, Apr 8, 2013 at 10:12 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com
  wrote:
 
  Just tried that file:
 
  $ s3cmd mv s3://imgiz/data/avatars/492/492923.jpg
  s3://imgiz/data/avatars/492/492923.jpg
  ERROR: S3 error: 400 (InvalidRequest)
 
  a more verbose output shows that the sign-headers was
 
 
 'PUT\n\n\n\nx-amz-copy-source:/imgiz/data/avatars/492/492923.jpg\nx-amz-date:Mon,
  08 Apr 2013 16:59:30
 
 +\nx-amz-metadata-directive:COPY\n/imgiz/data/avatars/492/492923.jpg'
 
   But i guess it doesn't work even if the index is correct. I get the same
   response on a clean bucket too.
 
   We might try that, but we don't have a file list. I guess it's possible
   with 'rados ls | grep | sed'?
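
For the record, the 'rados ls | grep | sed' idea does work for rebuilding a key
list; a rough sketch, using our marker 4470.1 as it appears in the stats
further down:

radosgw-admin bucket stats -b imgiz | grep -E '"id"|"marker"'
rados ls -p .rgw.buckets | grep '^4470\.1_' | sed 's/^4470\.1_//' > imgiz-keys.txt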
 
 
  On Mon, Apr 8, 2013 at 7:53 PM, Yehuda Sadeh yeh...@inktank.com
 wrote:
 
  Can you try copying one of these objects to itself? Would that work
  and/or change the index entry? Another option would be to try
 copying all
  the objects to a different bucket.
 
 
  On Mon, Apr 8, 2013 at 9:48 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com
  wrote:
 
   The omap header and all other omap attributes were destroyed. I copied
   another index over the destroyed one to get a somewhat valid header, and it
   seems intact. After a 'check --fix':
 
   # rados -p .rgw.buckets getomapheader .dir.4470.1
  header (49 bytes) :
   0000 : 03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00 : ..+.............
   0010 : 00 7d 7a 3f 6e 01 00 00 00 00 d0 00 7e 01 00 00 : .}z?n.......~...
   0020 : 00 bb f5 01 00 00 00 00 00 00 00 00 00 00 00 00 : ................
   0030 : 00                                              : .
 
 
  Rados shows objects are there:
 
  # rados ls -p .rgw.buckets |grep 4470.1_data/avatars
  4470.1_data/avatars/11047/11047823_20101211154308.jpg
  4470.1_data/avatars/106/106976-orig
  4470.1_data/avatars/492/492923.jpg
  4470.1_data/avatars/275/275479.jpg
  ...
 
 
  And i am able to GET them
 
  $ s3cmd get s3://imgiz/data/avatars/492/492923.jpg
   s3://imgiz/data/avatars/492/492923.jpg -> ./492923.jpg  [1 of 1]
    3587 of 3587   100% in    0s    93.40 kB/s  done
 
 
  But unable to list them
 
  $ s3cmd ls s3://imgiz/data/avatars
  NOTHING
 
 
   My initial expectation was that 'bucket check --fix --check-objects'
   would actually read the files like 'rados ls' does and recreate the
   missing omap keys, but it doesn't seem to do that. Now a simple check
   says:
 
  # radosgw-admin bucket check -b imgiz
   { "existing_header": { "usage": { "rgw.main": { "size_kb": 6000607,
         "size_kb_actual": 6258740,
         "num_objects": 128443}}},
     "calculated_header": { "usage": { "rgw.main": { "size_kb": 6000607,
         "size_kb_actual": 6258740,
         "num_objects": 128443
 
  But i know we have more than 128k objects.
 
 
 
  On Mon, Apr 8, 2013 at 7:17 PM, Yehuda Sadeh yeh...@inktank.com
  wrote:
 
  We'll need to have more info about the current state. Was just the
  omap header destroyed, or does it still exist? What does the header
  contain now? Are you able to actually access objects in that
 bucket,
  but just fail to list them?
 
  On Mon, Apr 8, 2013 at 8:34 AM, Erdem Agaoglu
  erdem.agao...@gmail.com wrote:
   Hi again,
  
    I managed to replace the file with some other bucket's index.
    '--check-objects --fix' worked, but my hopes were dashed as it didn't
    actually read through the files or fix anything. Any suggestions?
  
  
   On Thu, Apr 4, 2013 at 5:56 PM, Erdem Agaoglu
   erdem.agao...@gmail.com
   wrote:
  
   Hi all,
  
   After a major failure, and getting our cluster health back OK
 (with
   some
   help from inktank folks, thanks), we found out that we have
 managed
   to
   corrupt one of our bucket indices. As far as i can track it, we
 are
   missing
   the omapheader on that specific index, so

Re: [ceph-users] Adding OSD sometimes suspends cluster

2013-04-04 Thread Erdem Agaoglu
Thanks Sam,

I'll provide details if it keeps happening


On Thu, Apr 4, 2013 at 4:01 PM, Sam Lang sl...@inktank.com wrote:

 Hi Erdem,

 This is likely a bug.  We've created a ticket to keep track:
 http://tracker.ceph.com/issues/4645.

 -slang [inktank dev | http://www.inktank.com | http://www.ceph.com]

 On Mon, Apr 1, 2013 at 3:18 AM, Erdem Agaoglu erdem.agao...@gmail.com
 wrote:
  In addition, i was able to extract some logs from the last time the
  active/peering problem happened.
  http://pastebin.com/BakFREFP
  It ends with me restarting it.
 
 
  On Mon, Apr 1, 2013 at 10:23 AM, Erdem Agaoglu erdem.agao...@gmail.com
  wrote:
 
  Hi all,
 
  We are currently in the process of enlarging our bobtail cluster by
  adding OSDs. We have 12 disks per machine and we are creating one OSD per
  disk, adding them one by one as recommended. The only thing we don't do is
  start with a small weight and increase it slowly; weights are all 1.
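
The gradual approach we are skipping would be something along these lines,
with a made-up osd id and steps:

ceph osd crush reweight osd.42 0.2
# wait for the cluster to settle back to active+clean, then raise in steps
ceph osd crush reweight osd.42 0.5
ceph osd crush reweight osd.42 1.0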
 
  In this scenario both rbd and radosgw are unable to respond only during the
  first two minutes after adding a new OSD. After that small hiccup, we have
  some pgs like active+remapped+wait_backfill, active+remapped+backfilling,
  active+recovery_wait+remapped, active+degraded+remapped+backfilling, and
  everything works OK. After a few hours of backfilling and recovery all pgs
  become active+clean and we add another OSD.
 
  But sometimes that small hiccup takes longer than a few minutes. At those
  times the status shows some pgs stuck in active and some stuck in
  peering. When we look at the pg dump we see that all those active or peering
  pgs are on the same 2 OSDs and are unable to move forward. At this stage rbd
  works poorly and radosgw is completely stalled. Only after restarting one of
  those 2 OSDs do the pgs start to backfill and clients continue with their
  operations.
 
  Since this is a live cluster we don't want to wait too long and usually
 go
  restart the OSD in a hurry. That's why i cannot currently provide
 status or
  pg query outputs. We have some logs but i don't know what to look for
 or if
  they are verbose enough.
 
  Can this be any kind of a known issue? If not, where should i look to
 get
  any ideas about what's happening when it occurs?
 
  Thanks in advance
 
  --
  erdem agaoglu
 
 
 
 
  --
  erdem agaoglu
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 




-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com