Re: RadosGW crashing on copy for one specific object

2014-08-25 Thread Sylvain Munaut
Hi Yehuda, diff --git a/src/rgw/rgw_rados.h b/src/rgw/rgw_rados.h index ed8f02d..0042df2 100644 --- a/src/rgw/rgw_rados.h +++ b/src/rgw/rgw_rados.h @@ -306,6 +306,11 @@ public: bool has_tail() { if (explicit_objs) { + if (objs.size() == 1) { + map<uint64_t,

Re: RadosGW crashing on copy for one specific object

2014-08-21 Thread Sylvain Munaut
Hi, If by any chance you applied the previous patch, revert it, as it's wrong. This might fix the issue: diff --git a/src/rgw/rgw_rados.h b/src/rgw/rgw_rados.h index d50fb59..0f13590 100644 --- a/src/rgw/rgw_rados.h +++ b/src/rgw/rgw_rados.h @@ -298,6 +298,9 @@ public: bool

Re: RadosGW crashing on copy for one specific object

2014-08-20 Thread Sylvain Munaut
Hi, What does 'radosgw-admin object stat --bucket=<bucket> --object=<object>' show? { "name": "5ae1b8cb8a2bdc3c2d7e1868b60d76abea2536f4604d6d312df95b719470fb3b\/render-image", "size": 239879, "policy": { "acl": { "acl_user_map": [ { "user": "kp", "acl": 15}],

RadosGW crashing on copy for one specific object

2014-08-19 Thread Sylvain Munaut
Hi, Today I have an issue when trying to issue a COPY for one object I have in RGW. It only happens for this object (at least that I noticed and I did 1000's of COPYs in this batch) and I can do a GET of this object just fine. The stack trace : ceph version 0.80.5-173-g7429f00

Re: RadosGW storage format

2014-08-08 Thread Sylvain Munaut
Hi, I don't think there's one document that describes everything. Definitely not one that is up to date. It would really be great to have something like that. Some of the stuff was described in messages to the mailing list when it was conceived, but things might have since gone major

RadosGW storage format

2014-08-07 Thread Sylvain Munaut
Hi, Is there a document somewhere describing the mapping from S3 to RADOS ? (things like how files are cut, what manifests are, what rados features are used ) Reading the source code, it is not always obvious how things are organized internally and you're never sure if you're understanding

Re: [ceph-users] v0.80.4 Firefly released

2014-07-16 Thread Sylvain Munaut
On Wed, Jul 16, 2014 at 10:50 AM, James Harper ja...@ejbdigital.com.au wrote: Can you offer some comments on what the impact is likely to be to the data in an affected cluster? Should all data now be treated with suspicion and restored back to before the firefly upgrade? Yes, I'd definitely

Re: v0.80.2?

2014-07-14 Thread Sylvain Munaut
Hi, Ideally the thing to do here is run s3-tests on your end and confirm that the tests are failing with the patch and figure out why. Or, if it passes for you, we can figure out what is different between your environment and QA. And then, ideally, we can extend s3-tests to reproduce the

Re: v0.80.2?

2014-07-14 Thread Sylvain Munaut
Here's the culprit IMHO: ea68b9372319fd0bab40856db26528d36359102e rgw: don't allow multiple writers to same multiobject part Fixes: #8269 Backport: firefly, dumpling A client might need to retry a multipart part write. The original thread might race with the new one,

Re: v0.80.2?

2014-07-11 Thread Sylvain Munaut
Hi, We built v0.80.2 yesterday and pushed it out to the repos, but quickly discovered a regression in radosgw that prevented reading objects written with earlier versions. We pulled the packages, fixed the bug, and are rerunning tests to confirm the fix and ensure there aren't other

RGW: Multi part upload - Attributes not copied from the upload object to the final meta object

2014-06-04 Thread Sylvain Munaut
Hi, When playing around with firefly, I noticed that the content-type I'm setting when doing multi-part upload is not taken into account. And looking at the rados object, I can see the xattr present on the .meta file (upload obj), but when doing a complete, that object is removed and the

Re: RGW: Multi part upload - Attributes not copied from the upload object to the final meta object

2014-06-04 Thread Sylvain Munaut
This is a known issue (8452). We have a fix for that and we'll get it to firefly. Ok, great. Somehow my search didn't turn that up. Looking at the patch, shouldn't the meta_obj.set_in_extra_data(true); be moved as well ? Cheers, Sylvain -- To unsubscribe from this list: send the line

Re: [ceph-users] recreate bucket error

2014-01-22 Thread Sylvain Munaut
Hi, On Sat, Dec 7, 2013 at 6:34 PM, Yehuda Sadeh yeh...@inktank.com wrote: Sounds like disabling the cache triggers some bug. I'll open a relevant ticket. Any news on this ? I have the same issue, but the cache only masks the problem. If you restart radosgw, you'll get it again (once for

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-12-03 Thread Sylvain Munaut
Hi, What sort of memory are your instances using? I just had a look. Around 120 MB, which is indeed a bit higher than I'd like. I haven't turned on any caching so I assume it's disabled. Yes. Cheers, Sylvain

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-29 Thread Sylvain Munaut
Hi James, Are you still working on this in any way? Well I'm using it, but I haven't worked on it. I never was able to reproduce any issue with it locally ... In prod, I do run it with cache disabled though since I never took the time to check using the cache was safe in the various failure

Re: Feature request regarding size and min_size on pools

2013-09-10 Thread Sylvain Munaut
Hi, Now to our problem. We want to be sure that a write is replicated before we get a ack. That should be the case AFAIU. There was however a bug that was recently fixed that made RBD ack too early. one osd can lead to data loss with min_size set to 1. It definitely shouldn't. Unless of

Re: radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-09-02 Thread Sylvain Munaut
Hi Yehuda, I just pushed a fix to wip-6161, can you verify that it fixes the issue for you? The fix in the wip-6161 branch seems to fix the problem for me. I was able to re-enable the cache and have radosgw start (and seemingly work properly). Cheers, Sylvain

Re: radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-08-30 Thread Sylvain Munaut
Hi, I just pushed a fix to wip-6161, can you verify that it fixes the issue for you? Thanks, I'll give it a shot on Monday, I'm out of the office at the moment. Cheers, Sylvain

radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-08-29 Thread Sylvain Munaut
Hi, I just updated our test cluster to 0.67.2+ (latest dumpling branch) ( from 0.61.x ) and radosgw refuses to start. 2013-08-29 11:46:34.915552 7ffccbc3d780 0 ceph version 0.67.2-19-gc81bc5b (c81bc5b59dda37f54c02039ca3d5aada64076624), process lt-radosgw, pid 30404 2013-08-29 11:46:34.915598

Re: radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-08-29 Thread Sylvain Munaut
:37.555982 7f46da40b780 -1 *** Caught signal (Floating point exception) ** in thread 7f46da40b780 Cheers, Sylvain On Thu, Aug 29, 2013 at 11:53 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi, I just updated our test cluster to 0.67.2+ (latest dumpling branch) ( from 0.61.x

Re: radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-08-29 Thread Sylvain Munaut
I tried going back to a src/rgw/ directory as it is in the 0.67.2 release, but it didn't start either. (failed to init storage). Finally I disabled the cache for now and it seems to have started properly. On Thu, Aug 29, 2013 at 1:50 PM, Sylvain Munaut s.mun...@whatever-company.com wrote: I

Re: radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-08-29 Thread Sylvain Munaut
, 2013 at 4:15 PM, Yehuda Sadeh yeh...@inktank.com wrote: What's commit c81bc5b59dda37f54c02039ca3d5aada64076624? I don't see it on the ceph github repository. On Thu, Aug 29, 2013 at 5:59 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: I tried going back to a src/rgw/ directory

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Sylvain Munaut
Hi, I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot continue it

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Sylvain Munaut
Hi Frederik, A traceback would be great if you can get a core file. And possibly compile tapdisk with debug symbols. I'm not quite sure what u mean, can u give some more information on how I do this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this is what u meant. Yes,

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
FWIW, I can confirm via printf's that this error path is never hit in at least some of the crashes I'm seeing. Ok thanks. Are you using cache btw ? Cheers, Sylvain

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
Hi, I hope not. How could I tell? It's not something I've explicitly enabled. It's disabled by default. So you'd have to have enabled it either in ceph.conf or directly in the device path in the xen config. (option is 'rbd cache', http://ceph.com/docs/next/rbd/rbd-config-ref/ ) Cheers,

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
Hi, I have been testing this a while now, and just finished testing your untested patch. The rbd caching problem still persists. Yes, I wouldn't expect it to change anything for caching. But I still don't understand why caching would change anything at all ... all of it should be handled within

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
On Wed, Aug 14, 2013 at 1:39 AM, James Harper james.har...@bendigoit.com.au wrote: I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours to a day before

Correct usage of rbd_aio_release

2013-08-12 Thread Sylvain Munaut
Hi, When should / can rbd_aio_release be called exactly ? For example if I create a rbd_aio_create_completion then do a rbd_aio_XXX that fails, should I call rbd_aio_release ? I would think yes, but when looking at the qemu rbd code, it doesn't and I'm not sure if it's by design. Cheers,

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread Sylvain Munaut
Hi, tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp 7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000] tapdisk:9180 blocked for more than 120 seconds. tapdisk D 88043fc13540 0 9180 1 0x You can try generating a core file by

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-09 Thread Sylvain Munaut
Hi, I've had a few occasions where tapdisk has segfaulted: tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp 7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000] tapdisk:9180 blocked for more than 120 seconds. tapdisk D 88043fc13540 0 9180 1

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
Hi, Yes the procedure didn't change. If you're on debian I could also send you prebuilt .debs for blktap and for a patched xen version that includes userspace RBD support. If you have any issue, I can be found on ceph's IRC under 'tnt' nick. Cheers, Sylvain

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
Hi, It's working great so far. I just pulled the source and built it then copied blktap in. Good to hear :) I've been using it more and more recently and it's been good for me too, even with live migrations. For some reason I already had a tapdisk in /usr/sbin, as well as the one in

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
I think I saw an announcement recently on xen-devel that blktap3 development has been stopped.. Oh :( In the mail it speaks about QEMU but is it possible to use the QEMU driver model when booting PV domains ? (and not PVHVM). Cheers, Sylvain

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
Hi George, Yes; qemu knows how to be a Xen PV block back-end. Very interesting. Is there documentation about this somewhere ? I had a look some time ago and it was really not very clear. Things like which Xen versions support this. And with which features ( indirect descriptors, persistent

Re: Set object mtime

2013-07-12 Thread Sylvain Munaut
Hi, Why do you want to copy the data to change the PG count? PG splitting is supported now, so unless you need to do merging for some reason you should be good to go. :) I need to do merging :) At the time the PG count formula wasn't too clear and I got too many PGs which tend to slow things

Re: Set object mtime

2013-07-11 Thread Sylvain Munaut
Hi, Okay thanks! A call in the C API would be handy. I was wanting to look at creating a tool to sync RADOS between clusters. Is that anything that's in the development plan already? I was just thinking of the same thing. Basically I want rsync between pools, mostly to allow changing the

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-07-01 Thread Sylvain Munaut
Hi again, However when rbd cache is enabled with: [client] rbd_cache = true the tapdisk process crashes if I do this in the domU: dd if=/dev/xvda bs=1M > /dev/null I tested this locally and couldn't reproduce the issue. Doing reads doesn't do anything bad AFAICT. Doing writes OTOH seems to

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-06-21 Thread Sylvain Munaut
Hi, I've been testing this on Ubuntu 12.04.02 64-bit with kernel 3.2.0-48 and ceph 0.61.4 Thanks for testing :) However when rbd cache is enabled with: [client] rbd_cache = true the tapdisk process crashes if I do this in the domU: dd if=/dev/xvda bs=1M > /dev/null Interesting. I'm

Re: caller_ops.size error messages upstream/cuttlefish

2013-05-31 Thread Sylvain Munaut
Just as FYI, I also had a couple of those messages last night when adding a couple new OSDs. [ERR] 3.274 caller_ops.size 3002 log size 3001 it eventually cleared and a deep scrub on that pg doesn't show any error so I'm not sure what it means ... Cheers, Sylvain

radosgw: Files left over after deletion (even after the gc period/process)

2013-05-30 Thread Sylvain Munaut
Hi, The basic operation I did was : - Upload a file using multi-part upload - Copy that file to a new key - Delete that original file (which only creates a new head) - Delete the copy And it seems all the multipart shadow files are not properly deleted. Cheers, Sylvain

Re: ceph gets stopped but never started...

2013-05-30 Thread Sylvain Munaut
Hi since commit 85fb422a084785176af3b694882964841e02195d in cuttlefish cherry-picked from 2f193fb931ed09d921e6fa5a985ab87aa4874589 ceph gets stopped on debian while upgrading but it never gets started again?? Does this really make sense? Should the admin handle the stop / start / restart

High CPU usage when enabling mon leveldb compression

2013-05-29 Thread Sylvain Munaut
Hi, In an attempt to reduce the high IO usage of the mon, I tried to enable the LevelDB snappy compression. The good news is that it did reduce the IO substantially (~ 3.5x less IO) and also the disk space (by about the same ratio). Unfortunately, it seems it also takes a LOT of CPU. ( ~

Re: OSD memory leak when scrubbing [0.56.6]

2013-05-21 Thread Sylvain Munaut
Hi, subject seems familiar, version was 0.48.3 in the last mail. Not anyone else with perhaps large pg's experiencing such behaviour? Any advice on how to proceed? I had the same behavior in both argonaut and bobtail, raising sharply ~ 100M or so at each scrub (every 24h). It's now

Re: [ceph-users] mon IO usage

2013-05-21 Thread Sylvain Munaut
Hi, So, AFAICT, the bulk of the write would be writing out the pgmap to disk every second or so. It should be writing out the full map only every N commits... see 'paxos stash full interval', which defaults to 25. But doesn't it also write it in full when there is a new pgmap ? I have a

Debian package dependencies across version

2013-05-08 Thread Sylvain Munaut
Hi, In the debian/control file, the dependency from one ceph package to the other doesn't always specify that version should match. For example the 'ceph' package depends on 'ceph-common' but not on 'ceph-common (= ${binary:Version})'. The result is that when I did an apt-get install ceph to

Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-26 Thread Sylvain Munaut
Hi, I just wanted to mention that I implemented a simple request merging strategy to counteract the request splitting done by the Xen block if protocol. The results are pretty good. When comparing to using the rbd kernel module, I can now get 2-4x better write performance and 2x read
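
The excerpt does not show how the merging is done, so the following is only a minimal sketch of the general idea, not the blktap code: consecutive requests of the same type that touch contiguous byte ranges are coalesced into one larger request. The `Request` type, the `merge_adjacent` name and the 4 MiB cap are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    op: str      # "read" or "write"
    offset: int  # byte offset on the device
    length: int  # size in bytes


def merge_adjacent(requests: List[Request],
                   max_len: int = 4 * 1024 * 1024) -> List[Request]:
    """Coalesce consecutive requests that cover contiguous byte ranges.

    Only neighbours in submission order are merged, so the relative order
    of unrelated requests is preserved. max_len is an assumed upper bound
    on the size of a merged request.
    """
    merged: List[Request] = []
    for req in requests:
        last = merged[-1] if merged else None
        if (last is not None
                and last.op == req.op
                and last.offset + last.length == req.offset
                and last.length + req.length <= max_len):
            # Extend the previous request instead of queuing a new one.
            last.length += req.length
        else:
            # Copy, so the caller's objects are never modified.
            merged.append(Request(req.op, req.offset, req.length))
    return merged
```

With this sketch, two 4 KiB writes at offsets 0 and 4096 become a single 8 KiB write, while a read that follows them stays separate.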

Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-26 Thread Sylvain Munaut
Hi, Is this in the blktap layer or in librbd? FWIW, when rbd cache = true, the writes will get merged by the cache and written out in large extents on flush. In the blktap layer. I don't have the cache enabled because FLUSH requests from the VM are not forwarded down to that layer, when you

Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-23 Thread Sylvain Munaut
Hi, We can test this, but just a couple of lines of input might be needed to get us going with this without digging through all the code. Ok, so I added proper argument parsing (using the same format as the qemu rbd driver) now, so it's easier to test. First off, you need a working blktap

Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-23 Thread Sylvain Munaut
Hi, My distro (openSuSE 12.1) has /usr/sbin/tapdisk for original tapdisk v1, and /usr/sbin/tapdisk2 for version 2 stuff. I'm replacing /usr/sbin/tapdisk2. Mm, do you know where I could find the source for those tapdisk binaries ? This is attempting to just use tap2:aio for the image

Re: poor write performance

2013-04-22 Thread Sylvain Munaut
Hi, Correct, but that's the theoretical maximum I was referring to. If I calculate that I should be able to get 50MB/second then 30MB/second is acceptable but 500KB/second is not :) I have written a small benchmark for RBD : https://gist.github.com/smunaut/5433222 It uses the librbd API

Re: poor write performance

2013-04-22 Thread Sylvain Munaut
Hi, Unless Sylvain implemented this in his tool explicitly, it won't happen there either. The small bench tool submits requests using the asynchronous API as fast as possible, using a 1M chunk. Then it just waits for all the completions to be done. Sylvain
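
The submit-everything-then-wait pattern described here can be sketched as follows. This is not the benchmark from the gist and does not use librbd: `fake_write` is a hypothetical stand-in for an asynchronous block write, and a thread pool plays the role of the asynchronous API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

CHUNK = 1024 * 1024  # 1 MiB per request, matching the chunk size mentioned above


def fake_write(image: bytearray, offset: int, data: bytes) -> int:
    """Hypothetical stand-in for an async block write; returns bytes written."""
    image[offset:offset + len(data)] = data
    return len(data)


def run_bench(total_bytes: int, workers: int = 8) -> float:
    """Queue every chunk write up front, then wait for all completions.

    Returns the measured throughput in MB/s (1 MB = 10**6 bytes).
    """
    image = bytearray(total_bytes)
    payload = b"\xab" * CHUNK
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit as fast as possible, never waiting on earlier requests.
        futures = [
            pool.submit(fake_write, image, off,
                        payload[:min(CHUNK, total_bytes - off)])
            for off in range(0, total_bytes, CHUNK)
        ]
        # Only now block until every completion has arrived.
        done, _ = wait(futures)
    written = sum(f.result() for f in done)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6
```

Because nothing waits between submissions, the number of requests in flight is bounded only by the queue (here, the pool), which is what lets such a tool approach the backend's maximum throughput.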

Re: poor write performance

2013-04-21 Thread Sylvain Munaut
Hi, My goal is 4 OSD's, each on separate machines, with 1 drive in each for a start, but I want to see performance of at least the same order of magnitude as the theoretical maximum on my hardware before I think about replacing my existing setup. My current understanding is that it's not

Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-19 Thread Sylvain Munaut
Hi, My Xen is kind of rusty, last time I used it was about 3 years ago, but can't you do something similar like with Qemu? Just submit all the arguments semicolon separated? Yes probably, I just didn't get to it. I wanted to check first if this approach was solving the issues I had with RBD

Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-19 Thread Sylvain Munaut
If you have time to write up some lines about steps required to test this, that'd be nice, it'll help people to test this stuff. To quickly test, I compiled the package and just replaced the tapdisk binary from my normal blktap install with the newly compiled one. Then you need to set up an RBD

Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-18 Thread Sylvain Munaut
Hi, I've been working on a blktap driver that allows access to ceph RBD block devices without relying on the RBD kernel driver, and it has finally reached a point where it works and is testable. Some of the advantages are: - Easier to update to newer RBD version - Allows functionality

Re: RGW: Refusing FastCGI request with empty CONTENT_LENGTH ? Why ?

2013-04-03 Thread Sylvain Munaut
Hi, https://github.com/carsonoid/ceph/commit/96896eb092c3b4e0760e56d5228ef0d604951a12 Yes, exactly. The same commit is also in the official repo with the same commit ID. Cheers, Sylvain

RGW: Refusing FastCGI request with empty CONTENT_LENGTH ? Why ?

2013-04-02 Thread Sylvain Munaut
Hi, In src/rgw/rgw_rest.cc you can find : if (s->length) { if (*s->length == '\0') return -EINVAL; s->content_length = atoll(s->length); } So that means if there is a CONTENT_LENGTH field in the environment but it's empty, then the request is just refused. Why ? As it turns out, nginx
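
As an illustration of the alternative behaviour, here is a minimal Python sketch (not RGW's C++ code, and not necessarily what the actual fix does) in which an empty CONTENT_LENGTH is treated like a missing one instead of being rejected. The function name `parse_content_length` is hypothetical.

```python
from typing import Mapping, Optional


def parse_content_length(env: Mapping[str, str]) -> Optional[int]:
    """Return the declared body length, or None if none was declared.

    Assumption for this sketch: a CONTENT_LENGTH that is present but empty
    is handled like an absent one, while a non-numeric value is an error.
    """
    raw = env.get("CONTENT_LENGTH")
    if raw is None or raw.strip() == "":
        # Missing and empty both mean "no declared length".
        return None
    value = raw.strip()
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"invalid CONTENT_LENGTH: {raw!r}")
    return int(value)
```

A front end that always sets the variable, even for requests without a body, would then no longer cause the request to be refused.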

Re: RGW: Refusing FastCGI request with empty CONTENT_LENGTH ? Why ?

2013-04-02 Thread Sylvain Munaut
Hi, Replying to myself So that means if there is a CONTENT_LENGTH field in the environment but it's empty, then the request is just refused. This is fixed in master but wasn't backported to the bobtail branch, which is how I missed it ... Sorry for the noise. Cheers, Sylvain

[radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi, I've just noticed something rather worrying on our cluster. Some files are apparently truncated. From the first look I had at it, it happened on files where there was a metadata update right after the file was stored. The exact sequence was: - PUT to store the file - GET to get the file

Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi, What version are you using? Do you have logs? I'm running a custom build 0.56.3 + some patches ( basically up to 7889c5412 + fixes for #4150 and #4177 ). I don't have any radosgw log ( debug level is set to 0 and it didn't output anything ). I have the HTTP logs : 10.0.0.253 s3.svc -

Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi, Can't make much out of it, will probably need rgw logs (and preferably with also 'debug ms = 1') for this issue. Well, the problem is that I can't make it happen again ... it happened 4 times during an import of ~3000 files ... I'm trying to reproduce this on a test cluster but so far, no

Re: Usable Space

2013-03-06 Thread Sylvain Munaut
Total Space: X / Y || Usable Space: A / B Would it be possible to add this in at some point? Seems like a great addition to go with some of the other 'usability enhancements' that are planned. Or would this get computationally sticky based on having many pools with different replication
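
For a single pool the arithmetic behind such a display is simple; a minimal sketch, assuming plain N-way replication and hypothetical function and parameter names:

```python
from typing import Tuple


def usable_space(raw_used: float, raw_total: float,
                 replica_size: int) -> Tuple[float, float]:
    """Convert raw (used, total) capacity into usable (used, total).

    Assumes every object is stored replica_size times, so each usable
    byte consumes replica_size raw bytes.
    """
    if replica_size < 1:
        raise ValueError("replica_size must be at least 1")
    return raw_used / replica_size, raw_total / replica_size


# Example: 6 TB used out of 30 TB raw, with 3 replicas.
used, total = usable_space(6.0, 30.0, 3)
print(f"Total Space: 6.0 / 30.0 TB || Usable Space: {used} / {total} TB")
```

With several pools using different replication levels on the same raw storage there is no single usable total, since it depends on which pool the remaining raw space ends up serving.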

Re: maintanance on osd host

2013-03-01 Thread Sylvain Munaut
Hi, I have it documented here: http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing That looks wrong to me. AFAIU it should be 'noout'. You want it marked down ASAP. Cheers, Sylvain

radosgw: Update a key's meta data

2013-02-14 Thread Sylvain Munaut
around). I tried reading the code, but although part of the code seems to hint at support for this (in rgw_rest_s3.cc), some other parts seem to not look at all if the src == dst (like rgw_op.cc). Cheers, Sylvain Munaut

Re: FOSDEM 2013 ceph informal meeting : last call

2013-02-01 Thread Sylvain Munaut
Hi, If you're interested in meeting to discuss ceph tomorrow ( Saturday ) at 2pm during https://fosdem.org/2013/, let me know. I'll find a place and advertise it as a reply to this mail, in the morning. Feel free to call me ( +33 6 64 03 29 07 ) if you're lost :-) I'm not sure where I'll

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-31 Thread Sylvain Munaut
Hi, I disabled scrubbing using ceph osd tell \* injectargs '--osd-scrub-min-interval 100' ceph osd tell \* injectargs '--osd-scrub-max-interval 1000' and the leak seems to be gone. See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory for the 12 osd processes over the

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-31 Thread Sylvain Munaut
Hi, I'm crossing my fingers, but I just noticed that since I upgraded to kernel version 3.2.0-36-generic on Ubuntu 12.04 the other day, ceph-osd memory usage has stayed stable. Unfortunately for me, I'm already on 3.2.0-36-generic (Ubuntu 12.04 as well). Cheers, Sylvain PS: Dave

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-30 Thread Sylvain Munaut
Just to keep you posted, upgraded our cluster yesterday to a custom compiled 0.56.1 and it has now been more than 24h and there is no sign of a memory leak anymore. Previously it would rise by ~ 100 M every 24h almost like clockwork and now, it's been slightly more than 24h and memory is

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-30 Thread Sylvain Munaut
Hi, Can you try disabling scrubbing and see if the leak stops? ceph osd tell \* injectargs '--osd-scrub-load-threshold .01' (that will work for 0.56.1, but is fixed in later versions, btw.) On newer code, ceph osd tell \* injectargs '--osd-scrub-min-interval 100'

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-27 Thread Sylvain Munaut
Hi, Just to keep you posted, upgraded our cluster yesterday to a custom compiled 0.56.1 and it has now been more than 24h and there is no sign of a memory leak anymore. Previously it would rise by ~ 100 M every 24h almost like clockwork and now, it's been slightly more than 24h and memory is

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-27 Thread Sylvain Munaut
Hi, Just to keep you posted, upgraded our cluster yesterday to a custom compiled 0.56.1 and it has now been more than 24h and there is no sign of a memory leak anymore. Previously it would rise by ~ 100 M every 24h almost like clockwork and now, it's been slightly more than 24h and memory is

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-25 Thread Sylvain Munaut
Could provide those heaps? Is it possible? We're updating this weekend to 0.56.1. If it still happens after the update, I'll try and reproduce it on our test infra and do the profile there, because unfortunately running the profiler seems to make it eat up CPU and RAM a lot ... I also need to

Re: RadosGW load balancing

2013-01-24 Thread Sylvain Munaut
Hi, Is possible to load balance multiple radosgw servers? Yes, just use an HTTP load balancer to redirect to several backend servers. Which kind of datas should be shared between each machine or is it fully stateless? The radosgw process doesn't have any state of its own. Every important

Re: RadosGW load balancing

2013-01-24 Thread Sylvain Munaut
Awesome. In this case I can bring up an nginx load balancer that will balance across 1,2 or 20 radosgw backend servers. what about the authentication token? Is this managed by ceph or by radosgw? If client will authenticate with radosgw1, are they also able to execute APIs with radosgw2 with

[0.48.3] OSD memory leak when scrubbing

2013-01-22 Thread Sylvain Munaut
Hi, Since I have ceph in prod, I experienced a memory leak in the OSD forcing me to restart them every 5 or 6 days. Without that the OSD process just grows infinitely and eventually gets killed by the OOM killer. (To make sure it wasn't legitimate, I left one grow up to 4G of RSS ...). Here's for

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-22 Thread Sylvain Munaut
Hi, I don't really want to try the mem profiler, I had quite a bad experience with it on a test cluster. While running the profiler some OSD crashed... The only way to fix this is to provide a heap dump. Could you provide one? I just did: ceph osd tell 0 heap start_profiler ceph osd tell 0

Re: rbd kernel driver on the osd server

2013-01-14 Thread Sylvain Munaut
Hi, The other problem to consider is the possibility of deadlock under memory pressure. This is a problem with any network file system or block device that is backed by a user-level process on the same host. When the VM system is under memory pressure, it will ask the fs to write out some

Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Hi, happened to me using ubuntu packages. usually when you upgrade a package it calls all its dependencies, for ceph you have to update one by one. did you try that ? All ceph packages are up to date. Same happens with a custom compiled radosgw from git Cheers, Sylvain

Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Ok, I tracked this down ... I'm using lighttpd as a FastCGI front end and it doesn't set SCRIPT_URI environment. So the line 1123 in rgw/rgw_rest.cc : s->script_uri = s->env->get("SCRIPT_URI"); tries to assign NULL to s->script_uri, which crashes with the particularly unhelpful stack trace I pasted

Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Hi, As far as I know relying on SCRIPT_URI is rather dangerous since it's not always there. There better should be an if/else statement surrounding that code having it defaulting to something else if SCRIPT_URI isn't available. I've opened a bug and proposed a patch setting the default value

Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Hi, Yeah, it's missing a guard here. Strange, I remember fixing this and others, but I can't find any trace of that. I think setting it to empty string is ok, though we may want to explore other fixes (configurable?) -- it affects the Location field in S3 POST response. Yes, I've seen it's

radosgw segfault in 0.56

2013-01-03 Thread Sylvain Munaut
Hi, I've just updated a test cluster to 0.56 and I'm getting a segfault when doing requests on radosgw : root@ceph /var/log/ceph # /usr/bin/radosgw -n client.radosgw.gateway -f *** Caught signal (Segmentation fault) ** in thread 7fee095f8700 ceph version 0.56

Re: Very bad behavior when

2012-12-04 Thread Sylvain Munaut
Hi, Sorry to let this drop for so long, but is this something you've seen happen before/again or otherwise reproduced? I'm not entirely sure how to best test for it (other than just jerking the time around), and while I can come up with scenarios where the OSD leaks memory, I've got nothing

Re: OSD daemon changes port no

2012-12-03 Thread Sylvain Munaut
# cephfs foo set_layout --pool 3 From early tests I seem to remember that just setting the pool using set_layout wasn't accepted by the tool, I had to reset all the layout parameters in the command. Cheers, Sylvain

Re: RBD: periodic cephx issue ? CephxAuthorizeHandler::verify_authorizer isvalid=0

2012-12-03 Thread Sylvain Munaut
Hi, Can you attach/post the whole log somewhere? I'm curious what is leading up to it not having secret_id=0. Ideally with 'debug auth = 20' and 'debug osd = 20' and 'debug ms = 1'. I reproduced the problem with debug auth = 10 and debug ms = 1 (no debug osd ... that's just too verbose,

RBD: periodic cephx issue ? CephxAuthorizeHandler::verify_authorizer isvalid=0

2012-11-29 Thread Sylvain Munaut
Hi, I'm using RBD to store VM images and they're accessed through the kernel client (xen vms). In the client dmesg log, I see periodically : Nov 29 10:46:48 b53-04 kernel: [160055.012206] libceph: osd8 10.208.2.213:6806 socket closed Nov 29 10:46:48 b53-04 kernel: [160055.013635] libceph: osd8

Re: RBD: periodic cephx issue ? CephxAuthorizeHandler::verify_authorizer isvalid=0

2012-11-29 Thread Sylvain Munaut
fwding to the list as I forgot to hit reply all ... Can you attach/post the whole log somewhere? I'm curious what is leading up to it not having secret_id=0. Ideally with 'debug auth = 20' and 'debug osd = 20' and 'debug ms = 1'. Well without the debug options there isn't anything else

Upgrade a running cluster to bobtail ? (when it will be out :p)

2012-11-28 Thread Sylvain Munaut
Hi, I'd like to know the recommended upgrade procedure when bobtail is out ? I've read in the past on the list that rolling upgrades between stable releases were going to be possible, but exactly what is the way to do this ? Upgrade OSD host by host, then upgrade MON one by one ? Or do the MON

Re: OSD and MON memory usage

2012-11-28 Thread Sylvain Munaut
Hi, If you want, I can try to restart the whole thing tomorrow and collect fresh log output from the dying OSDs, or any other action or debug info that you might find useful. Is the clock synchronized on all machines ? What you describe (growing mem, recovery that doesn't seem to end) seems

Re: Very bad behavior when

2012-11-24 Thread Sylvain Munaut
Hi, In addition to Sam's question about version, are you using cephx? I'm running 0.48.2 on Ubuntu Precise and cephx auth is enabled. Cheers, Sylvain

Very bad behavior when

2012-11-22 Thread Sylvain Munaut
Hi, I know that Ceph has time-synced servers as a requirement, but I think a sane failure mode, like a message in the logs instead of uncontrollably growing memory usage, would be a good idea. I had the NTP process die on me tonight on an OSD (for unknown reason so far ...) and the clock went
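As a rough illustration of the "message in the logs" failure mode asked for above, a daemon could compare wall-clock progress against a monotonic clock and report a jump. This is a minimal sketch and not how the Ceph OSD behaves; the class `ClockJumpDetector`, its `tolerance` parameter and the message text are all assumptions made up for this example.

```python
import time


class ClockJumpDetector:
    """Detect wall-clock jumps by comparing against a monotonic clock.

    Hypothetical helper for illustration; not part of Ceph.
    """

    def __init__(self, tolerance=1.0, wall=time.time, mono=time.monotonic):
        self.tolerance = tolerance  # seconds of disagreement tolerated
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self):
        """Return a warning string if the wall clock jumped, else None."""
        now_wall = self._wall()
        now_mono = self._mono()
        # Elapsed wall time minus elapsed monotonic time: close to zero
        # unless the wall clock was stepped forwards or backwards.
        drift = (now_wall - self._last_wall) - (now_mono - self._last_mono)
        self._last_wall = now_wall
        self._last_mono = now_mono
        if abs(drift) > self.tolerance:
            return "wall clock jumped by %.1f s" % drift
        return None
```

Calling `check()` periodically and logging any returned string would give an operator an explicit hint that time synchronisation was lost, which is the kind of visible failure the message argues for.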

Re: RGW: Pools .rgw .rgw.control .users.uid .users.email .users

2012-11-05 Thread Sylvain Munaut
Hi, Also, I assume those pools will actually be pretty small and so I can just leave them with PG_NUM=8 without much issue ? Data will not be distributed evenly across the cluster, and there may be a high contention on these pgs so it'd affect performance. But what is stored in those

RGW: Pools .rgw .rgw.control .users.uid .users.email .users

2012-11-02 Thread Sylvain Munaut
Hi, I've just started RGW and it created some pools automatically : .rgw .rgw.control .users.uid .users.email .users But when looking at them, they used PG_NUM=8 which is a bit low. Now, I noticed it and created .rgw.buckets myself with an appropriate PG_NUM, I assume it will work or is there

Re: [Xen-users] Ceph + RBD + Xen: Complete collapse - Network issue in domU / Bad data for OSD / OOM Kill

2012-09-03 Thread Sylvain Munaut
On Thu, Aug 30, 2012 at 5:04 PM, Fajar A. Nugraha l...@fajar.net wrote: On Thu, Aug 30, 2012 at 9:43 PM, Alex Elder el...@inktank.com wrote: On 08/30/2012 09:06 AM, Sylvain Munaut wrote: Hi, I posted the following comments on IRC but am putting it here just so it's visible along

Re: Unable to set pg_num property of pool data

2012-08-30 Thread Sylvain Munaut
I was able to increase the PG_num property but only problem is to reduce it. Anyways thanks for your valuable reply. AFAIK the old tool allowed you to do that but it didn't really work internally if the pool wasn't empty. In the new version it has been removed completely because it wasn't

Ceph + RBD + Xen: Complete collapse - Network issue in domU / Bad data for OSD / OOM Kill

2012-08-30 Thread Sylvain Munaut
Hi, A bit of explanation of what I'm trying to achieve : We have a bunch of homogeneous nodes that have CPU + RAM + Storage and we want to use that as some generic cluster. The idea is to have Xen on all of these and run Ceph OSD in a domU on each to export the local storage space to the entire

Re: Unable to set pg_num property of pool data

2012-08-29 Thread Sylvain Munaut
Hi, I am facing problem while I am explicitly trying to set pg_number of data pool. You can't change pg_num of an existing pool, you can only specify this number when creating the pool. Support for online pg split/merge is an upcoming feature. Cheers, Sylvain

Re: Integration work

2012-08-29 Thread Sylvain Munaut
Correct me if I'm wrong, but when I was at Citrix in May this year somebody there told me that Xen was going 100% Qemu? Huh ... I've never heard this. Also the guys in ##xen haven't either. I'm not really involved in xen dev and don't follow it closely but that seems unlikely. The few slides I

RBD Async request: When / How are the call back called ?

2012-08-29 Thread Sylvain Munaut
Hi, It might be obvious for people knowing the API but somehow I can't figure it out: How and when will the callback specified in a rbd_completion_t be called ? Imagine I do a rbd_aio_write and then do while (1); ... I don't see how the library could call my callback unless there are threads
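One way a library can deliver a completion callback while the caller is busy is to run it from a thread of its own. The sketch below shows that general pattern in Python; it does not use librbd, and the names `aio_write` and `on_complete` are hypothetical stand-ins, not the real API. Whether librbd works this way is the question being asked, not something this listing answers.

```python
import threading


def aio_write(data, callback):
    """Hypothetical async write: returns at once, callback runs in a worker thread."""
    def worker():
        # A real library would perform I/O here before signalling completion.
        callback(len(data))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


done = threading.Event()
result = {}
caller_id = threading.get_ident()  # identity of the thread issuing the request


def on_complete(nbytes):
    # Runs in the worker thread, not in the thread that called aio_write().
    result["nbytes"] = nbytes
    result["thread_id"] = threading.get_ident()
    done.set()


handle = aio_write(b"hello", on_complete)
done.wait(timeout=5)  # the caller could equally spin or do unrelated work here
handle.join()
```

In this pattern the callback must be thread-safe with respect to whatever the caller is doing, since the two run concurrently.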
