Re: 0.56 scrub OSD memleaks, WAS Re: [0.48.3] OSD memory leak when scrubbing

2013-02-19 Thread Samuel Just
Can you confirm that the memory size reported is res? -Sam On Mon, Feb 18, 2013 at 8:46 AM, Christopher Kunz chrisl...@de-punkt.de wrote: On 16.02.13 10:09, Wido den Hollander wrote: On 02/16/2013 08:09 AM, Andrey Korolyov wrote: Can anyone who hit this bug please confirm that your system
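For readers unsure which figure is being asked about: the RES column in top is the resident set size, which on Linux can also be read from /proc. The sketch below is not from the thread. The helper name mem_kb is made up for illustration, and it assumes a Linux /proc filesystem.

```shell
# mem_kb PID FIELD: print one memory field (in kB) from /proc/PID/status.
# Hypothetical helper, assuming Linux. VmRSS is resident memory (top's RES),
# VmSize is virtual memory (top's VIRT).
mem_kb() {
    awk -v field="$2:" '$1 == field { print $2 }' "/proc/$1/status"
}

# Example: compare resident and virtual size of the current shell.
echo "RSS:  $(mem_kb "$$" VmRSS) kB"
echo "VIRT: $(mem_kb "$$" VmSize) kB"
```

To check an OSD instead, pass the PID of a ceph-osd process as the first argument.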

0.56 scrub OSD memleaks, WAS Re: [0.48.3] OSD memory leak when scrubbing

2013-02-18 Thread Christopher Kunz
On 16.02.13 10:09, Wido den Hollander wrote: On 02/16/2013 08:09 AM, Andrey Korolyov wrote: Can anyone who hit this bug please confirm that your system contains libc 2.15+? Hello, when we started a deep scrub on our 0.56.2 cluster today, we saw a massive memleak about 1 hour into the

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-17 Thread Sébastien Han
+1 -- Regards, Sébastien Han. On Sat, Feb 16, 2013 at 10:09 AM, Wido den Hollander w...@42on.com wrote: On 02/16/2013 08:09 AM, Andrey Korolyov wrote: Can anyone who hit this bug please confirm that your system contains libc 2.15+? I've seen this with 0.56.2 as well on Ubuntu 12.04.

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-15 Thread Andrey Korolyov
Can anyone who hit this bug please confirm that your system contains libc 2.15+? On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han han.sebast...@gmail.com wrote: oh nice, the pattern also matches path :D, didn't know that thanks Greg -- Regards, Sébastien Han. On Mon, Feb 4, 2013 at 10:22 PM,
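One possible way to answer the libc question on a glibc-based system is sketched below. It is not from the thread: the helper name version_ge is hypothetical, `sort -V` is assumed to be available (GNU coreutils), and `getconf GNU_LIBC_VERSION` only reports a version on glibc systems.

```shell
# version_ge A B: succeed if version A >= version B.
# Hypothetical helper; relies on `sort -V` (version sort) being available.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# On glibc systems this prints something like "glibc 2.15"; keep the number.
libc_version="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{ print $2 }')"

if [ -z "$libc_version" ]; then
    echo "could not determine glibc version"
elif version_ge "$libc_version" 2.15; then
    echo "libc $libc_version is 2.15 or newer"
else
    echo "libc $libc_version is older than 2.15"
fi
```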

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-04 Thread Sébastien Han
Hum just tried several times on my test cluster and I can't get any core dump. Does Ceph commit suicide or something? Is it expected behavior? -- Regards, Sébastien Han. On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han han.sebast...@gmail.com wrote: Hi Loïc, Thanks for bringing our discussion

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-04 Thread Sage Weil
On Mon, 4 Feb 2013, Sébastien Han wrote: Hum just tried several times on my test cluster and I can't get any core dump. Does Ceph commit suicide or something? Is it expected behavior? SIGSEGV should trigger the usual path that dumps a stack trace and then dumps core. Was your ulimit -c set

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-04 Thread Dan Mick
...and/or do you have the corepath set interestingly, or one of the core-trapping mechanisms turned on? On 02/04/2013 11:29 AM, Sage Weil wrote: On Mon, 4 Feb 2013, Sébastien Han wrote: Hum just tried several times on my test cluster and I can't get any core dump. Does Ceph commit suicide or

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-04 Thread Sébastien Han
ok I finally managed to get something on my test cluster. Unfortunately, the dump goes to /. Any idea how to change the destination path? My production / won't be big enough... -- Regards, Sébastien Han. On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick dan.m...@inktank.com wrote: ...and/or do you have

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-04 Thread Gregory Farnum
Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core -Greg On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han han.sebast...@gmail.com wrote: ok I finally managed to get something on my test cluster, unfortunately, the dump goes to / any idea to change the destination
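A minimal sketch of what the core_pattern and ulimit advice could look like in practice. It is not taken from the thread: the directory /var/crash and the helper name core_pattern_for_dir are assumptions for illustration. The %e, %p and %t specifiers (executable name, PID, timestamp) are described in core(5).

```shell
# core_pattern_for_dir DIR: build a core_pattern value that places dumps in DIR.
# Hypothetical helper; %e = executable name, %p = PID, %t = dump time (see core(5)).
core_pattern_for_dir() {
    printf '%s/core.%%e.%%p.%%t\n' "${1%/}"
}

# Show the current core size limit (0 means no core files are written).
ulimit -c

# Show the current pattern, if the file is readable here.
if [ -r /proc/sys/kernel/core_pattern ]; then
    cat /proc/sys/kernel/core_pattern
fi

# As root, on a filesystem with enough free space (the path is an assumption):
#   core_pattern_for_dir /var/crash > /proc/sys/kernel/core_pattern
# and raise the limit in the environment that starts the daemon:
#   ulimit -c unlimited
```

A pattern with an absolute path sends dumps to that directory rather than to the working directory of the crashing process.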

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-04 Thread Sébastien Han
oh nice, the pattern also matches path :D, didn't know that thanks Greg -- Regards, Sébastien Han. On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum g...@inktank.com wrote: Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core -Greg On Mon, Feb 4, 2013 at 1:08 PM,

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-03 Thread Loic Dachary
Hi, As discussed during FOSDEM, the script you wrote to kill the OSD when it grows too much could be amended to core dump instead of just being killed and restarted. The binary + core could probably be used to figure out where the leak is. You should make sure the OSD current working directory
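The original watchdog script is not shown in the thread, so the following is only a rough sketch of the idea under stated assumptions: the 4 GB threshold, the helper names rss_kb and exceeds_limit, and the use of pgrep are all hypothetical. SIGSEGV is used because, as noted elsewhere in the thread, it should make the OSD dump a stack trace and then dump core.

```shell
# Hypothetical threshold: 4 GB expressed in kB.
LIMIT_KB=$((4 * 1024 * 1024))

# rss_kb PID: resident set size in kB from /proc (assumes Linux).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# exceeds_limit RSS LIMIT: succeed if RSS is above LIMIT (empty RSS counts as 0).
exceeds_limit() {
    [ "${1:-0}" -gt "$2" ]
}

# Send SIGSEGV to every ceph-osd process above the threshold so it dumps core
# instead of simply being killed.
for pid in $(pgrep -x ceph-osd); do
    if exceeds_limit "$(rss_kb "$pid")" "$LIMIT_KB"; then
        echo "ceph-osd $pid exceeds ${LIMIT_KB} kB, requesting core dump"
        kill -s SEGV "$pid"
    fi
done
```

Run as a user allowed to signal the OSD processes, and only once the core size limit and dump location have been checked.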

Re: [0.48.3] OSD memory leak when scrubbing

2013-02-03 Thread Sébastien Han
Hi Loïc, Thanks for bringing our discussion to the ML. I'll check that tomorrow :-). Cheers -- Regards, Sébastien Han. On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han han.sebast...@gmail.com wrote: Hi Loïc, Thanks for bringing our discussion to the ML. I'll check that tomorrow :-). Cheers

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-31 Thread Sylvain Munaut
Hi, I disabled scrubbing using ceph osd tell \* injectargs '--osd-scrub-min-interval 100' ceph osd tell \* injectargs '--osd-scrub-max-interval 1000' and the leak seems to be gone. See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory for the 12 osd processes over the

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-31 Thread Sylvain Munaut
Hi, I'm crossing my fingers, but I just noticed that since I upgraded to kernel version 3.2.0-36-generic on Ubuntu 12.04 the other day, ceph-osd memory usage has stayed stable. Unfortunately for me, I'm already on 3.2.0-36-generic (Ubuntu 12.04 as well). Cheers, Sylvain PS: Dave

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-31 Thread Sage Weil
On Thu, 31 Jan 2013, Sylvain Munaut wrote: Hi, I disabled scrubbing using ceph osd tell \* injectargs '--osd-scrub-min-interval 100' ceph osd tell \* injectargs '--osd-scrub-max-interval 1000' and the leak seems to be gone. See the graph at http://i.imgur.com/A0KmVot.png

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-30 Thread Sylvain Munaut
Just to keep you posted, we upgraded our cluster yesterday to a custom-compiled 0.56.1, and it has now been more than 24h with no sign of a memory leak anymore. Previously it would rise by ~100 MB every 24h, almost like clockwork, and now it's been slightly more than 24h and memory is

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-30 Thread Sage Weil
On Wed, 30 Jan 2013, Sylvain Munaut wrote: Just to keep you posted, we upgraded our cluster yesterday to a custom-compiled 0.56.1, and it has now been more than 24h with no sign of a memory leak anymore. Previously it would rise by ~100 MB every 24h, almost like clockwork, and now it's

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-30 Thread Sylvain Munaut
Hi, Can you try disabling scrubbing and see if the leak stops? ceph osd tell \* injectargs '--osd-scrub-load-threshold .01' (that will work for 0.56.1, but is fixed in later versions, btw.) On newer code, ceph osd tell \* injectargs '--osd-scrub-min-interval 100'

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-30 Thread Sage Weil
On Wed, 30 Jan 2013, Sylvain Munaut wrote: Hi, Can you try disabling scrubbing and see if the leak stops? ceph osd tell \* injectargs '--osd-scrub-load-threshold .01' (that will work for 0.56.1, but is fixed in later versions, btw.) On newer code, ceph osd

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-27 Thread Sylvain Munaut
Hi, Just to keep you posted, we upgraded our cluster yesterday to a custom-compiled 0.56.1, and it has now been more than 24h with no sign of a memory leak anymore. Previously it would rise by ~100 MB every 24h, almost like clockwork, and now it's been slightly more than 24h and memory is

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-27 Thread Sage Weil
On Sun, 27 Jan 2013, Sylvain Munaut wrote: Hi, Just to keep you posted, we upgraded our cluster yesterday to a custom-compiled 0.56.1, and it has now been more than 24h with no sign of a memory leak anymore. Previously it would rise by ~100 MB every 24h, almost like clockwork, and now,

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-27 Thread Sylvain Munaut
Hi, Just to keep you posted, we upgraded our cluster yesterday to a custom-compiled 0.56.1, and it has now been more than 24h with no sign of a memory leak anymore. Previously it would rise by ~100 MB every 24h, almost like clockwork, and now it's been slightly more than 24h and memory is

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-25 Thread Sébastien Han
Hi, Could you provide those heaps? Is it possible? -- Regards, Sébastien Han. On Tue, Jan 22, 2013 at 10:38 PM, Sébastien Han han.sebast...@gmail.com wrote: Well ideally you want to run the profiler during the scrubbing process when the memory leaks appear :-). -- Regards, Sébastien Han.

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-25 Thread Sylvain Munaut
Could you provide those heaps? Is it possible? We're updating this weekend to 0.56.1. If it still happens after the update, I'll try to reproduce it on our test infra and do the profile there, because unfortunately running the profiler seems to make it eat up a lot of CPU and RAM ... I also need to

[0.48.3] OSD memory leak when scrubbing

2013-01-22 Thread Sylvain Munaut
Hi, Since I have had ceph in prod, I have experienced a memory leak in the OSDs, forcing me to restart them every 5 or 6 days. Without that, the OSD process just grows indefinitely and eventually gets killed by the OOM killer. (To make sure it wasn't legitimate, I left one grow up to 4G of RSS ...). Here's for

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-22 Thread Sébastien Han
Hi, I originally started a thread around these memory leak problems here: http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg11000.html I'm happy to see that someone supports my theory about the scrubbing process leaking the memory. I only use RBD from Ceph, so your theory makes sense as

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-22 Thread Sylvain Munaut
Hi, I don't really want to try the mem profiler, I had quite a bad experience with it on a test cluster. While running the profiler some OSD crashed... The only way to fix this is to provide a heap dump. Could you provide one? I just did: ceph osd tell 0 heap start_profiler ceph osd tell 0

Re: [0.48.3] OSD memory leak when scrubbing

2013-01-22 Thread Sébastien Han
Well ideally you want to run the profiler during the scrubbing process when the memory leaks appear :-). -- Regards, Sébastien Han. On Tue, Jan 22, 2013 at 10:32 PM, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi, I don't really want to try the mem profiler, I had quite a bad