Re: OSD memory leaks?
Sebastien, I just had to restart the OSD about 10 minutes ago, so it looks like all "log max recent = 1" did was slow the leak down.

Dave Spano

----- Original Message -----
From: Sébastien Han <han.sebast...@gmail.com>
Sent: Wednesday, March 13, 2013 3:59:03 PM
Subject: Re: OSD memory leaks?

Dave,

Just to be sure, did "log max recent = 1" _completely_ stop the memory leak, or did it only slow it down?

Thanks!
--
Regards,
Sébastien Han.
Re: OSD memory leaks?
Lol. I'm totally fine with that. My glance images pool isn't used too often. I'm going to give that a try today and see what happens.

I'm still crossing my fingers, but since I added "log max recent = 1" to ceph.conf, I've been okay despite the improper pg_num, and a lot of scrubbing/deep scrubbing yesterday.

Dave Spano

----- Original Message -----
From: Greg Farnum <g...@inktank.com>
Sent: Tuesday, March 12, 2013 5:37:37 PM
Subject: Re: OSD memory leaks?

Yeah. There's not anything intelligent about that cppool mechanism. :)
-Greg

On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote:
> I'd rather shut the cloud down and copy the pool to a new one than take any chances of corruption by using an experimental feature. My guess is that there cannot be any i/o to the pool while copying, otherwise you'll lose the changes that are happening during the copy, correct?
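For anyone reproducing the workaround discussed above: it is a single ceph.conf setting. A minimal sketch of what that entry might look like; the [global] placement and the comment are assumptions, not Dave's actual file:

    [global]
        # shrink the in-memory buffer of recent log entries that each
        # daemon keeps around for crash dumps; the default is much larger
        log max recent = 1

The same option can also be set only under [osd] if the other daemons should keep their defaults.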
Re: OSD memory leaks?
Dave,

Just to be sure, did "log max recent = 1" _completely_ stop the memory leak, or did it only slow it down?

Thanks!
--
Regards,
Sébastien Han.

On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano <dsp...@optogenics.com> wrote:
> I'm still crossing my fingers, but since I added "log max recent = 1" to ceph.conf, I've been okay despite the improper pg_num, and a lot of scrubbing/deep scrubbing yesterday.
Re: OSD memory leaks?
Sebastien,

I'm not totally sure yet, but everything is still working.

Sage and Greg,

I copied my glance images pool per the posting I mentioned previously, and everything works when I use the ceph tools: I can export rbds from the new pool and delete them as well.

However, the copied images pool does not work with glance. I get the error below when I try to create images in the new pool. If I put the old pool back, I can create images with no problem. Is there something I'm missing in glance that it needs to work with a pool created in bobtail? I'm using OpenStack Folsom.

  File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 437, in _upload
    image_meta['size'])
  File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 244, in add
    image_size, order)
  File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 207, in _create_image
    features=rbd.RBD_FEATURE_LAYERING)
  File "/usr/lib/python2.7/dist-packages/rbd.py", line 194, in create
    raise make_ex(ret, 'error creating image')
PermissionError: error creating image

Dave Spano
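For context, the "works when I use the ceph tools" check above amounts to a few rbd commands. A sketch of that kind of sanity test, with placeholder pool and image names (not Dave's actual ones):

    # list the copied pool, then round-trip one image
    rbd ls -p images-new
    rbd export images-new/test-image /tmp/test-image.raw
    rbd rm images-new/test-image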
Re: OSD memory leaks?
It sounds like maybe you didn't rename the new pool to use the old pool's name? Glance is looking for a specific pool to store its data in; I believe it's configurable, but you'll need to do one or the other.
-Greg

On Wednesday, March 13, 2013 at 3:38 PM, Dave Spano wrote:
> I noticed that the copied images pool does not work with glance. I get this error when I try to create images in the new pool. If I put the old pool back, I can create images no problem.
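On the "it's configurable" point: in Folsom-era glance the RBD backend is configured in glance-api.conf. A hedged sketch of the relevant options; the values are illustrative, not taken from Dave's setup:

    default_store = rbd
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_user = glance
    rbd_store_pool = images    # must name the pool glance should write into
    rbd_store_chunk_size = 8

So either point rbd_store_pool at the new pool, or rename the new pool to the name glance already expects, as Greg suggests.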
Re: OSD memory leaks?
I renamed the old one from images to images-old, and the new one from images-new to images.

Dave Spano
Optogenics Systems Administrator

----- Original Message -----
From: Greg Farnum <g...@inktank.com>
Sent: Wed, 13 Mar 2013 18:52:29 -0400 (EDT)
Subject: Re: OSD memory leaks?

It sounds like maybe you didn't rename the new pool to use the old pool's name? Glance is looking for a specific pool to store its data in; I believe it's configurable, but you'll need to do one or the other.
-Greg
Re: OSD memory leaks?
On 03/13/2013 05:05 PM, Dave Spano wrote:
> I renamed the old one from images to images-old, and the new one from images-new to images.

This reminds me of a problem you might hit with this: RBD clones track the parent image pool by id, so they'll continue working after the pool is renamed. If you have any clones of the images-old pool, they'll stop working when that pool is deleted.

To get around this, you'll need to flatten any clones whose parents are in images-old before deleting the images-old pool.

Josh
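A sketch of what that flattening could look like with the rbd CLI; the pool, image and snapshot names are placeholders, and rbd children/flatten apply to format 2 (cloned) images:

    # list clones that depend on a protected snapshot in the old pool
    rbd children images-old/golden-image@snap1

    # detach a clone from its parent by copying the parent data into it
    rbd flatten images/instance-disk-1

    # once nothing depends on it, the parent snapshot can be unprotected
    rbd snap unprotect images-old/golden-image@snap1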
Re: OSD memory leaks?
> FYI I'm using 450 pgs for my pools.

Please, can you show the number of object replicas?

ceph osd dump | grep 'rep size'

Vlad Gorbunov

2013/3/5 Sébastien Han <han.sebast...@gmail.com>:
> FYI I'm using 450 pgs for my pools.
> --
> Regards,
> Sébastien Han.
>
> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil <s...@inktank.com> wrote:
>> On Fri, 1 Mar 2013, Wido den Hollander wrote:
>>> On 02/23/2013 01:44 AM, Sage Weil wrote:
>>>> On Fri, 22 Feb 2013, Sébastien Han wrote:
>>>>> Hi all,
>>>>> I finally got a core dump. I did it with a kill -SEGV on the OSD process.
>>>>> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>>>> Hope we will get something out of it :-).
>>>>
>>>> AHA! We have a theory. The pg log isn't trimmed during scrub (because the old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat RAM in the meantime, especially under high iops.
>>>
>>> Does the number of PGs influence the memory leak? So my theory is that when you have a high number of PGs with a low number of objects per PG you don't see the memory leak.
>>>
>>> I saw the memory leak on an RBD system where a pool had just 8 PGs, but after going to 1024 PGs in a new pool it seemed to be resolved. I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results.
>>
>> The PGs were active+clean when you saw the leak? There is a problem (that we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>>
>> sage
>>
>>> Wido
>>>
>>>> Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is classic or chunky/deep.
>>>>
>>>> Thanks!
>>>> sage
>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum <g...@inktank.com> wrote:
>>>>>> On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han <han.sebast...@gmail.com> wrote:
>>>>>>>> Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase; this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory.
>>>>>>>
>>>>>>> Well, the memory consumption was already high before the profiler was started. So yes, with the memory profiler enabled an OSD might consume more memory, but this doesn't cause the memory leaks.
>>>>>>
>>>>>> My concern is that maybe you saw a leak, but when you restarted with the memory profiling you lost whatever conditions caused it.
>>>>>>
>>>>>>> Any ideas? Nothing to say about my scrubbing theory?
>>>>>>
>>>>>> I like it, but Sam indicates that without some heap dumps which capture the actual leak, scrub is too large to effectively code review for leaks. :(
>>>>>> -Greg
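Since heap profiling keeps coming up in this thread, here is a rough sketch of how such heap dumps can be captured from a running OSD, using the tcmalloc-based profiler built into the daemons; osd.1 is just an example id and the exact command form may differ between releases:

    ceph tell osd.1 heap start_profiler
    # ...wait while the suspected leak grows...
    ceph tell osd.1 heap dump        # typically writes a .heap file under the osd log directory
    ceph tell osd.1 heap stats
    ceph tell osd.1 heap stop_profiler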
Re: OSD memory leaks?
Replica count has been set to 2. Why?
--
Regards,
Sébastien Han.

On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov <vadi...@gmail.com> wrote:
> Please, can you show the number of object replicas?
>
> ceph osd dump | grep 'rep size'
>
> Vlad Gorbunov
Re: OSD memory leaks?
Sorry, I meant pg_num and pgp_num on all pools, shown by:

ceph osd dump | grep 'rep size'

The default pg_num value of 8 is NOT suitable for a big cluster.

2013/3/13 Sébastien Han <han.sebast...@gmail.com>:
> Replica count has been set to 2. Why?
Re: OSD memory leaks?
> Sorry, I meant pg_num and pgp_num on all pools, shown by the ceph osd dump | grep 'rep size'

Well, it's still 450 each...

> The default pg_num value of 8 is NOT suitable for a big cluster.

Thanks, I know; I'm not new to Ceph. What's your point here? I already said that pg_num was 450...
--
Regards,
Sébastien Han.
Re: OSD memory leaks?
Disregard my previous question. I found my answer in the post below. Absolutely brilliant! I thought I was screwed!

http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924

Dave Spano
Optogenics Systems Administrator

----- Original Message -----
From: Dave Spano <dsp...@optogenics.com>
Sent: Tuesday, March 12, 2013 1:41:21 PM
Subject: Re: OSD memory leaks?

If one were stupid enough to have their pg_num and pgp_num set to 8 on two of their pools, how could you fix that?

Dave Spano
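The fix referenced in that gmane post is the copy-and-rename approach discussed elsewhere in this thread. A rough sketch of the procedure, with placeholder pool names and pg counts, all client I/O to the pool stopped first, and noting that the pool delete syntax varies between releases:

    # create a replacement pool with a sensible pg count
    ceph osd pool create images-new 512 512

    # copy every object across (no writes to the source meanwhile)
    rados cppool images images-new

    # swap names so clients keep using "images"
    ceph osd pool rename images images-old
    ceph osd pool rename images-new images

    # after verifying (and flattening any clones), remove the old pool
    ceph osd pool delete images-old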
Re: OSD memory leaks?
Well, to avoid unnecessary data movement, there is also an _experimental_ feature to change the number of PGs in a pool on the fly:

ceph osd pool set <poolname> pg_num <numpgs> --allow-experimental-feature

Cheers!
--
Regards,
Sébastien Han.

On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano <dsp...@optogenics.com> wrote:
> Disregard my previous question. I found my answer in the post below. Absolutely brilliant! I thought I was screwed!
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924
Re: OSD memory leaks?
On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote:
> Well, to avoid unnecessary data movement, there is also an _experimental_ feature to change the number of PGs in a pool on the fly:
> ceph osd pool set <poolname> pg_num <numpgs> --allow-experimental-feature

Don't do that. We've got a set of 3 patches which fix bugs we know about that aren't in bobtail yet, and I'm sure there's more we aren't aware of…
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: OSD memory leaks?
han.sebast...@gmail.com said: Well to avoid unnecessary data movement, there is also an _experimental_ feature to change on the fly the number of PGs in a pool. ceph osd pool set poolname pg_num numpgs --allow-experimental-feature I've been following the instructions here: http://ceph.com/docs/master/rados/configuration/osd-config-ref/ under data placement, trying to set the number of PGs in ceph.conf. I've added these lines in the global section: osd pool default pg num = 500 osd pool default pgp num = 500 but they don't seem to have any effect on how mkcephfs behaves. Before I added these lines, mkcephfs created a data pool with 3904 PGs. After wiping everything, adding the lines and re-creating the pool, it still ends up with 3904 PGs. What am I doing wrong? Thanks, Bryan -- Bryan Wright | If you take cranberries and stew them like Physics Department | applesauce, they taste much more like prunes University of Virginia | than rhubarb does. -- Groucho Charlottesville, VA 22901 | (434) 924-7218 | br...@virginia.edu
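[Editor's note] For Bryan's question, a sketch of the two ways pool PG counts usually get set. It rests on an assumption worth verifying for your release: the "osd pool default pg num" options only apply to pools created at runtime, while the initial data/metadata/rbd pools built by mkcephfs are sized from the OSD count (via "osd pg bits" / "osd pgp bits"), which would explain why the 3904-PG data pool is unaffected.

# ceph.conf -- defaults picked up when a pool is created after the cluster is up
[global]
    osd pool default pg num = 500
    osd pool default pgp num = 500

# Creating a pool explicitly always lets you pass the PG count yourself
# (pool name and count here are placeholders; powers of two are conventional):
#   ceph osd pool create <name> <pg_num> <pgp_num>
ceph osd pool create images 512 512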
Re: OSD memory leaks?
I'd rather shut the cloud down and copy the pool to a new one than take any chances of corruption by using an experimental feature. My guess is that there cannot be any i/o to the pool while copying, otherwise you'll lose the changes that are happening during the copy, correct? Dave Spano Optogenics Systems Administrator - Original Message - From: Greg Farnum g...@inktank.com To: Sébastien Han han.sebast...@gmail.com Cc: Dave Spano dsp...@optogenics.com, ceph-devel ceph-devel@vger.kernel.org, Sage Weil s...@inktank.com, Wido den Hollander w...@42on.com, Sylvain Munaut s.mun...@whatever-company.com, Samuel Just sam.j...@inktank.com, Vladislav Gorbunov vadi...@gmail.com Sent: Tuesday, March 12, 2013 4:20:13 PM Subject: Re: OSD memory leaks? On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: Well to avoid un necessary data movement, there is also an _experimental_ feature to change on fly the number of PGs in a pool. ceph osd pool set poolname pg_num numpgs --allow-experimental-feature Don't do that. We've got a set of 3 patches which fix bugs we know about that aren't in bobtail yet, and I'm sure there's more we aren't aware of… -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com Cheers! -- Regards, Sébastien Han. On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com) wrote: Disregard my previous question. I found my answer in the post below. Absolutely brilliant! I thought I was screwed! http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 Dave Spano Optogenics Systems Administrator - Original Message - From: Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com) To: Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com) Cc: Sage Weil s...@inktank.com (mailto:s...@inktank.com), Wido den Hollander w...@42on.com (mailto:w...@42on.com), Gregory Farnum g...@inktank.com (mailto:g...@inktank.com), Sylvain Munaut s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com), ceph-devel ceph-devel@vger.kernel.org (mailto:ceph-devel@vger.kernel.org), Samuel Just sam.j...@inktank.com (mailto:sam.j...@inktank.com), Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) Sent: Tuesday, March 12, 2013 1:41:21 PM Subject: Re: OSD memory leaks? If one were stupid enough to have their pg_num and pgp_num set to 8 on two of their pools, how could you fix that? Dave Spano - Original Message - From: Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com) To: Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) Cc: Sage Weil s...@inktank.com (mailto:s...@inktank.com), Wido den Hollander w...@42on.com (mailto:w...@42on.com), Gregory Farnum g...@inktank.com (mailto:g...@inktank.com), Sylvain Munaut s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com), Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com), ceph-devel ceph-devel@vger.kernel.org (mailto:ceph-devel@vger.kernel.org), Samuel Just sam.j...@inktank.com (mailto:sam.j...@inktank.com) Sent: Tuesday, March 12, 2013 9:43:44 AM Subject: Re: OSD memory leaks? Sorry, i mean pg_num and pgp_num on all pools. Shown by the ceph osd dump | grep 'rep size' Well it's still 450 each... The default pg_num value 8 is NOT suitable for big cluster. Thanks I know, I'm not new with Ceph. What's your point here? I already said that pg_num was 450... -- Regards, Sébastien Han. 
On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) wrote: Sorry, i mean pg_num and pgp_num on all pools. Shown by the ceph osd dump | grep 'rep size' The default pg_num value 8 is NOT suitable for big cluster. 2013/3/13 Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com): Replica count has been set to 2. Why? -- Regards, Sébastien Han. On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) wrote: FYI I'm using 450 pgs for my pools. Please, can you show the number of object replicas? ceph osd dump | grep 'rep size' Vlad Gorbunov 2013/3/5 Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com): FYI I'm using 450 pgs for my pools. -- Regards, Sébastien Han. On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil s...@inktank.com (mailto:s...@inktank.com) wrote: On Fri, 1 Mar 2013, Wido den Hollander wrote: On 02/23/2013 01:44 AM, Sage Weil wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all
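[Editor's note] Dave's plan above (stop client I/O, copy the pool to a new one with a sane pg_num, then swap) can be done with stock tools. A rough sketch only: the pool names are placeholders, all clients must stay stopped for the whole operation so no writes are lost, and rados cppool copies objects but not pool snapshots.

# 1. Create a replacement pool with the PG count you actually want.
ceph osd pool create images-new 512 512

# 2. With all clients stopped, copy every object from the old pool.
rados cppool images images-new

# 3. Swap the pools; keep the old one around until the new one is verified.
ceph osd pool rename images images-old
ceph osd pool rename images-new images

# 4. Only after checking the data, drop the old pool
#    (newer releases require the name twice plus --yes-i-really-really-mean-it).
ceph osd pool delete images-old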
Re: OSD memory leaks?
Yeah. There's not anything intelligent about that cppool mechanism. :) -Greg On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote: I'd rather shut the cloud down and copy the pool to a new one than take any chances of corruption by using an experimental feature. My guess is that there cannot be any i/o to the pool while copying, otherwise you'll lose the changes that are happening during the copy, correct? Dave Spano Optogenics Systems Administrator - Original Message - From: Greg Farnum g...@inktank.com (mailto:g...@inktank.com) To: Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com) Cc: Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com), ceph-devel ceph-devel@vger.kernel.org (mailto:ceph-devel@vger.kernel.org), Sage Weil s...@inktank.com (mailto:s...@inktank.com), Wido den Hollander w...@42on.com (mailto:w...@42on.com), Sylvain Munaut s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com), Samuel Just sam.j...@inktank.com (mailto:sam.j...@inktank.com), Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) Sent: Tuesday, March 12, 2013 4:20:13 PM Subject: Re: OSD memory leaks? On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: Well to avoid un necessary data movement, there is also an _experimental_ feature to change on fly the number of PGs in a pool. ceph osd pool set poolname pg_num numpgs --allow-experimental-feature Don't do that. We've got a set of 3 patches which fix bugs we know about that aren't in bobtail yet, and I'm sure there's more we aren't aware of… -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com Cheers! -- Regards, Sébastien Han. On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com) wrote: Disregard my previous question. I found my answer in the post below. Absolutely brilliant! I thought I was screwed! http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 Dave Spano Optogenics Systems Administrator - Original Message - From: Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com) To: Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com) Cc: Sage Weil s...@inktank.com (mailto:s...@inktank.com), Wido den Hollander w...@42on.com (mailto:w...@42on.com), Gregory Farnum g...@inktank.com (mailto:g...@inktank.com), Sylvain Munaut s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com), ceph-devel ceph-devel@vger.kernel.org (mailto:ceph-devel@vger.kernel.org), Samuel Just sam.j...@inktank.com (mailto:sam.j...@inktank.com), Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) Sent: Tuesday, March 12, 2013 1:41:21 PM Subject: Re: OSD memory leaks? If one were stupid enough to have their pg_num and pgp_num set to 8 on two of their pools, how could you fix that? Dave Spano - Original Message - From: Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com) To: Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) Cc: Sage Weil s...@inktank.com (mailto:s...@inktank.com), Wido den Hollander w...@42on.com (mailto:w...@42on.com), Gregory Farnum g...@inktank.com (mailto:g...@inktank.com), Sylvain Munaut s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com), Dave Spano dsp...@optogenics.com (mailto:dsp...@optogenics.com), ceph-devel ceph-devel@vger.kernel.org (mailto:ceph-devel@vger.kernel.org), Samuel Just sam.j...@inktank.com (mailto:sam.j...@inktank.com) Sent: Tuesday, March 12, 2013 9:43:44 AM Subject: Re: OSD memory leaks? 
Sorry, i mean pg_num and pgp_num on all pools. Shown by the ceph osd dump | grep 'rep size' Well it's still 450 each... The default pg_num value 8 is NOT suitable for big cluster. Thanks I know, I'm not new with Ceph. What's your point here? I already said that pg_num was 450... -- Regards, Sébastien Han. On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) wrote: Sorry, i mean pg_num and pgp_num on all pools. Shown by the ceph osd dump | grep 'rep size' The default pg_num value 8 is NOT suitable for big cluster. 2013/3/13 Sébastien Han han.sebast...@gmail.com (mailto:han.sebast...@gmail.com): Replica count has been set to 2. Why? -- Regards, Sébastien Han. On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov vadi...@gmail.com (mailto:vadi...@gmail.com) wrote: FYI I'm using 450 pgs for my pools
Re: OSD memory leaks?
Dave, It's still a production platform, so no, I didn't try it. I've also found that the ceph-mon daemons are now constantly leaking... I truly hope your log max recent = 1 will help. Cheers. -- Regards, Sébastien Han. On Mon, Mar 11, 2013 at 7:43 PM, Dave Spano dsp...@optogenics.com wrote: Sebastien, Did the patch that Sage mentioned work for you? I've found that this behavior is happening more frequently with my first OSD during deep scrubbing on version 0.56.3. The OOM killer now goes after the ceph-osd process after a couple of days. Sage, Yesterday, after following the OSD memory requirements thread, I added log max recent = 1 to ceph.conf, and osd.0 seems to have returned to a state of normalcy. If it makes it through a deep scrubbing with no problem, I'll be very happy. s...@inktank.com said: - pg log trimming (probably a conservative subset) to avoid memory bloat Anything that reduces the size of OSD processes would be appreciated. You can probably do this with just log max recent = 1000. By default it's keeping 100k lines of logs in memory, which can eat a lot of RAM (but is great when debugging issues). Dave Spano - Original Message - From: Sébastien Han han.sebast...@gmail.com To: Sage Weil s...@inktank.com Cc: Wido den Hollander w...@42on.com, Gregory Farnum g...@inktank.com, Sylvain Munaut s.mun...@whatever-company.com, Dave Spano dsp...@optogenics.com, ceph-devel ceph-devel@vger.kernel.org, Samuel Just sam.j...@inktank.com Sent: Monday, March 4, 2013 12:11:22 PM Subject: Re: OSD memory leaks? FYI I'm using 450 PGs for my pools. -- Regards, Sébastien Han. On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil s...@inktank.com wrote: On Fri, 1 Mar 2013, Wido den Hollander wrote: On 02/23/2013 01:44 AM, Sage Weil wrote: On Fri, 22 Feb 2013, Sébastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isn't trimmed during scrub (because the old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat RAM in the meantime.. especially under high iops. Does the number of PGs influence the memory leak? So my theory is that when you have a high number of PGs with a low number of objects per PG you don't see the memory leak. I saw the memory leak on an RBD system where a pool had just 8 PGs, but after going to 1024 PGs in a new pool it seemed to be resolved. I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results. The PGs were active+clean when you saw the leak? There is a problem (that we just fixed in master) where pg logs aren't trimmed for degraded PGs. sage Wido Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is classic or chunky/deep. Thanks! sage -- Regards, Sébastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase -- this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes, with the memory profiler enabled an OSD might consume more memory, but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrubbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on
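[Editor's note] Since "log max recent = 1" is the workaround that keeps rescuing Dave's OSDs, this is what it looks like in ceph.conf. A sketch only: 1 is the aggressive value used in this thread, and Sage's suggested 1000 is a gentler compromise that still keeps some in-memory log for debugging.

# ceph.conf -- cap the in-memory debug log ring buffer (default is 100000 entries)
[global]
    # the aggressive value used in this thread...
    log max recent = 1
    # ...or Sage's milder suggestion, which keeps some context for debugging:
    # log max recent = 1000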
Re: OSD memory leaks?
FYI I'm using 450 pgs for my pools. -- Regards, Sébastien Han. On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil s...@inktank.com wrote: On Fri, 1 Mar 2013, Wido den Hollander wrote: On 02/23/2013 01:44 AM, Sage Weil wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Does the number of PGs influence the memory leak? So my theory is that when you have a high number of PGs with a low number of objects per PG you don't see the memory leak. I saw the memory leak on a RBD system where a pool had just 8 PGs, but after going to 1024 PGs in a new pool it seemed to be resolved. I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results. The PGs were active+clean when you saw the leak? There is a problem (that we just fixed in master) where pg logs aren't trimmed for degraded PGs. sage Wido Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
On 02/23/2013 01:44 AM, Sage Weil wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Does the number of PGs influence the memory leak? So my theory is that when you have a high number of PGs with a low number of objects per PG you don't see the memory leak. I saw the memory leak on a RBD system where a pool had just 8 PGs, but after going to 1024 PGs in a new pool it seemed to be resolved. I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results. Wido Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
That pattern would seem to support the log trimming theory of the leak. -Sam On Fri, Mar 1, 2013 at 7:51 AM, Wido den Hollander w...@42on.com wrote: On 02/23/2013 01:44 AM, Sage Weil wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Does the number of PGs influence the memory leak? So my theory is that when you have a high number of PGs with a low number of objects per PG you don't see the memory leak. I saw the memory leak on a RBD system where a pool had just 8 PGs, but after going to 1024 PGs in a new pool it seemed to be resolved. I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results. Wido Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
On Fri, 1 Mar 2013, Wido den Hollander wrote: On 02/23/2013 01:44 AM, Sage Weil wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Does the number of PGs influence the memory leak? So my theory is that when you have a high number of PGs with a low number of objects per PG you don't see the memory leak. I saw the memory leak on a RBD system where a pool had just 8 PGs, but after going to 1024 PGs in a new pool it seemed to be resolved. I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results. The PGs were active+clean when you saw the leak? There is a problem (that we just fixed in master) where pg logs aren't trimmed for degraded PGs. sage Wido Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
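[Editor's note] Given Sage's point that pg logs were not trimmed for degraded PGs, a quick way to check whether anything is sitting outside active+clean while memory grows. This is a rough sketch against the plain-text output of the era:

# Overall cluster state (degraded/recovering PGs show up in the summary).
ceph -s

# Rough check for PGs that are not active+clean; the grep also lets header and
# per-OSD stat lines through, so treat the output as a hint, not a report.
ceph pg dump | grep -v 'active+clean'

# If your release supports it, this is the cleaner query:
# ceph pg dump_stuck unclean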
Re: OSD memory leaks?
Ok thanks guys. Hope we will find something :-). -- Regards, Sébastien Han. On Mon, Feb 25, 2013 at 8:51 AM, Wido den Hollander w...@42on.com wrote: On 02/25/2013 01:21 AM, Sage Weil wrote: On Mon, 25 Feb 2013, S?bastien Han wrote: Hi Sage, Sorry it's a production system, so I can't test it. So at the end, you can't get anything out of the core dump? I saw a bunch of dup object anmes, which is what led us to the pg log theory. I can look a bit more carefully to confirm, but in the end it would be nice to see users scrubbing without leaking. This may be a bit moot because we want to allow trimming for other reasons, so those patches are being tested and working their way into master. We'll backport when things are solid. In the meantime, if someone has been able to reproduce this in a test environment, testing is obviously welcome :) I'll see what I can do later this week. I know of a cluster which has the same issues which is in semi-production as far as I know. Wido sage -- Regards, S?bastien Han. On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil s...@inktank.com wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Wido den Hollander 42on B.V. 
Phone: +31 (0)20 700 9902 Skype: contact42on
Re: OSD memory leaks?
Hi Sage, Sorry it's a production system, so I can't test it. So at the end, you can't get anything out of the core dump? -- Regards, Sébastien Han. On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil s...@inktank.com wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
On Mon, 25 Feb 2013, Sébastien Han wrote: Hi Sage, Sorry it's a production system, so I can't test it. So at the end, you can't get anything out of the core dump? I saw a bunch of dup object names, which is what led us to the pg log theory. I can look a bit more carefully to confirm, but in the end it would be nice to see users scrubbing without leaking. This may be a bit moot because we want to allow trimming for other reasons, so those patches are being tested and working their way into master. We'll backport when things are solid. In the meantime, if someone has been able to reproduce this in a test environment, testing is obviously welcome :) sage -- Regards, Sébastien Han. On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil s...@inktank.com wrote: On Fri, 22 Feb 2013, Sébastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isn't trimmed during scrub (because the old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat RAM in the meantime.. especially under high iops. Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is classic or chunky/deep. Thanks! sage -- Regards, Sébastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase -- this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes, with the memory profiler enabled an OSD might consume more memory, but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrubbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg
Re: OSD memory leaks?
On 02/25/2013 01:21 AM, Sage Weil wrote: On Mon, 25 Feb 2013, S?bastien Han wrote: Hi Sage, Sorry it's a production system, so I can't test it. So at the end, you can't get anything out of the core dump? I saw a bunch of dup object anmes, which is what led us to the pg log theory. I can look a bit more carefully to confirm, but in the end it would be nice to see users scrubbing without leaking. This may be a bit moot because we want to allow trimming for other reasons, so those patches are being tested and working their way into master. We'll backport when things are solid. In the meantime, if someone has been able to reproduce this in a test environment, testing is obviously welcome :) I'll see what I can do later this week. I know of a cluster which has the same issues which is in semi-production as far as I know. Wido sage -- Regards, S?bastien Han. On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil s...@inktank.com wrote: On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). -- Regards, Sébastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase — this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
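[Editor's note] For anyone else who wants to poke at a core like this one, a minimal gdb sketch. The binary path and the matching debug-symbol package (ceph-dbg on Debian/Ubuntu) are assumptions about the install, not something stated in the thread.

# Install debug symbols first (package name assumed for Debian/Ubuntu):
#   apt-get install ceph-dbg

# Open the core against the exact ceph-osd binary that produced it, then dump
# a backtrace of every thread to see what the daemon was doing at that moment.
gdb /usr/bin/ceph-osd core-ceph-osd-11-0-0-20100-1361539008
(gdb) thread apply all bt
(gdb) quit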
Re: OSD memory leaks?
On Fri, 22 Feb 2013, S?bastien Han wrote: Hi all, I finally got a core dump. I did it with a kill -SEGV on the OSD process. https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008 Hope we will get something out of it :-). AHA! We have a theory. The pg log isnt trimmed during scrub (because teh old scrub code required that), but the new (deep) scrub can take a very long time, which means the pg log will eat ram in the meantime.. especially under high iops. Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see if that seems to work? Note that that patch shouldn't be run in a mixed argonaut+bobtail cluster, since it isn't properly checking if the scrub is class or chunky/deep. Thanks! sage -- Regards, S?bastien Han. On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum g...@inktank.com wrote: On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase ? this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase — this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes, with the memory profiler enabled an OSD might consume more memory, but this doesn't cause the memory leaks. Any ideas? Nothing to say about my scrubbing theory? Thanks! -- Regards, Sébastien Han. On Thu, Jan 10, 2013 at 10:44 PM, Gregory Farnum g...@inktank.com wrote: On Wed, Jan 9, 2013 at 10:09 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Just FYI, I also have growing memory on the OSDs, and I have the same logs: libceph: osd4 172.20.11.32:6801 socket closed in the RBD clients That message is not an error; it just happens if the RBD client doesn't talk to that OSD for a while. I believe its volume has been turned down quite a lot in the latest kernels/our git tree. -Greg
Re: OSD memory leaks?
On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han han.sebast...@gmail.com wrote: Is osd.1 using the heap profiler as well? Keep in mind that active use of the memory profiler will itself cause memory usage to increase — this sounds a bit like that to me since it's staying stable at a large but finite portion of total memory. Well, the memory consumption was already high before the profiler was started. So yes with the memory profiler enable an OSD might consume more memory but this doesn't cause the memory leaks. My concern is that maybe you saw a leak but when you restarted with the memory profiling you lost whatever conditions caused it. Any ideas? Nothing to say about my scrumbing theory? I like it, but Sam indicates that without some heap dumps which capture the actual leak then scrub is too large to effectively code review for leaks. :( -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD memory leaks?
On Wed, Jan 9, 2013 at 10:09 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Just FYI, I also have growing memory on the OSDs, and I have the same logs: libceph: osd4 172.20.11.32:6801 socket closed in the RBD clients That message is not an error; it just happens if the RBD client doesn't talk to that OSD for a while. I believe its volume has been turned down quite a lot in the latest kernels/our git tree. -Greg
Re: OSD memory leaks?
Thank you. I appreciate it! Dave Spano Optogenics Systems Administrator - Original Message - From: Sébastien Han han.sebast...@gmail.com To: Dave Spano dsp...@optogenics.com Cc: ceph-devel ceph-devel@vger.kernel.org, Samuel Just sam.j...@inktank.com Sent: Wednesday, January 9, 2013 5:12:12 PM Subject: Re: OSD memory leaks? Dave, I'll share my little script for now if you want it:

#!/bin/bash
# Restart any ceph-osd that is using 25% or more of system memory (column 4 of ps aux).
for i in $(ps aux | grep '[c]eph-osd' | awk '{print $4}')
do
    MEM_INTEGER=$(echo $i | cut -d '.' -f1)
    OSD=$(ps aux | grep '[c]eph-osd' | grep $i | awk '{print $13}')
    if [[ $MEM_INTEGER -ge 25 ]]; then
        service ceph restart osd.$OSD > /dev/null
        if [ $? -eq 0 ]; then
            logger -t ceph-memory-usage "The OSD number $OSD has been restarted since it was using $i % of the memory"
        else
            logger -t ceph-memory-usage "ERROR while restarting the OSD daemon"
        fi
    else
        logger -t ceph-memory-usage "The OSD number $OSD is only using $i % of the memory, doing nothing"
    fi
    logger -t ceph-memory-usage "Waiting 60 seconds before testing the next OSD..."
    sleep 60
done
logger -t ceph-memory-usage "Ceph state after memory check operation is: $(ceph health)"

Cron runs it every 10 minutes, every day, on each storage node ;-). Waiting for some Inktank guys now :-). -- Regards, Sébastien Han. On Wed, Jan 9, 2013 at 10:42 PM, Dave Spano dsp...@optogenics.com wrote: That's very good to know. I'll be restarting ceph-osd right now! Thanks for the heads up! Dave Spano Optogenics Systems Administrator - Original Message - From: Sébastien Han han.sebast...@gmail.com To: Dave Spano dsp...@optogenics.com Cc: ceph-devel ceph-devel@vger.kernel.org, Samuel Just sam.j...@inktank.com Sent: Wednesday, January 9, 2013 11:35:13 AM Subject: Re: OSD memory leaks? If you wait too long, the system will trigger the OOM killer :D, I already experienced that unfortunately... Sam? On Wed, Jan 9, 2013 at 5:10 PM, Dave Spano dsp...@optogenics.com wrote: OOM killer -- Regards, Sébastien Han.
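[Editor's note] The message above says the script runs from cron every 10 minutes on each storage node; a hypothetical crontab entry for that schedule, with the file and script paths being placeholders of my own, not from the thread:

# /etc/cron.d/ceph-osd-mem-check  (path and script location are placeholders)
*/10 * * * * root /usr/local/bin/ceph-osd-mem-check.sh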
Re: OSD memory leaks?
Awesome! What version are you running (ceph-osd -v, include the hash)? -Sam On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano dsp...@optogenics.com wrote: This failed the first time I sent it, so I'm resending in plain text. Dave Spano Optogenics Systems Administrator - Original Message - From: Dave Spano dsp...@optogenics.com To: Sébastien Han han.sebast...@gmail.com Cc: ceph-devel ceph-devel@vger.kernel.org, Samuel Just sam.j...@inktank.com Sent: Monday, January 7, 2013 12:40:06 PM Subject: Re: OSD memory leaks? Sam, Attached are some heaps that I collected today. 001 and 003 are just after I started the profiler; 011 is the most recent. If you need more, or anything different let me know. Already the OSD in question is at 38% memory usage. As mentioned by Sèbastien, restarting ceph-osd keeps things going. Not sure if this is helpful information, but out of the two OSDs that I have running, the first one (osd.0) is the one that develops this problem the quickest. osd.1 does have the same issue, it just takes much longer. Do the monitors hit the first osd in the list first, when there's activity? Dave Spano Optogenics Systems Administrator - Original Message - From: Sébastien Han han.sebast...@gmail.com To: Samuel Just sam.j...@inktank.com Cc: ceph-devel ceph-devel@vger.kernel.org Sent: Friday, January 4, 2013 10:20:58 AM Subject: Re: OSD memory leaks? Hi Sam, Thanks for your answer and sorry the late reply. Unfortunately I can't get something out from the profiler, actually I do but I guess it doesn't show what is supposed to show... I will keep on trying this. Anyway yesterday I just thought that the problem might be due to some over usage of some OSDs. I was thinking that the distribution of the primary OSD might be uneven, this could have explained that some memory leaks are more important with some servers. At the end, the repartition seems even but while looking at the pg dump I found something interesting in the scrub column, timestamps from the last scrubbing operation matched with times showed on the graph. After this, I made some calculation, I compared the total number of scrubbing operation with the time range where memory leaks occurred. First of all check my setup: root@c2-ceph-01 ~ # ceph osd tree dumped osdmap tree epoch 859 # id weight type name up/down reweight -1 12 pool default -3 12 rack lc2_rack33 -2 3 host c2-ceph-01 0 1 osd.0 up 1 1 1 osd.1 up 1 2 1 osd.2 up 1 -4 3 host c2-ceph-04 10 1 osd.10 up 1 11 1 osd.11 up 1 9 1 osd.9 up 1 -5 3 host c2-ceph-02 3 1 osd.3 up 1 4 1 osd.4 up 1 5 1 osd.5 up 1 -6 3 host c2-ceph-03 6 1 osd.6 up 1 7 1 osd.7 up 1 8 1 osd.8 up 1 And there are the results: * Ceph node 1 which has the most important memory leak performed 1608 in total and 1059 during the time range where memory leaks occured * Ceph node 2, 1168 in total and 776 during the time range where memory leaks occured * Ceph node 3, 940 in total and 94 during the time range where memory leaks occurred * Ceph node 4, 899 in total and 191 during the time range where memory leaks occurred I'm still not entirely sure that the scrub operation causes the leak but the only relevant relation that I found... Could it be that the scrubbing process doesn't release memory? Btw I was wondering, how ceph decides at what time it should run the scrubbing operation? 
I know that it's once a day and control by the following options OPTION(osd_scrub_min_interval, OPT_FLOAT, 300) OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24) But how ceph determined the time where the operation started, during cluster creation probably? I just checked the options that control OSD scrubbing and found that by default: OPTION(osd_max_scrubs, OPT_INT, 1) So that might explain why only one OSD uses a lot of memory. My dirty workaround at the moment is to performed a check of memory use by every OSD and restart it if it uses more than 25% of the total memory. Also note that on ceph 1, 3 and 4 it's always one OSD that uses a lot of memory, for ceph 2 only the mem usage is high but almost the same for all the OSD process. Thank you in advance. -- Regards, Sébastien Han. On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just sam.j...@inktank.com wrote: Sorry, it's been very busy. The next step would to try to get a heap dump. You can start a heap profile on osd N by: ceph osd tell N heap start_profiler and you can get it to dump the collected profile using ceph osd tell N heap dump. The dumps should show up in the osd log directory. Assuming the heap profiler is working correctly, you can look at the dump using pprof in google-perftools. On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han han.sebast...@gmail.com wrote: No more suggestions? :( -- Regards, Sébastien Han. On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han han.sebast...@gmail.com wrote: Nothing terrific
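[Editor's note] For reference, the scrub options quoted in this message map onto ceph.conf as below. The values simply restate the defaults Sébastien lists (300, 60*60*24, osd_max_scrubs = 1), so this block documents the knobs rather than recommending new settings; the pg dump command is a sketch for eyeballing the "scrub column" he correlated with the memory graphs.

# ceph.conf -- scrub knobs discussed above (illustrative, these are the defaults)
[osd]
    osd max scrubs = 1              # concurrent scrubs per OSD
    osd scrub min interval = 300
    osd scrub max interval = 86400  # 60*60*24, i.e. the once-a-day behaviour described

# Look at the header to find the scrub timestamp columns, then scan per-PG stamps:
ceph pg dump | head -2
ceph pg dump | grep '^[0-9]' | less -S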
Re: OSD memory leaks?
Hi Sam, Thanks for your answer and sorry for the late reply. Unfortunately I can't get anything out of the profiler; actually I do, but I guess it doesn't show what it's supposed to show... I will keep trying. Anyway, yesterday I thought that the problem might be due to over-usage of some OSDs. I was thinking that the distribution of the primary OSDs might be uneven, which could have explained why the memory leaks are worse on some servers. In the end the repartition seems even, but while looking at the pg dump I found something interesting in the scrub column: timestamps from the last scrubbing operation matched the times shown on the graph. After this, I made some calculations; I compared the total number of scrubbing operations with the time range where the memory leaks occurred. First of all, check my setup:

root@c2-ceph-01 ~ # ceph osd tree
dumped osdmap tree epoch 859
# id  weight  type name          up/down  reweight
-1    12      pool default
-3    12        rack lc2_rack33
-2    3           host c2-ceph-01
0     1             osd.0        up       1
1     1             osd.1        up       1
2     1             osd.2        up       1
-4    3           host c2-ceph-04
10    1             osd.10       up       1
11    1             osd.11       up       1
9     1             osd.9        up       1
-5    3           host c2-ceph-02
3     1             osd.3        up       1
4     1             osd.4        up       1
5     1             osd.5        up       1
-6    3           host c2-ceph-03
6     1             osd.6        up       1
7     1             osd.7        up       1
8     1             osd.8        up       1

And here are the results:
* Ceph node 1, which has the most important memory leak, performed 1608 scrubs in total and 1059 during the time range where the memory leaks occurred
* Ceph node 2: 1168 in total and 776 during the time range where the memory leaks occurred
* Ceph node 3: 940 in total and 94 during the time range where the memory leaks occurred
* Ceph node 4: 899 in total and 191 during the time range where the memory leaks occurred

I'm still not entirely sure that the scrub operation causes the leak, but it's the only relevant relation that I found... Could it be that the scrubbing process doesn't release memory? By the way, I was wondering how Ceph decides at what time it should run the scrubbing operation. I know that it's once a day and controlled by the following options:
OPTION(osd_scrub_min_interval, OPT_FLOAT, 300)
OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24)
But how does Ceph determine the time when the operation starts? During cluster creation, probably? I just checked the options that control OSD scrubbing and found that by default:
OPTION(osd_max_scrubs, OPT_INT, 1)
So that might explain why only one OSD uses a lot of memory. My dirty workaround at the moment is to perform a check of memory use by every OSD and restart it if it uses more than 25% of the total memory. Also note that on ceph 1, 3 and 4 it's always one OSD that uses a lot of memory; for ceph 2 the memory usage is high but almost the same for all the OSD processes. Thank you in advance. -- Regards, Sébastien Han. On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just sam.j...@inktank.com wrote: Sorry, it's been very busy. The next step would be to try to get a heap dump. You can start a heap profile on osd N by: ceph osd tell N heap start_profiler and you can get it to dump the collected profile using ceph osd tell N heap dump. The dumps should show up in the osd log directory. Assuming the heap profiler is working correctly, you can look at the dump using pprof in google-perftools. On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han han.sebast...@gmail.com wrote: No more suggestions? :( -- Regards, Sébastien Han. On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han han.sebast...@gmail.com wrote: Nothing terrific...
Kernel logs from my clients are full of libceph: osd4 172.20.11.32:6801 socket closed I saw this somewhere on the tracker. Does this harm? Thanks. -- Regards, Sébastien Han. On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just sam.j...@inktank.com wrote: What is the workload like? -Sam On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han han.sebast...@gmail.com wrote: Hi, No, I don't see nothing abnormal in the network stats. I don't see anything in the logs... :( The weird thing is that one node over 4 seems to take way more memory than the others... -- Regards, Sébastien Han. On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han han.sebast...@gmail.com wrote: Hi, No, I don't see nothing abnormal in the network stats. I don't see anything in the logs... :( The weird thing is that one node over 4 seems to take way more memory than the others... -- Regards, Sébastien Han. On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just sam.j...@inktank.com wrote: Are you having network hiccups? There was a bug noticed recently that could cause a memory leak if nodes are being marked up and down. -Sam On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han han.sebast...@gmail.com wrote: Hi guys, Today looking at my graphs I noticed that one over 4 ceph nodes used a lot of memory. It keeps growing and growing. See the graph attached to this