Re: OSD memory leaks?

2013-03-14 Thread Dave Spano
Sebastien, 

I just had to restart the OSD about 10 minutes ago, so it looks like all it did 
was slow down the process. 

Dave Spano 




- Original Message - 

From: "Sébastien Han"  
To: "Dave Spano"  
Cc: "Greg Farnum" , "ceph-devel" 
, "Sage Weil" , "Wido den 
Hollander" , "Sylvain Munaut" , 
"Samuel Just" , "Vladislav Gorbunov"  
Sent: Wednesday, March 13, 2013 3:59:03 PM 
Subject: Re: OSD memory leaks? 

Dave, 

Just to be sure, did the log max recent=1 _completely_ stop the 
memory leak or did it slow it down? 

Thanks! 
-- 
Regards, 
Sébastien Han. 


On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano  wrote: 
> Lol. I'm totally fine with that. My glance images pool isn't used too often. 
> I'm going to give that a try today and see what happens. 
> 
> I'm still crossing my fingers, but since I added log max recent=1 to 
> ceph.conf, I've been okay despite the improper pg_num, and a lot of 
> scrubbing/deep scrubbing yesterday. 
> 
> Dave Spano 
> 
> 
> 
> 
> - Original Message - 
> 
> From: "Greg Farnum"  
> To: "Dave Spano"  
> Cc: "ceph-devel" , "Sage Weil" 
> , "Wido den Hollander" , "Sylvain Munaut" 
> , "Samuel Just" , 
> "Vladislav Gorbunov" , "Sébastien Han" 
>  
> Sent: Tuesday, March 12, 2013 5:37:37 PM 
> Subject: Re: OSD memory leaks? 
> 
> Yeah. There's not anything intelligent about that cppool mechanism. :) 
> -Greg 
> 
> On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote: 
> 
>> I'd rather shut the cloud down and copy the pool to a new one than take any 
>> chances of corruption by using an experimental feature. My guess is that 
>> there cannot be any i/o to the pool while copying, otherwise you'll lose the 
>> changes that are happening during the copy, correct? 
>> 
>> Dave Spano 
>> Optogenics 
>> Systems Administrator 
>> 
>> 
>> 
>> - Original Message - 
>> 
>> From: "Greg Farnum" mailto:g...@inktank.com)> 
>> To: "Sébastien Han" > (mailto:han.sebast...@gmail.com)> 
>> Cc: "Dave Spano" mailto:dsp...@optogenics.com)>, 
>> "ceph-devel" > (mailto:ceph-devel@vger.kernel.org)>, "Sage Weil" > (mailto:s...@inktank.com)>, "Wido den Hollander" > (mailto:w...@42on.com)>, "Sylvain Munaut" > (mailto:s.mun...@whatever-company.com)>, "Samuel Just" > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov" > (mailto:vadi...@gmail.com)> 
>> Sent: Tuesday, March 12, 2013 4:20:13 PM 
>> Subject: Re: OSD memory leaks? 
>> 
>> On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: 
>> > Well to avoid un necessary data movement, there is also an 
>> > _experimental_ feature to change on fly the number of PGs in a pool. 
>> > 
>> > ceph osd pool set  pg_num  --allow-experimental-feature 
>> Don't do that. We've got a set of 3 patches which fix bugs we know about 
>> that aren't in bobtail yet, and I'm sure there's more we aren't aware of… 
>> -Greg 
>> 
>> Software Engineer #42 @ http://inktank.com | http://ceph.com 
>> 
>> > 
>> > Cheers! 
>> > -- 
>> > Regards, 
>> > Sébastien Han. 
>> > 
>> > 
>> > On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano > > (mailto:dsp...@optogenics.com)> wrote: 
>> > > Disregard my previous question. I found my answer in the post below. 
>> > > Absolutely brilliant! I thought I was screwed! 
>> > > 
>> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 
>> > > 
>> > > Dave Spano 
>> > > Optogenics 
>> > > Systems Administrator 
>> > > 
>> > > 
>> > > 
>> > > - Original Message - 
>> > > 
>> > > From: "Dave Spano" > > > (mailto:dsp...@optogenics.com)> 
>> > > To: "Sébastien Han" > > > (mailto:han.sebast...@gmail.com)> 
>> > > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
>> > > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
>> > > mailto:g...@inktank.com)>, "Sylvain Munaut" 
>> > > mailto:s.mun...@whatever-company.com)>, 
>> > > "ceph-devel" > > > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"

Re: OSD memory leaks?

2013-03-13 Thread Josh Durgin

On 03/13/2013 05:05 PM, Dave Spano wrote:

I renamed the old one from images to images-old, and the new one from 
images-new to images.


This reminds me of a problem you might hit with this:

RBD clones track the parent image pool by id, so they'll continue
working after the pool is renamed. If you have any clones of the
images-old pool, they'll stop working when that pool is deleted.

To get around this, you'll need to flatten any clones whose parents are
in images-old before deleting the images-old pool.
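
A minimal sketch of that check-and-flatten step (the image and snapshot names are placeholders, not taken from this thread):

# list clones whose parent snapshot lives in the old pool
rbd children images-old/<parent-image>@<snapshot>
# flatten a clone so it no longer references a parent in images-old
rbd flatten <pool>/<clone-image>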

Josh


Re: OSD memory leaks?

2013-03-13 Thread Dave Spano
I renamed the old one from images to images-old, and the new one from 
images-new to images. 
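
For reference, those renames map to the following commands (a sketch using the pool names given above):

ceph osd pool rename images images-old
ceph osd pool rename images-new images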

Dave Spano
Optogenics
Systems Administrator



- Original Message -
From: Greg Farnum <g...@inktank.com>
To: Dave Spano <dsp...@optogenics.com>
Cc: Sébastien Han <han.sebast...@gmail.com>, ceph-devel 
<ceph-devel@vger.kernel.org>, Sage Weil <s...@inktank.com>, Wido 
den Hollander <w...@42on.com>, Sylvain Munaut 
<s.mun...@whatever-company.com>, Samuel Just 
<sam.j...@inktank.com>, Vladislav Gorbunov <vadi...@gmail.com>
Sent: Wed, 13 Mar 2013 18:52:29 -0400 (EDT)
Subject: Re: OSD memory leaks?

It sounds like maybe you didn't rename the new pool to use the old pool's name? 
Glance is looking for a specific pool to store its data in; I believe it's 
configurable but you'll need to do one or the other.
-Greg

On Wednesday, March 13, 2013 at 3:38 PM, Dave Spano wrote:

> Sebastien,
> 
> I'm not totally sure yet, but everything is still working. 
> 
> 
> Sage and Greg, 
> I copied my glance image pool per the posting I mentioned previously, and 
everything works when I use the ceph tools. I can export rbds from the new pool 
and delete them as well.
> 
> I noticed that the copied images pool does not work with glance. 
> 
> I get this error when I try to create images in the new pool. If I put the 
old pool back, I can create images no problem. 
> 
> Is there something I'm missing in glance that I need to work with a pool 
created in bobtail? I'm using Openstack Folsom. 
> 
> File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 437, 
in _upload 
> image_meta['size']) 
> File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 244, in 
add 
> image_size, order) 
> File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 207, in 
_create_image 
> features=rbd.RBD_FEATURE_LAYERING) 
> File "/usr/lib/python2.7/dist-packages/rbd.py", line 194, in create 
> raise make_ex(ret, 'error creating image') 
> PermissionError: error creating image
> 
> 
> Dave Spano 
> 
> 
> 
> 
> - Original Message - 
> 
> From: "Sébastien Han" <han.sebast...@gmail.com 
(mailto:han.sebast...@gmail.com)> 
> To: "Dave Spano" <dsp...@optogenics.com 
(mailto:dsp...@optogenics.com)> 
> Cc: "Greg Farnum" <g...@inktank.com (mailto:g...@inktank.com)>, 
"ceph-devel" <ceph-devel@vger.kernel.org 
(mailto:ceph-devel@vger.kernel.org)>, "Sage Weil" <s...@inktank.com 
(mailto:s...@inktank.com)>, "Wido den Hollander" <w...@42on.com 
(mailto:w...@42on.com)>, "Sylvain Munaut" <s.mun...@whatever-company.com 
(mailto:s.mun...@whatever-company.com)>, "Samuel Just" 
<sam.j...@inktank.com (mailto:sam.j...@inktank.com)>, "Vladislav 
Gorbunov" <vadi...@gmail.com (mailto:vadi...@gmail.com)> 
> Sent: Wednesday, March 13, 2013 3:59:03 PM 
> Subject: Re: OSD memory leaks? 
> 
> Dave, 
> 
> Just to be sure, did the log max recent=1 _completely_ stop the 
> memory leak or did it slow it down? 
> 
> Thanks! 
> -- 
> Regards, 
> Sébastien Han. 
> 
> 
> On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano <dsp...@optogenics.com 
(mailto:dsp...@optogenics.com)> wrote: 
> > Lol. I'm totally fine with that. My glance images pool isn't used too 
often. I'm going to give that a try today and see what happens. 
> > 
> > I'm still crossing my fingers, but since I added log max recent=1 
to ceph.conf, I've been okay despite the improper pg_num, and a lot of 
scrubbing/deep scrubbing yesterday. 
> > 
> > Dave Spano 
> > 
> > 
> > 
> > 
> > - Original Message - 
> > 
> > From: "Greg Farnum" <g...@inktank.com 
(mailto:g...@inktank.com)> 
> > To: "Dave Spano" <dsp...@optogenics.com 
(mailto:dsp...@optogenics.com)> 
> > Cc: "ceph-devel" <ceph-devel@vger.kernel.org 
(mailto:ceph-devel@vger.kernel.org)>, "Sage Weil" <s...@inktank.com 
(mailto:s...@inktank.com)>, "Wido den Hollander" <w...@42on.com 
(mailto:w...@42on.com)>, "Sylvain Munaut" <s.mun...@whatever-company.com 
(mailto:s.mun...@whatever-company.com)>, "Samuel Just" 
<sam.j...@inktank.com (mailto:sam.j...@inktank.com)>, "Vladislav 
Gorbunov" <vadi...@gmail.com (mailto:vadi...@gmail.com)>, "Sébastien Han" 
<han.sebast...@gmail.com (mailto:han.sebast...@gmail.com)> 
> > Sent: Tuesday, March 12, 2013 5:37:37 PM 
> > Subject: Re: OSD memory leaks? 
> > 
> > Yeah. There

Re: OSD memory leaks?

2013-03-13 Thread Greg Farnum
It sounds like maybe you didn't rename the new pool to use the old pool's name? 
Glance is looking for a specific pool to store its data in; I believe it's 
configurable but you'll need to do one or the other.
-Greg

On Wednesday, March 13, 2013 at 3:38 PM, Dave Spano wrote:

> Sebastien,
>  
> I'm not totally sure yet, but everything is still working.  
>  
>  
> Sage and Greg,  
> I copied my glance image pool per the posting I mentioned previously, and 
> everything works when I use the ceph tools. I can export rbds from the new 
> pool and delete them as well.
>  
> I noticed that the copied images pool does not work with glance.  
>  
> I get this error when I try to create images in the new pool. If I put the 
> old pool back, I can create images no problem.  
>  
> Is there something I'm missing in glance that I need to work with a pool 
> created in bobtail? I'm using Openstack Folsom.  
>  
> File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 437, in 
> _upload  
> image_meta['size'])  
> File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 244, in add 
>  
> image_size, order)  
> File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 207, in 
> _create_image  
> features=rbd.RBD_FEATURE_LAYERING)  
> File "/usr/lib/python2.7/dist-packages/rbd.py", line 194, in create  
> raise make_ex(ret, 'error creating image')  
> PermissionError: error creating image
>  
>  
> Dave Spano  
>  
>  
>  
>  
> - Original Message -  
>  
> From: "Sébastien Han"  (mailto:han.sebast...@gmail.com)>  
> To: "Dave Spano" mailto:dsp...@optogenics.com)>  
> Cc: "Greg Farnum" mailto:g...@inktank.com)>, "ceph-devel" 
> mailto:ceph-devel@vger.kernel.org)>, "Sage Weil" 
> mailto:s...@inktank.com)>, "Wido den Hollander" 
> mailto:w...@42on.com)>, "Sylvain Munaut" 
> mailto:s.mun...@whatever-company.com)>, 
> "Samuel Just" mailto:sam.j...@inktank.com)>, 
> "Vladislav Gorbunov" mailto:vadi...@gmail.com)>  
> Sent: Wednesday, March 13, 2013 3:59:03 PM  
> Subject: Re: OSD memory leaks?  
>  
> Dave,  
>  
> Just to be sure, did the log max recent=1 _completely_ stop the  
> memory leak or did it slow it down?  
>  
> Thanks!  
> --  
> Regards,  
> Sébastien Han.  
>  
>  
> On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano  (mailto:dsp...@optogenics.com)> wrote:  
> > Lol. I'm totally fine with that. My glance images pool isn't used too 
> > often. I'm going to give that a try today and see what happens.  
> >  
> > I'm still crossing my fingers, but since I added log max recent=1 to 
> > ceph.conf, I've been okay despite the improper pg_num, and a lot of 
> > scrubbing/deep scrubbing yesterday.  
> >  
> > Dave Spano  
> >  
> >  
> >  
> >  
> > - Original Message -  
> >  
> > From: "Greg Farnum" mailto:g...@inktank.com)>  
> > To: "Dave Spano" mailto:dsp...@optogenics.com)>  
> > Cc: "ceph-devel"  > (mailto:ceph-devel@vger.kernel.org)>, "Sage Weil"  > (mailto:s...@inktank.com)>, "Wido den Hollander"  > (mailto:w...@42on.com)>, "Sylvain Munaut"  > (mailto:s.mun...@whatever-company.com)>, "Samuel Just" 
> > mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov" 
> > mailto:vadi...@gmail.com)>, "Sébastien Han" 
> > mailto:han.sebast...@gmail.com)>  
> > Sent: Tuesday, March 12, 2013 5:37:37 PM  
> > Subject: Re: OSD memory leaks?  
> >  
> > Yeah. There's not anything intelligent about that cppool mechanism. :)  
> > -Greg  
> >  
> > On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote:  
> >  
> > > I'd rather shut the cloud down and copy the pool to a new one than take 
> > > any chances of corruption by using an experimental feature. My guess is 
> > > that there cannot be any i/o to the pool while copying, otherwise you'll 
> > > lose the changes that are happening during the copy, correct?  
> > >  
> > > Dave Spano  
> > > Optogenics  
> > > Systems Administrator  
> > >  
> > >  
> > >  
> > > - Original Message -  
> > >  
> > > From: "Greg Farnum" mailto:g...@inktank.com)>  
> > > To: "Sébastien Han"  > > (mailto:han.sebast...@gmail.com)>  
> > >

Re: OSD memory leaks?

2013-03-13 Thread Dave Spano
Sebastien,

I'm not totally sure yet, but everything is still working. 


Sage and Greg, 
I copied my glance image pool per the posting I mentioned previously, and 
everything works when I use the ceph tools. I can export rbds from the new pool 
and delete them as well.

I noticed that the copied images pool does not work with glance. 

I get this error when I try to create images in the new pool. If I put the old 
pool back, I can create images no problem. 

Is there something I'm missing in glance that I need to work with a pool 
created in bobtail? I'm using Openstack Folsom. 

  File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 437, in 
_upload 
image_meta['size']) 
 
  File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 244, in add 
 
image_size, order)  
 
  File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 207, in 
_create_image
features=rbd.RBD_FEATURE_LAYERING)  
 
  File "/usr/lib/python2.7/dist-packages/rbd.py", line 194, in create   
 
raise make_ex(ret, 'error creating image')  
 
PermissionError: error creating image
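
One possible cause, offered only as an assumption (nothing in this thread confirms it): if the Glance client's cephx key is capped to a specific pool name such as images, a pool still carrying a different name (e.g. images-new) is not writable by that client until either the caps are widened or the pool is renamed to the name the caps already allow, which is what the rename discussed elsewhere in the thread would accomplish. A sketch of checking and adjusting that, assuming a client named client.glance (a hypothetical name) and a release that supports these commands:

ceph auth list    # inspect which pools the client's osd caps allow
ceph auth caps client.glance mon 'allow r' osd 'allow rwx pool=images'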


Dave Spano 
 



- Original Message - 

From: "Sébastien Han"  
To: "Dave Spano"  
Cc: "Greg Farnum" , "ceph-devel" 
, "Sage Weil" , "Wido den 
Hollander" , "Sylvain Munaut" , 
"Samuel Just" , "Vladislav Gorbunov"  
Sent: Wednesday, March 13, 2013 3:59:03 PM 
Subject: Re: OSD memory leaks? 

Dave, 

Just to be sure, did the log max recent=1 _completely_ stop the 
memory leak or did it slow it down? 

Thanks! 
-- 
Regards, 
Sébastien Han. 


On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano  wrote: 
> Lol. I'm totally fine with that. My glance images pool isn't used too often. 
> I'm going to give that a try today and see what happens. 
> 
> I'm still crossing my fingers, but since I added log max recent=1 to 
> ceph.conf, I've been okay despite the improper pg_num, and a lot of 
> scrubbing/deep scrubbing yesterday. 
> 
> Dave Spano 
> 
> 
> 
> 
> - Original Message - 
> 
> From: "Greg Farnum"  
> To: "Dave Spano"  
> Cc: "ceph-devel" , "Sage Weil" 
> , "Wido den Hollander" , "Sylvain Munaut" 
> , "Samuel Just" , 
> "Vladislav Gorbunov" , "Sébastien Han" 
>  
> Sent: Tuesday, March 12, 2013 5:37:37 PM 
> Subject: Re: OSD memory leaks? 
> 
> Yeah. There's not anything intelligent about that cppool mechanism. :) 
> -Greg 
> 
> On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote: 
> 
>> I'd rather shut the cloud down and copy the pool to a new one than take any 
>> chances of corruption by using an experimental feature. My guess is that 
>> there cannot be any i/o to the pool while copying, otherwise you'll lose the 
>> changes that are happening during the copy, correct? 
>> 
>> Dave Spano 
>> Optogenics 
>> Systems Administrator 
>> 
>> 
>> 
>> - Original Message - 
>> 
>> From: "Greg Farnum" mailto:g...@inktank.com)> 
>> To: "Sébastien Han" > (mailto:han.sebast...@gmail.com)> 
>> Cc: "Dave Spano" mailto:dsp...@optogenics.com)>, 
>> "ceph-devel" > (mailto:ceph-devel@vger.kernel.org)>, "Sage Weil" > (mailto:s...@inktank.com)>, "Wido den Hollander" > (mailto:w...@42on.com)>, "Sylvain Munaut" > (mailto:s.mun...@whatever-company.com)>, "Samuel Just" > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov" > (mailto:vadi...@gmail.com)> 
>> Sent: Tuesday, March 12, 2013 4:20:13 PM 
>> Subject: Re: OSD memory leaks? 
>> 
>> On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: 
>> > Well to avoid un necessary data movement, there is also an 
>> > _experimental_ feature to change on fly the number of PGs in a pool. 
>> > 
>> > ceph osd pool set  pg_num  --allow-experimental-feature 
>> Don't do that. We've got a set of 3 patches which fix bugs we know about 
>> that aren't in bobtail yet, and I'm sure there's more we aren't aware of… 
>> -Greg 
>> 
>> Software Engineer #42 @ http://inktank.com | h

Re: OSD memory leaks?

2013-03-13 Thread Sébastien Han
Dave,

Just to be sure, did the log max recent=1 _completely_ stop the
memory leak or did it slow it down?

Thanks!
--
Regards,
Sébastien Han.


On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano  wrote:
> Lol. I'm totally fine with that. My glance images pool isn't used too often. 
> I'm going to give that a try today and see what happens.
>
> I'm still crossing my fingers, but since I added log max recent=1 to 
> ceph.conf, I've been okay despite the improper pg_num, and a lot of 
> scrubbing/deep scrubbing yesterday.
>
> Dave Spano
>
>
>
>
> - Original Message -
>
> From: "Greg Farnum" 
> To: "Dave Spano" 
> Cc: "ceph-devel" , "Sage Weil" 
> , "Wido den Hollander" , "Sylvain Munaut" 
> , "Samuel Just" , 
> "Vladislav Gorbunov" , "Sébastien Han" 
> 
> Sent: Tuesday, March 12, 2013 5:37:37 PM
> Subject: Re: OSD memory leaks?
>
> Yeah. There's not anything intelligent about that cppool mechanism. :)
> -Greg
>
> On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote:
>
>> I'd rather shut the cloud down and copy the pool to a new one than take any 
>> chances of corruption by using an experimental feature. My guess is that 
>> there cannot be any i/o to the pool while copying, otherwise you'll lose the 
>> changes that are happening during the copy, correct?
>>
>> Dave Spano
>> Optogenics
>> Systems Administrator
>>
>>
>>
>> - Original Message -
>>
>> From: "Greg Farnum" mailto:g...@inktank.com)>
>> To: "Sébastien Han" > (mailto:han.sebast...@gmail.com)>
>> Cc: "Dave Spano" mailto:dsp...@optogenics.com)>, 
>> "ceph-devel" > (mailto:ceph-devel@vger.kernel.org)>, "Sage Weil" > (mailto:s...@inktank.com)>, "Wido den Hollander" > (mailto:w...@42on.com)>, "Sylvain Munaut" > (mailto:s.mun...@whatever-company.com)>, "Samuel Just" > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov" > (mailto:vadi...@gmail.com)>
>> Sent: Tuesday, March 12, 2013 4:20:13 PM
>> Subject: Re: OSD memory leaks?
>>
>> On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote:
>> > Well to avoid un necessary data movement, there is also an
>> > _experimental_ feature to change on fly the number of PGs in a pool.
>> >
>> > ceph osd pool set  pg_num  --allow-experimental-feature
>> Don't do that. We've got a set of 3 patches which fix bugs we know about 
>> that aren't in bobtail yet, and I'm sure there's more we aren't aware of…
>> -Greg
>>
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> >
>> > Cheers!
>> > --
>> > Regards,
>> > Sébastien Han.
>> >
>> >
>> > On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano > > (mailto:dsp...@optogenics.com)> wrote:
>> > > Disregard my previous question. I found my answer in the post below. 
>> > > Absolutely brilliant! I thought I was screwed!
>> > >
>> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924
>> > >
>> > > Dave Spano
>> > > Optogenics
>> > > Systems Administrator
>> > >
>> > >
>> > >
>> > > - Original Message -
>> > >
>> > > From: "Dave Spano" mailto:dsp...@optogenics.com)>
>> > > To: "Sébastien Han" > > > (mailto:han.sebast...@gmail.com)>
>> > > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
>> > > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
>> > > mailto:g...@inktank.com)>, "Sylvain Munaut" 
>> > > mailto:s.mun...@whatever-company.com)>, 
>> > > "ceph-devel" > > > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just" 
>> > > mailto:sam.j...@inktank.com)>, "Vladislav 
>> > > Gorbunov" mailto:vadi...@gmail.com)>
>> > > Sent: Tuesday, March 12, 2013 1:41:21 PM
>> > > Subject: Re: OSD memory leaks?
>> > >
>> > >
>> > > If one were stupid enough to have their pg_num and pgp_num set to 8 on 
>> > > two of their pools, how could you fix that?
>> > >
>> > >
>> > > Dave Spano
>> > >
>> > >
>> > >
>> > > - Original Message 

Re: OSD memory leaks?

2013-03-13 Thread Dave Spano
Lol. I'm totally fine with that. My glance images pool isn't used too often. 
I'm going to give that a try today and see what happens. 

I'm still crossing my fingers, but since I added log max recent=1 to 
ceph.conf, I've been okay despite the improper pg_num, and a lot of 
scrubbing/deep scrubbing yesterday. 
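
For reference, a minimal sketch of that ceph.conf change (which section it was placed in is an assumption; the message only says it was added to ceph.conf):

[global]
log max recent = 1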

Dave Spano 




- Original Message - 

From: "Greg Farnum"  
To: "Dave Spano"  
Cc: "ceph-devel" , "Sage Weil" , 
"Wido den Hollander" , "Sylvain Munaut" 
, "Samuel Just" , 
"Vladislav Gorbunov" , "Sébastien Han" 
 
Sent: Tuesday, March 12, 2013 5:37:37 PM 
Subject: Re: OSD memory leaks? 

Yeah. There's not anything intelligent about that cppool mechanism. :) 
-Greg 

On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote: 

> I'd rather shut the cloud down and copy the pool to a new one than take any 
> chances of corruption by using an experimental feature. My guess is that 
> there cannot be any i/o to the pool while copying, otherwise you'll lose the 
> changes that are happening during the copy, correct? 
> 
> Dave Spano 
> Optogenics 
> Systems Administrator 
> 
> 
> 
> - Original Message - 
> 
> From: "Greg Farnum" mailto:g...@inktank.com)> 
> To: "Sébastien Han"  (mailto:han.sebast...@gmail.com)> 
> Cc: "Dave Spano" mailto:dsp...@optogenics.com)>, 
> "ceph-devel"  (mailto:ceph-devel@vger.kernel.org)>, "Sage Weil"  (mailto:s...@inktank.com)>, "Wido den Hollander"  (mailto:w...@42on.com)>, "Sylvain Munaut"  (mailto:s.mun...@whatever-company.com)>, "Samuel Just"  (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov"  (mailto:vadi...@gmail.com)> 
> Sent: Tuesday, March 12, 2013 4:20:13 PM 
> Subject: Re: OSD memory leaks? 
> 
> On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: 
> > Well to avoid un necessary data movement, there is also an 
> > _experimental_ feature to change on fly the number of PGs in a pool. 
> > 
> > ceph osd pool set  pg_num  --allow-experimental-feature 
> Don't do that. We've got a set of 3 patches which fix bugs we know about that 
> aren't in bobtail yet, and I'm sure there's more we aren't aware of… 
> -Greg 
> 
> Software Engineer #42 @ http://inktank.com | http://ceph.com 
> 
> > 
> > Cheers! 
> > -- 
> > Regards, 
> > Sébastien Han. 
> > 
> > 
> > On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano  > (mailto:dsp...@optogenics.com)> wrote: 
> > > Disregard my previous question. I found my answer in the post below. 
> > > Absolutely brilliant! I thought I was screwed! 
> > > 
> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 
> > > 
> > > Dave Spano 
> > > Optogenics 
> > > Systems Administrator 
> > > 
> > > 
> > > 
> > > - Original Message - 
> > > 
> > > From: "Dave Spano" mailto:dsp...@optogenics.com)> 
> > > To: "Sébastien Han"  > > (mailto:han.sebast...@gmail.com)> 
> > > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > > mailto:s.mun...@whatever-company.com)>, 
> > > "ceph-devel"  > > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov"  > > (mailto:vadi...@gmail.com)> 
> > > Sent: Tuesday, March 12, 2013 1:41:21 PM 
> > > Subject: Re: OSD memory leaks? 
> > > 
> > > 
> > > If one were stupid enough to have their pg_num and pgp_num set to 8 on 
> > > two of their pools, how could you fix that? 
> > > 
> > > 
> > > Dave Spano 
> > > 
> > > 
> > > 
> > > - Original Message - 
> > > 
> > > From: "Sébastien Han"  > > (mailto:han.sebast...@gmail.com)> 
> > > To: "Vladislav Gorbunov" mailto:vadi...@gmail.com)> 
> > > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > > mailto:s.mun...@whatever-company.com)>, 
> > > "Dave Spano" mailto:dsp...@optogenics.com)>, 
> >

Re: OSD memory leaks?

2013-03-12 Thread Greg Farnum
Yeah. There's not anything intelligent about that cppool mechanism. :)
-Greg

On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote:

> I'd rather shut the cloud down and copy the pool to a new one than take any 
> chances of corruption by using an experimental feature. My guess is that 
> there cannot be any i/o to the pool while copying, otherwise you'll lose the 
> changes that are happening during the copy, correct?  
>  
> Dave Spano  
> Optogenics  
> Systems Administrator  
>  
>  
>  
> - Original Message -  
>  
> From: "Greg Farnum" mailto:g...@inktank.com)>  
> To: "Sébastien Han"  (mailto:han.sebast...@gmail.com)>  
> Cc: "Dave Spano" mailto:dsp...@optogenics.com)>, 
> "ceph-devel"  (mailto:ceph-devel@vger.kernel.org)>, "Sage Weil"  (mailto:s...@inktank.com)>, "Wido den Hollander"  (mailto:w...@42on.com)>, "Sylvain Munaut"  (mailto:s.mun...@whatever-company.com)>, "Samuel Just"  (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov"  (mailto:vadi...@gmail.com)>  
> Sent: Tuesday, March 12, 2013 4:20:13 PM  
> Subject: Re: OSD memory leaks?  
>  
> On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote:  
> > Well to avoid un necessary data movement, there is also an  
> > _experimental_ feature to change on fly the number of PGs in a pool.  
> >  
> > ceph osd pool set  pg_num  --allow-experimental-feature  
> Don't do that. We've got a set of 3 patches which fix bugs we know about that 
> aren't in bobtail yet, and I'm sure there's more we aren't aware of…  
> -Greg  
>  
> Software Engineer #42 @ http://inktank.com | http://ceph.com  
>  
> >  
> > Cheers!  
> > --  
> > Regards,  
> > Sébastien Han.  
> >  
> >  
> > On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano  > (mailto:dsp...@optogenics.com)> wrote:  
> > > Disregard my previous question. I found my answer in the post below. 
> > > Absolutely brilliant! I thought I was screwed!  
> > >  
> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924  
> > >  
> > > Dave Spano  
> > > Optogenics  
> > > Systems Administrator  
> > >  
> > >  
> > >  
> > > - Original Message -  
> > >  
> > > From: "Dave Spano" mailto:dsp...@optogenics.com)> 
> > >  
> > > To: "Sébastien Han"  > > (mailto:han.sebast...@gmail.com)>  
> > > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > > mailto:s.mun...@whatever-company.com)>, 
> > > "ceph-devel"  > > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov"  > > (mailto:vadi...@gmail.com)>  
> > > Sent: Tuesday, March 12, 2013 1:41:21 PM  
> > > Subject: Re: OSD memory leaks?  
> > >  
> > >  
> > > If one were stupid enough to have their pg_num and pgp_num set to 8 on 
> > > two of their pools, how could you fix that?  
> > >  
> > >  
> > > Dave Spano  
> > >  
> > >  
> > >  
> > > - Original Message -  
> > >  
> > > From: "Sébastien Han"  > > (mailto:han.sebast...@gmail.com)>  
> > > To: "Vladislav Gorbunov" mailto:vadi...@gmail.com)>  
> > > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > > mailto:s.mun...@whatever-company.com)>, 
> > > "Dave Spano" mailto:dsp...@optogenics.com)>, 
> > > "ceph-devel"  > > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > > (mailto:sam.j...@inktank.com)>  
> > > Sent: Tuesday, March 12, 2013 9:43:44 AM  
> > > Subject: Re: OSD memory leaks?  
> > >  
> > > > Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd  
> > > > dump | grep 'rep size'"  
> > >  
> > >  
> > >  
> > >  
> > >  
> > > Well it's still 450 each...  
> > >  
> > > > The default pg_num value 8 is NOT suitable for big cluster.  
> &g

Re: OSD memory leaks?

2013-03-12 Thread Dave Spano
I'd rather shut the cloud down and copy the pool to a new one than take any 
chances of corruption by using an experimental feature. My guess is that there 
cannot be any i/o to the pool while copying, otherwise you'll lose the changes 
that are happening during the copy, correct? 
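
One way to do such a pool copy (a sketch only, assuming, as above, that client I/O to the pool is stopped first; the PG counts are placeholders):

ceph osd pool create images-new <pg_num> <pgp_num>
rados cppool images images-new
# the pools can then be renamed so clients keep using the original name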

Dave Spano 
Optogenics 
Systems Administrator 



- Original Message - 

From: "Greg Farnum"  
To: "Sébastien Han"  
Cc: "Dave Spano" , "ceph-devel" 
, "Sage Weil" , "Wido den 
Hollander" , "Sylvain Munaut" , 
"Samuel Just" , "Vladislav Gorbunov"  
Sent: Tuesday, March 12, 2013 4:20:13 PM 
Subject: Re: OSD memory leaks? 

On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: 
> Well to avoid un necessary data movement, there is also an 
> _experimental_ feature to change on fly the number of PGs in a pool. 
> 
> ceph osd pool set  pg_num  --allow-experimental-feature 
Don't do that. We've got a set of 3 patches which fix bugs we know about that 
aren't in bobtail yet, and I'm sure there's more we aren't aware of… 
-Greg 

Software Engineer #42 @ http://inktank.com | http://ceph.com 

> 
> Cheers! 
> -- 
> Regards, 
> Sébastien Han. 
> 
> 
> On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano  (mailto:dsp...@optogenics.com)> wrote: 
> > Disregard my previous question. I found my answer in the post below. 
> > Absolutely brilliant! I thought I was screwed! 
> > 
> > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 
> > 
> > Dave Spano 
> > Optogenics 
> > Systems Administrator 
> > 
> > 
> > 
> > - Original Message - 
> > 
> > From: "Dave Spano" mailto:dsp...@optogenics.com)> 
> > To: "Sébastien Han"  > (mailto:han.sebast...@gmail.com)> 
> > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > mailto:s.mun...@whatever-company.com)>, 
> > "ceph-devel"  > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov"  > (mailto:vadi...@gmail.com)> 
> > Sent: Tuesday, March 12, 2013 1:41:21 PM 
> > Subject: Re: OSD memory leaks? 
> > 
> > 
> > If one were stupid enough to have their pg_num and pgp_num set to 8 on two 
> > of their pools, how could you fix that? 
> > 
> > 
> > Dave Spano 
> > 
> > 
> > 
> > - Original Message - 
> > 
> > From: "Sébastien Han"  > (mailto:han.sebast...@gmail.com)> 
> > To: "Vladislav Gorbunov" mailto:vadi...@gmail.com)> 
> > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > mailto:s.mun...@whatever-company.com)>, 
> > "Dave Spano" mailto:dsp...@optogenics.com)>, 
> > "ceph-devel"  > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > (mailto:sam.j...@inktank.com)> 
> > Sent: Tuesday, March 12, 2013 9:43:44 AM 
> > Subject: Re: OSD memory leaks? 
> > 
> > > Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd 
> > > dump | grep 'rep size'" 
> > 
> > 
> > 
> > Well it's still 450 each... 
> > 
> > > The default pg_num value 8 is NOT suitable for big cluster. 
> > 
> > Thanks I know, I'm not new with Ceph. What's your point here? I 
> > already said that pg_num was 450... 
> > -- 
> > Regards, 
> > Sébastien Han. 
> > 
> > 
> > On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov  > (mailto:vadi...@gmail.com)> wrote: 
> > > Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd 
> > > dump | grep 'rep size'" 
> > > The default pg_num value 8 is NOT suitable for big cluster. 
> > > 
> > > 2013/3/13 Sébastien Han  > > (mailto:han.sebast...@gmail.com)>: 
> > > > Replica count has been set to 2. 
> > > > 
> > > > Why? 
> > > > -- 
> > > > Regards, 
> > > > Sébastien Han. 
> > > > 
> > > > 
> > > > On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  > > > (mailto:vadi...@gmail.com)> wrote: 
> > > > > >

Re: OSD memory leaks?

2013-03-12 Thread Bryan K. Wright

han.sebast...@gmail.com said:
> Well to avoid unnecessary data movement, there is also an _experimental_
> feature to change the number of PGs in a pool on the fly.
> ceph osd pool set <pool> pg_num <num> --allow-experimental-feature 

I've been following the instructions here:

http://ceph.com/docs/master/rados/configuration/osd-config-ref/

under "data placement", trying to set the number of pgs in ceph.conf.
I've added these lines in the "global" section:

osd pool default pg num = 500
osd pool default pgp num = 500

but they don't seem to have any effect on how mkcephfs behaves.
Before I added these lines, mkcephfs created a data pool with
3904 pgs.  After wiping everything, adding the lines and 
re-creating the pool, it still ends up with 3904 pgs.  What
am I doing wrong?
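
A hedged aside, not an answer given in this thread: the initial pools created at mkcephfs time may be sized from osd pg bits / osd pgp bits (the PG count scales with the number of OSDs) rather than from osd pool default pg num, which at least on some releases only applies to pools created afterwards. If that is the case, the knobs to experiment with would look like:

osd pg bits = 6     # example value only; initial pools get roughly (number of OSDs) << bits PGs
osd pgp bits = 6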

Thanks,
Bryan
-- 

Bryan Wright  |"If you take cranberries and stew them like 
Physics Department| applesauce, they taste much more like prunes
University of Virginia| than rhubarb does."  --  Groucho 
Charlottesville, VA  22901| 
(434) 924-7218| br...@virginia.edu





Re: OSD memory leaks?

2013-03-12 Thread Greg Farnum
On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote:
> Well to avoid unnecessary data movement, there is also an
> _experimental_ feature to change the number of PGs in a pool on the fly.
>  
> ceph osd pool set <pool> pg_num <num> --allow-experimental-feature
Don't do that. We've got a set of 3 patches which fix bugs we know about that 
aren't in bobtail yet, and I'm sure there's more we aren't aware of…
-Greg

Software Engineer #42 @ http://inktank.com | http://ceph.com  

>  
> Cheers!
> --
> Regards,
> Sébastien Han.
>  
>  
> On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano  (mailto:dsp...@optogenics.com)> wrote:
> > Disregard my previous question. I found my answer in the post below. 
> > Absolutely brilliant! I thought I was screwed!
> >  
> > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924
> >  
> > Dave Spano
> > Optogenics
> > Systems Administrator
> >  
> >  
> >  
> > - Original Message -
> >  
> > From: "Dave Spano" mailto:dsp...@optogenics.com)>
> > To: "Sébastien Han"  > (mailto:han.sebast...@gmail.com)>
> > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > mailto:s.mun...@whatever-company.com)>, 
> > "ceph-devel"  > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov"  > (mailto:vadi...@gmail.com)>
> > Sent: Tuesday, March 12, 2013 1:41:21 PM
> > Subject: Re: OSD memory leaks?
> >  
> >  
> > If one were stupid enough to have their pg_num and pgp_num set to 8 on two 
> > of their pools, how could you fix that?
> >  
> >  
> > Dave Spano
> >  
> >  
> >  
> > - Original Message -
> >  
> > From: "Sébastien Han"  > (mailto:han.sebast...@gmail.com)>
> > To: "Vladislav Gorbunov" mailto:vadi...@gmail.com)>
> > Cc: "Sage Weil" mailto:s...@inktank.com)>, "Wido den 
> > Hollander" mailto:w...@42on.com)>, "Gregory Farnum" 
> > mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > mailto:s.mun...@whatever-company.com)>, 
> > "Dave Spano" mailto:dsp...@optogenics.com)>, 
> > "ceph-devel"  > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just"  > (mailto:sam.j...@inktank.com)>
> > Sent: Tuesday, March 12, 2013 9:43:44 AM
> > Subject: Re: OSD memory leaks?
> >  
> > > Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
> > > dump | grep 'rep size'"
> >  
> >  
> >  
> > Well it's still 450 each...
> >  
> > > The default pg_num value 8 is NOT suitable for big cluster.
> >  
> > Thanks I know, I'm not new with Ceph. What's your point here? I
> > already said that pg_num was 450...
> > --
> > Regards,
> > Sébastien Han.
> >  
> >  
> > On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov  > (mailto:vadi...@gmail.com)> wrote:
> > > Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
> > > dump | grep 'rep size'"
> > > The default pg_num value 8 is NOT suitable for big cluster.
> > >  
> > > 2013/3/13 Sébastien Han  > > (mailto:han.sebast...@gmail.com)>:
> > > > Replica count has been set to 2.
> > > >  
> > > > Why?
> > > > --
> > > > Regards,
> > > > Sébastien Han.
> > > >  
> > > >  
> > > > On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  > > > (mailto:vadi...@gmail.com)> wrote:
> > > > > > FYI I'm using 450 pgs for my pools.
> > > > >  
> > > > >  
> > > > > Please, can you show the number of object replicas?
> > > > >  
> > > > > ceph osd dump | grep 'rep size'
> > > > >  
> > > > > Vlad Gorbunov
> > > > >  
> > > > > 2013/3/5 Sébastien Han  > > > > (mailto:han.sebast...@gmail.com)>:
> > > > > > FYI I'm using 450 pgs for my pools.
> > > > > >  
> > > > > > --
> > > > > > Regards,
> > > > > > Sébastien Han.
> > > > > >  
> > > > > >  
> > &

Re: OSD memory leaks?

2013-03-12 Thread Sébastien Han
Well to avoid unnecessary data movement, there is also an
_experimental_ feature to change the number of PGs in a pool on the fly.

ceph osd pool set <pool> pg_num <num> --allow-experimental-feature

Cheers!
--
Regards,
Sébastien Han.


On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano  wrote:
> Disregard my previous question. I found my answer in the post below. 
> Absolutely brilliant! I thought I was screwed!
>
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924
>
> Dave Spano
> Optogenics
> Systems Administrator
>
>
>
> - Original Message -
>
> From: "Dave Spano" 
> To: "Sébastien Han" 
> Cc: "Sage Weil" , "Wido den Hollander" , 
> "Gregory Farnum" , "Sylvain Munaut" 
> , "ceph-devel" , 
> "Samuel Just" , "Vladislav Gorbunov" 
> Sent: Tuesday, March 12, 2013 1:41:21 PM
> Subject: Re: OSD memory leaks?
>
>
> If one were stupid enough to have their pg_num and pgp_num set to 8 on two of 
> their pools, how could you fix that?
>
>
> Dave Spano
>
>
>
> - Original Message -
>
> From: "Sébastien Han" 
> To: "Vladislav Gorbunov" 
> Cc: "Sage Weil" , "Wido den Hollander" , 
> "Gregory Farnum" , "Sylvain Munaut" 
> , "Dave Spano" , 
> "ceph-devel" , "Samuel Just" 
> 
> Sent: Tuesday, March 12, 2013 9:43:44 AM
> Subject: Re: OSD memory leaks?
>
>>Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
>>dump | grep 'rep size'"
>
> Well it's still 450 each...
>
>>The default pg_num value 8 is NOT suitable for big cluster.
>
> Thanks I know, I'm not new with Ceph. What's your point here? I
> already said that pg_num was 450...
> --
> Regards,
> Sébastien Han.
>
>
> On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov  wrote:
>> Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
>> dump | grep 'rep size'"
>> The default pg_num value 8 is NOT suitable for big cluster.
>>
>> 2013/3/13 Sébastien Han :
>>> Replica count has been set to 2.
>>>
>>> Why?
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  
>>> wrote:
>>>>> FYI I'm using 450 pgs for my pools.
>>>> Please, can you show the number of object replicas?
>>>>
>>>> ceph osd dump | grep 'rep size'
>>>>
>>>> Vlad Gorbunov
>>>>
>>>> 2013/3/5 Sébastien Han :
>>>>> FYI I'm using 450 pgs for my pools.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:
>>>>>>
>>>>>> On Fri, 1 Mar 2013, Wido den Hollander wrote:
>>>>>> > On 02/23/2013 01:44 AM, Sage Weil wrote:
>>>>>> > > On Fri, 22 Feb 2013, S?bastien Han wrote:
>>>>>> > > > Hi all,
>>>>>> > > >
>>>>>> > > > I finally got a core dump.
>>>>>> > > >
>>>>>> > > > I did it with a kill -SEGV on the OSD process.
>>>>>> > > >
>>>>>> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>>>>> > > >
>>>>>> > > > Hope we will get something out of it :-).
>>>>>> > >
>>>>>> > > AHA! We have a theory. The pg log isnt trimmed during scrub (because 
>>>>>> > > teh
>>>>>> > > old scrub code required that), but the new (deep) scrub can take a 
>>>>>> > > very
>>>>>> > > long time, which means the pg log will eat ram in the meantime..
>>>>>> > > especially under high iops.
>>>>>> > >
>>>>>> >
>>>>>> > Does the number of PGs influence the memory leak? So my theory is that 
>>>>>> > when
>>>>>> > you have a high number of PGs with a low number of objects per PG you 
>>>>>> > don't
>>>>>> > see the memory leak.
>>>>>> >
>>>>>> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
>>

Re: OSD memory leaks?

2013-03-12 Thread Dave Spano
Disregard my previous question. I found my answer in the post below. Absolutely 
brilliant! I thought I was screwed! 

http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 

Dave Spano 
Optogenics 
Systems Administrator 



- Original Message - 

From: "Dave Spano"  
To: "Sébastien Han"  
Cc: "Sage Weil" , "Wido den Hollander" , 
"Gregory Farnum" , "Sylvain Munaut" 
, "ceph-devel" , 
"Samuel Just" , "Vladislav Gorbunov"  
Sent: Tuesday, March 12, 2013 1:41:21 PM 
Subject: Re: OSD memory leaks? 


If one were stupid enough to have their pg_num and pgp_num set to 8 on two of 
their pools, how could you fix that? 


Dave Spano 



- Original Message -

From: "Sébastien Han"  
To: "Vladislav Gorbunov"  
Cc: "Sage Weil" , "Wido den Hollander" , 
"Gregory Farnum" , "Sylvain Munaut" 
, "Dave Spano" , 
"ceph-devel" , "Samuel Just"  
Sent: Tuesday, March 12, 2013 9:43:44 AM 
Subject: Re: OSD memory leaks? 

>Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd 
>dump | grep 'rep size'" 

Well it's still 450 each... 

>The default pg_num value 8 is NOT suitable for big cluster. 

Thanks I know, I'm not new with Ceph. What's your point here? I 
already said that pg_num was 450... 
-- 
Regards, 
Sébastien Han. 


On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov  wrote: 
> Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd 
> dump | grep 'rep size'" 
> The default pg_num value 8 is NOT suitable for big cluster. 
> 
> 2013/3/13 Sébastien Han : 
>> Replica count has been set to 2. 
>> 
>> Why? 
>> -- 
>> Regards, 
>> Sébastien Han. 
>> 
>> 
>> On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  
>> wrote: 
>>>> FYI I'm using 450 pgs for my pools. 
>>> Please, can you show the number of object replicas? 
>>> 
>>> ceph osd dump | grep 'rep size' 
>>> 
>>> Vlad Gorbunov 
>>> 
>>> 2013/3/5 Sébastien Han : 
>>>> FYI I'm using 450 pgs for my pools. 
>>>> 
>>>> -- 
>>>> Regards, 
>>>> Sébastien Han. 
>>>> 
>>>> 
>>>> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote: 
>>>>> 
>>>>> On Fri, 1 Mar 2013, Wido den Hollander wrote: 
>>>>> > On 02/23/2013 01:44 AM, Sage Weil wrote: 
>>>>> > > On Fri, 22 Feb 2013, S?bastien Han wrote: 
>>>>> > > > Hi all, 
>>>>> > > > 
>>>>> > > > I finally got a core dump. 
>>>>> > > > 
>>>>> > > > I did it with a kill -SEGV on the OSD process. 
>>>>> > > > 
>>>>> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>>>> > > >  
>>>>> > > > 
>>>>> > > > Hope we will get something out of it :-). 
>>>>> > > 
>>>>> > > AHA! We have a theory. The pg log isnt trimmed during scrub (because 
>>>>> > > teh 
>>>>> > > old scrub code required that), but the new (deep) scrub can take a 
>>>>> > > very 
>>>>> > > long time, which means the pg log will eat ram in the meantime.. 
>>>>> > > especially under high iops. 
>>>>> > > 
>>>>> > 
>>>>> > Does the number of PGs influence the memory leak? So my theory is that 
>>>>> > when 
>>>>> > you have a high number of PGs with a low number of objects per PG you 
>>>>> > don't 
>>>>> > see the memory leak. 
>>>>> > 
>>>>> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
>>>>> > after 
>>>>> > going to 1024 PGs in a new pool it seemed to be resolved. 
>>>>> > 
>>>>> > I've asked somebody else to try your patch since he's still seeing it 
>>>>> > on his 
>>>>> > systems. Hopefully that gives us some results. 
>>>>> 
>>>>> The PGs were active+clean when you saw the leak? There is a problem (that 
>>>>> we just fixed in master) where pg logs aren't trimmed for degraded PGs. 
>>>>> 
>>>>> sage 
>>>>> 
>>>>> 

Re: OSD memory leaks?

2013-03-12 Thread Sébastien Han
>Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
>dump | grep 'rep size'"

Well it's still 450 each...

>The default pg_num value 8 is NOT suitable for big cluster.

Thanks I know, I'm not new with Ceph. What's your point here? I
already said that pg_num was 450...
--
Regards,
Sébastien Han.


On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov  wrote:
> Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
> dump | grep 'rep size'"
> The default pg_num value 8 is NOT suitable for big cluster.
>
> 2013/3/13 Sébastien Han :
>> Replica count has been set to 2.
>>
>> Why?
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  
>> wrote:
 FYI I'm using 450 pgs for my pools.
>>> Please, can you show the number of object replicas?
>>>
>>> ceph osd dump | grep 'rep size'
>>>
>>> Vlad Gorbunov
>>>
>>> 2013/3/5 Sébastien Han :
 FYI I'm using 450 pgs for my pools.

 --
 Regards,
 Sébastien Han.


 On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:
>
> On Fri, 1 Mar 2013, Wido den Hollander wrote:
> > On 02/23/2013 01:44 AM, Sage Weil wrote:
> > > On Fri, 22 Feb 2013, S?bastien Han wrote:
> > > > Hi all,
> > > >
> > > > I finally got a core dump.
> > > >
> > > > I did it with a kill -SEGV on the OSD process.
> > > >
> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> > > >
> > > > Hope we will get something out of it :-).
> > >
> > > AHA!  We have a theory.  The pg log isnt trimmed during scrub 
> > > (because teh
> > > old scrub code required that), but the new (deep) scrub can take a 
> > > very
> > > long time, which means the pg log will eat ram in the meantime..
> > > especially under high iops.
> > >
> >
> > Does the number of PGs influence the memory leak? So my theory is that 
> > when
> > you have a high number of PGs with a low number of objects per PG you 
> > don't
> > see the memory leak.
> >
> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
> > after
> > going to 1024 PGs in a new pool it seemed to be resolved.
> >
> > I've asked somebody else to try your patch since he's still seeing it 
> > on his
> > systems. Hopefully that gives us some results.
>
> The PGs were active+clean when you saw the leak?  There is a problem (that
> we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>
> sage
>
> >
> > Wido
> >
> > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and 
> > > see
> > > if that seems to work?  Note that that patch shouldn't be run in a 
> > > mixed
> > > argonaut+bobtail cluster, since it isn't properly checking if the 
> > > scrub is
> > > class or chunky/deep.
> > >
> > > Thanks!
> > > sage
> > >
> > >
> > >   > --
> > > > Regards,
> > > > S?bastien Han.
> > > >
> > > >
> > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  
> > > > wrote:
> > > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
> > > > > 
> > > > > wrote:
> > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that 
> > > > > > > active
> > > > > > > use
> > > > > > > of the memory profiler will itself cause memory usage to 
> > > > > > > increase ?
> > > > > > > this sounds a bit like that to me since it's staying stable 
> > > > > > > at a
> > > > > > > large
> > > > > > > but finite portion of total memory.
> > > > > >
> > > > > > Well, the memory consumption was already high before the 
> > > > > > profiler was
> > > > > > started. So yes with the memory profiler enable an OSD might 
> > > > > > consume
> > > > > > more memory but this doesn't cause the memory leaks.
> > > > >
> > > > > My concern is that maybe you saw a leak but when you restarted 
> > > > > with
> > > > > the memory profiling you lost whatever conditions caused it.
> > > > >
> > > > > > Any ideas? Nothing to say about my scrumbing theory?
> > > > > I like it, but Sam indicates that without some heap dumps which
> > > > > capture the actual leak then scrub is too large to effectively 
> > > > > code
> > > > > review for leaks. :(
> > > > > -Greg
>

Re: OSD memory leaks?

2013-03-12 Thread Vladislav Gorbunov
Sorry, i mean pg_num and pgp_num on all pools. Shown by the "ceph osd
dump | grep 'rep size'"
The default pg_num value 8 is NOT suitable for big cluster.

2013/3/13 Sébastien Han :
> Replica count has been set to 2.
>
> Why?
> --
> Regards,
> Sébastien Han.
>
>
> On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  
> wrote:
>>> FYI I'm using 450 pgs for my pools.
>> Please, can you show the number of object replicas?
>>
>> ceph osd dump | grep 'rep size'
>>
>> Vlad Gorbunov
>>
>> 2013/3/5 Sébastien Han :
>>> FYI I'm using 450 pgs for my pools.
>>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:

 On Fri, 1 Mar 2013, Wido den Hollander wrote:
 > On 02/23/2013 01:44 AM, Sage Weil wrote:
 > > On Fri, 22 Feb 2013, S?bastien Han wrote:
 > > > Hi all,
 > > >
 > > > I finally got a core dump.
 > > >
 > > > I did it with a kill -SEGV on the OSD process.
 > > >
 > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
 > > >
 > > > Hope we will get something out of it :-).
 > >
 > > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because 
 > > teh
 > > old scrub code required that), but the new (deep) scrub can take a very
 > > long time, which means the pg log will eat ram in the meantime..
 > > especially under high iops.
 > >
 >
 > Does the number of PGs influence the memory leak? So my theory is that 
 > when
 > you have a high number of PGs with a low number of objects per PG you 
 > don't
 > see the memory leak.
 >
 > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
 > after
 > going to 1024 PGs in a new pool it seemed to be resolved.
 >
 > I've asked somebody else to try your patch since he's still seeing it on 
 > his
 > systems. Hopefully that gives us some results.

 The PGs were active+clean when you saw the leak?  There is a problem (that
 we just fixed in master) where pg logs aren't trimmed for degraded PGs.

 sage

 >
 > Wido
 >
 > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and 
 > > see
 > > if that seems to work?  Note that that patch shouldn't be run in a 
 > > mixed
 > > argonaut+bobtail cluster, since it isn't properly checking if the 
 > > scrub is
 > > class or chunky/deep.
 > >
 > > Thanks!
 > > sage
 > >
 > >
 > >   > --
 > > > Regards,
 > > > S?bastien Han.
 > > >
 > > >
 > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  
 > > > wrote:
 > > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
 > > > > 
 > > > > wrote:
 > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that 
 > > > > > > active
 > > > > > > use
 > > > > > > of the memory profiler will itself cause memory usage to 
 > > > > > > increase ?
 > > > > > > this sounds a bit like that to me since it's staying stable at 
 > > > > > > a
 > > > > > > large
 > > > > > > but finite portion of total memory.
 > > > > >
 > > > > > Well, the memory consumption was already high before the 
 > > > > > profiler was
 > > > > > started. So yes with the memory profiler enable an OSD might 
 > > > > > consume
 > > > > > more memory but this doesn't cause the memory leaks.
 > > > >
 > > > > My concern is that maybe you saw a leak but when you restarted with
 > > > > the memory profiling you lost whatever conditions caused it.
 > > > >
 > > > > > Any ideas? Nothing to say about my scrumbing theory?
 > > > > I like it, but Sam indicates that without some heap dumps which
 > > > > capture the actual leak then scrub is too large to effectively code
 > > > > review for leaks. :(
 > > > > -Greg
 > >
 >
 >
 > --
 > Wido den Hollander
 > 42on B.V.
 >
 > Phone: +31 (0)20 700 9902
 > Skype: contact42on
 >
 >

Re: OSD memory leaks?

2013-03-12 Thread Sébastien Han
Replica count has been set to 2.

Why?
--
Regards,
Sébastien Han.


On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov  wrote:
>> FYI I'm using 450 pgs for my pools.
> Please, can you show the number of object replicas?
>
> ceph osd dump | grep 'rep size'
>
> Vlad Gorbunov
>
> 2013/3/5 Sébastien Han :
>> FYI I'm using 450 pgs for my pools.
>>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:
>>>
>>> On Fri, 1 Mar 2013, Wido den Hollander wrote:
>>> > On 02/23/2013 01:44 AM, Sage Weil wrote:
>>> > > On Fri, 22 Feb 2013, S?bastien Han wrote:
>>> > > > Hi all,
>>> > > >
>>> > > > I finally got a core dump.
>>> > > >
>>> > > > I did it with a kill -SEGV on the OSD process.
>>> > > >
>>> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>> > > >
>>> > > > Hope we will get something out of it :-).
>>> > >
>>> > > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because 
>>> > > teh
>>> > > old scrub code required that), but the new (deep) scrub can take a very
>>> > > long time, which means the pg log will eat ram in the meantime..
>>> > > especially under high iops.
>>> > >
>>> >
>>> > Does the number of PGs influence the memory leak? So my theory is that 
>>> > when
>>> > you have a high number of PGs with a low number of objects per PG you 
>>> > don't
>>> > see the memory leak.
>>> >
>>> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
>>> > after
>>> > going to 1024 PGs in a new pool it seemed to be resolved.
>>> >
>>> > I've asked somebody else to try your patch since he's still seeing it on 
>>> > his
>>> > systems. Hopefully that gives us some results.
>>>
>>> The PGs were active+clean when you saw the leak?  There is a problem (that
>>> we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>>>
>>> sage
>>>
>>> >
>>> > Wido
>>> >
>>> > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>>> > > if that seems to work?  Note that that patch shouldn't be run in a mixed
>>> > > argonaut+bobtail cluster, since it isn't properly checking if the scrub 
>>> > > is
>>> > > class or chunky/deep.
>>> > >
>>> > > Thanks!
>>> > > sage
>>> > >
>>> > >
>>> > >   > --
>>> > > > Regards,
>>> > > > S?bastien Han.
>>> > > >
>>> > > >
>>> > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  
>>> > > > wrote:
>>> > > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
>>> > > > > 
>>> > > > > wrote:
>>> > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that 
>>> > > > > > > active
>>> > > > > > > use
>>> > > > > > > of the memory profiler will itself cause memory usage to 
>>> > > > > > > increase ?
>>> > > > > > > this sounds a bit like that to me since it's staying stable at a
>>> > > > > > > large
>>> > > > > > > but finite portion of total memory.
>>> > > > > >
>>> > > > > > Well, the memory consumption was already high before the profiler 
>>> > > > > > was
>>> > > > > > started. So yes with the memory profiler enable an OSD might 
>>> > > > > > consume
>>> > > > > > more memory but this doesn't cause the memory leaks.
>>> > > > >
>>> > > > > My concern is that maybe you saw a leak but when you restarted with
>>> > > > > the memory profiling you lost whatever conditions caused it.
>>> > > > >
>>> > > > > > Any ideas? Nothing to say about my scrumbing theory?
>>> > > > > I like it, but Sam indicates that without some heap dumps which
>>> > > > > capture the actual leak then scrub is too large to effectively code
>>> > > > > review for leaks. :(
>>> > > > > -Greg
>>> > > > --
>>> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>>> > > > in
>>> > > > the body of a message to majord...@vger.kernel.org
>>> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> > > >
>>> > > >
>>> > > --
>>> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> > > the body of a message to majord...@vger.kernel.org
>>> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> > >
>>> >
>>> >
>>> > --
>>> > Wido den Hollander
>>> > 42on B.V.
>>> >
>>> > Phone: +31 (0)20 700 9902
>>> > Skype: contact42on
>>> >
>>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD memory leaks?

2013-03-12 Thread Vladislav Gorbunov
> FYI I'm using 450 pgs for my pools.
Please, can you show the number of object replicas?

ceph osd dump | grep 'rep size'

Vlad Gorbunov

2013/3/5 Sébastien Han :
> FYI I'm using 450 pgs for my pools.
>
> --
> Regards,
> Sébastien Han.
>
>
> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:
>>
>> On Fri, 1 Mar 2013, Wido den Hollander wrote:
>> > On 02/23/2013 01:44 AM, Sage Weil wrote:
>> > > On Fri, 22 Feb 2013, S?bastien Han wrote:
>> > > > Hi all,
>> > > >
>> > > > I finally got a core dump.
>> > > >
>> > > > I did it with a kill -SEGV on the OSD process.
>> > > >
>> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>> > > >
>> > > > Hope we will get something out of it :-).
>> > >
>> > > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because 
>> > > teh
>> > > old scrub code required that), but the new (deep) scrub can take a very
>> > > long time, which means the pg log will eat ram in the meantime..
>> > > especially under high iops.
>> > >
>> >
>> > Does the number of PGs influence the memory leak? So my theory is that when
>> > you have a high number of PGs with a low number of objects per PG you don't
>> > see the memory leak.
>> >
>> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
>> > after
>> > going to 1024 PGs in a new pool it seemed to be resolved.
>> >
>> > I've asked somebody else to try your patch since he's still seeing it on 
>> > his
>> > systems. Hopefully that gives us some results.
>>
>> The PGs were active+clean when you saw the leak?  There is a problem (that
>> we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>>
>> sage
>>
>> >
>> > Wido
>> >
>> > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>> > > if that seems to work?  Note that that patch shouldn't be run in a mixed
>> > > argonaut+bobtail cluster, since it isn't properly checking if the scrub 
>> > > is
>> > > class or chunky/deep.
>> > >
>> > > Thanks!
>> > > sage
>> > >
>> > >
>> > >   > --
>> > > > Regards,
>> > > > S?bastien Han.
>> > > >
>> > > >
>> > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  
>> > > > wrote:
>> > > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
>> > > > > 
>> > > > > wrote:
>> > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that 
>> > > > > > > active
>> > > > > > > use
>> > > > > > > of the memory profiler will itself cause memory usage to 
>> > > > > > > increase ?
>> > > > > > > this sounds a bit like that to me since it's staying stable at a
>> > > > > > > large
>> > > > > > > but finite portion of total memory.
>> > > > > >
>> > > > > > Well, the memory consumption was already high before the profiler 
>> > > > > > was
>> > > > > > started. So yes with the memory profiler enable an OSD might 
>> > > > > > consume
>> > > > > > more memory but this doesn't cause the memory leaks.
>> > > > >
>> > > > > My concern is that maybe you saw a leak but when you restarted with
>> > > > > the memory profiling you lost whatever conditions caused it.
>> > > > >
>> > > > > > Any ideas? Nothing to say about my scrumbing theory?
>> > > > > I like it, but Sam indicates that without some heap dumps which
>> > > > > capture the actual leak then scrub is too large to effectively code
>> > > > > review for leaks. :(
>> > > > > -Greg
>> > > > --
>> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>> > > > in
>> > > > the body of a message to majord...@vger.kernel.org
>> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > > >
>> > > >
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > > the body of a message to majord...@vger.kernel.org
>> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > >
>> >
>> >
>> > --
>> > Wido den Hollander
>> > 42on B.V.
>> >
>> > Phone: +31 (0)20 700 9902
>> > Skype: contact42on
>> >
>> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD memory leaks?

2013-03-11 Thread Sébastien Han
Dave,

It's still a production platform, so no, I didn't try it. I've also found
that the ceph-mon daemons are now constantly leaking... I truly hope your
log max recent = 1 will help.
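
For anyone applying the same workaround, here is a minimal ceph.conf sketch of the
setting being discussed (putting it under [osd] is an assumption here; [global]
should work as well):

    [osd]
        # cap the in-memory ring buffer of recent debug log entries
        # (the default keeps 100k lines, which can eat a lot of RAM)
        log max recent = 1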

Cheers.
--
Regards,
Sébastien Han.


On Mon, Mar 11, 2013 at 7:43 PM, Dave Spano  wrote:
> Sebastien,
>
> Did the patch that Sage mentioned work for you? I've found that this behavior 
> is happening more frequently with my first osd during deep scrubbing on 
> version 0.56.3. OOM Killer now goes after the ceph-osd process after a couple 
> of days.
>
> Sage,
> Yesterday after following the OSD memory requirements thread, I added log max 
> recent = 1 to ceph.conf, and osd.0 seems to have returned to a state of 
> normalcy. If it makes it through a deep scrubbing with no problem, I'll be 
> very happy.
>
>
>>> s...@inktank.com said:
>>>> - pg log trimming (probably a conservative subset) to avoid memory bloat
>>>
>>> Anything that reduces the size of OSD processes would be appreciated.
>>> You can probably do this with just
>>>  log max recent = 1000
>>> By default it's keeping 100k lines of logs in memory, which can eat a lot  
>>> of
>>> ram (but is great when debugging issues).
>
> Dave Spano
>
>
>
>
> - Original Message -
> From: "Sébastien Han" 
> To: "Sage Weil" 
> Cc: "Wido den Hollander" , "Gregory Farnum" 
> , "Sylvain Munaut" , "Dave 
> Spano" , "ceph-devel" , 
> "Samuel Just" 
> Sent: Monday, March 4, 2013 12:11:22 PM
> Subject: Re: OSD memory leaks?
>
> FYI I'm using 450 pgs for my pools.
>
> --
> Regards,
> Sébastien Han.
>
>
> On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:
>>
>> On Fri, 1 Mar 2013, Wido den Hollander wrote:
>> > On 02/23/2013 01:44 AM, Sage Weil wrote:
>> > > On Fri, 22 Feb 2013, S?bastien Han wrote:
>> > > > Hi all,
>> > > >
>> > > > I finally got a core dump.
>> > > >
>> > > > I did it with a kill -SEGV on the OSD process.
>> > > >
>> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>> > > >
>> > > > Hope we will get something out of it :-).
>> > >
>> > > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because 
>> > > teh
>> > > old scrub code required that), but the new (deep) scrub can take a very
>> > > long time, which means the pg log will eat ram in the meantime..
>> > > especially under high iops.
>> > >
>> >
>> > Does the number of PGs influence the memory leak? So my theory is that when
>> > you have a high number of PGs with a low number of objects per PG you don't
>> > see the memory leak.
>> >
>> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
>> > after
>> > going to 1024 PGs in a new pool it seemed to be resolved.
>> >
>> > I've asked somebody else to try your patch since he's still seeing it on 
>> > his
>> > systems. Hopefully that gives us some results.
>>
>> The PGs were active+clean when you saw the leak?  There is a problem (that
>> we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>>
>> sage
>>
>> >
>> > Wido
>> >
>> > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>> > > if that seems to work?  Note that that patch shouldn't be run in a mixed
>> > > argonaut+bobtail cluster, since it isn't properly checking if the scrub 
>> > > is
>> > > class or chunky/deep.
>> > >
>> > > Thanks!
>> > > sage
>> > >
>> > >
>> > >   > --
>> > > > Regards,
>> > > > S?bastien Han.
>> > > >
>> > > >
>> > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  
>> > > > wrote:
>> > > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
>> > > > > 
>> > > > > wrote:
>> > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that 
>> > > > > > > active
>> > > > > > > use
>> > > > > > > of the memory profiler will itself cause memory usage to 
>> > > > > > > increase ?
>> > > > > > > this sound

Re: OSD memory leaks?

2013-03-04 Thread Sébastien Han
FYI I'm using 450 pgs for my pools.

--
Regards,
Sébastien Han.


On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil  wrote:
>
> On Fri, 1 Mar 2013, Wido den Hollander wrote:
> > On 02/23/2013 01:44 AM, Sage Weil wrote:
> > > On Fri, 22 Feb 2013, S?bastien Han wrote:
> > > > Hi all,
> > > >
> > > > I finally got a core dump.
> > > >
> > > > I did it with a kill -SEGV on the OSD process.
> > > >
> > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> > > >
> > > > Hope we will get something out of it :-).
> > >
> > > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
> > > old scrub code required that), but the new (deep) scrub can take a very
> > > long time, which means the pg log will eat ram in the meantime..
> > > especially under high iops.
> > >
> >
> > Does the number of PGs influence the memory leak? So my theory is that when
> > you have a high number of PGs with a low number of objects per PG you don't
> > see the memory leak.
> >
> > I saw the memory leak on a RBD system where a pool had just 8 PGs, but after
> > going to 1024 PGs in a new pool it seemed to be resolved.
> >
> > I've asked somebody else to try your patch since he's still seeing it on his
> > systems. Hopefully that gives us some results.
>
> The PGs were active+clean when you saw the leak?  There is a problem (that
> we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>
> sage
>
> >
> > Wido
> >
> > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
> > > if that seems to work?  Note that that patch shouldn't be run in a mixed
> > > argonaut+bobtail cluster, since it isn't properly checking if the scrub is
> > > class or chunky/deep.
> > >
> > > Thanks!
> > > sage
> > >
> > >
> > >   > --
> > > > Regards,
> > > > S?bastien Han.
> > > >
> > > >
> > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  
> > > > wrote:
> > > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
> > > > > 
> > > > > wrote:
> > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that active
> > > > > > > use
> > > > > > > of the memory profiler will itself cause memory usage to increase 
> > > > > > > ?
> > > > > > > this sounds a bit like that to me since it's staying stable at a
> > > > > > > large
> > > > > > > but finite portion of total memory.
> > > > > >
> > > > > > Well, the memory consumption was already high before the profiler 
> > > > > > was
> > > > > > started. So yes with the memory profiler enable an OSD might consume
> > > > > > more memory but this doesn't cause the memory leaks.
> > > > >
> > > > > My concern is that maybe you saw a leak but when you restarted with
> > > > > the memory profiling you lost whatever conditions caused it.
> > > > >
> > > > > > Any ideas? Nothing to say about my scrumbing theory?
> > > > > I like it, but Sam indicates that without some heap dumps which
> > > > > capture the actual leak then scrub is too large to effectively code
> > > > > review for leaks. :(
> > > > > -Greg
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > > the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > >
> > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> >
> >
> > --
> > Wido den Hollander
> > 42on B.V.
> >
> > Phone: +31 (0)20 700 9902
> > Skype: contact42on
> >
> >


Re: OSD memory leaks?

2013-03-01 Thread Sage Weil
On Fri, 1 Mar 2013, Wido den Hollander wrote:
> On 02/23/2013 01:44 AM, Sage Weil wrote:
> > On Fri, 22 Feb 2013, S?bastien Han wrote:
> > > Hi all,
> > > 
> > > I finally got a core dump.
> > > 
> > > I did it with a kill -SEGV on the OSD process.
> > > 
> > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> > > 
> > > Hope we will get something out of it :-).
> > 
> > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
> > old scrub code required that), but the new (deep) scrub can take a very
> > long time, which means the pg log will eat ram in the meantime..
> > especially under high iops.
> > 
> 
> Does the number of PGs influence the memory leak? So my theory is that when
> you have a high number of PGs with a low number of objects per PG you don't
> see the memory leak.
> 
> I saw the memory leak on a RBD system where a pool had just 8 PGs, but after
> going to 1024 PGs in a new pool it seemed to be resolved.
> 
> I've asked somebody else to try your patch since he's still seeing it on his
> systems. Hopefully that gives us some results.

The PGs were active+clean when you saw the leak?  There is a problem (that 
we just fixed in master) where pg logs aren't trimmed for degraded PGs.
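
(A quick way to verify that, as a sketch using standard status commands:

    ceph pg stat
    ceph -s

and check that every PG is reported active+clean.)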

sage

> 
> Wido
> 
> > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
> > if that seems to work?  Note that that patch shouldn't be run in a mixed
> > argonaut+bobtail cluster, since it isn't properly checking if the scrub is
> > class or chunky/deep.
> > 
> > Thanks!
> > sage
> > 
> > 
> >   > --
> > > Regards,
> > > S?bastien Han.
> > > 
> > > 
> > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
> > > > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
> > > > wrote:
> > > > > > Is osd.1 using the heap profiler as well? Keep in mind that active
> > > > > > use
> > > > > > of the memory profiler will itself cause memory usage to increase ?
> > > > > > this sounds a bit like that to me since it's staying stable at a
> > > > > > large
> > > > > > but finite portion of total memory.
> > > > > 
> > > > > Well, the memory consumption was already high before the profiler was
> > > > > started. So yes with the memory profiler enable an OSD might consume
> > > > > more memory but this doesn't cause the memory leaks.
> > > > 
> > > > My concern is that maybe you saw a leak but when you restarted with
> > > > the memory profiling you lost whatever conditions caused it.
> > > > 
> > > > > Any ideas? Nothing to say about my scrumbing theory?
> > > > I like it, but Sam indicates that without some heap dumps which
> > > > capture the actual leak then scrub is too large to effectively code
> > > > review for leaks. :(
> > > > -Greg
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> 
> 


Re: OSD memory leaks?

2013-03-01 Thread Samuel Just
That pattern would seem to support the log trimming theory of the leak.
-Sam

On Fri, Mar 1, 2013 at 7:51 AM, Wido den Hollander  wrote:
> On 02/23/2013 01:44 AM, Sage Weil wrote:
>>
>> On Fri, 22 Feb 2013, S?bastien Han wrote:
>>>
>>> Hi all,
>>>
>>> I finally got a core dump.
>>>
>>> I did it with a kill -SEGV on the OSD process.
>>>
>>>
>>> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>>
>>> Hope we will get something out of it :-).
>>
>>
>> AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
>> old scrub code required that), but the new (deep) scrub can take a very
>> long time, which means the pg log will eat ram in the meantime..
>> especially under high iops.
>>
>
> Does the number of PGs influence the memory leak? So my theory is that when
> you have a high number of PGs with a low number of objects per PG you don't
> see the memory leak.
>
> I saw the memory leak on a RBD system where a pool had just 8 PGs, but after
> going to 1024 PGs in a new pool it seemed to be resolved.
>
> I've asked somebody else to try your patch since he's still seeing it on his
> systems. Hopefully that gives us some results.
>
> Wido
>
>
>> Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>> if that seems to work?  Note that that patch shouldn't be run in a mixed
>> argonaut+bobtail cluster, since it isn't properly checking if the scrub is
>> class or chunky/deep.
>>
>> Thanks!
>> sage
>>
>>
>>   > --
>>>
>>> Regards,
>>> S?bastien Han.
>>>
>>>
>>> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:

 On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han 
 wrote:
>>
>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>> of the memory profiler will itself cause memory usage to increase ?
>> this sounds a bit like that to me since it's staying stable at a large
>> but finite portion of total memory.
>
>
> Well, the memory consumption was already high before the profiler was
> started. So yes with the memory profiler enable an OSD might consume
> more memory but this doesn't cause the memory leaks.


 My concern is that maybe you saw a leak but when you restarted with
 the memory profiling you lost whatever conditions caused it.

> Any ideas? Nothing to say about my scrumbing theory?

 I like it, but Sam indicates that without some heap dumps which
 capture the actual leak then scrub is too large to effectively code
 review for leaks. :(
 -Greg
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on


Re: OSD memory leaks?

2013-03-01 Thread Wido den Hollander

On 02/23/2013 01:44 AM, Sage Weil wrote:

On Fri, 22 Feb 2013, S?bastien Han wrote:

Hi all,

I finally got a core dump.

I did it with a kill -SEGV on the OSD process.

https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008

Hope we will get something out of it :-).


AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
old scrub code required that), but the new (deep) scrub can take a very
long time, which means the pg log will eat ram in the meantime..
especially under high iops.



Does the number of PGs influence the memory leak? So my theory is that 
when you have a high number of PGs with a low number of objects per PG 
you don't see the memory leak.


I saw the memory leak on a RBD system where a pool had just 8 PGs, but 
after going to 1024 PGs in a new pool it seemed to be resolved.
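
For reference, creating a replacement pool with a higher PG count is a one-liner
(the pool name and PG count below are placeholders):

    ceph osd pool create rbd-new 1024

The data from the old pool still has to be migrated into it separately.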


I've asked somebody else to try your patch since he's still seeing it on 
his systems. Hopefully that gives us some results.


Wido


Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
if that seems to work?  Note that that patch shouldn't be run in a mixed
argonaut+bobtail cluster, since it isn't properly checking if the scrub is
class or chunky/deep.

Thanks!
sage


  > --

Regards,
S?bastien Han.


On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:

On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han  wrote:

Is osd.1 using the heap profiler as well? Keep in mind that active use
of the memory profiler will itself cause memory usage to increase ?
this sounds a bit like that to me since it's staying stable at a large
but finite portion of total memory.


Well, the memory consumption was already high before the profiler was
started. So yes with the memory profiler enable an OSD might consume
more memory but this doesn't cause the memory leaks.


My concern is that maybe you saw a leak but when you restarted with
the memory profiling you lost whatever conditions caused it.


Any ideas? Nothing to say about my scrumbing theory?

I like it, but Sam indicates that without some heap dumps which
capture the actual leak then scrub is too large to effectively code
review for leaks. :(
-Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: OSD memory leaks?

2013-02-25 Thread Sébastien Han
Ok thanks guys. Hope we will find something :-).
--
Regards,
Sébastien Han.


On Mon, Feb 25, 2013 at 8:51 AM, Wido den Hollander  wrote:
> On 02/25/2013 01:21 AM, Sage Weil wrote:
>>
>> On Mon, 25 Feb 2013, S?bastien Han wrote:
>>>
>>> Hi Sage,
>>>
>>> Sorry it's a production system, so I can't test it.
>>> So at the end, you can't get anything out of the core dump?
>>
>>
>> I saw a bunch of dup object anmes, which is what led us to the pg log
>> theory.  I can look a bit more carefully to confirm, but in the end it
>> would be nice to see users scrubbing without leaking.
>>
>> This may be a bit moot because we want to allow trimming for other
>> reasons, so those patches are being tested and working their way into
>> master.  We'll backport when things are solid.
>>
>> In the meantime, if someone has been able to reproduce this in a test
>> environment, testing is obviously welcome :)
>>
>
> I'll see what I can do later this week. I know of a cluster which has the
> same issues which is in semi-production as far as I know.
>
> Wido
>
>
>> sage
>>
>>
>>
>>
>>   >
>>>
>>> --
>>> Regards,
>>> S?bastien Han.
>>>
>>>
>>> On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil  wrote:

 On Fri, 22 Feb 2013, S?bastien Han wrote:
>
> Hi all,
>
> I finally got a core dump.
>
> I did it with a kill -SEGV on the OSD process.
>
>
> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>
> Hope we will get something out of it :-).


 AHA!  We have a theory.  The pg log isnt trimmed during scrub (because
 teh
 old scrub code required that), but the new (deep) scrub can take a very
 long time, which means the pg log will eat ram in the meantime..
 especially under high iops.

 Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
 if that seems to work?  Note that that patch shouldn't be run in a mixed
 argonaut+bobtail cluster, since it isn't properly checking if the scrub
 is
 class or chunky/deep.

 Thanks!
 sage


   > --
>
> Regards,
> S?bastien Han.
>
>
> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum 
> wrote:
>>
>> On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han
>>  wrote:

 Is osd.1 using the heap profiler as well? Keep in mind that active
 use
 of the memory profiler will itself cause memory usage to increase ?
 this sounds a bit like that to me since it's staying stable at a
 large
 but finite portion of total memory.
>>>
>>>
>>> Well, the memory consumption was already high before the profiler was
>>> started. So yes with the memory profiler enable an OSD might consume
>>> more memory but this doesn't cause the memory leaks.
>>
>>
>> My concern is that maybe you saw a leak but when you restarted with
>> the memory profiling you lost whatever conditions caused it.
>>
>>> Any ideas? Nothing to say about my scrumbing theory?
>>
>> I like it, but Sam indicates that without some heap dumps which
>> capture the actual leak then scrub is too large to effectively code
>> review for leaks. :(
>> -Greg
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on


Re: OSD memory leaks?

2013-02-24 Thread Wido den Hollander

On 02/25/2013 01:21 AM, Sage Weil wrote:

On Mon, 25 Feb 2013, S?bastien Han wrote:

Hi Sage,

Sorry it's a production system, so I can't test it.
So at the end, you can't get anything out of the core dump?


I saw a bunch of dup object anmes, which is what led us to the pg log
theory.  I can look a bit more carefully to confirm, but in the end it
would be nice to see users scrubbing without leaking.

This may be a bit moot because we want to allow trimming for other
reasons, so those patches are being tested and working their way into
master.  We'll backport when things are solid.

In the meantime, if someone has been able to reproduce this in a test
environment, testing is obviously welcome :)



I'll see what I can do later this week. I know of a cluster which has 
the same issues which is in semi-production as far as I know.


Wido


sage




  >

--
Regards,
S?bastien Han.


On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil  wrote:

On Fri, 22 Feb 2013, S?bastien Han wrote:

Hi all,

I finally got a core dump.

I did it with a kill -SEGV on the OSD process.

https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008

Hope we will get something out of it :-).


AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
old scrub code required that), but the new (deep) scrub can take a very
long time, which means the pg log will eat ram in the meantime..
especially under high iops.

Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
if that seems to work?  Note that that patch shouldn't be run in a mixed
argonaut+bobtail cluster, since it isn't properly checking if the scrub is
class or chunky/deep.

Thanks!
sage


  > --

Regards,
S?bastien Han.


On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:

On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han  wrote:

Is osd.1 using the heap profiler as well? Keep in mind that active use
of the memory profiler will itself cause memory usage to increase ?
this sounds a bit like that to me since it's staying stable at a large
but finite portion of total memory.


Well, the memory consumption was already high before the profiler was
started. So yes with the memory profiler enable an OSD might consume
more memory but this doesn't cause the memory leaks.


My concern is that maybe you saw a leak but when you restarted with
the memory profiling you lost whatever conditions caused it.


Any ideas? Nothing to say about my scrumbing theory?

I like it, but Sam indicates that without some heap dumps which
capture the actual leak then scrub is too large to effectively code
review for leaks. :(
-Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: OSD memory leaks?

2013-02-24 Thread Sage Weil
On Mon, 25 Feb 2013, Sébastien Han wrote:
> Hi Sage,
> 
> Sorry it's a production system, so I can't test it.
> So at the end, you can't get anything out of the core dump?

I saw a bunch of duplicate object names, which is what led us to the pg log 
theory.  I can look a bit more carefully to confirm, but in the end it 
would be nice to see users scrubbing without leaking.

This may be a bit moot because we want to allow trimming for other 
reasons, so those patches are being tested and working their way into 
master.  We'll backport when things are solid.

In the meantime, if someone has been able to reproduce this in a test 
environment, testing is obviously welcome :)

sage




 > 
> --
> Regards,
> S?bastien Han.
> 
> 
> On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil  wrote:
> > On Fri, 22 Feb 2013, S?bastien Han wrote:
> >> Hi all,
> >>
> >> I finally got a core dump.
> >>
> >> I did it with a kill -SEGV on the OSD process.
> >>
> >> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> >>
> >> Hope we will get something out of it :-).
> >
> > AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
> > old scrub code required that), but the new (deep) scrub can take a very
> > long time, which means the pg log will eat ram in the meantime..
> > especially under high iops.
> >
> > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
> > if that seems to work?  Note that that patch shouldn't be run in a mixed
> > argonaut+bobtail cluster, since it isn't properly checking if the scrub is
> > class or chunky/deep.
> >
> > Thanks!
> > sage
> >
> >
> >  > --
> >> Regards,
> >> S?bastien Han.
> >>
> >>
> >> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
> >> > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han  
> >> > wrote:
> >> >>> Is osd.1 using the heap profiler as well? Keep in mind that active use
> >> >>> of the memory profiler will itself cause memory usage to increase ?
> >> >>> this sounds a bit like that to me since it's staying stable at a large
> >> >>> but finite portion of total memory.
> >> >>
> >> >> Well, the memory consumption was already high before the profiler was
> >> >> started. So yes with the memory profiler enable an OSD might consume
> >> >> more memory but this doesn't cause the memory leaks.
> >> >
> >> > My concern is that maybe you saw a leak but when you restarted with
> >> > the memory profiling you lost whatever conditions caused it.
> >> >
> >> >> Any ideas? Nothing to say about my scrumbing theory?
> >> > I like it, but Sam indicates that without some heap dumps which
> >> > capture the actual leak then scrub is too large to effectively code
> >> > review for leaks. :(
> >> > -Greg
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


Re: OSD memory leaks?

2013-02-24 Thread Sébastien Han
Hi Sage,

Sorry it's a production system, so I can't test it.
So in the end, you couldn't get anything out of the core dump?

--
Regards,
Sébastien Han.


On Sat, Feb 23, 2013 at 1:44 AM, Sage Weil  wrote:
> On Fri, 22 Feb 2013, S?bastien Han wrote:
>> Hi all,
>>
>> I finally got a core dump.
>>
>> I did it with a kill -SEGV on the OSD process.
>>
>> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>
>> Hope we will get something out of it :-).
>
> AHA!  We have a theory.  The pg log isnt trimmed during scrub (because teh
> old scrub code required that), but the new (deep) scrub can take a very
> long time, which means the pg log will eat ram in the meantime..
> especially under high iops.
>
> Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
> if that seems to work?  Note that that patch shouldn't be run in a mixed
> argonaut+bobtail cluster, since it isn't properly checking if the scrub is
> class or chunky/deep.
>
> Thanks!
> sage
>
>
>  > --
>> Regards,
>> S?bastien Han.
>>
>>
>> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
>> > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han  
>> > wrote:
>> >>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>> >>> of the memory profiler will itself cause memory usage to increase ?
>> >>> this sounds a bit like that to me since it's staying stable at a large
>> >>> but finite portion of total memory.
>> >>
>> >> Well, the memory consumption was already high before the profiler was
>> >> started. So yes with the memory profiler enable an OSD might consume
>> >> more memory but this doesn't cause the memory leaks.
>> >
>> > My concern is that maybe you saw a leak but when you restarted with
>> > the memory profiling you lost whatever conditions caused it.
>> >
>> >> Any ideas? Nothing to say about my scrumbing theory?
>> > I like it, but Sam indicates that without some heap dumps which
>> > capture the actual leak then scrub is too large to effectively code
>> > review for leaks. :(
>> > -Greg
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>


Re: OSD memory leaks?

2013-02-22 Thread Sage Weil
On Fri, 22 Feb 2013, Sébastien Han wrote:
> Hi all,
> 
> I finally got a core dump.
> 
> I did it with a kill -SEGV on the OSD process.
> 
> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> 
> Hope we will get something out of it :-).

AHA!  We have a theory.  The pg log isn't trimmed during scrub (because the 
old scrub code required that), but the new (deep) scrub can take a very 
long time, which means the pg log will eat RAM in the meantime, 
especially under high iops.

Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see 
if that seems to work?  Note that that patch shouldn't be run in a mixed 
argonaut+bobtail cluster, since it isn't properly checking if the scrub is 
class or chunky/deep.
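
For anyone able to test it, a rough sketch of building that branch from source; the
repository URL and build steps below are assumed rather than taken from this thread:

    git clone https://github.com/ceph/ceph.git
    cd ceph
    git checkout wip-osd-log-trim
    ./autogen.sh && ./configure && make
    # deploy the resulting ceph-osd binary on a test node only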

Thanks!
sage


 > --
> Regards,
> S?bastien Han.
> 
> 
> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
> > On Fri, Jan 11, 2013 at 6:57 AM, S?bastien Han  
> > wrote:
> >>> Is osd.1 using the heap profiler as well? Keep in mind that active use
> >>> of the memory profiler will itself cause memory usage to increase ?
> >>> this sounds a bit like that to me since it's staying stable at a large
> >>> but finite portion of total memory.
> >>
> >> Well, the memory consumption was already high before the profiler was
> >> started. So yes with the memory profiler enable an OSD might consume
> >> more memory but this doesn't cause the memory leaks.
> >
> > My concern is that maybe you saw a leak but when you restarted with
> > the memory profiling you lost whatever conditions caused it.
> >
> >> Any ideas? Nothing to say about my scrumbing theory?
> > I like it, but Sam indicates that without some heap dumps which
> > capture the actual leak then scrub is too large to effectively code
> > review for leaks. :(
> > -Greg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


Re: OSD memory leaks?

2013-02-22 Thread Sébastien Han
Hi all,

I finally got a core dump.

I did it with a kill -SEGV on the OSD process.

https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008

Hope we will get something out of it :-).
--
Regards,
Sébastien Han.


On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
> On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han  
> wrote:
>>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>>> of the memory profiler will itself cause memory usage to increase —
>>> this sounds a bit like that to me since it's staying stable at a large
>>> but finite portion of total memory.
>>
>> Well, the memory consumption was already high before the profiler was
>> started. So yes with the memory profiler enable an OSD might consume
>> more memory but this doesn't cause the memory leaks.
>
> My concern is that maybe you saw a leak but when you restarted with
> the memory profiling you lost whatever conditions caused it.
>
>> Any ideas? Nothing to say about my scrumbing theory?
> I like it, but Sam indicates that without some heap dumps which
> capture the actual leak then scrub is too large to effectively code
> review for leaks. :(
> -Greg


Re: OSD memory leaks?

2013-01-11 Thread Gregory Farnum
On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han  wrote:
>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>> of the memory profiler will itself cause memory usage to increase —
>> this sounds a bit like that to me since it's staying stable at a large
>> but finite portion of total memory.
>
> Well, the memory consumption was already high before the profiler was
> started. So yes with the memory profiler enable an OSD might consume
> more memory but this doesn't cause the memory leaks.

My concern is that maybe you saw a leak but when you restarted with
the memory profiling you lost whatever conditions caused it.

> Any ideas? Nothing to say about my scrumbing theory?
I like it, but Sam indicates that without some heap dumps which
capture the actual leak then scrub is too large to effectively code
review for leaks. :(
-Greg


Re: OSD memory leaks?

2013-01-11 Thread Sébastien Han
> Is osd.1 using the heap profiler as well? Keep in mind that active use
> of the memory profiler will itself cause memory usage to increase —
> this sounds a bit like that to me since it's staying stable at a large
> but finite portion of total memory.

Well, the memory consumption was already high before the profiler was
started. So yes, with the memory profiler enabled an OSD might consume
more memory, but this isn't what causes the memory leaks.

Any ideas? Nothing to say about my scrubbing theory?

Thanks!
--
Regards,
Sébastien Han.


On Thu, Jan 10, 2013 at 10:44 PM, Gregory Farnum  wrote:
> On Wed, Jan 9, 2013 at 10:09 AM, Sylvain Munaut
>  wrote:
>> Just fyi, I also have growing memory on OSD, and I have the same logs:
>>
>> "libceph: osd4 172.20.11.32:6801 socket closed" in the RBD clients
>
> That message is not an error; it just happens if the RBD client
> doesn't talk to that OSD for a while. I believe its volume has been
> turned down quite a lot in the latest kernels/our git tree.
> -Greg


Re: OSD memory leaks?

2013-01-10 Thread Gregory Farnum
On Wed, Jan 9, 2013 at 10:09 AM, Sylvain Munaut
 wrote:
> Just fyi, I also have growing memory on OSD, and I have the same logs:
>
> "libceph: osd4 172.20.11.32:6801 socket closed" in the RBD clients

That message is not an error; it just happens if the RBD client
doesn't talk to that OSD for a while. I believe its volume has been
turned down quite a lot in the latest kernels/our git tree.
-Greg


Re: OSD memory leaks?

2013-01-10 Thread Gregory Farnum
On Wed, Jan 9, 2013 at 8:10 AM, Dave Spano  wrote:
> Yes, I'm using argonaut.
>
> I've got 38 heap files from yesterday. Currently, the OSD in question is 
> using 91.2% of memory according to top, and staying there. I initially 
> thought it would go until the OOM killer started killing processes, but I 
> don't see anything funny in the system logs that indicate that.
>
> On the other hand, the ceph-osd process on osd.1 is using far less memory.

Is osd.1 using the heap profiler as well? Keep in mind that active use
of the memory profiler will itself cause memory usage to increase —
this sounds a bit like that to me since it's staying stable at a large
but finite portion of total memory.
-Greg


Re: OSD memory leaks?

2013-01-09 Thread Dave Spano
Thank you. I appreciate it! 

Dave Spano 
Optogenics 
Systems Administrator 



- Original Message - 

From: "Sébastien Han"  
To: "Dave Spano"  
Cc: "ceph-devel" , "Samuel Just" 
 
Sent: Wednesday, January 9, 2013 5:12:12 PM 
Subject: Re: OSD memory leaks? 

Dave, I'm sharing my little script with you for now, if you want it: 

#!/bin/bash 

for i in $(ps aux | grep [c]eph-osd | awk '{print $4}') 
do 
    MEM_INTEGER=$(echo $i | cut -d '.' -f1) 
    OSD=$(ps aux | grep [c]eph-osd | grep "$i " | awk '{print $13}') 
    if [[ $MEM_INTEGER -ge 25 ]]; then 
        service ceph restart osd.$OSD >> /dev/null 
        if [ $? -eq 0 ]; then 
            logger -t ceph-memory-usage "The OSD number $OSD has been restarted since it was using $i % of the memory" 
        else 
            logger -t ceph-memory-usage "ERROR while restarting the OSD daemon" 
        fi 
    else 
        logger -t ceph-memory-usage "The OSD number $OSD is only using $i % of the memory, doing nothing" 
    fi 
    logger -t ceph-memory-usage "Waiting 60 seconds before testing the next OSD..." 
    sleep 60 
done 

logger -t ceph-memory-usage "Ceph state after memory check operation is: $(ceph health)" 

Cron runs it at a 10-minute interval every day on each storage node ;-). 

Waiting for some Inktank guys now :-). 
-- 
Regards, 
Sébastien Han. 


On Wed, Jan 9, 2013 at 10:42 PM, Dave Spano  wrote: 
> That's very good to know. I'll be restarting ceph-osd right now! Thanks for 
> the heads up! 
> 
> Dave Spano 
> Optogenics 
> Systems Administrator 
> 
> 
> 
> ----- Original Message - 
> 
> From: "Sébastien Han"  
> To: "Dave Spano"  
> Cc: "ceph-devel" , "Samuel Just" 
>  
> Sent: Wednesday, January 9, 2013 11:35:13 AM 
> Subject: Re: OSD memory leaks? 
> 
> If you wait too long, the system will trigger OOM killer :D, I already 
> experienced that unfortunately... 
> 
> Sam? 
> 
> On Wed, Jan 9, 2013 at 5:10 PM, Dave Spano  wrote: 
>> OOM killer 
> 
> 
> 
> -- 
> Regards, 
> Sébastien Han.


Re: OSD memory leaks?

2013-01-09 Thread Sébastien Han
Dave, I'm sharing my little script with you for now, if you want it:

#!/bin/bash
# Walk every running ceph-osd; restart any that uses 25% or more of system
# memory, and record what was done in syslog.

for i in $(ps aux | grep [c]eph-osd | awk '{print $4}')    # $4 = %MEM of each ceph-osd
do
    MEM_INTEGER=$(echo $i | cut -d '.' -f1)                             # integer part of %MEM
    OSD=$(ps aux | grep [c]eph-osd | grep "$i " | awk '{print $13}')    # OSD id taken from the command line
    if [[ $MEM_INTEGER -ge 25 ]]; then
        service ceph restart osd.$OSD >> /dev/null
        if [ $? -eq 0 ]; then
            logger -t ceph-memory-usage "The OSD number $OSD has been restarted since it was using $i % of the memory"
        else
            logger -t ceph-memory-usage "ERROR while restarting the OSD daemon"
        fi
    else
        logger -t ceph-memory-usage "The OSD number $OSD is only using $i % of the memory, doing nothing"
    fi
    logger -t ceph-memory-usage "Waiting 60 seconds before testing the next OSD..."
    sleep 60
done

logger -t ceph-memory-usage "Ceph state after memory check operation is: $(ceph health)"

Cron runs it at a 10-minute interval every day on each storage node ;-).
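
A crontab sketch of how that might be wired up (the file path and script name are
placeholders):

    # /etc/cron.d/ceph-memory-check
    */10 * * * * root /usr/local/bin/ceph-memory-check.sh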

Waiting for some Inktank guys now :-).
--
Regards,
Sébastien Han.


On Wed, Jan 9, 2013 at 10:42 PM, Dave Spano  wrote:
> That's very good to know. I'll be restarting ceph-osd right now! Thanks for 
> the heads up!
>
> Dave Spano
> Optogenics
> Systems Administrator
>
>
>
> - Original Message -
>
> From: "Sébastien Han" 
> To: "Dave Spano" 
> Cc: "ceph-devel" , "Samuel Just" 
> 
> Sent: Wednesday, January 9, 2013 11:35:13 AM
> Subject: Re: OSD memory leaks?
>
> If you wait too long, the system will trigger OOM killer :D, I already
> experienced that unfortunately...
>
> Sam?
>
> On Wed, Jan 9, 2013 at 5:10 PM, Dave Spano  wrote:
>> OOM killer
>
>
>
> --
> Regards,
> Sébastien Han.


Re: OSD memory leaks?

2013-01-09 Thread Dave Spano
That's very good to know. I'll be restarting ceph-osd right now! Thanks for the 
heads up! 

Dave Spano 
Optogenics 
Systems Administrator 



- Original Message - 

From: "Sébastien Han"  
To: "Dave Spano"  
Cc: "ceph-devel" , "Samuel Just" 
 
Sent: Wednesday, January 9, 2013 11:35:13 AM 
Subject: Re: OSD memory leaks? 

If you wait too long, the system will trigger OOM killer :D, I already 
experienced that unfortunately... 

Sam? 

On Wed, Jan 9, 2013 at 5:10 PM, Dave Spano  wrote: 
> OOM killer 



-- 
Regards, 
Sébastien Han.


Re: OSD memory leaks?

2013-01-09 Thread Sébastien Han
Hi,

Thanks for the input.

I also have tons of "socket closed" messages; I recall that this message is
harmless. Anyway, cephx has been disabled on my platform from the beginning...
Can anyone confirm or refute my "scrub theory"?
--
Regards,
Sébastien Han.


On Wed, Jan 9, 2013 at 7:09 PM, Sylvain Munaut
 wrote:
> Just fyi, I also have growing memory on OSD, and I have the same logs:
>
> "libceph: osd4 172.20.11.32:6801 socket closed" in the RBD clients
>
>
> I traced that problem and correlated it to some cephx issue in the OSD
> some time ago in this thread
>
> http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg10634.html
>
> but the thread kind of died without a solution ...
>
> Cheers,
>
>Sylvain


Re: OSD memory leaks?

2013-01-09 Thread Sylvain Munaut
Just fyi, I also have growing memory on OSD, and I have the same logs:

"libceph: osd4 172.20.11.32:6801 socket closed" in the RBD clients


I traced that problem and correlated it to some cephx issue in the OSD
some time ago in this thread

http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg10634.html

but the thread kind of died without a solution ...

Cheers,

   Sylvain


Re: OSD memory leaks?

2013-01-09 Thread Sébastien Han
If you wait too long, the system will trigger OOM killer :D, I already
experienced that unfortunately...

Sam?

On Wed, Jan 9, 2013 at 5:10 PM, Dave Spano  wrote:
> OOM killer



--
Regards,
Sébastien Han.


Re: OSD memory leaks?

2013-01-09 Thread Dave Spano
Yes, I'm using argonaut. 

I've got 38 heap files from yesterday. Currently, the OSD in question is using 
91.2% of memory according to top, and staying there. I initially thought it 
would go until the OOM killer started killing processes, but I don't see 
anything funny in the system logs that indicate that. 

On the other hand, the ceph-osd process on osd.1 is using far less memory. 

osd.0
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 9151 root  20   0 20.4g  14g 2548 S    1 91.2 517:58.71 ceph-osd

osd.1
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
10785 root  20   0  673m 310m 5164 S    3  1.9 107:04.39 ceph-osd

Here's what tcmalloc says when I run ceph osd tell 0 heap stats:
2013-01-09 11:09:36.778675 7f62aae23700  0 log [INF] : osd.0tcmalloc heap 
stats:
2013-01-09 11:09:36.779113 7f62aae23700  0 log [INF] : MALLOC:  210884768 ( 
 201.1 MB) Bytes in use by application
2013-01-09 11:09:36.779348 7f62aae23700  0 log [INF] : MALLOC: + 89026560 ( 
  84.9 MB) Bytes in page heap freelist
2013-01-09 11:09:36.779928 7f62aae23700  0 log [INF] : MALLOC: +  7926512 ( 
   7.6 MB) Bytes in central cache freelist
2013-01-09 11:09:36.779951 7f62aae23700  0 log [INF] : MALLOC: +   144896 ( 
   0.1 MB) Bytes in transfer cache freelist
2013-01-09 11:09:36.779972 7f62aae23700  0 log [INF] : MALLOC: + 11046512 ( 
  10.5 MB) Bytes in thread cache freelists
2013-01-09 11:09:36.780013 7f62aae23700  0 log [INF] : MALLOC: +  5177344 ( 
   4.9 MB) Bytes in malloc metadata
2013-01-09 11:09:36.780030 7f62aae23700  0 log [INF] : MALLOC:   
2013-01-09 11:09:36.780056 7f62aae23700  0 log [INF] : MALLOC: =324206592 ( 
 309.2 MB) Actual memory used (physical + swap)
2013-01-09 11:09:36.780081 7f62aae23700  0 log [INF] : MALLOC: +126177280 ( 
 120.3 MB) Bytes released to OS (aka unmapped)
2013-01-09 11:09:36.780112 7f62aae23700  0 log [INF] : MALLOC:   
2013-01-09 11:09:36.780127 7f62aae23700  0 log [INF] : MALLOC: =450383872 ( 
 429.5 MB) Virtual address space used
2013-01-09 11:09:36.780152 7f62aae23700  0 log [INF] : MALLOC:
2013-01-09 11:09:36.780168 7f62aae23700  0 log [INF] : MALLOC:  37492   
   Spans in use
2013-01-09 11:09:36.780330 7f62aae23700  0 log [INF] : MALLOC: 51   
   Thread heaps in use
2013-01-09 11:09:36.780359 7f62aae23700  0 log [INF] : MALLOC:   4096   
   Tcmalloc page size
2013-01-09 11:09:36.780384 7f62aae23700  0 log [INF] : 
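
For completeness, a sketch of the heap profiler commands used to produce dumps like
the stats above; the subcommands follow the "ceph osd tell ... heap" form already used
in this thread, while the dump location and analysis tool are assumptions:

    ceph osd tell 0 heap start_profiler
    ceph osd tell 0 heap dump          # writes osd.0.profile.NNNN.heap next to the OSD log
    ceph osd tell 0 heap stats
    ceph osd tell 0 heap stop_profiler
    # analyzing a dump, for example:
    pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap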



Dave Spano 
Optogenics 
Systems Administrator 



- Original Message - 

From: "Sébastien Han"  
To: "Samuel Just"  
Cc: "Dave Spano" , "ceph-devel" 
 
Sent: Wednesday, January 9, 2013 10:20:43 AM 
Subject: Re: OSD memory leaks? 

I guess he runs Argonaut as well. 

More suggestions about this problem? 

Thanks! 

-- 
Regards, 
Sébastien Han. 


On Mon, Jan 7, 2013 at 8:09 PM, Samuel Just  wrote: 
> 
> Awesome! What version are you running (ceph-osd -v, include the hash)? 
> -Sam 
> 
> On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano  wrote: 
> > This failed the first time I sent it, so I'm resending in plain text. 
> > 
> > Dave Spano 
> > Optogenics 
> > Systems Administrator 
> > 
> > 
> > 
> > - Original Message - 
> > 
> > From: "Dave Spano"  
> > To: "Sébastien Han"  
> > Cc: "ceph-devel" , "Samuel Just" 
> >  
> > Sent: Monday, January 7, 2013 12:40:06 PM 
> > Subject: Re: OSD memory leaks? 
> > 
> > 
> > Sam, 
> > 
> > Attached are some heaps that I collected today. 001 and 003 are just after 
> > I started the profiler; 011 is the most recent. If you need more, or 
> > anything different let me know. Already the OSD in question is at 38% 
> > memory usage. As mentioned by Sèbastien, restarting ceph-osd keeps things 
> > going. 
> > 
> > Not sure if this is helpful information, but out of the two OSDs that I 
> > have running, the first one (osd.0) is the one that develops this problem 
> > the quickest. osd.1 does have the same issue, it just takes much longer. Do 
> > the monitors hit the first osd in the list first, when there's activity? 
> > 
> > 
> > Dave Spano 
> > Optogenics 
> > Systems Administrator 
> > 
> > 
> > - Original Message - 
> > 
> > From: "Séb

Re: OSD memory leaks?

2013-01-09 Thread Sébastien Han
I guess he runs Argonaut as well.

More suggestions about this problem?

Thanks!

--
Regards,
Sébastien Han.


On Mon, Jan 7, 2013 at 8:09 PM, Samuel Just  wrote:
>
> Awesome!  What version are you running (ceph-osd -v, include the hash)?
> -Sam
>
> On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano  wrote:
> > This failed the first time I sent it, so I'm resending in plain text.
> >
> > Dave Spano
> > Optogenics
> > Systems Administrator
> >
> >
> >
> > - Original Message -
> >
> > From: "Dave Spano" 
> > To: "Sébastien Han" 
> > Cc: "ceph-devel" , "Samuel Just" 
> > 
> > Sent: Monday, January 7, 2013 12:40:06 PM
> > Subject: Re: OSD memory leaks?
> >
> >
> > Sam,
> >
> > Attached are some heaps that I collected today. 001 and 003 are just after 
> > I started the profiler; 011 is the most recent. If you need more, or 
> > anything different let me know. Already the OSD in question is at 38% 
> > memory usage. As mentioned by Sèbastien, restarting ceph-osd keeps things 
> > going.
> >
> > Not sure if this is helpful information, but out of the two OSDs that I 
> > have running, the first one (osd.0) is the one that develops this problem 
> > the quickest. osd.1 does have the same issue, it just takes much longer. Do 
> > the monitors hit the first osd in the list first, when there's activity?
> >
> >
> > Dave Spano
> > Optogenics
> > Systems Administrator
> >
> >
> > - Original Message -
> >
> > From: "Sébastien Han" 
> > To: "Samuel Just" 
> > Cc: "ceph-devel" 
> > Sent: Friday, January 4, 2013 10:20:58 AM
> > Subject: Re: OSD memory leaks?
> >
> > Hi Sam,
> >
> > Thanks for your answer and sorry the late reply.
> >
> > Unfortunately I can't get something out from the profiler, actually I
> > do but I guess it doesn't show what is supposed to show... I will keep
> > on trying this. Anyway yesterday I just thought that the problem might
> > be due to some over usage of some OSDs. I was thinking that the
> > distribution of the primary OSD might be uneven, this could have
> > explained that some memory leaks are more important with some servers.
> > At the end, the repartition seems even but while looking at the pg
> > dump I found something interesting in the scrub column, timestamps
> > from the last scrubbing operation matched with times showed on the
> > graph.
> >
> > After this, I made some calculation, I compared the total number of
> > scrubbing operation with the time range where memory leaks occurred.
> > First of all check my setup:
> >
> > root@c2-ceph-01 ~ # ceph osd tree
> > dumped osdmap tree epoch 859
> > # id weight type name up/down reweight
> > -1 12 pool default
> > -3 12 rack lc2_rack33
> > -2 3 host c2-ceph-01
> > 0 1 osd.0 up 1
> > 1 1 osd.1 up 1
> > 2 1 osd.2 up 1
> > -4 3 host c2-ceph-04
> > 10 1 osd.10 up 1
> > 11 1 osd.11 up 1
> > 9 1 osd.9 up 1
> > -5 3 host c2-ceph-02
> > 3 1 osd.3 up 1
> > 4 1 osd.4 up 1
> > 5 1 osd.5 up 1
> > -6 3 host c2-ceph-03
> > 6 1 osd.6 up 1
> > 7 1 osd.7 up 1
> > 8 1 osd.8 up 1
> >
> >
> > And there are the results:
> >
> > * Ceph node 1 which has the most important memory leak performed 1608
> > in total and 1059 during the time range where memory leaks occured
> > * Ceph node 2, 1168 in total and 776 during the time range where
> > memory leaks occured
> > * Ceph node 3, 940 in total and 94 during the time range where memory
> > leaks occurred
> > * Ceph node 4, 899 in total and 191 during the time range where
> > memory leaks occurred
> >
> > I'm still not entirely sure that the scrub operation causes the leak
> > but the only relevant relation that I found...
> >
> > Could it be that the scrubbing process doesn't release memory? Btw I
> > was wondering, how ceph decides at what time it should run the
> > scrubbing operation? I know that it's once a day and control by the
> > following options
> >
> > OPTION(osd_scrub_min_interval, OPT_FLOAT, 300)
> > OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24)
> >
> > But how ceph determined the time where the operation started, during
> > cluster creation probably?
> >
> > I just checked the options that control OSD scrubbing and found that by 
> > default:
> >
> > OPTION(osd_max_scrubs, OPT_IN

Re: OSD memory leaks?

2013-01-07 Thread Samuel Just
Awesome!  What version are you running (ceph-osd -v, include the hash)?
-Sam

On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano  wrote:
> This failed the first time I sent it, so I'm resending in plain text.
>
> Dave Spano
> Optogenics
> Systems Administrator
>
>
>
> - Original Message -
>
> From: "Dave Spano" 
> To: "Sébastien Han" 
> Cc: "ceph-devel" , "Samuel Just" 
> 
> Sent: Monday, January 7, 2013 12:40:06 PM
> Subject: Re: OSD memory leaks?
>
>
> Sam,
>
> Attached are some heaps that I collected today. 001 and 003 are just after I 
> started the profiler; 011 is the most recent. If you need more, or anything 
> different let me know. Already the OSD in question is at 38% memory usage. As 
> mentioned by Sèbastien, restarting ceph-osd keeps things going.
>
> Not sure if this is helpful information, but out of the two OSDs that I have 
> running, the first one (osd.0) is the one that develops this problem the 
> quickest. osd.1 does have the same issue, it just takes much longer. Do the 
> monitors hit the first osd in the list first, when there's activity?
>
>
> Dave Spano
> Optogenics
> Systems Administrator
>
>
> - Original Message -
>
> From: "Sébastien Han" 
> To: "Samuel Just" 
> Cc: "ceph-devel" 
> Sent: Friday, January 4, 2013 10:20:58 AM
> Subject: Re: OSD memory leaks?
>
> Hi Sam,
>
> Thanks for your answer and sorry the late reply.
>
> Unfortunately I can't get something out from the profiler, actually I
> do but I guess it doesn't show what is supposed to show... I will keep
> on trying this. Anyway yesterday I just thought that the problem might
> be due to some over usage of some OSDs. I was thinking that the
> distribution of the primary OSD might be uneven, this could have
> explained that some memory leaks are more important with some servers.
> At the end, the repartition seems even but while looking at the pg
> dump I found something interesting in the scrub column, timestamps
> from the last scrubbing operation matched with times showed on the
> graph.
>
> After this, I made some calculation, I compared the total number of
> scrubbing operation with the time range where memory leaks occurred.
> First of all check my setup:
>
> root@c2-ceph-01 ~ # ceph osd tree
> dumped osdmap tree epoch 859
> # id weight type name up/down reweight
> -1 12 pool default
> -3 12 rack lc2_rack33
> -2 3 host c2-ceph-01
> 0 1 osd.0 up 1
> 1 1 osd.1 up 1
> 2 1 osd.2 up 1
> -4 3 host c2-ceph-04
> 10 1 osd.10 up 1
> 11 1 osd.11 up 1
> 9 1 osd.9 up 1
> -5 3 host c2-ceph-02
> 3 1 osd.3 up 1
> 4 1 osd.4 up 1
> 5 1 osd.5 up 1
> -6 3 host c2-ceph-03
> 6 1 osd.6 up 1
> 7 1 osd.7 up 1
> 8 1 osd.8 up 1
>
>
> And there are the results:
>
> * Ceph node 1 which has the most important memory leak performed 1608
> in total and 1059 during the time range where memory leaks occured
> * Ceph node 2, 1168 in total and 776 during the time range where
> memory leaks occured
> * Ceph node 3, 940 in total and 94 during the time range where memory
> leaks occurred
> * Ceph node 4, 899 in total and 191 during the time range where
> memory leaks occurred
>
> I'm still not entirely sure that the scrub operation causes the leak
> but the only relevant relation that I found...
>
> Could it be that the scrubbing process doesn't release memory? Btw I
> was wondering, how ceph decides at what time it should run the
> scrubbing operation? I know that it's once a day and control by the
> following options
>
> OPTION(osd_scrub_min_interval, OPT_FLOAT, 300)
> OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24)
>
> But how ceph determined the time where the operation started, during
> cluster creation probably?
>
> I just checked the options that control OSD scrubbing and found that by 
> default:
>
> OPTION(osd_max_scrubs, OPT_INT, 1)
>
> So that might explain why only one OSD uses a lot of memory.
>
> My dirty workaround at the moment is to performed a check of memory
> use by every OSD and restart it if it uses more than 25% of the total
> memory. Also note that on ceph 1, 3 and 4 it's always one OSD that
> uses a lot of memory, for ceph 2 only the mem usage is high but almost
> the same for all the OSD process.
>
> Thank you in advance.
>
> --
> Regards,
> Sébastien Han.
>
>
> On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just  wrote:
>>
>> Sorry, it's been very busy. The next step would to try to get a heap
>> dump. You can start a heap profile on osd N by:
>>
>> ceph osd tell N heap start_prof

Re: OSD memory leaks?

2013-01-04 Thread Sébastien Han
Hi Sam,

Thanks for your answer, and sorry for the late reply.

Unfortunately I can't get anything useful out of the profiler. Actually I
do get output, but I don't think it shows what it is supposed to show... I
will keep trying. Anyway, yesterday I started to wonder whether the problem
might be caused by over-use of some OSDs. I thought the distribution of
primary OSDs might be uneven, which could have explained why the memory
leaks are worse on some servers. In the end the distribution looks even,
but while looking at the pg dump I found something interesting in the
scrub column: the timestamps of the last scrubbing operations match the
times shown on the graph.
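A rough way to pull that out of the pg dump (just a sketch; the column
layout differs between versions, so this simply greps and counts the
last-scrub timestamps by day):

ceph pg dump > /tmp/pgdump.txt
# count how many PGs were last scrubbed on each day
grep -o '2013-01-[0-9][0-9]' /tmp/pgdump.txt | sort | uniq -c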

After this I did some calculations: I compared the total number of
scrubbing operations with the time range in which the memory leaks
occurred. First of all, here is my setup:

root@c2-ceph-01 ~ # ceph osd tree
dumped osdmap tree epoch 859
# id weight type name up/down reweight
-1 12 pool default
-3 12 rack lc2_rack33
-2 3 host c2-ceph-01
0 1 osd.0 up 1
1 1 osd.1 up 1
2 1 osd.2 up 1
-4 3 host c2-ceph-04
10 1 osd.10 up 1
11 1 osd.11 up 1
9 1 osd.9 up 1
-5 3 host c2-ceph-02
3 1 osd.3 up 1
4 1 osd.4 up 1
5 1 osd.5 up 1
-6 3 host c2-ceph-03
6 1 osd.6 up 1
7 1 osd.7 up 1
8 1 osd.8 up 1


And here are the results:

* Ceph node 1, which has the largest memory leak, performed 1608 scrubs
in total, 1059 of them during the time range where the memory leaks
occurred
* Ceph node 2: 1168 in total, 776 during the time range where the memory
leaks occurred
* Ceph node 3: 940 in total, 94 during the time range where the memory
leaks occurred
* Ceph node 4: 899 in total, 191 during the time range where the memory
leaks occurred

I'm still not entirely sure that the scrub operation causes the leak, but
it's the only relevant correlation I have found...

Could it be that the scrubbing process doesn't release memory? Btw, I was
wondering how Ceph decides at what time it should run the scrubbing
operation. I know that it runs once a day and is controlled by the
following options:

OPTION(osd_scrub_min_interval, OPT_FLOAT, 300)
OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24)

But how does Ceph determine the time at which the operation starts?
Probably at cluster creation?

I just checked the options that control OSD scrubbing and found that by default:

OPTION(osd_max_scrubs, OPT_INT, 1)

So that might explain why only one OSD uses a lot of memory.
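For reference, those settings expressed as a ceph.conf snippet (a sketch of
where one would tune them, using the defaults above; the intervals are in
seconds, so 60*60*24 = 86400):

[osd]
    osd scrub min interval = 300
    osd scrub max interval = 86400
    osd max scrubs = 1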

My dirty workaround at the moment is to check the memory use of every OSD
and restart it if it uses more than 25% of the total memory. Also note that
on ceph nodes 1, 3 and 4 it's always a single OSD that uses a lot of
memory, while on ceph node 2 the memory usage is high but roughly the same
for all the OSD processes.
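Roughly, that check looks something like the sketch below (assuming
sysvinit-managed OSDs started with "-i <id>"; the restart command and the
threshold would need adjusting to your setup):

#!/bin/sh
# restart any ceph-osd process using more than THRESHOLD percent of memory
THRESHOLD=25
ps -eo pmem,args | grep '[c]eph-osd' | while read mem args; do
    # pull the osd id out of the "-i <id>" argument
    id=$(echo "$args" | sed -n 's/.*-i *\([0-9][0-9]*\).*/\1/p')
    over=$(awk -v m="$mem" -v t="$THRESHOLD" 'BEGIN { print (m > t) ? 1 : 0 }')
    if [ -n "$id" ] && [ "$over" -eq 1 ]; then
        echo "osd.$id is at ${mem}% memory, restarting"
        service ceph restart osd."$id"
    fi
done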

Thank you in advance.

--
Regards,
Sébastien Han.


On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just  wrote:
>
> Sorry, it's been very busy.  The next step would to try to get a heap
> dump.  You can start a heap profile on osd N by:
>
> ceph osd tell N heap start_profiler
>
> and you can get it to dump the collected profile using
>
> ceph osd tell N heap dump.
>
> The dumps should show up in the osd log directory.
>
> Assuming the heap profiler is working correctly, you can look at the
> dump using pprof in google-perftools.
>
> On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han  
> wrote:
> > No more suggestions? :(
> > --
> > Regards,
> > Sébastien Han.
> >
> >
> > On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han  
> > wrote:
> >> Nothing terrific...
> >>
> >> Kernel logs from my clients are full of "libceph: osd4
> >> 172.20.11.32:6801 socket closed"
> >>
> >> I saw this somewhere on the tracker.
> >>
> >> Does this harm?
> >>
> >> Thanks.
> >>
> >> --
> >> Regards,
> >> Sébastien Han.
> >>
> >>
> >>
> >> On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just  wrote:
> >>>
> >>> What is the workload like?
> >>> -Sam
> >>>
> >>> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han  
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > No, I don't see nothing abnormal in the network stats. I don't see
> >>> > anything in the logs... :(
> >>> > The weird thing is that one node over 4 seems to take way more memory
> >>> > than the others...
> >>> >
> >>> > --
> >>> > Regards,
> >>> > Sébastien Han.
> >>> >
> >>> >
> >>> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han 
> >>> >  wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> No, I don't see nothing abnormal in the network stats. I don't see 
> >>> >> anything in the logs... :(
> >>> >> The weird thing is that one node over 4 seems to take way more memory 
> >>> >> than the others...
> >>> >>
> >>> >> --
> >>> >> Regards,
> >>> >> Sébastien Han.
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  
> >>> >> wrote:
> >>> >>>
> >>> >>> Are you having network hiccups?  There was a bug noticed recently that
> >>> >>> could cause a memory leak if nodes are being marked up and down.
> >>> >>> -Sam
> >>> >>>
> >>> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han 
> >>> >>>  wrote:
> >>> >>> > Hi guys,
> >>> >>> 

Re: OSD memory leaks?

2012-12-19 Thread Samuel Just
Sorry, it's been very busy.  The next step would be to try to get a heap
dump.  You can start a heap profile on osd N by:

ceph osd tell N heap start_profiler

and you can get it to dump the collected profile using

ceph osd tell N heap dump.

The dumps should show up in the osd log directory.

Assuming the heap profiler is working correctly, you can look at the
dump using pprof in google-perftools.
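Roughly, the whole round trip looks like this (a sketch; the dump file name
and path are assumptions based on the default log directory):

ceph osd tell 0 heap start_profiler
# ... let it run while the memory grows ...
ceph osd tell 0 heap dump
ceph osd tell 0 heap stop_profiler
# then inspect the dump written to the osd log directory, e.g.:
pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap | head -30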

On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han  wrote:
> No more suggestions? :(
> --
> Regards,
> Sébastien Han.
>
>
> On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han  
> wrote:
>> Nothing terrific...
>>
>> Kernel logs from my clients are full of "libceph: osd4
>> 172.20.11.32:6801 socket closed"
>>
>> I saw this somewhere on the tracker.
>>
>> Does this harm?
>>
>> Thanks.
>>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>>
>> On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just  wrote:
>>>
>>> What is the workload like?
>>> -Sam
>>>
>>> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han  
>>> wrote:
>>> > Hi,
>>> >
>>> > No, I don't see nothing abnormal in the network stats. I don't see
>>> > anything in the logs... :(
>>> > The weird thing is that one node over 4 seems to take way more memory
>>> > than the others...
>>> >
>>> > --
>>> > Regards,
>>> > Sébastien Han.
>>> >
>>> >
>>> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han  
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> No, I don't see nothing abnormal in the network stats. I don't see 
>>> >> anything in the logs... :(
>>> >> The weird thing is that one node over 4 seems to take way more memory 
>>> >> than the others...
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Sébastien Han.
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  
>>> >> wrote:
>>> >>>
>>> >>> Are you having network hiccups?  There was a bug noticed recently that
>>> >>> could cause a memory leak if nodes are being marked up and down.
>>> >>> -Sam
>>> >>>
>>> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han 
>>> >>>  wrote:
>>> >>> > Hi guys,
>>> >>> >
>>> >>> > Today looking at my graphs I noticed that one over 4 ceph nodes used a
>>> >>> > lot of memory. It keeps growing and growing.
>>> >>> > See the graph attached to this mail.
>>> >>> > I run 0.48.2 on Ubuntu 12.04.
>>> >>> >
>>> >>> > The other nodes also grow, but slowly than the first one.
>>> >>> >
>>> >>> > I'm not quite sure about the information that I have to provide. So
>>> >>> > let me know. The only thing I can say is that the load haven't
>>> >>> > increase that much this week. It seems to be consuming and not giving
>>> >>> > back the memory.
>>> >>> >
>>> >>> > Thank you in advance.
>>> >>> >
>>> >>> > --
>>> >>> > Regards,
>>> >>> > Sébastien Han.
>>> >>
>>> >>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD memory leaks?

2012-12-19 Thread Sébastien Han
No more suggestions? :(
--
Regards,
Sébastien Han.


On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han  wrote:
> Nothing terrific...
>
> Kernel logs from my clients are full of "libceph: osd4
> 172.20.11.32:6801 socket closed"
>
> I saw this somewhere on the tracker.
>
> Does this harm?
>
> Thanks.
>
> --
> Regards,
> Sébastien Han.
>
>
>
> On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just  wrote:
>>
>> What is the workload like?
>> -Sam
>>
>> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han  
>> wrote:
>> > Hi,
>> >
>> > No, I don't see nothing abnormal in the network stats. I don't see
>> > anything in the logs... :(
>> > The weird thing is that one node over 4 seems to take way more memory
>> > than the others...
>> >
>> > --
>> > Regards,
>> > Sébastien Han.
>> >
>> >
>> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han  
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> No, I don't see nothing abnormal in the network stats. I don't see 
>> >> anything in the logs... :(
>> >> The weird thing is that one node over 4 seems to take way more memory 
>> >> than the others...
>> >>
>> >> --
>> >> Regards,
>> >> Sébastien Han.
>> >>
>> >>
>> >>
>> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  wrote:
>> >>>
>> >>> Are you having network hiccups?  There was a bug noticed recently that
>> >>> could cause a memory leak if nodes are being marked up and down.
>> >>> -Sam
>> >>>
>> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han 
>> >>>  wrote:
>> >>> > Hi guys,
>> >>> >
>> >>> > Today looking at my graphs I noticed that one over 4 ceph nodes used a
>> >>> > lot of memory. It keeps growing and growing.
>> >>> > See the graph attached to this mail.
>> >>> > I run 0.48.2 on Ubuntu 12.04.
>> >>> >
>> >>> > The other nodes also grow, but slowly than the first one.
>> >>> >
>> >>> > I'm not quite sure about the information that I have to provide. So
>> >>> > let me know. The only thing I can say is that the load haven't
>> >>> > increase that much this week. It seems to be consuming and not giving
>> >>> > back the memory.
>> >>> >
>> >>> > Thank you in advance.
>> >>> >
>> >>> > --
>> >>> > Regards,
>> >>> > Sébastien Han.
>> >>
>> >>


Re: OSD memory leaks?

2012-12-18 Thread Sébastien Han
Nothing special...

Kernel logs from my clients are full of "libceph: osd4
172.20.11.32:6801 socket closed"

I saw this somewhere on the tracker.

Does this do any harm?

Thanks.

--
Regards,
Sébastien Han.



On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just  wrote:
>
> What is the workload like?
> -Sam
>
> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han  
> wrote:
> > Hi,
> >
> > No, I don't see nothing abnormal in the network stats. I don't see
> > anything in the logs... :(
> > The weird thing is that one node over 4 seems to take way more memory
> > than the others...
> >
> > --
> > Regards,
> > Sébastien Han.
> >
> >
> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han  
> > wrote:
> >>
> >> Hi,
> >>
> >> No, I don't see nothing abnormal in the network stats. I don't see 
> >> anything in the logs... :(
> >> The weird thing is that one node over 4 seems to take way more memory than 
> >> the others...
> >>
> >> --
> >> Regards,
> >> Sébastien Han.
> >>
> >>
> >>
> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  wrote:
> >>>
> >>> Are you having network hiccups?  There was a bug noticed recently that
> >>> could cause a memory leak if nodes are being marked up and down.
> >>> -Sam
> >>>
> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han  
> >>> wrote:
> >>> > Hi guys,
> >>> >
> >>> > Today looking at my graphs I noticed that one over 4 ceph nodes used a
> >>> > lot of memory. It keeps growing and growing.
> >>> > See the graph attached to this mail.
> >>> > I run 0.48.2 on Ubuntu 12.04.
> >>> >
> >>> > The other nodes also grow, but slowly than the first one.
> >>> >
> >>> > I'm not quite sure about the information that I have to provide. So
> >>> > let me know. The only thing I can say is that the load haven't
> >>> > increase that much this week. It seems to be consuming and not giving
> >>> > back the memory.
> >>> >
> >>> > Thank you in advance.
> >>> >
> >>> > --
> >>> > Regards,
> >>> > Sébastien Han.
> >>
> >>


Re: OSD memory leaks?

2012-12-17 Thread Samuel Just
What is the workload like?
-Sam

On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han  wrote:
> Hi,
>
> No, I don't see nothing abnormal in the network stats. I don't see
> anything in the logs... :(
> The weird thing is that one node over 4 seems to take way more memory
> than the others...
>
> --
> Regards,
> Sébastien Han.
>
>
> On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han  
> wrote:
>>
>> Hi,
>>
>> No, I don't see nothing abnormal in the network stats. I don't see anything 
>> in the logs... :(
>> The weird thing is that one node over 4 seems to take way more memory than 
>> the others...
>>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>>
>> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  wrote:
>>>
>>> Are you having network hiccups?  There was a bug noticed recently that
>>> could cause a memory leak if nodes are being marked up and down.
>>> -Sam
>>>
>>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han  
>>> wrote:
>>> > Hi guys,
>>> >
>>> > Today looking at my graphs I noticed that one over 4 ceph nodes used a
>>> > lot of memory. It keeps growing and growing.
>>> > See the graph attached to this mail.
>>> > I run 0.48.2 on Ubuntu 12.04.
>>> >
>>> > The other nodes also grow, but slowly than the first one.
>>> >
>>> > I'm not quite sure about the information that I have to provide. So
>>> > let me know. The only thing I can say is that the load haven't
>>> > increase that much this week. It seems to be consuming and not giving
>>> > back the memory.
>>> >
>>> > Thank you in advance.
>>> >
>>> > --
>>> > Regards,
>>> > Sébastien Han.
>>
>>


Re: OSD memory leaks?

2012-12-17 Thread Sébastien Han
Hi,

No, I don't see anything abnormal in the network stats, and I don't see
anything in the logs... :(
The weird thing is that one node out of the 4 seems to use way more memory
than the others...

--
Regards,
Sébastien Han.


On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han  wrote:
>
> Hi,
>
> No, I don't see nothing abnormal in the network stats. I don't see anything 
> in the logs... :(
> The weird thing is that one node over 4 seems to take way more memory than 
> the others...
>
> --
> Regards,
> Sébastien Han.
>
>
>
> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  wrote:
>>
>> Are you having network hiccups?  There was a bug noticed recently that
>> could cause a memory leak if nodes are being marked up and down.
>> -Sam
>>
>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han  
>> wrote:
>> > Hi guys,
>> >
>> > Today looking at my graphs I noticed that one over 4 ceph nodes used a
>> > lot of memory. It keeps growing and growing.
>> > See the graph attached to this mail.
>> > I run 0.48.2 on Ubuntu 12.04.
>> >
>> > The other nodes also grow, but slowly than the first one.
>> >
>> > I'm not quite sure about the information that I have to provide. So
>> > let me know. The only thing I can say is that the load haven't
>> > increase that much this week. It seems to be consuming and not giving
>> > back the memory.
>> >
>> > Thank you in advance.
>> >
>> > --
>> > Regards,
>> > Sébastien Han.
>
>


Re: OSD memory leaks?

2012-12-17 Thread Samuel Just
Are you having network hiccups?  There was a bug noticed recently that
could cause a memory leak if nodes are being marked up and down.
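A quick way to check for that kind of flapping (a rough sketch; the log
paths assume the default /var/log/ceph locations):

# OSDs that were marked down and came back log this:
grep -c 'wrongly marked me down' /var/log/ceph/*osd*.log
# the monitors' cluster log also records up/down transitions:
grep -E 'osd\.[0-9]+ .* (failed|boot)' /var/log/ceph/ceph.log | tail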
-Sam

On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han  wrote:
> Hi guys,
>
> Today looking at my graphs I noticed that one over 4 ceph nodes used a
> lot of memory. It keeps growing and growing.
> See the graph attached to this mail.
> I run 0.48.2 on Ubuntu 12.04.
>
> The other nodes also grow, but slowly than the first one.
>
> I'm not quite sure about the information that I have to provide. So
> let me know. The only thing I can say is that the load haven't
> increase that much this week. It seems to be consuming and not giving
> back the memory.
>
> Thank you in advance.
>
> --
> Regards,
> Sébastien Han.