Re: [Lustre-discuss] Out of Memory on MDS

2009-06-24 Thread Andreas Dilger
On Jun 24, 2009  10:20 -0400, Roger Spellman wrote:
> Thanks.  I've opened a bug,
> https://bugzilla.lustre.org/show_bug.cgi?id=19950
> 
> In that bug I show that the number of ldlm_locks exceeds the limit
> reported by lctl get_param ldlm.namespaces.*.pool.limit.
> 
> Do you agree that if we write to lru_size on every client, then that
> will set a limit on ldlm_locks on the servers?

Indirectly, yes.  The lru_size imposes a hard limit for the per-client
lock count.  The number of locks on the server will be limited to:

{number of OST/MDT on node} * {number of clients} * {client lru_size}
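
As a rough sketch of the bound above (an illustration, not something run against a real server), the product can be computed directly, and inverted to pick a per-client lru_size for a given lock budget. The function names and the example figures (one MDT, 200 clients, lru_size of 1000) are hypothetical; only the formula and the 204800 figure come from this thread.

```python
def max_server_locks(targets_on_node: int, clients: int, client_lru_size: int) -> int:
    """Upper bound on server-side locks:
    {number of OST/MDT on node} * {number of clients} * {client lru_size}."""
    return targets_on_node * clients * client_lru_size


def lru_size_for_budget(targets_on_node: int, clients: int, lock_budget: int) -> int:
    """Largest per-client lru_size that keeps the bound at or below lock_budget."""
    return lock_budget // (targets_on_node * clients)


if __name__ == "__main__":
    # Hypothetical site: 1 MDT on the node, 200 clients, lru_size=1000 on each client.
    print(max_server_locks(1, 200, 1000))
    # Hypothetical budget: the 204800-lock figure quoted elsewhere in this thread.
    print(lru_size_for_budget(1, 200, 204800))
```

The inverse function rounds down, so the resulting bound never exceeds the budget.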

> Do you know of a way to limit ldiskfs_inode_cache?

The inode cache is managed by the kernel.  

> > -----Original Message-----
> > From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of
> > Andreas Dilger
> > Sent: Tuesday, June 23, 2009 6:07 PM
> > To: Roger Spellman
> > Cc: cliff.wh...@sun.com; lustre-discuss@lists.lustre.org
> > Subject: Re: [Lustre-discuss] Out of Memory on MDS
> > 
> > On Jun 23, 2009  16:50 -0400, Roger Spellman wrote:
> > > The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.
> > >
> > > Is lru_size an upper limit on the number of entries?
> > >
> > > Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
> > > and it did not clean anything up.
> > 
> > This is worth filing a bug on, if it isn't working.  The lock LRU size
> > should be limited by the size of the RAM.  The upper limit on the number
> > of locks being granted can be gotten via:
> > 
> > lctl get_param ldlm.namespaces.*.pool.limit
> > 
> > The default limit should be 50 locks per 1MB of RAM.  In your case,
> > 4GB is 4096MB, so the LRU limit should be 50 * 4096 = 204800 locks.
> > 
> > > > Roger Spellman wrote:
> > > > > I have an MDS that is crashing with out-of-memory.
> > > > >
> > > > > Prior to the crash, I started collecting /proc/slabinfo.  I see that
> > > > > ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a
> > > > > total of 2.2GB, which is more than half my RAM.
> > > > >
> > > > > Is there a way to limit this?
> > > >
> > > > You don't mention the version of Lustre - lru_size might have an
> > > > impact, I am not certain. I believe it is the only lock tuneable
> > > > of note. (and is auto-sized in recent Lustre)
> > 
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Out of Memory on MDS

2009-06-24 Thread Roger Spellman
Andreas,

Thanks.  I've opened a bug,
https://bugzilla.lustre.org/show_bug.cgi?id=19950

In that bug I show that the number of ldlm_locks exceeds the limit
reported by lctl get_param ldlm.namespaces.*.pool.limit.

Do you agree that if we write to lru_size on every client, then that
will set a limit on ldlm_locks on the servers?

Do you know of a way to limit ldiskfs_inode_cache?

Thanks.

-Roger

> -----Original Message-----
> From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of
> Andreas Dilger
> Sent: Tuesday, June 23, 2009 6:07 PM
> To: Roger Spellman
> Cc: cliff.wh...@sun.com; lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] Out of Memory on MDS
> 
> On Jun 23, 2009  16:50 -0400, Roger Spellman wrote:
> > The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.
> >
> > Is lru_size an upper limit on the number of entries?
> >
> > Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
> > and it did not clean anything up.
> 
> This is worth filing a bug on, if it isn't working.  The lock LRU size
> should be limited by the size of the RAM.  The upper limit on the number
> of locks being granted can be gotten via:
> 
>   lctl get_param ldlm.namespaces.*.pool.limit
> 
> The default limit should be 50 locks per 1MB of RAM.  In your case,
> 4GB is 4096MB, so the LRU limit should be 50 * 4096 = 204800 locks.
> 
> > > Roger Spellman wrote:
> > > > I have an MDS that is crashing with out-of-memory.
> > > >
> > > > Prior to the crash, I started collecting /proc/slabinfo.  I see that
> > > > ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a
> > > > total of 2.2GB, which is more than half my RAM.
> > > >
> > > > Is there a way to limit this?
> > >
> > > You don't mention the version of Lustre - lru_size might have an
> > > impact, I am not certain. I believe it is the only lock tuneable
> > > of note. (and is auto-sized in recent Lustre)
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.



Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Andreas Dilger
On Jun 23, 2009  16:50 -0400, Roger Spellman wrote:
> The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.
> 
> Is lru_size an upper limit on the number of entries?
> 
> Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
> and it did not clean anything up.

This is worth filing a bug on, if it isn't working.  The lock LRU size
should be limited by the size of the RAM.  The upper limit on the number
of locks being granted can be gotten via:

lctl get_param ldlm.namespaces.*.pool.limit

The default limit should be 50 locks per 1MB of RAM.  In your case,
4GB is 4096MB, so the LRU limit should be 50 * 4096 = 204800 locks.
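
The arithmetic above can be checked with a short sketch. It assumes only the "50 locks per 1MB of RAM" default stated here and compares the result against the 4,500,000 ldlm_locks reported from /proc/slabinfo earlier in the thread; the constant and function names are hypothetical.

```python
LOCKS_PER_MB = 50  # default stated in this thread, not read from a live system


def default_pool_limit(ram_mb: int) -> int:
    """Default lock pool limit for a node with ram_mb megabytes of RAM."""
    return LOCKS_PER_MB * ram_mb


if __name__ == "__main__":
    limit = default_pool_limit(4096)   # 4GB node, as in the message above
    observed = 4_500_000               # ldlm_locks count reported from slabinfo
    print(limit)                       # 204800
    print(f"observed/limit = {observed / limit:.2f}")
```

If the reported count is accurate, it is roughly 22 times the expected limit, which is consistent with treating this as a bug.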

> > Roger Spellman wrote:
> > > I have an MDS that is crashing with out-of-memory.
> > >
> > > Prior to the crash, I started collecting /proc/slabinfo.  I see that
> > > ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a
> > > total of 2.2GB, which is more than half my RAM.
> > >
> > > Is there a way to limit this?
> > 
> > You don't mention the version of Lustre - lru_size might have an
> > impact, I am not certain. I believe it is the only lock tuneable
> > of note. (and is auto-sized in recent Lustre)

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Roger Spellman
Thanks Cliff.

The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.

Is lru_size an upper limit on the number of entries?

Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
and it did not clean anything up.

-Roger

> 
> Roger Spellman wrote:
> > I have an MDS that is crashing with out-of-memory.
> >
> > Prior to the crash, I started collecting /proc/slabinfo.  I see that
> > ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of
> > 2.2GB, which is more than half my RAM.
> >
> > Is there a way to limit this?
> 
> You don't mention the version of Lustre - lru_size might have an impact,
> I am not certain. I believe it is the only lock tuneable of note. (and
> is auto-sized in recent Lustre)
> 
> cliffw
> 
> >



Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Cliff White
Roger Spellman wrote:
> I have an MDS that is crashing with out-of-memory.
> 
> Prior to the crash, I started collecting /proc/slabinfo.  I see that
> ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of
> 2.2GB, which is more than half my RAM.
> 
> Is there a way to limit this?

You don't mention the version of Lustre - lru_size might have an impact,
I am not certain. I believe it is the only lock tuneable of note. (and
is auto-sized in recent Lustre)

cliffw

> Other heavy memory users are ldiskfs_inode_cache (421 MB) and
> ldlm_resources (137 MB).  Is there a way to limit these too?
> 
> Thanks.
> 
> Roger Spellman
> Staff Engineer
> Terascala, Inc.
> 508-588-1501
> www.terascala.com



Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Seger, Mark
If you're looking to collect slab info, collectl excels at this.  Just
download it from http://collectl.sourceforge.net/, install it, and run
"/etc/init.d/collectl start"; it will collect MDS stats every 10 seconds and
slab stats every minute (easily changeable) along with a ton of other stats.
You can then play back the recorded data showing slabs that changed during each
interval or even report the top-n slabs (default=10) sorted by a variety of
fields (collectl --showtopopts for help on top options).
-mark

From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Roger Spellman
Sent: Tuesday, June 23, 2009 12:56 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Out of Memory on MDS

I have an MDS that is crashing with out-of-memory.

Prior to the crash, I started collecting /proc/slabinfo.  I see that ldlm_locks 
is up to 4,500,000, and each one is 512 bytes, for a total of 2.2GB, which is 
more than half my RAM.

Is there a way to limit this?

Other heavy memory users are ldiskfs_inode_cache (421 MB) and ldlm_resources
(137 MB).  Is there a way to limit these too?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com



[Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Roger Spellman
I have an MDS that is crashing with out-of-memory.

Prior to the crash, I started collecting /proc/slabinfo.  I see that
ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of
2.2GB, which is more than half my RAM.
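
For reference, a minimal sketch of how such a per-cache total might be derived from slabinfo text, assuming the version 2.1 column layout (name, active_objs, num_objs, objsize, ...). The sample line is hypothetical except for the object count and size quoted above, and the function name is an illustration only.

```python
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ldlm_locks 4500000 4500000 512 8 1 : tunables 54 27 8 : slabdata 562500 562500 0
"""


def slab_usage_bytes(slabinfo_text: str) -> dict:
    """Map each slab cache name to num_objs * objsize, in bytes."""
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version banner, and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


if __name__ == "__main__":
    for name, size in slab_usage_bytes(SAMPLE).items():
        print(f"{name}: {size} bytes ({size / 2**30:.2f} GiB)")
```

On a live node the same function could be fed the contents of /proc/slabinfo (reading it usually requires root). For the sample line it gives 2,304,000,000 bytes, about 2.15 GiB, in line with the 2.2GB figure.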

 

Is there a way to limit this?

Other heavy memory users are ldiskfs_inode_cache (421 MB) and
ldlm_resources (137 MB).  Is there a way to limit these too?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
