Re: [Lustre-discuss] Out of Memory on MDS

2009-06-24 Thread Andreas Dilger
On Jun 24, 2009  10:20 -0400, Roger Spellman wrote:
 Thanks.  I've opened a bug,
 https://bugzilla.lustre.org/show_bug.cgi?id=19950
 
 In that bug I show that the number of ldlm_locks exceeds the limit
 reported by lctl get_param ldlm.namespaces.*.pool.limit.
 
 Do you agree that if we write to lru_size on every client, then that
 will set a limit on ldlm_locks on the servers?

Indirectly, yes.  The lru_size imposes a hard limit on the per-client
lock count.  The number of locks on the server will be limited to:

{number of OST/MDT on node} * {number of clients} * {client lru_size}
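For example, capping every client at 600 locks per namespace (600 is just an
illustrative number) would look roughly like this, run on each client:

  # hard-limit the lock LRU to 600 entries per namespace on this client
  lctl set_param ldlm.namespaces.*.lru_size=600

  # with, say, 100 clients and a single MDT on the server, the server-side
  # ceiling would then be about 1 * 100 * 600 = 60000 locks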

 Do you know of a way to limit ldiskfs_inode_cache?

The inode cache is managed by the kernel.  
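About the closest thing to a limit is the generic VM tuning, e.g. (values
purely illustrative; this only adds reclaim pressure, it is not a hard cap):

  # make the VM reclaim dentries/inodes more aggressively (default is 100)
  sysctl -w vm.vfs_cache_pressure=200

  # or drop clean dentry/inode caches once, by hand
  echo 2 > /proc/sys/vm/drop_caches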

  -----Original Message-----
  From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of Andreas Dilger
  Sent: Tuesday, June 23, 2009 6:07 PM
  To: Roger Spellman
  Cc: cliff.wh...@sun.com; lustre-discuss@lists.lustre.org
  Subject: Re: [Lustre-discuss] Out of Memory on MDS
  
  On Jun 23, 2009  16:50 -0400, Roger Spellman wrote:
   The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.
  
   Is lru_size an upper limit on the number of entries?
  
   Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
   and it did not clean anything up.
  
  This is worth filing a bug on, if it isn't working.  The lock LRU size
  should be limited by the size of the RAM.  The upper limit on the number
  of locks being granted can be gotten via:
  
  lctl get_param ldlm.namespaces.*.pool.limit
  
  The default limit should be 50 locks per 1MB of RAM.  In your case,
  4GB is 4096MB, so the LRU limit should be 50 * 4096 = 204800 locks.
  
Roger Spellman wrote:
 I have an MDS that is crashing with out-of-memory.

 Prior to the crash, I started collecting /proc/slabinfo.  I see
 that
 ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a
 total of 2.2GB, which is more than half my RAM.

 Is there a way to limit this?
   
You don't mention the version of Lustre - lru_size might have an
impact, I am not certain. I believe it is the only lock tuneable
of note. (and is auto-sized in recent Lustre)
  
  Cheers, Andreas
  --
  Andreas Dilger
  Sr. Staff Engineer, Lustre Group
  Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Roger Spellman
I have an MDS that is crashing with out-of-memory.

 

Prior to the crash, I started collecting /proc/slabinfo.  I see that
ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of
2.2GB, which is more than half my RAM.

 

Is there a way to limit this?

 

Other heavy memory users are ldiskfs_inode_cache (421 MB) and
ldlm_resources (137 MB).  Is there a way to limit these too?
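For reference, the numbers above come straight from /proc/slabinfo; an awk
one-liner along these lines reproduces the per-slab totals (roughly, since it
ignores per-slab overhead):

  # bytes per slab cache ~= num_objs * objsize (columns 3 and 4 of /proc/slabinfo)
  awk '/^(ldlm_locks|ldlm_resources|ldiskfs_inode_cache) / {printf "%-22s %6.0f MB\n", $1, $3*$4/1048576}' /proc/slabinfo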

 

Thanks.

 

Roger Spellman

Staff Engineer

Terascala, Inc.

508-588-1501

www.terascala.com

 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Seger, Mark
If you're looking to collect slab info, collectl excels at this.  Just
download it from http://collectl.sourceforge.net/, install it, and run
/etc/init.d/collectl start; it will collect MDS stats every 10 seconds and
slab stats every minute (easily changeable) along with a ton of other stats.
You can then play back the recorded data showing slabs that changed during each
interval, or even report the top-n slabs (default=10) sorted by a variety of
fields (collectl --showtopopts for help on top options).
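For example, something like this should work (option letters are from the
collectl docs; the playback file name is just a placeholder):

  # interactive: print slab detail (-sY) every 10 seconds
  collectl -sY -i 10

  # play back a previously recorded file, again showing slab detail
  collectl -p /var/log/collectl/mds1-20090623.raw.gz -sY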
-mark

From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Roger Spellman
Sent: Tuesday, June 23, 2009 12:56 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Out of Memory on MDS

I have an MDS that is crashing with out-of-memory.

Prior to the crash, I started collecting /proc/slabinfo.  I see that ldlm_locks 
is up to 4,500,000, and each one is 512 bytes, for a total of 2.2GB, which is 
more than half my RAM.

Is there a way to limit this?

Other heavy memory users are ldiskfs_inode_cache (421 MB) and ldlm_resources
(137 MB).  Is there a way to limit these too?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Cliff White
Roger Spellman wrote:
 I have an MDS that is crashing with out-of-memory.
 
  
 
 Prior to the crash, I started collecting /proc/slabinfo.  I see that 
 ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of 
 2.2GB, which is more than half my RAM.
 
  
 
 Is there a way to limit this?

You don't mention the version of Lustre - lru_size might have an impact, 
I am not certain. I believe it is the only lock tuneable of note. (and 
is auto-sized in recent Lustre)

cliffw

 
  
 
 Other heavy memory users are ldiskfs_inode_cache (421 MB) and 
 ldlm_resources (137 MB).  Is there a way to limit these too?
 
  
 
 Thanks.
 
  
 
 Roger Spellman
 
 Staff Engineer
 
 Terascala, Inc.
 
 508-588-1501
 
 www.terascala.com
 
  
 
 
 
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Roger Spellman
Thanks Cliff.

The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.

Is lru_size an upper limit on the number of entries?

Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
and it did not clean anything up.

-Roger

 
 Roger Spellman wrote:
  I have an MDS that is crashing with out-of-memory.
 
  Prior to the crash, I started collecting /proc/slabinfo.  I see that
  ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of
  2.2GB, which is more than half my RAM.
 
  Is there a way to limit this?
 
 You don't mention the version of Lustre - lru_size might have an impact,
 I am not certain. I believe it is the only lock tuneable of note. (and
 is auto-sized in recent Lustre)
 
 cliffw
 
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Andreas Dilger
On Jun 23, 2009  16:50 -0400, Roger Spellman wrote:
 The servers are 1.6.7.1.  The clients are a mix of 1.6.7.1 and 1.6.6.
 
 Is lru_size an upper limit on the number of entries?
 
 Also, lru_max_age does not seem to be working.  I set it to 10 seconds,
 and it did not clean anything up.

This is worth filing a bug on, if it isn't working.  The lock LRU size
should be limited by the size of the RAM.  The upper limit on the number
of locks being granted can be gotten via:

lctl get_param ldlm.namespaces.*.pool.limit

The default limit should be 50 locks per 1MB of RAM.  In your case,
4GB is 4096MB, so the LRU limit should be 50 * 4096 = 204800 locks.
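
To see how close you actually are to that ceiling, comparing the granted
count against the limit on the MDS should do it, e.g.:

  # locks currently granted vs. the per-namespace pool ceiling
  lctl get_param ldlm.namespaces.*.pool.granted ldlm.namespaces.*.pool.limit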

  Roger Spellman wrote:
   I have an MDS that is crashing with out-of-memory.
  
   Prior to the crash, I started collecting /proc/slabinfo.  I see that
   ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a
   total of 2.2GB, which is more than half my RAM.
  
   Is there a way to limit this?
  
  You don't mention the version of Lustre - lru_size might have an
  impact, I am not certain. I believe it is the only lock tuneable
  of note. (and is auto-sized in recent Lustre)

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss