Re: [lustre-discuss] Ongoing issues with quota

2023-10-09 Thread Daniel Szkola via lustre-discuss
Thanks, I will look into the ZFS quota since we are using ZFS for all storage, 
MDT and OSTs.

In our case, there is a single MDS/MDT. I have used Robinhood and lfs find (by 
group) commands to verify what the numbers should apparently be.
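
For reference, the mismatch can be tallied in a few lines. This is only a sketch: the counts are copied from the Robinhood report and the "lfs quota" output quoted later in this thread, and the names `robinhood_counts`, `quota_inodes_used` and `inode_discrepancy` are made up for illustration.

```python
# Entry counts per type, copied from the Robinhood report quoted in this thread.
robinhood_counts = {"symlink": 59071, "dir": 426619, "file": 1310414}

# Inode usage reported by `lfs quota` for the same group (the "files" column).
quota_inodes_used = 3136761


def inode_discrepancy(counts, quota_used):
    """Return (total entries, excess inodes charged, ratio of quota to entries)."""
    total = sum(counts.values())
    return total, quota_used - total, quota_used / total


total, excess, ratio = inode_discrepancy(robinhood_counts, quota_inodes_used)
print(f"entries seen by Robinhood: {total}")
print(f"inodes charged by quota:   {quota_inodes_used}")
print(f"unexplained excess:        {excess} ({ratio:.2f}x)")
```

The per-type counts add up to the same total that Robinhood and "lfs find" both report, so the excess is whatever the quota accounting charges beyond that.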

—
Dan Szkola
FNAL

> On Oct 9, 2023, at 10:13 AM, Andreas Dilger  wrote:
> 
> The quota accounting is controlled by the backing filesystem of the OSTs and 
> MDTs.
> 
> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block 
> usage. 
> 
> For ZFS you would have to ask on the ZFS list to see if there is some way to 
> re-count the quota usage. 
> 
> The "inode" quota is accounted from the MDTs, while the "block" quota is 
> accounted from the OSTs. You might be able to use "lfs quota -v -g group" 
> to see if there is one particular MDT that is returning too many inodes. 
> 
> Possibly if you have directories that are striped across many MDTs it would 
> inflate the used inode count. For example, if every one of the 426k 
> directories reported by RBH was striped across 4 MDTs then you would see the 
> inode count add up to 3.6M. 
> 
> If that was the case, then I would really, really advise against striping 
> every directory in the filesystem.  That will cause problems far worse than 
> just inflating the inode quota accounting. 
> 
> Cheers, Andreas
> 
>> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss 
>>  wrote:
>> 
>> Is there really no way to force a recount of files used by the quota? All 
>> indications are we have accounts where files were removed and this is not 
>> reflected in the used file count in the quota. The space used seems correct 
>> but the inodes used numbers are way high. There must be a way to clear these 
>> numbers and have a fresh count done.
>> 
>> —
>> Dan Szkola
>> FNAL
>> 
>>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss 
>>>  wrote:
>>> 
>>> Also, quotas on the OSTs don’t add up to near 3 million files either:
>>> 
>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             1394853459      0  1913344192      -  132863      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             1411579601      0  1963246413      -  120643      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             1416507527      0  1789950778      -  190687      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             1636465724      0  1926578117      -  195034      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             2202272244      0  3020159313      -  185097      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             1324770165      0  1371244768      -  145347      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             2892027349      0  3221225472      -  169386      0      0      -
>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>             2076201636      0  2474853207      -  171552      0      0      -
>>> 
>>> 
>>> —
>>> Dan Szkola
>>> FNAL
>>> 
>>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss
>>>> wrote:
>>>> 
>>>> No combination of lfsck runs has helped with this.
>>>> 
>>>> Again, robinhood shows 1796104 files for the group, an 'lfs find -G gid'
>>>> found 1796104 files as well.
>>>> 
>>>> So why is the quota command showing over 3 million inodes used?
>>>> 
>>>> There must be a way to force it to recount or clear all stale quota data
>>>> and have it regenerate it?
>>>> 
>>>> Anyone?
>>>> 
>>>> —
>>>> Dan Szkola
>>>> FNAL
>>>> 

Re: [lustre-discuss] Ongoing issues with quota

2023-10-09 Thread Andreas Dilger via lustre-discuss
The quota accounting is controlled by the backing filesystem of the OSTs and 
MDTs.

For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block 
usage. 

For ZFS you would have to ask on the ZFS list to see if there is some way to 
re-count the quota usage. 

The "inode" quota is accounted from the MDTs, while the "block" quota is 
accounted from the OSTs. You might be able to use "lfs quota -v -g group" 
to see if there is one particular MDT that is returning too many inodes. 

Possibly if you have directories that are striped across many MDTs it would 
inflate the used inode count. For example, if every one of the 426k directories 
reported by RBH was striped across 4 MDTs then you would see the inode count 
add up to 3.6M. 
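
The arithmetic behind this kind of estimate can be sketched as follows. The entry counts are the Robinhood figures quoted in this thread; how many inodes a striped directory is actually charged is an assumption here, exposed as the hypothetical parameter `inodes_per_directory`, and the function name is made up for illustration.

```python
def estimated_quota_inodes(non_dir_entries, directories, inodes_per_directory):
    """Estimate charged inodes if every directory consumed several inodes.

    inodes_per_directory is an assumption, not a documented Lustre constant.
    """
    return non_dir_entries + directories * inodes_per_directory


# Counts copied from the Robinhood report quoted in this thread.
files, symlinks, directories = 1310414, 59071, 426619

for per_dir in (1, 4, 5):
    estimate = estimated_quota_inodes(files + symlinks, directories, per_dir)
    print(f"{per_dir} inode(s) per directory -> about {estimate / 1e6:.2f}M inodes")
```

With one inode per directory the estimate is simply the Robinhood entry total; larger values show how quickly directory striping alone could push the count past 3M.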

If that was the case, then I would really, really advise against striping every 
directory in the filesystem.  That will cause problems far worse than just 
inflating the inode quota accounting. 

Cheers, Andreas

> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss 
>  wrote:
> 
> Is there really no way to force a recount of files used by the quota? All 
> indications are we have accounts where files were removed and this is not 
> reflected in the used file count in the quota. The space used seems correct 
> but the inodes used numbers are way high. There must be a way to clear these 
> numbers and have a fresh count done.
> 
> —
> Dan Szkola
> FNAL
> 
>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss 
>>  wrote:
>> 
>> Also, quotas on the OSTs don’t add up to near 3 million files either:
>> 
>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             1394853459      0  1913344192      -  132863      0      0      -
>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             1411579601      0  1963246413      -  120643      0      0      -
>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             1416507527      0  1789950778      -  190687      0      0      -
>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             1636465724      0  1926578117      -  195034      0      0      -
>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             2202272244      0  3020159313      -  185097      0      0      -
>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             1324770165      0  1371244768      -  145347      0      0      -
>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             2892027349      0  3221225472      -  169386      0      0      -
>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>             2076201636      0  2474853207      -  171552      0      0      -
>> 
>> 
>> —
>> Dan Szkola
>> FNAL
>> 
>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss
>>> wrote:
>>> 
>>> No combination of lfsck runs has helped with this.
>>> 
>>> Again, robinhood shows 1796104 files for the group, an 'lfs find -G gid' 
>>> found 1796104 files as well.
>>> 
>>> So why is the quota command showing over 3 million inodes used?
>>> 
>>> There must be a way to force it to recount or clear all stale quota data 
>>> and have it regenerate it?
>>> 
>>> Anyone?
>>> 
>>> —
>>> Dan Szkola
>>> FNAL
>>> 
>>> 
>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss
>>>> wrote:
>>>> 
>>>> We have a lustre filesystem that we just upgraded to 2.15.3, however this
>>>> problem has been going on for some time.
>>>> 
>>>> The quota command shows this:
>>>> 
>>>> Disk quotas for grp somegroup (gid 9544):
>>>> Filesystem    used   quota   limit   grace     files    quota    limit    grace
>>>>   /lustre1  13.38T     40T     45T       -

Re: [lustre-discuss] Ongoing issues with quota

2023-10-09 Thread Daniel Szkola via lustre-discuss
Is there really no way to force a recount of files used by the quota? All 
indications are we have accounts where files were removed and this is not 
reflected in the used file count in the quota. The space used seems correct but 
the inodes used numbers are way high. There must be a way to clear these 
numbers and have a fresh count done.
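
As a cross-check, the per-OST "files" figures from the "lfs quota -I" output quoted below can be totalled and compared with Robinhood's regular-file count. This is a sketch only: the numbers are copied from this thread, the variable names are made up, and it assumes that most files have a single OST object, which the thread does not state.

```python
# "files" column from `lfs quota -g somegroup -I <index> /lustre1`,
# keyed by OST index, copied from the output quoted in this thread.
ost_objects = {
    0: 132863, 1: 120643, 2: 190687, 3: 195034,
    4: 185097, 5: 145347, 6: 169386, 7: 171552,
}

# Regular files for the group according to the Robinhood report.
robinhood_regular_files = 1310414

total_objects = sum(ost_objects.values())
difference = total_objects - robinhood_regular_files
print(f"objects charged on OSTs: {total_objects}")
print(f"regular files (RBH):     {robinhood_regular_files}")
print(f"difference:              {difference}")
```

Under that assumption the OST side agrees closely with Robinhood, which would point at the MDT-side inode accounting as the place where the extra count comes from.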

—
Dan Szkola
FNAL

> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss 
>  wrote:
> 
> Also, quotas on the OSTs don’t add up to near 3 million files either:
> 
> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             1394853459      0  1913344192      -  132863      0      0      -
> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             1411579601      0  1963246413      -  120643      0      0      -
> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             1416507527      0  1789950778      -  190687      0      0      -
> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             1636465724      0  1926578117      -  195034      0      0      -
> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             2202272244      0  3020159313      -  185097      0      0      -
> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             1324770165      0  1371244768      -  145347      0      0      -
> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             2892027349      0  3221225472      -  169386      0      0      -
> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
> Disk quotas for grp somegroup (gid 9544):
> Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>             2076201636      0  2474853207      -  171552      0      0      -
> 
> 
> —
> Dan Szkola
> FNAL
> 
>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss 
>>  wrote:
>> 
>> No combination of lfsck runs has helped with this.
>> 
>> Again, robinhood shows 1796104 files for the group, an 'lfs find -G gid' 
>> found 1796104 files as well.
>> 
>> So why is the quota command showing over 3 million inodes used?
>> 
>> There must be a way to force it to recount or clear all stale quota data and 
>> have it regenerate it?
>> 
>> Anyone?
>> 
>> —
>> Dan Szkola
>> FNAL
>> 
>> 
>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss 
>>>  wrote:
>>> 
>>> We have a lustre filesystem that we just upgraded to 2.15.3, however this 
>>> problem has been going on for some time.
>>> 
>>> The quota command shows this:
>>> 
>>> Disk quotas for grp somegroup (gid 9544):
>>> Filesystem    used   quota   limit   grace     files    quota    limit    grace
>>>   /lustre1  13.38T     40T     45T       -  3136761*  2621440  3670016  expired
>>> 
>>> The group is not using nearly that many files. We have robinhood installed 
>>> and it shows this:
>>> 
>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>     group,     type,    count,    volume,   spc_used,   avg_size
>>> somegroup,  symlink,    59071,   5.12 MB,  103.16 MB,         91
>>> somegroup,      dir,   426619,   5.24 GB,    5.24 GB,   12.87 KB
>>> somegroup,     file,  1310414,  16.24 TB,   13.37 TB,   13.00 MB
>>> 
>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space 
>>> used: 14704924899840 bytes (13.37 TB)
>>> 
>>> Any ideas what is wrong here?
>>> 
>>> —
>>> Dan Szkola
>>> FNAL