Re: [lustre-discuss] FOLLOW UP: MDT filling up with 4 MB files

2016-10-18 Thread Colin Faber
Hi,

There was a bug (sorry, I don't recall which one) that would leave the llog
files present and full of records that should have been cleared; remounting
was the solution. I don't recall the details, but I'm sure you'll be able to
find the LU ticket with some searching.
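
If it does turn out to be changelog llogs, a quick thing to check before
remounting is whether a stale changelog consumer is still registered (the
device name and consumer id below are only examples, not from your system):

  lctl get_param mdd.*.changelog_users
  lctl --device <fsname>-MDT0000 changelog_deregister cl1

The first command lists registered consumers along with the current record
index; deregistering a consumer that is no longer running lets the MDT purge
the records it was holding for it.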

-cf


On Sat, Oct 15, 2016 at 3:49 PM, Pawel Dziekonski wrote:

> Hi,
>
> we had the same problem on 2.5.3. Robinhood was supposed to
> consume the changelog but it wasn't; we don't know why. Simply
> disabling the changelog was not enough - we had to remount the
> MDT. We did that by failing over to the other MDS node (HA
> pair).
>
> The other issue we had with the MDT was the inode size - inodes
> are (or were at that time) created at 512 bytes by default, and
> when you use a large stripe count the lfsck and xattr data no
> longer fit in that single inode, so they spill over and start
> consuming extra disk space. You have to create the inodes with
> a proper size; then all of that data is stored in the same
> inode and does not occupy additional disk space. AFAIK this has
> been a known issue since 2.x. Unfortunately the only solution
> was to reformat the MDT offline.
>
> P
>
>
>
>
> On Fri, 14 Oct 2016 at 06:46:59 -0400, Jessica Otey wrote:
> > All,
> > My colleagues in Chile now believe that both of their 2.5.3 file
> > systems are experiencing this same problem with the MDTs filling up
> > with files. We have also come across a report from another user,
> > from early 2015, describing the same issue, also on a 2.5.3 system.
> >
> > See: https://www.mail-archive.com/search?l=lustre-discuss@lists.lustre.org&q=subject:%22Re%5C%3A+%5C%5Blustre%5C-discuss%5C%5D+MDT+partition+getting+full%22&o=newest
> >
> > We are confident that these files are not related to the changelog
> > feature.
> >
> > Does anyone have any other suggestions as to what the cause of this
> > problem could be?
> >
> > I'm intrigued that the Lustre version involved in all 3 reports is
> > 2.5.3. Could this be a bug?
> >
> > Thanks,
> > Jessica
> >
> >
> > >On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey wrote:
> > >
> > >Hello all,
> > >I write on behalf of my colleagues in Chile, who are experiencing
> > >a bizarre problem with their MDT, namely, it is filling up with 4
> > >MB files. There is no issue with the number of inodes, of which
> > >there are hundreds of millions unused.
> > >
> > >[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
> > >device /dev/sdb2 mounted by lustre
> > >Filesystem features:      has_journal ext_attr resize_inode
> > >dir_index filetype needs_recovery flex_bg dirdata sparse_super
> > >large_file huge_file uninit_bg dir_nlink quota
> > >Inode count:              239730688
> > >Free inodes:              223553405
> > >Inodes per group:         32768
> > >Inode blocks per group:   4096
> > >First inode:              11
> > >Inode size:               512
> > >Journal inode:            8
> > >Journal backup:           inode blocks
> > >User quota inode:         3
> > >Group quota inode:        4
> > >
> > >Has anyone ever encountered such a problem? The only thing unusual
> > >about this cluster is that it is using 2.5.3 MDS/OSSes while still
> > >using 1.8.9 clients—something I didn't actually believe was
> > >possible, as I thought the last version to work effectively with
> > >1.8.9 clients was 2.4.3. However, for all I know, the version gap
> > >may have nothing to do with this phenomenon.
> > >
> > >Any and all advice is appreciated. Any general information on the
> > >structure of the MDT also welcome, as such info is in short supply
> > >on the internet.
> > >
> > >Thanks,
> > >Jessica
> > >
>
> --
> Pawel Dziekonski 
> Wroclaw Centre for Networking & Supercomputing, HPC Department
> phone: +48 71 320 37 39, fax: +48 71 322 57 97,
> http://www.wcss.pl
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Re: [lustre-discuss] FOLLOW UP: MDT filling up with 4 MB files

2016-10-15 Thread Pawel Dziekonski
Hi,

we had the same problem on 2.5.3. Robinhood was supposed to
consume the changelog but it wasn't; we don't know why. Simply
disabling the changelog was not enough - we had to remount the
MDT. We did that by failing over to the other MDS node (HA
pair).
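
If it helps anyone hitting the same thing, the lag is visible from the
changelog consumer list (the MDT device name below is only an example):

  lctl get_param mdd.<fsname>-MDT0000.changelog_users

When a registered reader's index stays far behind the current index reported
there, that reader (Robinhood in our case) is not consuming records and they
keep accumulating on the MDT.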

The other issue we had with the MDT was the inode size - inodes
are (or were at that time) created at 512 bytes by default, and
when you use a large stripe count the lfsck and xattr data no
longer fit in that single inode, so they spill over and start
consuming extra disk space. You have to create the inodes with
a proper size; then all of that data is stored in the same
inode and does not occupy additional disk space. AFAIK this has
been a known issue since 2.x. Unfortunately the only solution
was to reformat the MDT offline.
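
Roughly, the reformat looked like this (every parameter below is only an
example, not our real configuration):

  mkfs.lustre --reformat --mdt --fsname=scratch --index=0 \
      --mgsnode=10.0.0.1@tcp --mkfsoptions="-I 1024" /dev/sdb2

The -I value passed via --mkfsoptions sets the ldiskfs inode size at format
time, which is why the fix for us was a reformat rather than a tunable.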

P




On Fri, 14 Oct 2016 at 06:46:59 -0400, Jessica Otey wrote:
> All,
> My colleagues in Chile now believe that both of their 2.5.3 file
> systems are experiencing this same problem with the MDTs filling up
> with files. We have also come across a report from another user,
> from early 2015, describing the same issue, also on a 2.5.3 system.
> 
> See: 
> https://www.mail-archive.com/search?l=lustre-discuss@lists.lustre.org&q=subject:%22Re%5C%3A+%5C%5Blustre%5C-discuss%5C%5D+MDT+partition+getting+full%22&o=newest
> 
> We are confident that these files are not related to the changelog feature.
> 
> Does anyone have any other suggestions as to what the cause of this
> problem could be?
> 
> I'm intrigued that the Lustre version involved in all 3 reports is
> 2.5.3. Could this be a bug?
> 
> Thanks,
> Jessica
> 
> 
> >On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey wrote:
> >
> >Hello all,
> >I write on behalf of my colleagues in Chile, who are experiencing
> >a bizarre problem with their MDT, namely, it is filling up with 4
> >MB files. There is no issue with the number of inodes, of which
> >there are hundreds of millions unused.
> >
> >[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
> >device /dev/sdb2 mounted by lustre
> >Filesystem features:      has_journal ext_attr resize_inode
> >dir_index filetype needs_recovery flex_bg dirdata sparse_super
> >large_file huge_file uninit_bg dir_nlink quota
> >Inode count:              239730688
> >Free inodes:              223553405
> >Inodes per group:         32768
> >Inode blocks per group:   4096
> >First inode:              11
> >Inode size:               512
> >Journal inode:            8
> >Journal backup:           inode blocks
> >User quota inode:         3
> >Group quota inode:        4
> >
> >Has anyone ever encountered such a problem? The only thing unusual
> >about this cluster is that it is using 2.5.3 MDS/OSSes while still
> >using 1.8.9 clients—something I didn't actually believe was
> >possible, as I thought the last version to work effectively with
> >1.8.9 clients was 2.4.3. However, for all I know, the version gap
> >may have nothing to do with this phenomenon.
> >
> >Any and all advice is appreciated. Any general information on the
> >structure of the MDT also welcome, as such info is in short supply
> >on the internet.
> >
> >Thanks,
> >Jessica
> >

-- 
Pawel Dziekonski 
Wroclaw Centre for Networking & Supercomputing, HPC Department
phone: +48 71 320 37 39, fax: +48 71 322 57 97, http://www.wcss.pl
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] FOLLOW UP: MDT filling up with 4 MB files

2016-10-14 Thread Jessica Otey

All,
My colleagues in Chile now believe that both of their 2.5.3 file systems 
are experiencing this same problem with the MDTs filling up with files. 
We have also come across a report from another user, from early 2015,
describing the same issue, also on a 2.5.3 system.


See: 
https://www.mail-archive.com/search?l=lustre-discuss@lists.lustre.org&q=subject:%22Re%5C%3A+%5C%5Blustre%5C-discuss%5C%5D+MDT+partition+getting+full%22&o=newest


We are confident that these files are not related to the changelog feature.
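
For reference, this is roughly how that can be checked. With the MDT (or a
backup copy of it) mounted as ldiskfs, as in the listing at the end of this
mail, the changelog catalog files at the root of the device stay small when
changelogs are not the culprit; the file names here are assumed from the
usual ldiskfs MDT layout rather than taken from our system:

  ls -lh /lustrebackup/changelog_catalog /lustrebackup/changelog_users
  lctl get_param mdd.*.changelog_users

The second command, run on the live MDS, lists any registered changelog
consumers.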

Does anyone have any other suggestions as to what the cause of this 
problem could be?


I'm intrigued that the Lustre version involved in all 3 reports is 
2.5.3. Could this be a bug?


Thanks,
Jessica


On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey wrote:


Hello all,
I write on behalf of my colleagues in Chile, who are experiencing
a bizarre problem with their MDT, namely, it is filling up with 4
MB files. There is no issue with the number of inodes, of which
there are hundreds of millions unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features:      has_journal ext_attr resize_inode
dir_index filetype needs_recovery flex_bg dirdata sparse_super
large_file huge_file uninit_bg dir_nlink quota
Inode count:              239730688
Free inodes:              223553405
Inodes per group:         32768
Inode blocks per group:   4096
First inode:              11
Inode size:               512
Journal inode:            8
Journal backup:           inode blocks
User quota inode:         3
Group quota inode:        4
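
The corresponding block-level numbers come from the same tool; this is just
the obvious companion check rather than output from our system, but it is
the free-block count, not the free-inode count, that is shrinking here:

  tune2fs -l /dev/sdb2 | grep -iE 'block count|free blocks'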

Has anyone ever encountered such a problem? The only thing unusual
about this cluster is that it is using 2.5.3 MDS/OSSes while still
using 1.8.9 clients—something I didn't actually believe was
possible, as I thought the last version to work effectively with
1.8.9 clients was 2.4.3. However, for all I know, the version gap
may have nothing to do with this phenomenon.

Any and all advice is appreciated. Any general information on the
structure of the MDT also welcome, as such info is in short supply
on the internet.

Thanks,
Jessica

Below is a look inside the O folder at the root of the MDT, where
there are about 48,000 4MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files

[root@jaopost-mds O]# ls -l 1
total 2240
drwx------ 2 root root 69632 Sep 16 16:25 d0
drwx------ 2 root root 69632 Sep 16 16:25 d1
drwx------ 2 root root 61440 Sep 16 17:46 d10
drwx------ 2 root root 69632 Sep 16 17:46 d11
drwx------ 2 root root 69632 Sep 16 18:04 d12
drwx------ 2 root root 65536 Sep 16 18:04 d13
drwx------ 2 root root 65536 Sep 16 18:04 d14
drwx------ 2 root root 69632 Sep 16 18:04 d15
drwx------ 2 root root 61440 Sep 16 18:04 d16
drwx------ 2 root root 61440 Sep 16 18:04 d17
drwx------ 2 root root 69632 Sep 16 18:04 d18
drwx------ 2 root root 69632 Sep 16 18:04 d19
drwx------ 2 root root 65536 Sep 16 16:25 d2
drwx------ 2 root root 69632 Sep 16 18:04 d20
drwx------ 2 root root 69632 Sep 16 18:04 d21
drwx------ 2 root root 61440 Sep 16 18:04 d22
drwx------ 2 root root 69632 Sep 16 18:04 d23
drwx------ 2 root root 61440 Sep 16 16:11 d24
drwx------ 2 root root 69632 Sep 16 16:11 d25
drwx------ 2 root root 69632 Sep 16 16:11 d26
drwx------ 2 root root 69632 Sep 16 16:11 d27
drwx------ 2 root root 69632 Sep 16 16:25 d28
drwx------ 2 root root 69632 Sep 16 16:25 d29
drwx------ 2 root root 69632 Sep 16 16:25 d3
drwx------ 2 root root 65536 Sep 16 16:25 d30
drwx------ 2 root root 65536 Sep 16 16:25 d31
drwx------ 2 root root 69632 Sep 16 16:25 d4
drwx------ 2 root root 61440 Sep 16 16:25 d5
drwx------ 2 root root 69632 Sep 16 16:25 d6
drwx------ 2 root root 73728 Sep 16 16:25 d7
drwx------ 2 root root 65536 Sep 16 17:46 d8
drwx------ 2 root root 69632 Sep 16 17:46 d9
-rw-r--r-- 1 root root 8 Jan  4  2016 LAST_ID

[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root  252544 Jan  4  2016 32
-rw-r--r-- 1 root root 2396224 Jan  9  2016 2720
-rw-r--r-- 1 root root 4153280 Jan  9  2016 2752
-rw-r--r-- 1 root root 4153280 Jan 10  2016 2784
-rw-r--r-- 1 root root 4153280 Jan 10  2016 2816
-rw-r--r-- 1 root root 4153280 Jan 10  2016 2848
-rw-r--r-- 1 root root 4153280 Jan 10  2016 2880
-rw-r--r-- 1 root root 4153280 Jan 10  2016 2944
-rw-r--r-- 1 root root 4153280 Jan 10  2016 2976
-rw-r--r-- 1 root root 4153280 Jan 10  2016 3008
-rw-r--r-- 1 root root 4153280 Jan 10  2016 3040
-rw-r--r-- 1 root root 41