Re: [lustre-discuss] lfsck repair quota

2019-04-17 Thread Martin Hecht
Dear Fernando,

I'm not sure whether those files contribute to the quota, but I would assume
that the ones on the OSTs consume disk quota and the ones on the MDT
consume inode quota.
As long as they are in the lost+found directory they are not visible to
the users, but they may contain data that belonged to user files. Whether
they contain useful data, and whether files can be reconstructed completely,
depends on the exact damage that e2fsck has tried to repair. A complete
output of all the fsck runs could tell more, but even with that one would
probably need further information, e.g. about the striping stored in the
extended attributes of the files.
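
For reference, a minimal sketch of how that striping information could be
inspected, assuming the MDT is mounted read-only as type ldiskfs and that
the mount point and inode number below are purely illustrative:

  # dump the LOV layout xattr of an object that e2fsck moved to lost+found
  getfattr -e hex -n trusted.lov '/mnt/mdt-ldiskfs/lost+found/#123456'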

best regards,
Martin





Re: [lustre-discuss] lfsck repair quota

2019-04-16 Thread Martin Hecht
Were a lot of inodes moved to lost+found by the fsck, and do they now
contribute to the occupied quota?

- Original Message -
From: Fernando Pérez 
To: lustre-discuss@lists.lustre.org
Sent: Tue, 16 Apr 2019 16:24:13 +0200 (CEST)
Subject: Re: [lustre-discuss] lfsck repair quota

Thank you Rick.

I followed these steps for the ldiskfs OSTs and the MDT, but the quotas for
all users are more corrupted than before.

I tried to run e2fsck on the ldiskfs OSTs and the MDT, but the problem was
that the MDT e2fsck ran very slowly (10 inodes per second for more than 100
million inodes).

According to the Lustre wiki I thought that lfsck could repair corrupted
quotas:

http://wiki.lustre.org/Lustre_Quota_Troubleshooting

Regards.


Fernando Pérez
Institut de Ciències del Mar (CSIC)
Departament Oceanografía Física i Tecnològica
Passeig Marítim de la Barceloneta,37-49
08003 Barcelona
Phone:  (+34) 93 230 96 35


> On 16 Apr 2019, at 15:34, Mohr Jr, Richard Frank (Rick Mohr)
>  wrote:
> 
> 
>> On Apr 15, 2019, at 10:54 AM, Fernando Perez  wrote:
>> 
>> Could anyone confirm that the correct way to repair wrong quotas on an
>> ldiskfs MDT is lctl lfsck_start -t layout -A?
> 
> As far as I know, lfsck doesn’t repair quota info. It only fixes internal 
> consistency within Lustre.
> 
> Whenever I have had to repair quotas, I just follow the procedure you did 
> (unmount everything, run “tune2fs -O ^quota ”, run “tune2fs -O quota 
> ”, and then remount).  But all my systems used ldiskfs, so I don’t know 
> if the ZFS OSTs introduce any sort of complication.  (Actually, I am not even 
> sure if/how you can regenerate quota info for ZFS.)
> 
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 
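
For reference, the quota regeneration procedure Rick describes amounts to
something like the following on each ldiskfs target (a sketch; device and
mount point names are illustrative):

  umount /mnt/ost00                      # the target must be unmounted
  tune2fs -O ^quota /dev/mapper/ost00    # drop the quota feature
  tune2fs -O quota /dev/mapper/ost00     # re-enable it, which rebuilds the quota files
  mount -t lustre /dev/mapper/ost00 /mnt/ost00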



Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-21 Thread Martin Hecht
Hello Roland,

there is a nice collection of lustre monitoring tools on the lustre wiki:

http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide

which also contains a couple of references. One of them is lltop, which
has already been mentioned a couple of times and that's what came to my
mind as well when I read your question.

best regards,
Martin





Re: [lustre-discuss] ko2iblnd optimizations for EDR

2018-11-08 Thread Martin Hecht
On 11/7/18 9:44 PM, Riccardo Veraldi wrote:
> Anyway I Was wondering if something different is needed for mlx5 and
> what are the suggested values in that case ?
>
> Anyone has experience with mlx5 LNET performance tunings ? 

Hi Riccardo,

We have recently integrated mlx5 nodes into our fabric, and we had to
reduce the values to

peer_credits = 16
concurrent_sends = 16

because mlx5 doesn't support larger values for some reason. The peer_credits
value must be the same on all connected lnets, even across routers (at least
it used to be like this; I believe we are currently running some Lustre 2.5.x
derivatives on the server side, and newer versions on the various clients).
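
In module-option form this corresponds to something like the following in
/etc/modprobe.d/ko2iblnd.conf (a sketch; apply it consistently on all peers):

  options ko2iblnd peer_credits=16 concurrent_sends=16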

kind regards,
Martin




Re: [lustre-discuss] building lustre 2.11.50 on CentOS 7.4

2018-04-10 Thread Martin Hecht

problem solved:

another git pull today, followed by autogen.sh and configure, has made
the error go away.

I assume it was LU-10752, which was fixed by a patch from James Simmons
(commit 6189ae07c5161d14c9e9f863a400045f923f2301) that landed on the
hpdd git 16 hours ago.
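
For reference, the rebuild sequence was roughly the following (configure
options omitted here; use the same ones as before):

  git pull
  sh ./autogen.sh
  ./configure
  make rpms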

Martin

On 04/09/2018 04:55 PM, Martin Hecht wrote:
> Hi,
>
> I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4.
>
> patching the kernel for ldiskfs worked fine, I have installed and booted
> the patched kernel as well as the devel-rpm,  but when I run `make rpms`
> it exits with the following errors:
>
> Processing files: lustre-2.11.50-1.el7.centos.x86_64
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
>
>
> RPM build errors:
>     File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
>     File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
>     File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
> make: *** [rpms] Error 1
>
> just `make` works fine, so the problem is something with packaging the
> rpms. Any hints?
>
> kind regards,
> Martin
>


-- 
Dr. Martin Hecht
High Performance Computing Center Stuttgart (HLRS)
Office 0.051, HPCN Production, IT-Security
University of Stuttgart
Nobelstraße 19, 70569 Stuttgart, Germany
Tel: +49(0)711/685-65799  Fax: -55799
Mail: he...@hlrs.de
Web: http://www.hlrs.de/people/hecht/
PGP Key available at: https://www.hlrs.de/fileadmin/user_upload/Martin_Hecht.pgp
PGP Key Fingerprint: 41BB 33E9 7170 3864 D5B3 44AD 5490 010B 96C2 6E4A





[lustre-discuss] building lustre 2.11.50 on CentOS 7.4

2018-04-09 Thread Martin Hecht
Hi,

I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4.

Patching the kernel for ldiskfs worked fine; I have installed and booted
the patched kernel as well as the devel-rpm, but when I run `make rpms`
it exits with the following errors:

Processing files: lustre-2.11.50-1.el7.centos.x86_64
error: File not found:
/tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
error: File not found:
/tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
error: File not found:
/tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf


RPM build errors:
    File not found:
/tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
    File not found:
/tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
    File not found:
/tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
make: *** [rpms] Error 1

just `make` works fine, so the problem is something with packaging the
rpms. Any hints?

kind regards,
Martin





Re: [lustre-discuss] Mixed size OST's

2018-03-16 Thread Martin Hecht
On 03/15/2018 04:48 PM, Steve Thompson wrote:
> If I go with one OST per system (one zpool comprising 8 x 6 RAIDZ2
> vdevs), I will have a lustre f/s comprised of two 60 TB OST's and two
> 192 TB OST's (minus RAIDZ2 overhead). This is obviously a big mismatch
> between OST sizes.
Depending on how full your file system is going to be, it may be better
to create more OSTs on the new OSSes so that all OSTs are roughly the same
size, which avoids trouble balancing the fill level of the OSTs.

We had a Lustre system (back in Lustre 1.8 times) with different disk
sizes. We put the OSTs into pools such that each pool contained only OSTs
of the same size. We balanced the users between the pools such that the
larger OSTs were filled more quickly than the smaller ones, or, put in
other words, such that the percentage to which an OST was filled remained
homogeneous across the whole file system. It worked, but this manual
intervention was needed to prevent the small OSTs from reaching a
critical fill level more quickly than the large ones.
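
For reference, such a pool setup boils down to a few commands (a sketch; the
file system name, pool name, OST indices and directory are illustrative):

  lctl pool_new testfs.small                                   # on the MGS
  lctl pool_add testfs.small testfs-OST[0000-0003]             # on the MGS
  lfs setstripe --pool small /lustre/testfs/projects/groupA    # on a client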

Maybe the internal algorithm has been improved in the meantime, but as
far as I know it is just round-robin until a critical difference in fill
levels is reached, at which point the weighted stripe allocation impacts
performance. A rough description can be found at
wiki.lustre.org/Managing_Free_Space or in the corresponding chapter of the
Lustre Manual. However, I'm not sure whether these sections are all
up to date.






Re: [lustre-discuss] Fwd: FW: mdt mounting error

2017-11-09 Thread Martin Hecht
Hi Parag,

can you lctl ping 10.2.1.204@o2ib from the MGS node and from the MDS
now? I have seen on the list that you were able to load the modules, but
if lnet is not working over the IB, this might be the reason for
the errors you are seeing.

Regards,
Martin


On 11/08/2017 09:15 AM, Parag Khuraswar wrote:
> Hi,
>
> Any resolution on this ?
>
> Regards,
> Parag.
>
>
>
>  Original Message 
> Subject: [lustre-discuss] FW:  mdt mounting error
> Date: 2017-11-07 15:29
> From: "Parag Khuraswar" 
> To: "'Lustre discussion'" 
>
> Hi,
>
> Lustre module is loaded.  I am getting bellow errors in
> /var/log/messaged while mounting mdt.
>
> Nov  7 14:10:03 mds1 kernel: LDISKFS-fs (dm-2): mounted filesystem with
> ordered data mode. Opts:
> user_xattr,errors=remount-ro,no_mbcache,nodelalloc
>
> Nov  7 14:10:03 mds1 kernel: LustreError:
> 4852:0:(ldlm_lib.c:483:client_obd_setup()) can't add initial connection
>
> Nov  7 14:10:03 mds1 kernel: LustreError:
> 4852:0:(obd_config.c:608:class_setup()) setup MGC10.2.1.204@o2ib failed
> (-2)
>
> Nov  7 14:10:03 mds1 kernel: LustreError:
> 4852:0:(obd_mount.c:202:lustre_start_simple()) MGC10.2.1.204@o2ib setup
> error -2
>
> Nov  7 14:10:03 mds1 kernel: LustreError:
> 4852:0:(obd_mount_server.c:1573:server_put_super()) no obd home-MDT
>
> Nov  7 14:10:03 mds1 kernel: LustreError:
> 4852:0:(obd_mount_server.c:132:server_deregister_mount()) home-MDT
> not registered
>
> Nov  7 14:10:03 mds1 kernel: Lustre: server umount home-MDT complete
>
> Nov  7 14:10:03 mds1 kernel: LustreError:
> 4852:0:(obd_mount.c:1504:lustre_fill_super()) Unable to mount  (-2)
>
> Regards,
>
> Parag
>
> FROM: Ben Evans [mailto:bev...@cray.com]
> SENT: Wednesday, November , 2017 6:19 PM
> TO: Raj; Parag Khuraswar
> CC: Lustre discussion
> SUBJECT: Re: [lustre-discuss] mdt mounting error
>
> On the node in question Try: lsmod | grep lustre
>
> followed by: modprobe lustre
>
> I'm betting the modules aren't loaded for some reason, generally that
> reason is found in dmesg.
>
> FROM: lustre-discuss  on behalf
> of Raj 
> DATE: Wednesday, November 1, 2017 at 7:48 AM
> TO: Parag Khuraswar 
> CC: Lustre discussion 
> SUBJECT: Re: [lustre-discuss] mdt mounting error
>
> Parag, I have not tested two FS using a common MGT and I don't know
> whether it is supported.
>
> On Wed, Nov 1, 2017 at 6:37 AM Parag Khuraswar 
> wrote:
>
>> Hi Raj,
>>
>> But I have two file systems,
>> And I think I can use one mgt for two filesystems. Please correct me
>> if
>> I am wrong.
>>
>> Regards,
>> Parag
>>
>> On 2017-11-01 16:56, Raj wrote:
>>> The following can contribute to this issue:
>>> - Missing FS name in mgt creation (it must be <=9 characters long):
>>> --fsname=
>>> mkfs.lustre --servicenode=10.2.1.204@o2ib
>>> --servicenode=10.2.1.205@o2ib --fsname=home --mgs /dev/mapper/mpathc
>>>
>>> - verify if /mdt directory exists
>>>
>>> On Wed, Nov 1, 2017 at 6:16 AM Raj  wrote:
>>>
 What options in mkfs.lustre did you use to format with lustre?

 On Wed, Nov 1, 2017 at 6:14 AM Parag Khuraswar
  wrote:

 Hi Raj,

 Yes, /dev/mapper/mpatha available.

 I could format and mount using ext4.
>>

 Regards,

 Parag

 FROM: Raj [mailto:rajgau...@gmail.com]
 SENT: Wednesday, November , 2017 4:39 PM
 TO: Parag Khuraswar; Lustre discussion
 SUBJECT: Re: [lustre-discuss] mdt mounting error

 Parag,
 Is the device /dev/mapper/mpatha available?
 If not, the multipathd may not have started or the multipath
 configuration may not be correct.

 On Wed, Nov 1, 2017 at 5:18 AM Parag Khuraswar
  wrote:

 Hi,

 I am getting below error while mounting mdt. Mgt is mounted.

 Please suggest

 [root@mds2 ~]# mount -t lustre /dev/mapper/mpatha /mdt

 mount.lustre: mount /dev/mapper/mpatha at /mdt failed: No such file
 or directory

 Is the MGS specification correct?

 Is the filesystem name correct?

 If upgrading, is the copied client log valid? (see upgrade docs)

 Regards,

 Parag






Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Martin Hecht
Hi Parag,

please reply to the list or keep it in cc at least

On 10/30/2017 01:21 PM, Parag Khuraswar wrote:
> Hi Martin,
>
> The problem got resolved.
> But I am not able to see ib in 'lctl list_nids' output
> My lnet.conf file entry is 'options lnet networks=o2ib(ib0)' This file is
> not executable.
>
> Can you help ?
>
> Regards,
> Parag
your lnet is probably not configured correctly. Things to check (see the
command sketch below):
- is the ib0 device there (i.e. make sure the InfiniBand layer works
correctly)?
- does ib0 have an IP address? (Lustre normally doesn't use IP over IB,
but it uses the IP addresses to identify the hosts)
- verify that you can ping the IP (with a normal network ping, to ensure
that the connection is working)
- is the lnet module loaded?
- if not, can you load it manually with modprobe lnet?
- what is written to dmesg / syslog when it fails?
- when the module is loaded, try lctl network up
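
In command form (a rough sketch; the peer address is a placeholder):

  ip addr show ib0             # is the interface up and does it have an IP address?
  ping <peer IPoIB address>    # plain IP ping to another host on the IB fabric
  modprobe lnet                # load the module if it is missing
  lctl network up              # bring up the configured networks
  lctl list_nids               # the o2ib NID should now be listed
  dmesg | tail                 # look here if any of the steps fail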

Martin





Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Martin Hecht
Hi,

On 10/30/2017 09:56 AM, Parag Khuraswar wrote:
> Hi,
>
> I am installing lustre cloned from github.  
Hmm... there are a few lustre-related repositories on github.
I would prefer the upstream Lustre git repository managed by Intel,
git://git.hpdd.intel.com, unless you are interested in specific features
that are not (yet) available from there.

> After build of rpms I am trying
> to install lustre rpms. 
>
> I am getting below error 
>
> Requires: ldiskfsprogs >= 1.42.7.wc1
>
> But while compilation this package was not built.
ldiskfsprogs used to be called e2fsprogs. However, in my experience it
is a bit more of a challenge to build these from source than the
main lustre packages. Anyhow, Intel's Lustre git repository
git://git.hpdd.intel.com also has a branch tools/e2fsprogs.git - or
you can use pre-built rpms for your OS from

https://downloads.hpdd.intel.com/public/e2fsprogs/latest/

best regards,
Martin






Re: [lustre-discuss] Lustre [2.8.0] flock Functionality

2017-03-29 Thread Martin Hecht
Hello,

we use the flock mount option on all our lustre systems (currently some
2.5 versions) and are not aware of any issues due to that.

If your applications run on a single node (or require locks only
locally) you could also try localflock.
localflock has less performance impact than the global flock. How much
impact you see depends on how heavily the applications make use of
locks. We measured a few per cent on lustre 1.8 in simple tests,
and I think the performance impact nowadays is even smaller, but as I
said, it depends on the I/O pattern.

localflock is riskier than flock, because it makes your application
think that locks are in place while in fact they are not globally visible,
which may lead to strange effects with parallel applications spanning
several nodes. We were running localflock on one of our systems for some
time and occasionally heard about such problems from a few users.
best regards, Martin

On 03/28/2017 07:49 PM, DeWitt, Chad wrote:
> Good afternoon, All.
>
> We've encountered several programs that require flock, so we are now
> investigating enabling flock functionality.  However, the Lustre manual
> includes a passage in regards to flocks which gives us pause:
>
> "Warning
> This mode affects the performance of the file being flocked and may affect
> stability, depending on the Lustre version used.  Consider using a newer
> Lustre version which is more stable. If the consistent mode is enabled and
> no applications are using flock, then it has no effect."
>
> We are running Lustre 2.8.0 (servers and clients).  I've looked through
> Jira, but didn't see anything that looked like a showstopper.
>
> Just curious if anyone has enabled flocks and encountered issues?  Anything
> in particular to look out for?
>
> Thank you in advance,
> Chad
>
> 
>
> Chad DeWitt, CISSP | HPC Storage Administrator
>
> UNC Charlotte *| *ITS – University Research Computing
>
> 
>






Re: [lustre-discuss] many 'ksym' packages required

2016-12-20 Thread Martin Hecht
I have seen this too, on SL6: the build went smoothly, but the installation
failed. A few months before 2.9 was tagged on master, build and install
both went smoothly. I'm not using zfs, by the way. Unfortunately, I
haven't found the time yet to investigate this more deeply.

On 12/20/2016 05:34 AM, Andrus, Brian Contractor wrote:
> All,
> I am running into an issue lately on rpms I build and the ones I download 
> from intel, where I try to install the server zfs module 
> (kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64.rpm) and it gives me a TON of errors 
> about Requires: ksym(xxx)
> Example:
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_objset_pool) = 0xa8cb0bd0
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(zap_cursor_serialize) = 0x3f455060
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_prefetch) = 0x7947c677
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dsl_prop_register) = 0xa6f021e0
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_objset_space) = 0x0a5a5f8f
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(zfs_prop_to_name) = 0xa483a8c3
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(txg_wait_callbacks) = 0x90f50ab1
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(nvlist_pack) = 0x424ac2e1
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_buf_rele) = 0x53e356d2
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_buf_hold_array_by_bonus) = 0x330ef227
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>Requires: ksym(dmu_objset_disown) = 0x27d01e19
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 
> (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>
> Has anyone seen this before and know what the issue could be?
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>






Re: [lustre-discuss] Mounting Lustre over IB-to-Ethernet gateway

2016-08-02 Thread Martin Hecht
Hi Kevin,

I think your proposed lnet config line is correct and it would add tcp0.
If you add a new lnet on the servers you have to reload the lnet module,
which implies that you have to restart Lustre (you don't have to reboot
if unloading the modules works smoothly, i.e. unmount all targets,
run lustre_rmmod, and then mount the targets again; you don't have
to restart the IB clients, though).

If you have clients with an interface on both networks (which could act
as lnet routers) you can do without restarting the servers. You don't
have to add the lnet on the servers in that case; you just have to
add the routes to the new lnet on all servers, which works in production
with lctl --net tcp0 add_route client-ip@o2ib0. On the routers you need
forwarding="enabled" and they need both lnets, each of them assigned to
the appropriate interface (in order to configure this you have to reload
the lnet module on the clients which will act as routers). On the tcp
clients you need the route across the routers in the opposite
direction. However, in that scenario you wouldn't use the IB-to-Ethernet
gateway.
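
Put together, such a routed setup could look roughly like this (a sketch;
interface names and addresses are illustrative):

  # on the router clients (/etc/modprobe.d/lustre.conf):
  options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"

  # on the tcp-only clients:
  options lnet networks="tcp0(eth0)" routes="o2ib0 192.168.1.10@tcp0"

  # on the servers (added live, as described above):
  lctl --net tcp0 add_route 10.0.0.10@o2ib0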

Greetings,
Martin

On 08/01/2016 01:05 PM, Kevin M. Hildebrand wrote:
> Our Lustre filesystem is currently set up to use the o2ib interface only-
> all of the servers have
> options lnet networks=o2ib0(ib0)
>
> We've just added a Mellanox IB-to-Ethernet gateway and would like to be
> able to have clients on the Ethernet side also mount Lustre.  The gateway
> extends the same layer-2 IP range that's being used for IPoIB out to the
> Ethernet clients
>
> How should I go about doing this?  Since the clients don't have IB, it
> doesn't appear that I can use o2ib0 to mount.  Do I need to add another
> lnet network on the servers?  Something like
> options lnet networks=o2ib0(ib0),tcp0(ib0)?  Can I have both protocols on
> the same interface?
> And if I do have to add another lnet network, is there any way to do so
> without restarting the servers?
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland, College Park
> Division of IT
>
>
>






Re: [lustre-discuss] Analog of ll_recover_lost_found_objs for MDS

2016-07-27 Thread Martin Hecht
Hi James,

I'm not aware of a ready-to-use tool, but if you have captured the
output of e2fsck you can use that as a basis for a script that puts the
files back in their original location.
e2fsck usually prints the full path and the inode numbers of the
files/directories which it moves to lost+found, and there they are named
"#$inode" (which makes scripting a bit ugly, but if you properly escape
the '#' sign and do some magic with awk, perl or alike, you can
transform the log into a shell script that moves your files back to the
original path; a rough sketch follows below). I have done this once after
a file system corruption after an upgrade from 1.8 to 2.4 (which contained
an ugly bug when enabling the "FID in dirent" feature).
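
A minimal sketch of the final step, assuming the e2fsck log has already been
distilled into lines of the form "<inode> <original path>" in a file called
inode_to_path.txt (file name, mount point and the log parsing itself are
assumptions here):

  cd /mnt/mdt-ldiskfs
  while read ino path; do
      mkdir -p "$(dirname "$path")"      # recreate the parent directory if needed
      mv "lost+found/#${ino}" "$path"    # entries in lost+found are named #<inode>
  done < inode_to_path.txt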

The backup... well, it gives you the chance to go back to the state
where you started *before* the e2fsck. That would be a chance to capture
the output again, in case you did not store it (actually, you could do
this offline, on a copy of the backup). Restoring the MDT from the
backup, however, is only useful as long as you did not go into production
after the e2fsck. And as I said, you still have to "repair" the restored
MDT (probably by doing the same steps as you already did on the live
system), but a second chance is better than no chance to go back... The
backup is also good for investigating what happened during the e2fsck (in
case it did something weird) or for going in with debugfs for manual
investigations (e.g. manually looking up inode<->path relations).

Martin


On 07/26/2016 04:08 PM, jbellinger wrote:
> Is there, or could there be, something analogous to the OST recovery
> tool that works on the lost+found on the MDT?  e2fsck went berserk.
>
> We're running 2.5.3.
>
>
> Thanks,
> James Bellinger
>
> Yes, we have an older (therefore somewhat inconsistent) backup of the
> mdt, so we should be able to recover most things, _in theory_.  In
> practice -- we'd love to hear other people's experience about recovery
> using an inconsistent backup.







Re: [lustre-discuss] ​luster client mount issues

2016-07-21 Thread Martin Hecht
Hi,

I think your client doesn't have the o2ib lnet (it should appear in the
output of the lctl ping, even if you ping on the tcp lnet).
In your /etc/modprobe.d/lustre.conf o2ib is associated with the ib0
interface, but your /var/log/messages talks about ib1.
If it is a dual port card where just one port is used, the easiest would
be to plug the cable to the other interface. (If there are two ib
connections, things might become a bit more complicated. There are
examples for multi rail configurations using several lnets in the lustre
manual, but maybe this goes too far.)

With the attempt to mount via tcp (or tcp0, which is the same) I think
the problem is that the file system config on the MGS doesn't contain
the tcp NIDs and/or the routes are not configured correctly. It seems
the attempt to mount via tcp causes the client to use o2ib for the
connections to the MDS and OSSes. So I would recommend getting that
working first and looking at tcp0 at a later stage (if you need it at
all - native o2ib is more performant).

Last but not least, I have noticed a typo in your client mount command:
mount -t lustre 192.168.200.52@ob2:/mylustre /lustre
this should be "o2ib" here, too.
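i.e. the corrected command should read:

  mount -t lustre 192.168.200.52@o2ib:/mylustre /lustre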

best regards,
Martin

On 07/20/2016 08:09 PM, sohamm wrote:
> Hi
>
> Any guidance/help on this is greatly appreciated.
>
> Thanks
>
> On Mon, Jul 18, 2016 at 7:25 PM, sohamm  wrote:
>
>> Hi Ben
>> Both the networks have netmasks of value 255.255.255.0
>>
>> Thanks
>>
>> On Mon, Jul 18, 2016 at 10:08 AM, Ben Evans  wrote:
>>
>>> What do your netmasks look like on each network?
>>>
>>> From: lustre-discuss  on behalf
>>> of sohamm 
>>> Date: Monday, July 18, 2016 at 1:56 AM
>>> To: "lustre-discuss@lists.lustre.org" 
>>> Subject: Re: [lustre-discuss] lustre-discuss Digest, Vol 124, Issue 17
>>>
>>> Hi Thomas
>>> Below are the results of the commands you suggested.
>>>
>>> *From Client*
>>> [root@dev1 ~]# lctl ping 192.168.200.52@o2ib
>>> failed to ping 192.168.200.52@o2ib: Input/output error
>>> [root@dev1 ~]# lctl ping 192.168.111.52@tcp
>>> 12345-0@lo
>>> 12345-192.168.200.52@o2ib
>>> 12345-192.168.111.52@tcp
>>> [root@dev1 ~]# mount -t lustre 192.168.111.52@tcp:/mylustre /lustre
>>> mount.lustre: mount 192.168.111.52@tcp:/mylustre at /lustre failed:
>>> Input/output error
>>> Is the MGS running?
>>> mount: mounting 192.168.111.52@tcp:/mylustre on /lustre failed: Invalid
>>> argument
>>>
>>> cat /var/log/messages | tail
>>> Jul 18 01:37:04 dev1 user.warn kernel: [2250504.401397] ib1: multicast
>>> join failed for ff12:401b::::::, status -22
>>> Jul 18 01:37:26 dev1 user.warn kernel: [2250526.257309] LNet: No route to
>>> 12345-192.168.200.52@o2ib via  (all routers down)
>>> Jul 18 01:37:36 dev1 user.warn kernel: [2250536.481862] ib1: multicast
>>> join failed for ff12:401b::::::, status -22
>>> Jul 18 01:41:53 dev1 user.warn kernel: [2250792.947299] LNet: No route to
>>> 12345-192.168.200.52@o2ib via  (all routers down)
>>>
>>>
>>> *From MGS*
>>> [root@lustre_mgs01_vm03 ~]# lctl ping 192.168.111.102@tcp
>>> 12345-0@lo
>>> 12345-192.168.111.102@tcp
>>>
>>> Please let me know what else i can try. Looks like i am missing something
>>> with the ib config? Do i need router setup as part of lnet ?
>>> if i am able to ping mgs from client on the tcp network, it should still
>>> work ?
>>>
>>> Thanks
>>>
>>>
>>> On Sun, Jul 17, 2016 at 1:07 PM, 
 To: "lustre-discuss@lists.lustre.org"
 
 Subject: [lustre-discuss] llapi_file_get_stripe() and
 /proc/fs/lustre/osc/entries
 Message-ID: <03ceaaa0-b004-ae43-eaa1-437da2a5b...@iodoctors.com>
 Content-Type: text/plain; charset="utf-8"; Format="flowed"

 I am using 

Re: [lustre-discuss] rpmbuild error with lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm

2016-07-05 Thread Martin Hecht
Hi Andreas,

I can't reproduce this with the latest master on a freshly installed
CentOS 6.8. I have successfully built the server packages and also the
client for the unpatched kernel, both without having heartbeat
installed. Maybe the spec file has been fixed already.

LU-5760 "LU-4707 patch breaks Lustre build" might be related.

Here is a discussion about the issue with Lustre 2.5 on el6.5 which I
found:
http://comments.gmane.org/gmane.comp.file-systems.lustre.user/13961

Cheers,
Martin


On 06/29/2016 07:55 PM, Dilger, Andreas wrote:
> This is a bug in the RPM .spec file. While heartbeat is one option for HA on 
> servers, it definitely should not be required. Could you please file a Jira 
> ticket with details. 
>
> Cheers, Andreas
>
>> On Jun 29, 2016, at 11:36, Martin Hecht <he...@hlrs.de> wrote:
>>
>> Hello,
>>
>> I have just seen that you managed to mount with a different kernel, but
>> let me come back to this error when building your own rpms for a
>> specific kernel.
>>
>> Independent if you use it or not, I believe on lustre servers you need
>> to have heartbeat installed nowadays. This is not installed by default
>> on a standard centos server, and it's a new requirement to build the
>> rpms since some 2.x release (it was optional before, and actually using
>> it is still optional). This requirement for building and installing the
>> server rpms is not mentioned in all tutorials and unfortunately the
>> absence of heartbeat is not properly detected by the configure system.
>> It would be better to fail earlier, during configure, with a clear error
>> message, rather than the error during make which you have seen here (has
>> anybody filed a lustre bug about this yet?)
>>
>> If you  aim to build lustre client rpms only, you can use the rpmbuild
>> option --without servers to work around this problem, but If I didn't
>> miss anything in the discussion before you are trying to build the
>> server rpms with zfs, so --without servers is not suitable for you, but
>> mentioning it here might be helpful for others who run into the same
>> trouble.
>>
>> Martin
>>
>>> On 06/28/2016 04:55 PM, Yu Chen wrote:
>>> Hello,
>>>
>>> Trying to follow Christopher's advice to rebuild the lustre from src.rpm.
>>> However, got into this error:
>>>
>>> ...
>>>
>>> make[3]: Nothing to be done for `install-data-am'.
>>>
>>> make[3]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>>>
>>> make[2]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>>>
>>> make[1]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>>>
>>> + :
>>>
>>> + ln -s Lustre.ha_v2
>>> /home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre
>>>
>>> ln: failed to create symbolic link
>>> '/home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre':
>>> No such file or directory
>>>
>>> error: Bad exit status from /var/tmp/rpm-tmp.Rhg32s (%install)
>>> ..
>>>
>>>
>>> There seems someone posted to the list before about this error too, and no
>>> answers, wondering if anybody has some solutions now?
>>>
>>> Thanks in advance!
>>>
>>> Regards,
>>>
>>> Chen
>>






Re: [lustre-discuss] rpmbuild error with lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm

2016-06-29 Thread Martin Hecht
Hello,

I have just seen that you managed to mount with a different kernel, but
let me come back to this error when building your own rpms for a
specific kernel.

Independent of whether you use it or not, I believe that on Lustre servers
you need to have heartbeat installed nowadays. It is not installed by
default on a standard CentOS server, and it's a new requirement for building
the rpms since some 2.x release (it was optional before, and actually using
it is still optional). This requirement for building and installing the
server rpms is not mentioned in all tutorials, and unfortunately the
absence of heartbeat is not properly detected by the configure system.
It would be better to fail earlier, during configure, with a clear error
message, rather than with the error during make which you have seen here
(has anybody filed a lustre bug about this yet?)

If you aim to build lustre client rpms only, you can use the rpmbuild
option --without servers to work around this problem. But if I didn't
miss anything in the discussion before, you are trying to build the
server rpms with zfs, so --without servers is not suitable for you;
mentioning it here might still be helpful for others who run into the same
trouble.
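
For reference, the client-only build would then look something like this (a
sketch, using the src.rpm name from this thread):

  rpmbuild --rebuild --without servers lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm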

Martin

On 06/28/2016 04:55 PM, Yu Chen wrote:
> Hello,
>
> Trying to follow Christopher's advice to rebuild the lustre from src.rpm.
> However, got into this error:
>
> ...
>
> make[3]: Nothing to be done for `install-data-am'.
>
> make[3]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>
> make[2]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>
> make[1]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>
> + :
>
> + ln -s Lustre.ha_v2
> /home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre
>
> ln: failed to create symbolic link
> '/home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre':
> No such file or directory
>
> error: Bad exit status from /var/tmp/rpm-tmp.Rhg32s (%install)
> ..
>
>
> There seems someone posted to the list before about this error too, and no
> answers, wondering if anybody has some solutions now?
>
> Thanks in advance!
>
> Regards,
>
> Chen
>






Re: [lustre-discuss] Apache via NFS via Lustre

2016-03-09 Thread Martin Hecht
I think whether the apache uid and gid need to be known on the MDS
depends on whether you have configured mdt.group_upcall or not.
If not, the group memberships are checked on the Lustre client against
its /etc/group (or LDAP if that's configured).
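
For what it's worth, on Lustre 2.x the corresponding tunable is called
identity_upcall (group_upcall was the 1.8 name), so a quick check could
look like this (a sketch):

  lctl get_param mdt.*.identity_upcall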

On 03/09/2016 06:59 AM, Philippe Weill wrote:
> We use this on our lustre 2.5.3 but now your apache user uid and gid
> have to be known on mds server
>
> Le 09/03/2016 03:05, John Dubinski a écrit :
>> Hi,
>>
>> I'm wondering if there are any developments on this front.
>>
>> We also NFS export some lustre filesystems from a client to an apache
>> server so that users can link to their large datasets on their
>> personal websites.  This has been working for years for us using
>> lustre 1.8.
>>
>> We recently built some new systems using lustre 2.5.3 and now this
>> functionality is broken in the same way that Eric describes -
>> symlinks to directories and files on the lustre filesystem are denied
>> by the apache server.  This doesn't seem to be due to our apache
>> configuration since symlinks to files and directories in ordinary
>> (non-lustre) nfs-mounted filesystems work.  Also the nfs-exported
>> filesystems behave normally - you can copy files in, as well as
>> create and delete files as you wish.
>> The only problems arise in relation to apache access.
>>
>> We've also noticed that whenever the forbidden access messages comes
>> up in the browser /var/log/messages on the lustre client spits out
>> this error consistently:
>>
>> Mar  8 19:53:14 nuexport02 kernel: LustreError:
>> 2626:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -13
>>
>> This appears to be related to file locking looking at the code...
>>
>> We have also built a test apache server with lustre client modules
>> that directly mount our lustre filesystems.  symlinks to the
>> directories in the lustre fs within /var/www/html similarly return
>> the forbidden access message with the above mdc_locks error.
>>
>> We're running CentOS 6 with lustre 2.5.3 on the client and server
>> side.  To repeat, direct client mounts of the lustre filesystems
>> behave normally as well as nfs-exported mounts.  Only apache access
>> to symlinks of files on a lustre filesystem give trouble.
>>
>> Are there any special nfs export flags that can be set to help in
>> /etc/exports?
>>
>> Thanks for any help or insight!
>>
>> Regards,
>>
>> John
>>
>> --
>> John Dubinski
>> Canadian Institute for Theoretical Astrophysics
>> University of Torontophone:  416-946-7290
>> 60 St. George St.fax:416-946-7287
>> Toronto, Ontario e-mail: dubin...@cita.utoronto.ca
>> CANADA M5S 3H8   url:www.cita.utoronto.ca
>>
>>
>>
>>
>>
>>
>>
>






Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-05 Thread Martin Hecht
Hi,

comments inline...

On 11/04/2015 01:34 PM, Patrick Farrell wrote:
> Our observation at the time was that lfsck did not add the fid to the .. 
> dentry unless there was already space in the appropriate location.  
Ok, I might have been wrong in this point and some manual mv by the
users was involved.


On 11/04/2015 04:24 PM, Chris Hunter wrote:
> Yes I believe you want to (manually) recover the directories from
> lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't
> think lfsck on the MDT will impact orphan objects on the OSTs.
With lfsck phase 2, introduced in Lustre 2.6, the MDT-OST consistency is
checked and repaired. Chris, you wrote that you have upgraded to "lustre
2.x", so I don't know if you have lfsck II already. And I'm not sure whether
MDT entries in lost+found are ignored by lfsck. I just wanted to point
out that you might have to be careful here, but looking at the lustre
manual it turns out that you are right. The consistency checks are run
when the lfsck type is set to "layout", which is a different thing from the
"namespace" check used to update the FIDs.


On 11/05/2015 01:29 AM, Dilger, Andreas wrote:
> Note that newer versions of LFSCK namespace checking (2.6 or 2.7, don't
> recall offhand) will be able to return such entries from lost+found back
> into the proper parent directory in the namespace, assuming they were
> created under 2.x.  Lustre stores an extra "link" xattr on each inode with
> the filename and parent directory FID for each link to the file (up to the
> available xattr space for each inode), so in case of directory corruption
> it would be possible to rebuild the directory structure just from the
> "link" xattrs on each file.
that's good to know. However, the files in this case were created with
1.8, so even if the current version after the upgrade has this "link"
xattr, it doesn't help to recover from LU-5626. But your script is
useful (it's pretty much the same as what I did back then, but I couldn't
find my quick hack anymore...)
 
> In the meantime, I attached a script to LU-5626 that could be used to
> re-link files from lost+found into the right directory and filename based
> on the output from e2fsck.  It is a bit rough (needs manual editing of
> pathnames), but may be useful if someone has hit this problem.

best regards,
Martin





Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-04 Thread Martin Hecht
On 11/04/2015 03:23 AM, Patrick Farrell wrote:
> PAF: Remember, the specific conditions are pretty tight.  Created under 1.8, 
> not empty (if it's empty, the .. dentry is not misplaced when moved) but also 
> non-htree, then moved with dirdata enabled, and then grown to this larger 
> size.  How many existing (small) directories do you move and then add a bunch 
> of files to?  It's a pretty rare operation.  We only hit it at Martin's site 
> because of an automated tool they have to re-arrange user/job directories.
Well, not only because of the tool. Especially because, once the
directories have been moved by the tool, no files are added anymore.
However, our mechanism gives the users a reason to move their data
from time to time (that's not the intention of the mechanism, but that's
how some users react).

But I'm not quite sure anymore if moving the directories is really a
precondition to run into LU-5626.
We have run the background lfsck which adds the FID to the existing
dentries. This might be an important detail, because in our case a
second '..' entry containing the FID was presumably created by lfsck (in
the wrong place), and not by moving the directory. To my current
understanding the user then only has to add some files to trigger the LBUG.
A subsequent e2fsck will not only find this particular directory but all
other small directories with a '..' entry in the wrong place. When
e2fsck tries to fix these directories, some entries are overwritten by
the FID and these files are then moved to lost+found.
If one of these first entries happens to be a small subdirectory, I
believe there is a chance to run into the same issue again, when you
move everything back to the original location after the e2fsck and
someone starts adding files in these subdirectories.

However, the preconditions are still quite narrow: small directories,
not empty, created without fid, then converted by lfsck (or
alternatively moved to a different place which would also create the
second '..' entry). To trigger the LBUG files need to be added to one of
these directories and for a second occurrence of the LBUG the same
conditions must hold for another subdirectory which must have been at
the very beginning of the directory.

Martin






Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-02 Thread Martin Hecht
Hi Chris and Patrick,

I was sick last week, so I did not find this conversation until today,
sorry.

On 10/27/2015 05:06 PM, Patrick Farrell wrote:
> If you read LU-5626 carefully, there's an explanation of the exact nature of 
> the damage, and having that should let you make partial recoveries by hand.  
> I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it 
> would prove helpful in this instance.
there is no tool like ll_recover_lost_found_objs for the MDT. On OSTs
this would be the right choice.

> Note that there's two forms to this corruption.  One is if you move a 
> directory which was created before dirdata was enabled, then the '..' entry 
> ends up in the wrong place.  This does not trouble Lustre, but fsck reports 
> it as an error and will 'correct' it, which has the effect of (usually) 
> overwriting one dentry in the directory when it creates a new '..' dentry in 
> the correct location.
>
> I don't *think* that one causes the MDT to go read only, but I could be 
> wrong.  I *think* what causes the MDT to go read only is the other problem:
>
> When you have a non-htree directory (not too many items in it, all directory 
> entries in a single inode) that is in the bad state described above (with the 
> '..' dentry in the wrong place after being moved) and that directory has 
> enough files added to it that it becomes an htree directory, the resulting 
> directory is corrupted more severely.  We never sorted out the precise 
> details of this - I believe we chose to simply delete any directories in this 
> state.  (I think lfsck did it for us, but can't recall for sure.)
If I recall correctly, moving (or renaming) the corrupted directory to
another place caused the MDT to go read-only; probably adding more files,
as Patrick wrote before, is another trigger.

In our case we captured the full output of e2fsck, which contained the
original names and the inodes. fsck moved some of the files and
subdirectories of the corrupted directories to lost+found. With the
information contained in the e2fsck output we could move them back from
lost+found to their original place on the ldiskfs level (I parsed
the e2fsck output for a pattern matching the inode numbers and created a
script out of it). We had to repeat this a couple of times, because
either some of the subdirectories moved to lost+found were in a bad
shape themselves or were further damaged later when the owners added
files to them or moved them around.

So, if you have captured all your e2fsck output and you haven't yet
cleaned up lost+found, you can still recover the data. lfsck would
probably throw away the objects on the OSTs because it thinks they are
orphan objects left over after deleting the files.

best regards,
Martin






Re: [lustre-discuss] Lustre 2.5.3 - OST unable to connect to MGS

2015-10-09 Thread Martin Hecht
Hi,

you can use ll_recover_lost_found_objs to recover the files in lost+found to 
their original location.
I think this should be the first step. 

Also these messages look a bit scary to me:

Oct  7 13:02:04 OSS50 kernel: LustreError: 0-0: Trying to start OBD 
Lustre-OST003b_UUID using the wrong disk <85>. Were the /dev/ assignments 
rearranged?
...
Oct  7 13:02:04 OSS50 kernel: LustreError: 15b-f: MGC172.16.0.251@tcp: The 
configuration from log 'Lustre-OST003b'failed from the MGS (-22).  Make sure 
this client and the MGS are running compatible versions of Lustre.
Oct  7 13:02:05 OSS50 kernel: LustreError: 15c-8: MGC172.16.0.251@tcp: The 
configuration from log 'Lustre-OST003b' failed (-22). This may be the result of 
communication errors between this node and the MGS, a bad
configuration, or other errors. See the syslog for more information.

Before actually instructing tunefs.lustre to do the writeconf, I would check
the configuration, parameters etc. with --dryrun. Maybe you also have to use
--erase-params and re-configure the OST.
Or other CONFIG files (e.g. mountdata) got screwed up on this OST (or were
moved to lost+found by the e2fsck?). If you have lost some important ones,
some data exists in a copy on the MGT (basically, the writeconf is the
mechanism which transfers it to the MGS).
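
For reference, such a check and re-configuration could look roughly like this
(a sketch; the device name is a placeholder and the parameters must match
your setup - the MGS NID below is the one from the log above):

  tunefs.lustre --dryrun /dev/<ost-device>
  tunefs.lustre --erase-params --mgsnode=172.16.0.251@tcp --writeconf /dev/<ost-device>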

It's a bit difficult to give good advice by looking at the syslog messages
only. Anyhow, recovering the files from lost+found should be the first step,
maybe followed by a closer look at the OST on the ldiskfs level.

regards,
Martin





Re: [lustre-discuss] Remove failnode parameter

2015-09-28 Thread Martin Hecht
Hi,

--erase-params should remove everything, but you have to set the other
parameters (e.g. --mgsnode) again. If you don't want to have any --failnode,
just leave that one out. At least this is how it should work. If not,
you could post the exact command line and the output of the tunefs
command here.

Martin

On 09/27/2015 05:16 PM, Exec Unerd wrote:
> Thanks for the reply.
>
> I can't seem to get tunefs to *remove *the failnode parameter. I can
> *change *the failnode NIDs, but I can't figure out how to wholesale remove
> the param as if I'd never put it in.
>
> On Thu, Sep 24, 2015 at 1:43 AM, Martin Hecht <he...@hlrs.de> wrote:
>
>> On 09/23/2015 02:38 AM, Exec Unerd wrote:
>>> I made a typo when setting failnode/servicenode parameters, but I can't
>>> figure out how to remove the failnode parameter entirely
>>>
>>> I can change the failnode NIDs, but I can't figure out how to completely
>>> remove "failnode" from the system.
>>>
>>> Does anyone have an example of a syntax (maybe lctl?) that will eliminate
>>> the failnode parameter from the config so there's no chance it gets in
>> the
>>> way of the servicenode parameter?
>>>
>> you have set the failnode with tunefs.lustre, right? You can erase *all*
>> parameters with tunefs.lustre --erase-params and set the correct ones
>> again. You can combine several ones in one call, and I recommend to use
>> also the dry-run option before actually changing anything
>>
>> tunefs.lustre --erase-params --mgsnode=10.11.12.13@o2ib --param
>> sys.timeout=300  --failnode=10.11.12.101@o2ib --dryrun
>> /dev/mapper/some-device
>>
>> the output will be the previous values and the premanent disk data which
>> the command intents to write. If this is ok, ommit the --dryrun option.
>> BTW the file system must be unmounted to perform the tunefs command.
>>
>>
>>


-- 
Dr. Martin Hecht
High Performance Computing Center Stuttgart (HLRS)
Office 0.051, HPCN Production, IT-Security
University of Stuttgart
Nobelstraße 19, 70569 Stuttgart, Germany
Tel: +49(0)711/685-65799  Fax: -55799
Mail: he...@hlrs.de
Web: http://www.hlrs.de/people/hecht/
PGP Key Fingerprint: 41BB 33E9 7170 3864 D5B3 44AD 5490 010B 96C2 6E4A






Re: [lustre-discuss] Multiple MGS interfaces config

2015-09-28 Thread Martin Hecht
On 09/27/2015 08:59 PM, Exec Unerd wrote:
>> I'm not sure if I have understood your setup correctly.
> In this case, the clients are a combination of all three: some are o2ib
> only, some tcp only, and some o2ib+tcp with tcp as failover.
>
> It sounds like I need a combination of configurations, one for the OSSes
> and one for each client type.
>
> So if I used this parameter in the OST,
> --mgsnode="172.16.10.1@o2ib0,192.168.10.1@tcp0"
>
> Then configured the modprobe.d/lustre.conf appropriately on the clients
> tcp: options lnet networks="tcp0(ixgbe1)"
> o2ib: options lnet networks="o2ib0(ib1)"
> both: options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"
>
> And use these mount parameters:
> tcp: mount -v -t lustre 192.168.10.1@tcp0:/testfs /mnt/testfs
> o2ib: mount -v -t lustre 172.16.10.1@o2ib0:/testfs /mnt/testfs
> both: mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs
I think here it should be a colon between the two MGS nids:

mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs


> /mnt/testfs
>
> Everything should be happy?
>
> On Thu, Sep 24, 2015 at 9:12 AM, Martin Hecht <he...@hlrs.de> wrote:
>
>> On 09/24/2015 05:33 PM, Chris Hunter wrote:
>>> [...]
>>>>2. What's the best way to trace the TCP client interactions to see
>>>> where
>>>>it's breaking down?
>>> If lnet is running on the client, you can try "lctl ping"
>>> eg) lctl ping 172.16.10.1@o2ib
>>>
>>> I believe a lustre mount uses ipoib for initial handshake with a mds
>>> o2ib interfaces. You should make sure regular ping over ipoib is
>>> working before mounting lustre.
>> if the client and the server is on the same network, yes, it's a good
>> starting point. But it's not a prerequisite. In general you can have an
>> lnet router in-between or have different ip subnets for ipoib, so you
>> can't ping on the ipoib layer, but you can still lctl ping the whole
>> path (although you could verify that you can ip ping to the next hop at
>> least).
>>
>> We also have a case in which we tried to block ipoib completely with
>> iptables, but we still could lctl ping, even after rebooting the host
>> and ensuring that the firewall was up before loading the lnet module.
>> So, I doubt that ipoib is needed at all for establishing the o2ib
>> connection.
>>
>>






Re: [lustre-discuss] Multiple MGS interfaces config

2015-09-24 Thread Martin Hecht
On 09/23/2015 02:39 AM, Exec Unerd wrote:
> My environment has both TCP and IB clients, so my Lustre config has to
> accommodate both, but I'm having a hard time figuring out the proper syntax
> for it. Theoretically, I should be able to use comma-separated interfaces
> in the mgsnode parameter like this:
>
> --mgsnode=192.168.10.1@tcp0,172.16.10.1@o2ib
> --mgsnode=192.168.10.2@tcp0,172.16.10.2@o2ib
I think this should work:

--mgsnode=192.168.10.1@tcp0 --mgsnode=172.16.10.1@o2ib
--mgsnode=192.168.10.2@tcp0 --mgsnode=172.16.10.2@o2ib

at least that's how it works with a multirail IB network (where you would
replace tcp0 by o2ib1).
The mount command would contain all 4 NIDs (see the sketch below), but if
the client can't connect via tcp it waits until it reaches a timeout and
then tries the next one. If, in addition, the MGS has failed over to the
second server, I guess it takes three timeouts until the client succeeds
in connecting.
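
Such a mount command could look something like this (a sketch; the file
system name and mount point are illustrative, and the NIDs are tried in
order):

  mount -t lustre 192.168.10.1@tcp0:172.16.10.1@o2ib:192.168.10.2@tcp0:172.16.10.2@o2ib:/testfs /mnt/testfs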

> The problem is, this doesn't work for all clients all the time ...
> randomly. It would work, then it wouldn't. Googling, I found some known
> defects saying that the comma delimiter didn't work as per the manual and
> recommending alternate syntaxes like using the colon instead of a comma. I
> know what the manuals *say*about the syntax, I'm just having trouble
> getting it to work.
I'm not sure if I have understood your setup correctly. You have IB
clients and you have other hosts which are connected via tcp, right? Or
do the clients have both, with the tcp network as a fallback solution in
case the IB doesn't work properly (network flooded, SM crashed or alike)?

When you say it doesn't work on a particular client, can you lctl ping
one of the nids in this situation? Or can you ping the other direction
from the server to the client? And if at least one of the pings
succeeds, can you suddenly mount afterwards?

> This seems to affect only the TCP clients; at least I haven't seen it
> affect any of the IB clients. It may be a comma parsing problem or
> something else.
>
> I have two questions for the group:
>
>1. Is there a known-working method for using both TCP and IB interface
>NIDs for the MGS in this manner?
>2. What's the best way to trace the TCP client interactions to see where
>it's breaking down?
>
> Versions in use:
> kernel: 2.6.32-504.23.4.el6.x86_64
> lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
> zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
>
> My lustre.conf contents:
> options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"
ip2nets could be an alternative here, especially if not all clients have
both interfaces.
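
Just as a sketch of what I mean (the address patterns are made up and have
to match your actual subnets):

options lnet ip2nets="o2ib0(ib1) 172.16.10.*; tcp0(ixgbe1) 192.168.10.*"

With ip2nets each host configures only the network(s) whose pattern matches
one of its own addresses, so clients with just one of the two interfaces can
share the same modprobe configuration.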





Re: [lustre-discuss] Remove failnode parameter

2015-09-24 Thread Martin Hecht
On 09/23/2015 02:38 AM, Exec Unerd wrote:
> I made a typo when setting failnode/servicenode parameters, but I can't
> figure out how to remove the failnode parameter entirely
>
> I can change the failnode NIDs, but I can't figure out how to completely
> remove "failnode" from the system.
>
> Does anyone have an example of a syntax (maybe lctl?) that will eliminate
> the failnode parameter from the config so there's no chance it gets in the
> way of the servicenode parameter?
>
you have set the failnode with tunefs.lustre, right? You can erase *all*
parameters with tunefs.lustre --erase-params and set the correct ones
again. You can combine several parameters in one call, and I recommend to
also use the dry-run option before actually changing anything:

tunefs.lustre --erase-params --mgsnode=10.11.12.13@o2ib --param
sys.timeout=300  --failnode=10.11.12.101@o2ib --dryrun
/dev/mapper/some-device

the output will show the previous values and the permanent disk data which
the command intends to write. If this is ok, omit the --dryrun option.
BTW the file system must be unmounted to run the tunefs command.






Re: [lustre-discuss] Multiple MGS interfaces config

2015-09-24 Thread Martin Hecht
On 09/24/2015 05:33 PM, Chris Hunter wrote:
> [...]
>>2. What's the best way to trace the TCP client interactions to see
>> where
>>it's breaking down?
> If lnet is running on the client, you can try "lctl ping"
> eg) lctl ping 172.16.10.1@o2ib
>
> I believe a lustre mount uses ipoib for initial handshake with a mds
> o2ib interfaces. You should make sure regular ping over ipoib is
> working before mounting lustre.
if the client and the server are on the same network, yes, it's a good
starting point. But it's not a prerequisite. In general you can have an
lnet router in-between or different ip subnets for ipoib, so you
can't ping on the ipoib layer, but you can still lctl ping the whole
path (although you could at least verify that you can ip ping to the
next hop).

We also have a case in which we tried to block ipoib completely with
iptables, but we still could lctl ping, even after rebooting the host
and ensuring that the firewall was up before loading the lnet module.
So, I doubt that ipoib is needed at all for establishing the o2ib
connection.






Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-14 Thread Martin Hecht
Hi,

A backup is always a good idea if feasible. It gives you at least the
chance to go back and start over again. However, a backup of the MDT
alone wouldn't help much, because as soon as you put the file system
online and users start to work on their files, the content of the OSTs
will also change. Restoring the MDT backup would then leave the MDT out
of sync with the OSTs.
You notice some bugs immediately during the upgrade (e.g. the one with
the CATALOGS file which prevents you from starting the MDT again), but
some others (e.g. quota bugs or the one about the FID) pop up a few
hours or days after you have started production again, and then you have
to make a decision. Even if you have a full backup, it's always a
trade-off whether you try to fix the problems starting from the state
you are in, or whether you go back and restore the backup.
But even then, you have to put some measures in place which ensure that
you won't run into the same problem again. In the worst case it's
reinstalling the servers with the lustre version you have used before. A
full backup at least gives you this fallback for the worst case
scenario. It can also be useful for offline analysis in case you have to
investigate what's going wrong.
In the particular case with the FID in the directory entry, a file level
backup of the MDT wouldn't have been of that much help, because you also
have to back up the extended attributes. There is a section in the lustre
manual on how to do this (roughly along the lines of the sketch below).
However, these structures must be converted (at least if you want to make
use of the fid_in_dirent feature). If I'm not mistaken, the structures
were ok right after the upgrade and the subsequent lfsck run, but the
ldiskfs backend contained a bug which caused things to be overwritten
when users started to move files somewhere else. Lustre 2.4.3 is marked
as affected in LU-5626.
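
For reference, the file-level MDT backup is roughly the following (device
name and paths are just examples, and newer tar versions can also capture
the xattrs directly with --xattrs; please check the backup section of the
manual for the exact procedure):

mount -t ldiskfs /dev/mdtdev /mnt/mdt_snapshot
cd /mnt/mdt_snapshot
getfattr -R -d -m '.*' -e hex -P . > /backup/ea-backup.bak  # extended attributes
tar czf /backup/mdt-backup.tgz --sparse .                   # file data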

regards,
Martin

On 09/11/2015 04:14 PM, Patrick Farrell wrote:
> Having an MDT backup might perhaps have allowed recovery and trying an 
> improved upgrade process and/or upgrading to a version with the fixes in it.  
> It's not a bad idea if practical.  (And yes, the changes are MDT specific.)
>
> By the way, the fid-in-dirent bug that Martin described is fixed in the most 
> recent 2.5 from Intel, but I don't think it's fixed in 2.4?  Unsure.
> But I'd recommend targeting 2.5 as the destination version for an upgrade.
> 
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
> Chris Hunter [chris.hun...@yale.edu]
> Sent: Friday, September 11, 2015 8:02 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
>
> Hi
> I believe FID & dirdata feature changes would only affect the MDT during
> a lustre upgrade. In hindsight/retrospective do you think a file-level
> backup/restore of the MDT would have avoided some of these issues ?
>
> thanks
> chris hunter
>
>> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>>> Lewis,
>>>
>>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the 
>>> most part things went pretty good.  I?ll chime in on a couple of Martin?s 
>>> points and mention a few other things.
>>>
>>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <he...@hlrs.de> wrote:
>>>>
>>>> In any case the file systems should be clean before starting the
>>>> upgrade, so I would recommend to run e2fsck on all targets and repair
>>>> them before starting the upgrade. We did so, but unfortunately our
>>>> e2fsprogs were not really up to date and after our lustre upgrade a lot
>>>> of fixes for e2fsprogs were committed to whamclouds e2fsprogs git. So,
>>>> probably some errors on the file systems were still present, but
>>>> unnoticed when we did the upgrade.
>>> This is a very important point.  While I didn?t run e2fsck before the 
>>> upgrade (but maybe I should have), I made sure to install the latest 
>>> e2fsprogs.
>>>
>>>> Lustre 2 introduces the FID (which is something like an inode number,
>>>> where lustre 1.8 used the inode number of the underlying ldiskfs, but
>>>> with the possibility to have several MDTs in one file system a
>>>> replacement was needed). The FID is stored in the inode, but it can also
>>>> be activated that the FIDs are stored in the directory node, which makes
>>>> lookups faster, especially when there are many files in a directory.
>>>> However, there were bugs in the code that takes care about adding the
>>>> FID to the directory entry when the file system is converted from 1.8 to
>>>> 2.x. So, I would recommend to use a version in 

Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-11 Thread Martin Hecht
a few more comments in-line

On 09/10/2015 09:11 PM, Lewis Hyatt wrote:
> Thanks a lot for the info, a little more optimistic :-).
>
> -Lewis
>
> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for
>> the most part things went pretty good.  I’ll chime in on a couple of
>> Martin’s points and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <he...@hlrs.de> wrote:
>>>
>>> In any case the file systems should be clean before starting the
>>> upgrade, so I would recommend to run e2fsck on all targets and repair
>>> them before starting the upgrade. We did so, but unfortunately our
>>> e2fsprogs were not really up to date and after our lustre upgrade a lot
>>> of fixes for e2fsprogs were committed to whamclouds e2fsprogs git. So,
>>> probably some errors on the file systems were still present, but
>>> unnoticed when we did the upgrade.
>>
>> This is a very important point.  While I didn’t run e2fsck before the
>> upgrade (but maybe I should have), I made sure to install the latest
>> e2fsprogs.
well, a version of the e2fsprogs with some important fixes was released
shortly after we did the upgrade. Maybe this was just because we ran
into these bugs and the vendor escalated our tickets to Whamcloud/Intel.

>>
>>> Lustre 2 introduces the FID (which is something like an inode number,
>>> where lustre 1.8 used the inode number of the underlying ldiskfs, but
>>> with the possibility to have several MDTs in one file system a
>>> replacement was needed). The FID is stored in the inode, but it can
>>> also
>>> be activated that the FIDs are stored in the directory node, which
>>> makes
>>> lookups faster, especially when there are many files in a directory.
>>> However, there were bugs in the code that takes care about adding the
>>> FID to the directory entry when the file system is converted from
>>> 1.8 to
>>> 2.x. So, I would recommend to use a version in which these bug are
>>> solved. We went to 2.4.1 that time. By default this fid_in_dirent
>>> feature is not automatically enabled, however, this is the only point
>>> where a performance boost may be expected... so we took the risk to
>>> enable this... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade. 
>> In theory, if you upgraded to Lustre 2.x without enabling
>> fid_in_dirent, you could always revert back to Lustre 1.8.  We tried
>> this on a test system, and the downgrade seemed to work.  However,
>> this was a small scale test and I have never tried it on a production
>> file system.  But if you want to minimize possible complications, you
>> could always leave this disabled for a while after the updgrade, and
>> then if things are going well, enable it later on.
actually, the FID is added to new content, and you have to run the
oi_scrub once to convert the file system. That might be important to
know when you decide to use this feature. On the other hand, if you
don't enable fid_in_dirent, you can theoretically go back, but I think
the FID is still added to regular files (not to the directory entry),
and you can't read files created with lustre 2 after the downgrade.
However, running lustre 2 without fid_in_dirent is possible, at least
in the earlier 2.x versions - from about 2.5 onwards you would have
to double check. This is sometimes called "Compatibility Mode IGIF".
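
If I remember the syntax correctly, the scrub can be started and monitored
on the MDS roughly like this (the file system name is just an example):

lctl lfsck_start -M testfs-MDT0000 -t scrub
lctl get_param osd-ldiskfs.testfs-MDT0000.oi_scrub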

Anyhow, to avoid running into the problem with the directory entries, I
would also recommend either not enabling fid_in_dirent or making sure to
choose a version which has all the fixes for this problem. There are
different types of directories (large and small ones have a different
on-disk structure), and the issue was already fixed for some cases, but
we hit another case which was not yet handled correctly when we did our
upgrade.

>>
>> My only other advice is to test as much as possible prior to the
>> upgrade.  If you have a little test hardware, install the same Lustre
>> 1.8 version you are currently running in production and then try
>> upgrading that to the new Lustre version.  I think preparation is the
>> key.  I think I spent about 2 months reading about upgrade
>> procedures, talking with others who have upgraded, reading JIRA bug
>> reports, and running tests on hardware.
well, our vendor was preparing the upgrade for about a year and did
intensive testing on several file systems and they changed the targeted
lustre version several times. The problem is that some bugs are on

Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-11 Thread Martin Hecht
On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
> On 2015/09/10, 6:54 PM, "Chris Hunter"  wrote:
>
>> We experienced file corruption on several OSTs. We proceeded through
>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>> Following these steps, e2fsck came out clean.
>>
>> The file corruption did not impact the MDT. The files were still
>> referenced by the MDT. Accessing the file on a lustre client (ie. ls -l)
>> would report error "Cannot allocate memory"
>>
>> Following OST recovery steps, we started removing the corrupt files via
>> "unlink" command on lustre client (rm command would not remove file).
>>
>> Now dry-run e2fsck of the OST is reporting errors:
>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>> "Unattached inodes" in Pass 4 (checking reference counts)
>> "free block count wrong" in Pass 5 (checking group summary information).
>>
>> Is e2fsck errors expected when unlinking files ?
> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
> by calling "stat()" on the file before trying to unlink it.  This
> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
> from the back-end storage.
Chris, with "live filesystem" you mean that you ran a readonly e2fsck on
a lustre file system while it was mounted and clients working on the
file system? Then, it is expected that e2fsck reports some error,
because the file system contents changes while the e2fsck is running and
the in-memory directory structure does not fit to the on-disk data
anymore. However, as Andreas points out, it might as well be a sign of
ongoing corruption on the storage, but only an offline e2fsck (i.e.
while the OST is unmounted, and the journal is played back) can clarify
this. 
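
For example (the device name is made up), with the OST unmounted:

e2fsck -fn /dev/mapper/ostN    # read-only dry run, only reports problems
e2fsck -fp /dev/mapper/ostN    # actually repairs what can be fixed safely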

regards,
Martin





Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-10 Thread Martin Hecht
Hi Lewis,

it's difficult to tell how much data loss was actually related to the
lustre upgrade itself. We have upgraded 6 file systems and we had to do
it more or less in one shot, because at that time they were using a
common MGS server. All servers of one file system must be on the same
level (at least for the major upgrade 1.8 to 2.x, there is rolling
upgrade for minor versions in the lustre 2 branch now, but I have no
experience with that).

In any case the file systems should be clean before starting the
upgrade, so I would recommend to run e2fsck on all targets and repair
them before starting the upgrade. We did so, but unfortunately our
e2fsprogs were not really up to date, and after our lustre upgrade a lot
of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So,
probably some errors on the file systems were still present, but
unnoticed when we did the upgrade.

Lustre 2 introduces the FID (which is something like an inode number,
where lustre 1.8 used the inode number of the underlying ldiskfs, but
with the possibility to have several MDTs in one file system a
replacement was needed). The FID is stored in the inode, but it can also
be activated that the FIDs are stored in the directory node, which makes
lookups faster, especially when there are many files in a directory.
However, there were bugs in the code that takes care about adding the
FID to the directory entry when the file system is converted from 1.8 to
2.x. So, I would recommend to use a version in which these bug are
solved. We went to 2.4.1 that time. By default this fid_in_dirent
feature is not automatically enabled, however, this is the only point
where a performance boost may be expected... so we took the risk to
enable this... and ran into some bugs.
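
If I remember correctly, enabling it on an already formatted MDT boils down
to switching on the dirdata feature on the unmounted MDT device, something
like (the device name is an example; double-check the upgrade section of the
manual for your target version first):

umount /mnt/mdt
tune2fs -O dirdata /dev/mapper/mdt-device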

We had other file systems, still on 1.8, so with the server upgrade we
didn't upgrade the clients, because lustre 2 clients wouldn't have been
able to mount the 1.8 file systems. And we use quotas, and for this you
need the 1.8.9 client with a patch that corrects a defect of the 1.8.9
client when it talks to 2.x servers (LU-3067). However, older 1.8
clients don't support the Lustre 2 quota (which came in 2.2 or 2.4, I'm
not 100% sure). BTW, it still runs out of sync from time to time, but
the limit seems to be fine now; it's just the numbers the users see: lfs
quota prints out numbers that are too low, and users run out of quota
earlier than they expect... It's better in the latest 2.5 versions now.

Here is an unsorted(!) list of bugs we hit during the lustre upgrade.
For most of them we weren't the first ones, but I guess you could wait
forever for a version in which all bugs are resolved :-)

LU-3067 - already mentioned above: a patch for 1.8.9 clients
interoperating with 2.x servers (1.8.9 is needed to get quota working).
Without this patch clients become unresponsive (100% cpu load), then just
hang and devices become unavailable; a reboot doesn't work, so a power
cycle is needed, and after a while the problem reappeared.

LU-4504 - e2fsck noticed quota issues similar to this bug on OSTs - use
the latest e2fsprogs and check again; then the ldiskfs backend doesn't
run into this anymore.

e2fsck noticed quota issues on the MDT ("Problem in HTREE directory inode
21685465: block #16 not referenced"); however, this could be fixed by e2fsck.

LU-5626 mdt becomes readonly: one file system, where the MDT had been
corrupted at an earlier stage and obviously not fully repaired, LBUGged
upon MDT mount; it could only be mounted with the noscrub option.

the mdt group_upcall (which can be configured with tunefs) used to be
/usr/sbin/l_getgroups in lustre 1.8 and was set by default - the program
is called l_getidentity now and is not configured by default anymore.
You should either change it with tunefs (see the sketch below), or put
an appropriate link in place as a fallback. Anyhow, lustre 2 file systems
don't use it by default anymore; they just trust the client. It also
means that users/groups are not needed anymore on the lustre servers
(we had local passwd/group files there so that secondary groups work
properly; alternatively you could configure ldap, but without the
group_upcall all of this is handled by the lustre client).
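
To give an idea what I mean (the device name is a placeholder, and please
double-check the parameter name for your version), setting the upcall with
tunefs would look roughly like:

tunefs.lustre --param mdt.identity_upcall=/usr/sbin/l_getidentity /dev/mapper/mdt-device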

LU-5626 and LU-2627: ".." directory entries were damaged by adding the
FID; once all old directories were converted and all files somehow
recovered (in several consecutive attempts), the problem was gone. The
number of emergency maintenances is basically limited by the depth of
your directory structure. It could be repaired by running e2fsck,
followed by manually moving everything back (save the log of the e2fsck,
which tells you the relation between the objects in lost+found and their
original paths!)

LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again
- I believe that's something which must be done anyhow quite often,
because there is no quotacheck anymore. It's run in the background when
enabling quotas, but file systems have to be unmounted for this.

Related to quota, there is a change in the lfs 

Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-09 Thread Martin Hecht
Hi Lewis,

Yes, for lustre 2.x you have to "upgrade" the OS, which basically means
a reinstall of CentOS 6.x (because there is no upgrade path across
major releases), then install the lustre packages and the lustre-patched
kernel, and then the pain begins.
We had a lot of trouble when we upgraded our lustre file systems from
1.8 to 2.4. I would recommend to consider a fresh install of lustre 2 on
separate hardware, then migrate the data (1.8 clients are able to
mount lustre 2 file systems, but not the other way round, and for
working quota support you need 1.8.9) to the new file system, and
finally reformat the old file system with lustre 2 and use it for
testing or backups or whatever.
However, if buying new hardware is not an option, the upgrade is
possible, and depending on the history of the file system it might work
quite smoothly. Upgrading a freshly formatted lustre 1.8 with some
artificial test data worked without any problems in our tests before
doing the upgrade of the production file systems.

Regards,
Martin


On 09/08/2015 08:18 PM, Lewis Hyatt wrote:
> Thanks a lot for the response. Seems like we need to explore upgrading
> the servers. Do you happen to know how smooth that process is likely
> to be? We have lustre 1.8.8 on CentOS 5.4 there, I presume we need to
> upgrade the OS and then follow the upgrade procedure in the lustre
> manual, maybe it isn't such a big deal. Thanks again...
>
> -Lewis
>
> On 9/8/15 11:16 AM, Patrick Farrell wrote:
>> Lewis,
>>
>> My own understanding is you are out of luck - the 1.8 client cannot
>> realistically be brought forward to newer kernels.  Far too many
>> changes over too long a period.
>>
>> As far as version compatibility, I believe no newer clients will talk
>> to servers running 1.8.  If any will, they would be very early 2.x
>> versions, which won't support your desired kernel versions anyway.
>>
>> Regards,
>> Patrick
>>
>> 
>> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on
>> behalf of Lewis Hyatt [lhy...@gmail.com]
>> Sent: Tuesday, September 08, 2015 9:06 AM
>> To: lustre-discuss@lists.lustre.org
>> Subject: [lustre-discuss] 1.8 client on 3.13.0 kernel
>>
>> Hello-
>>
>> We have a working 1.8 lustre cluster with which we are very happy.
>> The object
>> and metadata servers are running one of the recommended CentOS
>> distributions
>> (5.4), but the clients are all Ubuntu 10.04 LTS, with kernel 2.6.32.
>> It is not
>> feasible for us to change on the client side to a different distro
>> other than
>> Ubuntu, but we are about to go to Ubuntu 14, with kernel 3.13.0, for
>> reasons
>> unrelated to lustre. Unfortunately it seems that lustre 1.8 cannot be
>> built on
>> this kernel, we can't even get through the configure process without
>> a large
>> number of errors. The first one we hit is this:
>>
>> checking for
>> /lib/modules/3.13.0-63-generic/build/include/linux/autoconf.h... no
>>
>> But various attempts to hack around the errors as they come up have
>> not led to
>> much success. Is this something we can hope to achieve? I thought I
>> saw some
>> threads about a series of patches to support this kernel in lustre
>> 1.8 but I
>> haven't been able to find anything conclusive. We are really hoping
>> it is
>> possible to upgrade our clients without touching the lustre servers,
>> as we
>> don't want to disturb that production system which has been very
>> reliable for
>> us, and we don't have much in-house expertise with lustre or CentOS.
>> We were
>> able to build a newer lustre client on the 3.13 kernel, but it seems
>> it is not
>> willing to interact with the 1.8 servers.
>>
>> Thanks for any advice, much appreciated.
>>
>> -Lewis







Re: [lustre-discuss] refresh file layout error

2015-09-04 Thread Martin Hecht
On 09/03/2015 07:22 AM, E.S. Rosenberg wrote:
> On Wed, Sep 2, 2015 at 8:47 PM, Wahl, Edward  wrote:
>
>> That would be my guess here.  Any chance this is across NFS?  Seen that a
>> great deal with this error, it used to cause crashes.
>>
> Strictly speaking it is not, but it may be because a part of the path the
> server 'sees'/'knows' is a symlink to the lustre filesystem which lives on
> nfs...
>
Ah, I can remember a problem we had some years ago, when users with
their $HOME on NFS were accessing many files in directories on lustre
via symlink. Somehow the NAS box serving the nfs file system didn't
immediately notice that the files weren't on its own file system and
repeatedly had to look up in its cache, just to notice that the files
are somewhere else behind a symlink. If I recall correctly, the problem
could be avoided by:
- either accessing the file via its absolute path, or cd'ing into the
directory (both via the mount point, not (!) via the symlink)
- or making the symlink an absolute one (I'm not 100% sure, but I believe
the problem was only with relative links pointing out of the NFS upwards
across the mountpoint and down again into the lustre file system); see the
small example below.
It could be something similar here. Do you have any chance to access the
files via absolute path in your setup and web server configuration?
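
Just to illustrate what I mean with relative vs. absolute links (the paths
are made up):

ln -s ../../lustre/project/data data    # relative link crossing the NFS mount point
ln -s /mnt/lustre/project/data data     # absolute link, resolved directly on the client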

best regards, Martin






Re: [lustre-discuss] Convert a disk from lustre to ext4

2015-09-04 Thread Martin Hecht
Maybe it's too late anyhow, but I found this thread in my unread mail:

On 09/01/2015 06:38 PM, Colin Faber wrote:
> If you're just looking to reformat the drive, then just reformat the drive:
>
> http://linux.die.net/man/8/mkfs.ext4
It's still unclear what he actually did. Maybe he formatted the disk as
ldiskfs and used it as if it were ext4?
Then it might even be mountable as ext4, at least if the e2fsprogs are
installed.
Anyhow, I would recommend backing up the data (if there is anything
useful on the device already) to a different device, then reformatting
the drive and restoring the files on it.
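
A minimal sketch of what I mean (device name and paths are made up):

tar czf /some/other/disk/backup.tgz -C /mnt/olddisk .   # save what is on the disk
umount /mnt/olddisk
mkfs.ext4 /dev/sdX                                      # reformat as plain ext4
mount /dev/sdX /mnt/olddisk
tar xzf /some/other/disk/backup.tgz -C /mnt/olddisk     # restore the files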

and from an earlier mail on this topic:

On 08/24/2015 08:07 PM, E.S. Rosenberg wrote:
> What's wrong with plain ext4?
> Or XFS, btrfs etc.
>
> If you want good support on Windows you're stuck with windows filesystems
> ((ex)FAT, NTFS), though there are tools to mount extX filesystems on
> windows they aren't all that stable as far as I know (though I haven't
> looked at that for years so things may have changed).
maybe using a NAS box which exports a file system via NFS and SMB is
what you are looking for. It's connected via ethernet, however, and
Windows can mount it as a so-called network drive. Linux and Mac support
both NFS and SMB more or less, depending on the OS version.
If you go out to buy such a box, I would recommend looking for a
solution based on a RAID system. The very low-price end is based on
single disks, which might not be a good idea for long-term storage. A
RAID system (except for RAID0) has built-in redundancy, so that a disk
may fail and you can still recover the data.

regards,
Martin






Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-03 Thread Martin Hecht
Hi Chris,

On 09/02/2015 07:18 AM, Chris Hunter wrote:
> Hi Andreas
>
> On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
>> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
>> > chris.hun...@yale.edu> wrote:
>>
>>> Hi Andreas,
>>> Thanks for your help.
>>>
>>> If you have a striped lustre file with "holes" (ie. one chunk is gone
>>> due hardware failure, etc.) are the remaining file chunks considered
>>> orphan objects ?
> So when a lustre striped file has a hole (eg. missing chunk due to
> hardware failure), the remaining file chunks stay indefinitely on the
> OSTs.
> Is there a way to reclaim the space occupied by these pieces (after
> recovery of any usuable data, etc.)?
these remaining chunks still belong to the file (i.e. you have the
metadata entry on the MDT and you see the file when lustre is mounted).
By removing the file you free up the space.

In general there are two types of inconsistencies which may occur:
Orphan objects are objects which are NOT assigned to an entry on the
MDT, i.e. chunks which do not belong to any file. These can be either
pre-allocated chunks or chunks left over after a corruption of the
metadata on the MDT.

The other type of corruption is that you have a file, where chunks are
missing in-between. This can happen, when an OST gets corrupted. As long
as the MDT is Ok, you should be able to remove such a file. If in
addition the MDT is also corrupted, you should first fix the MDT, and
you might then only be able to unlink the file (which again might leave
some orphan objects on the OSTs). lfsck should be able to remove them,
depending on the lustre version you are running...

Another point: when OSTs got corrupted, you can mount them as ldiskfs
after having them repaired with e2fsck, check whether there are chunks
in lost+found, and use the tool ll_recover_lost_found_objs to restore
them to their original place (see the sketch below). I believe these
objects which e2fsck puts in lost+found are a different kind of thing,
usually not called "orphan objects". As I said, they can usually be
recovered easily.

Martin






Re: [lustre-discuss] quota only in- but not decreasing after upgrading to Lustre 2.5.3

2015-07-28 Thread Martin Hecht
Hi,

it might help to disable quota using tune2fs and re-enable it again on
the ldiskfs level on all devices, see LU-3861.
(BTW you don't need the e2fsprogs mentioned in the bug; there was an
official release last year in September.)

You have to stop lustre for the tune2fs run, and it takes some time,
because re-enabling quota triggers a quota check in the background
(which does not produce any output on the screen).
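
Per target it looks roughly like this (device name and mount point are just
examples):

umount /mnt/lustre-target
tune2fs -O ^quota /dev/mapper/targetN
tune2fs -O quota /dev/mapper/targetN    # re-enabling triggers the silent quota check
mount -t lustre /dev/mapper/targetN /mnt/lustre-target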

best regards,
Martin

On 07/28/2015 09:44 AM, Torsten Harenberg wrote:
 a further observation:

 a user deleted a ~100MB file:

 before.

 [root@wnfg001 lustre]# lfs quota -u sandhoff /lustre
 Disk quotas for user sandhoff (uid 11206):
  Filesystem  kbytes   quota   limit   grace   files   quota   limit
   grace
 /lustre 811480188  1811480200 2811480200   -   61077   0
   0   -

 after:

 [root@wnfg001 lustre]# lfs quota -u sandhoff /lustre
 Disk quotas for user sandhoff (uid 11206):
  Filesystem  kbytes   quota   limit   grace   files   quota   limit
   grace
 /lustre 811480188  1811480200 2811480200   -   61076   0
   0   -
 [root@wnfg001 lustre]#

 so #files decreased by 1, but not the #kbytes.

 Furthermore, the lfs quota command is pretty slow:

 [root@wnfg001 lustre]# time lfs quota -u sandhoff /lustre
 Disk quotas for user sandhoff (uid 11206):
  Filesystem  kbytes   quota   limit   grace   files   quota   limit
   grace
 /lustre 811480188  1811480200 2811480200   -   61076   0
   0   -

 real0m2.441s
 user0m0.001s
 sys 0m0.004s
 [root@wnfg001 lustre]#

 although the system is not overloaded.

 Couldn't find anything useful in dmesg:

 [root@lustre2 ~]# dmesg | grep quota
 VFS: Disk quotas dquot_6.5.2
 LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on.
 Opts:
 [root@lustre2 ~]#

 [root@lustre3 ~]# dmesg | grep quota
 VFS: Disk quotas dquot_6.5.2
 LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-12): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-12): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. quota=on.
 Opts:
 [root@lustre3 ~]#

 [root@lustre4 ~]# dmesg | grep quota
 VFS: Disk quotas dquot_6.5.2
 LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-8): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-14): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-8): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-14): mounted filesystem with ordered data mode. quota=on.
 Opts:
 LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on.
 Opts:
 [root@lustre4 ~]#

 Thanks for any hint!

 Best regards

   Torsten







Re: [lustre-discuss] trouble mounting after a tunefs

2015-06-12 Thread Martin Hecht
Hi John,

on the Parameters line the different nodes should not be separated by
":". Each node should be specified by a separate mgsnode=... or
failover.node=... statement. I'm not sure whether separating the two
interfaces of each node by "," is correct here, or whether this should
be split again into two separate statements.
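
Untested, and the device name below is only a placeholder, but for the MDT I
would try something along these lines (keep --dryrun until the printed
permanent disk data looks right; the OSTs would get the analogous treatment
with their own failnode):

tunefs.lustre --erase-params \
  --mgsnode=10.4.250.10@o2ib,10.0.250.10@tcp \
  --mgsnode=10.4.250.11@o2ib,10.0.250.11@tcp \
  --failnode=10.4.250.11@o2ib,10.0.250.11@tcp \
  --param mdt.quota_type=ug --dryrun /dev/mapper/mdt-device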

best regards,
Martin

On 06/12/2015 05:07 PM, John White wrote:
 Good Morning Folks,
   We recently had to add TCP NIDs to an existing o2ib FS.  We added the 
 nid to the modprobe.d stuff and tossed the definition of the NID in the 
 failnode and mgsnode params on all OSTs and the MGS + MDT.  When either an 
 o2ib or tcp client try to mount, the mount command hangs and dmesg repeats:
 LustreError: 11-0: brc-MDT-mdc-881036879c00: Communicating with 
 10.4.250.10@o2ib, operation mds_connect failed with -11.

 I fear we may have over-done the parameters, could anyone take a look here 
 and let me know if we need to fix things up (remove params, etc)?

 MGS:
 Read previous values:
 Target: MGS
 Index:  unassigned
 Lustre FS:  
 Mount type: ldiskfs
 Flags:  0x4
   (MGS )
 Persistent mount opts: user_xattr,errors=remount-ro
 Parameters:

 MDT:
  Read previous values:
 Target: brc-MDT
 Index:  0
 Lustre FS:  brc
 Mount type: ldiskfs
 Flags:  0x1001
   (MDT no_primnode )
 Persistent mount opts: user_xattr,errors=remount-ro
 Parameters:  
 mgsnode=10.4.250.11@o2ib,10.0.250.11@tcp:10.4.250.10@o2ib,10.0.250.10@tcp  
 failover.node=10.4.250.10@o2ib,10.0.250.10@tcp:10.4.250.11@o2ib,10.0.250.11@tcp
  mdt.quota_type=ug

 OST(sample):
 Read previous values:
 Target: brc-OST0002
 Index:  2
 Lustre FS:  brc
 Mount type: ldiskfs
 Flags:  0x1002
   (OST no_primnode )
 Persistent mount opts: errors=remount-ro
 Parameters:  
 mgsnode=10.4.250.10@o2ib,10.0.250.10@tcp:10.4.250.11@o2ib,10.0.250.11@tcp  
 failover.node=10.4.250.12@o2ib,10.0.250.12@tcp:10.4.250.13@o2ib,10.0.250.13@tcp
  ost.quota_type=ug




Re: [lustre-discuss] Exporting a lustre mounted directory via nfs

2015-05-22 Thread Martin Hecht
Hi,

I'm re-adding lustre-discuss (I mistakenly replied directly to Kurt).

It's interesting that you can't re-export the 2.5.3 system on the client
which is able to export the 1.8.9.

Support for the lustre 2 quotas was added to the 1.8 client somewhere
between 1.8.7 and 1.8.9.
There are a few more commits to the 1.8 branch in Whamcloud's git, but
unfortunately there is not much activity anymore. Important fixes which
haven't landed on 1.8 are in LU-3596 and LU-1126.

best regards,
Martin

On 05/21/2015 07:03 PM, Kurt Strosahl wrote:
 Hi,

The purpose of it was to allow access to the lustre file system to any 
 system that can mount over nfs (also because our lustre system runs over IB, 
 and this allows peoples desktops to get to the system).

The thing is that the export of lustre 1.8.9 over the 1.8.7 client has 
 been working for years (it was set up back in 2009 I believe, before I was on 
 the project).  It is only the lustre 2.5.3 system that mounts but is not 
 reachable via NFS.

Today I compiled the 2.5.3 client for a new system that has IB but does 
 not need access to the lustre file system, mounted the new lustre system, and 
 was able to export successfuly.  So the problem clearly lies with some 
 combination of the old OS (RHEL5), old client (1.8.7) and new lustre (2.5.3).

 This isn't the first oddity I've encountered.  Early in the testing 
 process I discovered that the quotas in 2.5.3 are not visible to the 1.8.7 
 clients (but are visible to the 1.8.9 clients).

 At this point I'm probably going to try building a new gateway (with new 
 hardware and the new OS), mount the new lustre with the client I know works, 
 and export the new area that way.

 I was just hoping that someone would say oh, just mount it with the derp 
 option.

 thanks,
 Kurt

 - Original Message -
 From: Martin Hecht 
 To: Kurt Strosahl 
 Sent: Thursday, May 21, 2015 12:51:41 PM
 Subject: Re: [lustre-discuss] Exporting a lustre mounted directory via nfs

 Hi Kurt,

 some time ago we had a client re-exporting a lustre 1.8.x in rhel5. It
 worked quite well, but I believe you shouldn't run too many nfs clients
 with this construct.

 What's the reason for the reexport? If your lustre is on an infiniband
 and you want to make it available on clients which have no IB card, lnet
 routing over a tcp-network might be a better option than the nfs re-export.

 If the reason is that you can't build the lustre client... well... the
 nfs reexport might be worth trying, but I don't have any experience with
 re-exporting a lustre 2 file system (although I think the client version
 is more important in this scenario).

 best regards,
 Martin

 On 05/20/2015 09:14 PM, Kurt Strosahl wrote:
 Sorry, left off some important info...

 the system is rhel5, with client 1.8.7... the lustre file system is 2.5.3 
 (the system already exports a 1.8.9 lustre file system).

 w/r,
 Kurt

 - Original Message -
 From: Kurt Strosahl 
 To: lustre-discuss@lists.lustre.org
 Sent: Wednesday, May 20, 2015 2:55:39 PM
 Subject: Exporting a lustre mounted directory via nfs

 Good Afternoon,

I'm attempting to use nfs to export a lustre mount point on a client box 
 (essentially acting as a gateway for systems that don't have the lustre 
 client).  I've mounted lustre, and added it to the nfs exports file (it 
 shows up as exported) but when I try to mount the nfs point the system 
 hangs.  On the server side (the lustre gateway) I do see the test system 
 authenticating.

 w/r,
 Kurt




Re: [lustre-discuss] Size difference between du and quota

2015-05-21 Thread Martin Hecht
Hi,

a few more things which may play a role:

- as you are suspecting, the difference of used blocks vs. used bytes
might be the reason, especially if there are many very small files, but
there are more possible causes:

- some tools use 2^10 bytes and others use 1000 bytes as a kb, which
might explain small discrepancies. du and ls are examples of tools with
different output. However, this cannot explain the whole difference.

- your find looks for regular files only (type f), but directories and
symbolic links consume a few kb as well. If there are many symbolic
links, this would be the explanation, I think (see the example command
after this list).

- there are also cases in which quota can get out of sync (I don't
remember the cause, but I have already seen warnings about this in the
syslog of lustre servers). e2fsck on the ldiskfs level is supposed to
fix this issue, but I also had cases in which I had to turn quota off by
means of tune2fs on the ldiskfs level and turn it on again in order to
trigger something like a background quotacheck in lustre 2. In lustre
1.8 there used to be a dedicated quotacheck tool.

- preallocated stripes on many OSTs might be an issue as well, although
I don't see the described discrepancy on our own file systems.

- there might also be orphan objects on the disk, i.e. stripes which
are not referenced anymore on the lustre level, but which still consume
disk space (not sure if these may affect quota). An online lfsck is
supposed to clean them up in lustre 2. In lustre 1.8 one had to run
several e2fsck passes on the ldiskfs level and build databases to run an
lfsck, but that's not supported anymore in lustre 2.
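
For the directories/symlinks point above, a quick way to see how many of
them there are (run from the same directory as the du):

find . \( -type d -o -type l \) | wc -l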

best regards,
Martin

On 05/20/2015 10:50 AM, Phill Harvey-Smith wrote:
 Hi all,

 One of my users is reporting a massive size difference between the
 figures reported by du and quota.

 doing a du -hs on his directory reports :
  du -hs .
 529G.

 doing a lfs quota -u username /storage reports
 Filesystem  kbytes   quota   limit   grace   files   quota   limit  
 grace
 /storage 621775192  64000 64001   -  601284  100
 110   -

 Though this user does have a lot of files :

 find . -type f | wc -l
 581519

 So I suspect that it is the typical thing that quota is reporting used
 blocks whilst du is reporting used bytes, which can of course be
 wildly different due to filesystem overhead and wasted unused space at
 the end of files where a block is allocated but only partially used.

 Is this likely to be the case ?

 I'm also not entirely sure what versions of lustre the client machines
 and MDS / OSS servers are running, as I didn't initially set the
 system up.

 Cheers.

 Phill.






Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST

2014-05-28 Thread Martin Hecht
Hi bob,

just to make sure: You already followed:
http://wiki.lustre.org/index.php/Handling_File_System_Errors, especially
the steps for e2fsck linked there?

If you did *not yet* do any write operation to the damaged OST, you
might want to back up the whole OST first, using dd for instance (if the
underlying hardware still permits it).
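
A minimal sketch (the device matches the one from your commands below, the
target path is made up and must be large enough):

dd if=/dev/sde of=/some/big/scratch/ost-backup.img bs=4M conv=sync,noerror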

If the situation described (empty O directory, lost LAST_ID entry)
occurred *after* the e2fsck, and you find lots of files in lost+found
when you mount the OST as ldiskfs, you can use
ll_recover_lost_found_objs to put them back in the correct place
(http://manpages.ubuntu.com/manpages/precise/man1/ll_recover_lost_found_objs.1.html)
- it is part of the lustre distribution. Once I had to run this several
times in order to restore the structure below.

best regards,
Martin

On 05/19/2014 08:24 PM, Bob Ball wrote:
 Oh, better still, as I kept looking, and the low-level panic
 retreated, I found this on the mdt:

 [root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id
 ...
 osc.umt3-OST0025-osc.prealloc_next_id=6778336

 So, unless someone tells me that I am way off base, I'm going to
 proceed with the assumption that this is a valid starting point, and
 proceed to get my file system back online.

 bob

 On 5/19/2014 2:05 PM, Bob Ball wrote:
 Google first, ask later.  I found this in the manuals:


   26.3.4 Fixing a Bad LAST_ID on an OST

 The procedures there spell out pretty well what I must do, so this
 should be relatively straight forward.  But, does this comment refer
 to just this OST, or to all OST?
 *Note - *The file system must be stopped on all servers before
 performing this procedure.

 So, is this the best approach to follow, allowing for the fact that
 there is nothing at all left on the OST, or is there a better short
 cut to choosing an appropriate LAST_ID?

 Thanks again,
 bob


 On 5/19/2014 1:50 PM, Bob Ball wrote:
 I need to completely remake a failed OST.  I have done this in the
 past, but this time, the disk failed in such a way that I cannot
 fully get recovery information from the OST before I destroy and
 recreate.  In particular, I am unable to recover the LAST_ID file,
 but successfully retrieved the last_rcvd and CONFIGS/* files.

 mount -t ldiskfs /dev/sde /mnt/ost
 pushd /mnt/ost
 cd O
 cd 0
 cp -p LAST_ID /root/reformat/sde

 The O directory exists, but it is empty.  What can I do concerning
 this missing LAST_ID file?  I mean, I probably have something,
 somewhere, from some previous recovery, but that is way, way out of
 date.

 My intent is to recreate this OST with the same index, and then put
 it back into production.  All files were moved off the OST before
 reaching this state, so nothing else needs to be recovered here.

 Thanks,
 bob

