Re: [lustre-discuss] Data migration software

2023-03-22 Thread Stephane Thiell via lustre-discuss
Hi Anna, We’re about to deploy Lustre/HSM with Phobos for a new large research data archival system at Stanford (200PB). https://github.com/phobos-storage Phobos is open source and a Lustre copytool is available. Archiving policies can be set up via Robinhood like with other HSMs. Robinhood

Re: [lustre-discuss] Filesystem could not mount after e2fsck

2023-03-06 Thread Stephane Thiell via lustre-discuss
, 2023 at 2:07 AM Stephane Thiell <sthi...@stanford.edu> wrote: Hi Robin, Sorry to hear about your problem. A few questions… Why did you run e2fsck? Did e2fsck fix something? What version of e2fsprogs are you using? errno 28 is ENOSPC, what does dumpe2fs say about available spac

Re: [lustre-discuss] Filesystem could not mount after e2fsck

2023-03-04 Thread Stephane Thiell via lustre-discuss
Hi Robin, Sorry to hear about your problem. A few questions… Why did you run e2fsck? Did e2fsck fix something? What version of e2fsprogs are you using? errno 28 is ENOSPC, what does dumpe2fs say about available space? You can check the values of "Free blocks" and "Free inodes" using this
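
The command the message goes on to give is cut off in this listing, so the following is only a minimal sketch of the check being described, not the author's command. It assumes the header text printed by `dumpe2fs -h` has been captured into a string, and pulls out the "Free blocks" and "Free inodes" counters. The function names `parse_free_counts` and `looks_full` are hypothetical.

```python
import re

# Matches the "Free blocks:" and "Free inodes:" lines found in the
# superblock summary that dumpe2fs prints (label, colon, padding, integer).
_FREE_RE = re.compile(r"^Free (blocks|inodes):\s+(\d+)\s*$", re.MULTILINE)


def parse_free_counts(dumpe2fs_header: str) -> dict:
    """Return the counters found in the text, keyed by 'blocks' / 'inodes'."""
    return {kind: int(value) for kind, value in _FREE_RE.findall(dumpe2fs_header)}


def looks_full(counts: dict) -> bool:
    """True if either counter is zero, which would be consistent with
    ENOSPC (errno 28). A missing counter is treated as 'not full'."""
    return counts.get("blocks", 1) == 0 or counts.get("inodes", 1) == 0
```

One way to use it would be to capture the header of the unmounted target with `dumpe2fs -h`, pass the text to `parse_free_counts`, and look at whichever counter is at or near zero.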

Re: [lustre-discuss] Mistake while removing an OST

2023-02-02 Thread Stephane Thiell via lustre-discuss
lctl del_ost was added in the upcoming Lustre 2.16 to remove a specific OST while the filesystem is online. What it actually does is launch the llog_cancel commands for the specified OST, which reduces the risk of user error (like what happened here). We have used it several times in

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-04-19 Thread Stephane Thiell via lustre-discuss
Hi Andrew, The message "kernel: LustreError: 13921:0:(genops.c:478:class_register_device()) astrofs-OST-osc-MDT0001: already exists, won't add" is symptomatic of an llog index issue/mismatch on the MDT vs. MGT. I would check if the llog backup of MDT0001 (over ldiskfs in CONFIGS) matches the one on the

Re: [lustre-discuss] NRS TBF by UID and congestion

2021-10-15 Thread Stephane Thiell via lustre-discuss
Salut Diego! Yes, we have been using NRS TBF by UID on our Oak storage system for months now with Lustre 2.12. It’s a capacity-oriented, global filesystem, not designed for heavy workloads (unlike our scratch filesystem) but with many users and as such, a great candidate for NRS TBF UID. Since

Re: [lustre-discuss] how to enforce traffic to OSS on o2ib1 only ?

2021-09-28 Thread Stephane Thiell via lustre-discuss
Hi Riccardo, I would check if the OSTs on this OSS have been registered with the correct NIDs (o2ib1) on the MGS: $ lctl --device MGS llog_print -client and look for the NIDs in setup/add_conn for the OSTs in question. Best, Stephane > On Sep 28, 2021, at 9:52 AM, Riccardo Veraldi >

Re: [lustre-discuss] changelogs stop working

2020-12-09 Thread Stephane Thiell
Hi Thomas, Nodemap’s audit_mode does not default to 1 on upgrade to Lustre 2.12. If you recently hit this issue after upgrading a filesystem from 2.10 to Lustre 2.12, it may be worth checking that flag.

Re: [lustre-discuss] Robinhood scan time

2020-12-07 Thread Stephane Thiell
Hi Amit, Your number is very low indeed. At our site, we're seeing ~100 million files/day during a Robinhood scan with nb_threads_scan = 4 on hardware with Intel-based CPUs: 2020/11/16 07:29:46 [126653/2] STATS | avg. speed (effective): 1207.06 entries/sec (3.31 ms/entry/thread)
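
As a quick sanity check on the figures quoted above, the short sketch below converts the reported rate of 1207.06 entries/sec into entries per day and cross-checks it against the per-thread latency of 3.31 ms with four scan threads. The function names are hypothetical; only the numbers come from the message.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def entries_per_day(entries_per_sec: float) -> float:
    """Scale a sustained scan rate to a full day."""
    return entries_per_sec * SECONDS_PER_DAY


def aggregate_rate(ms_per_entry_per_thread: float, threads: int) -> float:
    """Entries/sec expected from `threads` workers, each needing the given
    number of milliseconds per entry."""
    return threads * 1000.0 / ms_per_entry_per_thread


if __name__ == "__main__":
    # Values taken from the STATS line quoted in the message.
    print(f"{entries_per_day(1207.06) / 1e6:.1f} million entries/day")
    print(f"{aggregate_rate(3.31, 4):.1f} entries/sec from 4 threads")
```

The first figure comes out at roughly 104 million entries per day, in line with the "~100 million files/day" quoted, and the second at roughly 1208 entries/sec, which agrees with the reported effective speed.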

Re: [lustre-discuss] routed LNET connection between server and virtualized client is failing

2020-08-13 Thread Stephane Thiell
Hi Uwe, We had a similar problem in the past and our conclusion was that the in-kernel OFED drivers (provided by the distribution) likely don’t have all the patches required for SR-IOV to work correctly in VMs (when using VFs on IB HBAs and KVM). Mellanox OFED is required at least on the VMs

Re: [lustre-discuss] 2.12.2 mds problems

2019-11-26 Thread Stephane Thiell
Hi Alastair, The first thing to do is to upgrade your servers to 2.12.3, as many bugs have been fixed. http://wiki.lustre.org/Lustre_2.12.3_Changelog Stephane > On Nov 20, 2019, at 7:29 AM, BASDEN, ALASTAIR G. > wrote: > > Hi, > > We have a new 2.12.2 system, and are seeing fairly

Re: [lustre-discuss] ksym errors on kmod-lustre RPM after 2.12.0 build against MOFED 4.5-1

2019-10-15 Thread Stephane Thiell
Hi Americo, In my experience, you need a proper kmod-mlnx-ofa_kernel RPM installed for the Lustre build process to find the correct symbols. To generate the kmod-mlnx-ofa_kernel RPM for the current kernel (in my case, Lustre patched, server-side), you can try: $ rpmbuild --rebuild --define

Re: [lustre-discuss] Lustre-2.10.6 frequently hangs during OST data migration

2019-03-30 Thread Stephane Thiell
Hi Tung-Han, Your stack trace looks similar to the one we saw just yesterday on our 2.10.6 system. I’ve opened https://jira.whamcloud.com/browse/LU-12136 to track the issue. Best, Stephane > On Mar 29, 2019, at 8:47 PM, Tung-Han Hsieh > wrote: > > Dear All, > > Our system was recently

Re: [lustre-discuss] Suspended jobs and rebooting lustre servers

2019-02-27 Thread Stephane Thiell
On one of our filesystems, we add a few new OSTs almost every month with no downtime, which is very convenient. The only thing that I would recommend is to avoid doing that during a peak of I/Os on your filesystem (we usually do it as early as possible in the morning), as the added OSTs will

Re: [lustre-discuss] Filesystem started crashing recently

2019-01-24 Thread Stephane Thiell
Hi Steve, This could be LU-5152 (https://jira.whamcloud.com/browse/LU-5152), which attempted to fix unprivileged chgrp -R. The patch introduced some kind of dependency between servers in the quota handling. It was reverted in 2.10.6; however, it’s not clear to me what the plan for

Re: [lustre-discuss] Lustre 2.10.3 on ZFS - slow read performance

2018-03-30 Thread Stephane Thiell
Hi Alex, I’m no ZFS expert, but for a new project I recently faced some read performance issues too when doing some zfs 0.7 testing, though not nearly as bad as yours… so I feel sorry for you, as you seem to have done quite good work so far… so perhaps some ideas: - check that arc is fully

[lustre-discuss] Lustre User Group 17 CFP extended to March 10th

2017-03-01 Thread Stephane Thiell
Share your experiences with colleagues. Submit an abstract to present at LUG17. Abstracts are now due March 10, 2017. The committee is particularly looking for presentations regarding: • experiences running the newer community releases - 2.8 and 2.9 - in production • experiences

Re: [lustre-discuss] [Iudev] GID only mapping in 2.8.60?

2016-11-28 Thread Stephane Thiell
ly > > > > On Mon, Nov 7, 2016 at 5:59 PM, Stephane Thiell <sthi...@stanford.edu> wrote: > > > On Nov 4, 2016, at 11:05 PM, Dilger, Andreas <andreas.dil...@intel.com> > > wrote: > > > > Actually, the nodemap feature will work with any client,

Re: [lustre-discuss] GID only mapping in 2.8.60?

2016-11-07 Thread Stephane Thiell
> On Nov 4, 2016, at 11:05 PM, Dilger, Andreas wrote: > > Actually, the nodemap feature will work with any client, since it is only > affecting lookups on the MDS and quota on the OSS. Great! :-) > > It probably would take less time for you to implement the flag

[lustre-discuss] GID only mapping in 2.8.60?

2016-11-04 Thread Stephane Thiell
Hi all, I am currently evaluating Lustre 2.8.60 with UID/GID mapping for a Lustre storage system that will be connected to multiple clusters. These clusters have matching UIDs, but the same groups have different GIDs. Actually GID mapping is working fine but I can’t find an easy way to avoid

Re: [lustre-discuss] Permanently remove osts: tunefs writeconf on zfs osts

2016-08-13 Thread Stephane Thiell
Hi Fernando, To remove your old and empty OSTs during a maintenance: stop your filesystem, do the writeconf on all targets and remount your MGS/MDS and then all OSTs minus the old ones. With shine, stop your filesystem, comment out the old OST lines in the fs model file and type “shine

Re: [lustre-discuss] Filesystem hanging....

2016-08-13 Thread Stephane Thiell
Hi Phil, I understand that you’re running master on your clients (tag v2_8_56 was created 4 days ago) and 2.1 on the servers? Running master in production is already a challenge. Also, Lustre has never been good at cross-version compatibility. For example, it is possible to make 2.1 servers work