Re: [lustre-discuss] What's your favorite distributed filesystem benchmark?

2021-06-28 Thread Andreas Dilger via lustre-discuss
NVidia seems to think so: https://wiki.lustre.org/images/3/30/LUG2021-Accelerating_AI_at_Scale-Bernauer-Kashinkunti.pdf Cheers, Andreas On Jun 28, 2021, at 17:43, Vinayak.Kamath wrote:  Thanks for the prompt response, Andreas. Is Lustre a good choice for a non-volatile “cache” of ML training

Re: [lustre-discuss] What's your favorite distributed filesystem benchmark?

2021-06-28 Thread Andreas Dilger via lustre-discuss
On Jun 28, 2021, at 16:58, Vinayak.Kamath via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Greetings, Our team is in the process of setting up an HPC system. We’re evaluating several distributed file systems (DFS), including Lustre. This is new territory for us and we’ve been

Re: [lustre-discuss] Unable to mount new OST

2021-07-05 Thread Andreas Dilger via lustre-discuss
On Jul 5, 2021, at 09:05, David Cohen <cda...@physics.technion.ac.il> wrote: Hi, I'm using Lustre 2.10.5 and lately tried to add a new OST. The OST was formatted with the command below, which other than the index is the exact same one used for all the other OSTs in the system.

Re: [lustre-discuss] MDT filling up

2021-06-28 Thread Andreas Dilger via lustre-discuss
On Jun 26, 2021, at 12:58, Thomas Roth <t.r...@gsi.de> wrote: we have one of three MDTs filling up its disk space. It is running 2.12.5 on ldiskfs, but no data-on-metadata. The inode usage is just 54%, corresponding to 477 M inodes. There is a large directory tree with 353 M files. For

Re: [lustre-discuss] good ways to identify clients causing problems?

2021-05-04 Thread Andreas Dilger via lustre-discuss
On May 4, 2021, at 12:41, Bill Anderson via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi All, Can you recommend good ways to identify Lustre client hosts that might be causing stability or performance problems for the entire filesystem? For example, if a user is

Re: [lustre-discuss] kernel 5.10 on RHEL8 and lustre-client kind of working

2021-05-05 Thread Andreas Dilger via lustre-discuss
Hi Andrej, could you please check if your change is included in one of the patches in https://jira.whamcloud.com/browse/LU-14195 "Support for linux kernel version 5.10"? If not, that would be the natural ticket to reference when submitting this patch. Note, however, that the same code also needs

Re: [lustre-discuss] Experience with DDN AI400X

2021-04-02 Thread Andreas Dilger via lustre-discuss
On Mar 30, 2021, at 11:54, Spitz, Cory James via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hello, Megan. I was curious why you made this comment: > A general example is a box with lustre-client 2.10.4 is not going to be > completely happy with a new 2.12.x on the lustre

Re: [lustre-discuss] Viewing project quotas

2021-02-25 Thread Andreas Dilger via lustre-discuss
On Feb 24, 2021, at 10:45, Peeples, Heath via lustre-discuss wrote: > > I am looking for a way to see all project quotas and their associated > directories for a file system. Is there an easy way to do that? Thanks > for the help. As yet, the "lfs project" command does not have support

Re: [lustre-discuss] OST -> MDT migration and MDT -> OST migration

2021-04-14 Thread Andreas Dilger via lustre-discuss
On Apr 14, 2021, at 18:42, Bill Anderson via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi All, I'm trying to figure out how to migrate files stored on an OST to an MDT (that's using DoM) and to migrate files stored on an MDT to an OST (e.g., if the MDT is getting

Re: [lustre-discuss] OST -> MDT migration and MDT -> OST migration

2021-04-15 Thread Andreas Dilger via lustre-discuss
On Apr 15, 2021, at 00:52, Åke Sandgren wrote: >> On 4/15/21 5:12 AM, Andreas Dilger via lustre-discuss wrote: >> On Apr 14, 2021, at 18:42, Bill Anderson via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: >>>

Re: [lustre-discuss] Full OST

2021-09-03 Thread Andreas Dilger via lustre-discuss
You can also check "mdt.*.exports.*.open_files" on the MDTs for a list of FIDs open on each client, and use "lfs fid2path" to resolve them to a pathname. On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi It could be a bug, but
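The two-step recipe above (dump the open FIDs per export, then resolve each with "lfs fid2path") can be sketched without a live filesystem by parsing sample lctl output; the NIDs and FIDs below are invented for illustration:

```shell
# Parse sample "lctl get_param mdt.*.exports.*.open_files" output (canned here,
# since no Lustre system is available) into "NID FID" pairs. On a real client
# each FID would then be resolved with: lfs fid2path <fsname> <FID>
cat > /tmp/open_files.txt <<'EOF'
mdt.lustre-MDT0000.exports.10.0.0.11@tcp.open_files=
[0x200000401:0x1:0x0]
[0x200000401:0x2:0x0]
mdt.lustre-MDT0000.exports.10.0.0.12@tcp.open_files=
[0x200000402:0x5:0x0]
EOF
awk '/open_files=/ { nid = $0
                     sub(/^mdt\.[^.]*\.exports\./, "", nid)
                     sub(/\.open_files=.*/, "", nid) }
     /^\[/        { print nid, $0 }' /tmp/open_files.txt
```

On a live MDS, replace the canned file with the real `lctl get_param mdt.*.exports.*.open_files` output.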

Re: [lustre-discuss] Full OST

2021-09-03 Thread Andreas Dilger via lustre-discuss
$ man lfs-fid2path.1 lfs-fid2path(1) user utilities lfs-fid2path(1) NAME lfs fid2path - print the pathname(s) for a file identifier SYNOPSIS lfs fid2path [OPTION]... ... DESCRIPTION lfs fid2path

Re: [lustre-discuss] lustre file system installation issues

2021-09-16 Thread Andreas Dilger via lustre-discuss
You are trying to build the 2.14.54 (development) branch against a very old kernel. You are much more likely to have success with the b2_12/2.12.7 release for this kernel. On Sep 13, 2021, at 11:06, Nagmat Nazarov <nag...@nevada.unr.edu> wrote: Dear Lustre file system community I am

Re: [lustre-discuss] Full OST

2021-09-07 Thread Andreas Dilger via lustre-discuss
On Sep 3, 2021, at 01:49, Alastair Basden <a.g.bas...@durham.ac.uk> wrote: Hi, We have a file system where each OST is a single SSD. One of those is reporting as 100% full (lfs df -h /snap8): snap8-OST004d_UUID 5.8T 2.0T 3.5T 37% /snap8[OST:77] snap8-OST004e_UUID

Re: [lustre-discuss] lru_size question

2021-09-14 Thread Andreas Dilger via lustre-discuss
On Sep 9, 2021, at 02:49, Thomas Roth <t.r...@gsi.de> wrote: Hi all, I have checked the lru_size on an (2.12.5) system that has just been restarted. The defaults have never been touched on that system, and so I see lru_size=0 for all OSTs, on the MDS as on a client, as it should be. The

Re: [lustre-discuss] Disabling multi-rail dynamic discovery

2021-09-14 Thread Andreas Dilger via lustre-discuss
On Sep 14, 2021, at 11:17, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Ah yes, I see what the lnet unit file is doing. OK, I think this is all straightened out and working great now. We have a fairly extensive init

Re: [lustre-discuss] Disabling multi-rail dynamic discovery

2021-09-14 Thread Andreas Dilger via lustre-discuss
On Sep 14, 2021, at 11:17, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss wrote: > > Ah yes, I see what the lnet unit file is doing. OK, I think this is all > straightened out and working great now. We have a fairly extensive init script > (the lustre3 script in

Re: [lustre-discuss] Full OST

2021-09-09 Thread Andreas Dilger via lustre-discuss
On Sep 8, 2021, at 04:42, Alastair Basden <a.g.bas...@durham.ac.uk> wrote: Next step would be to unmount OST004e, run a full e2fsck, and then check lost+found and/or a regular "find /mnt/ost -type f -size +1M" or similar to find where the files are. Thanks. e2fsck returns clean

Re: [lustre-discuss] Full OST

2021-09-04 Thread Andreas Dilger via lustre-discuss
You could run debugfs on that OST and use "ls -l" to examine the O/*/d* directories for large objects, then "stat" any suspicious objects within debugfs to dump the parent FID, and "lfs fid2path" on a client to determine the path. Alternately, see "lctl-lfsck-start.8" man page for options to

Re: [lustre-discuss] trimming flash-based external journal device

2021-08-05 Thread Andreas Dilger via lustre-discuss
On Aug 5, 2021, at 13:29, Nathan Dauchy - NOAA Affiliate <nathan.dau...@noaa.gov> wrote: Andreas, thanks as always for your insight. Comments inline... On Thu, Aug 5, 2021 at 10:48 AM Andreas Dilger <adil...@whamcloud.com> wrote: On Aug 5, 2021, at 09:28, Nathan Dauchy via

Re: [lustre-discuss] trimming flash-based external journal device

2021-08-05 Thread Andreas Dilger via lustre-discuss
On Aug 5, 2021, at 09:28, Nathan Dauchy - NOAA Affiliate via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Greetings ext4 and flash storage experts! Motivation: We have ldiskfs OSTs that are primarily HDDs and use a Flash device for an external journal device. Recent IOR

Re: [lustre-discuss] trimming flash-based external journal device

2021-08-05 Thread Andreas Dilger via lustre-discuss
On Aug 5, 2021, at 17:44, Nathan Dauchy - NOAA Affiliate via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: On Thu, Aug 5, 2021 at 3:23 PM Andreas Dilger <adil...@whamcloud.com> wrote: On Aug 5, 2021, at 13:29, Nathan Dauchy wrote: Andreas, thanks as always as your

Re: [lustre-discuss] lustre client kernel compatibility

2021-07-28 Thread Andreas Dilger via lustre-discuss
On Jul 28, 2021, at 17:07, Scott Wood via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi all, Section 8.1.1 of the current lustre documentation, "Software Requirements", states that "ver refers to the Linux distribution (e.g., 3.6.18-348.1.1.el5)." The client binaries

Re: [lustre-discuss] Fwd: RPCs in Flight are more than the max_rpcs_in_flight value

2021-10-07 Thread Andreas Dilger via lustre-discuss
On Oct 7, 2021, at 13:19, Md Hasanur Rashid via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hello Everyone, I am running the Filebench benchmark in my Lustre cluster. I set the max_rpcs_in_flight value to be 1. Before executing and after executing, I verified that the value

Re: [lustre-discuss] [EXTERNAL] Re: Fwd: RPCs in Flight are more than the max_rpcs_in_flight value

2021-10-17 Thread Andreas Dilger via lustre-discuss
It seems likely if you are using spinning disks (HDD) that using max_rpcs_in_flight=1 will reduce the seeking on the OSTs, especially with large RPC sizes (16MB). Cheers, Andreas On Oct 16, 2021, at 16:11, Md Hasanur Rashid wrote:  Hi Everyone, Thanks for getting back to me, Andreas and

Re: [lustre-discuss] dkms-2.8.6 breaks installation of lustre-zfs-dkms-2.12.7-1.el7.noarch

2021-10-16 Thread Andreas Dilger via lustre-discuss
Riccardo, It would be great if you could submit your patch to Gerrit. Cheers, Andreas > On Oct 13, 2021, at 17:06, Riccardo Veraldi > wrote: > > yes, same problem for me, I addressed this a few weeks ago and I think I > reported it to the mailing list. > > This is my patch to make things

Re: [lustre-discuss] No read throughput shown for the sequential read write Filebench workload

2021-10-16 Thread Andreas Dilger via lustre-discuss
I would guess that all of the reads are handled from the client page cache? Cheers, Andreas On Oct 13, 2021, at 06:37, Md Hasanur Rashid via lustre-discuss wrote:  Hello Everyone, I am running a Filebench workload which is provided below: define fileset

Re: [lustre-discuss] Question about max service threads

2021-09-21 Thread Andreas Dilger via lustre-discuss
I’ve always used ps, grep, and wc -l to answer that question :) From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org> Sent: Tuesday, Septembe

Re: [lustre-discuss] Question about max service threads

2021-09-21 Thread Andreas Dilger via lustre-discuss
Hello Houkun, There was patch https://review.whamcloud.com/34400 "LU-947 ptlrpc: allow stopping threads above threads_max" landed for the 2.13 release. You could apply this patch to your 2.12 release, or test with 2.14.0. Note that this patch only

Re: [lustre-discuss] Question about max service threads

2021-09-22 Thread Andreas Dilger via lustre-discuss
I don’t think Lustre exposes a stat which gives *current* count of worker threads. I’ve always used ps, grep, and wc -l to answer that question :) From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Andreas Dilger via lustre-discuss <lustre-discuss@list
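The ps/grep/wc recipe quoted above can be demonstrated against a canned process listing (the thread names below are illustrative, not taken from a live OSS):

```shell
# Count Lustre OSS I/O service threads the way described above. A canned
# command-name listing stands in for "ps -e -o comm=" on a real server.
cat > /tmp/comm.txt <<'EOF'
ll_ost00_000
ll_ost00_001
ll_ost_io00_000
ll_ost_io00_001
ll_ost_io00_002
kworker/0:1
EOF
grep -c '^ll_ost_io' /tmp/comm.txt    # -> 3
# On a live OSS the equivalent would be: ps -e -o comm= | grep -c '^ll_ost_io'
```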

Re: [lustre-discuss] Lustre and server upgrade

2021-11-19 Thread Andreas Dilger via lustre-discuss
Dean, it should be emphasized that "llmount.sh" and "llmountcleanup.sh" are for quickly formatting and mounting *TEST* filesystems. They only create a few small (400MB) loopback files in /tmp and format them as OSTs and MDTs. This should *NOT* be used on a production system, or you will be

Re: [lustre-discuss] OST "D" status - only 1 OSS mounting

2021-10-31 Thread Andreas Dilger via lustre-discuss
The "D" status means the OST is marked in "Degraded" mode, see the lfs-df(1) man page. The "lfs check osts" is only checking the client connection to the OSTs, but whether the MDS creates objects on those OSTs really depends on how the MDS is feeling about them. On Oct 31, 2021, at 19:28, Sid

Re: [lustre-discuss] SLUB: Unable to allocate memory on node -1

2021-10-29 Thread Andreas Dilger via lustre-discuss
On Oct 29, 2021, at 07:39, Julien Rey via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hello, This may not be related directly to Lustre, but here's what I get when I try to mount our Lustre filesystem on one of our compute nodes running CentOS 7: Oct 29 14:30:20 gpu-node8

Re: [lustre-discuss] Jobstats Support with Singularity Container

2021-12-14 Thread Andreas Dilger via lustre-discuss
The JobID is provided by the clients, the servers don't really care how it was generated. On Dec 14, 2021, at 03:24, Iannetti, Gabriele <g.ianne...@gsi.de> wrote: Hi again, is it possible to use the introduced per-session JobID feature in Lustre 2.13 when running the server with 2.12

Re: [lustre-discuss] Patched vs patchless server (again)

2021-12-10 Thread Andreas Dilger via lustre-discuss
On Dec 10, 2021, at 01:37, Steve Brasier <ste...@stackhpc.com> wrote: Hi all, I've seen the useful thread ending here: https://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg16212.html Assuming you are using ldiskfs without project quotas, is there any reason to use the

Re: [lustre-discuss] Jobstats Support with Singularity Container

2021-12-11 Thread Andreas Dilger via lustre-discuss
See the Lustre Operations Manual for options setting the JobID. You can set it using fields like "%u" for UID, or you can set it per process group, or for the whole node. For containers, you could set it for the process group when it starts and it should be inherited by all processes in the
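A minimal sketch of the JobID settings described above, with parameter names as given in the Lustre Operations Manual (verify them against your Lustre version; the session name is an invented example):

```shell
# Node-wide: derive the JobID from process name + UID (no scheduler needed)
lctl set_param jobid_var=procname_uid

# Or build it from format fields, e.g. "<executable>.<uid>"
lctl set_param jobid_name="%e.%u"

# Per-session (Lustre 2.13+): tag everything started from one shell/session,
# e.g. a container entrypoint, so all of its child processes inherit the JobID
lctl set_param jobid_var=session
lctl set_param jobid_this_session=my_container_job
```

These are configuration fragments, not a runnable script; they require root on a Lustre client.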

Re: [lustre-discuss] Hardware advice for homelab

2021-07-19 Thread Andreas Dilger via lustre-discuss
On Jul 19, 2021, at 04:51, Andrew Elwell via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi folks, Given my homelab testing for Lustre tends to be contained within VirtualBox on laptop ($work has a physical hardware test bed once mucking around gets serious), I'm considering

Re: [lustre-discuss] Stripe Size for small file and OST query

2022-03-11 Thread Andreas Dilger via lustre-discuss
Yes, if you set a default stripe_count=2 then all file access will need to check 2 objects. However, the PFL file layout avoids this problem, and allows setting different file layouts for different ranges of the file, which is very useful for small/medium/large files, hybrid flash/disk in the

Re: [lustre-discuss] [EXTERNAL] Re: Write Performance is Abnormal for max_dirty_mb Value of 2047

2022-03-29 Thread Andreas Dilger via lustre-discuss
The filesystem looks like a test system (only 200GB OSTs, the whole filesystem is smaller than a single HDD), so it doesn't make sense to be testing Lustre 2.9 at this point. At a very minimum it should be 2.12.8, but for a new test filesystem it makes sense to try out master (2.15.0-RC2)

Re: [lustre-discuss] Problem with standard fortran IO and lustre fs, sporadically slow io rate

2022-02-03 Thread Andreas Dilger via lustre-discuss
On Feb 3, 2022, at 05:13, Bertini, Denis Dr. <d.bert...@gsi.de> wrote: I am facing a problem when trying to write on a Lustre filesystem (v 2.12) using standard fortran IO code (compiled with GNU gfortran 8.3). The io rate sporadically goes extremely slow (sometimes down to 20/10

Re: [lustre-discuss] question regarding du vs df on lustre

2022-04-11 Thread Andreas Dilger via lustre-discuss
Lustre is returning the file unlink from the MDS immediately, but deleting the objects from the OSTs asynchronously in the background. How many files are being deleted in this case? If you are running tests like IO500, where there are many millions of small files plus some huge files, then it

Re: [lustre-discuss] Target index choice

2022-04-08 Thread Andreas Dilger via lustre-discuss
On Apr 8, 2022, at 01:50, Hans Henrik Happe via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi, Is or will there be a downside to choosing discontinuous index numbers? I.e. encode OSS number YY and target number XX like 0xYYXX. I guess it could hurt if layouts are packed to

Re: [lustre-discuss] Essential tools for Lustre

2022-04-15 Thread Andreas Dilger via lustre-discuss
Note that in newer Lustre releases, if you have project IDs enabled (you don't need to *enforce* project quotas, just have quota accounting enabled), that "df" (statfs()) will return the quota for the project ID on that directory tree. It isn't _quite_ "fast du" for arbitrary directory trees,

Re: [lustre-discuss] Cleanup of RPC-related Statistics at "import" Parameter on OSC Device

2022-04-13 Thread Andreas Dilger via lustre-discuss
On Apr 12, 2022, at 02:14, Hasan Rashid via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hello Everyone, Is it possible to reset the rpc-related statistics (i.e. avg_waittime, read_data_averages, write_data_averages etc.) in osc.*.import without remounting the client? If

Re: [lustre-discuss] File size discrepancy on lustre

2023-09-15 Thread Andreas Dilger via lustre-discuss
Are you using any file mirroring (FLR, "lfs mirror extend") on the files, perhaps before the "lfs getstripe" was run? On Sep 15, 2023, at 08:12, Kurt Strosahl via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Good Morning, We have encountered a very odd issue. Where

Re: [lustre-discuss] Getting started with Lustre on RHEL 8.8

2023-09-12 Thread Andreas Dilger via lustre-discuss
Hello, The preferred path to set up Lustre depends on what you are planning to do with it? If for regular usage it is easiest to start with RPMs built for the distro from

Re: [lustre-discuss] Getting started with Lustre on RHEL 8.8

2023-09-12 Thread Andreas Dilger via lustre-discuss
On Sep 12, 2023, at 22:31, Cyberxstudio cxs <cyberxstudio.cl...@gmail.com> wrote: Hi I get this error while installing lustre and other packages [root@localhost ~]# yum --nogpgcheck --enablerepo=lustre-server install \ > kmod-lustre-osd-ldiskfs \ > lustre-dkms \ >

Re: [lustre-discuss] questions about group locks / LDLM_FL_NO_TIMEOUT flag

2023-08-30 Thread Andreas Dilger via lustre-discuss
You can't directly dump the holders of a particular lock, but it is possible to dump the list of FIDs that each client has open. mds# lctl get_param mdt.*.exports.*.open_files | egrep "=|FID" | grep -B1 FID That should list all client NIDs that have FID open. It shouldn't be possible for

Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Andreas Dilger via lustre-discuss
On Oct 13, 2023, at 20:58, Fedele Stabile <fedele.stab...@fis.unical.it> wrote: Hello everyone, We are in the process of integrating Lustre on our little HPC Cluster and we would like to know if it is possible to use the same node in a cluster to act as an OSS with disks and to also use it

Re: [lustre-discuss] Lustre-Manual on lfsck - non-existing entries?

2023-10-31 Thread Andreas Dilger via lustre-discuss
On Oct 31, 2023, at 13:12, Thomas Roth via lustre-discuss wrote: > > Hi all, > > after starting an `lctl lfsck_start -A -C -o` and the oi_scrub having > completed, I would check the layout scan as described in the Lustre manual, > "36.4.3.3. LFSCK status of layout via procfs", by

[lustre-discuss] Possible change to "lfs find -size" default units?

2023-11-05 Thread Andreas Dilger via lustre-discuss
I've recently realized that "lfs find -size N" defaults to looking for files of N *bytes* by default, unlike regular find(1) that is assuming 512-byte blocks by default if no units are given. I'm wondering if it would be disruptive to users if the default unit for -size was changed to 512-byte
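The difference in defaults described above is easy to demonstrate with regular GNU find(1) (coreutils `truncate` assumed available):

```shell
# With no suffix, find(1)'s -size counts 512-byte blocks; a "c" suffix means
# bytes. A 1 MiB file is exactly 2048 blocks.
dir=$(mktemp -d)
truncate -s 1048576 "$dir/f"          # 1 MiB (sparse; find tests st_size)
find "$dir" -type f -size 2048        # matches: 1048576 / 512 = 2048 blocks
find "$dir" -type f -size +1048575c   # matches: larger than 1048575 bytes
find "$dir" -type f -size +2048       # no match: not *more* than 2048 blocks
rm -r "$dir"
```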

Re: [lustre-discuss] OST is not mounting

2023-11-07 Thread Andreas Dilger via lustre-discuss
The OST went read-only because that is what happens when the block device disappears underneath it. That is a behavior of ext4 and other local filesystems as well. If you look in the console logs you would see SCSI errors and the filesystem being remounted read-only. To have reliability in

Re: [lustre-discuss] very slow mounts with OSS node down and peer discovery enabled

2023-10-26 Thread Andreas Dilger via lustre-discuss
I can't comment on the LNet peer discovery part, but I would definitely not recommend leaving the lnet_transaction_timeout that low for normal usage. This can cause messages to be dropped while the server is processing them and introduce failures needlessly. Cheers, Andreas > On Oct 26,

Re: [lustre-discuss] Data recovery with lost MDT data

2023-09-21 Thread Andreas Dilger via lustre-discuss
In the absence of backups, you could try LFSCK to link all of the orphan OST objects into .lustre/lost+found (see lctl-lfsck_start.8 man page for details). The data is still in the objects, and they should have UID/GID/PRJID assigned (if used) but they have no filenames. It would be up to you

Re: [lustre-discuss] [EXTERNAL] Re: Data recovery with lost MDT data

2023-09-22 Thread Andreas Dilger via lustre-discuss
On Sep 21, 2023, at 16:06, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] <darby.vicke...@nasa.gov> wrote: I knew an lfsck would identify the orphaned objects. That’s great that it will move those objects to an area we can triage. With ownership still intact (and I assume time

Re: [lustre-discuss] [EXTERNAL EMAIL] Re: Lustre 2.15.3: patching the kernel fails

2023-09-22 Thread Andreas Dilger via lustre-discuss
On Sep 22, 2023, at 01:45, Jan Andersen <j...@comind.io> wrote: Hi Andreas, Thank you for your insightful reply. I didn't know Rocky; I see there's a version 9 as well - is ver 8 better, since it is more mature? There is an el9.2 ldiskfs series that would likely also apply to the

Re: [lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

2023-09-28 Thread Andreas Dilger via lustre-discuss
On Sep 26, 2023, at 13:44, Audet, Martin via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hello all, I would appreciate if the community would give more attention to this issue because upgrading from 2.12.x to 2.15.x, two LTS versions, is something that we can expect many

Re: [lustre-discuss] Adding lustre clients into the Debian

2023-10-01 Thread Andreas Dilger via lustre-discuss
On Oct 1, 2023, at 05:54, Arman Khalatyan via lustre-discuss wrote: > > Hello everyone, > > We are in the process of integrating the Lustre client into Debian. Are there > any legal concerns or significant obstacles to this? We're curious why it > hasn't been included in the official Debian

Re: [lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

2023-10-01 Thread Andreas Dilger via lustre-discuss
On Oct 1, 2023, at 00:36, Tung-Han Hsieh via lustre-discuss wrote: > I should apologize for replying late. Here I would like to clarify why in my > opinion the Lustre ldiskfs code is not self-contained. > > In the past, to compile lustre with ldiskfs, we needed to patch Linux kernel > using

Re: [lustre-discuss] Adding lustre clients into the Debian

2023-10-01 Thread Andreas Dilger via lustre-discuss
On Oct 1, 2023, at 05:54, Arman Khalatyan via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hello everyone, We are in the process of integrating the Lustre client into Debian. Are there any legal concerns or significant obstacles to this? We're curious why it hasn't been

Re: [lustre-discuss] No port 988?

2023-09-26 Thread Andreas Dilger via lustre-discuss
On Sep 26, 2023, at 06:12, Jan Andersen <j...@comind.io> wrote: Hi, I've built and installed lustre on two VirtualBoxes running Rocky 8.8 and formatted one as the MGS/MDS and the other as OSS, following a presentation from Oak Ridge National Laboratory: "Creating a Lustre Test System

Re: [lustre-discuss] [BULK] Re: [EXTERNAL] Re: Data recovery with lost MDT data

2023-09-25 Thread Andreas Dilger via lustre-discuss
Probably using "stat" on each file is slow, since this is getting the file size from each OST object. You could try the "xstat" utility in the lustre-tests RPM (or build it directly) as it will only query the MDS for the requested attributes (owner at minimum). Then you could split into
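The split-and-parallelize idea above can be sketched with plain GNU stat in place of Lustre's xstat utility, on a scratch directory rather than a damaged tree:

```shell
# Query only ownership for many files in parallel, instead of one serial
# "stat" per file. GNU stat stands in for the Lustre "xstat" utility here.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
find "$dir" -type f -print0 |
    xargs -0 -P 4 -n 64 stat -c '%u %g %n' > /tmp/owners.txt
wc -l < /tmp/owners.txt               # one "uid gid path" line per file
rm -r "$dir"
```

Raising `-n` (files per stat invocation) and `-P` (parallel workers) amortizes the per-process cost over many files.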

Re: [lustre-discuss] backup restore docs not quite accurate?

2023-10-18 Thread Andreas Dilger via lustre-discuss
Removing the OI files is for ldiskfs backup/restore (e.g. after tar/untar) when the inode numbers are changed. That is not needed for ZFS send/recv because the inode numbers stay the same after such an operation. If that isn't clear in the manual it should be fixed. Cheers, Andreas > On Oct

Re: [lustre-discuss] mount not possible: "no server support"

2023-10-19 Thread Andreas Dilger via lustre-discuss
On Oct 19, 2023, at 19:58, Benedikt Alexander Braunger via lustre-discuss <lustre-discuss@lists.Lustre.org> wrote: Hi Lustrers, I'm currently struggling with an unmountable Lustre filesystem. The client only says "no server support", no further logs on client or server. I first thought

Re: [lustre-discuss] Ongoing issues with quota

2023-10-10 Thread Andreas Dilger via lustre-discuss
There is a $ROOT/.lustre/lost+found that you could check. What does "lfs df -i" report for the used inode count? Maybe it is RBH that is reporting the wrong count? The other alternative would be to mount the MDT filesystem directly as type ZFS and see what df -i and find report? Cheers,

Re: [lustre-discuss] Ongoing issues with quota

2023-10-09 Thread Andreas Dilger via lustre-discuss
The quota accounting is controlled by the backing filesystem of the OSTs and MDTs. For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block usage. For ZFS you would have to ask on the ZFS list to see if there is some way to re-count the quota usage. The "inode" quota is

Re: [lustre-discuss] setting quotas from within a container

2023-10-21 Thread Andreas Dilger via lustre-discuss
Hi Lisa, The first question to ask is which Lustre version you are using? Second, are you using subdirectory mounts or other UID/GID mapping for the container? That could happen at both the Lustre level or by the kernel itself. If you aren't sure, you could try creating a new file as root

Re: [lustre-discuss] OST went back in time: no(?) hardware issue

2023-10-04 Thread Andreas Dilger via lustre-discuss
On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss wrote: > > Hi all, > > in our Lustre 2.12.5 system, we have "OST went back in time" after OST > hardware replacement: > - hardware had reached EOL > - we set `max_create_count=0` for these OSTs, searched for and migrated off > the

Re: [lustre-discuss] Failing build of lustre client on Debian 12

2023-10-04 Thread Andreas Dilger via lustre-discuss
On Oct 4, 2023, at 16:26, Jan Andersen <j...@comind.io> wrote: Hi, I've just successfully built the lustre 2.15.3 client on Debian 11 and need to do the same on Debian 12; however, configure fails with: checking if Linux kernel was built with CONFIG_FHANDLE in or as module... no

Re: [lustre-discuss] re-registration of MDTs and OSTs

2023-10-24 Thread Andreas Dilger via lustre-discuss
On Oct 18, 2023, at 13:04, Peter Grandi via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: So I have been upgrading my one and only MDT to a larger ZFS pool, by the classic route of creating a new pool, new MDT, and then 'zfs send'/'zfs receive' for the copy over (BTW for those

Re: [lustre-discuss] question about rename operation ?

2023-08-16 Thread Andreas Dilger via lustre-discuss
For any directory rename where it is not just a simple name change (i.e. the parent directory is not the same for both source and target), the MDS thread doing the rename will take the LDLM "big filesystem lock" (BFL), which is a specific FID for global rename serialization. This ensures that there is

Re: [lustre-discuss] Essential tools for Lustre

2022-04-22 Thread Andreas Dilger via lustre-discuss
:40, Raj wrote:  Andreas, Are there any IO penalties in enabling project quota? Will I see the same throughput from the FS? Thanks -Raj On Fri, Apr 15, 2022 at 1:32 PM Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Note that in newer Lustre releases, if yo

Re: [lustre-discuss] Poor(?) Lustre performance

2022-04-20 Thread Andreas Dilger via lustre-discuss
On Apr 16, 2022, at 22:51, Finn Rawles Malliagh via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi all, I have just set up a three-node Lustre configuration, and initial testing shows what I think are slow results. The current configuration is 2 OSS, 1 MDS-MGS; each OSS/MGS

Re: [lustre-discuss] Poor(?) Lustre performance

2022-04-20 Thread Andreas Dilger via lustre-discuss
Finn, I can't really say for sure where the performance limitation in your system is coming from. You'd have to re-run the tests against the local ldiskfs filesystem to see how the performance compares to with that of Lustre. The important part of benchmark testing is to systematically build

Re: [lustre-discuss] FLR Mirroring for read performance

2022-05-19 Thread Andreas Dilger via lustre-discuss
On May 11, 2022, at 08:25, Nathan Dauchy wrote: > > Greetings! Hello Nathan, > During the helpful LUG tutorial from Rick Mohr on advanced lustre file > layouts, it was mentioned that “lfs mirror” could be used to improve read > performance. And the manual supports this, stating “files that

Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-20 Thread Andreas Dilger via lustre-discuss
wildcard "*" if all of the OSTs/MDTs are flash based. If you have a hybrid NVMe/HDD system, you can explicitly select a subset of OST/MDT devices to disable the caches. Cheers, Andreas On May 20, 2022, at 02:49, Åke Sandgren <ake.sandg...@hpc2n.umu.se> wrote: On 5/20/22

Re: [lustre-discuss] lustre_rsync with growing statuslog

2022-05-20 Thread Andreas Dilger via lustre-discuss
On May 20, 2022, at 06:33, Robert Redl <robert.r...@lmu.de> wrote: Dear Lustre Experts, for a few weeks we have been keeping two Lustre systems synchronized using lustre_rsync. That works fine, but the statuslog file is growing. It is currently about 500MB in size. Updating it is apparently

Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

2022-06-24 Thread Andreas Dilger via lustre-discuss
The mailing list is not a great place to file bug reports. Jira would be better, and including a patch would be best. :-) Cheers, Andreas > On Jun 24, 2022, at 08:24, Thomas Roth via lustre-discuss > wrote: > > Since it seems I have now managed to create the modules, I'd like to record >

Re: [lustre-discuss] Max Single OSS throughput is not crossing 7 GB/s Reads

2022-06-28 Thread Andreas Dilger via lustre-discuss
On Jun 28, 2022, at 21:51, Karan Singh via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: Hi team Below are the details : Using 40 lustre docker clients running on 4 x Dell R750 with each lustre docker client running the below mentioned command ${xfersize} = ${blksize}

Re: [lustre-discuss] How to speed up Lustre

2022-07-06 Thread Andreas Dilger via lustre-discuss
Thomas, where the file data is stored depends entirely on the PFL layout used for the filesystem or parent directory. For DoM files, you need to specify a DoM component, like: lfs setstripe -E 64K -L mdt -E 1G -c 1 -E 16G -c 4 -E eof -c 32 so the first 64KB will be put onto the MDT where

Re: [lustre-discuss] How to speed up Lustre

2022-07-06 Thread Andreas Dilger via lustre-discuss
On Jul 6, 2022, at 14:50, Thomas Roth wrote: Yes, I got it. But Marion states that they switched > to a PFL arrangement, where the first 64k lives on flash OST's (mounted on > our metadata servers), and the remainder of larger files lives on HDD OST's. So, how do you

Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

2022-06-22 Thread Andreas Dilger via lustre-discuss
On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss wrote: my rhel8 system is actually an Alma Linux 8.5 installation, this is the first time the compatibility to an alleged rhel8.5 software fails... The system is running kernel

Re: [lustre-discuss] Help with recovery of data

2022-06-22 Thread Andreas Dilger via lustre-discuss
First thing, if you haven't already done so, would be to make a separate "dd" backup of the ldiskfs MDT(s) to some external storage before you do anything else. That will give you a fallback in case whatever changes you make don't work out well. I would also suggest to contact the ZFS mailing
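A device-level backup of the kind suggested above might look like the following; the device and destination paths are assumptions, not from the original message.

```shell
# Illustrative sketch: raw "dd" backup of an ldiskfs MDT before attempting
# any repair. /dev/sdX and /backup are placeholders; adjust to your setup.
# The MDT must be unmounted (or an LVM/ZFS snapshot used) so the image is
# consistent.
umount /mnt/mdt0
dd if=/dev/sdX of=/backup/mdt0.img bs=4M status=progress conv=sparse
```

The `conv=sparse` option avoids writing runs of zeros to the image, which keeps the backup smaller on filesystems that support sparse files.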

Re: [lustre-discuss] need info regarding TCP ports for lustre

2022-06-13 Thread Andreas Dilger via lustre-discuss
On Jun 13, 2022, at 08:27, Sharma, Amit via lustre-discuss wrote: Hi Team, can you please help me with which ephemeral TCP ports need to be opened for Lustre inter-server communication. Regards, Amit There are several answers here: - the LNet socklnd
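By default the LNet socket LND accepts connections on TCP port 988; a firewalld rule for it might look like the sketch below. This is an assumption-laden example (firewalld syntax, default acceptor port), not part of the original reply.

```shell
# Sketch: allow LNet's default TCP "acceptor" port (988) between Lustre
# nodes. Verify your actual port with: lnetctl global show (or the
# "accept_port" module option of the socklnd).
firewall-cmd --permanent --add-port=988/tcp
firewall-cmd --reload
```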

Re: [lustre-discuss] llapi documentation

2022-06-16 Thread Andreas Dilger via lustre-discuss
On Jun 15, 2022, at 06:56, John Bauer wrote: Andreas, Thanks for the info. It got me a lot farther down the road. A few comments: 1) It appears that the values returned in the poollist argument to llapi_get_poollist() are temporary. I used the values in

Re: [lustre-discuss] need info regarding TCP ports for lustre

2022-06-14 Thread Andreas Dilger via lustre-discuss
same fabric/subnet. The Lustre-level MDS_CONNECT and OST_CONNECT are at a higher level and use "portal" numbers, which are totally different. Cheers, Andreas > On Jun 13, 2022, at 23:27, Åke Sandgren wrote: > >  > >> On 6/14/22 00:51, Andreas Dilger via lustre-di

Re: [lustre-discuss] llapi documentation

2022-06-15 Thread Andreas Dilger via lustre-discuss
On Jun 14, 2022, at 05:32, John Bauer wrote: I have had little success in my search for documentation on pool functions in llapi. I've looked in: https://wiki.lustre.org/PFL2_High_Level_Design https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace

Re: [lustre-discuss] Misplaced position for two glibc checks

2022-06-08 Thread Andreas Dilger via lustre-discuss
On Jun 2, 2022, at 07:14, Åke Sandgren wrote: > > Hi! > > The tests for LC_GLIBC_SUPPORT_FHANDLES and LC_GLIBC_SUPPORT_COPY_FILE_RANGE > must be in the "core" set of configure tests, i.e. in the > === > AC_DEFUN([LC_CONFIGURE], [ > AC_MSG_NOTICE([Lustre core checks > === > section. The reason

Re: [lustre-discuss] Removing stale files

2022-06-08 Thread Andreas Dilger via lustre-discuss
On May 31, 2022, at 13:01, William D. Colburn wrote: We had a filesystem corruption back in February, and we've been trying to salvage things since then. I've spent the past month slowly draining the corrupt OST, and over the weekend it finally finished. An lfs find

Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-20 Thread Andreas Dilger via lustre-discuss
To elaborate a bit on Patrick's answer, there is no mechanism to do this on the *client*, because the performance difference between client RAM and server storage is still fairly significant, especially if the application is doing sub-page read or write operations. However, on the *server* the
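On the server side this is typically done with `lctl set_param`; the sketch below assumes ldiskfs OSDs and the usual cache-tunable names, which should be verified on the running system.

```shell
# Hedged sketch of the server-side approach described above. Parameter
# names assume ldiskfs OSDs; confirm what exists on your OSS with:
#   lctl get_param -N 'osd-*.*.*cache*'

# All-flash system: disable the read/writethrough caches on every OST.
lctl set_param osd-ldiskfs.*.read_cache_enable=0
lctl set_param osd-ldiskfs.*.writethrough_cache_enable=0

# Hybrid NVMe/HDD system: name only the flash-backed OSTs instead of the
# wildcard (the fsname/index below are illustrative).
lctl set_param osd-ldiskfs.testfs-OST0000.read_cache_enable=0
```

Use `lctl set_param -P` on the MGS if the setting should persist across server restarts.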

Re: [lustre-discuss] llapi_layout_file_comp_del

2022-07-28 Thread Andreas Dilger via lustre-discuss
John, you are probably right that allowing a passed fd would also be useful, but nobody has done it this way before because of the need to use O_LOV_DELAY_CREATE within the application code... and to be honest very few applications tune their IO to this extent, especially with PFL layouts

Re: [lustre-discuss] llapi_layout_file_comp_del

2022-07-27 Thread Andreas Dilger via lustre-discuss
The HLD document was written before the feature was implemented, and is outdated. The lustreapi.h and llapi_layout_file_comp_del.3 man page are correct. Feel free to update the wiki to use the correct argument list. I believe that it is possible to delete multiple components that match the

Re: [lustre-discuss] A project quota question

2022-07-27 Thread Andreas Dilger via lustre-discuss
On Jul 13, 2022, at 11:10, Grigory Shamov wrote: Hi All, I am trying to arrange quotas for a University's cluster, on our new ES400 running DDN's Lustre 2.12. Somehow, there is a desire to have nested quotas: like, a Faculty quota, then Professors' research

Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-06 Thread Andreas Dilger via lustre-discuss
On May 5, 2022, at 07:16, Andrew Elwell via lustre-discuss wrote: I've got a case open with the vendor to see if there are any firmware updates - but I'm not hopeful. These are 6-core single-socket Broadwells with 128G of RAM. Storage disks are mounted

Re: [lustre-discuss] Anyone know why lustre-zfs-dkms-2.12.8_6_g5457c37-1.el7.noarch.rpm won't install?

2022-05-06 Thread Andreas Dilger via lustre-discuss
Riccardo, please file a Jira ticket and submit your change as a patch to Gerrit (review.whamcloud.com), so that this fix is not lost. On May 5, 2022, at 08:20, Riccardo Veraldi wrote: If you look in the mentioned thread I

Re: [lustre-discuss] lproc stats changed snapshot_time from unix-epoch to uptime/monotonic in 2.15

2022-08-24 Thread Andreas Dilger via lustre-discuss
Ellis, thanks for reporting this. This looks like it was a mistake. The timestamps should definitely be in wallclock time, but this looks to have been changed unintentionally to reduce overhead, and use a u64 instead of dealing with timespec64 math, while losing the original intent (there are

Re: [lustre-discuss] fiemap, final chapter.

2022-08-19 Thread Andreas Dilger via lustre-discuss
On Aug 19, 2022, at 13:58, John Bauer wrote: Andreas, As I mentioned in an earlier email, this had been working for a long time. I think that using an old header file is at the root of the issue. On my development platform, which doesn't have Lustre installed,

Re: [lustre-discuss] fiemap

2022-08-18 Thread Andreas Dilger via lustre-discuss
What version of Lustre are you using? Does "filefrag -v" from a newer Lustre e2fsprogs (1.45.6.wc3+) work properly? There was a small change to the Lustre FIEMAP handling in order to handle overstriped files and PFL/FLR files with many stripes and multiple components, since the FIEMAP
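Checking which `filefrag` is in use, and whether it understands Lustre's FIEMAP, can be done along these lines (the file path is illustrative):

```shell
# Sketch: confirm the Lustre-aware filefrag (from Lustre e2fsprogs
# 1.45.6.wc3 or newer) is the one being run, then dump the file's extents.
which filefrag
filefrag -V                      # version string shows the e2fsprogs build
filefrag -v /mnt/lustre/somefile # verbose per-extent FIEMAP output
```

A stock-distro `filefrag` may mis-handle striped/PFL files, so the version check matters before trusting the extent output.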

Re: [lustre-discuss] fiemap

2022-08-18 Thread Andreas Dilger via lustre-discuss
On Aug 18, 2022, at 14:28, John Bauer wrote: Andreas, Thanks for the reply. I don't think I'm accessing the Lustre filefrag ( see below ). Where would I normally find that installed? I downloaded the lustre-release git repository and can't find filefrag stuff

Re: [lustre-discuss] fio and lustre performance

2022-08-25 Thread Andreas Dilger via lustre-discuss
No comment on the actual performance issue, but we normally test fio using the libaio interface (which is handled in the kernel) instead of posixaio (which is handled by threads in userspace, AFAIK), and also use DirectIO to avoid memory copies (OK if there are enough IO requests in flight).
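A fio invocation along the lines described (libaio plus O_DIRECT, with several I/Os in flight) might look like the following sketch; every value here is illustrative, not taken from the original thread.

```shell
# Hedged example: sequential write test with the in-kernel libaio engine
# and DirectIO, keeping 32 I/Os outstanding per job across 4 jobs.
# Directory, block size, and sizes are all placeholders to tune per system.
fio --name=lustre-seq-write --directory=/mnt/lustre/fio \
    --rw=write --bs=1M --size=16G \
    --ioengine=libaio --direct=1 --iodepth=32 \
    --numjobs=4 --group_reporting
```

With `direct=1` the page cache is bypassed, so the reported bandwidth reflects the network and OST storage rather than client RAM.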
