Re: [lustre-discuss] Removing stale files

2022-06-08 Thread Andreas Dilger via lustre-discuss
On May 31, 2022, at 13:01, William D. Colburn <wcolb...@nrao.edu> wrote:

We had a filesystem corruption back in February, and we've been trying
to salvage things since then.  I've spent the past month slowly draining
the corrupt OST, and over the weekend it finally finished.  An lfs find
on the filesystem says that there are no files stored on that OST.  The
OST is 100% full, and if I mount it as ldiskfs I can see a little
over five million files in O/*/*.

How did you drain the OST?  Was the OST totally deactivated, or
"max_create_count=0"?  If it was deactivated, then this will prevent OST
objects from being destroyed when the MDT inode is deleted.
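
For reference, the two approaches are set on the MDS roughly like this (a
sketch; the fsname, OST index, and MDT glob are placeholders):

  # stop new object creation on the OST, but still let unlinks destroy objects:
  lctl set_param osp.<fsname>-OST<index>-osc-MDT*.max_create_count=0
  # fully deactivate the OST (this also blocks object destroys):
  lctl set_param osp.<fsname>-OST<index>-osc-MDT*.active=0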

Most of them have numbers as names, and some of them are named LAST_ID.

This is normal.  The number is the object ID.  You can check the OST objects
with ll_decode_filter_fid (when mounted as type ldiskfs) to report the parent
MDT FID that the object belongs/belonged to.  Then "lfs fid2path" can be used
to check if the file still exists and/or if the OST object is still part of
the layout (which it should not be).
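
A minimal sketch of that check, assuming the OST is mounted as ldiskfs on
/mnt/ost and the filesystem is mounted on a client at /mnt/lustre (both paths
and the object name 1234 are just examples):

  # on the OSS, print the parent MDT FID recorded in an OST object:
  ll_decode_filter_fid /mnt/ost/O/0/d*/1234
  # on a client, see whether that parent FID still resolves to a path:
  lfs fid2path /mnt/lustre <parent FID printed above>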

All of the numbered files seem to be user data, with owners, and real data in 
them
(based on ls and the find command)

I would like to clean out this OST and readd it to lustre, but I'm
unsure of how to best approach this.  I see several options:

OPTION ONE: run lfsck against the entire filesystem with the full and
previously corrupt OST mounted.

This is unnecessary, since these objects are not used.   With the "--orphan"
option it will link the OST objects into $mount/.lustre/lost+found where they
could be deleted, essentially the same as #4 below, but it is easier to just
delete the objects directly if you know they are not needed.
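
For completeness, that full-filesystem layout LFSCK with orphan handling would
be started from the MDS roughly as follows (a sketch only, since as noted it
is unnecessary here; the MDT name is a placeholder):

  lctl lfsck_start -M <fsname>-MDT0000 -t layout -A -o
  # check progress/status:
  lctl get_param -n mdd.*.lfsck_layout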

OPTION TWO: run lfsck against only the corrupt OST in the hopes that it
cleans up all of the orphans on that OST.

This won't help, since the OST LFSCK would not attach the object into the
visible namespace, so it won't really change the situation.

OPTION THREE: mounted as ldiskfs, remove O/*/[1234567890]*[1234567890]
and then remount the file system.

This would be one option.  Note that with DNE the OST object names will be
in hex, so the above pattern would not catch all objects.
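
A hex-aware alternative to that glob, as a sketch (assuming the OST is mounted
as ldiskfs at /mnt/ost; always keep the LAST_ID files):

  # list every regular file under O/ except LAST_ID; these are the OST objects
  find /mnt/ost/O -type f ! -name LAST_ID | less
  # once the list looks right, delete them, e.g.:
  find /mnt/ost/O -type f ! -name LAST_ID -delete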

OPTION FOUR: newfs the bad OST and readd it losing the old index.

This would also work, if you use the "--replace" option when formatting.
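
A sketch of that reformat (all values are placeholders; the index must match
the OST's original index, and any other options the OST was originally
formatted with should be repeated):

  mkfs.lustre --ost --reformat --replace --index=<old index> \
      --fsname=<fsname> --mgsnode=<mgs NID> /dev/<ost device>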

We tried option one once before, and it killed cluster jobs because it
made files unreadable while they were in use.  Option two might avoid
that since it would not be affecting existing files.  Option three
sounds like it will work based on my limited knowledge of how lustre
works, and would probably be the most expedient method.  Option four is
annoying because it leaves a hole in the lustre that is upsetting to our
OCD tendencies.

Any and all advice is appreciated here.  Thank you.

--Schlake
 Sysadmin IV, NRAO
 Work: 575-835-7281 (BACK IN THE OFFICE!)
 Cell: 575-517-5668 (out of work hours)

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Misplaced position for two glibc checks

2022-06-08 Thread Andreas Dilger via lustre-discuss
On Jun 2, 2022, at 07:14, Åke Sandgren  wrote:
> 
> Hi!
> 
> The tests for LC_GLIBC_SUPPORT_FHANDLES and LC_GLIBC_SUPPORT_COPY_FILE_RANGE
> must be in the "core" set of configure tests, i.e. in the
> ===
> AC_DEFUN([LC_CONFIGURE], [
> AC_MSG_NOTICE([Lustre core checks
> ===
> section.  The reason is that they are required for the client/server utils
> code, not only for the kernel part.
> 
> This pops up if configuring with --disable-modules --enable-client and making
> the client utilities only; think of a "make dkms-debs" that does NOT produce
> the kernel modules but only the DKMS package and the utilities.
> 
> This probably won't pop up easily without another change I have regarding the
> setting of CPPFLAGS for uapi, which are also needed for client-utils-only
> builds.
> 
> 
> PS
> Can you point me to the URL for how to correctly produce PR's again, I lost 
> that info some time ago  I seem to remember there being some more steps 
> to do than I'm used to.

Hi Åke,
you can create a Jira ticket for this issue at 
https://jira.whamcloud.com/secure/CreateIssue!default.jspa

(or just click the "Create" button at the top of https://jira.whamcloud.com/).

Details of how to push patches to Gerrit for review and testing are at:
https://wiki.lustre.org/Using_Gerrit 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] intermittently can't start ll_agl thread and can't start ll_sa thread, and sysctl kernel.pid_max

2022-06-08 Thread Faaland, Olaf P. via lustre-discuss
Hi All,

This is not a Lustre problem proper, but others might run into it with a 64-bit 
Lustre client on RHEL 7, and I hope to save others the time it took us to nail 
it down.  We saw it on a node running the "Starfish" policy engine, which reads 
through the entire file system tree repeatedly and consumes changelogs.  
Starfish itself creates and destroys processes frequently, and the workload 
causes Lustre to create and destroy threads as well, by triggering statahead 
thread creation and changelog thread creation.

For the impatient, the fix was to increase pid_max.  We used:
kernel.pid_max=524288
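
Applied the usual way (a sketch, assuming root and the stock sysctl
configuration locations):

  sysctl -w kernel.pid_max=524288                     # takes effect immediately
  echo 'kernel.pid_max=524288' >> /etc/sysctl.conf    # persists across reboots
  sysctl -p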

The symptoms are:

1) console log messages like
LustreError: 10525:0:(statahead.c:970:ll_start_agl()) can't start ll_agl thread, rc: -12
LustreError: 15881:0:(statahead.c:1614:start_statahead_thread()) can't start ll_sa thread, rc: -12
LustreError: 15881:0:(statahead.c:1614:start_statahead_thread()) Skipped 45 previous similar messages
LustreError: 15878:0:(statahead.c:1614:start_statahead_thread()) can't start ll_sa thread, rc: -12
LustreError: 15878:0:(statahead.c:1614:start_statahead_thread()) Skipped 17 previous similar messages

Note the return codes are -12, which is -ENOMEM.

Attempts to create new user space processes are also intermittently failing.

sf_lustre.liblustreCmds 10983 'MainThread' : ("can't start new thread",) [liblustreCmds.py:216]

and

[faaland1@solfish2 lustre]$git fetch llnlstash
Enter passphrase for key '/g/g0/faaland1/.ssh/swdev': 
Enter passphrase for key '/g/g0/faaland1/.ssh/swdev': 
remote: Enumerating objects: 1377, done.
remote: Counting objects: 100% (1236/1236), done.
remote: Compressing objects: 100% (271/271), done.
error: cannot fork() for index-pack: Cannot allocate memory
fatal: fetch-pack: unable to fork off index-pack

We wasted a lot of time chasing the idea that this was in fact due to 
insufficient free memory on the node, but the actual problem was that sysctl 
kernel.pid_max was too low.

When a new process must be created via fork() or kthread_create(), or similar, 
the kernel has to allocate a PID.  It has a data structure for keeping track of 
which PIDs are available, and there is some delay after a process is destroyed 
before its PID may be reused.
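
A quick way to compare the limit against current demand (a rough sketch; every
thread, kernel or user space, holds a PID):

  sysctl kernel.pid_max
  ps -eLf | wc -l     # approximate count of live threads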

We found that on this node the kernel would occasionally find no PIDs
available when it was creating the process.  Specifically, copy_process()
would call alloc_pidmap(), which would return -1.  This tended to happen when
the system was processing a large number of changes on the file system, so
both Lustre and Starfish were suddenly doing a lot of work and both would have
been creating new threads in response to the load.  This node has about
700-800 processes running normally, according to top(1).  At the time these
errors occurred, I don't know how many processes were running or how quickly
they were being created and destroyed.

Ftrace showed this:

|copy_namespaces();
|copy_thread();
|alloc_pid() {
|  kmem_cache_alloc() {
|__might_sleep();
|_cond_resched();
|  }
|  kmem_cache_free();
|}
|exit_task_namespaces() {
|  switch_task_namespaces() {
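
A trace like the fragment above can usually be captured with the
function_graph tracer, e.g. (a sketch; it assumes tracefs is mounted at the
usual location and that alloc_pid is traceable on your kernel):

  cd /sys/kernel/debug/tracing
  echo alloc_pid > set_graph_function
  echo function_graph > current_tracer
  echo 1 > tracing_on
  cat trace_pipe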

On this particular node, with 32 cores, running RHEL 7, arch x86_64, pid_max
was 36K.  We added
kernel.pid_max=524288
to our sysctl.conf, which resolved the issue.

I don't expect this to be an issue under RHEL 8 (or clone of your choice), 
because in RHEL 8.2 systemd puts a config file in place that sets pid_max to 
2^22.

-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs fid2path crashes the MDS

2022-06-08 Thread William D. Colburn
I have a FID that crashes the MDS when I run lfs fid2path on it.  I have
only tried fid2path once because I'm scared to try again, so it might
have been a one-off.  My lustre version is 2.10.8.

Does anyone have any advice about resolving this problem?

[2397666.284621] LustreError: 401793:0:(osd_compat.c:609:osd_obj_update_entry()) aoclst03-OST000e: the FID [0x1:0x5d332f8:0x0] is used by two objects: 1777938/3255960136 1750082/3255960136


--Schlake
  Sysadmin IV, NRAO
  Work: 575-835-7281 (BACK IN THE OFFICE!)
  Cell: 575-517-5668 (out of work hours)
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org