Re: [lustre-discuss] storing Lustre jobid in file xattrs: seeking feedback

2023-05-12 Thread Jeff Johnson
Just a thought: instead of embedding the jobname itself, perhaps store the
least-significant 7 characters of a SHA-1 hash of the jobname. Small chance
of collision, and easy to cross-reference back to the jobid when needed.
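
[Editor's note: a minimal sketch of this idea in bash, assuming the jobname is
available in an environment variable such as SLURM_JOB_NAME; the variable, the
map file, and the 7-character length are all illustrative choices, not part of
any existing Lustre tooling.]

#!/bin/bash
# Hypothetical sketch: keep only the least-significant (trailing) 7 hex
# characters of a SHA-1 of the jobname. SLURM_JOB_NAME is just one example
# source for the jobname string.
jobname="${SLURM_JOB_NAME:-example_job}"
hash=$(printf '%s' "$jobname" | sha1sum | awk '{print $1}')
jobtag=${hash: -7}                  # short tag that would go in the xattr

# A site-local map makes the cross-reference back to the full jobname easy:
echo "$jobtag $jobname" >> jobtag.map
grep "^$jobtag " jobtag.map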

--Jeff


On Fri, May 12, 2023 at 3:08 PM Andreas Dilger via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Hi Thomas,
> thanks for working on this functionality and raising this question.
>
> As you know, I'm inclined toward the user.job xattr, but I think it is
> never a good idea to unilaterally make policy decisions in the kernel that
> cannot be changed.
>
> As such, it probably makes sense to have a tunable parameter like "
> mdt.*.job_xattr=user.job" and then this could be changed in the future if
> there is some conflict (e.g. some site already uses the "user.job" xattr
> for some other purpose).
>
> I don't think the job_xattr should allow totally arbitrary values (e.g.
> overwriting trusted.lov or trusted.lma or security.* would be bad). One
> option is to only allow a limited selection of valid xattr namespaces, and
> possibly names:
>
>- NONE to turn this feature off
>- user, or trusted or system (if admin wants to restrict the ability
>of regular users to change this value?), with ".job" added
>automatically
>- user.* (or trusted.* or system.*) to also allow specifying the xattr
>name
>
> If we allow the xattr name portion to be specified (which I'm not sure
> about, but putting it out for completeness), it should have some reasonable
> limits:
>
>- <= 7 characters long to avoid wasting valuable xattr space in the
>inode
>- should not conflict with other known xattrs, which is tricky if we
>allow the name to be arbitrary. Possibly if in trusted (and system?)
>it should only allow trusted.job to avoid future conflicts?
>- maybe restrict it to contain "job" (or maybe "pbs", "slurm", ...) to
>reduce the chance of namespace clashes in user or system? However, I'm
>reluctant to restrict names in user since this *shouldn't* have any
>fatal side effects (e.g. data corruption like in trusted or system),
>and the admin is supposed to know what they are doing...
>
>
> On May 4, 2023, at 15:53, Bertschinger, Thomas Andrew Hjorth via
> lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Hello Lustre Users,
>
> There has been interest in a proposed feature
> https://jira.whamcloud.com/browse/LU-13031 to store the jobid with each
> Lustre file at create time, in an extended attribute. An open question is
> which xattr namespace to use, among "user", the Lustre-specific
> namespace "lustre", "trusted", or perhaps even "system".
>
> The correct namespace likely depends on how this xattr will be used. For
> example, will interoperability with other filesystems be important?
> Different namespaces have their own limitations so the correct choice
> depends on the use cases.
>
> I'm looking for feedback on applications for this feature. If you have
> thoughts on how you could use this, please feel free to share them so that
> we design it in a way that meets your needs.
>
> Thanks!
>
> Tom Bertschinger
> LANL
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>
>
>
>
>
>
>
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] storing Lustre jobid in file xattrs: seeking feedback

2023-05-12 Thread Andreas Dilger via lustre-discuss
Hi Thomas,
thanks for working on this functionality and raising this question.

As you know, I'm inclined toward the user.job xattr, but I think it is never a 
good idea to unilaterally make policy decisions in the kernel that cannot be 
changed.

As such, it probably makes sense to have a tunable parameter like 
"mdt.*.job_xattr=user.job" and then this could be changed in the future if 
there is some conflict (e.g. some site already uses the "user.job" xattr for 
some other purpose).
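
[Editor's note: for concreteness, a hedged sketch of how such a tunable might
be set once (and if) the feature lands as proposed in LU-13031; the parameter
name is taken from the paragraph above and could still change.]

# Hypothetical: apply the proposed setting persistently on all MDTs via the MGS.
lctl set_param -P mdt.*.job_xattr=user.job

# "NONE" is one of the proposed values for turning the feature off entirely:
lctl set_param -P mdt.*.job_xattr=NONE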

I don't think the job_xattr should allow totally arbitrary values (e.g. 
overwriting trusted.lov or trusted.lma or security.* would be bad). One option 
is to only allow a limited selection of valid xattr namespaces, and possibly 
names:

  *   NONE to turn this feature off
  *   user, or trusted or system (if admin wants to restrict the ability of 
regular users to change this value?), with ".job" added automatically
  *   user.* (or trusted.* or system.*) to also allow specifying the xattr name

If we allow the xattr name portion to be specified (which I'm not sure about, 
but putting it out for completeness), it should have some reasonable limits:

  *   <= 7 characters long to avoid wasting valuable xattr space in the inode
  *   should not conflict with other known xattrs, which is tricky if we allow 
the name to be arbitrary. Possibly if in trusted (and system?) it should only 
allow trusted.job to avoid future conflicts?
  *   maybe restrict it to contain "job" (or maybe "pbs", "slurm", ...) to 
reduce the chance of namespace clashes in user or system? However, I'm 
reluctant to restrict names in user since this shouldn't have any fatal side 
effects (e.g. data corruption like in trusted or system), and the admin is 
supposed to know what they are doing...
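
[Editor's note: to make the namespace trade-offs above concrete, a hedged
example of how the resulting xattr would behave under standard Linux xattr
semantics; the file path is illustrative and the feature itself is still only
proposed.]

# user.* is visible to (and writable by) the file owner:
getfattr -n user.job /lustre/fs/project/output.dat
setfattr -n user.job -v other_job /lustre/fs/project/output.dat   # owner can change it

# trusted.* requires CAP_SYS_ADMIN, so ordinary users can neither read nor
# modify it -- relevant if the admin wants the recorded jobid tamper-proof:
sudo getfattr -n trusted.job /lustre/fs/project/output.dat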

On May 4, 2023, at 15:53, Bertschinger, Thomas Andrew Hjorth via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Hello Lustre Users,

There has been interest in a proposed feature 
https://jira.whamcloud.com/browse/LU-13031 to store the jobid with each Lustre 
file at create time, in an extended attribute. An open question is which xattr
namespace to use, among "user", the Lustre-specific namespace "lustre",
"trusted", or perhaps even "system".

The correct namespace likely depends on how this xattr will be used. For 
example, will interoperability with other filesystems be important? Different 
namespaces have their own limitations so the correct choice depends on the use 
cases.

I'm looking for feedback on applications for this feature. If you have thoughts 
on how you could use this, please feel free to share them so that we design it 
in a way that meets your needs.

Thanks!

Tom Bertschinger
LANL

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Disk failures triggered during OST creation and mounting on OSS Servers

2023-05-12 Thread Jane Liu via lustre-discuss

Hi Jeff,

Thanks for your response. We discovered later that the network issues 
originating from the iDRAC IP were causing the SAS driver to hang or 
experience timeouts when trying to access the drives. This resulted in 
the drives being kicked out.


Once we resolved this issue, both the mkfs and mount operations started 
working fine.


Thanks,
Jane



On 2023-05-10 12:43, Jeff Johnson wrote:

Jane,

You're having hardware errors. The codes in those mpt3sas errors decode
to "PL_LOGINFO_SUB_CODE_OPEN_FAILURE_ORR_TIMEOUT"; in other words, your
SAS HBA cannot open a command dialogue with your disk. I'd suspect a
backplane or cabling issue, since an internal disk failure would be
reported by the target disk with its own error code. In this case your
HBA can't even talk to the disk properly.

Is sdah the partner mpath device to sdef? Or is sdah a second failing
disk interface?

Looking at this, I don't think your hardware is deploy-ready.

--Jeff

On Wed, May 10, 2023 at 9:29 AM Jane Liu via lustre-discuss
<lustre-discuss@lists.lustre.org> wrote:


Hi,

We recently attempted to add several new OSS servers (RHEL 8.7 and
Lustre 2.15.2). While creating new OSTs, I noticed that mdstat reported
some disk failures after the mkfs, even though the disks were functional
before the mkfs command. Our hardware admins managed to resolve the
mdstat issue and restore the disks to normal operation. However, when I
ran the mount OST command (while the network had a problem and the mount
command timed out), similar problems occurred, and several disks were
kicked out. The relevant /var/log/messages entries are provided below.

This problem was consistent across all our OSS servers. Any insights
into the possible cause would be appreciated.

Jane

-

May  9 13:33:15 sphnxoss47 kernel: LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May  9 13:33:15 sphnxoss47 systemd[1]: tmp-mntmirJ5z.mount: Succeeded.
May  9 13:33:16 sphnxoss47 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 72, npartitions: 2
May  9 13:33:16 sphnxoss47 kernel: alg: No test for adler32 (adler32-zlib)
May  9 13:33:16 sphnxoss47 kernel: Key type ._llcrypt registered
May  9 13:33:16 sphnxoss47 kernel: Key type .llcrypt registered
May  9 13:33:16 sphnxoss47 kernel: Lustre: Lustre: Build Version: 2.15.2
May  9 13:33:16 sphnxoss47 kernel: LNet: Added LNI 169.254.1.2@tcp [8/256/0/180]
May  9 13:33:16 sphnxoss47 kernel: LNet: Accept secure, port 988
May  9 13:33:17 sphnxoss47 kernel: LDISKFS-fs (md0): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
May  9 13:33:17 sphnxoss47 kernel: Lustre: sphnx01-OST0244-osd: enabled 'large_dir' feature on device /dev/md0
May  9 13:33:25 sphnxoss47 systemd-logind[8609]: New session 7 of user root.
May  9 13:33:25 sphnxoss47 systemd[1]: Started Session 7 of user root.
May  9 13:34:36 sphnxoss47 kernel: LustreError: 15f-b: sphnx01-OST0244: cannot register this server with the MGS: rc = -110. Is the MGS running?
May  9 13:34:36 sphnxoss47 kernel: LustreError: 45314:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start targets: -110
May  9 13:34:36 sphnxoss47 kernel: LustreError: 45314:0:(obd_mount_server.c:1644:server_put_super()) no obd sphnx01-OST0244
May  9 13:34:36 sphnxoss47 kernel: LustreError: 45314:0:(obd_mount_server.c:131:server_deregister_mount()) sphnx01-OST0244 not registered
May  9 13:34:39 sphnxoss47 kernel: Lustre: server umount sphnx01-OST0244 complete
May  9 13:34:39 sphnxoss47 kernel: LustreError: 45314:0:(super25.c:176:lustre_fill_super()) llite: Unable to mount : rc = -110
May  9 13:34:40 sphnxoss47 kernel: LDISKFS-fs (md1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May  9 13:34:40 sphnxoss47 systemd[1]: tmp-mntXT85fz.mount: Succeeded.
May  9 13:34:41 sphnxoss47 kernel: LDISKFS-fs (md1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
May  9 13:34:41 sphnxoss47 kernel: Lustre: sphnx01-OST0245-osd: enabled 'large_dir' feature on device /dev/md1
May  9 13:36:00 sphnxoss47 kernel: LustreError: 15f-b: sphnx01-OST0245: cannot register this server with the MGS: rc = -110. Is the MGS running?
May  9 13:36:00 sphnxoss47 kernel: LustreError: 46127:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start targets: -110
May  9 13:36:00 sphnxoss47 kernel: LustreError: 46127:0:(obd_mount_server.c:1644:server_put_super()) no obd sphnx01-OST0245
May  9 13:36:00 sphnxoss47 kernel: LustreError: 46127:0:(obd_mount_server.c:131:server_deregister_mount()) sphnx01-OST0245 not registered
May  9 13:36:08 sphnxoss47 kernel: Lustre: server umount sphnx01-OST0245 complete
May  9 13:36:08 sphnxoss47 kernel: LustreError: 46127:0:(super25.c:176:lustre_fill_super()) llite: Unable to mount : rc = -110
May  9 13:36:08 sphnxoss47 kernel: LDISKFS-fs (md2): mounted filesystem with ordered data mode. Opts: errors=remount-ro
May  9 13:36:08 sphnxoss47 systemd[1]: tmp-mnt17IOaq.mount: Succeeded.
May  9 13:36:09 sphnxoss47