Hello,
We have a few files created by a particular application where reads to those
files consistently hang. The debug log on a client attempting a read() has
messages like:
> ldlm_completion_ast(): waiting indefinitely because of NO_TIMEOUT ...
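For reference, this is roughly how we captured that client-side debug log (a
sketch of what we ran; the debug mask and output path are just examples):

    # enable LDLM lock tracing in the client debug mask
    lctl set_param debug=+dlmtrace
    # clear the existing buffer, then reproduce the hung read()
    lctl clear
    # dump the kernel debug buffer to a file for inspection
    lctl dk /tmp/lustre-debug.log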
This is printed when the flag ... the process is killed.
Cheers, Andreas
> On Aug 30, 2023, at 07:42, Bertschinger, Thomas Andrew Hjorth via
> lustre-discuss wrote:
>
> Hello,
>
> We have a few files created by a particular application where reads to those
> files consistently hang. The debug log on a client ...
Hello,
Recently we had an OSS node down for an extended period due to hardware
problems. While the node was down, mounting Lustre on a client took an
extremely long time to complete (20-30 minutes). Once the filesystem is
mounted, all operations are normal and there isn't any noticeable impact
from the ...
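In case it helps others, the workaround we are looking at (a sketch only; the
fsname "testfs" and OST index 0003 are placeholders for the target behind the
failed OSS) is to mark that OST inactive on the MGS so new client mounts do
not block waiting for it:

    # on the MGS: mark the OST behind the failed OSS inactive
    lctl conf_param testfs-OST0003.osc.active=0
    # re-activate it once the OSS is back in service
    lctl conf_param testfs-OST0003.osc.active=1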
Hello,
We've been experimenting with DNEv3 recently and have run into this issue:
https://jira.whamcloud.com/browse/LU-7607 where the directory inode number
changes after auto-split.
In addition to the problem noted with backups that track the inode number, we
have found that file access ...
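For anyone wanting to reproduce the observation, the change is easy to see
(the mount point and directory name here are just examples):

    # record the directory's inode number before auto-split
    stat -c %i /mnt/lustre/testdir
    # ... create files until the auto-split triggers, then repeat;
    # the inode number reported here changes after the split
    stat -c %i /mnt/lustre/testdir
    # the resulting striping across MDTs can be inspected with
    lfs getdirstripe /mnt/lustre/testdir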
Thanks, this is helpful. We certainly don't need the auto-split feature and
were just experimenting with it, so this should be fine for us. And we have
been satisfied with the round robin directory creation so far. Just out of
curiosity, is the auto-split feature still being actively worked on ...
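In case anyone else wants to switch it off, this is what we used (a sketch;
enable_dir_auto_split is the tunable name as it appears on our servers):

    # on each MDS: disable automatic directory splitting
    lctl set_param mdt.*.enable_dir_auto_split=0
    # on the MGS: make the setting persistent across remounts
    lctl set_param -P mdt.*.enable_dir_auto_split=0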
Hello Jon,
I've run into this issue in the past when trying to install Lustre with ZFS
RPMs I built myself. I was able to resolve the problem by including the flag
"--with-spec=redhat" when configuring ZFS:
./configure --with-spec=redhat
see: ...
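For completeness, the full sequence I'd suggest for rebuilding the ZFS RPMs
(a sketch; the make targets are the standard ZFS rpm targets, so adjust for
your source tree and version):

    # from the ZFS source tree
    sh autogen.sh
    ./configure --with-spec=redhat
    # build the userspace and kmod RPMs that Lustre's configure expects
    make -j$(nproc) rpm-utils rpm-kmod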
...;slurm", ...) to reduce the chance of namespace clashes in "user" or
"system"? However, I'm reluctant to restrict names in "user" since this
shouldn't have any fatal side effects (e.g. data corruption like in "trusted"
or "system"), and the admin is supposed to know what they are doing...
On May 4, ...
Hello Lustre Users,
There has been interest in a proposed feature
https://jira.whamcloud.com/browse/LU-13031 to store the jobid with each Lustre
file at create time, in an extended attribute. An open question is which xattr
namespace to use: "user", the Lustre-specific namespace ...
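To make the trade-off concrete, this is how the attribute would be read from
each candidate namespace (the attribute name "job" and the file path are only
illustrative; see LU-13031 for the actual proposal):

    # "user" namespace: accessible to unprivileged users with access to the file
    getfattr -n user.job /mnt/lustre/somefile
    # "trusted" namespace: requires CAP_SYS_ADMIN to read or write
    getfattr -n trusted.job /mnt/lustre/somefile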
Hello,
We have a curious issue with supplemental group permissions. There is a set of
files where a user has group read permission to the files via a supplemental
group. If the user tries to open() one of these files, they get EACCES. Then,
if the user stat()s the file (or seemingly does any ...
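In case it's relevant, these are the checks we've been running (a sketch;
identity_flush is our understanding of how to clear the MDS-side identity
cache so supplementary groups are re-fetched):

    # on the client: confirm the supplementary group is actually present
    id "$USER"
    # on the MDS: flush all cached identities (-1 flushes every entry)
    lctl set_param mdt.*.identity_flush=-1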
Hello Jan,
You can use the Pacemaker / Corosync high-availability software stack for this:
specifically, ordering constraints [1] can be used.
Unfortunately, Pacemaker is probably over-the-top if you don't need HA -- its
configuration is complex and difficult to get right, and it significantly ...
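To give a flavor of it, an ordering constraint with pcs looks roughly like
this (the resource names mgs and mdt0 are made up for the example):

    # start the MGS resource before the MDT resource
    pcs constraint order start mgs then start mdt0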
Hello Ger,
Can you share the full stack trace from the log output for the hung thread?
That will be helpful for diagnosing the issue. Some other clues: do you get any
stack traces or error output on clients where you observe the hang? Does every
client hang, or only some? Does it hang on any ...
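If it helps, here is one way to collect those traces on an affected client
(a sketch; this assumes sysrq is enabled on the node):

    # dump stacks of all blocked (D-state) tasks into the kernel log
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100
    # or grab the stack of one specific hung process (replace 1234 with its PID)
    cat /proc/1234/stack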
Hello Jan,
More often than not, when I see stat() syscalls hanging, it's due to a
communication issue with an OSS rather than an MDS. I think the message about
"Lustre: comind-MDT: haven't heard from client ..." may be a downstream
effect of the client hanging (maybe due to an OSS issue), ...
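A quick sanity check for that theory (a sketch) is to look at the OST
connections from a client that is hanging:

    # verify the client can reach the MGS, MDTs, and OSTs
    lfs check servers
    # inspect per-OST import status; anything other than "state: FULL"
    # suggests a connection problem with that OSS
    lctl get_param osc.*.import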