At first glance, this sounds like your InfiniBand subnet manager may
be down or malfunctioning. In this case, nodes which were already up
when the subnet manager was working will continue to be able to
communicate over IB, but nodes which reboot after the SM goes down
will not.
You can test this t
link_layer: InfiniBand
>
> [root@pg-gpu01 ~]# sminfo
> sminfo: sm lid 1 sm guid 0xf452140300f62320, activity count 80878098
> priority 0 state 3 SMINFO_MASTER
>
> Looks like the rebooted node is able to connect/contact IB/IB subnetmanager
>
W
> version, for the same device available on this fabric is 2.36.5150
> -W- pg-node014/U1 - Node has FW version 2.32.5100 while the latest FW
> version, for the same device available on this fabric is 2.36.5150
> -W- pg-node015/U1 - Node has FW version 2.32.5100 while the latest FW
> version,
I'm not sure this is likely to help either, but if you run the command
'ibhosts' on one of the non-working Lustre client nodes, do you see
all of your Lustre servers in the printed list?
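
If eyeballing the list is tedious, the comparison can be scripted. This is only a rough sketch: it assumes each line of 'ibhosts' output ends with a quoted node description whose first word is the hostname, and the server names and sample output below (oss01, oss02, mds01) are made up for illustration, not taken from your fabric.

```python
import re


def missing_servers(ibhosts_output, expected_names):
    """Return the expected names not found in any quoted node description."""
    # Assumption: every ibhosts line carries a quoted description,
    # for example "oss01 HCA-1"; we compare against its whitespace tokens.
    descriptions = re.findall(r'"([^"]*)"', ibhosts_output)
    seen = set()
    for desc in descriptions:
        seen.update(desc.split())
    return [name for name in expected_names if name not in seen]


# Hypothetical sample output; paste the real text from a failing client here.
sample = (
    'Ca\t: 0x0002c90300000001 ports 1 "oss01 HCA-1"\n'
    'Ca\t: 0x0002c90300000002 ports 1 "mds01 HCA-1"\n'
)

print(missing_servers(sample, ["mds01", "oss01", "oss02"]))
```

Any name it prints is a server the client cannot see on the fabric, which would be the place to start looking.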
-Rusty
On Mon, Apr 24, 2017 at 10:39 AM, Russell Dekema wrote:
> I can't rule it out
Greetings,
We have been having various kinds of trouble with our Lustre
filesystem lately; right now the main problem we are having is
intermittent severe slowness (such as 30 seconds for an 'ls' of a
directory containing 100 files to return) when 'cd' and 'ls'ing around
our Lustre filesystem.
As
On Tue, May 30, 2017 at 12:20 PM, Oleg Drokin wrote:
>
> This means exactly what it says.
> This ost is slow creating new objects (for the object preallocates).
>
> If all of your OST creates are slow - then when you create a lot of files,
> eventually you run out of OST objects (or when striping
Greetings,
Is there a way, either on the Lustre clients or (preferably) OSSes, to
determine how many I/O operations each Lustre client is performing
against the filesystem?
I know several ways of finding the number of *bytes* read or written
by a client (or even on a per-job basis with job_stats)
Good evening,
In my experience, you definitely need to sync your user/group information
with the MDS(es). I don't think you need to sync it to the OSSes though.
-Rusty
On Sun, Aug 6, 2017 at 9:32 PM, Yasir Israr wrote:
> I've sync lustre user with all mounted client. Do I've to sync user with
We've got a Lustre system running lustre-2.5.42.28.ddn8 and are having
a problem with it that none of us here have ever seen before. We are
wondering if anyone here has seen this or has any idea what might be
causing it.
(I have redacted the example affected username and its corresponding
UID in t
little different from what I see when the MDS node's
> passwd file is incomplete, but did you verify the affected_user has a
> proper /etc/passwd entry on the MDS node(s)?
>
> On 1/10/19 12:14 PM, Russell Dekema wrote:
> > We've got a Lustre system running lustre-2.5.42
Greetings,
What kind of hardware are you running on your metadata array?
Cheers,
Rusty Dekema
On Fri, Dec 4, 2020 at 5:12 PM Kumar, Amit wrote:
>
> HI All,
>
> During LAD’20 Andreas mentioned if I could share the Robinhood scan time for
> the 369millions files we have. So here it is. It to