It was an NSD server … we’d already shut down all the clients in the remote
clusters!
And Tomer has already agreed to do a talk on memory (but I’m still looking
for a user talk if anyone is interested!)
Simon
From: on behalf of
"oeh...@gmail.com"
Reply-To:
Hello,
I am a new member here. I work for IBM in the Washington System Center
supporting Spectrum Scale and ESS across North America. I live in
Leesburg, Virginia, USA northwest of Washington, DC.
Connie Rice
Storage Specialist
Washington Systems Center
Mobile: 202-821-6747
E-mail:
and I already talked about NUMA stuff at the CIUK usergroup meeting, I won't
volunteer for a 2nd advanced topic :-D
On Tue, Nov 27, 2018 at 12:43 PM Sven Oehme wrote:
> was the node you rebooted a client or a server that was running kswapd at
> 100% ?
>
> sven
>
>
> On Tue, Nov 27, 2018 at
was the node you rebooted a client or a server that was running kswapd at
100% ?
sven
On Tue, Nov 27, 2018 at 12:09 PM Simon Thompson
wrote:
> The nsd nodes were running 5.0.1-2 (though we’re just now rolling to 5.0.2-1,
> I think).
>
>
>
> So is this memory pressure on the NSD nodes then? I
The nsd nodes were running 5.0.1-2 (though we’re just now rolling to 5.0.2-1, I
think).
So is this memory pressure on the NSD nodes then? I thought it was documented
somewhere that GPFS won’t use more than 50% of the host memory.
And actually if you look at the values for maxStatCache and
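As a rough sanity check of that 50% rule of thumb, host memory can be read from /proc/meminfo. This is only a sketch: the MemTotal value below is a made-up example, and the 50% figure is the claim quoted above, not something this snippet verifies against GPFS itself:

```shell
# Sketch: compute 50% of host RAM from a /proc/meminfo-style line.
# The MemTotal value is a made-up example; on a real node you would use:
#   grep MemTotal /proc/meminfo
meminfo_line="MemTotal:       263773292 kB"
mem_kb=$(echo "$meminfo_line" | awk '{print $2}')
half_gib=$(( mem_kb / 2 / 1024 / 1024 ))
echo "50% of host memory is roughly ${half_gib} GiB"
```

Comparing that figure against the configured pagepool (plus the maxStatCache/maxFilesToCache overhead mentioned above) is one way to see whether a node is near the limit.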
Yes, but we’d upgraded all our HPC client nodes to 5.0.2-1 last week as well
when this first happened …
Unless it’s necessary to upgrade the NSD servers as well for this?
Simon
From: on behalf of "t...@il.ibm.com"
Reply-To: "gpfsug-discuss@spectrumscale.org"
Date: Tuesday, 27 November 2018
"paging to disk" sometimes means mmap as well - there were several issues
around that recently.
Regards,
Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: t...@il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:+1 720 3422758
Israel Tel: +972 3 9188625
Hi Simon,
Was there a reason behind swap being disabled?
Best,
Dwayne
—
Dwayne Hart | Systems Administrator IV
CHIA, Faculty of Medicine
Memorial University of Newfoundland
300 Prince Philip Drive
St. John’s, Newfoundland | A1B 3V6
Craig L Dobbin Building | 4M409
T 709 864 6631
On Nov 27,
Despite its name, kswapd isn't directly involved in paging to disk; it's
the kernel process that's involved in finding committed memory that can be
reclaimed for use (either immediately, or possibly by flushing dirty pages
to disk). If kswapd is using a lot of CPU, it's a sign that the kernel is
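One quick way to act on this is to look for kswapd threads high in the CPU column of `ps`. The sketch below filters sample output; the sample lines and the 90% threshold are illustrative, not values from this thread:

```shell
# Sketch: flag kswapd (kernel reclaim) threads using excessive CPU.
# sample_ps stands in for real output of: ps -eo comm,%cpu --sort=-%cpu
# (the values here are made up).
sample_ps='kswapd0 99.8
mmfsd 12.3
kswapd1 0.0'
busy=$(echo "$sample_ps" | awk '$1 ~ /^kswapd/ && $2+0 > 90 {print $1}')
echo "busy reclaim threads: ${busy:-none}"
```

A kswapd thread pegged like this points at reclaim pressure (page cache, slab/dentries) rather than swap I/O, which matches the point being made here.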
Hi,
now I need to swap back in a lot of information about GPFS that I tried to
swap out :-)
I bet kswapd is not doing what you think the name suggests here, which
is handling swap space. I claim the kswapd thread is trying to throw
dentries out of the cache, and what it actually tries to get rid
This means that the files with the inode numbers 38422 and 281057 below
are orphan files (i.e. files not referenced by any directory/folder), and
they will be moved to the lost+found folder of the fileset owning these
files by mmfsck repair.
Regards, The Spectrum Scale (GPFS) team
Thanks Sven …
We found a node with kswapd running at 100% (and swap was off)…
Killing that node made access to the FS spring into life.
Simon
From: on behalf of
"oeh...@gmail.com"
Reply-To: "gpfsug-discuss@spectrumscale.org"
Date: Tuesday, 27 November 2018 at 16:14
To:
if this happens you should check a couple of things:
1. are you under memory pressure, or even worse, started swapping.
2. is there any core running at ~0% idle - run top, press 1 and check the
idle column.
3. is there any single thread running at ~100% - run top, press shift-h
and check
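Check 3 can also be scripted from `top` in batch mode. The sketch below parses sample thread lines; the field positions (%CPU in column 9, command in column 12) match top's default layout, and the sample values and 90% threshold are made up for illustration:

```shell
# Sketch of check 3: find single threads near 100% CPU. sample_top stands
# in for the process lines of: top -H -b -n 1
# In top's default layout, %CPU is field 9 and the command name field 12.
sample_top='24885 root 20 0 10.1g 1.2g 9.8m R 99.7 0.5 12:34.56 mmfsd
24886 root 20 0 10.1g 1.2g 9.8m S 3.0 0.5 0:01.23 mmfsd'
pegged=$(echo "$sample_top" | awk '$9+0 > 90 {print $1}')
echo "threads near 100%: ${pegged:-none}"
```

For check 1, `vmstat 1` is a common way to watch the si/so (swap-in/out) columns; non-zero values there would confirm swapping.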
I have seen something like this in the past, and I have resorted to a cluster
restart as well. :-( IBM and I could never really track it down, because I
could not get a dump at the time of occurrence. However, you might take a look
at your NSD servers, one at a time. As I recall, we thought it
I have a file-system which keeps hanging over the past few weeks. Right now,
it’s offline and has taken a bunch of services out with it.
(I have a ticket with IBM open about this as well)
We see for example:
Waiting 305.0391 sec since 15:17:02, monitored, thread 24885
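Waiter lines in that format can be scanned mechanically for long hangs. The sketch below flags anything waiting longer than an arbitrary 300-second threshold; the sample line is copied from above, and whether it came from `mmdiag --waiters` or a log snippet is an assumption:

```shell
# Sketch: flag long waiters from a line like the one quoted above.
# Field 2 is the wait time in seconds; the last field is the thread id.
waiter='Waiting 305.0391 sec since 15:17:02, monitored, thread 24885'
echo "$waiter" | awk '$2+0 > 300 {print "long waiter: thread", $NF, "("$2" sec)"}'
```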