Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Simon Thompson
It was an NSD server … we’d already shut down all the clients in the remote clusters! And Tomer has already agreed to do a talk on memory (but I’m still looking for a user talk if anyone is interested!) Simon

[gpfsug-discuss] Introduction

2018-11-27 Thread Constance M Rice
Hello, I am a new member here. I work for IBM in the Washington System Center supporting Spectrum Scale and ESS across North America. I live in Leesburg, Virginia, USA northwest of Washington, DC. Connie Rice Storage Specialist Washington Systems Center Mobile: 202-821-6747 E-mail:

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
and I'm already talking about NUMA stuff at the CIUK user group meeting, so I won't volunteer for a 2nd advanced topic :-D On Tue, Nov 27, 2018 at 12:43 PM Sven Oehme wrote: > was the node you rebooted a client or a server that was running kswapd at > 100% ? > > sven > > > On Tue, Nov 27, 2018 at

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
was the node you rebooted a client or a server that was running kswapd at 100%? sven On Tue, Nov 27, 2018 at 12:09 PM Simon Thompson wrote: > The NSD nodes were running 5.0.1-2 (though we're just now rolling to 5.0.2-1 > I think). > > > > So is this memory pressure on the NSD nodes then? I

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Simon Thompson
The NSD nodes were running 5.0.1-2 (though we're just now rolling to 5.0.2-1, I think). So is this memory pressure on the NSD nodes then? I thought it was documented somewhere that GPFS won't use more than 50% of the host memory. And actually if you look at the values for maxStatCache and
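
As a rough illustration of the settings Simon is referring to, the cache and page-pool parameters can be checked per node with standard Spectrum Scale commands (the attribute names are standard; any values returned are site-specific):

    # per-node GPFS memory/cache settings
    mmlsconfig pagepool
    mmlsconfig maxFilesToCache
    mmlsconfig maxStatCache

    # what mmfsd has actually allocated on this node
    mmdiag --memory

    # compare against total host memory
    free -g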

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Simon Thompson
Yes, but we’d upgraded all our HPC client nodes to 5.0.2-1 last week as well when this first happened … Unless it’s necessary to upgrade the NSD servers as well for this? Simon

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Tomer Perry
"paging to disk" sometimes means mmap as well - there were several issues around that recently as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: t...@il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel:+1 720 3422758 Israel Tel: +972 3 9188625

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Dwayne.Hart
Hi Simon, Was there a reason behind swap being disabled? Best, Dwayne — Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John’s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 On Nov 27,

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Skylar Thompson
Despite its name, kswapd isn't directly involved in paging to disk; it's the kernel process that's involved in finding committed memory that can be reclaimed for use (either immediately, or possibly by flushing dirty pages to disk). If kswapd is using a lot of CPU, it's a sign that the kernel is
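
A minimal sketch of how one might observe the reclaim behaviour Skylar describes (counter names vary slightly between kernel versions, so treat this as an assumption rather than an exact recipe):

    # kswapd CPU use (one kswapd thread per NUMA node)
    top -b -n 1 | grep kswapd

    # page-scan / page-steal counters driven by kswapd
    grep -E 'kswapd|pgscan|pgsteal' /proc/vmstat

    # overall memory and swap activity
    vmstat 1 5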

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
Hi, now I need to swap back in a lot of information about GPFS that I tried to swap out :-) I bet kswapd is not doing anything you think the name suggests here, which is handling swap space. I claim the kswapd thread is trying to throw dentries out of the cache, and what it tries to actually get rid
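
To see whether the dentry/inode slab caches are what kswapd is churning through, a quick check on the affected node might look like this (a sketch, not from the thread):

    # largest slab caches first; look for 'dentry' and '*_inode_cache'
    slabtop -o | head -20

    # raw numbers for the dentry cache
    grep dentry /proc/slabinfo

Sven's point is that this is cache trimming by the kernel, not swap handling, despite the process name.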

Re: [gpfsug-discuss] mmfsck output

2018-11-27 Thread IBM Spectrum Scale
This means that the files with the inode numbers below (38422 and 281057) are orphan files (i.e. files not referenced by any directory/folder), and they will be moved to the lost+found folder of the fileset owning these files by mmfsck repair. Regards, The Spectrum Scale (GPFS) team
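
For context, a hedged sketch of the kind of repair run being described (the device name gpfs0 is hypothetical; a full offline mmfsck requires the file system to be unmounted everywhere first):

    # unmount on all nodes, then do a report-only pass
    mmumount gpfs0 -a
    mmfsck gpfs0 -n

    # repair pass; orphaned inodes such as 38422 and 281057 get relinked under lost+found
    mmfsck gpfs0 -y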

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Simon Thompson
Thanks Sven … We found a node with kswapd running at 100% (and swap was off)… Killing that node made access to the FS spring into life. Simon

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
If this happens you should check a couple of things: 1. are you under memory pressure, or even worse, started swapping? 2. is there any core running at ~0% idle? - run top, press 1 and check the idle column. 3. is there any single thread running at ~100%? - run top, press shift-h and check
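
Sven's three checks translate roughly into the following (a sketch, assuming the sysstat tools are installed):

    # 1. memory pressure / swapping - watch the si/so columns
    free -g
    vmstat 1 5

    # 2. any core at ~0% idle (per-CPU view, equivalent to pressing 1 in top)
    mpstat -P ALL 1 3

    # 3. any single thread pinned at ~100% (equivalent to toggling the threads view in top)
    top -b -H -n 1 | head -40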

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Oesterlin, Robert
I have seen something like this in the past, and I have resorted to a cluster restart as well. :-( IBM and I could never really track it down, because I could not get a dump at the time of occurrence. However, you might take a look at your NSD servers, one at a time. As I recall, we thought it

[gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Simon Thompson
I have a file-system which keeps hanging over the past few weeks. Right now, it's offline and has taken a bunch of services out with it. (I have a ticket with IBM open about this as well) We see for example: Waiting 305.0391 sec since 15:17:02, monitored, thread 24885
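
Waiter lines like the one quoted above can be listed directly on the NSD servers and on the file system manager; a minimal sketch of where to look:

    # long-running waiters on this node
    mmdiag --waiters

    # which node is currently acting as the file system manager
    mmlsmgr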