Re: [gpfsug-discuss] 5.1.2.2 changes

2022-01-17 Thread Peter Childs
https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars_512x.html

is normally the best place to look for changes in PTF releases.

Peter Childs
ITS Research Storage
Queen Mary University Of London


From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of Hannappel, Juergen 

Sent: Thursday, January 13, 2022 5:26 PM
To: gpfsug main discussion list
Subject: [EXTERNAL] [gpfsug-discuss] 5.1.2.2 changes

CAUTION: This email originated from outside of QMUL. Do not click links or open 
attachments unless you recognise the sender and know the content is safe.


Hi,
just got notified that 5.1.2.2 is out.
What are the changes to 5.1.2.1?
https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=summary-changes
does not specify that

--
Dr. Jürgen Hannappel  DESY/IT    Tel.: +49 40 8998-4616
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies?

2021-10-11 Thread Peter Childs
We've had this same issue with characters that are fine in Scale but Protect 
can't handle. Normally it's because some script has embedded a newline in the 
middle of a file name, and normally we end up renaming that file by inode number:

find . -inum 9975226749 -exec mv {} badfilename \;

mostly because we can't even type the filename at the command prompt.

However it's not always just newline characters; currently we've got a few files 
with unprintable characters in them. But it's normally fewer than 50 files every 
few months, so it's easy to handle manually.

I normally end up looking at /data/mmbackup.unsupported, which is the standard 
output from mmapplypolicy, extracting the file names from it and emailing 
the users concerned to assist them in working out what went wrong.

I guess you could automate the parsing of this file at the end of the backup 
process and do something interesting with it.
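A rough sketch of that kind of automation, run after mmbackup finishes (the
path, the mail command and the recipient are placeholders, not anything from
this thread):

#!/bin/bash
# mail the list of names mmbackup could not hand to TSM to the storage team,
# so the owners can be chased up; adjust the path to wherever mmbackup writes
# its mmbackup.unsupported.* files on your system
for f in /data/mmbackup.unsupported*; do
    [ -s "$f" ] || continue
    mail -s "mmbackup: unsupported file names in $(basename "$f")" \
        storage-admins@example.ac.uk < "$f"
done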


Peter Childs




From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of Simon Thompson 

Sent: Monday, October 11, 2021 9:35 AM
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Handling bad file names in policies?

CAUTION: This email originated from outside of QMUL. Do not click links or open 
attachments unless you recognise the sender and know the content is safe.


We have both:

  WILDCARDSARELITERAL   yes
  QUOTESARELITERAL  yes

set, and we use --noquote for mmbackup. The backup runs, but creates a file:

/filesystem/mmbackup.unsupported.CLIENTNAME

Which contains a list of files that are not backed up due to \n in the filename.

So it doesn't break the backup, but those files don't get backed up either. I believe 
this is because the TSM client can't back the files up, rather than mmbackup no 
longer allowing them. I had an RFE at some point to get dsmc changed ... but it got 
closed WONTFIX.
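For reference, a minimal sketch of that combination (the server stanza name,
paths and node class are placeholders, not taken from our setup):

# in dsm.sys on every node that runs backups:
#   SErvername           TSMSERVER1
#   WILDCARDSARELITERAL  yes
#   QUOTESARELITERAL     yes
#
# and the matching mmbackup invocation:
mmbackup /filesystem -t incremental --noquote -N backupnodes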

Simon

On 09/10/2021, 10:09, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
Jonathan Buzzard"  wrote:

On 08/10/2021 19:14, Wahl, Edward wrote:

> This goes back as far as I can recall to <=GPFS 3.5 days. And no, I
> cannot recall what version of TSM-EE that was.   But newline has been
> the only stopping point, for what seems like forever. Having filed
> many an mmbackup bug, I don't recall ever crashing on filenames.
> (tons of OTHER reasons, but not character set)   We even generate an
> error report from this and email users to fix it. We accept basically
> almost everything else, and I have to say, we see some really crazy
> things sometimes.   I think my current favorite is the full windows
> paths as a filename. (eg:
> "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" )
>

I will have to do a test but I am sure newlines have worked just fine in
the past. At the very least they have not stopped an entire backup from
working when using dsmc incr.

Now mmbackup that's a different kettle of fish. If you have not seen
mmbackup fail entirely because of a random "special" character you
simply have not been using it long enough :-)

For the longest of times I would simply not go anywhere near it because
it was not fit for purpose.

>
> Current IBM documentation doesn't go backwards past 4.2 but it says:
>
> "For IBM Spectrum Scale™ file systems with special characters
> frequently used in the names of files or directories, backup failures
> might occur. Known special characters that require special handling
> include: *, ?, ", ’, carriage return, and the new line character.
>
> In such cases, enable the Tivoli Storage Manager client options
> WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used
> in backup activities and make sure that the mmbackup option --noquote
> is used when invoking mmbackup."
>
> So maybe we could handle newlines somehow.   But my lazy searches
> didn't show what TSM doesn't accept.
>

We strongly advise our users (our GPFS file system is for an HPC system)
in training not to use "special" characters. That is followed with a
warning that if they do then we don't make any promises to backup their
files :-)

From time to time I run a dsmc incr in a screen session and capture the output
to a log file, then look at the list of failed files and prompt users
to "fix" them. Though sometimes I just "fix" them myself if the
correction is going to be obvious, and then email them to tell them what
has happened.


JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image.

2021-02-01 Thread Peter Childs
We used to run

mmsdrrestore -p manager -R /usr/bin/scp

in an xCAT postscript to re-add our nodes to our Spectrum Scale cluster. However, 
we disliked needing to put the private key for the whole cluster on every host.

We now use

mmsdrrestore -N nodename

post-install, from a management node, to re-add the node to the cluster, so we 
could stop xCAT from distributing the private key, for security reasons.

Ideally we would have liked the postscript to call a callback on a management 
node to do this, but we have not yet worked out how best to do that in xCAT, so 
currently it's a manual task. That is fine while our nodes are stateful, but is 
not workable when your nodes are stateless.

My understanding is that xCAT should have a hook to do this, like the 
prescripts but run at the end, but I've yet to find it.
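For what it's worth, a minimal sketch of the postscript variant we used to run
(the configuration server name is a placeholder, and the key-distribution
caveat above still applies):

#!/bin/bash
# xCAT postscript run on the freshly booted node: pull the GPFS configuration
# back from a configuration server, then start the daemon
/usr/lpp/mmfs/bin/mmsdrrestore -p gpfs-manager1 -R /usr/bin/scp
/usr/lpp/mmfs/bin/mmstartup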

Peter Childs


From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of Ruffner, Scott (jpr9c) 

Sent: Friday, January 29, 2021 8:04 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image.

Thanks David! Slick solution.

--
Scott Ruffner
Senior HPC Engineer
UVa Research Computing
(434)924-6778(o)
(434)295-0250(h)
sruff...@virginia.edu


From:  on behalf of 
"david_john...@brown.edu" 
Reply-To: gpfsug main discussion list 
Date: Friday, January 29, 2021 at 2:52 PM
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image.

We use mmsdrrestore after the node boots. In our case these are diskless nodes 
provisioned by xCAT.  The post install script takes care of ensuring infiniband 
is lit up, and does the mmsdrrestore followed by mmstartup.
  -- ddj
Dave Johnson


On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c)  wrote:
Hi everyone,

We want all of our compute nodes (bare metal) to directly participate in the 
cluster as client nodes; of course, they are sharing a common root image.

Adding nodes via the regular mmaddnode (with the dsh operation to replicate 
files to the clients) isn’t really viable, but if I short-circuit that, and 
simply generate the /var/mmfs/gen files and then manually copy those and the 
keyfiles to the shared root images, is that safe?

Am I going about this the entirely wrong way?

--
Scott Ruffner
Senior HPC Engineer
UVa Research Computing
(434)924-6778(o)
(434)295-0250(h)
sruff...@virginia.edu

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Odd behavior using sudo for mmchconfig

2019-06-12 Thread Peter Childs
Yesterday I updated some GPFS config using 

sudo /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20,maxStatCache=80

which looked to have worked fine; however, later other machines started
reporting permission issues when running mmlsquota as a user:

cannot open file `/var/mmfs/gen/mmfs.cfg.ls' for reading (Permission denied)
cannot open file `/var/mmfs/gen/mmfs.cfg' for reading (Permission denied)


This was corrected by re-running the command from the same machine
within a root session:

sudo -s
/usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=2,maxStatCache=8
/usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20,maxStatCache=80
exit

I suspect an environment issue within sudo caused the GPFS config files to
have their permissions changed, but I've done similar before with no bad
effects, so I'm a little confused.
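As a quick sanity check (a hedged suggestion, not something from the original
post), the modes of the distributed config files can be compared across nodes
before and after a change; they should normally be world-readable so that
mmlsquota works for ordinary users:

mmdsh -N all 'ls -l /var/mmfs/gen/mmfs.cfg /var/mmfs/gen/mmfs.cfg.ls'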

We're looking at tightening up our security to reduce the need for root-based
passwordless access from non-admin nodes, but I've never understood the exact
requirements for setting this up correctly, and I periodically see issues with
our root known_hosts files when we update our admin hosts. Hence I often end up
going around with 'mmdsh -N all echo ""' to clear the old entries, but I
always find this less than ideal, and hence would prefer a better
solution.

Thanks for any ideas to get this right and avoid future issues.

I'm more than happy to open an IBM ticket on this issue, but I feel
community feedback might get me further to start with.

Thanks

-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Metadata space usage NFS4 vs POSIX ACL

2019-05-07 Thread Peter Childs


On Sat, 2019-04-06 at 23:50 +0200, Michal Zacek wrote:

Hello,

we decided to convert NFS4 ACLs to POSIX (we need to share the same data
between SMB, NFS and GPFS clients), so I created a script to convert
NFS4 to POSIX ACLs. It is very simple: first I do "chmod -R 770 DIR" and
then "setfacl -R . DIR". I was surprised that the conversion to POSIX
ACLs has taken more than 2TB of metadata space. There are about one hundred
million files on the GPFS filesystem. Is this expected behaviour?

Thanks,

Michal


Example of NFS4 acl:

#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwx-:allow
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (X)DELETE_CHILD (-)CHOWN        (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

special:group@::allow
 (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (-)DELETE_CHILD (-)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:everyone@::allow
 (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (-)DELETE_CHILD (-)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

group:ag_cud_96_lab:rwx-:allow:FileInherit:DirInherit
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (X)DELETE_CHILD (-)CHOWN        (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

group:ag_cud_96_lab_ro:r-x-:allow:FileInherit:DirInherit
 (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (-)DELETE_CHILD (-)CHOWN        (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED


converted to posix acl:

# owner: root
# group: root
user::rwx
group::rwx
mask::rwx
other::---
default:user::rwx
default:group::rwx
default:mask::rwx
default:other::---
group:ag_cud_96_lab:rwx
default:group:ag_cud_96_lab:rwx
group:ag_cud_96_lab_ro:r-x
default:group:ag_cud_96_lab_ro:r-x


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



I've been trying to get my head round ACLs, with the plan to implement Cluster 
Export Services SMB rather than roll-your-own SMB.

I'm not sure that plan is going to work, Michal, although it might if you're not 
using the Cluster Export Services version of SMB.

Put simply, if you're running Cluster Export Services SMB you need to set the ACL 
semantics in Spectrum Scale to "nfs4". We currently have it set to "all" and it won't 
let you export the shares until you change it; currently I'm still testing, and have 
had to write a change to go the other way.

If you're using the Linux kernel NFS server, that uses POSIX ACLs, whereas CES NFS 
uses Ganesha, which uses NFS4 ACLs correctly.

It gets slightly more annoying as nfs4-setfacl does not work with Spectrum 
Scale and you have to use mmputacl, which has no recursive flag. I even found an 
IBM article from a few years ago saying the best way to set ACLs recursively is to 
use find and a temporary file. The other workaround they suggest is to update the 
ACLs from Windows or NFS to get them right.
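For reference, a rough sketch of the find-plus-temporary-file approach (paths
are placeholders, and the ACL file must match what mmgetacl produces on your
release):

# capture an NFSv4 ACL from a template directory, then push it onto a whole
# tree; mmputacl -i reads the ACL from a file, which is what makes this workable
mmgetacl -k nfs4 /gpfs/fs1/template_dir > /tmp/template.acl
find /gpfs/fs1/projectX -exec mmputacl -i /tmp/template.acl {} \;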

One thing I think may happen if you do as you've suggested is that you will 
break any ACLs under Samba badly. I think the other reason that command is 
taking up more space than expected is that you're giving files ACLs that never 
had them to start with.

I would love someone to say that I'm wrong, as changing our ACL setting is 
going to be a pain; while we don't make a lot of use of ACLs, we make enough 
use that having to set NFS4 ACLs all the time is going to be a pain.


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmlsquota output

2019-03-27 Thread Peter Childs
On Mon, 2019-03-25 at 09:52 +, Robert Horton wrote:
> I don't know the answer to your actual question, but have you thought
> about using the REST-API rather than parsing the command outputs? I
> can
> send over the Python stuff we're using if you mail me off list.

Thanks. We don't currently run the REST API, partly because I've never got
around to getting the monitoring overhead working, and working out
which extra packages we need to go round our 300 nodes and install. Our
cluster has been gradually upgraded over the years from 3.5 and we
don't routinely install all the new packages the GUI needs on every
node. It would be nice to see a list of which Spectrum Scale packages
are needed for the different added-value features in Scale.

I'm currently working on rewriting the CLI quota reporting program,
which was originally written in a combination of bash and awk. It's a
strictly Linux CLI utility for reporting quotas, and hence I'd prefer to
avoid the overhead of using a REST API.

With reference to the issue people reported a few weeks ago of not being able to
run "mmlsfileset" as a user, I've found a handy workaround
using "mmlsattr" instead, and yes, it does use the -Y flag all the time.

I'd like to share the code once it's gone through some internal code review.

With reference to the other post, I will I think raise a PMR for this
as it does not look like mmlsquota is working as documented.

Thanks

Peter Childs

> 
> Rob
> 
> On Mon, 2019-03-25 at 09:38 +, Peter Childs wrote:
> > Can someone tell me I'm not reading this wrong.
> > 
> > This is using Spectrum Scale 5.0.2-1
> > 
> > It looks like the output from mmlsquota is not what it says 
> > 
> > In the man page it says,
> > 
> > mmlsquota [-u User | -g Group] [-v | -q] [-e] [-C ClusterName]
> >   [-Y] [--block-size {BlockSize | auto}] [Device[:Fileset]
> > ...]
> > 
> > however
> > 
> > mmlsquota -u username fs:fileset
> > 
> > Return the output for every fileset, not just the "fileset" I've
> > asked
> > for, this is same output as 
> > 
> > mmlsquota -u username fs
> > 
> > Where I've not said the fileset.
> > 
> > I can work around this, but I'm just checking this is not actually
> > a
> > bug, that ought to be fixed.
> > 
> > Long story is that I'm working on rewriting our quota report util
> > that
> > used be a long bash/awk script into a more easy to understand
> > python
> > script, and I want to get the user quota info for just one
> > fileset. 
> > 
> > Thanks in advance.
> > 
> > 
-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmlsquota output

2019-03-25 Thread Peter Childs
Can someone tell me I'm not reading this wrong?

This is using Spectrum Scale 5.0.2-1.

It looks like the output from mmlsquota is not what the man page says it should be.

In the man page it says:

mmlsquota [-u User | -g Group] [-v | -q] [-e] [-C ClusterName]
  [-Y] [--block-size {BlockSize | auto}] [Device[:Fileset] ...]

however

mmlsquota -u username fs:fileset

returns the output for every fileset, not just the "fileset" I've asked
for; this is the same output as

mmlsquota -u username fs

where I've not specified the fileset.

I can work around this, but I'm just checking this is not actually a
bug that ought to be fixed.
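A hedged sketch of the workaround, reading the -Y HEADER line rather than
hard-coding column positions (the name of the fileset column is an assumption
and should be checked against the HEADER output on your release):

# report one user's quota for a single fileset by filtering the whole-device output
mmlsquota -Y -u someuser somefs | awk -F: '
    /HEADER/ { for (i = 1; i <= NF; i++) if (tolower($i) ~ /fileset/) c = i; next }
    c && $c == "somefileset"'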

The long story is that I'm working on rewriting our quota report utility, which
used to be a long bash/awk script, into an easier-to-understand Python
script, and I want to get the user quota info for just one fileset. 

Thanks in advance.


-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-11 Thread Peter Childs
We have a similar issue. I'm wondering if getting mmlsfileset to work as a user 
is a reasonable "request for enhancement"; I suspect it would need better 
wording.

We too have a rather complex script to report on quotas that I suspect does a 
similar job. It works by having all the filesets mounted in known locations, with 
names matching the mount point names. It then works out which ones are needed by 
looking at the group ownership. It's very slow and a little cumbersome, not 
least because it was written ages ago in a mix of bash, sed, awk and find.
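Roughly, the approach boils down to something like this sketch (the mount
location and the "needed" test are simplified placeholders, not our actual code):

# list the filesets relevant to the calling user: every fileset is linked under
# /data/<name>, and we keep the ones whose owning group the user belongs to
for d in /data/*/; do
    grp=$(stat -c %G "$d")
    id -nG "$USER" | tr ' ' '\n' | grep -qx "$grp" && echo "${d%/}"
done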



On Tue, 2019-01-08 at 22:12 +, Buterbaugh, Kevin L wrote:
Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___

gpfsug-discuss mailing list

gpfsug-discuss at spectrumscale.org

http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Can't take snapshots while re-striping

2018-10-18 Thread Peter Childs
Thanks Sven, that's one of the best answers I've seen, and probably closer to 
why we sometimes can't take snapshots under normal circumstances as well.

We're currently running the restripe with "-N " so it only runs on a few nodes 
and does not disturb the work of the cluster, which is why we hadn't noticed it 
slowing the storage down too much.

I've also tried to put some QoS settings on it. I always find QoS a 
little bit "trial and error", but 30,000 IOPS looks to be making the rebalance 
run at about 2/3 of the IOPS it was using with no QoS limit. Just out of interest, 
which version do I need to be running for "mmchqos -N" to work? I tried it to 
limit a set of nodes and it says it is not supported by my filesystem version; the 
manual does not seem to say.
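For reference, the sort of maintenance-class throttle being described looks
roughly like this (the device name and the 30,000 figure are just illustrative):

# cap maintenance I/O (restripe, rebalance, etc.) while leaving normal traffic
# unlimited, then watch the effect
mmchqos gpfs0 --enable pool=*,maintenance=30000IOPS,other=unlimited
mmlsqos gpfs0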

Even with a very, very small value for QoS on maintenance tasks, I still can't 
take snapshots, so as Sven says the buffers are getting dirty too quickly.

I have thought before that making snapshot taking more reliable would be nice; 
I'd not really thought it would be possible. I guess it's time to write another 
RFE.

Peter Childs
Research Storage
ITS Research Infrastructure
Queen Mary, University of London


From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of Sven Oehme 

Sent: Thursday, October 18, 2018 7:09:56 PM
To: gpfsug main discussion list; gpfsug-disc...@gpfsug.org
Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping

Peter,

If the 2 operations wouldn't be compatible you should have gotten a different 
message.
To understand what the message means one needs to understand how the snapshot 
code works.
When GPFS wants to take a snapshot it goes through multiple phases. It tries to 
flush all dirty data a first time, then flushes new data a second time and 
then tries to quiesce the filesystem. How it does this is quite complex, so let 
me try to explain.

How much parallelism is used for the 2 sync periods is controlled by the sync 
worker settings:

  sync1WorkerThreads 64
  sync2WorkerThreads 64
  syncBackgroundThreads 64
  syncWorkerThreads 64

and if my memory serves me correctly the sync1 number is for the first flush and 
sync2 for the second flush, while syncWorkerThreads are used explicitly by e.g. 
mmcrsnapshot to flush dirty data (I am sure somebody from IBM will correct me if 
I state something wrong; I have mixed them up before):

When data is flushed by background sync, it is triggered by the OS:

root@dgx-1-01:~# sysctl -a |grep -i vm.dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500.  <--- this is 5 seconds

as well as GPFS settings :

  syncInterval 5
  syncIntervalStrict 0

here both are set to 5 seconds, so every 5 seconds there is a periodic 
background flush happening .

Why explain all this? Because it's very easy for a thread that does buffered I/O 
to make stuff dirty; a single thread can do hundreds of thousands of I/Os into 
memory, so making stuff dirty is very easy. The threads described above need to 
clean all this stuff up, meaning stabilizing it onto media, and here is 
where it gets complicated. You are already running a rebalance, which puts a lot 
of work on the disks; on top of that I assume you don't have an idle filesystem, 
people make stuff dirty and the threads above compete to flush things, so it's a 
battle they can't really win unless you have very fast storage, or at least very 
fast and large caches in the storage, so the 64 threads in the example above can 
clean stuff faster than new data gets made dirty.

So your choices are:
1. Reduce the worker threads, so stuff gets less dirty.
2. Turn writes into stable writes: mmchconfig forceOSyncWrites=yes (you can 
use -I while running; see the sketch below). This will slow all write operations 
down on your system, as all writes are now done synchronously, but because of that 
they can't make anything dirty, so the flushers actually don't have to do any work.
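If anyone wants to try option 2 only around the snapshot window, a hedged
sketch (check the parameter and the -I semantics against your release first):

# make writes synchronous just for the snapshot, then switch back;
# -I applies the change immediately without making it permanent
mmchconfig forceOSyncWrites=yes -I
mmcrsnapshot home nightly
mmchconfig forceOSyncWrites=no -I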

While back at IBM I proposed changing the code to switch into O_SYNC mode 
dynamically between sync1 and sync2; this means that for a second or two all 
writes would be done synchronously, removing the possibility of making things 
dirty, so the quiesce doesn't get delayed, and as soon as the quiesce has happened 
the temporarily enforced stable flag is removed. But that proposal never got 
anywhere, as no customer pushed for it. Maybe that would be worth an RFE.


Btw. I described some of the parameters in more detail here --> 
http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf
Some of that is outdated by now, but probably still the best summary 
presentation out there.

Sven

On 10/18/18, 8:32 AM, "Peter Childs"  wrote:

We've just added 9 raid volumes to our main storage, (5 Raid6 arrays
for data and 4 Raid1 arrays for metadata)

We are now attempting to rebalance and our data around all the vol

[gpfsug-discuss] Can't take snapshots while re-striping

2018-10-18 Thread Peter Childs
We've just added 9 RAID volumes to our main storage (5 RAID6 arrays
for data and 4 RAID1 arrays for metadata).

We are now attempting to rebalance our data across all the volumes.

We started with the metadata, doing an "mmrestripefs -r", as we'd changed
the failure groups on our metadata disks and wanted to ensure we
had all our metadata on known-good SSD. No issues here; we could take
snapshots and I even tested it. (New SSD on a new failure group, and move
all old SSD to the same failure group.)

We're now doing a "mmrestripe -b" to rebalance the data accross all 21
Volumes however when we attempt to take a snapshot, as we do every
night at 11pm it fails with  

sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test
Flushing dirty data for snapshot :test...
Quiescing all file system operations.
Unable to quiesce all nodes; some processes are busy or holding
required resources.
mmcrsnapshot: Command failed. Examine previous error messages to
determine cause.

Are you meant to be able to take snapshots while re-striping or not? 

I know a rebalance of the data is probably unnecessary, but we'd like
to get the best possible speed out of the system, and we also kind of
like balance.

Thanks


-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] control which hosts become token manager

2018-07-24 Thread Peter Childs

What does mmlsmgr show?

Your config looks fine.

I suspect you need to do a

mmchmgr perf node-1.psi.ch
mmchmgr tiered node-2.psi.ch

It looks like the node was set up as a manager and was demoted to just quorum, 
but since it's still currently the manager it needs to be told to stop.

From experience it's also worth having different file system managers on 
different nodes, if at all possible.

But that's just a guess without seeing the output of mmlsmgr.


Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


 Billich Heinrich Rainer (PSI) wrote 

Hello,

I want to control which nodes can become token manager. In detail I run a 
virtual machine as quorum node. I don’t want this machine to become a token 
manager - it has no access to Infiniband and only very limited memory.

What I see is that 'mmdiag --tokenmgr' lists the machine as an active token 
manager. The machine has role 'quorum-client'. This doesn't seem sufficient to 
exclude it.

Is there any way to tell spectrum scale to exclude this single machine with 
role quorum-client?

I run 5.0.1-1.

Sorry if this is a faq, I did search  quite a bit before I wrote to the list.

Thank you,

Heiner Billich


[root@node-2 ~]# mmlscluster

GPFS cluster information

  GPFS cluster name: node.psi.ch
  GPFS cluster id:   5389874024582403895
  GPFS UID domain:   node.psi.ch
  Remote shell command:  /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:   CCR

Node  Daemon node name   IP address Admin node nameDesignation

   1   node-1.psi.ch   a.b.95.31  node-1.psi.ch   quorum-manager
   2   node-2.psi.ch   a.b.95.32  node-2.psi.ch   quorum-manager
   3   node-quorum.psi.ch  a.b.95.30  node-quorum.psi.ch  quorum
   <<<< VIRTUAL MACHINE >>>>>>>>>

[root@node-2 ~]# mmdiag --tokenmgr

=== mmdiag: tokenmgr ===
  Token Domain perf
There are 3 active token servers in this domain.
Server list:
  a.b.95.120
  a.b.95.121
  a.b.95.122    <<<< VIRTUAL MACHINE >>>>>>>>>
  Token Domain tiered
There are 3 active token servers in this domain.
Server list:
  a.b.95.120
  a.b.95.121
  a.b.95.122   <<<< VIRTUAL MACHINE >>>>>>>>>

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI
056 310 36 02
https://www.psi.ch

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Same file opened by many nodes / processes

2018-07-23 Thread Peter Childs
On Mon, 2018-07-23 at 22:13 +1200, José Filipe Higino wrote:
I think the network problems need to be cleared first. Then I would investigate 
further.

Buf if that is not a trivial path...
Are you able to understand from the mmfslog what happens when the tipping point 
occurs?

mmfslog is not a term I've come across before; if you mean 
/var/adm/ras/mmfs.log.latest then I'm already there, and there is not a lot there. 
In other words, no expulsions or errors, just a very slow filesystem. We've not 
seen any significantly long waiters either (mmdiag --waiters), so as far as I can 
see it's just behaving like a very, very busy filesystem.

We've already had IBM looking at the snaps due to the rather slow mmbackup 
process; all I've had back is to try increasing -a, i.e. the number of sort 
threads, which has sped it up to a certain extent. But once again I think we're 
looking at the results of the issue, not the cause.


In my view, when troubleshooting is not easy, the usual methods work/help to 
find the next step:
- Narrow the window of troubleshooting (by discarding "for now" events that did 
not happen within the same timeframe)
- Use "as precise" as possible, timebased events to read the reaction of the 
cluster (via log or others)  and make assumptions about other observed 
situations.
- If possible and when the problem is happening, run some traces, gpfs.snap and 
ask for support via PMR.

Also,

What is version of GPFS?

4.2.3-8

How many quorum nodes?

4 quorum nodes with tiebreaker disks; however, these are not the file system 
manager nodes: to fix a previous problem (our NSD servers not being powerful 
enough), our fs manager nodes are on dedicated hardware. We have two file system 
manager nodes (which do token management, quota management, etc.); they also run 
the mmbackup.

How many filesystems?

1, although we do have a second that is accessed via multi-cluster from our 
older GPFS setup (that's running 4.2.3-6 currently).

Is the management network the same as the daemon network?

Yes. the management network and the daemon network are the same network.

Thanks in advance

Peter Childs



On Mon, 23 Jul 2018 at 20:37, Peter Childs <p.chi...@qmul.ac.uk> wrote:
On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote:

Hi there,

Have you been able to create a test case (replicate the problem)? Can you tell 
us a bit more about the setup?

Not really. It feels like a perfect storm: any one of the tasks running on its 
own would be fine; it's the sheer load. Our mmpmon data says the storage has 
been flatlining when it occurs.

It's a reasonably standard (small) HPC cluster with a very mixed workload, 
hence while we can usually find "bad" jobs from the point of view of I/O, on this 
occasion we can see a few large array jobs all accessing the same file. The 
cluster runs fine until we get to a certain point and one more will tip the 
balance. We've been attempting to limit the problem by adding limits to the 
number of jobs in an array that can run at once. But that feels like fire 
fighting.


Are you using GPFS API over any administrative commands? Any problems with the 
network (being that Ethernet or IB)?

We're not using the GPFS API; we never got it working, which is a shame. I've 
never managed to figure out the setup, although it is on my to-do list.

Network-wise, we've just removed a great deal of noise from ARP requests by 
increasing the ARP cache size on the nodes. It's a mixed 1GBit/10GBit network 
currently; we're looking at removing all the 1GBit nodes within the 
next few months and adding some new faster kit. The storage is attached at 
40GBit, but it does not look to want to run much above 5GBit, I suspect due to 
Ethernet back-off caused by the mixed speeds.

While we do have some IB, we don't currently run our storage over it.

Thanks in advance

Peter Childs





Sorry if I am un-announced here for the first time. But I would like to help if 
I can.

Jose Higino,
from NIWA
New Zealand

Cheers

On Sun, 22 Jul 2018 at 23:26, Peter Childs <p.chi...@qmul.ac.uk> wrote:
Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week has been taking many hours 
(I saw it take 12 last Tuesday).

It's sped up again now, back to its normal hour, but the high-I/O jobs 
accessing the same file from many nodes also look to have come to an end for 
the time being.

I was trying to figure out how to control the bad I/O using mmchqos, to 
prioritise certain nodes over others, but had not worked out if that was 
possible yet.

We've only previously seen this problem when we had some bad disks in our 
storage, which we replaced. I've checked and I can't see that issue currently.

Thanks for the help.



Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

 Yaron Daniel wrote 

Hi

Do u run mmbackup on snapshot , which is read only ?


Regards


Re: [gpfsug-discuss] Same file opened by many nodes / processes

2018-07-23 Thread Peter Childs
On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote:

Hi there,

Have you been able to create a test case (replicate the problem)? Can you tell 
us a bit more about the setup?

Not really. It feels like a perfect storm: any one of the tasks running on its 
own would be fine; it's the sheer load. Our mmpmon data says the storage has 
been flatlining when it occurs.

It's a reasonably standard (small) HPC cluster with a very mixed workload, 
hence while we can usually find "bad" jobs from the point of view of I/O, on this 
occasion we can see a few large array jobs all accessing the same file. The 
cluster runs fine until we get to a certain point and one more will tip the 
balance. We've been attempting to limit the problem by adding limits to the 
number of jobs in an array that can run at once. But that feels like fire 
fighting.


Are you using GPFS API over any administrative commands? Any problems with the 
network (being that Ethernet or IB)?

We're not using the GPFS API; we never got it working, which is a shame. I've 
never managed to figure out the setup, although it is on my to-do list.

Network-wise, we've just removed a great deal of noise from ARP requests by 
increasing the ARP cache size on the nodes. It's a mixed 1GBit/10GBit network 
currently; we're looking at removing all the 1GBit nodes within the 
next few months and adding some new faster kit. The storage is attached at 
40GBit, but it does not look to want to run much above 5GBit, I suspect due to 
Ethernet back-off caused by the mixed speeds.

While we do have some IB, we don't currently run our storage over it.

Thanks in advance

Peter Childs





Sorry if I am un-announced here for the first time. But I would like to help if 
I can.

Jose Higino,
from NIWA
New Zealand

Cheers

On Sun, 22 Jul 2018 at 23:26, Peter Childs <p.chi...@qmul.ac.uk> wrote:
Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week has been taking many hours 
(I saw it take 12 last Tuesday).

It's sped up again now, back to its normal hour, but the high-I/O jobs 
accessing the same file from many nodes also look to have come to an end for 
the time being.

I was trying to figure out how to control the bad I/O using mmchqos, to 
prioritise certain nodes over others, but had not worked out if that was 
possible yet.

We've only previously seen this problem when we had some bad disks in our 
storage, which we replaced. I've checked and I can't see that issue currently.

Thanks for the help.



Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

 Yaron Daniel wrote 

Hi

Do u run mmbackup on snapshot , which is read only ?


Regards





Yaron Daniel
Storage Architect - IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales, Israel
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527

Phone:  +972-3-916-5672
Fax:    +972-3-916-5672
Mobile: +972-52-8395593
e-mail: y...@il.ibm.com
IBM Israel: http://www.ibm.com/il/he/



From: Peter Childs <p.chi...@qmul.ac.uk>
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Date: 07/10/2018 05:51 PM
Subject: [gpfsug-discuss] Same file opened by many nodes / processes
Sent by: gpfsug-discuss-boun...@spectrumscale.org




We have an situation where the same file is being read by around 5000
"jobs" this is an array job in uge with a tc set, so the file in
question is being opened by about 100 processes/jobs at the same time.

Its a ~200GB file so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read only access to the file, I don't know the specifics about
the job.

It looks like the metanode is moving around a fair amount (given what I
can see from mmfsadm saferdump file)

I'm wondering if we there is anything we can do to improve things or
that can be tuned within GPFS, I'm don't think we have an issue with
token management, but would increasing maxFileToCache on our token
manager node help say?

Is there anything else I should look at, to try and attempt to allow
GPFS to share this file better.

Thanks in advance

Peter Childs



Re: [gpfsug-discuss] Same file opened by many nodes / processes

2018-07-22 Thread Peter Childs
Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week has been taking many hours 
(I saw it take 12 last Tuesday).

It's sped up again now, back to its normal hour, but the high-I/O jobs 
accessing the same file from many nodes also look to have come to an end for 
the time being.

I was trying to figure out how to control the bad I/O using mmchqos, to 
prioritise certain nodes over others, but had not worked out if that was 
possible yet.

We've only previously seen this problem when we had some bad disks in our 
storage, which we replaced. I've checked and I can't see that issue currently.

Thanks for the help.



Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

 Yaron Daniel wrote 

Hi

Do u run mmbackup on snapshot , which is read only ?


Regards





Yaron Daniel
Storage Architect - IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales, Israel
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527

Phone:  +972-3-916-5672
Fax:    +972-3-916-5672
Mobile: +972-52-8395593
e-mail: y...@il.ibm.com
IBM Israel: http://www.ibm.com/il/he/



From:    Peter Childs 
To:"gpfsug-discuss@spectrumscale.org" 
Date:07/10/2018 05:51 PM
Subject:[gpfsug-discuss] Same file opened by many nodes / processes
Sent by:gpfsug-discuss-boun...@spectrumscale.org




We have an situation where the same file is being read by around 5000
"jobs" this is an array job in uge with a tc set, so the file in
question is being opened by about 100 processes/jobs at the same time.

Its a ~200GB file so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read only access to the file, I don't know the specifics about
the job.

It looks like the metanode is moving around a fair amount (given what I
can see from mmfsadm saferdump file)

I'm wondering if we there is anything we can do to improve things or
that can be tuned within GPFS, I'm don't think we have an issue with
token management, but would increasing maxFileToCache on our token
manager node help say?

Is there anything else I should look at, to try and attempt to allow
GPFS to share this file better.

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Same file opened by many nodes / processes

2018-07-10 Thread Peter Childs
The reason I think the metanode is moving around is that I'd done a limited amount 
of work trying to track it down using "mmfsadm saferdump file", and it had moved 
before I'd tracked down the correct metanode. But I might have been chasing ghosts, 
so it may be operating normally and be nothing to worry about.

The user reading the file only has read access to it, going by the file permissions.

mmbackup has only slowed down while this job has been running. As I say, the 
scan for what to back up usually takes 40-60 minutes, but is currently 
taking 3-4 hours with these jobs running. I've seen it take 3 days when our 
storage went bad (slow and failing disks), but that is usually a sign of a bad 
disk, and pulling the disk and rebuilding the RAID "fixed" that straight away. I 
can't see anything like that currently, however.

It might be that it's network congestion we're suffering from and nothing to do 
with token management, but as the mmpmon bytes-read data is running very high 
with this job and the load is spread over 50+ nodes, it's difficult to see one 
culprit. It's a mixed-speed Ethernet network, mainly 10GB connected, although the 
nodes in question are legacy with only 1GB connections (and 40GB to the back of 
the storage).

We're currently running 4.2.3-8

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

 IBM Spectrum Scale wrote 

What is in the dump that indicates the metanode is moving around?  Could you 
please provide an example of what you are seeing?

You noted that the access is all read only; is the file opened for read only or 
for read and write?

What makes you state that this particular file is interfering with the scan 
done by mmbackup?  Reading a file, no matter how large, should not significantly 
impact a policy scan.

What version of Spectrum Scale are you running and how large is your cluster?

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact  1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.



From:Peter Childs 
To:"gpfsug-discuss@spectrumscale.org" 
Date:07/10/2018 10:51 AM
Subject:[gpfsug-discuss] Same file opened by many nodes / processes
Sent by:gpfsug-discuss-boun...@spectrumscale.org




We have an situation where the same file is being read by around 5000
"jobs" this is an array job in uge with a tc set, so the file in
question is being opened by about 100 processes/jobs at the same time.

Its a ~200GB file so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read only access to the file, I don't know the specifics about
the job.

It looks like the metanode is moving around a fair amount (given what I
can see from mmfsadm saferdump file)

I'm wondering if we there is anything we can do to improve things or
that can be tuned within GPFS, I'm don't think we have an issue with
token management, but would increasing maxFileToCache on our token
manager node help say?

Is there anything else I should look at, to try and attempt to allow
GPFS to share this file better.

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Lroc on NVME

2018-06-12 Thread Peter Childs
We have a new computer which has an NVMe drive that appears as
/dev/nvme0, and we'd like to put LROC on /dev/nvme0n1p1, which is a
partition on the drive.

After doing the standard mmcrnsd to set it up, Spectrum Scale fails to
see it.

I've added a script, /var/mmfs/etc/nsddevices, so that GPFS scans them,
and it does work now. What "type" should I set the NVMe drives to?
I've currently set it to "generic".
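For reference, a rough sketch of what such an nsddevices user exit can look
like, modeled on the shipped sample in /usr/lpp/mmfs/samples/nsddevices.sample
(the device glob is an assumption for this particular box):

# /var/mmfs/etc/nsddevices -- dot-sourced by mmdevdiscover, so "return" is valid
# list NVMe partitions (names relative to /dev) with a device type of generic
for dev in /dev/nvme*n*p*; do
    [ -b "$dev" ] && echo "${dev#/dev/} generic"
done
# return 1 so GPFS still runs its built-in device discovery for everything else
return 1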

I want to do some tidying of my script, but has anyone else tried to
get LROC running on NVMe, and how well does it work?

We're running CentOs 7.4 and Spectrum Scale 4.2.3-8 currently.

Thanks in advance.



 
-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

2018-06-04 Thread Peter Childs

We have 2 Power 9 nodes.

The rest of our cluster is running CentOS 7.4 and Spectrum Scale 
4.2.3-8 (x86 based).

The Power 9 nodes are currently running Spectrum Scale 5.0.0-0, as 
we couldn't get the gplbin for 4.2.3 to compile, whereas Spectrum 
Scale 5 worked on Power 9 out of the box. They are running RHEL 7.5 but on an 
old kernel, I guess.

I'm not sure that 4.2.3 works on Power 9; we've asked the IBM Power 9 
outreach team but heard nothing back.

If we can get 4.2.3 running on the power 9 nodes it would put us in 
a more consistent setup.

Of course our current plan B is to upgrade everything to 5.0.1, but 
we can't do that as our storage appliance doesn't (officially) support Spectrum 
Scale 5 yet.

These are my experiences of what works and nothing whatsoever to do with what's 
supported, except I want to keep us as close to a supported setup as possible 
given what we've found to actually work. (now that's an interesting spin on a 
disclaimer)


Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

 Simon Thompson (IT Research Support) wrote 

Thanks Felipe,

Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 
when the x86 7.5 release is also made?

Simon
* Insert standard IBM disclaimer about the meaning of intent etc etc


From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of k...@us.ibm.com 
[k...@us.ibm.com]
Sent: 04 June 2018 16:47
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

Simon,

The support statement for Power9 / RHEL 7.4 has not yet been included in the 
FAQ, but I understand that a FAQ update is under way:

4.2.3.8 for the 4.2.3 release

5.0.0.0 for the 5.0.0 release

Kernel level tested with: 4.11.0-44.6.1.el7a

Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




From: "Simon Thompson (IT Research Support)" 
To: gpfsug main discussion list 
Date: 06/04/2018 07:21 AM
Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862
Sent by: gpfsug-discuss-boun...@spectrumscale.org




So … I have another question on support.

We’ve just ordered some Power 9 nodes, now my understanding is that with 7.4, 
they require the -ALT kernel 
(https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm)
 which is 4.x based. I don’t see any reference in the Spectrum Scale FAQ to the 
ALT kernels.

So what Scale code is supported for us to run on the Power9s?

Thanks

Simon

From:  on behalf of "k...@us.ibm.com" 

Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Friday, 25 May 2018 at 14:24
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

All,

Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships 
with RHEL 7.5) as a result of applying kernel security patches may open a PMR 
to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be 
provided once the internal tests on RHEL 7.5 have been completed, likely a few 
days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid 
June).

Regards,

Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] How to clear explicitly set quotas

2018-05-22 Thread Peter Childs

It's a little difficult that the various quota commands for Spectrum Scale all 
differ in their syntax and can only be used by the "right" people.

As far as I can see mmedquota is the only quota command that uses this "full 
colon" syntax, and it would be better if its syntax matched that of mmsetquota 
and mmlsquota, or if the reset-to-default-quota function were added to mmsetquota 
and mmedquota were left for editing quotas visually in an editor.

Regards

Peter Childs




On Tue, 2018-05-22 at 16:01 +0800, IBM Spectrum Scale wrote:

Hi Kuei-Yu,

Should we update the document as the requested below ?

Thanks.

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.


From: Bryan Banister <bbanis...@jumptrading.com>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 05/22/2018 04:52 AM
Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas
Sent by: gpfsug-discuss-boun...@spectrumscale.org



Quick update. Thanks to a colleague of mine, John Valdes, there is a way to 
specify the file system + fileset + user with this form:

mmedquota -d -u ::

It’s just not documented in the man page or shown in the examples. Docs need to 
be updated!
-Bryan

From: gpfsug-discuss-boun...@spectrumscale.org 
[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Bryan Banister
Sent: Tuesday, May 15, 2018 11:00 AM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas

Note: External Email

Unfortunately it doesn’t look like there is a way to target a specific quota. 
So for cluster with many file systems and/or many filesets in each file system, 
clearing the quota entries affect all quotas in all file systems and all 
filesets. This means that you have to clear them all and then reapply the 
explicit quotas that you need to keep.

# mmedquota -h
Usage: mmedquota -d {-u User ... | -g Group ... | -j Device:Fileset ... }

Maybe RFE time, or am I missing some other existing solution?
-Bryan

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Bryan Banister
Sent: Tuesday, May 15, 2018 10:36 AM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas

Note: External Email

That was it! Thanks!

# mmrepquota -v fpi_test02:root --block-size G
*** Report for USR GRP quotas on fpi_test02
Block Limits | File Limits
Name fileset type GB quota limit in_doubt grace | files quota limit in_doubt 
grace entryType
root root USR 243 0 0 0 none | 248 0 0 0 none default on
bbanister root USR 84 0 0 0 none | 21 0 0 0 none e
root root GRP 243 0 0 0 none | 248 0 0 0 none default on
# mmedquota -d -u bbanister
#
# mmrepquota -v fpi_test02:root --block-size G
*** Report for USR GRP quotas on fpi_test02
Block Limits | File Limits
Name fileset type GB quota limit in_doubt grace | files quota limit in_doubt 
grace entryType
root root USR 243 0 0 0 none | 248 0 0 0 none default on
bbanister root USR 84 0 0 0 none | 21 0 0 0 none d_fset
root root GRP 243 0 0 0 none | 248 0 0 0 none default on

Note that " Try disabling and re-enabling default quotas with the -d option for 
that fileset " didn't fix this issue.

Cheers,
-Bryan

-Original Message-
From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter Serocka
Sent: Monday, May 14, 2018 4:52 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas

Note: External Email
-

Re: [gpfsug-discuss] Gpfs Memory Usage Keeps going up and we don't know why.

2017-07-24 Thread Peter Childs

top

but ps gives the same value.

[root@dn29 ~]# ps auww -q 
USER   PID %CPU %MEM    VSZ     RSS TTY  STAT START   TIME COMMAND
root        2.7 22.3 10537600 5472580 ?   S


I've had a look at mmfsadm dump malloc and it looks to agree with the output 
from mmdiag --memory. and does not seam to account for the excessive memory 
usage.

The new machines do have idleSocketTimout set to 0 from what your saying it 
could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.




[root@dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes


Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
   128 bytes in use
   17500049370 hard limit on memory usage
   1048576 bytes committed to regions
 1 number of regions
   555 allocations
   555 frees
 0 allocation failures


Statistics for MemoryPool id 2 ("Shared Segment")
  42179592 bytes in use
   17500049370 hard limit on memory usage
  56623104 bytes committed to regions
 9 number of regions
100027 allocations
 79624 frees
 0 allocation failures


Statistics for MemoryPool id 3 ("Token Manager")
   2099520 bytes in use
   17500049370 hard limit on memory usage
  16778240 bytes committed to regions
 1 number of regions
 4 allocations
 0 frees
 0 allocation failures


On Mon, 2017-07-24 at 13:11 +, Jim Doherty wrote:
There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared
memory segments. To see the memory utilization of the shared memory segments,
run the command "mmfsadm dump malloc". The statistics for memory pool id 2
are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use
memory pool id 3 to track the MFTC/MSC objects.
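
A quick, hedged way to look at those pools and the related settings on a node (output formats vary a little by release; the attribute names below are the standard ones):

mmdiag --memory                  # heap, shared segment and token manager pools
mmlsconfig pagepool
mmlsconfig maxFilesToCache
mmlsconfig maxStatCache
mmfsadm dump malloc | head -40   # raw allocator statistics; pool 2 holds the MFTC/MSC objects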

You might want to upgrade to later PTF  as there was a PTF to fix a memory leak 
that occurred in tscomm associated with network connection drops.


On Monday, July 24, 2017 5:29 AM, Peter Childs <p.chi...@qmul.ac.uk> wrote:


We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run
fine using up about 1.5G of memory and is consistent (GPFS pagepool is
set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep
increasing in there memory usage, starting at about 1.1G and are find
for a few days however after a while they grow to 4.2G which when the
node need to run real work, means the work can't be done.

I'm losing track of what maybe different other than CCR, and I'm trying
to find some more ideas of where to look.

I'm checked all the standard things like pagepool and maxFilesToCache
(set to the default of 4000), workerThreads is set to 128 on the new
gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the
community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Gpfs Memory Usage Keeps going up and we don't know why.

2017-07-24 Thread Peter Childs
I've had a look at mmfsadm dump malloc and it looks to agree with the output 
from mmdiag --memory. and does not seam to account for the excessive memory 
usage.

The new machines do have idleSocketTimout set to 0 from what your saying it 
could be related to keeping that many connections between nodes working.

Thanks in advance

Peter.




[root@dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes


Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
   128 bytes in use
   17500049370 hard limit on memory usage
   1048576 bytes committed to regions
 1 number of regions
   555 allocations
   555 frees
 0 allocation failures


Statistics for MemoryPool id 2 ("Shared Segment")
  42179592 bytes in use
   17500049370 hard limit on memory usage
  56623104 bytes committed to regions
 9 number of regions
100027 allocations
 79624 frees
 0 allocation failures


Statistics for MemoryPool id 3 ("Token Manager")
   2099520 bytes in use
   17500049370 hard limit on memory usage
  16778240 bytes committed to regions
 1 number of regions
 4 allocations
 0 frees
 0 allocation failures


On Mon, 2017-07-24 at 13:11 +, Jim Doherty wrote:
There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared
memory segments. To see the memory utilization of the shared memory segments,
run the command "mmfsadm dump malloc". The statistics for memory pool id 2
are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use
memory pool id 3 to track the MFTC/MSC objects.

You might want to upgrade to later PTF  as there was a PTF to fix a memory leak 
that occurred in tscomm associated with network connection drops.


On Monday, July 24, 2017 5:29 AM, Peter Childs <p.chi...@qmul.ac.uk> wrote:


We have two GPFS clusters.

One is fairly old and running 4.2.1-2 and non CCR and the nodes run
fine using up about 1.5G of memory and is consistent (GPFS pagepool is
set to 1G, so that looks about right.)

The other one is "newer" running 4.2.1-3 with CCR and the nodes keep
increasing in there memory usage, starting at about 1.1G and are find
for a few days however after a while they grow to 4.2G which when the
node need to run real work, means the work can't be done.

I'm losing track of what maybe different other than CCR, and I'm trying
to find some more ideas of where to look.

I'm checked all the standard things like pagepool and maxFilesToCache
(set to the default of 4000), workerThreads is set to 128 on the new
gpfs cluster (against default 48 on the old)

I'm not sure what else to look at on this one hence why I'm asking the
community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--

Peter Childs
ITS Research Storage
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors

2017-05-18 Thread Peter Childs

As I understand it,

mmbackup calls mmapplypolicy so this stands for mmapplypolicy too.

mmapplypolicy scans the metadata inodes (file) as requested depending on 
the query supplied.


You can ask mmapplypolicy to scan a fileset, inode space or filesystem.

If scanning a fileset, it scans the inode space that fileset depends
on, for all files in that fileset. Smaller inode spaces mean less to
scan, hence it's faster to use independent filesets: you get a list of
what to process quicker.


Another advantage is that once an inode is allocated you can't
deallocate it; however, you can delete independent filesets and hence
deallocate the inodes. So if you have a task which has lots and lots of
small files which are only needed for a short period of time, you can
create a new independent fileset for them, work on them and then blow
them away afterwards.
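
As a rough sketch of what that looks like in practice (file system, fileset and policy file names here are made up, and the inode limit is only an example):

# independent fileset with its own inode space
mmcrfileset gpfs01 scratchwork --inode-space new --inode-limit 1000000
mmlinkfileset gpfs01 scratchwork -J /gpfs01/scratchwork

# scan (or back up) just that inode space
mmapplypolicy /gpfs01/scratchwork -P list.pol --scope inodespace -I defer -f /tmp/scratchwork

# when the short-lived data is finished with, remove the whole fileset, which frees its inodes
mmunlinkfileset gpfs01 scratchwork
mmdelfileset gpfs01 scratchwork -f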


I like independent filesets; I'm guessing the only reason dependent
filesets are used by default is history.



Peter


On 18/05/17 14:58, Jaime Pinto wrote:

Thanks for the explanation Mark and Luis,

It begs the question: why are filesets created as dependent by
default, if the adverse repercussions can be so great afterward? Even
in my case, where I manage GPFS and TSM deployments (and I have been
around for a while), I didn't realize at all that not adding an extra
option at fileset creation time would cause me huge trouble with
scaling later on as I try to use mmbackup.


When you have different groups to manage file systems and backups that 
don't read each-other's manuals ahead of time then we have a really 
bad recipe.


I'm looking forward to your explanation as to why mmbackup cares one 
way or another.


I'm also hoping for a hint as to how to configure backup exclusion 
rules on the TSM side to exclude fileset traversing on the GPFS side. 
Is mmbackup smart enough (actually smarter than TSM client itself) to 
read the exclusion rules on the TSM configuration and apply them 
before traversing?


Thanks
Jaime

Quoting "Marc A Kaplan" :

When I see "independent fileset" (in Spectrum/GPFS/Scale)  I always 
think

and try to read that as "inode space".

An "independent fileset" has all the attributes of an (older-fashioned)
dependent fileset PLUS all of its files are represented by inodes 
that are
in a separable range of inode numbers - this allows GPFS to 
efficiently do

snapshots of just that inode-space (uh... independent fileset)...

And... of course the files of dependent filesets must also be 
represented

by inodes -- those inode numbers are within the inode-space of whatever
the containing independent fileset is... as was chosen when you created
the fileset. If you didn't say otherwise, inodes come from the
default "root" fileset

Clear as your bath-water, no?

So why does mmbackup care one way or another ???   Stay tuned

BTW - if you look at the bits of the inode numbers carefully --- you may
not immediately discern what I mean by a "separable range of inode
numbers" -- (very technical hint) you may need to permute the bit order
before you discern a simple pattern...



From:   "Luis Bolinches" 
To: gpfsug-discuss@spectrumscale.org
Cc: gpfsug-discuss@spectrumscale.org
Date:   05/18/2017 02:10 AM
Subject:Re: [gpfsug-discuss] mmbackup with fileset : scope 
errors

Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi

There is no direct way to convert a fileset that is dependent to
independent or vice versa.

I would suggest taking a look at chapter 5 of the 2014 redbook, which has lots of
definitions about GPFS ILM including filesets:
http://www.redbooks.ibm.com/abstracts/sg248254.html?Open It is not the only
place where this is explained, but I honestly believe it is a good single starting
point. It also needs an update, as it does not have anything on CES nor ESS,
so anyone on this list, feel free to give feedback on that page; people
with funding decisions listen there.

So you are limited to either migrate the data from that fileset to a new
independent fileset (multiple ways to do that) or use the TSM client
config.

- Original message -
From: "Jaime Pinto" 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug main discussion list" ,
"Jaime Pinto" 
Cc:
Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors
Date: Thu, May 18, 2017 4:43 AM

There is hope. See reference link below:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm 




The issue has to do with dependent vs. independent filesets, something
I didn't even realize existed until now. Our filesets are dependent
(for no particular reason), so I have to find a way to turn them into
independent.

The proper option syntax is "--scope inodespace", and the error
message actually flagged that out, however I didn't 

Re: [gpfsug-discuss] AFM Prefetch Missing Files

2017-05-18 Thread Peter Childs
Further investigation and checking says 4.2.1 mmafmctl prefetch is missing
empty directories (not files, as said previously), as noted by the update
in 4.2.2.3. However, I've found it is also missing symlinks, both dangling
(pointing to files that don't exist) and not.


I can't see any actual data loss which is good.

I'm looking to work around this with

find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) 
-printf "%p -> %l\n"


My initial testing says this should work. (/data2/$fileset is the 
destination "cache" fileset)


It looks like this should catch everything, but I'm wondering if anyone
else has noticed any other things mmafmctl prefetch misses.


Thanks in advance

Peter Childs

On 16/05/17 10:40, Peter Childs wrote:
I know it was said at the User group meeting last week that older 
versions of afm prefetch miss empty files and that this is now fixed 
in 4.2.2.3.


We are in the middle of trying to migrate our files to a new 
filesystem, and since that was said I'm double checking for any 
mistakes etc.


Anyway it looks like AFM prefetch also misses symlinks pointing to
files that don't exist, i.e. "dangling symlinks", or ones that point
to files that either have not been created yet or have subsequently
been deleted, or when files have been decompressed and a symlink
extracted that points somewhere that is never going to exist.


I'm still checking this, and as yet it does not look like it's a data
loss issue, but it could still cause things to not quite work once the
file migration is complete.


Does anyone else know of any other types of files that might be missed 
and I need to be aware of?


We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" 
using a gpfs policy to collect the list, we are using GPFS 
Multi-cluster to connect the two filesystems not NFS


Thanks in advance


Peter Childs


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] AFM Prefetch Missing Files

2017-05-16 Thread Peter Childs
I know it was said at the User group meeting last week that older 
versions of afm prefetch miss empty files and that this is now fixed in 
4.2.2.3.


We are in the middle of trying to migrate our files to a new filesystem, 
and since that was said I'm double checking for any mistakes etc.


Anyway it looks like AFM prefetch also misses symlinks pointing to files
that don't exist, i.e. "dangling symlinks", or ones that point to
files that either have not been created yet or have subsequently been
deleted, or when files have been decompressed and a symlink extracted
that points somewhere that is never going to exist.


I'm still checking this, and as yet it does not look like it's a data
loss issue, but it could still cause things to not quite work once the
file migration is complete.


Does anyone else know of any other types of files that might be missed 
and I need to be aware of?


We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" 
using a gpfs policy to collect the list, we are using GPFS Multi-cluster 
to connect the two filesystems not NFS


Thanks in advance


Peter Childs


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

2017-04-20 Thread Peter Childs
Simon,

We've managed to resolve this issue by switching off quotas and switching them
back on again and rebuilding the quota file.
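
For anyone hitting the same thing, the rough sequence we mean by that (the file system name below is just an example, and we did it in a quiet maintenance window) was along the lines of:

mmchfs gpfs01 -Q no
mmchfs gpfs01 -Q yes
mmcheckquota gpfs01    # re-count usage and rebuild the quota files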

Can I check if you run quotas on your cluster?

See you 2 weeks in Manchester

Thanks in advance.

Peter Childs
Research Storage Expert
ITS Research Infrastructure
Queen Mary, University of London
Phone: 020 7882 8393


From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Simon Thompson (IT 
Research Support) <s.j.thomp...@bham.ac.uk>
Sent: Tuesday, April 11, 2017 4:55:35 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

We actually saw this for a while on one of our clusters which was new. But
by the time I'd got round to looking deeper, it had gone, maybe we were
using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2,
so might be worth trying to bump the version and see if it goes away.

We saw it on the NSD servers directly as well, so not some client trying
to talk to it, so maybe there was some buggy code?

Simon

On 11/04/2017, 16:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Bryan Banister" <gpfsug-discuss-boun...@spectrumscale.org on behalf of
bbanis...@jumptrading.com> wrote:

>There are so many things to look at and many tools for doing so (iostat,
>htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc).  I would
>recommend a review of the presentation that Yuri gave at the most recent
>GPFS User Group:
>https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs
>
>Cheers,
>-Bryan
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter
>Childs
>Sent: Tuesday, April 11, 2017 3:58 AM
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories
>
>This is a curious issue which I'm trying to get to the bottom of.
>
>We currently have two Spectrum Scale file systems, both are running GPFS
>4.2.1-1 some of the servers have been upgraded to 4.2.1-2.
>
>The older one, which was upgraded from GPFS 3.5, works fine; creating a
>directory is always fast and no issue.
>
>The new one, which has nice new SSD for metadata and hence should be
>faster. can take up to 30 seconds to create a directory but usually takes
>less than a second, The longer directory creates usually happen on busy
>nodes that have not used the new storage in a while. (Its new so we've
>not moved much of the data over yet) But it can also happen randomly
>anywhere, including from the NSD servers them selves. (times of 3-4
>seconds from the NSD servers have been seen, on a single directory create)
>
>We've been pointed at the network and suggested we check all network
>settings, and its been suggested to build an admin network, but I'm not
>sure I entirely understand why and how this would help. Its a mixed
>1G/10G network with the NSD servers connected at 40G with an MTU of 9000.
>
>However as I say, the older filesystem is fine, and it does not matter if
>the nodes are connected to the old GPFS cluster or the new one, (although
>the delay is worst on the old gpfs cluster), So I'm really playing spot
>the difference. and the network is not really an obvious difference.
>
>Its been suggested to look at a trace when it occurs but as its difficult
>to recreate collecting one is difficult.
>
>Any ideas would be most helpful.
>
>Thanks
>
>
>
>Peter Childs
>ITS Research Infrastructure
>Queen Mary, University of London
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>Note: This email is for the confidential use of the named addressee(s)
>only and may contain proprietary, confidential or privileged information.
>If you are not the intended recipient, you are hereby notified that any
>review, dissemination or copying of this email is strictly prohibited,
>and to please notify the sender immediately and destroy this email and
>any attachments. Email transmission cannot be guaranteed to be secure or
>error-free. The Company, therefore, does not make any guarantees as to
>the completeness or accuracy of this email or any attachments. This email
>is for informational purposes only and does not constitute a
>recommendation, offer, request or solicitation of any kind to buy, sell,
>subscribe, redeem or perform any type of transaction of a financial
>product.
>___
>gpfsug-discuss m

Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

2017-04-13 Thread Peter Childs

After a load more debugging, and switching off the quotas, the issue looks to
be quota related, in that the issue has gone away since I switched quotas off.

I will need to switch them back on, but at least we know the issue is not the
network and is likely to be fixed by upgrading.


Peter Childs
ITS Research Infrastructure
Queen Mary, University of London



From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Peter Childs 
<p.chi...@qmul.ac.uk>
Sent: Tuesday, April 11, 2017 8:35:40 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

Can you remember what version you were running? Don't worry if you can't 
remember.

It looks like IBM may have withdrawn 4.2.1 from Fix Central and wish
to forget its existence. Never a good sign. 4.2.0,
4.2.2, 4.2.3 and even 3.5 are still there, so maybe upgrading is worth a
try.

I've looked at all the standard troubleshooting guides and got nowhere, hence
why I asked. But another set of slides always helps.

Thank you for the help, still head scratching, which only makes the issue
more random.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


 Simon Thompson (IT Research Support) wrote 

We actually saw this for a while on one of our clusters which was new. But
by the time I'd got round to looking deeper, it had gone, maybe we were
using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2,
so might be worth trying to bump the version and see if it goes away.

We saw it on the NSD servers directly as well, so not some client trying
to talk to it, so maybe there was some buggy code?

Simon

On 11/04/2017, 16:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Bryan Banister" <gpfsug-discuss-boun...@spectrumscale.org on behalf of
bbanis...@jumptrading.com> wrote:

>There are so many things to look at and many tools for doing so (iostat,
>htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc).  I would
>recommend a review of the presentation that Yuri gave at the most recent
>GPFS User Group:
>https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs
>
>Cheers,
>-Bryan
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter
>Childs
>Sent: Tuesday, April 11, 2017 3:58 AM
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories
>
>This is a curious issue which I'm trying to get to the bottom of.
>
>We currently have two Spectrum Scale file systems, both are running GPFS
>4.2.1-1 some of the servers have been upgraded to 4.2.1-2.
>
>The older one, which was upgraded from GPFS 3.5, works fine; creating a
>directory is always fast and no issue.
>
>The new one, which has nice new SSD for metadata and hence should be
>faster. can take up to 30 seconds to create a directory but usually takes
>less than a second, The longer directory creates usually happen on busy
>nodes that have not used the new storage in a while. (Its new so we've
>not moved much of the data over yet) But it can also happen randomly
>anywhere, including from the NSD servers them selves. (times of 3-4
>seconds from the NSD servers have been seen, on a single directory create)
>
>We've been pointed at the network and suggested we check all network
>settings, and its been suggested to build an admin network, but I'm not
>sure I entirely understand why and how this would help. Its a mixed
>1G/10G network with the NSD servers connected at 40G with an MTU of 9000.
>
>However as I say, the older filesystem is fine, and it does not matter if
>the nodes are connected to the old GPFS cluster or the new one, (although
>the delay is worst on the old gpfs cluster), So I'm really playing spot
>the difference. and the network is not really an obvious difference.
>
>Its been suggested to look at a trace when it occurs but as its difficult
>to recreate collecting one is difficult.
>
>Any ideas would be most helpful.
>
>Thanks
>
>
>
>Peter Childs
>ITS Research Infrastructure
>Queen Mary, University of London
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>Note: This email is for the confidential use of the named addressee(s)
>only and may contain proprietary, confidential or privileged information.
>If you are not the intended recipient, you are hereby notified that any
>review, dissemination or copying of this email is strictly prohibited,
>

Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

2017-04-11 Thread Peter Childs

Can you remember what version you were running? Don't worry if you can't 
remember.

It looks like IBM may have withdrawn 4.2.1 from Fix Central and wish
to forget its existence. Never a good sign. 4.2.0,
4.2.2, 4.2.3 and even 3.5 are still there, so maybe upgrading is worth a
try.

I've looked at all the standard troubleshooting guides and got nowhere, hence
why I asked. But another set of slides always helps.

Thank you for the help, still head scratching, which only makes the issue
more random.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


 Simon Thompson (IT Research Support) wrote 

We actually saw this for a while on one of our clusters which was new. But
by the time I'd got round to looking deeper, it had gone, maybe we were
using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2,
so might be worth trying to bump the version and see if it goes away.

We saw it on the NSD servers directly as well, so not some client trying
to talk to it, so maybe there was some buggy code?

Simon

On 11/04/2017, 16:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Bryan Banister" <gpfsug-discuss-boun...@spectrumscale.org on behalf of
bbanis...@jumptrading.com> wrote:

>There are so many things to look at and many tools for doing so (iostat,
>htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc).  I would
>recommend a review of the presentation that Yuri gave at the most recent
>GPFS User Group:
>https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs
>
>Cheers,
>-Bryan
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter
>Childs
>Sent: Tuesday, April 11, 2017 3:58 AM
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories
>
>This is a curious issue which I'm trying to get to the bottom of.
>
>We currently have two Spectrum Scale file systems, both are running GPFS
>4.2.1-1 some of the servers have been upgraded to 4.2.1-2.
>
>The older one, which was upgraded from GPFS 3.5, works fine; creating a
>directory is always fast and no issue.
>
>The new one, which has nice new SSD for metadata and hence should be
>faster. can take up to 30 seconds to create a directory but usually takes
>less than a second, The longer directory creates usually happen on busy
>nodes that have not used the new storage in a while. (Its new so we've
>not moved much of the data over yet) But it can also happen randomly
>anywhere, including from the NSD servers them selves. (times of 3-4
>seconds from the NSD servers have been seen, on a single directory create)
>
>We've been pointed at the network and suggested we check all network
>settings, and its been suggested to build an admin network, but I'm not
>sure I entirely understand why and how this would help. Its a mixed
>1G/10G network with the NSD servers connected at 40G with an MTU of 9000.
>
>However as I say, the older filesystem is fine, and it does not matter if
>the nodes are connected to the old GPFS cluster or the new one, (although
>the delay is worst on the old gpfs cluster), So I'm really playing spot
>the difference. and the network is not really an obvious difference.
>
>Its been suggested to look at a trace when it occurs but as its difficult
>to recreate collecting one is difficult.
>
>Any ideas would be most helpful.
>
>Thanks
>
>
>
>Peter Childs
>ITS Research Infrastructure
>Queen Mary, University of London
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>Note: This email is for the confidential use of the named addressee(s)
>only and may contain proprietary, confidential or privileged information.
>If you are not the intended recipient, you are hereby notified that any
>review, dissemination or copying of this email is strictly prohibited,
>and to please notify the sender immediately and destroy this email and
>any attachments. Email transmission cannot be guaranteed to be secure or
>error-free. The Company, therefore, does not make any guarantees as to
>the completeness or accuracy of this email or any attachments. This email
>is for informational purposes only and does not constitute a
>recommendation, offer, request or solicitation of any kind to buy, sell,
>subscribe, redeem or perform any type of transaction of a financial
>product.
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discu

Re: [gpfsug-discuss] mmbackup logging issue

2017-03-03 Thread Peter Childs
That's basically what we did. They are only environment variables, so if you're
not using bash to call mmbackup you will need to change the lines accordingly.

What they do is in the manual; the issue is that the default changed between
versions.
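
In other words, something along these lines at the top of the wrapper, assuming it is a bash script; the mmbackup command line below is only a placeholder for whatever the script already runs:

#!/bin/bash
# restore the old per-interval progress messages
export MMBACKUP_PROGRESS_CONTENT=5
export MMBACKUP_PROGRESS_INTERVAL=300

/usr/lpp/mmfs/bin/mmbackup gpfs -t incremental -L 1 --tsm-servers tsmserver1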

Peter Childs
ITS Research Infrastructure
Queen Mary, University of London


From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sobey, Richard A 
<r.so...@imperial.ac.uk>
Sent: Friday, March 3, 2017 9:20:24 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] mmbackup logging issue

HI all

We have the same problem (less of a problem, more lack of visibilitiy).

Can I just add those lines to the top of our mmbackup.sh script?

-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org 
[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Ashish Thandavan
Sent: 02 March 2017 16:50
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] mmbackup logging issue

Dear Peter,


On 02/03/17 16:34, Peter Childs wrote:
> We had that issue.
>
> we had to
>
> export MMBACKUP_PROGRESS_CONTENT=5
> export MMBACKUP_PROGRESS_INTERVAL=300
>
> before we run it to get it back.
>
> Let's just say IBM changed the behaviour. We ended up opening a PMR to
> get that answer ;) We also set -L 1
>
> you can change how often the messages are displayed by changing
> MMBACKUP_PROGRESS_INTERVAL, which is flexible, but the default is different ;)
>

I'll set those variables before kicking off the next mmbackup and hope that 
fixes it.

Thank you!!

Regards,
Ash

--
-
Ashish Thandavan

UNIX Support Computing Officer
Department of Computer Science
University of Oxford
Wolfson Building
Parks Road
Oxford OX1 3QD

Phone: 01865 610733
Email: ashish.thanda...@cs.ox.ac.uk

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmbackup logging issue

2017-03-02 Thread Peter Childs
We had that issue.

we had to

export MMBACKUP_PROGRESS_CONTENT=5
export MMBACKUP_PROGRESS_INTERVAL=300

before we run it to get it back.

Let's just say IBM changed the behaviour. We ended up opening a PMR to get that
answer ;) We also set -L 1

you can change how often the messages are displayed by changing
MMBACKUP_PROGRESS_INTERVAL, which is flexible, but the default is different ;)

Peter Childs
ITS Research Infrastructure
Queen Mary, University of London



From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Ashish Thandavan 
<ashish.thanda...@cs.ox.ac.uk>
Sent: Tuesday, February 28, 2017 4:10:44 PM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] mmbackup logging issue

Dear all,

We have a small GPFS cluster and a separate server running TSM and one
of the three NSD servers backs up our GPFS filesystem to the TSM server
using mmbackup. After a recent upgrade from v3.5 to 4.1.1, we've noticed
that mmbackup no longer logs stuff like it used to :

...
Thu Jan 19 05:45:41 2017 mmbackup:Backing up files: 0 backed up, 870532
expired, 2 failed.
Thu Jan 19 06:15:41 2017 mmbackup:Backing up files: 0 backed up, 870532
expired, 3 failed.
Thu Jan 19 06:45:41 2017 mmbackup:Backing up files: 0 backed up, 870532
expired, 3 failed.
...


instead of

...
Sat Dec  3 12:01:00 2016 mmbackup:Backing up files: 105030 backed up,
635456 expired, 30 failed.
Sat Dec  3 12:31:00 2016 mmbackup:Backing up files: 205934 backed up,
635456 expired, 57 failed.
Sat Dec  3 13:01:00 2016 mmbackup:Backing up files: 321702 backed up,
635456 expired, 169 failed.
...

like it used to pre-upgrade.

I am therefore unable to see how far along it has got, and indeed if it
completed successfully, as this is what it logs at the end of a job:

...
Tue Jan 17 18:07:31 2017 mmbackup:Completed policy backup run with 0
policy errors, 10012 files failed, 0 severe errors, returning rc=9.
Tue Jan 17 18:07:31 2017 mmbackup:Policy for backup returned 9 Highest
TSM error 12
mmbackup: TSM Summary Information:
 Total number of objects inspected: 20617273
 Total number of objects backed up: 0
 Total number of objects updated: 0
 Total number of objects rebound: 0
 Total number of objects deleted: 0
 Total number of objects expired: 1
 Total number of objects failed: 10012
 Total number of objects encrypted: 0
 Total number of bytes inspected: 3821624716861
 Total number of bytes transferred: 3712040943672
Tue Jan 17 18:07:31 2017 mmbackup:Audit files /cs/mmbackup.audit.gpfs*
contain 0 failed paths but there were 10012 failures.
Cannot reconcile shadow database.
Unable to compensate for all TSM errors in new shadow database.
   Preserving previous shadow database.
   Run next mmbackup with -q to synchronize shadow database.  exit 12

If it helps, the mmbackup job is kicked off with the following options :
  /usr/lpp/mmfs/bin/mmbackup gpfs -n 8 -t full -B 2 -L 1
--tsm-servers gpfs_weekly_stanza -N glossop1a | /usr/bin/tee
/var/log/mmbackup/gpfs_weekly/backup_log.`date +%Y%m%d_%H_%M`

(The excerpts above are from the backup_log. file.)
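
For what it's worth, the follow-up run the log asks for would presumably just be the same invocation with -q added, something like (same node and stanza names as above):

/usr/lpp/mmfs/bin/mmbackup gpfs -q -n 8 -t full -B 2 -L 1 \
    --tsm-servers gpfs_weekly_stanza -N glossop1a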

Our NSD servers are running GPFS 4.1.1-11, TSM is at 7.1.1.100 and the
File system version is 12.06 (3.4.0.3). Has anyone else seen this
behaviour with mmbackup and if so, found a fix?

Thanks,

Regards,
Ash

--
-
Ashish Thandavan

UNIX Support Computing Officer
Department of Computer Science
University of Oxford
Wolfson Building
Parks Road
Oxford OX1 3QD

Phone: 01865 610733
Email: ashish.thanda...@cs.ox.ac.uk

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] AFM OpenFiles

2017-02-09 Thread Peter Childs
4.2.1.1 on CentOS 7. So that might account for it.

Thanks

Peter Childs


From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Venkateswara R Puvvada 
<vpuvv...@in.ibm.com>
Sent: Thursday, February 9, 2017 3:10:58 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] AFM OpenFiles

What is the version of GPFS ? There was an issue fixed in Spectrum Scale 4.2.2 
for file count(file_nr) leak. This issue mostly happens on Linux kernel version 
>= 3.6.

~Venkat (vpuvv...@in.ibm.com)



From:    Peter Childs <p.chi...@qmul.ac.uk>
To:gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date:02/09/2017 08:00 PM
Subject:[gpfsug-discuss] AFM OpenFiles
Sent by:gpfsug-discuss-boun...@spectrumscale.org




We are trying to perform a file migration from our old GPFS cluster to our new
GPFS cluster using AFM.

Currently we have 142 AFM filesets set up, one for each fileset on the old
cluster, and are attempting to prefetch the files in batches of 100,000 files
with "mmafmctl home prefetch -j $fileset --list-file=$curfile
--home-fs-path=/data/$fileset 2>&1"

I'm doing this on a separate gateway node from our main GPFS servers and it's
working quite well.

However there seems to be a leak in AFM with file handles, and after a couple of
days of prefetch the gateway will run out of file handles and need rebooting
before we can continue.
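
We keep an eye on it with nothing more clever than the standard Linux counters; the mmfsd lookup below is only illustrative:

cat /proc/sys/fs/file-nr                    # allocated, free and maximum file handles on the node
lsof -p "$(pgrep -o mmfsd)" | wc -l         # open files held by the mmfsd process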

We thought to begin with this was improved by not doing --metadata-only on the
prefetch (as we were attempting to get the metadata before getting
the main data), but in truth the machine was just lasting a little longer.

Does anyone know of any setting that may help this or what is wrong?

Thanks

Peter Childs
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] AFM OpenFiles

2017-02-09 Thread Peter Childs
We are trying to perform a file migration from our old GPFS cluster to our new
GPFS cluster using AFM.

Currently we have 142 AFM filesets set up, one for each fileset on the old
cluster, and are attempting to prefetch the files in batches of 100,000 files
with "mmafmctl home prefetch -j $fileset --list-file=$curfile
--home-fs-path=/data/$fileset 2>&1"

I'm doing this on a separate gateway node from our main GPFS servers and it's
working quite well.

However there seems to be a leak in AFM with file handles, and after a couple of
days of prefetch the gateway will run out of file handles and need rebooting
before we can continue.

We thought to begin with this was improved by not doing --metadata-only on the
prefetch (as we were attempting to get the metadata before getting
the main data), but in truth the machine was just lasting a little longer.

Does anyone know of any setting that may help this or what is wrong?

Thanks

Peter Childs
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] AFM Migration Issue

2017-01-09 Thread Peter Childs
Interesting. I'm currently doing something similar, but am only using read-only
to premigrate the filesets. The directory timestamps don't agree with the
original, but neither are they all marked with the date they were migrated, so there is
something very weird going on. (We're planning to switch them to Local
Update when we move the users over to them.)

We're using an mmapplypolicy run on our old GPFS cluster to get the files to
migrate, and have noticed that you need a

RULE EXTERNAL LIST ESCAPE '%/'

line, otherwise files with % in the filenames don't get migrated and throw
errors.
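
For reference, a minimal form of the rule pair we use is below; the list name and the mmapplypolicy options are illustrative, the important part is the ESCAPE '%/' clause:

RULE EXTERNAL LIST 'migrate' EXEC '' ESCAPE '%/'
RULE 'all' LIST 'migrate'

# run on the old cluster; writes /tmp/migrate.list.migrate for mmafmctl prefetch
mmapplypolicy /data/$fileset -P migrate.pol -I defer -f /tmp/migrate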

I'm trying to work out if empty directories, or those containing only empty
directories, get migrated correctly, as you can't list them in the mmafmctl
prefetch statement. (If you try, using DIRECTORIES_PLUS, they throw errors.)

I am very interested in the solution to this issue.

Peter Childs
Queen Mary, University of London



From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of 
paul.tomlin...@awe.co.uk <paul.tomlin...@awe.co.uk>
Sent: Monday, January 9, 2017 3:09:43 PM
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] AFM Migration Issue

Hi All,

We have just completed the first data move from our old cluster to the new one
using AFM Local Update as per the guide; however, we have noticed that all date
stamps on the directories have the date they were created on (e.g. 9th Jan 2017),
not the date from the old system (e.g. 14th April 2007), whereas all the
files have the correct dates.

Has anyone else seen this issue as we now have to convert all the directory 
dates to their original dates !




The information in this email and in any attachment(s) is
commercial in confidence. If you are not the named addressee(s)
or
if you receive this email in error then any distribution, copying or
use of this communication or the information in it is strictly
prohibited. Please notify us immediately by email at
admin.internet(at)awe.co.uk, and then delete this message from
your computer. While attachments are virus checked, AWE plc
does not accept any liability in respect of any virus which is not
detected.

AWE Plc
Registered in England and Wales
Registration No 02763902
AWE, Aldermaston, Reading, RG7 4PR

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] LROC

2016-12-21 Thread Peter Childs
So you're saying maxStatCache should be raised on LROC-enabled nodes only, as that's
the only place under Linux it's used, and it should be set low on non-LROC enabled
nodes.

Fine, just good to know; nice and easy now with node classes.
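
A sketch of what that looks like with a node class; the class and node names are made up, the maxStatCache value just follows the 1-2 million per node suggested in this thread, and both settings need mmfsd restarted on those nodes to take effect:

mmcrnodeclass lrocNodes -N client01,client02
mmchconfig maxStatCache=1000000 -N lrocNodes
mmchconfig maxFilesToCache=128000 -N lrocNodes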

Peter Childs



From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme 
<oeh...@gmail.com>
Sent: Wednesday, December 21, 2016 11:37:46 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] LROC

StatCache is not useful on Linux; that hasn't changed if you don't use LROC on
the same node. LROC uses the compact object (StatCache) to store its pointer to
the full file object, which is stored on the LROC device. So on a call for
attributes that are not in the StatCache, the object gets recalled from LROC and
converted back into a full file object, which is why you still need to have a
reasonable maxFilesToCache setting even if you use LROC, as you otherwise constantly move
file information in and out of LROC and put the device under heavy load.

sven



On Wed, Dec 21, 2016 at 12:29 PM Peter Childs 
<p.chi...@qmul.ac.uk<mailto:p.chi...@qmul.ac.uk>> wrote:
My understanding was that maxStatCache was only used on AIX and should be set
low on Linux, as raising it didn't help and wasted resources. Are we saying that
LROC now uses it, and setting it low if you raise maxFilesToCache under Linux is
no longer the advice?


Peter Childs



From: gpfsug-discuss-boun...@spectrumscale.org
 <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme <oeh...@gmail.com>
Sent: Wednesday, December 21, 2016 9:23:16 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] LROC

LROC only needs a StatCache object, as it 'compacts' a full open file object
(maxFilesToCache) to a StatCache object when it moves the content to the LROC
device.
Therefore the only thing you really need to increase is maxStatCache on the
LROC node, but you still need maxFilesToCache objects, so leave that untouched and
just increase maxStatCache.

Olaf's comment is important you need to make sure your manager nodes have 
enough memory to hold tokens for all the objects you want to cache, but if the 
memory is there and you have enough, it's well worth spending a lot of memory on it
and bump maxStatCache to a high number. i have tested maxStatCache up to 16 
million at some point per node, but if nodes with this large amount of inodes 
crash or you try to shut them down you have some delays , therefore i suggest 
you stay within a 1 or 2  million per node and see how well it does and also if 
you get a significant gain.
I did help Bob to set up some monitoring for it so he can actually get
comparable stats. I suggest you set up ZIMon and enable the LROC sensors to have
real stats too, so you can see what benefits you get.
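
Besides ZIMon, a quick way to sanity-check a node (hedged, since the output details vary a little by release) is the LROC view in mmdiag:

mmdiag --lroc      # per-node LROC device, object and data statistics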

Sven

On Tue, Dec 20, 2016 at 8:13 PM Matt Weil <mw...@wustl.edu> wrote:

as many as possible and both

have maxFilesToCache 128000

and maxStatCache 4

do these affect what sits on the LROC as well?  Are those too small? 1 million
seemed excessive.

On 12/20/16 11:03 AM, Sven Oehme wrote:
how much files do you want to cache ?
and do you only want to cache metadata or also data associated to the files ?

sven



On Tue, Dec 20, 2016 at 5:35 PM Matt Weil <mw...@wustl.edu> wrote:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage

Hello all,

Are there any tuning recommendations to get these to cache more metadata?

Thanks

Matt

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] LROC

2016-12-21 Thread Peter Childs
My understanding was that maxStatCache was only used on AIX and should be set
low on Linux, as raising it didn't help and wasted resources. Are we saying that
LROC now uses it, and setting it low if you raise maxFilesToCache under Linux is
no longer the advice?


Peter Childs



From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme 
<oeh...@gmail.com>
Sent: Wednesday, December 21, 2016 9:23:16 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] LROC

LROC only needs a StatCache object, as it 'compacts' a full open file object
(maxFilesToCache) to a StatCache object when it moves the content to the LROC
device.
Therefore the only thing you really need to increase is maxStatCache on the
LROC node, but you still need maxFilesToCache objects, so leave that untouched and
just increase maxStatCache.

Olaf's comment is important you need to make sure your manager nodes have 
enough memory to hold tokens for all the objects you want to cache, but if the 
memory is there and you have enough, it's well worth spending a lot of memory on it
and bump maxStatCache to a high number. i have tested maxStatCache up to 16 
million at some point per node, but if nodes with this large amount of inodes 
crash or you try to shut them down you have some delays , therefore i suggest 
you stay within a 1 or 2  million per node and see how well it does and also if 
you get a significant gain.
I did help Bob to set up some monitoring for it so he can actually get
comparable stats. I suggest you set up ZIMon and enable the LROC sensors to have
real stats too, so you can see what benefits you get.

Sven

On Tue, Dec 20, 2016 at 8:13 PM Matt Weil 
<mw...@wustl.edu<mailto:mw...@wustl.edu>> wrote:

as many as possible and both

have maxFilesToCache 128000

and maxStatCache 4

do these affect what sits on the LROC as well?  Are those too small? 1 million
seemed excessive.

On 12/20/16 11:03 AM, Sven Oehme wrote:
how much files do you want to cache ?
and do you only want to cache metadata or also data associated to the files ?

sven



On Tue, Dec 20, 2016 at 5:35 PM Matt Weil 
<mw...@wustl.edu<mailto:mw...@wustl.edu>> wrote:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage

Hello all,

Are there any tuning recommendations to get these to cache more metadata?

Thanks

Matt

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Using AFM to migrate files. (Peter Childs) (Peter Childs)

2016-10-21 Thread Peter Childs







 Bill Pappas wrote 

> >>the largest of the filesets has 52TB and 63 million files
>
>
> Are you using NFS as the transport path between the home and cache?

No plans to; I was planning to use GPFS multi-cluster as transport.

> If you are using NFS, how are you producing the list of files to migrate?  
> mmafmctl with the prefetch option? If so, I would measure the time it takes 
> for that command (with that option) to produce the list of files it intends 
> to prefetch. From my experience, this is very important as a) it can take a 
> long time if you have >10 million of files and b) I've seen this operation 
> crash when the list grew large.  Does anyone else on this thread have any 
> experiences?  I would love to hear positive experiences as well.  I tried so 
> hard and for so long to make AFM work with one customer, but we gave up as it 
> was not reliable and stable for large scale (many files) migrations.
> If you are using GPFS as the conduit between the home and cache (i.e. no 
> NFS), I would still ask the same question, more with respect to stability for 
> large file lists during the initial prefetch stages.

I was planning to use a GPFS policy to create the list, but I guess a find
should work. I'm guessing you're saying don't migrate the files in bulk by using
a find onto cache.

It would be nice to see some example recipes to prefetch files into AFM.
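
To make the question concrete, the sort of recipe I have in mind is roughly the following; the paths, names and batch size follow what we are already doing elsewhere (the file system on the new cluster happens to be called home), and the list rule is the usual EXTERNAL LIST form:

# 1. build the candidate list on the old (home) cluster
mmapplypolicy /data/$fileset -P migrate.pol -I defer -f /tmp/$fileset

# 2. split it into batches of 100,000 files
split -l 100000 /tmp/$fileset.list.migrate /tmp/$fileset.batch.

# 3. feed each batch to the gateway on the new cluster
for batch in /tmp/$fileset.batch.*; do
    mmafmctl home prefetch -j $fileset --list-file=$batch --home-fs-path=/data/$fileset
done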

>
>
> As far as I could tell, from GPFS 3.5 to 4.2, the phases of prefetch where 
> the home and cache are compared (i.e. let's make a list of what is ot be 
> migrated over) before the data transfer begins only runs on the GW node 
> managing that cache.  It does not leverage multiple gw nodes and multiple 
> home nodes to speed up this 'list and find' stage of prefetch.  I hope some 
> AFM developers can clarify or correct my findings.  This was a huge 
> impediment for large file migrations where it is difficult (organizationally, 
> not technically) to split a folder structure into multiple file sets.  The 
> lack of stability under these large scans was the real failing for us.

Interesting.

>
>
> Bill Pappas
>
> 901-619-0585
>
> bpap...@dstonline.com<mailto:bpap...@dstonline.com>
>
>

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


>
>
>
>
>
> http://www.prweb.com/releases/2016/06/prweb13504050.htm
>
>
>
> From: gpfsug-discuss-boun...@spectrumscale.org
> <gpfsug-discuss-boun...@spectrumscale.org> on behalf of gpfsug-discuss-requ...@spectrumscale.org
> <gpfsug-discuss-requ...@spectrumscale.org>
> Sent: Thursday, October 20, 2016 2:07 PM
> To: gpfsug-discuss@spectrumscale.org
> Subject: gpfsug-discuss Digest, Vol 57, Issue 53
>
>  Send gpfsug-discuss mailing list submissions to
> gpfsug-discuss@spectrumscale.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> or, via email, send a message with subject or body 'help' to
> gpfsug-discuss-requ...@spectrumscale.org
>
> You can reach the person managing the list at
> gpfsug-discuss-ow...@spectrumscale.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of gpfsug-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: Using AFM to migrate files. (Peter Childs) (Peter Childs)
>
>
> ------
>
> Message: 1
> Date: Thu, 20 Oct 2016 19:07:44 +
> From: Peter Childs <p.chi...@qmul.ac.uk>
> To: gpfsug main discussion list 
> <gpfsug-discuss@spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Using AFM to migrate files. (Peter
> Childs)
> Message-ID: 
> <5qv6d7inj2j1pa94kqamk2uf.1476989646...@email.android.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Yes, most of the filesets are based on research groups, projects or 
> departments, with the exception of scratch and home, hence the idea to use a 
> different method for these filesets.
>
> There are ap

Re: [gpfsug-discuss] Using AFM to migrate files.

2016-10-20 Thread Peter Childs
Yes but not a great deal,

Peter Childs
Research Storage Expert
ITS Research Infrastructure
Queen Mary, University of London



From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Yaron Daniel 
<y...@il.ibm.com>
Sent: Thursday, October 20, 2016 7:15:54 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Using AFM to migrate files.

Hi

Do you use NFSv4 ACLs in your old cluster?


Regards









Yaron Daniel                                      94 Em Ha'Moshavot Rd
Server, Storage and Data Services - Team Leader   Petach Tiqva, 49527
Global Technology Services                        Israel
Phone:  +972-3-916-5672
Fax:    +972-3-916-5672
Mobile: +972-52-8395593
e-mail: y...@il.ibm.com
IBM Israel - http://www.ibm.com/il/he/








From:Peter Childs <p.chi...@qmul.ac.uk>
To:gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date:10/19/2016 05:34 PM
Subject:[gpfsug-discuss] Using AFM to migrate files.
Sent by:gpfsug-discuss-boun...@spectrumscale.org






We are planning to use AFM to migrate our old GPFS file store to a new GPFS 
file store. This will give us the advantages of Spectrum Scale (GPFS) 4.2, such 
as larger block and inode size. I would like to attempt to gain some insight on 
my plans before I start.

The old file store was running GPFS 3.5 with 512 byte inodes and 1MB block
size. We have now upgraded it to 4.1 and are working towards 4.2, with 300TB of
files (385TB max space); this is so we can use both the old and new storage via
multi-cluster.

We are moving to a new GPFS cluster so we can use the new protocol nodes 
eventually and also put the new storage machines as cluster managers, as this 
should be faster and future proof

The new hardware has 1PB of space running GPFS 4.2

We have multiple filesets, and would like to maintain our namespace as far as 
possible.

My plan was to.

1. Create a read-only (RO) AFM cache on the new storage (ro)
2a. Move old fileset and replace with SymLink to new.
2b. Convert RO AFM to Local Update (LU) AFM pointing to new parking area of old 
files.
2c. move user access to new location in cache.
3. Flush everything into cache and disconnect.
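
As a sketch of what steps 1 and 2b might look like on the command line (fileset and path names are made up, the afm attribute spellings are taken from the AFM documentation we have been following rather than from a tested recipe, and mode changes are done with the fileset unlinked):

# 1. read-only cache of the old fileset; the old file system is remote-mounted at /oldfs
mmcrfileset newfs proj1 --inode-space new -p afmmode=ro,afmtarget=gpfs:///oldfs/proj1
mmlinkfileset newfs proj1 -J /newfs/proj1

# 2b. later, switch the cache to local-update (re-pointing the target at the
#     parking area is the part my tests are meant to confirm)
mmunlinkfileset newfs proj1
mmchfileset newfs proj1 -p afmmode=lu
mmlinkfileset newfs proj1 -J /newfs/proj1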

I've read the docs including the ones on migration but it's not clear if it's 
safe to move the home of a cache and update the target. It looks like it should 
be possible and my tests say it works.

An alternative plan is to use an Independent Writer (IW) AFM cache to move the 
home directories which are pointed to by LDAP. Hence we can move users one at a 
time and only have to drain the HPC cluster at the end to disconnect the cache. 
I assume that migrating users over an Independent Writer is safe so long as the 
users don't use both sides of the cache at once (ie home and target)

I'm also interested in any recipe people have on GPFS policies to preseed and 
flush the cache.

We plan to do all the migration using AFM over GPFS we're not currently using 
NFS and have no plans to start. I believe using GPFS is the faster method to 
preform the migration.

Any suggestions and experience of doing similar migration jobs would be helpful.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Hardware refresh

2016-10-11 Thread Peter Childs
My reading is:

If you are running a small cluster with tie-breaker disks and you want to 
change the manager servers, or you want to switch to the new config management 
method in v4, then create a new cluster and use multi-cluster to upgrade.

Otherwise, just use a new filesystem within the old cluster.

But I'm interested to hear otherwise, as I'm about to embark on this myself.

I note you can switch an old cluster, but you need to shut it down to do so.
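
For reference, a hedged sketch of the commands usually involved here -- check 
the docs for your exact level before running either:

# Raise the committed cluster level once every node is upgraded:
mmchconfig release=LATEST

# Switch an existing cluster to the CCR-based configuration servers (the
# "new config management method"); on 4.1-era code this needed GPFS down
# on all nodes first, hence the shutdown mentioned above.
mmshutdown -a
mmchcluster --ccr-enable
mmstartup -a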

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


 Marc A Kaplan wrote 

New FS? Yes there are some good reasons.
New cluster?  I did not see a compelling argument either way.



From:"mark.b...@siriuscom.com" <mark.b...@siriuscom.com>
To:gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date:10/11/2016 03:34 PM
Subject:Re: [gpfsug-discuss] Hardware refresh
Sent by:gpfsug-discuss-boun...@spectrumscale.org




Ok.  I think I am hearing that a new cluster with a new FS and copying data 
from old to new cluster is the best way forward.  Thanks everyone for your 
input.

From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Yuri L Volobuev 
<volob...@us.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: Tuesday, October 11, 2016 at 12:22 PM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Hardware refresh


This depends on the committed cluster version level (minReleaseLevel) and file 
system format. Since NSDv2 is an on-disk format change, older code wouldn't be 
able to understand what it is, and thus if there's a possibility of a downlevel 
node looking at the NSD, the NSDv1 format is going to be used. The code does 
NSDv1<->NSDv2 conversions under the covers as needed when adding an empty NSD 
to a file system.

I'd strongly recommend getting a fresh start by formatting a new file system. 
Many things have changed over the course of the last few years. In particular, 
having a 4K-aligned file system can be a pretty big deal, depending on what 
hardware one is going to deploy in the future, and this is something that can't 
be bolted onto an existing file system. Having 4K inodes is very handy for many 
reasons. New directory format and NSD format changes are attractive, too. And 
disks generally tend to get larger with time, and at some point you may want to 
add a disk to an existing storage pool that's larger than the existing 
allocation map format allows. Obviously, it's more hassle to migrate data to a 
new file system, as opposed to extending an existing one. In a perfect world, 
GPFS would offer a conversion tool that seamlessly and robustly converts old 
file systems, making them as good as new, but in the real world such a tool 
doesn't exist. Getting a clean slate by formatting a new file system every few 
years is a good long-term investment of time, although it comes front-loaded 
with extra work.
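
(As a concrete, entirely hypothetical illustration of the choices that are 
fixed at creation time -- the device name, stanza file and sizes below are made 
up, and flags can differ by release:

# 4K inodes and a larger block size can only be chosen when the file
# system is formatted:
mmcrfs newfs -F newfs_disks.stanza -B 4M -i 4096 -T /gpfs/newfs

# Confirm the inode size, block size and format version afterwards:
mmlsfs newfs -i -B -V
)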

yuri


From: Aaron Knister <aaron.s.knis...@nasa.gov>
To: <gpfsug-discuss@spectrumscale.org>,
Date: 10/10/2016 04:45 PM
Subject: Re: [gpfsug-discuss] Hardware refresh
Sent by: gpfsug-discuss-boun...@spectrumscale.org






Can one format NSDv2 NSDs and put them in a filesystem with NSDv1 NSD's?

-Aaron

On 10/10/16 7:40 PM, Luis Bolinches wrote:
> Hi
>
> Creating a new FS sounds like a best way to go. NSDv2 being a very good
> reason to do so.
>
> AFM for migrations is quite good, latest versions allows to use NSD
> protocol for mounts as well. Olaf did a great job explaining this
> scenario on the redbook chapter 6
>
> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open
>
> --
> Cheers
>
> On 10 Oct 2016, at 23.05, Buterbaugh, Kevin L
> <kevin.buterba...@vanderbilt.edu
> <mailto:kevin.buterba...@vanderbilt.edu>> wrote:
>
>> Hi Mark,
>>
>> The last time we did something like this was 2010 (we’re doing rolling
>> refreshes now), so there are probably lots of better ways to do this
>> than what we did, but we:
>>
>> 1) set up the new hardware
>> 2) created new filesystems (so that we could make adjustments we
>> wanted to make that can only be made at FS creation time)
>> 3) used rsync to make a 1st pass copy of everything
>> 4) coordinated a time with users / groups to do a 2nd rsync when they
>> weren’t active
>> 5) used symbolic links during the transition (i.e. rm -rvf
>> /gpfs0/home/joeuser; ln -s /gpfs2/home/joeuser /gpfs0/home/joeuser)
>> 6) once everybody was migrated, u

Re: [gpfsug-discuss] GPFS Upgrade 3.5 -> 4.1

2016-10-10 Thread Peter Childs
So, in short, we're saying that "mmchfs -V LATEST" increments a version number 
and makes new features possible; it does not start using them straight away. 
Hence:

Directories will shrink in 4.1, but you need to run "mmchattr --compact" on all 
the old ones before anything actually changes (new ones are fine). Increasing 
the version number makes this possible, but it does not actually do it, as 
doing so would mean walking every directory and updating everything.
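
A hedged sketch of what that looks like in practice -- the file system name is 
made up, and the directory sweep is exactly the expensive walk described above, 
so try it on a subtree first:

# Check the current vs. maximum supported format version, then commit it:
mmlsfs gpfs0 -V
mmchfs gpfs0 -V LATEST

# Compact pre-existing directories explicitly, e.g. with a find sweep:
find /gpfs0 -type d -exec mmchattr --compact {} +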





Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


 Yuri L Volobuev wrote 


Correct. mmchfs -V only does quick operations (that can be easily undone if 
something goes wrong). Essentially the big task here is to increase the on-disk 
file system descriptor version number, to allow using those features that 
require a higher version. Bigger "conversion"-style tasks belong in mmmigratefs.

The only way to increase the inode size and the data block size is to format a 
new file system. This cannot be done on an existing file system.

yuri


From: Jan-Frode Myklebust <janfr...@tanso.net>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>,
Date: 10/10/2016 07:35 AM
Subject: Re: [gpfsug-discuss] GPFS Upgrade 3.5 -> 4.1
Sent by: gpfsug-discuss-boun...@spectrumscale.org





I've also always been worried about that one, but never experienced it taking 
any time, I/O or interruption. I've interpreted it as just starting to use new 
features, without really changing anything in the existing metadata. Things 
needing on-disk changes are presumably put in mmmigratefs; I have not heard 
about anything needing mmmigratefs since GPFS v3.3 (fs version 11.03) added 
fast extended attributes.

Would be great to hear otherwise, or confirmations.


-jf
man. 10. okt. 2016 kl. 14.32 skrev Peter Childs 
<p.chi...@qmul.ac.uk<mailto:p.chi...@qmul.ac.uk>>:

We are finishing upgrading our GPFS cluster of around 250 (client) nodes from 
GPFS 3.5.0.31 to Spectrum Scale 4.1.1.8, and have just about upgraded all the 
computers.

We are looking at running the "mmchfs -V LATEST" step and were wondering how 
much I/O this takes and whether it is likely to interrupt service.

We are looking at upgrading to 4.2 but plan to do that via Multi-cluster and 
AFM as we are integrating new hardware and wish to increase the block and inode 
size at the same time.

Peter Childs
Research Storage Expert
ITS Research Infrastructure
Queen Mary, University of London

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] OOM Killer killing off GPFS 3.5

2016-05-24 Thread Peter Childs
 Hi All,

We have an issue where the Linux kills off GPFS first when a computer runs out 
of memory. We are running GPFS 3.5

We believe this happens when user processes have exhausted memory and swap and 
the out of memory killer in Linux chooses to  kill the GPFS daemon as the 
largest user of memory, due to its large pinned memory footprint.

This means that GPFS is killed and the whole cluster blocks for a minute before 
it resumes operation. This is not ideal and causes issues across most of the 
cluster.

What we see is users unable to login elsewhere on the cluster until we have 
powered off the node. We believe this is because while the node is still 
pingable, GPFS doesn't expel it from the cluster.

This issue mainly occurs on the login nodes of our HPC cluster but can affect 
the rest of the cluster when it occurs.

I've seen others on list with this issue.

We've come up with a solution to adjust the OOM score of GPFS, so that it is 
unlikely to be the first thing to be killed, and hopefully the OOM killer picks 
a user process instead.

We've tested this and it seems to work. I'm asking here firstly to share our 
knowledge and secondly to ask if there is anything we've missed with this 
solution.

It's short, which is part of its beauty.

/usr/local/sbin/gpfs-oom_score_adj:

#!/bin/bash
# Lower the OOM score of every mmfs* process (mmfsd and its helpers) so
# the kernel's OOM killer prefers user processes over the GPFS daemon.
for proc in $(pgrep mmfs); do
  echo -500 > "/proc/$proc/oom_score_adj"
done


This can then be called automatically on GPFS startup with the following:


mmaddcallback startupoomkiller --command /usr/local/sbin/gpfs-oom_score_adj 
--event startup


and either restart GPFS or just run the script on all nodes.
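
For completeness, a small hedged example of doing that across the cluster; 
mmdsh ships with GPFS but is an unsupported helper, and the "all" node class is 
an assumption, so substitute your own node list if needed:

# Run the adjustment everywhere and spot-check one daemon afterwards:
mmdsh -N all /usr/local/sbin/gpfs-oom_score_adj
cat /proc/$(pgrep -o mmfsd)/oom_score_adj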

Peter Childs
ITS Research Infrastructure
Queen Mary, University of London
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss