Re: [gpfsug-discuss] 5.1.2.2 changes
https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars_512x.html is normally the best place to look for changes in PTF releases.

Peter Childs ITS Research Storage Queen Mary University Of London

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Hannappel, Juergen
Sent: Thursday, January 13, 2022 5:26 PM
To: gpfsug main discussion list
Subject: [EXTERNAL] [gpfsug-discuss] 5.1.2.2 changes

CAUTION: This email originated from outside of QMUL. Do not click links or open attachments unless you recognise the sender and know the content is safe.

Hi, just got notified that 5.1.2.2 is out. What are the changes relative to 5.1.2.1? https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=summary-changes does not specify that.
-- Dr. Jürgen Hannappel DESY/IT Tel.: +49 40 8998-4616
___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies?
We've had this same issue with characters that are fine in Scale but Protect can't handle. Normally it's because some script has embedded a newline in the middle of a file name, and normally we end up renaming that file by inode number:

find . -inum 9975226749 -exec mv {} badfilename \;

mostly because we can't even type the filename at the command prompt. However it's not always just newline characters; currently we've got a few files with unprintable characters in them. But it's normally fewer than 50 files every few months, so it is easy to handle manually. I normally end up looking at /data/mmbackup.unsupported, which is the standard output from mmapplypolicy, extracting the file names from it and emailing the users concerned to assist them in working out what went wrong. I guess you could automate the parsing of this file at the end of the backup process and do something interesting with it.

Peter Childs

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Simon Thompson
Sent: Monday, October 11, 2021 9:35 AM
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Handling bad file names in policies?

CAUTION: This email originated from outside of QMUL. Do not click links or open attachments unless you recognise the sender and know the content is safe.

We have both:

WILDCARDSARELITERAL yes
QUOTESARELITERAL yes

set, and use --noquote for mmbackup. The backup runs, but creates a file:

/filesystem/mmbackup.unsupported.CLIENTNAME

which contains a list of files that are not backed up due to \n in the filename. So it doesn't break backup, but they don't get backed up either. I believe this is because the TSM client can't back the file up, rather than mmbackup no longer allowing them. I had an RFE at some point to get dsmc changed ... but it got closed WONTFIX.
Simon

On 09/10/2021, 10:09, "gpfsug-discuss-boun...@spectrumscale.org on behalf of Jonathan Buzzard" wrote:

On 08/10/2021 19:14, Wahl, Edward wrote:
> This goes back as far as I can recall to <=GPFS 3.5 days. And no, I
> cannot recall what version of TSM-EE that was. But newline has been
> the only stopping point, for what seems like forever. Having filed
> many an mmbackup bug, I don't recall ever crashing on filenames.
> (tons of OTHER reasons, but not character set) We even generate an
> error report from this and email users to fix it. We accept basically
> almost everything else, and I have to say, we see some really crazy
> things sometimes. I think my current favorite is the full windows
> paths as a filename. (eg:
> "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" )
>

I will have to do a test but I am sure newlines have worked just fine in the past. At the very least they have not stopped an entire backup from working when using dsmc incr. Now mmbackup, that's a different kettle of fish. If you have not seen mmbackup fail entirely because of a random "special" character you simply have not been using it long enough :-) For the longest of times I would simply not go anywhere near it because it was not fit for purpose.

> Current IBM documentation doesn't go backwards past 4.2 but it says:
>
> "For IBM Spectrum Scale™ file systems with special characters
> frequently used in the names of files or directories, backup failures
> might occur. Known special characters that require special handling
> include: *, ?, ", ’, carriage return, and the new line character.
>
> In such cases, enable the Tivoli Storage Manager client options
> WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used
> in backup activities and make sure that the mmbackup option --noquote
> is used when invoking mmbackup."
>
> So maybe we could handle newlines somehow. But my lazy searches
> didn't show what TSM doesn't accept.
>

We strongly advise our users (our GPFS file system is for an HPC system) in training not to use "special" characters. That is followed with a warning that if they do then we don't make any promises to backup their files :-) From time to time I run a dsmc incr in a screen and capture the output to a log file and then look at the list of failed files and prompt users to "fix" them. Though sometimes I just "fix" them myself if the correction is going to be obvious and then email them to tell them what has happened.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
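Automating the scan Peter describes (pull the offending names out and report them alongside their inode numbers, since the names themselves may be untypable) could look something like this minimal sketch. The /data path and the rule "any control character is bad" are assumptions; on a real cluster you would more likely parse mmbackup.unsupported than walk the whole tree:

```python
import os

def find_bad_names(root):
    """Return (inode, repr(name)) pairs for entries whose names contain
    control characters (newline, carriage return, etc.)."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if any(ord(c) < 32 for c in name):
                full = os.path.join(dirpath, name)
                bad.append((os.lstat(full).st_ino, repr(name)))
    return bad

if __name__ == "__main__":
    for inode, name in find_bad_names("/data"):
        # The name itself may be untypable, so report the inode and
        # rename with the trick from the thread:
        #   find /data -inum <inode> -exec mv {} badfilename \;
        print(inode, name)
```

The printed inode numbers can then be fed straight into the `find -inum ... -exec mv` rename shown above.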
Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image.
We used to run mmsdrrestore -p manager -R /usr/bin/scp in an xCAT postscript to re-add our nodes to our Spectrum Scale cluster; however we disliked needing to put the private key for the whole cluster on every host. We now run mmsdrrestore -N nodename post-install from a management node to re-add the node to the cluster, so we could stop xCAT from distributing the private key, for security reasons. Ideally we would have liked the postscript to trigger a callback to do this, but we have not yet worked out how best to do that in xCAT, so currently it's a manual task, which is fine while our nodes are stateful, but is not workable when your nodes are stateless. My understanding is that xCAT should have a hook to do this, like the pre-scripts but run at the end, but I've yet to find it.

Peter Childs

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Ruffner, Scott (jpr9c)
Sent: Friday, January 29, 2021 8:04 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image.

Thanks David! Slick solution.
-- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruff...@virginia.edu

From: on behalf of "david_john...@brown.edu"
Reply-To: gpfsug main discussion list
Date: Friday, January 29, 2021 at 2:52 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Adding client nodes using a shared NFS root image.

We use mmsdrrestore after the node boots. In our case these are diskless nodes provisioned by xCAT. The post install script takes care of ensuring infiniband is lit up, and does the mmsdrrestore followed by mmstartup.
-- ddj Dave Johnson

On Jan 29, 2021, at 2:47 PM, Ruffner, Scott (jpr9c) wrote:

Hi everyone, We want all of our compute nodes (bare metal) to directly participate in the cluster as client nodes; of course, they are sharing a common root image.
Adding nodes via the regular mmaddnode (with the dsh operation to replicate files to the clients) isn’t really viable, but if I short-circuit that, and simply generate the /var/mmfs/gen files and then manually copy those and the keyfiles to the shared root images, is that safe? Am I going about this the entirely wrong way?
-- Scott Ruffner Senior HPC Engineer UVa Research Computing (434)924-6778(o) (434)295-0250(h) sruff...@virginia.edu
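As a sketch, the two approaches described in this thread might look as follows. The server name "manager" comes from Peter's message; the node name is a placeholder, and where exactly this runs in an xCAT postscript is an assumption:

```shell
#!/bin/bash
# Approach 1: run on the booting (stateless) node itself as an xCAT
# postscript. Requires the cluster private key in every image, which is
# the drawback the thread describes.
/usr/lpp/mmfs/bin/mmsdrrestore -p manager -R /usr/bin/scp
/usr/lpp/mmfs/bin/mmstartup

# Approach 2: run centrally from a management node once the node is up,
# so no private key has to live in the shared root image:
#   mmsdrrestore -N newnode01
#   mmstartup -N newnode01
```

The trade-off the thread lands on: approach 2 is safer for key distribution but needs some trigger (a callback or manual step) once the node has booted.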
[gpfsug-discuss] Odd behavior using sudo for mmchconfig
Yesterday, I updated some GPFS config using

sudo /usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20,maxStatCache=80

which looked to have worked fine; however later other machines started reporting issues with permissions while running mmlsquota as a user:

cannot open file `/var/mmfs/gen/mmfs.cfg.ls' for reading (Permission denied)
cannot open file `/var/mmfs/gen/mmfs.cfg' for reading (Permission denied)

This was corrected by re-running the command from the same machine within a root session:

sudo -s
/usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=2,maxStatCache=8
/usr/lpp/mmfs/bin/mmchconfig -N frontend maxFilesToCache=20,maxStatCache=80
exit

I suspect an environment issue from within sudo caused the GPFS config files' permissions to change, but I've done similar before with no bad effects, so I'm a little confused. We're looking at tightening up our security to reduce the need for passwordless root access from non-admin nodes, but I've never understood the exact requirements to set this up correctly, and I periodically see issues with our root known_hosts files when we update our admin hosts, and hence I often end up going around with 'mmdsh -N all echo ""' to clear the old entries. I always find this less than ideal and would prefer a better solution. Thanks for any ideas to get this right and avoid future issues. I'm more than happy to open an IBM ticket on this issue, but I feel community feedback might get me further to start with.

Thanks
-- Peter Childs ITS Research Storage Queen Mary, University of London
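A quick way to check for the symptom above: ordinary users need read access to these files for commands like mmlsquota to work, so a small diagnostic sketch like this can confirm whether the permissions have been clobbered. That world-readability is the relevant bit is inferred from the "Permission denied" errors in the message, so treat this as a diagnostic aid, not a statement of the intended permissions:

```python
import os
import stat

def world_readable(path):
    """True if 'other' users can read the file, which user-run commands
    such as mmlsquota appear to need for these config files."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)

# The files the thread saw failing; run on an affected node to confirm.
for p in ("/var/mmfs/gen/mmfs.cfg", "/var/mmfs/gen/mmfs.cfg.ls"):
    if os.path.exists(p):
        print(p, "ok" if world_readable(p) else "NOT world-readable")
```

If a file shows as not world-readable, re-running mmchconfig from a full root session (as described above) is what restored it in this case.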
Re: [gpfsug-discuss] Metadata space usage NFS4 vs POSIX ACL
On Sat, 2019-04-06 at 23:50 +0200, Michal Zacek wrote:

Hello, we decided to convert NFS4 ACLs to POSIX (we need to share the same data between SMB, NFS and GPFS clients), so I created a script to convert NFS4 to POSIX ACLs. It is very simple: first I do "chmod -R 770 DIR" and then "setfacl -R . DIR". I was surprised that conversion to POSIX ACLs has taken more than 2TB of metadata space. There are about one hundred million files on the GPFS filesystem. Is this expected behavior? Thanks, Michal

Example of NFS4 acl: #NFSv4 ACL #owner:root #group:root special:owner@:rwx-:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE(X)DELETE_CHILD (-)CHOWN(X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@::allow (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE(-)DELETE_CHILD (-)CHOWN(-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED special:everyone@::allow (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE(-)DELETE_CHILD (-)CHOWN(-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED group:ag_cud_96_lab:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE(X)DELETE_CHILD (-)CHOWN(X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:ag_cud_96_lab_ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE(-)DELETE_CHILD (-)CHOWN(X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

converted to posix acl: # owner: root # group: root user::rwx group::rwx mask::rwx other::--- default:user::rwx default:group::rwx default:mask::rwx default:other::--- group:ag_cud_96_lab:rwx default:group:ag_cud_96_lab:rwx group:ag_cud_96_lab_ro:r-x default:group:ag_cud_96_lab_ro:r-x

___ gpfsug-discuss mailing list gpfsug-discuss
at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

I've been trying to get my head round ACLs, with the plan to implement Cluster Export Services SMB rather than roll our own SMB. I'm not sure that plan is going to work, Michal, although it might if you're not using the Cluster Export Services version of SMB. Put simply, if you're running Cluster Export Services SMB you need to set the ACL semantics in Spectrum Scale to "nfs4"; we currently have it set to "all" and it won't let you export the shares until you change it. Currently I'm still testing, and have had to write a change to go the other way. If you're using the Linux kernel NFS server, that uses POSIX ACLs; however CES NFS uses Ganesha, which uses NFSv4 ACLs correctly.

It gets slightly more annoying as nfs4-setfacl does not work with Spectrum Scale and you have to use mmputacl, which has no recursive flag; I even found an IBM article from a few years ago saying the best way to set ACLs is to use find and a temporary file. The other workaround they suggest is to update ACLs from Windows or NFS to get them right. One thing I think may happen if you do as you've suggested is that you will break any ACLs under Samba badly.

I think the other reason that conversion is taking up more space than expected is that you're giving files ACLs that never had them to start with. I would love someone to say that I'm wrong, as changing our ACL setting is going to be a pain; while we don't make a lot of use of them, we make enough that having to use NFSv4 ACLs all the time is going to be a pain.

-- Peter Childs ITS Research Storage Queen Mary, University of London
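As a sanity check on Michal's numbers, the reported overhead works out to roughly 22 KB of metadata per file. This is back-of-envelope arithmetic only; whether that space comes from ACL storage itself, inode growth, metadata replication, or allocation granularity is speculation:

```python
# Figures from Michal's report: ~100 million files, ~2 TiB of extra
# metadata after adding explicit POSIX ACLs (plus default ACLs on
# directories) where files previously carried only mode bits.
files = 100_000_000
extra_bytes = 2 * 2**40          # 2 TiB

per_file = extra_bytes / files
print(round(per_file))           # ~21990 bytes per file
```

That per-file figure is large for a short ACL, which supports Peter's point below: files that previously had no ACL at all each gained one, and the per-file metadata cost (whatever its exact breakdown) is then multiplied by a hundred million.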
Re: [gpfsug-discuss] mmlsquota output
On Mon, 2019-03-25 at 09:52 +0000, Robert Horton wrote:
> I don't know the answer to your actual question, but have you thought
> about using the REST-API rather than parsing the command outputs? I
> can send over the Python stuff we're using if you mail me off list.

Thanks, We don't currently run the REST API, partly because I've never got around to working out the monitoring overhead, and which extra packages we need to go round our 300 nodes and install. Our cluster has been gradually upgraded over the years from 3.5 and we don't routinely install all the new packages the GUI needs on every node. It might be nice to see a list of which Spectrum Scale packages are needed for the different added-value features in Scale.

I'm currently working on rewriting the CLI quota reporting program, which was originally written in a combination of bash and awk. It's a strict Linux CLI utility for reporting quotas and hence I'd prefer to avoid the overhead of using a REST API. With reference to the issue people reported not being able to run "mmlsfileset" as a user a few weeks ago, I've found a handy workaround using "mmlsattr" instead, and yes it does use the -Y flag all the time. I'd like to share the code, once it's gone through some internal code review.

With reference to the other post, I will I think raise a PMR for this as it does not look like mmlsquota is working as documented.

Thanks Peter Childs

>
> Rob
>
> On Mon, 2019-03-25 at 09:38 +0000, Peter Childs wrote:
> > Can someone tell me I'm not reading this wrong.
> >
> > This is using Spectrum Scale 5.0.2-1
> >
> > It looks like the output from mmlsquota is not what it says
> >
> > In the man page it says,
> >
> > mmlsquota [-u User | -g Group] [-v | -q] [-e] [-C ClusterName]
> > [-Y] [--block-size {BlockSize | auto}] [Device[:Fileset]
> > ...]
> >
> > however
> >
> > mmlsquota -u username fs:fileset
> >
> > Return the output for every fileset, not just the "fileset" I've
> > asked for, this is same output as
> >
> > mmlsquota -u username fs
> >
> > Where I've not said the fileset.
> >
> > I can work around this, but I'm just checking this is not actually
> > a bug, that ought to be fixed.
> >
> > Long story is that I'm working on rewriting our quota report util
> > that used be a long bash/awk script into a more easy to understand
> > python script, and I want to get the user quota info for just one
> > fileset.
> >
> > Thanks in advance.

-- Peter Childs ITS Research Storage Queen Mary, University of London
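For the rewrite discussed above, filtering the colon-delimited -Y output down to a single fileset in Python is straightforward. Note that the header and field names in this sample are illustrative, not the exact mmlsquota -Y layout, so match them against real output before relying on them:

```python
# Sketch: work around mmlsquota returning rows for every fileset by
# filtering its machine-readable (-Y) output. The field layout below is
# an illustrative stand-in for the real mmlsquota -Y header set.
SAMPLE = """mmlsquota::HEADER:version:reserved:reserved:filesystemName:quotaType:id:name:blockUsage:blockQuota:blockLimit:filesetname:
mmlsquota::0:1:::fs:USR:1000:user1:1024:2048:4096:scratch:
mmlsquota::0:1:::fs:USR:1000:user1:512:1024:2048:home:
"""

def rows(output):
    """Yield each data line as a dict keyed by the HEADER fields."""
    lines = output.strip().splitlines()
    header = lines[0].split(":")
    for line in lines[1:]:
        yield dict(zip(header, line.split(":")))

def quota_for_fileset(output, fileset):
    """Keep only the rows for one named fileset."""
    return [r for r in rows(output) if r.get("filesetname") == fileset]

print(quota_for_fileset(SAMPLE, "home"))
```

In the real script the SAMPLE string would be replaced by the captured stdout of `mmlsquota -u <user> -Y <fs>`, with the filtering done client-side since the `fs:fileset` argument does not restrict the output as documented.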
[gpfsug-discuss] mmlsquota output
Can someone tell me I'm not reading this wrong. This is using Spectrum Scale 5.0.2-1. It looks like the output from mmlsquota is not what the man page says.

In the man page it says,

mmlsquota [-u User | -g Group] [-v | -q] [-e] [-C ClusterName] [-Y] [--block-size {BlockSize | auto}] [Device[:Fileset] ...]

however

mmlsquota -u username fs:fileset

returns the output for every fileset, not just the "fileset" I've asked for; this is the same output as

mmlsquota -u username fs

where I've not said the fileset. I can work around this, but I'm just checking this is not actually a bug that ought to be fixed. Long story is that I'm working on rewriting our quota report util, which used to be a long bash/awk script, into a more easy to understand python script, and I want to get the user quota info for just one fileset. Thanks in advance.

-- Peter Childs ITS Research Storage Queen Mary, University of London
Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?
We have a similar issue. I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement"; I suspect it would need better wording. We too have a rather complex script to report on quotas that I suspect does a similar job. It works by having all the filesets mounted in known locations, with names matching mount point names. It then works out which ones are needed by looking at the group ownership. It's very slow and a little cumbersome, not least because it was written ages ago in a mix of bash, sed, awk and find.

On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote:

Hi All, Happy New Year to all! Personally, I’ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I’m referring to), but I certainly wish all of you the best!

Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done. The reason I’m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we’re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That’s sub-optimal for any day on which a fileset gets created or deleted, so I’m looking for a better way … one which doesn’t require root privileges and preferably doesn’t involve running a GPFS command at all. Thanks in advance.

Kevin

P.S. I am still working on metadata and iSCSI testing and will report back on that when complete.
P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine.
— Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education kevin.buterba...@vanderbilt.edu - (615)875-9633

-- Peter Childs ITS Research Storage Queen Mary, University of London
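The cron-dump workaround Kevin describes can be read back with a few lines of Python. The dump layout below is only illustrative of mmlsfileset's tabular output (check it against a real dump); the parser simply takes the first column after the two header lines:

```python
# Sketch of the workaround: a root cron job writes "mmlsfileset fs"
# output to a world-readable file once a day, and the user-facing
# script parses fileset names out of it, needing no root privileges.
SAMPLE_DUMP = """Filesets in file system 'fs':
Name                     Status    Path
root                     Linked    /fs
scratch                  Linked    /fs/scratch
projects                 Unlinked  --
"""

def fileset_names(dump):
    """Extract fileset names: skip the title and column-header lines,
    then take the first whitespace-separated column of each row."""
    lines = dump.splitlines()
    return [line.split()[0] for line in lines[2:] if line.strip()]

print(fileset_names(SAMPLE_DUMP))  # ['root', 'scratch', 'projects']
```

In production the script would read the dump file written by cron (and could warn when its mtime is more than a day old, which covers the stale-data caveat Kevin raises).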
Re: [gpfsug-discuss] Can't take snapshots while re-striping
Thanks Sven, that's one of the best answers I've seen, and probably closer to why we sometimes can't take snapshots under normal circumstances as well.

We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some QoS settings on it too; I always find QoS a little bit "trial and error", but 30,000 IOPS looks to be making the rebalance run at about 2/3 of the IOPS it was using with no QoS limit. Just out of interest, which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says it is not supported by my filesystem version; the manual does not look to say.

Even with a very, very small value for QoS on maintenance tasks, I still can't take snapshots, so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice; I'd not really thought it would be possible. I guess it's time to write another RFE.

Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Sven Oehme
Sent: Thursday, October 18, 2018 7:09:56 PM
To: gpfsug main discussion list; gpfsug-disc...@gpfsug.org
Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping

Peter, If the 2 operations weren't compatible you would have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem. How it does this is quite complex, so let me try to explain.

How much parallelism is used for the 2 sync periods is controlled by sync workers:

sync1WorkerThreads 64
sync2WorkerThreads 64
syncBackgroundThreads 64
syncWorkerThreads 64

and if my memory serves me correctly the sync1 number is for the first flush, sync2 for the 2nd flush, while syncWorkerThreads are used explicitly by e.g. mmcrsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong; I mixed them up before).

When data is flushed by background sync, it is triggered by the OS:

root@dgx-1-01:~# sysctl -a | grep -i vm.dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500   <--- this is 5 seconds

as well as GPFS settings:

syncInterval 5
syncIntervalStrict 0

Here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening.

Why explain all this? Because it's very easy for a thread that does buffered I/O to make stuff dirty; a single thread can do hundreds of thousands of I/Os into memory, so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, meaning stabilizing it onto media, and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disks; on top of that I assume you don't have an idle filesystem, so people make stuff dirty and the threads above compete flushing things. It's a battle they can't really win unless you have very fast storage, or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty.

So your choices are:
1. reduce worker threads, so stuff gets less dirty.
2. turn writes into stable writes: mmchconfig forceOSyncWrites=yes (you can use -I while running). This will slow all write operations down on your system, as all writes are now done synchronously, but because of that they can't make anything dirty, so the flushers actually don't have to do any work.
While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync1 and sync2. This means for a second or 2 all writes would be done synchronously, removing the possibility of making things dirty, so the quiesce doesn't get delayed; as soon as the quiesce has happened, the temporarily enforced stable flag would be removed. But that proposal never got anywhere as no customer pushed for it. Maybe that would be worth an RFE :-)

Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there.

Sven

On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata)
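For anyone wanting to combine the two throttles discussed in this thread (restricting the restripe to a few nodes, and capping its IOPS with QoS), a hedged sketch follows. The filesystem name and node list are placeholders, the 30,000 IOPS figure comes from Peter's message, and the syntax should be checked against your Scale level, since as Peter notes mmchqos with a node restriction is not supported on older filesystem versions:

```shell
# Cap the maintenance QoS class (which restripe belongs to) at the
# 30,000 IOPS figure from the thread, leaving normal I/O unlimited:
mmchqos fs0 --enable pool=*,maintenance=30000IOPS,other=unlimited

# Run the rebalance only on selected nodes, in the maintenance class:
mmrestripefs fs0 -b -N nsd1,nsd2 --qos maintenance

# When finished, lift the cap again:
mmchqos fs0 --enable pool=*,maintenance=unlimited,other=unlimited
```

As the thread concludes, even a very small maintenance cap may not be enough to let snapshots quiesce while a restripe is running; QoS limits disk traffic, not how quickly buffered writers dirty memory.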
[gpfsug-discuss] Can't take snapshots while re-striping
We've just added 9 raid volumes to our main storage (5 Raid6 arrays for data and 4 Raid1 arrays for metadata). We are now attempting to rebalance our data across all the volumes.

We started with the metadata, doing a "mmrestripefs -r", as we'd changed the failure groups on our metadata disks and wanted to ensure we had all our metadata on known good SSD. No issues here; we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group.)

We're now doing a "mmrestripefs -b" to rebalance the data across all 21 volumes; however when we attempt to take a snapshot, as we do every night at 11pm, it fails with

sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test
Flushing dirty data for snapshot :test...
Quiescing all file system operations.
Unable to quiesce all nodes; some processes are busy or holding required resources.
mmcrsnapshot: Command failed. Examine previous error messages to determine cause.

Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks

-- Peter Childs ITS Research Storage Queen Mary, University of London
Re: [gpfsug-discuss] control which hosts become token manager
What does mmlsmgr show? Your config looks fine. I suspect you need to do a

mmchmgr perf node-1.psi.ch
mmchmgr tiered node-2.psi.ch

It looks like the node was set up as a manager and was demoted to just quorum, but since it's still currently the manager it needs to be told to stop. From experience it's also worth having the different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr.

Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London

Billich Heinrich Rainer (PSI) wrote:

Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don't want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that 'mmdiag --tokenmgr' lists the machine as active token manager. The machine has role 'quorum-client'. This doesn't seem sufficient to exclude it. Is there any way to tell Spectrum Scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich

[root@node-2 ~]# mmlscluster

GPFS cluster information
GPFS cluster name: node.psi.ch
GPFS cluster id: 5389874024582403895
GPFS UID domain: node.psi.ch
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR

Node  Daemon node name    IP address  Admin node name     Designation
1     node-1.psi.ch       a.b.95.31   node-1.psi.ch       quorum-manager
2     node-2.psi.ch       a.b.95.32   node-2.psi.ch       quorum-manager
3     node-quorum.psi.ch  a.b.95.30   node-quorum.psi.ch  quorum  <<<< VIRTUAL MACHINE >>>>

[root@node-2 ~]# mmdiag --tokenmgr

=== mmdiag: tokenmgr ===
Token Domain perf
There are 3 active token servers in this domain.
Server list:
a.b.95.120 a.b.95.121 a.b.95.122  <<<< VIRTUAL MACHINE >>>>

Token Domain tiered
There are 3 active token servers in this domain.
Server list:
a.b.95.120 a.b.95.121 a.b.95.122  <<<< VIRTUAL MACHINE >>>>

-- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch
Re: [gpfsug-discuss] Same file opened by many nodes / processes
On Mon, 2018-07-23 at 22:13 +1200, José Filipe Higino wrote:

I think the network problems need to be cleared first. Then I would investigate further. But if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs?

mmfslog is not a term I've come across before; if you mean /var/adm/ras/mmfs.log.latest then I'm already there. There is not a lot there; in other words no expulsions or errors, just a very slow filesystem. We've not seen any significantly long waiters either (mmdiag --waiters), so as far as I can see it's just behaving like a very, very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process; all I've had back is to try increasing -a, i.e. the number of sort threads, which has sped it up to a certain extent, but once again I think we're looking at the results of the issue, not the cause.

In my view, when troubleshooting is not easy, the usual methods work/help to find the next step:
- Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe)
- Use as precise as possible time-based events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations.
- If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR.

Also, what is the version of GPFS?

4.2.3-8

How many quorum nodes?

4 quorum nodes with tie-breaker disks; however these are not the file system manager nodes, as to fix a previous problem (with our NSD servers not being powerful enough) our fs manager nodes are on separate hardware. We have two file system manager nodes (which do token management, quota management etc); they also run the mmbackup.

How many filesystems?

1, although we do have a second that is accessed via multi-cluster from our older GPFS setup (that's running 4.2.3-6 currently).

Is the management network the same as the daemon network?

Yes.
the management network and the daemon network are the same network.

Thanks in advance Peter Childs

On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote:

On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote:

Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup?

Not really. It feels like a perfect storm; any one of the tasks running on its own would be fine. It's the sheer load; our mmpmon data says the storage has been flat-lining when it occurs. It's a reasonably standard (small) HPC cluster, with a very mixed workload; hence while we can usually find "bad" jobs from the point of view of IO, on this occasion we can see a few large array jobs all accessing the same file. The cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once, but that feels like fire fighting.

Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)?

We're not using the GPFS API; we never got it working, which is a shame. I've never managed to figure out the setup, although it is on my to-do list. Network-wise, we've just removed a great deal of noise from ARP requests by increasing the ARP cache size on the nodes. It's a mixed 1GBit/10GBit network currently; we're looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The storage is attached at 40GBit, but it does not look to want to run much above 5GBit, I suspect due to Ethernet backoff caused by the mixed speeds. While we do have some IB we don't currently run our storage over it.

Thanks in advance Peter Childs

Sorry if I am un-announced here for the first time. But I would like to help if I can.
Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs <p.chi...@qmul.ac.uk> wrote: Yes, we run mmbackup, using a snapshot. The scan usually takes an hour, but for the last week it has been taking many hours (I saw it take 12 last Tuesday). It's sped up again now, back to its normal hour, but the high-IO jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out how to control the bad IO using mmchqos, to prioritise certain nodes over others, but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced; I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London Yaron Daniel wrote: Hi, do you run mmbackup on a snapshot, which is read only? Regards
Re: [gpfsug-discuss] Same file opened by many nodes / processes
Yes, we run mmbackup, using a snapshot. The scan usually takes an hour, but for the last week it has been taking many hours (I saw it take 12 last Tuesday). It's sped up again now, back to its normal hour, but the high-IO jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out how to control the bad IO using mmchqos, to prioritise certain nodes over others, but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced; I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London Yaron Daniel wrote: Hi, do you run mmbackup on a snapshot, which is read only? Regards Yaron Daniel Storage Architect - IL Lab Services (Storage) IBM Global Markets, Systems HW Sales Israel 94 Em Ha'Moshavot Rd, Petach Tiqva, 49527 Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: y...@il.ibm.com From: Peter Childs To: "gpfsug-discuss@spectrumscale.org" Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-boun...@spectrumscale.org We have a situation where the same file is being read by around 5000 "jobs"; this is an array job in UGE with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time.
It's a ~200GB file, so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read-only access to the file; I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file). I'm wondering if there is anything we can do to improve things or that can be tuned within GPFS. I don't think we have an issue with token management, but would increasing maxFilesToCache on our token manager node help, say? Is there anything else I should look at, to try and allow GPFS to share this file better? Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
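If token management does turn out to be the limiting factor, the usual first step is to inspect and raise maxFilesToCache on the token manager nodes. A minimal sketch, with hypothetical node names and a purely illustrative value (not a recommendation; on 4.2.x this parameter only takes effect after GPFS is restarted on the affected nodes, so check the mmchconfig man page for your release):

```shell
# Show the current value, then raise it on the token manager nodes only.
# "manager1,manager2" and 131072 are illustrative, not a recommendation.
mmlsconfig maxFilesToCache
mmchconfig maxFilesToCache=131072 -N manager1,manager2
# Restart GPFS on those nodes for the change to take effect.
```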
Re: [gpfsug-discuss] Same file opened by many nodes / processes
The reason I think the metanode is moving around is that I'd done a limited amount of trying to track it down using "mmfsadm saferdump file", and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions. mmbackup has only slowed down while this job has been running. As I say, the scan for what to back up usually takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks), but that is usually a sign of a bad disk, and pulling the disk and rebuilding the RAID "fixed" that straight away. I can't see anything like that currently, however. It might be that it's network congestion we're suffering from and nothing to do with token management, but as the mmpmon bytes-read data is running very high with this job, and the load is spread over 50+ nodes, it's difficult to see one culprit. It's a mixed-speed Ethernet network, mainly 10Gb connected, although the nodes in question are legacy with only 1Gb connections (and 40Gb to the back of the storage). We're currently running 4.2.3-8. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London IBM Spectrum Scale wrote: What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only; is the file opened for read only, or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large, should not significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster?
Regards, The Spectrum Scale (GPFS) team -- If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss@spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-boun...@spectrumscale.org We have a situation where the same file is being read by around 5000 "jobs"; this is an array job in UGE with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. It's a ~200GB file, so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read-only access to the file; I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file). I'm wondering if there is anything we can do to improve things or that can be tuned within GPFS. I don't think we have an issue with token management, but would increasing maxFilesToCache on our token manager node help, say? Is there anything else I should look at, to try and allow GPFS to share this file better?
Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
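On the difficulty above of spotting one culprit among 50+ nodes: mmpmon reports per-node I/O counters, so sampling it across the suspect nodes and ranking them by bytes read is one way to narrow things down. A rough sketch (mmpmon reads its commands from stdin; verify the flags against the mmpmon documentation for your release):

```shell
# Run on each suspect node; -p gives parseable output,
# -r 0 repeats forever, -d 5000 samples every 5 seconds.
echo "io_s" | mmpmon -p -r 0 -d 5000
```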
[gpfsug-discuss] Lroc on NVME
We have a new computer, which has an NVMe drive that is appearing as /dev/nvme0, and we'd like to put LROC on /dev/nvme0p1p1, which is a partition on the drive. After doing the standard mmcrnsd to set it up, Spectrum Scale fails to see it. I've added a script /var/mmfs/etc/nsddevices so that GPFS scans them, and it does work now. What "type" should I set the NVMe drives to? I've currently set it to "generic". I want to do some tidying of my script, but has anyone else tried to get LROC running on NVMe, and how well does it work? We're running CentOS 7.4 and Spectrum Scale 4.2.3-8 currently. Thanks in advance. -- Peter Childs ITS Research Storage Queen Mary, University of London ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
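For anyone following along, the user exit mentioned above follows the pattern of the shipped sample (/usr/lpp/mmfs/samples/nsddevices.sample): it prints "deviceName deviceType" pairs for devices GPFS should consider. A rough sketch, assuming NVMe partitions show up in /proc/partitions; check the sample's comments for the exact return-code semantics on your release:

```shell
#!/bin/ksh
# /var/mmfs/etc/nsddevices - consulted during NSD device discovery.
# Emit "deviceName deviceType" pairs for extra devices.
for dev in $(awk '$4 ~ /^nvme/ {print $4}' /proc/partitions); do
  echo "$dev generic"
done
# Per the shipped sample (the script is sourced, hence "return"):
# return 0 to use only the devices listed above, or non-zero to let
# GPFS's built-in discovery run as well.
return 1
```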
Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862
We have 2 POWER9 nodes. The rest of our cluster is running CentOS 7.4 and Spectrum Scale 4.2.3-8 (x86 based). The POWER9 nodes are running Spectrum Scale 5.0.0-0 currently, as we couldn't get the gplbin for 4.2.3 to compile, whereas Spectrum Scale 5 worked on POWER9 out of the box. They are running RHEL 7.5, but on an old kernel I guess. I'm not sure that 4.2.3 works on POWER9; we've asked the IBM POWER9 outreach team but heard nothing back. If we can get 4.2.3 running on the POWER9 nodes it would put us in a more consistent setup. Of course our current plan B is to upgrade everything to 5.0.1, but we can't do that as our storage appliance doesn't (officially) support Spectrum Scale 5 yet. These are my experiences of what works and nothing whatsoever to do with what's supported, except that I want to keep us as close to a supported setup as possible given what we've found to actually work. (Now that's an interesting spin on a disclaimer.) Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London Simon Thompson (IT Research Support) wrote: Thanks Felipe, Is it safe to assume that there is intent* for RHEL 7.5 support for POWER9 when the x86 7.5 release is also made?
Simon * Insert standard IBM disclaimer about the meaning of intent etc etc From: gpfsug-discuss-boun...@spectrumscale.org [gpfsug-discuss-boun...@spectrumscale.org] on behalf of k...@us.ibm.com [k...@us.ibm.com] Sent: 04 June 2018 16:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Simon, The support statement for Power9 / RHEL 7.4 has not yet been included in the FAQ, but I understand that a FAQ update is under way: 4.2.3.8 for the 4.2.3 release 5.0.0.0 for the 5.0.0 release Kernel level tested with: 4.11.0-44.6.1.el7a Felipe Felipe Knop k...@us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Simon Thompson (IT Research Support)" To: gpfsug main discussion list Date: 06/04/2018 07:21 AM Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 Sent by: gpfsug-discuss-boun...@spectrumscale.org So … I have another question on support. We’ve just ordered some Power 9 nodes, now my understanding is that with 7.4, they require the -ALT kernel (https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm) which is 4.x based. I don’t see any reference in the Spectrum Scale FAQ to the ALT kernels. So what Scale code is supported for us to run on the Power9s?
Thanks Simon From: on behalf of "k...@us.ibm.com" Reply-To: "gpfsug-discuss@spectrumscale.org" Date: Friday, 25 May 2018 at 14:24 To: "gpfsug-discuss@spectrumscale.org" Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862 All, Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships with RHEL 7.5) as a result of applying kernel security patches may open a PMR to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be provided once the internal tests on RHEL 7.5 have been completed, likely a few days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid June). Regards, Felipe Felipe Knop k...@us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] How to clear explicitly set quotas
It's a little awkward that the different quota commands for Spectrum Scale all differ in their syntax and can only be used by the "right" people. As far as I can see, mmedquota is the only quota command that uses this "full colon" syntax, and it would be better if its syntax matched that of mmsetquota and mmlsquota, or if the reset-to-default-quota function were added to mmsetquota and mmedquota were left for editing quotas visually in an editor. Regards Peter Childs On Tue, 2018-05-22 at 16:01 +0800, IBM Spectrum Scale wrote: Hi Kuei-Yu, Should we update the document as requested below? Thanks. Regards, The Spectrum Scale (GPFS) team -- If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Bryan Banister <bbanis...@jumptrading.com> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 05/22/2018 04:52 AM Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas Sent by: gpfsug-discuss-boun...@spectrumscale.org Quick update. Thanks to a colleague of mine, John Valdes, there is a way to specify the file system + fileset + user with this form: mmedquota -d -u :: It’s just not documented in the man page or shown in the examples. Docs need to be updated!
-Bryan From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Bryan Banister Sent: Tuesday, May 15, 2018 11:00 AM To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas Note: External Email Unfortunately it doesn’t look like there is a way to target a specific quota. So for a cluster with many file systems and/or many filesets in each file system, clearing the quota entries affects all quotas in all file systems and all filesets. This means that you have to clear them all and then reapply the explicit quotas that you need to keep. # mmedquota -h Usage: mmedquota -d {-u User ... | -g Group ... | -j Device:Fileset ... } Maybe RFE time, or am I missing some other existing solution? -Bryan From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Bryan Banister Sent: Tuesday, May 15, 2018 10:36 AM To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas Note: External Email That was it! Thanks!
# mmrepquota -v fpi_test02:root --block-size G
*** Report for USR GRP quotas on fpi_test02
                        Block Limits                        |              File Limits
Name      fileset type   GB quota limit in_doubt    grace | files quota limit in_doubt    grace entryType
root      root    USR   243     0     0        0     none |   248     0     0        0     none default on
bbanister root    USR    84     0     0        0     none |    21     0     0        0     none e
root      root    GRP   243     0     0        0     none |   248     0     0        0     none default on

# mmedquota -d -u bbanister
#
# mmrepquota -v fpi_test02:root --block-size G
*** Report for USR GRP quotas on fpi_test02
                        Block Limits                        |              File Limits
Name      fileset type   GB quota limit in_doubt    grace | files quota limit in_doubt    grace entryType
root      root    USR   243     0     0        0     none |   248     0     0        0     none default on
bbanister root    USR    84     0     0        0     none |    21     0     0        0     none d_fset
root      root    GRP   243     0     0        0     none |   248     0     0        0     none default on

Note that "Try disabling and re-enabling default quotas with the -d option for that fileset" didn't fix this issue. Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter Serocka Sent: Monday, May 14, 2018 4:52 PM To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Subject: Re: [gpfsug-discuss] How to clear explicitly set quotas Note: External Email
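For reference, the three commands discussed in this thread each take their target in a different form, which is the inconsistency complained about above. A side-by-side sketch with hypothetical names (file system gpfs1, fileset proj1, user alice); verify the exact forms against the man pages for your release:

```shell
mmlsquota -u alice gpfs1                             # report: option-style user, device last
mmsetquota gpfs1:proj1 --user alice --block 10G:12G  # set: Device:Fileset target, long options
mmedquota -d -u alice                                # reset explicit quotas back to defaults
```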
Re: [gpfsug-discuss] GPFS memory usage keeps going up and we don't know why.
top, but ps gives the same value.

[root@dn29 ~]# ps auww -q ...
USER PID %CPU %MEM      VSZ     RSS TTY STAT START TIME COMMAND
root ...  2.7 22.3 10537600 5472580 ?   S    ...   ...  ...

... wrote: I've had a look at mmfsadm dump malloc and it looks to agree with the output from mmdiag --memory, and it does not seem to account for the excessive memory usage. The new machines do have idleSocketTimeout set to 0; from what you're saying it could be related to keeping that many connections between nodes working. Thanks in advance Peter.

[root@dn29 ~]# mmdiag --memory
=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes
Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
  128 bytes in use
  17500049370 hard limit on memory usage
  1048576 bytes committed to regions
  1 number of regions
  555 allocations
  555 frees
  0 allocation failures
Statistics for MemoryPool id 2 ("Shared Segment")
  42179592 bytes in use
  17500049370 hard limit on memory usage
  56623104 bytes committed to regions
  9 number of regions
  100027 allocations
  79624 frees
  0 allocation failures
Statistics for MemoryPool id 3 ("Token Manager")
  2099520 bytes in use
  17500049370 hard limit on memory usage
  16778240 bytes committed to regions
  1 number of regions
  4 allocations
  0 frees
  0 allocation failures

On Mon, 2017-07-24 at 13:11, Jim Doherty wrote: There are 3 places that the GPFS mmfsd uses memory: the pagepool plus 2 shared memory segments. To see the memory utilization of the shared memory segments, run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects are, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects. You might want to upgrade to a later PTF, as there was a PTF to fix a memory leak that occurred in tscomm, associated with network connection drops. On Monday, July 24, 2017 5:29 AM, Peter Childs <p.chi...@qmul.ac.uk> wrote: We have two GPFS clusters.
One is fairly old, running 4.2.1-2 without CCR; its nodes run fine, using about 1.5G of memory consistently (the GPFS pagepool is set to 1G, so that looks about right). The other one is "newer", running 4.2.1-3 with CCR, and the nodes keep increasing in their memory usage: they start at about 1.1G and are fine for a few days, but after a while they grow to 4.2G, which, when the nodes need to run real work, means the work can't be done. I'm losing track of what may be different other than CCR, and I'm trying to find some more ideas of where to look. I've checked all the standard things like pagepool and maxFilesToCache (set to the default of 4000); workerThreads is set to 128 on the new GPFS cluster (against the default of 48 on the old). I'm not sure what else to look at on this one, hence why I'm asking the community. Thanks in advance Peter Childs ITS Research Storage Queen Mary University of London. -- Peter Childs ITS Research Storage Queen Mary, University of London ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
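To watch whether the shared-segment pools are what is actually growing, the per-pool "bytes in use" figures from mmdiag --memory can be logged over time. A small sketch that parses the output format pasted earlier in this thread (here fed with the thread's sample figures; on a live node you would pipe mmdiag --memory straight into the awk):

```shell
# Sample lines in the format printed by `mmdiag --memory`
sample='Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
              128 bytes in use
Statistics for MemoryPool id 2 ("Shared Segment")
         42179592 bytes in use
Statistics for MemoryPool id 3 ("Token Manager")
          2099520 bytes in use'

# One line per pool: "<pool id> <bytes in use>"
usage=$(printf '%s\n' "$sample" | awk '/MemoryPool id/ {pool=$5} /bytes in use/ {print pool, $1}')
printf '%s\n' "$usage"
```

Appending a timestamped version of that one-liner to a log from cron makes the growth curve of pool 2 (the MFTC/MSC pool Jim mentions) easy to spot.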
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
As I understand it, mmbackup calls mmapplypolicy, so this applies to mmapplypolicy too. mmapplypolicy scans the metadata inodes (files) as requested, depending on the query supplied. You can ask mmapplypolicy to scan a fileset, an inode space or a filesystem. If scanning a fileset, it scans the inode space that fileset depends on, for all files in that fileset. Smaller inode spaces mean less to scan, so it's faster to use independent filesets: you get the list of what to process more quickly. Another advantage is that once an inode is allocated you can't deallocate it; however, you can delete independent filesets and hence deallocate their inodes. So if you have a task with lots and lots of small files that are only needed for a short period of time, you can create a new independent fileset for them, work on them, and then blow them away afterwards. I like independent filesets; I'm guessing the only reason dependent filesets are used by default is history. Peter On 18/05/17 14:58, Jaime Pinto wrote: Thanks for the explanation Mark and Luis. It begs the question: why are filesets created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), I didn't realize at all that not adding an extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups managing file systems and backups that don't read each other's manuals ahead of time, you have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than the TSM client itself) to read the exclusion rules in the TSM configuration and apply them before traversing?
Thanks Jaime Quoting "Marc A Kaplan": When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think of, and try to read that as, "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers; this allows GPFS to efficiently do snapshots of just that inode space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes; those inode numbers are within the inode space of whatever the containing independent fileset is, as was chosen when you created the fileset. If you didn't say otherwise, inodes come from the default "root" fileset. Clear as your bath-water, no? So why does mmbackup care one way or another? Stay tuned. BTW, if you look at the bits of the inode numbers carefully, you may not immediately discern what I mean by a "separable range of inode numbers"; (very technical hint) you may need to permute the bit order before you discern a simple pattern... From: "Luis Bolinches" To: gpfsug-discuss@spectrumscale.org Cc: gpfsug-discuss@spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-boun...@spectrumscale.org Hi There is no direct way to convert a fileset that is dependent to independent or vice versa. I would suggest taking a look at chapter 5 of the 2014 redbook, which has lots of definitions about GPFS ILM, including filesets: http://www.redbooks.ibm.com/abstracts/sg248254.html?Open It's not the only place this is explained, but I honestly believe it is a good single starting point. It also needs an update, as it does not have anything on CES or ESS, so anyone on this list should feel free to give feedback on that page; the people with funding decisions listen there.
So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. - Original message - From: "Jaime Pinto" Sent by: gpfsug-discuss-boun...@spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't
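The fileset-scoped scan described above can be sketched as a command line. This is a hedged example, not taken from the thread: the filesystem path `/gpfs/data`, the fileset junction `scratch01` and the policy file `list.rule` are invented placeholders, and the helper only prints the command rather than running it against a live cluster.

```shell
# Sketch only: build an mmapplypolicy invocation scoped to one independent
# fileset's inode space. All names below are invented placeholders.
build_scan_cmd() {
    # $1 = fileset junction path, $2 = policy rule file
    # --scope inodespace limits the metadata scan to that fileset's inode
    # space, which is what makes independent filesets quicker to scan.
    printf 'mmapplypolicy %s --scope inodespace -P %s -I defer\n' "$1" "$2"
}
build_scan_cmd /gpfs/data/scratch01 list.rule
```

Dropping `--scope inodespace` (or using `--scope filesystem`) would walk every inode space, which is the slower whole-filesystem case the message contrasts against.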
Re: [gpfsug-discuss] AFM Prefetch Missing Files
Further investigation and checking says 4.2.1 mmafmctl prefetch is missing empty directories (not files as said previously), as noted by the update in 4.2.2.3. However, I've found it is also missing symlinks, both dangling (pointing to files that don't exist) and not. I can't see any actual data loss, which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset.) It looks like this should catch everything, but I'm wondering if anyone else has noticed any other things mmafmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: I know it was said at the user group meeting last week that older versions of AFM prefetch miss empty files and that this is now fixed in 4.2.2.3. We are in the middle of trying to migrate our files to a new filesystem, and since that was said I'm double-checking for any mistakes etc. Anyway, it looks like AFM prefetch also misses symlinks pointing to files that don't exist, i.e. "dangling symlinks", or ones that point to files that either have not been created yet or have subsequently been deleted, or where files have been decompressed and a symlink extracted that points somewhere that is never going to exist. I'm still checking this, and as yet it does not look like a data loss issue, but it could still cause things to not quite work once the file migration is complete. Does anyone else know of any other types of files that might be missed that I need to be aware of? 
We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch", using a GPFS policy to collect the list; we are using GPFS multi-cluster to connect the two filesystems, not NFS. Thanks in advance Peter Childs ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
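The workaround find command above can be exercised end to end on a throwaway tree. A small sketch (assumes GNU find for `-printf`; the temp-tree layout is invented): it creates one empty directory, one dangling symlink and one valid symlink, then lists exactly the objects the post says prefetch misses.

```shell
# Build a throwaway tree, then list empty directories and all symlinks
# (dangling or not), the way the find in the message does against the
# AFM cache fileset. Requires GNU find (-printf).
tmp=$(mktemp -d)
mkdir -p "$tmp/full" "$tmp/empty"          # one empty directory
touch "$tmp/full/file"
ln -s /nonexistent "$tmp/dangling"         # dangling symlink
ln -s "$tmp/full/file" "$tmp/good"         # valid symlink
find "$tmp" \( \( -type d -empty \) -o -type l \) -printf '%p -> %l\n'
rm -rf "$tmp"
```

For directories `%l` prints nothing, so empty directories show up with a bare `-> `, while each symlink line shows its target, making dangling targets easy to spot.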
Re: [gpfsug-discuss] Spectrum Scale Slow to create directories
Simon, We've managed to resolve this issue by switching quotas off, switching them back on again, and rebuilding the quota file. Can I check whether you run quotas on your cluster? See you in 2 weeks in Manchester. Thanks in advance. Peter Childs Research Storage Expert ITS Research Infrastructure Queen Mary, University of London Phone: 020 7882 8393 From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Simon Thompson (IT Research Support) <s.j.thomp...@bham.ac.uk> Sent: Tuesday, April 11, 2017 4:55:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper it had gone; maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so it might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so it's not some client trying to talk to it, so maybe there was some buggy code? Simon On 11/04/2017, 16:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf of Bryan Banister" <gpfsug-discuss-boun...@spectrumscale.org on behalf of bbanis...@jumptrading.com> wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-Original Message- >From: gpfsug-discuss-boun...@spectrumscale.org >[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. 
> >We currently have two Spectrum Scale file systems; both are running GPFS >4.2.1-1, and some of the servers have been upgraded to 4.2.1-2. > >The older one, which was upgraded from GPFS 3.5, works fine: creating a >directory is always fast and no issue. > >The new one, which has nice new SSDs for metadata and hence should be >faster, can take up to 30 seconds to create a directory but usually takes >less than a second. The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (It's new, so we've >not moved much of the data over yet.) But it can also happen randomly >anywhere, including from the NSD servers themselves. (Times of 3-4 >seconds from the NSD servers have been seen on a single directory create.) > >We've been pointed at the network and told to check all network >settings, and it's been suggested we build an admin network, but I'm not >sure I entirely understand why and how this would help. It's a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. > >However, as I say, the older filesystem is fine, and it does not matter whether >the nodes are connected to the old GPFS cluster or the new one (although >the delay is worst on the old gpfs cluster), so I'm really playing spot >the difference, and the network is not really an obvious difference. > >It's been suggested to look at a trace when it occurs, but as it's difficult >to recreate, collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >___ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. 
>If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. >___ >gpfsug-discuss m
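The quota off/on/rebuild cycle described at the top of this thread can be sketched as a command sequence. A hedged dry run: `gpfs0` is an invented device name, the exact order was not spelled out in the message, and `run()` echoes instead of executing, since these are admin commands against a live cluster.

```shell
# Dry-run sketch of the quota rebuild described above; swap 'echo' for
# real execution. 'gpfs0' is an invented placeholder device name.
run() { echo "$@"; }
run mmchfs gpfs0 -Q no       # switch quotas off
run mmchfs gpfs0 -Q yes      # switch them back on
run mmcheckquota gpfs0       # recount usage, rebuilding the quota entries
```

`mmcheckquota` is the documented way to recount usage into the quota files; the off/on toggle around it mirrors what the message reports doing.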
Re: [gpfsug-discuss] Spectrum Scale Slow to create directories
After a load more debugging, and switching off the quotas, the issue looks to be quota related, in that the issue has gone away since I switched quotas off. I will need to switch them back on, but at least we know the issue is not the network and is likely to be fixed by upgrading. Peter Childs ITS Research Infrastructure Queen Mary, University of London From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Peter Childs <p.chi...@qmul.ac.uk> Sent: Tuesday, April 11, 2017 8:35:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories Can you remember what version you were running? Don't worry if you can't remember. It looks like IBM may have withdrawn 4.2.1 from Fix Central and wish to forget its existence. Never a good sign; 4.2.0, 4.2.2, 4.2.3 and even 3.5 are there, so maybe upgrading is worth a try. I've looked at all the standard troubleshooting guides and got nowhere, hence why I asked. But another set of slides always helps. Thank you for the help; still head-scratching, which only makes the issue more random. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London Simon Thompson (IT Research Support) wrote: We actually saw this for a while on one of our clusters which was new. But by the time I'd got round to looking deeper it had gone; maybe we were using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2, so it might be worth trying to bump the version and see if it goes away. We saw it on the NSD servers directly as well, so it's not some client trying to talk to it, so maybe there was some buggy code? 
Simon On 11/04/2017, 16:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf of Bryan Banister" <gpfsug-discuss-boun...@spectrumscale.org on behalf of bbanis...@jumptrading.com> wrote: >There are so many things to look at and many tools for doing so (iostat, >htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc). I would >recommend a review of the presentation that Yuri gave at the most recent >GPFS User Group: >https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs > >Cheers, >-Bryan > >-Original Message- >From: gpfsug-discuss-boun...@spectrumscale.org >[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter >Childs >Sent: Tuesday, April 11, 2017 3:58 AM >To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> >Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories > >This is a curious issue which I'm trying to get to the bottom of. > >We currently have two Spectrum Scale file systems, both are running GPFS >4.2.1-1 some of the servers have been upgraded to 4.2.1-2. > >The older one which was upgraded from GPFS 3.5 works find create a >directory is always fast and no issue. > >The new one, which has nice new SSD for metadata and hence should be >faster. can take up to 30 seconds to create a directory but usually takes >less than a second, The longer directory creates usually happen on busy >nodes that have not used the new storage in a while. (Its new so we've >not moved much of the data over yet) But it can also happen randomly >anywhere, including from the NSD servers them selves. (times of 3-4 >seconds from the NSD servers have been seen, on a single directory create) > >We've been pointed at the network and suggested we check all network >settings, and its been suggested to build an admin network, but I'm not >sure I entirely understand why and how this would help. Its a mixed >1G/10G network with the NSD servers connected at 40G with an MTU of 9000. 
> >However as I say, the older filesystem is fine, and it does not matter if >the nodes are connected to the old GPFS cluster or the new one, (although >the delay is worst on the old gpfs cluster), So I'm really playing spot >the difference. and the network is not really an obvious difference. > >Its been suggested to look at a trace when it occurs but as its difficult >to recreate collecting one is difficult. > >Any ideas would be most helpful. > >Thanks > > > >Peter Childs >ITS Research Infrastructure >Queen Mary, University of London >___ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >
Re: [gpfsug-discuss] mmbackup logging issue
That's basically what we did. They are only environment variables, so if you're not using bash to call mmbackup you will need to change the lines accordingly. What they do is in the manual; the issue is that the default changed between versions. Peter Childs ITS Research Infrastructure Queen Mary, University of London From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sobey, Richard A <r.so...@imperial.ac.uk> Sent: Friday, March 3, 2017 9:20:24 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmbackup logging issue Hi all We have the same problem (less of a problem, more lack of visibility). Can I just add those lines to the top of our mmbackup.sh script? -Original Message- From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Ashish Thandavan Sent: 02 March 2017 16:50 To: gpfsug-discuss@spectrumscale.org Subject: Re: [gpfsug-discuss] mmbackup logging issue Dear Peter, On 02/03/17 16:34, Peter Childs wrote: > We had that issue. > > we had to > > export MMBACKUP_PROGRESS_CONTENT=5 > export MMBACKUP_PROGRESS_INTERVAL=300 > > before we ran it to get it back. > > Let's just say IBM changed the behaviour; we ended up opening a PMR to > get that answer ;) We also set -L 1 > > you can change how often the messages are displayed by changing > MMBACKUP_PROGRESS_INTERVAL - flexible, but the default is different ;) > I'll set those variables before kicking off the next mmbackup and hope that fixes it. Thank you!! 
Regards, Ash -- - Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thanda...@cs.ox.ac.uk ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
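Richard's question, "can I just add those lines to the top of our mmbackup.sh script?", answers itself as a small wrapper. A hedged sketch: the two variable values come from the thread, but the filesystem name `gpfs` is a placeholder and the `echo` keeps it a dry run (drop it to invoke mmbackup for real).

```shell
#!/bin/sh
# Sketch of a wrapper applying the thread's fix before calling mmbackup.
# These are plain environment variables, so they must be exported by
# whatever shell actually launches mmbackup.
export MMBACKUP_PROGRESS_CONTENT=5      # value from the thread: richer progress lines
export MMBACKUP_PROGRESS_INTERVAL=300   # progress message every 300 seconds
echo mmbackup gpfs -t incremental -L 1  # dry run; remove 'echo' to execute
```

Since they are environment variables rather than flags, a wrapper like this (or the equivalent `setenv` lines for csh) is all that is needed; nothing in the mmbackup invocation itself changes.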
Re: [gpfsug-discuss] mmbackup logging issue
We had that issue. We had to export MMBACKUP_PROGRESS_CONTENT=5 export MMBACKUP_PROGRESS_INTERVAL=300 before we ran it to get it back. Let's just say IBM changed the behaviour; we ended up opening a PMR to get that answer ;) We also set -L 1. You can change how often the messages are displayed by changing MMBACKUP_PROGRESS_INTERVAL - flexible, but the default is different ;) Peter Childs ITS Research Infrastructure Queen Mary, University of London From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Ashish Thandavan <ashish.thanda...@cs.ox.ac.uk> Sent: Tuesday, February 28, 2017 4:10:44 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] mmbackup logging issue Dear all, We have a small GPFS cluster and a separate server running TSM, and one of the three NSD servers backs up our GPFS filesystem to the TSM server using mmbackup. After a recent upgrade from v3.5 to 4.1.1, we've noticed that mmbackup no longer logs stuff like it used to : ... Thu Jan 19 05:45:41 2017 mmbackup:Backing up files: 0 backed up, 870532 expired, 2 failed. Thu Jan 19 06:15:41 2017 mmbackup:Backing up files: 0 backed up, 870532 expired, 3 failed. Thu Jan 19 06:45:41 2017 mmbackup:Backing up files: 0 backed up, 870532 expired, 3 failed. ... instead of ... Sat Dec 3 12:01:00 2016 mmbackup:Backing up files: 105030 backed up, 635456 expired, 30 failed. Sat Dec 3 12:31:00 2016 mmbackup:Backing up files: 205934 backed up, 635456 expired, 57 failed. Sat Dec 3 13:01:00 2016 mmbackup:Backing up files: 321702 backed up, 635456 expired, 169 failed. ... like it used to pre-upgrade. I am therefore unable to see how far along it has got, and indeed whether it completed successfully, as this is what it logs at the end of a job : ... Tue Jan 17 18:07:31 2017 mmbackup:Completed policy backup run with 0 policy errors, 10012 files failed, 0 severe errors, returning rc=9. 
Tue Jan 17 18:07:31 2017 mmbackup:Policy for backup returned 9 Highest TSM error 12 mmbackup: TSM Summary Information: Total number of objects inspected: 20617273 Total number of objects backed up: 0 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 1 Total number of objects failed: 10012 Total number of objects encrypted: 0 Total number of bytes inspected: 3821624716861 Total number of bytes transferred: 3712040943672 Tue Jan 17 18:07:31 2017 mmbackup:Audit files /cs/mmbackup.audit.gpfs* contain 0 failed paths but there were 10012 failures. Cannot reconcile shadow database. Unable to compensate for all TSM errors in new shadow database. Preserving previous shadow database. Run next mmbackup with -q to synchronize shadow database. exit 12 If it helps, the mmbackup job is kicked off with the following options : /usr/lpp/mmfs/bin/mmbackup gpfs -n 8 -t full -B 2 -L 1 --tsm-servers gpfs_weekly_stanza -N glossop1a | /usr/bin/tee /var/log/mmbackup/gpfs_weekly/backup_log.`date +%Y%m%d_%H_%M` (The excerpts above are from the backup_log. file.) Our NSD servers are running GPFS 4.1.1-11, TSM is at 7.1.1.100 and the File system version is 12.06 (3.4.0.3). Has anyone else seen this behaviour with mmbackup and if so, found a fix? Thanks, Regards, Ash -- - Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thanda...@cs.ox.ac.uk ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] AFM OpenFiles
4.2.1.1 on CentOS 7. So that might account for it. Thanks Peter Childs From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Venkateswara R Puvvada <vpuvv...@in.ibm.com> Sent: Thursday, February 9, 2017 3:10:58 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM OpenFiles What is the version of GPFS? There was an issue fixed in Spectrum Scale 4.2.2 for a file count (file_nr) leak. This issue mostly happens on Linux kernel versions >= 3.6. ~Venkat (vpuvv...@in.ibm.com) From: Peter Childs <p.chi...@qmul.ac.uk> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 02/09/2017 08:00 PM Subject: [gpfsug-discuss] AFM OpenFiles Sent by: gpfsug-discuss-boun...@spectrumscale.org We are trying to perform a file migration from our old GPFS cluster to our new GPFS cluster using AFM. Currently we have 142 AFM filesets set up, one for each fileset on the old cluster, and are attempting to prefetch the files in batches of 100,000 files with "mmafmctl home prefetch -j $fileset --list-file=$curfile --home-fs-path=/data/$fileset 2>&1" I'm doing this on a separate gateway node from our main GPFS servers and it's working quite well. However, there seems to be a leak in AFM with file handles, and after a couple of days of prefetch the gateway will run out of file handles and need rebooting before we can continue. We thought to begin with that this was improved by not doing --metadata-only on the prefetch (as we were attempting to get the metadata before getting the main data), but in truth the machine was just lasting a little longer. Does anyone know of any setting that may help this, or what is wrong? Thanks Peter Childs ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
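The batching described in this thread, splitting a policy-generated file list into 100,000-line chunks and issuing one prefetch per chunk, can be sketched as a loop. A hedged example: the fileset name, paths and fake list are invented, and `echo` keeps the mmafmctl call a dry run.

```shell
# Sketch of batching a prefetch list into 100,000-line chunks.
# All names are placeholders; remove 'echo' to actually issue the commands.
fileset=scratch01
workdir=$(mktemp -d)
seq 250000 | sed 's|^|/data/old/file|' > "$workdir/prefetch.list"   # fake list
split -l 100000 -d "$workdir/prefetch.list" "$workdir/part."        # -> 3 chunks
for curfile in "$workdir"/part.*; do
    echo mmafmctl home prefetch -j "$fileset" \
        --list-file="$curfile" --home-fs-path="/data/$fileset"
done
rm -rf "$workdir"
```

Running the batches sequentially like this also makes it easy to pause between chunks, which may matter here given the file-handle leak the thread reports on the gateway node.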
Re: [gpfsug-discuss] AFM Migration Issue
Interesting. I'm currently doing something similar, but am only using read-only mode to premigrate the filesets. The directory timestamps don't agree with the original, but neither are they all marked with when they were migrated, so there is something very weird going on. (We're planning to switch them to local update when we move the users over to them.) We're using mmapplypolicy on our old GPFS cluster to get the files to migrate, and have noticed that you need an ESCAPE '%/' clause on the RULE EXTERNAL LIST line, otherwise files with % in the filenames don't get migrated and throw errors. I'm trying to work out whether empty directories, or those containing only empty directories, get migrated correctly, as you can't list them in the mmafmctl prefetch statement. (If you try, using DIRECTORIES_PLUS, they throw errors.) I am very interested in the solution to this issue. Peter Childs Queen Mary, University of London From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of paul.tomlin...@awe.co.uk <paul.tomlin...@awe.co.uk> Sent: Monday, January 9, 2017 3:09:43 PM To: gpfsug-discuss@spectrumscale.org Subject: [gpfsug-discuss] AFM Migration Issue Hi All, We have just completed the first data move from our old cluster to the new one using AFM local update as per the guide; however, we have noticed that all date stamps on the directories have the date they were created on (e.g. 9th Jan 2017), not the date from the old system (e.g. 14th April 2007), whereas all the files have the correct dates. Has anyone else seen this issue, as we now have to convert all the directory dates to their original dates! The information in this email and in any attachment(s) is commercial in confidence. If you are not the named addressee(s) or if you receive this email in error then any distribution, copying or use of this communication or the information in it is strictly prohibited. 
Please notify us immediately by email at admin.internet(at)awe.co.uk, and then delete this message from your computer. While attachments are virus checked, AWE plc does not accept any liability in respect of any virus which is not detected. AWE Plc Registered in England and Wales Registration No 02763902 AWE, Aldermaston, Reading, RG7 4PR ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
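The ESCAPE '%/' clause mentioned above sits on the EXTERNAL LIST rule in the policy file. A hedged sketch of what such a policy might look like: the list name, the EXEC script path, and the catch-all LIST rule are invented; the point is the ESCAPE clause, where '%' turns on percent-encoding of special characters in the generated path names (so '%' in filenames round-trips safely) and the trailing '/' keeps path separators literal for readability.

```sql
/* Hedged sketch, not the thread's actual policy. Only the ESCAPE
   clause is the point; everything else is a placeholder. */
RULE EXTERNAL LIST 'tomigrate' EXEC '/usr/local/bin/prefetch-batch.sh' ESCAPE '%/'
RULE 'everything' LIST 'tomigrate' DIRECTORIES_PLUS
```

Without the ESCAPE clause, a literal '%' in a filename collides with the encoding of the generated list and, as the message reports, those files fail to migrate.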
Re: [gpfsug-discuss] LROC
So you're saying maxStatCache should be raised on LROC-enabled nodes only, as that is the only place under Linux it is used, and it should be set low on non-LROC-enabled nodes. Fine, just good to know; nice and easy now with node classes. Peter Childs From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme <oeh...@gmail.com> Sent: Wednesday, December 21, 2016 11:37:46 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC StatCache is not useful on Linux; that hasn't changed if you don't use LROC on the same node. LROC uses the compact object (StatCache) to store its pointer to the full file object, which is stored on the LROC device. So on a call for attributes that are not in the StatCache, the object gets recalled from LROC and converted back into a full file object, which is why you still need a reasonable maxFilesToCache setting even if you use LROC, as you otherwise constantly move file info in and out of LROC and put the device under heavy load. sven On Wed, Dec 21, 2016 at 12:29 PM Peter Childs <p.chi...@qmul.ac.uk> wrote: My understanding was that maxStatCache was only used on AIX and should be set low on Linux, as raising it didn't help and wasted resources. Are we saying that LROC now uses it, and that setting it low when you raise maxFilesToCache under Linux is no longer the advice? Peter Childs From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme <oeh...@gmail.com> Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC LROC only needs a StatCache object, as it 'compacts' a full open file object (maxFilesToCache) to a StatCache object when it moves the content to the LROC device. 
Therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFilesToCache objects, so leave that untouched and just increase maxStatCache. Olaf's comment is important: you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there and you have enough, it's well worth spending a lot of memory on it and bumping maxStatCache to a high number. I have tested maxStatCache up to 16 million per node at some point, but if nodes with this large amount of inodes crash, or you try to shut them down, you have some delays; therefore I suggest you stay within 1 or 2 million per node and see how well it does and also whether you get a significant gain. I did help Bob to set up some monitoring for it so he can actually get comparable stats; I suggest you set up Zimon and enable the LROC sensors to have real stats too, so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil <mw...@wustl.edu> wrote: as many as possible and both have maxFilesToCache 128000 and maxStatCache 4 do these affect what sits on the LROC as well? Are those too small? 1 million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how many files do you want to cache? and do you only want to cache metadata or also data associated with the files? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil <mw...@wustl.edu> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? 
Thanks Matt ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
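The conclusion of the thread above (high maxStatCache on LROC nodes only, low elsewhere) can be sketched with node classes, as Peter notes. A hedged example — the class and node names are invented, the values follow Sven's 1-2 million guidance, and changes like these generally need a GPFS restart on the affected nodes to take effect:

```shell
# Put the LROC-enabled nodes into their own node class (names invented)
mmcrnodeclass lrocNodes -N lrocnode01,lrocnode02

# Keep maxStatCache low by default, per the pre-LROC Linux advice ...
mmchconfig maxStatCache=512

# ... but raise it on the LROC nodes only; leave maxFilesToCache at its
# already-tuned value, since LROC still needs full file objects
mmchconfig maxStatCache=1000000 -N lrocNodes
```

Remember Sven's other point: the token memory on the manager nodes has to cover every object cached across the cluster, so size that first.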
Re: [gpfsug-discuss] LROC
My understanding was that maxStatCache was only used on AIX and should be set low on Linux, as raising it didn't help and wasted resources. Are we saying that LROC now uses it, and that setting it low when you raise maxFilesToCache under Linux is no longer the advice? Peter Childs From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme <oeh...@gmail.com> Sent: Wednesday, December 21, 2016 9:23:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LROC LROC only needs a StatCache object, as it 'compacts' a full open file object (maxFilesToCache) to a StatCache object when it moves the content to the LROC device. Therefore the only thing you really need to increase is maxStatCache on the LROC node, but you still need maxFilesToCache objects, so leave that untouched and just increase maxStatCache. Olaf's comment is important: you need to make sure your manager nodes have enough memory to hold tokens for all the objects you want to cache, but if the memory is there it's well worth spending a lot of memory on it and bumping maxStatCache to a high number. I have at some point tested maxStatCache up to 16 million per node, but if nodes with this large a number of cached inodes crash, or you try to shut them down, you see some delays; therefore I suggest you stay within 1 or 2 million per node and see how well it does, and also whether you get a significant gain. I did help Bob to set up some monitoring for it so he can actually get comparable stats; I suggest you set up Zimon and enable the LROC sensors to have real stats too, so you can see what benefits you get. Sven On Tue, Dec 20, 2016 at 8:13 PM Matt Weil <mw...@wustl.edu> wrote: as many as possible, and both have maxFilesToCache 128000 and maxStatCache 4. Do these affect what sits on the LROC as well? Are those too small? 1 million seemed excessive. On 12/20/16 11:03 AM, Sven Oehme wrote: how many files do you want to cache?
and do you only want to cache metadata or also data associated with the files? sven On Tue, Dec 20, 2016 at 5:35 PM Matt Weil <mw...@wustl.edu> wrote: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Flash%20Storage Hello all, Are there any tuning recommendations to get these to cache more metadata? Thanks Matt
Re: [gpfsug-discuss] Using AFM to migrate files. (Peter Childs)
Bill Pappas wrote > >>the largest of the filesets has 52TB and 63 million files > > > Are you using NFS as the transport path between the home and cache? No plans to; I was planning to use GPFS multi-cluster as the transport. > If you are using NFS, how are you producing the list of files to migrate? > mmafmctl with the prefetch option? If so, I would measure the time it takes > for that command (with that option) to produce the list of files it intends > to prefetch. From my experience, this is very important as a) it can take a > long time if you have >10 million files and b) I've seen this operation > crash when the list grew large. Does anyone else on this thread have any > experiences? I would love to hear positive experiences as well. I tried so > hard and for so long to make AFM work with one customer, but we gave up as it > was not reliable and stable for large-scale (many files) migrations. > If you are using GPFS as the conduit between the home and cache (i.e. no > NFS), I would still ask the same question, more with respect to stability for > large file lists during the initial prefetch stages. I was planning to use a GPFS policy to create the list, but I guess a find should work; I'm guessing you're saying don't migrate the files in bulk by using a find onto cache. It would be nice to see some example recipes to prefetch files into AFM. > > > As far as I could tell, from GPFS 3.5 to 4.2, the phases of prefetch where > the home and cache are compared (i.e. let's make a list of what is to be > migrated over) before the data transfer begins only run on the GW node > managing that cache. It does not leverage multiple GW nodes and multiple > home nodes to speed up this 'list and find' stage of prefetch. I hope some > AFM developers can clarify or correct my findings. This was a huge > impediment for large file migrations where it is difficult (organizationally, > not technically) to split a folder structure into multiple filesets.
> The lack of stability under these large scans was the real failing for us. Interesting. > Bill Pappas > 901-619-0585 > bpap...@dstonline.com Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London > From: gpfsug-discuss-boun...@spectrumscale.org on behalf of gpfsug-discuss-requ...@spectrumscale.org > Sent: Thursday, October 20, 2016 2:07 PM > To: gpfsug-discuss@spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 57, Issue 53 > Message: 1 > Date: Thu, 20 Oct 2016 19:07:44 > From: Peter Childs <p.chi...@qmul.ac.uk> > To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> > Subject: Re: [gpfsug-discuss] Using AFM to migrate files.
(Peter Childs) > Yes, most of the filesets are based on research groups, projects or departments, with the exception of scratch and home, hence the idea to use a different method for these filesets. > There are ap
Re: [gpfsug-discuss] Using AFM to migrate files.
Yes, but not a great deal. Peter Childs Research Storage Expert ITS Research Infrastructure Queen Mary, University of London From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Yaron Daniel <y...@il.ibm.com> Sent: Thursday, October 20, 2016 7:15:54 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Using AFM to migrate files. Hi Do you use NFSv4 ACLs in your old cluster? Regards Yaron Daniel, Server, Storage and Data Services - Team Leader, Global Technology Services, IBM Israel, 94 Em Ha'Moshavot Rd, Petach Tiqva, 49527. Phone: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: y...@il.ibm.com From: Peter Childs <p.chi...@qmul.ac.uk> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 10/19/2016 05:34 PM Subject: [gpfsug-discuss] Using AFM to migrate files. Sent by: gpfsug-discuss-boun...@spectrumscale.org We are planning to use AFM to migrate our old GPFS file store to a new GPFS file store. This will give us the advantages of Spectrum Scale (GPFS) 4.2, such as larger block and inode sizes. I would like to attempt to gain some insight on my plans before I start. The old file store was running GPFS 3.5 with 512-byte inodes and 1MB block size. We have now upgraded it to 4.1 and are working towards 4.2, with 300TB of files (385TB max space); this is so we can use both the old and new storage via multi-cluster. We are moving to a new GPFS cluster so we can use the new protocol nodes eventually and also make the new storage machines the cluster managers, as this should be faster and future proof. The new hardware has 1PB of space running GPFS 4.2. We have multiple filesets, and would like to maintain our namespace as far as possible. My plan was to: 1. Create a read-only (RO) AFM cache on the new storage. 2a.
Move old fileset and replace with symlink to new. 2b. Convert RO AFM to Local Update (LU) AFM pointing to new parking area of old files. 2c. Move user access to new location in cache. 3. Flush everything into cache and disconnect. I've read the docs, including the ones on migration, but it's not clear if it's safe to move the home of a cache and update the target. It looks like it should be possible, and my tests say it works. An alternative plan is to use an Independent Writer (IW) AFM cache to move the home directories, which are pointed to by LDAP. Hence we can move users one at a time and only have to drain the HPC cluster at the end to disconnect the cache. I assume that migrating users over an Independent Writer is safe so long as the users don't use both sides of the cache at once (i.e. home and target). I'm also interested in any recipes people have on GPFS policies to preseed and flush the cache. We plan to do all the migration using AFM over GPFS; we're not currently using NFS and have no plans to start. I believe using GPFS is the faster method to perform the migration. Any suggestions and experience of doing similar migration jobs would be helpful. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London
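Since example recipes for preseeding the cache are requested above, here is a hedged sketch of the usual policy-plus-prefetch approach. The file system, fileset, and path names are hypothetical, and the exact mmafmctl prefetch option spelling has changed between releases, so check it against your level first:

```shell
# 1. Generate the candidate file list with a parallel policy scan.
#    The EXTERNAL LIST rule with an empty EXEC plus -I defer makes
#    mmapplypolicy write the list to a file instead of acting on it.
cat > /tmp/prefetch.pol <<'EOF'
RULE EXTERNAL LIST 'afm' EXEC ''
RULE 'all' LIST 'afm'
EOF
mmapplypolicy /newfs/fileset1 -P /tmp/prefetch.pol -I defer -f /tmp/pre

# 2. Hand the generated list (/tmp/pre.list.afm) to AFM prefetch
#    for the cache fileset
mmafmctl newfs prefetch -j fileset1 --list-file /tmp/pre.list.afm
```

Bear in mind the caveat from Bill earlier in the thread: the compare/queue stage of prefetch runs on the single gateway node owning the cache, so trial this on a small fileset before pointing it at 63 million files.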
Re: [gpfsug-discuss] Hardware refresh
My reading is: if you are running a small cluster with tie-breaker disks and you want to change the manager servers, or you want to switch to using the new config management method in v4, then build a new cluster and use multi-cluster to upgrade. Otherwise just use a new filesystem within the old cluster. But I'm interested to hear otherwise, as I'm about to embark on this myself. I note you can switch an old cluster but need to shut down to do so. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London Marc A Kaplan wrote New FS? Yes, there are some good reasons. New cluster? I did not see a compelling argument either way. From: "mark.b...@siriuscom.com" <mark.b...@siriuscom.com> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 10/11/2016 03:34 PM Subject: Re: [gpfsug-discuss] Hardware refresh Sent by: gpfsug-discuss-boun...@spectrumscale.org Ok. I think I am hearing that a new cluster with a new FS and copying data from old to new cluster is the best way forward. Thanks everyone for your input. From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Yuri L Volobuev <volob...@us.ibm.com> Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: Tuesday, October 11, 2016 at 12:22 PM To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Subject: Re: [gpfsug-discuss] Hardware refresh This depends on the committed cluster version level (minReleaseLevel) and file system format. Since NSDv2 is an on-disk format change, older code wouldn't be able to understand what it is, and thus if there's a possibility of a downlevel node looking at the NSD, the NSDv1 format is going to be used. The code does NSDv1<->NSDv2 conversions under the covers as needed when adding an empty NSD to a file system. I'd strongly recommend getting a fresh start by formatting a new file system. Many things have changed over the course of the last few years.
In particular, having a 4K-aligned file system can be a pretty big deal, depending on what hardware one is going to deploy in the future, and this is something that can't be bolted onto an existing file system. Having 4K inodes is very handy for many reasons. New directory format and NSD format changes are attractive, too. And disks generally tend to get larger with time, and at some point you may want to add a disk to an existing storage pool that's larger than the existing allocation map format allows. Obviously, it's more hassle to migrate data to a new file system, as opposed to extending an existing one. In a perfect world, GPFS would offer a conversion tool that seamlessly and robustly converts old file systems, making them as good as new, but in the real world such a tool doesn't exist. Getting a clean slate by formatting a new file system every few years is a good long-term investment of time, although it comes front-loaded with extra work. yuri From: Aaron Knister <aaron.s.knis...@nasa.gov> To: <gpfsug-discuss@spectrumscale.org> Date: 10/10/2016 04:45 PM Subject: Re: [gpfsug-discuss] Hardware refresh Sent by: gpfsug-discuss-boun...@spectrumscale.org Can one format NSDv2 NSDs and put them in a filesystem with NSDv1 NSDs? -Aaron On 10/10/16 7:40 PM, Luis Bolinches wrote: > Hi > > Creating a new FS sounds like the best way to go. NSDv2 being a very good > reason to do so. > > AFM for migrations is quite good; the latest versions allow using the NSD > protocol for mounts as well.
Olaf did a great job explaining this > scenario in the redbook, chapter 6 > > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open > > -- > Cheers > > On 10 Oct 2016, at 23.05, Buterbaugh, Kevin L > <kevin.buterba...@vanderbilt.edu> wrote: > >> Hi Mark, >> >> The last time we did something like this was 2010 (we’re doing rolling >> refreshes now), so there are probably lots of better ways to do this >> than what we did, but we: >> >> 1) set up the new hardware >> 2) created new filesystems (so that we could make adjustments we >> wanted to make that can only be made at FS creation time) >> 3) used rsync to make a 1st-pass copy of everything >> 4) coordinated a time with users / groups to do a 2nd rsync when they >> weren’t active >> 5) used symbolic links during the transition (i.e. rm -rvf >> /gpfs0/home/joeuser; ln -s /gpfs2/home/joeuser /gpfs0/home/joeuser) >> 6) once everybody was migrated, u
Re: [gpfsug-discuss] GPFS Upgrade 3.5 -> 4.1
So in short we're saying: "mmchfs -V LATEST" increments a version number and allows new features to become possible; it does not start using them straight away. Hence directories will shrink in 4.1, but you need to run "mmchattr --compact" on all the old ones before anything actually changes (new ones are fine). Increasing the version number makes this possible, but it does not actually do it, as doing it would mean walking every directory and updating stuff. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London Yuri L Volobuev wrote Correct. mmchfs -V only does quick operations (that can be easily undone if something goes wrong). Essentially the big task here is to increase the on-disk file system descriptor version number, to allow using those features that require a higher version. Bigger "conversion"-style tasks belong in mmmigratefs. The only way to increase the inode size and the data block size is to format a new file system. This cannot be done on an existing file system. yuri From: Jan-Frode Myklebust <janfr...@tanso.net> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 10/10/2016 07:35 AM Subject: Re: [gpfsug-discuss] GPFS Upgrade 3.5 -> 4.1 Sent by: gpfsug-discuss-boun...@spectrumscale.org I've also always been worried about that one, but never experienced it taking any time, I/O or interruption. I've interpreted it to just start using new features, but not really changing anything with the existing metadata. Things needing on-disk changes are probably put in mmmigratefs. I have not heard about anything needing mmmigratefs since GPFS v3.3 (fs version 11.03) added fast extended attributes.
Would be great to hear otherwise, or confirmations. -jf man. 10. okt. 2016 kl. 14.32 skrev Peter Childs <p.chi...@qmul.ac.uk>: We are finishing upgrading our GPFS cluster of around 250 (client) nodes from GPFS 3.5.0.31 to Spectrum Scale 4.1.1.8, and have just about upgraded all the computers. We are looking at running the "mmchfs -V LATEST" step and were wondering how much I/O this takes and whether it is likely to interrupt service? We are looking at upgrading to 4.2, but plan to do that via multi-cluster and AFM, as we are integrating new hardware and wish to increase the block and inode size at the same time. Peter Childs Research Storage Expert ITS Research Infrastructure Queen Mary, University of London
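To make the version-bump vs. actual-conversion distinction above concrete, a hedged sketch (the file system name is invented; on later releases the spelling is "-V full" rather than "-V LATEST", and you should trial the compact step on a single directory before sweeping the whole tree):

```shell
# Quick, metadata-descriptor-only: raise the on-disk format version so
# features of the new release become possible
mmchfs gpfs0 -V full

# Slow, optional, per-directory: actually compact pre-existing
# directories. Directories created after the upgrade are already in
# the new format; only the old ones need walking.
find /gpfs0 -type d -exec mmchattr --compact {} +
```

A policy scan with mmapplypolicy selecting directories would do the same walk in parallel, which matters on file systems with many millions of directories.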
[gpfsug-discuss] OOM Killer killing off GPFS 3.5
Hi All, We have an issue where Linux kills off GPFS first when a computer runs out of memory. We are running GPFS 3.5. We believe this happens when user processes have exhausted memory and swap, and the out-of-memory (OOM) killer in Linux chooses to kill the GPFS daemon as the largest user of memory, due to its large pinned memory footprint. This means that GPFS is killed and the whole cluster blocks for a minute before it resumes operation; this is not ideal, and causes issues with most of the cluster. What we see is users unable to log in elsewhere on the cluster until we have powered off the node. We believe this is because while the node is still pingable, GPFS doesn't expel it from the cluster. This issue mainly occurs on the login nodes of our HPC cluster but can affect the rest of the cluster when it occurs. I've seen others on the list with this issue. We've come up with a solution to adjust the OOM score of GPFS, so that it is unlikely to be the first thing to be killed, and hopefully the OOM killer picks a user process instead. We've tested this and it seems to work. I'm asking here firstly to share our knowledge and secondly to ask if there is anything we've missed with this solution.
It's short, which is part of its beauty. /usr/local/sbin/gpfs-oom_score_adj:

#!/bin/bash
# Lower the OOM score of every GPFS process (pgrep mmfs matches mmfsd
# and the other mmfs* daemons) so the kernel's OOM killer prefers to
# kill user processes instead.
for proc in $(pgrep mmfs); do
    echo -500 > "/proc/$proc/oom_score_adj"
done

This can then be called automatically on GPFS startup with the following: mmaddcallback startupoomkiller --command /usr/local/sbin/gpfs-oom_score_adj --event startup and either restart GPFS or just run the script on all nodes. Peter Childs ITS Research Infrastructure Queen Mary, University of London
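For anyone wanting to see the oom_score_adj mechanism the script relies on without GPFS installed, here is a minimal hypothetical demo (not from the original mail). Note that an unprivileged process may only raise its own score; lowering it to -500, as the GPFS script does, requires root (CAP_SYS_RESOURCE):

```shell
# Demonstrate the per-process oom_score_adj knob on the current shell.
# 1000 is the maximum the kernel accepts (range is -1000..1000), and
# raising the score needs no privileges.
echo 1000 > /proc/$$/oom_score_adj
# Read the new score back; children forked from here inherit it, which
# is why adjusting mmfsd once covers its worker threads too
cat /proc/$$/oom_score_adj
```

A value of -1000 would exempt a process from the OOM killer entirely; -500 merely makes it a much less attractive victim, which is the safer choice here since an unkillable runaway daemon is its own problem.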