Re: [gpfsug-discuss] Inode scan optimization - (tomasz.wol...@ts.fujitsu.com )
Let's give Fujitsu an opportunity to answer with some facts and re-pose their questions. When I first read the complaint, I kinda assumed they were using mmbackup and TSM -- but then I noticed words about some gpfs_XXX APIs. So it looks like this Fujitsu fellow is "rolling his own"... NOT using mmapplypolicy. And we don't know if he is backing up to an old paper-tape punch device or what! He's just saying that whatever it is that he did took 60 days... Can you get from here to there faster? Sure, take an airplane instead of walking!

My other remark, which had a typo, was and is: there have been many satisfied customers and installations of the Spectrum Scale file system using mmbackup and/or Tivoli Storage Manager.
Re: [gpfsug-discuss] Inode scan optimization - (tomasz.wol...@ts.fujitsu.com )
On Thu, 08 Feb 2018 10:33:13 -0500, "Marc A Kaplan" said:
> Please clarify and elaborate. When you write "a full backup ... takes
> 60 days" - that seems very poor indeed.
> BUT you haven't stated how much data is being copied to what kind of
> backup media, nor how much equipment or what types you are using... Nor
> which backup software...
>
> We have Spectrum Scale installations doing nightly backups of huge file
> systems using the mmbackup command with Tivoli Storage Manager backup,
> using IBM-branded or approved equipment and software.

How long did the *first* TSM backup take? Remember that TSM does the moral equivalent of a 'full' backup at first, and incrementals thereafter. So it's quite possible for a very large filesystem with little data churn to do incrementals in 5-6 hours, even though the first one took several weeks.
Re: [gpfsug-discuss] Inode scan optimization - (tomasz.wol...@ts.fujitsu.com )
Please clarify and elaborate. When you write "a full backup ... takes 60 days" - that seems very poor indeed.

BUT you haven't stated how much data is being copied to what kind of backup media, nor how much equipment or what types you are using... Nor which backup software...

We have Spectrum Scale installations doing nightly backups of huge file systems using the mmbackup command with Tivoli Storage Manager backup, using IBM-branded or approved equipment and software.

From: "tomasz.wol...@ts.fujitsu.com"
To: "gpfsug-discuss@spectrumscale.org"
Date: 02/08/2018 05:50 AM
Subject: [gpfsug-discuss] Inode scan optimization
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hello All, A full backup of a 2-billion-inode Spectrum Scale file system on V4.1.1.16 takes 60 days. ...
Re: [gpfsug-discuss] Inode scan optimization
Recall that many years ago we demonstrated a billion files scanned with mmapplypolicy in under 20 minutes... And that was on spinning disks that were ordinary at the time (not SSD!)... Granted, we packed about 1000 files per directory and made some other choices that might not be typical usage; OTOH, storage and nodes have improved since then...

SO, when you say it takes 60 days to back up 2 billion files and that's a problem... Like any large computing job, one has to do some analysis to find out which parts of the job are taking how much time.

So... what commands are you using to do the backup? What timing statistics or measurements have you collected?

If you are using mmbackup and/or mmapplypolicy, those commands can show you how much time they spend scanning the file system looking for files to back up AND then how much time they spend copying the data to backup media. In fact they operate in distinct phases... directory scan, inode scan, THEN data copying... so it's straightforward to see which phases are taking how much time.

OH... I see you also say you are using gpfs_stat_inode_with_xattrs64. These APIs are tricky and not a panacea. That's why we provide you with mmapplypolicy, which in fact uses those APIs in clever, patented ways -- optimized and honed with years of work. And more recently, we provided you with samples/ilm/mmfind -- which has the functionality of the classic Unix find command, but runs in parallel, using mmapplypolicy. TRY IT on your file system!

From: "tomasz.wol...@ts.fujitsu.com"
To: "gpfsug-discuss@spectrumscale.org"
Date: 02/08/2018 05:50 AM
Subject: [gpfsug-discuss] Inode scan optimization
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hello All, A full backup of a 2-billion-inode Spectrum Scale file system on V4.1.1.16 takes 60 days. ...
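For readers who want to see the shape of the APIs being discussed, below is a minimal sketch of a whole-filesystem inode scan in C, assuming the gpfs.h header and libgpfs from a 4.x installation (built roughly as "cc iscan.c -lgpfs"). The mount point is a placeholder and error handling is abbreviated; treat it as an illustration of the call sequence, not a tested recipe.

#include <stdio.h>
#include <gpfs.h>

int main(int argc, char **argv)
{
    /* Mount point is a placeholder; pass your own as argv[1]. */
    const char *fsPath = (argc > 1) ? argv[1] : "/gpfs/fs0";
    gpfs_fssnap_handle_t *fsh;
    gpfs_iscan_t *scan;
    gpfs_ino64_t maxIno = 0;
    const gpfs_iattr64_t *iattr;

    fsh = gpfs_get_fssnaphandle_by_path(fsPath);
    if (fsh == NULL) {
        perror("gpfs_get_fssnaphandle_by_path");
        return 1;
    }

    /* NULL prevSnapId = scan the whole inode file, not a snapshot delta;
     * maxIno comes back as the highest inode number to expect. */
    scan = gpfs_open_inodescan64(fsh, NULL, &maxIno);
    if (scan == NULL) {
        perror("gpfs_open_inodescan64");
        gpfs_free_fssnaphandle(fsh);
        return 1;
    }

    /* Walk the inodes in ascending order; *iattr comes back NULL at the end. */
    while (gpfs_next_inode64(scan, maxIno, &iattr) == 0 && iattr != NULL) {
        printf("inode %lld size %lld\n",
               (long long)iattr->ia_inode, (long long)iattr->ia_size);
    }

    gpfs_close_inodescan(scan);
    gpfs_free_fssnaphandle(fsh);
    return 0;
}

This single-threaded loop is essentially what mmapplypolicy parallelizes across threads and nodes (together with the directory scan), which is part of why the command usually beats a hand-rolled walker.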
Re: [gpfsug-discuss] Inode scan optimization
You mention that all the NSDs are metadata and data, but you do not say how many NSDs are defined or the type of storage used, that is, are these on SAS or NL-SAS storage? I'm assuming they are not on SSD/flash storage. Have you considered moving the metadata to separate NSDs, preferably SSD/flash storage? This is likely to give you a significant performance boost.

You state that using the inode scan API you reduced the time to 40 days. Did you analyze your backup application to determine where the time was being spent for the backup? If the inode scan is a small percentage of your backup time, then optimizing it will not provide much benefit.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821 | sto...@us.ibm.com

From: "tomasz.wol...@ts.fujitsu.com"
To: "gpfsug-discuss@spectrumscale.org"
Date: 02/08/2018 05:50 AM
Subject: [gpfsug-discuss] Inode scan optimization
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hello All, A full backup of a 2-billion-inode Spectrum Scale file system on V4.1.1.16 takes 60 days. ...
[gpfsug-discuss] Inode scan optimization
Hello All,

A full backup of a 2-billion-inode Spectrum Scale file system on V4.1.1.16 takes 60 days. We are trying to optimize, and using inode scans seems to help: even when we use a directory scan, with the inode scan only to get stat information faster (via gpfs_stat_inode_with_xattrs64), running 20 processes in parallel doing directory scans (plus inode scans for the stat info) has decreased the time to 40 days. All NSDs are of the dataAndMetadata type.

I have the following questions:

* Is there a way to increase the inode scan cache (we may use 32 GByte)?
  - Can we use the "hidden" config parameters:
      iscanPrefetchAggressiveness 2
      iscanPrefetchDepth 0
      iscanPrefetchThreadsPerNode 0
* Is there documentation on the cache behavior?
  - If not, is the inode scan cache process-specific or node-specific?
  - Is there a suggestion for optimizing the termIno parameter of gpfs_stat_inode_with_xattrs64() in such a use case?

Thanks!

Best regards,
Tomasz Wolski
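On the termIno question, here is a sketch in C of the directory-scan-plus-stat pattern described above, assuming the gpfs_stat_inode_with_xattrs64() signature from 4.x gpfs.h. The LOOKAHEAD window and the next_ino_from_dirscan() helper are hypothetical illustrations, not GPFS APIs or IBM guidance; the premise, which should be checked against the documentation, is that termIno bounds how far ahead the scan may prefetch inode blocks, so keeping it near the inodes actually requested avoids reading metadata the directory walk will never ask for.

#include <stdio.h>
#include <gpfs.h>

#define LOOKAHEAD 1024  /* hypothetical prefetch window; tune empirically */

/* Hypothetical helper: yields this worker's inode numbers in ascending
 * order (the scan only moves forward), returning 0 when exhausted. */
extern gpfs_ino64_t next_ino_from_dirscan(void);

int stat_worker(const char *fsPath)
{
    gpfs_fssnap_handle_t *fsh = gpfs_get_fssnaphandle_by_path(fsPath);
    gpfs_iscan_t *scan;
    gpfs_ino64_t maxIno = 0, ino;

    if (fsh == NULL)
        return -1;
    scan = gpfs_open_inodescan64(fsh, NULL, &maxIno);  /* NULL: no snapshot delta */
    if (scan == NULL) {
        gpfs_free_fssnaphandle(fsh);
        return -1;
    }

    while ((ino = next_ino_from_dirscan()) != 0) {
        const gpfs_iattr64_t *iattr;
        const char *xattrBuf;
        unsigned int xattrBufLen;
        /* Keep the prefetch bound near the inode we actually want, rather than
         * passing maxIno every time; the +LOOKAHEAD heuristic is an assumption. */
        gpfs_ino64_t termIno = (ino + LOOKAHEAD < maxIno) ? ino + LOOKAHEAD : maxIno;

        if (gpfs_stat_inode_with_xattrs64(scan, ino, termIno,
                                          &iattr, &xattrBuf, &xattrBufLen) != 0)
            break;
        if (iattr == NULL)  /* inode not allocated, or past termIno */
            continue;
        printf("inode %lld mode %o size %lld\n",
               (long long)iattr->ia_inode, (unsigned)iattr->ia_mode,
               (long long)iattr->ia_size);
    }

    gpfs_close_inodescan(scan);
    gpfs_free_fssnaphandle(fsh);
    return 0;
}

Since an inode scan only moves forward through the inode space, each of the 20 worker processes would run its own scan over an ascending stream of inode numbers; how wide to make the lookahead window is exactly the tuning question posed above and would have to be measured.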