Re: [gpfsug-discuss] Inode scan optimization - (tomasz.wol...@ts.fujitsu.com )

2018-02-08 Thread Marc A Kaplan
Let's give Fujitsu an opportunity to answer with some facts and re-pose 
their questions. 

When I first read the complaint, I kinda assumed they were using mmbackup 
and TSM -- but then I noticed words about some gpfs_XXX APIs... So it 
looks like this Fujitsu fellow is "rolling his own"... NOT using 
mmapplypolicy.  And we don't know if he is backing up to an old paper tape 
punch device or what!
He's just saying that whatever it is that he did took 60 days...   Can you 
get from here to there faster? Sure, take an airplane instead of walking!

My other remark, which had a typo, was and is:

There have been many satisfied customers and installations of the Spectrum 
Scale File System using mmbackup and/or Tivoli Storage Manager.




Re: [gpfsug-discuss] Inode scan optimization - (tomasz.wol...@ts.fujitsu.com )

2018-02-08 Thread valdis . kletnieks
On Thu, 08 Feb 2018 10:33:13 -0500, "Marc A Kaplan" said:

> Please clarify and elaborate... When you write "a full backup ... takes
> 60 days" - that seems very poor indeed.
> BUT you haven't stated how much data is being copied to what kind of
> backup media, nor how much equipment or what types you are using... nor
> which backup software...
>
> We have Spectrum Scale installations doing nightly backups of huge file
> systems using the mmbackup command with Tivoli Storage Manager, using
> IBM-branded or approved equipment and software.

How long did the *first* TSM backup take?  Remember that TSM does the moral
equivalent of a 'full' backup at first, and incrementals thereafter.  So it's
quite possible for a very large filesystem with little data churn to do
incrementals in 5-6 hours, even though the first one took several weeks.




Re: [gpfsug-discuss] Inode scan optimization - (tomasz.wol...@ts.fujitsu.com )

2018-02-08 Thread Marc A Kaplan
Please clarify and elaborate... When you write "a full backup ... takes 
60 days" - that seems very poor indeed.
BUT you haven't stated how much data is being copied to what kind of 
backup media, nor how much equipment or what types you are using... nor 
which backup software...

We have Spectrum Scale installations doing nightly backups of huge file 
systems using the mmbackup command with Tivoli Storage Manager, using 
IBM-branded or approved equipment and software.





Re: [gpfsug-discuss] Inode scan optimization

2018-02-08 Thread Marc A Kaplan
Recall that many years ago we demonstrated a billion files scanned with 
mmapplypolicy in under 20 minutes...
And that was on ordinary (for the time) spinning disks, not SSDs! Granted, 
we packed about 1000 files per directory and made some other choices that 
might not be typical usage... OTOH, storage and nodes have improved since 
then...

SO when you say it takes 60 days to back up 2 billion files and that's a 
problem...
Like any large computing job, one has to do some analysis to find out which 
parts of the job are taking how much time...

So... what commands are you using to do the backup...?
What timing statistics or measurements have you collected?

If you are using mmbackup and/or mmapplypolicy, those commands can show 
you how much time they spend scanning the file system looking for files to 
back up, AND then how much time they spend copying the data to backup media.
In fact they operate in distinct phases... directory scan, inode scan, 
THEN data copying... so it's straightforward to see which phases are 
taking how much time.

OH... I see you also say you are using gpfs_stat_inode_with_xattrs64 -- 
These APIs are tricky and not a panacea. That's why we provide you with 
mmapplypolicy, which in fact uses those APIs in clever, patented ways -- 
optimized and honed with years of work.
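
(For anyone who does want to roll their own scan with these APIs, a 
bare-bones sketch of the loop they expose is below. It is only an 
illustration: the mount point is made up, error handling is minimal, and 
the exact termIno boundary semantics should be checked against the gpfs.h 
documentation.)

/* Bare-bones GPFS inode scan: visit every inode in inode-number order.
 * Build against the GPFS programming interface, e.g.: gcc scan.c -lgpfs
 * "/gpfs/fs0" is a hypothetical mount point. */
#include <stdio.h>
#include <gpfs.h>

int main(void)
{
    gpfs_fssnap_handle_t *fsh;
    gpfs_iscan_t *iscan;
    const gpfs_iattr64_t *iattr;
    gpfs_ino64_t maxIno = 0;

    fsh = gpfs_get_fssnaphandle_by_path("/gpfs/fs0");
    if (fsh == NULL) { perror("gpfs_get_fssnaphandle_by_path"); return 1; }

    /* NULL previous-snapshot id means a full scan; maxIno receives the
     * highest inode number currently in use. */
    iscan = gpfs_open_inodescan64(fsh, NULL, &maxIno);
    if (iscan == NULL) { perror("gpfs_open_inodescan64"); return 1; }

    /* termIno limits how far the scan goes; maxIno from the open call is
     * intended to cover the whole inode file (check the gpfs_next_inode64
     * docs for the exact boundary semantics).  The scan signals completion
     * by returning 0 with iattr set to NULL. */
    while (gpfs_next_inode64(iscan, maxIno, &iattr) == 0 && iattr != NULL) {
        /* Field names as declared for gpfs_iattr64_t in gpfs.h. */
        printf("inode %llu size %lld\n",
               (unsigned long long)iattr->ia_inode,
               (long long)iattr->ia_size);
    }

    gpfs_close_inodescan(iscan);
    gpfs_free_fssnaphandle(fsh);
    return 0;
}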

And more recently, we provided you with samples/ilm/mmfind -- which has 
the functionality of the classic Unix find command -- but runs in parallel, 
using mmapplypolicy.
TRY IT on your file system!





Re: [gpfsug-discuss] Inode scan optimization

2018-02-08 Thread Frederick Stock
You mention that all the NSDs are metadata and data, but you do not say how 
many NSDs are defined or the type of storage used, that is, are these on 
SAS or NL-SAS storage?  I'm assuming they are not on SSD/flash storage.

Have you considered moving the metadata to separate NSDs, preferably 
SSD/flash storage?  This is likely to give you a significant performance 
boost.

You state that using the inode scan API you reduced the time to 40 days. 
Did you analyze your backup application to determine where the time was 
being spent for the backup?  If the inode scan is a small percentage of 
your backup time, then optimizing it will not provide much benefit.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com





[gpfsug-discuss] Inode scan optimization

2018-02-08 Thread tomasz.wol...@ts.fujitsu.com
Hello All,

A full backup of a 2-billion-inode Spectrum Scale file system on V4.1.1.16 
takes 60 days.

We are trying to optimize, and using inode scans seems to improve things, even 
though we use a directory scan and use the inode scan only to get better stat 
performance (via gpfs_stat_inode_with_xattrs64). With 20 processes in parallel 
doing dir scans (+ inode scans for stat info) we have decreased the time to 
40 days.
All NSDs are dataAndMetadata type.
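
(To illustrate the role of the termIno parameter asked about below, here 
is a rough sketch of one worker scanning only its own slice of the inode 
space, which is one way such a scan can be parallelized. The function 
names come from gpfs.h; the path, the slice arithmetic, and running the 
slices sequentially in one process are made-up simplifications, not the 
20-process setup described above.)

/* Sketch: partition the inode space across NWORKERS slices, each covered
 * by its own inode scan bounded by termIno.  Illustrative only; in the
 * real setup each slice would run in its own process. */
#include <stdio.h>
#include <gpfs.h>

#define NWORKERS 20    /* hypothetical number of parallel scan workers */

static int scan_slice(const char *path, int worker)
{
    gpfs_fssnap_handle_t *fsh = gpfs_get_fssnaphandle_by_path(path);
    if (fsh == NULL) return -1;

    gpfs_ino64_t maxIno = 0;
    gpfs_iscan_t *iscan = gpfs_open_inodescan64(fsh, NULL, &maxIno);
    if (iscan == NULL) { gpfs_free_fssnaphandle(fsh); return -1; }

    /* This worker's slice is [first, term); term is passed as termIno so
     * the scan stops at the slice boundary. */
    gpfs_ino64_t slice = maxIno / NWORKERS + 1;
    gpfs_ino64_t first = (gpfs_ino64_t)worker * slice;
    gpfs_ino64_t term  = first + slice;

    const gpfs_iattr64_t *iattr;
    unsigned long long nfound = 0;

    gpfs_seek_inode64(iscan, first);              /* jump to slice start */
    while (gpfs_next_inode64(iscan, term, &iattr) == 0 && iattr != NULL)
        nfound++;                                 /* stat info is in *iattr */

    printf("worker %d: %llu inodes in [%llu, %llu)\n",
           worker, nfound,
           (unsigned long long)first, (unsigned long long)term);

    gpfs_close_inodescan(iscan);
    gpfs_free_fssnaphandle(fsh);
    return 0;
}

int main(void)
{
    for (int w = 0; w < NWORKERS; w++)
        scan_slice("/gpfs/fs0", w);               /* hypothetical mount point */
    return 0;
}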

I have the following questions:

- Is there a way to increase the inode scan cache (we may use 32 GByte)?
  - Can we use the "hidden" config parameters
    - iscanPrefetchAggressiveness 2
    - iscanPrefetchDepth 0
    - iscanPrefetchThreadsPerNode 0
- Is there documentation concerning the cache behavior?
  - If not, is the inode scan cache process-specific or node-specific?
  - Is there a suggestion for optimizing the termIno parameter in
    gpfs_stat_inode_with_xattrs64() in such a use case?

Thanks!

Best regards,
Tomasz Wolski
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss