Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Hi

When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules*

Best,
Jez

On 18/05/17 20:02, Jaime Pinto wrote:

Ok Marc

I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it.

I imagine by 'capture' you are referring to the -L n level I use?

-L n
    Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values:

    3   Displays the same information as 2, plus each candidate file and the applicable rule.
    4   Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule.
    5   Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files.
    6   Displays the same information as 5, plus non-candidate files and their attributes.

Thanks
Jaime

Quoting "Marc A Kaplan" :

1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete.

2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal...

DO NOT think you can create a custom policy rules list for mmbackup out of thin air. Capture the rules mmbackup creates and make small changes to that --

And as with any disaster recovery plan: Plan your Test and Test your Plan. Then do some dry run recoveries before you really "need" to do a real recovery.
I only even suggest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets.

HMMM otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila.

From: "Jaime Pinto"
To: "Marc A Kaplan"
Cc: "gpfsug main discussion list"
Date: 05/18/2017 12:36 PM
Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors

Marc

The -P option may be a very good workaround, but I still have to test it.

I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see.

Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup.

Thanks
Jaime

/* A macro to abbreviate VARCHAR */
define([vc],[VARCHAR($1)])

/* Define an external list */
RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list'

/* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */
RULE 'r1' LIST 'allfiles'
    DIRECTORIES_PLUS
    SHOW('-u' || vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE))
    FROM POOL 'system'
    FOR FILESET('sysadmin3')

/* Files in special filesets, such as those excluded, are never traversed */
RULE 'ExcSpecialFile' EXCLUDE
    FOR FILESET('scratch3','project3')

Quoting "Marc A Kaplan" :

Jaime,

While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file.

So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses. Then run the mmbackup for real with your customized policy file.

mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using

mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset

However, mmbackup probably h
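The "capture" step that Jez and Marc describe above could be scripted along these lines. This is only a minimal sketch: the glob pattern is the path Jez gives, but the helper name `capture_rules`, the destination directory, and the `.captured` suffix are assumptions made for illustration, and it presumes the generated file is still present when the script runs (Jez says it appears once the preflight stage has passed).

```python
import glob
import os
import shutil


def capture_rules(pattern, dest_dir):
    """Copy the newest file matching `pattern` into `dest_dir`.

    Returns the path of the copy. `pattern` would typically be
    '/var/mmfs/mmbackup/.mmbackupRules*' (the location quoted in the thread);
    `dest_dir` is a hypothetical working directory for the edited policy.
    """
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    if not candidates:
        raise FileNotFoundError(f"no rules file matches {pattern!r}")
    # Several runs may have left files behind; take the most recently modified.
    newest = max(candidates, key=os.path.getmtime)
    os.makedirs(dest_dir, exist_ok=True)
    # Drop the leading dot so the copy is not hidden, and mark it as a capture.
    name = os.path.basename(newest).lstrip(".") + ".captured"
    dest = os.path.join(dest_dir, name)
    shutil.copy2(newest, dest)
    return dest
```

The copy can then be edited by hand (small changes only, as Marc advises) and handed back to mmbackup with the -P option mentioned in the thread.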
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan. Plan your Test and Test your Plan Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. 
HMMM otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackupwith fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses Then run the mmbackup for real with your customized policy file. 
mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, supp
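Marc's remark that mmapplypolicy on its own will happily restrict a scan with --scope fileset can be sketched as a small command builder. This is an illustration under assumptions, not a recommended procedure: the function name and the example paths are hypothetical, the three scope values are the ones named in this thread (fileset, inode space, filesystem), and the `-I test` dry-run flag is taken from the mmapplypolicy documentation as I understand it rather than from the thread. The function only assembles the argument list; it does not run anything.

```python
def build_scan_command(directory, policy_file, scope="fileset", dry_run=True):
    """Return an argv list for an mmapplypolicy scan without executing it.

    directory   -- any directory inside the GPFS file system or fileset
    policy_file -- path to the policy rules file (passed with -P)
    scope       -- 'filesystem', 'inodespace' or 'fileset'
    dry_run     -- when True, append '-I test' so rules are only evaluated
    """
    allowed = ("filesystem", "inodespace", "fileset")
    if scope not in allowed:
        raise ValueError(f"scope must be one of {allowed}, got {scope!r}")
    argv = ["mmapplypolicy", directory, "-P", policy_file, "--scope", scope]
    if dry_run:
        argv += ["-I", "test"]
    return argv
```

For example, `build_scan_command("/gpfs/fs1/sysadmin3", "/tmp/rules.pol")` yields a command line that could be inspected, or passed to `subprocess.run`, before any real backup is attempted.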
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete.

2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal...

DO NOT think you can create a custom policy rules list for mmbackup out of thin air. Capture the rules mmbackup creates and make small changes to that --

And as with any disaster recovery plan: Plan your Test and Test your Plan. Then do some dry run recoveries before you really "need" to do a real recovery.

I only even suggest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets.

HMMM otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila.

From: "Jaime Pinto"
To: "Marc A Kaplan"
Cc: "gpfsug main discussion list"
Date: 05/18/2017 12:36 PM
Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors

Marc

The -P option may be a very good workaround, but I still have to test it.

I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see.

Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup.
Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. 
Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject:Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackupwith fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Marc

The -P option may be a very good workaround, but I still have to test it.

I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see.

Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup.

Thanks
Jaime

/* A macro to abbreviate VARCHAR */
define([vc],[VARCHAR($1)])

/* Define an external list */
RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list'

/* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */
RULE 'r1' LIST 'allfiles'
    DIRECTORIES_PLUS
    SHOW('-u' || vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE))
    FROM POOL 'system'
    FOR FILESET('sysadmin3')

/* Files in special filesets, such as those excluded, are never traversed */
RULE 'ExcSpecialFile' EXCLUDE
    FOR FILESET('scratch3','project3')

Quoting "Marc A Kaplan" :

Jaime,

While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file.

So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses. Then run the mmbackup for real with your customized policy file.

mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using

mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset

However, mmbackup probably has other worries and for simplicity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ...
(For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject:Re: [gpfsug-discuss] What is an independent fileset? was: mmbackupwith fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". 
An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset If you didn't say otherwise, inodes come from the default "root" fileset Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers&quo
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
So it could be that we didn’t really know what we were doing when our system was installed (and still don’t by some of the messages I post *cough*) but basically I think we’re quite similar to other shops where we resell GPFS to departmental users internally and it just made some sense to break down each one into a fileset. We can then snapshot each one individually (7402 snapshots at the moment) and apply quotas. I know your question was why independent and not dependent – but I honestly don’t know. I assume it’s to do with not crossing the streams if you’ll excuse the obvious film reference. Richard From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Stephen Ulmer Sent: 18 May 2017 15:48 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn’t a single point of contention for creating files. I’m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I’m sure that limit could be changed eventually, there’s probably some efficiencies in not making it much bigger than it needs to be. I don’t know if it would take an on-disk format change or not. So how do you decide that a use case gets it’s own fileset, and do you just always use independent or is there an evaluation? I’m just curious because I like to understand lots of different points of view — feel free to tell me to go away. 
:)

--
Stephen

On May 18, 2017, at 10:32 AM, Sobey, Richard A <r.so...@imperial.ac.uk> wrote:

Thanks, I was just about to post that, and I guess it is still the reason a dependent fileset is the default without the --inode-space new option at fileset creation.

I do wonder why there is a limit of 1000, whether it’s just IBM not envisaging any customer needing more than that? We’ve only got 414 at the moment but that will grow to over 500 this year.

Richard

From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of David D. Johnson
Sent: 18 May 2017 15:24
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors

Here is one big reason independent filesets are problematic:

A5.13: Table 43. Maximum number of filesets

Version of GPFS          Maximum Number of        Maximum Number of
                         Dependent Filesets       Independent Filesets
IBM Spectrum Scale V4    10,000                   1,000
GPFS V3.5                10,000                   1,000

Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there’s only one number to watch per filesystem.

-- ddj
Dave Johnson
Brown University

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
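Dave's point about having to watch an inode limit per independent fileset lends itself to a simple check. The sketch below is an assumption-laden illustration: the function name, the 90 percent threshold, and the input shape (a dict of fileset name to used and maximum inodes) are all made up here, and collecting those numbers from the cluster is deliberately left out.

```python
def filesets_near_inode_limit(usage, threshold_pct=90):
    """Return the names of filesets whose inode usage is at or above the threshold.

    usage         -- dict mapping fileset name -> (used_inodes, max_inodes)
    threshold_pct -- integer percentage of max_inodes that counts as "near"
    """
    flagged = []
    for name, (used, maximum) in usage.items():
        if maximum <= 0:
            # No meaningful limit recorded for this entry; skip it.
            continue
        # Integer arithmetic avoids any floating-point rounding at the boundary.
        if used * 100 >= maximum * threshold_pct:
            flagged.append(name)
    return sorted(flagged)
```

Run periodically with real numbers, a check like this would flag the fileset that a runaway job is about to fill before file creation starts failing.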
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn’t a single point of contention for creating files. I’m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I’m sure that limit could be changed eventually, there’s probably some efficiencies in not making it much bigger than it needs to be. I don’t know if it would take an on-disk format change or not. So how do you decide that a use case gets it’s own fileset, and do you just always use independent or is there an evaluation? I’m just curious because I like to understand lots of different points of view — feel free to tell me to go away. :) -- Stephen > On May 18, 2017, at 10:32 AM, Sobey, Richard A <mailto:r.so...@imperial.ac.uk>> wrote: > > Thanks, I was just about to post that, and I guess is still the reason a > dependent fileset is still the default without the –inode-space new option > fileset creation. > > I do wonder why there is a limit of 1000, whether it’s just IBM not > envisaging any customer needing more than that? We’ve only got 414 at the > moment but that will grow to over 500 this year. > > Richard > > From: gpfsug-discuss-boun...@spectrumscale.org > <mailto:gpfsug-discuss-boun...@spectrumscale.org> > [mailto:gpfsug-discuss-boun...@spectrumscale.org > <mailto:gpfsug-discuss-boun...@spectrumscale.org>]On Behalf Of David D. > Johnson > Sent: 18 May 2017 15:24 > To: gpfsug main discussion list <mailto:gpfsug-discuss@spectrumscale.org>> > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup > with fileset : scope errors > > Here is one big reason independent filesets are problematic: > A5.13: > Table 43. 
Maximum number of filesets > Version of GPFS > Maximum Number of Dependent Filesets > Maximum Number of Independent Filesets > IBM Spectrum Scale V4 > 10,000 > 1,000 > GPFS V3.5 > 10,000 > 1,000 > Another is that each independent fileset must be sized (and resized) for the > number of inodes it is expected to contain. > If that runs out (due to growth or a runaway user job), new files cannot be > created until the inode limit is bumped up. > This is true of the root namespace as well, but there’s only one number to > watch per filesystem. > > — ddj > Dave Johnson > Brown University > ___ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org <http://spectrumscale.org/> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > <http://gpfsug.org/mailman/listinfo/gpfsug-discuss> ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Thanks, I was just about to post that, and I guess it is still the reason a dependent fileset is the default without the --inode-space new option at fileset creation.

I do wonder why there is a limit of 1000, whether it’s just IBM not envisaging any customer needing more than that? We’ve only got 414 at the moment but that will grow to over 500 this year.

Richard

From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of David D. Johnson
Sent: 18 May 2017 15:24
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors

Here is one big reason independent filesets are problematic:

A5.13: Table 43. Maximum number of filesets

Version of GPFS          Maximum Number of        Maximum Number of
                         Dependent Filesets       Independent Filesets
IBM Spectrum Scale V4    10,000                   1,000
GPFS V3.5                10,000                   1,000

Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there’s only one number to watch per filesystem.

-- ddj
Dave Johnson
Brown University
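As a rough worked example of the headroom Richard is wondering about, the limits from the quoted Table 43 can be put next to his numbers. This assumes his count of 414 refers to independent filesets, which the message implies but does not state; the function name is hypothetical.

```python
# Limits as quoted from Table 43 in this thread (Spectrum Scale V4 and GPFS V3.5).
FILESET_LIMITS = {"dependent": 10_000, "independent": 1_000}


def remaining_filesets(kind, current_count):
    """Return how many more filesets of the given kind fit under the quoted limit."""
    if kind not in FILESET_LIMITS:
        raise ValueError(f"unknown fileset kind: {kind!r}")
    return FILESET_LIMITS[kind] - current_count
```

With 414 independent filesets there is room for 586 more; at 500 the remaining headroom is exactly half the limit.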
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Here is one big reason independent filesets are problematic:

A5.13: Table 43. Maximum number of filesets

Version of GPFS          Maximum Number of        Maximum Number of
                         Dependent Filesets       Independent Filesets
IBM Spectrum Scale V4    10,000                   1,000
GPFS V3.5                10,000                   1,000

Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there’s only one number to watch per filesystem.

-- ddj
Dave Johnson
Brown University

> On May 18, 2017, at 10:12 AM, Peter Childs wrote:
>
> As I understand it,
>
> mmbackup calls mmapplypolicy so this stands for mmapplypolicy too.
>
> mmapplypolicy scans the metadata inodes (file) as requested depending on the
> query supplied.
>
> You can ask mmapplypolicy to scan a fileset, inode space or filesystem.
>
> If scanning a fileset it scans the inode space that fileset is dependent on,
> for all files in that fileset. Smaller inode spaces mean less to scan, hence
> it's faster to use independent filesets: you get a list of what to process
> quicker.
>
> Another advantage is that once an inode is allocated you can't deallocate it,
> however you can delete independent filesets and hence deallocate the inodes,
> so if you have a task which has lots and lots of small files which are only
> needed for a short period of time, you can create a new independent fileset
> for them, work on them, and then blow them away afterwards.
>
> I like independent filesets. I'm guessing the only reason dependent filesets
> are used by default is history.
>
>
> Peter
>
>
> On 18/05/17 14:58, Jaime Pinto wrote:
>> Thanks for the explanation Marc and Luis,
>>
>> It begs the question: why filesets are created as dependent by default, if
>> the adverse repercussions can be so great afterward?
Even in my case, where >> I manage GPFS and TSM deployments (and I have been around for a while), >> didn't realize at all that not adding and extra option at fileset creation >> time would cause me huge trouble with scaling later on as I try to use >> mmbackup. >> >> When you have different groups to manage file systems and backups that don't >> read each-other's manuals ahead of time then we have a really bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one way or >> another. >> >> I'm also hoping for a hint as to how to configure backup exclusion rules on >> the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup >> smart enough (actually smarter than TSM client itself) to read the exclusion >> rules on the TSM configuration and apply them before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that are >>> in a separable range of inode numbers - this allows GPFS to efficiently do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset If you didn't say otherwise, inodes come from the >>> default "root" fileset >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? 
Stay tuned >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss@spectrumscale.org >>> Cc: gpfsug-discuss@spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject:Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by:gpfsug-discuss-boun...@spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor ESS, >>> so anyone in this list feel free to give feedback on that page people with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the dat
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackupwith fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. 
When you have different groups to manage file systems and backups that don't read each other's manuals ahead of time, then we have a really bad recipe.

I'm looking forward to your explanation as to why mmbackup cares one way or another.

I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than the TSM client itself) to read the exclusion rules in the TSM configuration and apply them before traversing?

Thanks
Jaime
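Marc's capture-then-customize suggestion could be scripted roughly as follows. This is only a sketch under stated assumptions: the rules location /var/mmfs/mmbackup/.mmbackupRules* is the one Jez mentions elsewhere in this thread, the helper names capture_rules and mmbackup_argv are made up for illustration, and adding the FOR FILESET clauses to the copied file is still a manual step. Nothing here runs mmbackup; it only copies the newest generated rules file and builds a command line around the -P option.

```python
import glob
import os
import shutil


def capture_rules(rules_glob, dest):
    """Copy the newest auto-generated rules file to dest for hand editing."""
    candidates = glob.glob(rules_glob)
    if not candidates:
        raise FileNotFoundError(f"no rules file matches {rules_glob}")
    # Pick the most recently modified file in case several runs left copies.
    newest = max(candidates, key=os.path.getmtime)
    shutil.copy2(newest, dest)
    return newest


def mmbackup_argv(path, policy_file, extra=()):
    """Build (but do not run) an mmbackup command that uses a custom policy."""
    return ["mmbackup", path, "-P", policy_file, *extra]
```

A trial run would call capture_rules("/var/mmfs/mmbackup/.mmbackupRules*", some_destination), the copy would then be edited by hand, and the list returned by mmbackup_argv could be passed to subprocess.run once the edited rules have been reviewed.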
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
As I understand it, mmbackup calls mmapplypolicy, so this applies to mmapplypolicy too.

mmapplypolicy scans the metadata inodes (files) as requested, depending on the query supplied. You can ask mmapplypolicy to scan a fileset, an inode space or a filesystem.

If scanning a fileset, it scans the inode space that fileset is dependent on, for all files in that fileset. Smaller inode spaces mean less to scan, hence it's faster to use independent filesets: you get a list of what to process quicker.

Another advantage is that once an inode is allocated you can't deallocate it; however, you can delete independent filesets and hence deallocate the inodes. So if you have a task which has lots and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them, work on them and then blow them away afterwards.

I like independent filesets. I'm guessing the only reason dependent filesets are used by default is history.

Peter

On 18/05/17 14:58, Jaime Pinto wrote:

Thanks for the explanation Marc and Luis,

It begs the question: why are filesets created as dependent by default, if the adverse repercussions can be so great afterward?
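Peter's create, use, then delete pattern for short-lived small files might be laid out like this. It is a hedged sketch, not a tested procedure: it only builds the command lines rather than running them, the device, fileset and junction names passed in are hypothetical, and the mmcrfileset, mmlinkfileset, mmunlinkfileset and mmdelfileset options shown are written from memory and should be checked against the man pages for your release.

```python
def scratch_fileset_commands(device, fileset, junction):
    """Return (setup, teardown) command lists for a throwaway independent fileset."""
    setup = [
        # "--inode-space new" asks for a fileset with its own inode space,
        # i.e. an independent fileset.
        ["mmcrfileset", device, fileset, "--inode-space", "new"],
        # Link the fileset into the namespace at the junction path.
        ["mmlinkfileset", device, fileset, "-J", junction],
    ]
    teardown = [
        ["mmunlinkfileset", device, fileset],
        # -f forces deletion even though the fileset still contains files.
        ["mmdelfileset", device, fileset, "-f"],
    ]
    return setup, teardown
```

For example, scratch_fileset_commands("sgfs1", "tmpjob", "/gpfs/sgfs1/tmpjob") returns two setup commands and two teardown commands; the names in that call are placeholders only.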
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Thanks for the explanation Marc and Luis,

It begs the question: why are filesets created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), I didn't realize at all that not adding an extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup.

When you have different groups to manage file systems and backups that don't read each other's manuals ahead of time, then we have a really bad recipe.

I'm looking forward to your explanation as to why mmbackup cares one way or another.

I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than the TSM client itself) to read the exclusion rules in the TSM configuration and apply them before traversing?

Thanks
Jaime

Quoting "Marc A Kaplan":

When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space".
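The "independent fileset = inode space" description quoted in this thread can be pictured with a toy model. This is purely illustrative and says nothing about how GPFS actually lays out inodes: the Fileset class, the fileset relationships and the file counts are invented (only the 200M figure for scratch echoes the number mentioned in the thread). It shows a dependent fileset resolving to the inode space of its containing independent fileset and, following Peter's remark that a fileset scan covers the whole inode space, how many inodes such a scan would have to visit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fileset:
    name: str
    files: int                           # inodes in use by this fileset (made up)
    parent: Optional["Fileset"] = None   # None means independent: owns an inode space


def inode_space(fs):
    """Walk up to the independent fileset whose inode space holds fs's inodes."""
    while fs.parent is not None:
        fs = fs.parent
    return fs


def inodes_scanned(target, all_filesets):
    """Count the inodes in the inode space that a scan of target has to cover."""
    space = inode_space(target)
    return sum(f.files for f in all_filesets if inode_space(f) is space)


root = Fileset("root", 1_000)
scratch = Fileset("scratch", 200_000_000, parent=root)   # dependent on root
sysadmin = Fileset("sysadmin3", 50_000, parent=root)     # dependent on root
project = Fileset("project", 5_000_000)                  # independent
everything = [root, scratch, sysadmin, project]

print(inode_space(sysadmin).name)             # root
print(inodes_scanned(sysadmin, everything))   # 200051000
print(inodes_scanned(project, everything))    # 5000000
```

In this toy setup the small dependent fileset shares root's inode space with scratch, so scanning it touches every inode in that space, while the independent fileset is scanned on its own.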
[gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space".

An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)...

And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset. If you didn't say otherwise, inodes come from the default "root" fileset.

Clear as your bath-water, no?

So why does mmbackup care one way or another ??? Stay tuned.

BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern...

From: "Luis Bolinches"
To: gpfsug-discuss@spectrumscale.org
Cc: gpfsug-discuss@spectrumscale.org
Date: 05/18/2017 02:10 AM
Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hi

There is no direct way to convert a dependent fileset to an independent one or vice versa.

I would suggest taking a look at chapter 5 of the 2014 redbook, which has lots of definitions about GPFS ILM, including filesets: http://www.redbooks.ibm.com/abstracts/sg248254.html?Open It is not the only place where this is explained, but I honestly believe it is a good single starting point. It also needs an update, as it does not have anything on CES or ESS, so anyone on this list feel free to give feedback on that page; people with funding decisions listen there.

So you are limited to either migrating the data from that fileset to a new independent fileset (multiple ways to do that) or using the TSM client config.
----- Original message -----
From: "Jaime Pinto"
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug main discussion list", "Jaime Pinto"
Cc:
Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors
Date: Thu, May 18, 2017 4:43 AM

There is hope. See reference link below:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm

The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent ones.

The proper option syntax is "--scope inodespace", and the error message actually flagged that, however I didn't know how to interpret what I saw:

    # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2
    mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017.
    Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported
    Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1

Will post the outcome.
Jaime

Quoting "Jaime Pinto":

> Quoting "Luis Bolinches":
>
>> Hi
>>
>> have you tried to add exceptions on the TSM client config file?
>
> Hey Luis,
>
> That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact, exclusions on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains, it's much worse, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project.
>
> If I have to use exclusion rules it has to rely solely on GPFS rules, and somehow not traverse scratch at all.
>
> I suspect there is a way to do this properly, however the examples in the GPFS guide and other references are not exhaustive. They only show a couple of trivial cases.
>
> However my situation is not unique. I suspect there are many facilities having to deal with backup of HUGE filesets.
>
> So the search is on.
>
> Thanks
> Jaime
>
>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1
>>
>> dsm.sys
>> ...
>>
>> DOMAIN /IBM/GPFS
>> EXCLUDE.DIR /IBM/GPFS/FSET1
>>
>>
>> From: "Jaime Pinto"
>> To: "gpfsug main discussion list"
>> Date: 17-05-17 23:44
>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors
>> Sent by: gpfsug-discuss-boun...@spectrumsc