Re: Looking for suggestions to deal with large backups not completing in 24-hours
Hi Bjørn,

actually they improved isi changelist a lot with OneFS 8, and performance is no longer really that much of an issue - at least not with an 8-to-9-figure number of objects in the file system. My problem (and the main reason why we haven't integrated it into MAGS yet) is that it is a 99.9% kind of function. Just like with NetApp Snapdiff, you'd always have to recommend doing periodic full incrementals to catch whatever was missed when calculating the snapshot difference (and there usually is something). Just like with Snapdiff, you'll have the back and forth over whose problem it is if it doesn't work, and which combination of client and OS on the filer is supported.

At the end of the day you'll have to be able to run a full incremental within an acceptable period of time - and if you have to be able to do that anyway, why not make backup fast enough to run the real thing every day? And using a journal or snapshot difference for backup doesn't benefit restores one bit, of course.

Regards

Lars Henningsen
General Storage

> On 20. Jul 2018, at 15:48, Bjørn Nachtwey wrote:
> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Hi all,

yes, there's a special daemon that might be used -- in theory :-) In practice it worked only for small file systems ... and only if they are partially filled.

A guy from the concat company did some tests and told me they were totally disappointing, as this daemon consumes too many resources if you let it write a protocol file which you can use to identify changed, added and deleted files. But as far as I know it doesn't give a list of changed files; it just logs all changes to the files along with the kind of change.

@Lars (Henningsen): Do you know some more details?

best
Bjørn

Skylar Thompson wrote:
> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Sadly, no. I made a feature request for this years ago (back when Isilon was Isilon) but it didn't go anywhere. At this point, our days of running Isilon storage are numbered, and we'll be investing in DDN/GPFS for the foreseeable future, so I haven't really had leverage to push Dell/EMC/Isilon on the matter.

On Thu, Jul 19, 2018 at 11:31:06PM +, Harris, Steven wrote:
> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Is there no journaling/logging service on these Isilons that could be used to maintain a list of changed files and hand-roll a dsmc-selective-with-file-list process similar to what GPFS uses?

Cheers

Steve

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Richard Cowen
Sent: Friday, 20 July 2018 6:15 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Canary! I like it!

Richard

-----Original Message-----
From: ADSM: Dist Stor Manager On Behalf Of Skylar Thompson
Sent: Thursday, July 19, 2018 10:37 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
There are a couple of ways we've gotten around this problem:

1. For NFS backups, we don't let TSM do partial incremental backups, even if we have the filesystem split up. Instead, we mount sub-directories of the filesystem root on our proxy nodes. This has the double advantage of letting us break the filesystem up into multiple TSM filespaces (giving us directory-level backup status reporting, and parallelism in TSM when we have COLLOCG=FILESPACE), and also parallelism at the NFS level when there are multiple NFS targets we can talk to (as is the case with Isilon).

2. For GPFS backups, in some cases we can set up independent filesets and let mmbackup process each as a separate filesystem, though we have some instances where the end users want an entire GPFS filesystem to have one inode space so they can do atomic moves as renames. In either case, though, mmbackup does its own "incremental" backups with filelists passed to "dsmc selective", which don't update the last-backup time on the TSM filespace. Our workaround has been to run mmbackup via a preschedule command, and have the actual TSM incremental backup be of an empty directory (I call them canary directories in our documentation) that's set as a virtual mountpoint. dsmc will only run the backup portion of its scheduled task if the preschedule command succeeds, so if mmbackup fails, the canary never gets backed up, and an alert will be raised.

On Wed, Jul 18, 2018 at 03:07:16PM +0200, Lars Henningsen wrote:
> [...]
--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
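For readers wanting to try the canary approach, here is a rough sketch of what the client-side wiring might look like. All paths, script names and the GPFS device name are invented for illustration; PRESCHEDULECMD, VIRTUALMOUNTPOINT and DOMAIN are standard client options, but the exact combination is only my reading of Skylar's description:

```shell
# dsm.sys fragment (sketch -- paths and names are hypothetical):
#
#   VIRTUALMOUNTPOINT /backup/canary
#   DOMAIN            /backup/canary
#   PRESCHEDULECMD    "/usr/local/bin/run-mmbackup.sh"
#
# /usr/local/bin/run-mmbackup.sh would contain something like:
#
#   #!/bin/sh
#   # the real backup: mmbackup drives "dsmc selective" itself
#   exec /usr/lpp/mmfs/bin/mmbackup /gpfs/fs0 -t incremental
#
# dsmc runs the scheduled incremental (of the empty canary directory)
# only if the preschedule command exits 0, so a failed mmbackup leaves
# the canary filespace's last-backup date stale -- which monitoring
# can alert on.
```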
Re: Looking for suggestions to deal with large backups not completing in 24-hours
@All

possibly the biggest issue when backing up massive file systems in parallel with multiple dsmc processes is expiration. Once you back up a directory with “subdir no”, a no-longer-existing directory object on that level is expired properly and becomes inactive. However, everything underneath it remains active and doesn’t expire (ever) unless you run a “full” incremental on the level above (with “subdir yes”) - and that kind of defeats the purpose of parallelisation.

Other pitfalls include: avoiding swapping; keeping log files consistent (dsmc isn’t thread-aware when logging - it assumes it is alone); handling the local dedup cache; updating backup timestamps for a file space on the server; distributing load evenly across multiple nodes on a scale-out filer; backing up from snapshots; chunking file systems into even parts automatically so you don’t end up with lots of small jobs and one big one; dynamically distributing load across multiple “proxies” if one isn’t enough; handling exceptions; handling directories with characters you can’t pass to dsmc via the command line; consolidating results in a single, comprehensible overview similar to the summary of a regular incremental; and being able to do it all in reverse for a massively parallel restore… the list is quite long.

We developed MAGS (as mentioned by Del) to cope with all that - and more. I can only recommend trying it out for free.

Regards

Lars Henningsen
General Storage
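The expiration pitfall Lars describes can be made concrete with a hypothetical layout (paths invented; the dsmc commands are shown as comments, not meant to be run as-is):

```shell
# Sketch only -- illustrates the expiration pitfall.
#
# Parallel workers, one per second-level directory:
#   dsmc incremental /fs/proj/a/ -subdir=yes
#   dsmc incremental /fs/proj/b/ -subdir=yes
# One extra non-recursive pass for the level above:
#   dsmc incremental /fs/proj/ -subdir=no
#
# Now suppose /fs/proj/b is deleted on the filer. The next "-subdir=no"
# pass expires the directory object "b" and makes it inactive -- but
# every file and directory *underneath* b stays ACTIVE on the server
# indefinitely, until someone runs a full recursive pass on the parent:
#   dsmc incremental /fs/proj/ -subdir=yes
```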
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
Hi Skylar,

Skylar Thompson wrote:
> [...]

Good point, even if my script works a little bit differently: at the moment the starting folder is not read from the "dsm.opt" file but given in the configuration file for my script, "dsmci.cfg". So one run can work for one node starting on a subfolder (done this way as Windows has no VIRTUALMOUNTPOINT option). Within this config file several starting folders can be declared, and in the first step my script creates a global list of all folders to be backed up "partially incremental".

=> well, I'm not sure if I check for multiple entries in that list
=> and if the nesting is done on a deeper level than the list is created from, I think I won't be aware of such a set-up

I will check this -- thanks for the advice!

best
Bjørn

--
-- Bjørn Nachtwey

Arbeitsgruppe "IT-Infrastruktur"
Tel.: +49 551 201-2181, E-Mail: bjoern.nacht...@gwdg.de
--
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: http://www.gwdg.de
Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: g...@gwdg.de
Service-Hotline: Tel.: +49 551 201-1523, E-Mail: supp...@gwdg.de
Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Norbert Lossau
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598
--
Zertifiziert nach ISO 9001
--
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
One thing to be aware of with partial incremental backups is the danger of backing up data multiple times if the mount points are nested. For instance:

/mnt/backup/some-dir
/mnt/backup/some-dir/another-dir

Under normal operation, a node with DOMAIN set to "/mnt/backup/some-dir /mnt/backup/some-dir/another-dir" will back up the contents of /mnt/backup/some-dir/another-dir as a separate filespace, *and also* back up another-dir as a subdirectory of the /mnt/backup/some-dir filespace. We reported this as a bug, and IBM pointed us at this flag, which can be passed as a scheduler option to prevent it: -TESTFLAG=VMPUNDERNFSENABLED

On Tue, Jul 17, 2018 at 04:12:17PM +0200, Bjørn Nachtwey wrote:
> [...]
--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
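A minimal options sketch of the nested-domain situation described above (the paths are from the example; everything else is a config fragment, not a tested setup):

```shell
# dsm.opt / dsm.sys fragment (sketch):
#
#   DOMAIN /mnt/backup/some-dir /mnt/backup/some-dir/another-dir
#
# Without intervention, another-dir is backed up twice: once as its own
# filespace, and once inside the some-dir filespace. The scheduler-side
# workaround IBM suggested, passed as an option to the client scheduler:
#
#   dsmc schedule -TESTFLAG=VMPUNDERNFSENABLED
```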
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
Hi Zoltan,

OK, I will translate my text, as there are some more approaches discussed :-)

Breaking the filesystems up into several nodes will work as long as the nodes are of sufficient size. I'm not sure if a PROXY node will solve the problem, because each "member node" will back up the whole mountpoint. You will need to do partial incremental backups. I expect you will do this based on folders, won't you? So, some questions:

1) How will you distribute the folders to the nodes?
2) How will you ensure new folders are processed by one of your "member nodes"? On our filers many folders are created and deleted, sometimes a whole bunch every day. So for me it was no option to maintain the option file manually. The approach from my script / "MAGS" does this somehow "automatically".
3) What happens if the folders don't grow evenly and all the big ones are backed up by one of your nodes? (OK, you can change the distribution or even add another node.)
4) Are you going to map each backup node to different nodes of the Isilon cluster to distribute the traffic / workload across the Isilon nodes?

best
Bjørn
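Questions 1 and 3 above are essentially a load-balancing problem. One naive answer, shown here purely as an illustration (this is not part of Bjørn's script): sort the folders by size and greedily assign each one to the currently least-loaded member node.

```shell
# Read "size name" lines on stdin, print "nodeN name" assignments.
# Greedy heuristic: biggest folder first, always onto the node with
# the smallest total so far.
assign_folders() {
    nodes=$1
    sort -rn | awk -v n="$nodes" '{
        best = 1
        for (i = 2; i <= n; i++)
            if (load[i] < load[best]) best = i
        load[best] += $1
        print "node" best, $2
    }'
}

# Example: three folders distributed over two member nodes.
printf '100 projects\n40 home\n30 scratch\n' | assign_folders 2
```

The example prints "node1 projects", then "node2 home" and "node2 scratch": the single big folder claims one node, and the smaller ones pile onto the other - exactly the imbalance question 3 warns about when one folder dominates.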
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
> -----Original Message-----
> From: ADSM: Dist Stor Manager On Behalf Of Zoltan Forray
> Sent: Wednesday, 11 July 2018 13:50
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours
>
> I will need to translate it to English, but I gather it is talking about the RESOURCEUTILIZATION / MAXNUMMP values. While we have increased MAXNUMMP to 5 on the server (will try going higher), I'm not sure how much good it would do, since the backup schedule uses OBJECTS to point to a specific/single mountpoint/filesystem (see below) - but it is worth trying to bump the RESOURCEUTILIZATION value on the client even higher...
>
> We have checked the dsminstr.log file and it is spending 92% of the time in PROCESS DIRS (no surprise).
>
> 7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
>             Policy Domain Name: DFS
>                  Schedule Name: ISILON-SOM-SOMADFS1
>                    Description: ISILON-SOM-SOMADFS1
>                         Action: Incremental
>                      Subaction:
>                        Options: -subdir=yes
>                        Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
>                       Priority: 5
>                Start Date/Time: 12/05/2017 08:30:00
>                       Duration: 1 Hour(s)
>     Maximum Run Time (Minutes): 0
>                 Schedule Style: Enhanced
>                         Period:
>                    Day of Week: Any
>                          Month: Any
>                   Day of Month: Any
>                  Week of Month: Any
>                     Expiration:
> Last Update by (administrator): ZFORRAY
>          Last Update Date/Time: 01/12/2018 10:30:48
>               Managing profile:
>
> On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas wrote:
> > It is possible to do a parallel backup of file system parts.
> > https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German)
> > have a look at page 10.
> > ---
> > Jonas Jansen
> >
> > IT Center
> > Group: Server & Storage
> > Department: Systems & Operations
> > RWTH Aachen University
> > Seffenter Weg 23
> > 52074 Aachen
> > Tel: +49 241 80-28784
> > Fax: +49 241 80-22134
> > jan...@itc.rwth-aachen.de
> > www.itc.rwth-aachen.de
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
> > Sent: Monday, July 9, 2018 3:29 PM
> > To: ADSM-L@VM.MARIST.EDU
> > Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours
> >
> > They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups.
> >
> > Del
> >
> > "ADSM: Dist Stor Manager" wrote on 07/09/2018 09:17:06 AM:
> > > From: Zoltan Forray
> > > To: ADSM-L@VM.MARIST.EDU
> > > Date: 07/09/2018 09:17 AM
> > > Subject: Re: Looking for suggestions to deal with large backups not completing in 24-hours
> > > Sent by: "ADSM: Dist Stor Manager"
> > >
> > > Thanks Del. Very interesting. Are they a VAR for IBM?
> > >
> > > Not sure if it would work in the current configuration we are using to back up ISILON. I have passed the info on.
> > >
> > > BTW, FWIW, when I copied/pasted the info, Chrome's spell-checker red-flagged "The easy way to incrementally backup billons of objects" (billions). So if you know anybody at the company, please pass it on to them.
> > >
> > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote:
> > > > Another possible idea is to look at General Storage dsmISI MAGS:
> > > > http://www.general-storage.com/PRODUCTS/products.html
Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
Hi Zoltan,

I'll come back to the approach Jonas mentioned (as I'm the author of that text: thanks to Jonas for pointing to it ;-) ). The text is in German, of course, but the script has some comments in English and should be understandable -- I hope so :-)

The text first describes the problem everybody on this list will know: the treewalk takes more time than we have. TSM/ISP offers some options to speed it up, such as "-incrbydate", but they do not work properly. So for me the only solution was to parallelize the tree walk and do partial incremental backups.

I first tried to write it with BASH commands, but multithreading was not easy to implement and, second, it won't run on Windows -- yet our largest filers (500 TB - 1.2 PB) need to be accessed via CIFS to store the ACL information. My first steps with PowerShell on Windows cost lots of time and were disappointing. Using Perl made everything really easy, as it runs on Windows with the Strawberry Perl distribution, and within the script only a few if-conditions are needed to distinguish between Linux and Windows.

I did some tests on how deep into the file tree to dive: as the subfolders are of unequal size, diving just below the mount point and parallelizing on the folders of this "first level" mostly does not work well -- there's (nearly) always one folder taking all the time. On the other hand, diving into all levels takes a certain amount of additional time. I see the best performance using 3 to 4 levels and 4 to 6 parallel threads for each node. Due to separating users and for accounting, I have several nodes on such large file systems, so in total there are about 20 to 40 streams in parallel. Rudi Wüst (mentioned in my text) found that a p520 server running AIX 6 will support up to 2,000 parallel streams, but as Grant mentioned, with an Isilon system the filer will be the bottleneck.
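Bjørn's strategy can be sketched in a few lines: enumerate the directories a fixed number of levels below the mount point, then run one partial incremental per subtree in a thread pool. This is only a rough illustration, not his actual implementation (that is the dsmci Perl script); the dsmc invocation, depth, and thread counts here are placeholders:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def partition_dirs(root, depth):
    """Collect directories `depth` levels below root (or leaf directories
    that end sooner), so each one can be backed up as an independent subtree."""
    if depth == 0:
        return [root]
    subdirs = [e.path for e in os.scandir(root) if e.is_dir(follow_symlinks=False)]
    if not subdirs:
        return [root]  # leaf directory: back it up as-is
    parts = []
    for d in subdirs:
        parts.extend(partition_dirs(d, depth - 1))
    return parts

def backup_subtree(path):
    # Placeholder dsmc call: one partial incremental per subtree.
    # Real use needs node name, options, and error handling.
    return subprocess.call(["dsmc", "incremental", path + os.sep, "-subdir=yes"])

def parallel_backup(root, depth=3, threads=4):
    """Fan the partitioned subtrees out over a small thread pool,
    mirroring the 3-4 levels / 4-6 threads that worked best above."""
    parts = partition_dirs(root, depth)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(backup_subtree, parts))
```

Note that files sitting in the intermediate levels above the partition depth still need a separate non-recursive incremental, which this sketch omits.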
As mentioned by Del, you may also test a commercial product, "MAGS" by General Storage; it can address multiple Isilon nodes in parallel.

If there are any questions -- just ask, or have a look at the script: https://gitlab.gwdg.de/bnachtw/dsmci // even if the last commit is about 4 months old, the project is still in development ;-)

==> Maybe I should update the text from the "GWDG News" and translate it to English? Any interest?

Best
Bjørn

p.s. a result from the wild (weekly backup of a node from a 343 TB Quantum StorNext file system):

>> Process ID: 12988
Path processed: -
Start time: 2018-07-14 12:00
End time: 2018-07-15 06:07
total processing time: 3d 15h 59m 23s
total wallclock time: 18h 7m 30s
effective speedup: 4.855 using 6 parallel threads
data transfer time ratio: 3.575 %
-
Objects inspected: 92,061,596
Objects backed up: 9,774,876
Objects updated: 0
Objects deleted: 0
Objects expired: 7,696
Objects failed: 0
Bytes inspected: 52,818.242 (GB)
Bytes transferred: 5,063.620 (GB)
-
Number of Errors: 0
Number of Warnings: 43
# of severe Errors: 0
# Out-of-Space Errors: 0 <<

--
Bjørn Nachtwey
Arbeitsgruppe "IT-Infrastruktur"
Tel.: +49 551 201-2181, E-Mail: bjoern.nacht...@gwdg.de
--
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: http://www.gwdg.de
Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: g...@gwdg.de
Service-Hotline: Tel.: +49 551 201-1523, E-Mail: supp...@gwdg.de
Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Norbert Lossau
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598
--
Zertifiziert nach ISO 9001
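As a side note on the statistics in Bjørn's report: the "effective speedup" figure is simply the summed per-thread processing time divided by the wall-clock time of the run, and the published numbers can be reproduced exactly:

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a d/h/m/s duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

# Figures from the report: 3d 15h 59m 23s processing, 18h 7m 30s wall clock
processing = to_seconds(days=3, hours=15, minutes=59, seconds=23)
wallclock = to_seconds(hours=18, minutes=7, seconds=30)

speedup = processing / wallclock
print(round(speedup, 3))  # 4.855, matching the reported value for 6 threads
```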
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Robert,

Again, thanks for the information. It fills in a lot of missing pieces. From what I gather, you are probably doing backups via SAN, not via IP like we do. Plus, as you suggested, breaking up the backup targets into multiple filesystems/directories reduces the number of files each has to scan/manage. I am pushing this issue right now.

I have always been confused by the whole proxy process, but from what I gather it isn't that much different from what we are doing right now, except it gives you a central management point for restores and backups vs. us using the web client to give departments a way to manage their own restores. We could adapt our process to use proxies, the biggest hurdle being what you have accomplished ("*work with the system admins to split the backup*") and making managing the restores a function of the University Computer Center (where I work) vs. everyone doing their own thing. Until we get over this reconfiguration effort, we won't be able to move forward on the clients, since that would immediately kill the web client.

So, do I understand correctly that each of your 144 target nodes "-asnodename=DATANODE" is a Windows VM? If so, what specs are you using for each VM?

On Mon, Jul 16, 2018 at 11:15 AM Robert Talda wrote:

> Zoltan:
> I wish I could give you more details about the NAS/storage device
> connections, but either a) I'm not privy to that information; or b) I know
> it only as the SAN fabric. That is, our largest backups are from systems
> in our server farm that are part of the same SAN fabric as both the system
> running the SP client doing the backups AND the system hosting the TSM
> server. There is a 10 GB pipe connecting the two physical systems but that
> hasn't ever been the bottleneck. And the system running the SP client is a
> VM as well.
>
> Our bigger challenge was filesystems or shares with lots of files. This
> is where the proxy node strategy came into play.
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan:

I wish I could give you more details about the NAS/storage device connections, but either a) I'm not privy to that information, or b) I know it only as the SAN fabric. That is, our largest backups are from systems in our server farm that are part of the same SAN fabric as both the system running the SP client doing the backups AND the system hosting the TSM server. There is a 10 GB pipe connecting the two physical systems, but that hasn't ever been the bottleneck. And the system running the SP client is a VM as well.

Our bigger challenge was filesystems or shares with lots of files. This is where the proxy node strategy came into play. We were able to work with the system admins to split the backup of those filesystems into many smaller (in terms of number of files) backups that start deeper in the filesystem. That is, instead of running a backup against

\\rams\som\TSM\FC\*

we would have one backup running through PROXY.NODE1 for

\\rams\som\TSM\FC\dir1\*

while another was running through PROXY.NODE2 for

\\rams\som\TSM\FC\dir2\*

and so on and so forth. We did this using a set of client schedules that used the "objects" option to specify the directory in question:

Def sched DOMAIN PROXY.NODE1.HOUR01 action=incr options="-subdir=yes -asnodename=DATANODE" objects='"\\rams\som\TSM\FC\dir1\"' startt=01:00 dur=1 duru=hour

where DATANODE is the target for agent PROXY.NODE1. Currently, we are running up to 144 backups (6 proxy nodes, 24 hourly backups) for our largest devices.

HTH,
Bob

On Jul 16, 2018, at 8:29 AM, Zoltan Forray <zfor...@vcu.edu> wrote:

Robert,

Thanks for the extensive details. You back up 5 nodes with more data than we do for 90 nodes. So, my question is: what kind of connections do you have to your NAS/storage device to process that much data in such a short period of time?

I am not sure what benefit a proxy node would do for us, other than to manage multiple nodes from one connection/GUI -- or am I totally off base on this?
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Robert,

Thanks for the extensive details. You back up 5 nodes with more data than we do for 90 nodes. So, my question is: what kind of connections do you have to your NAS/storage device to process that much data in such a short period of time?

I am not sure what benefit a proxy node would do for us, other than to manage multiple nodes from one connection/GUI -- or am I totally off base on this?

Our current configuration is such: 7 Windows 2016 VMs (adding more to spread out the load). Each of these 7 VMs handles the backups for 5-30 nodes. Each node is a mountpoint for a user/department ISILON DFS mount, i.e. \\rams\som\TSM\FC\*, \\rams\som\TSM\UR\*, etc. FWIW, the reason we are using VMs is that the connection is actually faster than when we were using physical servers, since those only had gigabit NICs.

Even when we moved the biggest ISILON node (20,000,000+ files) to a new VM with only 4 other nodes, it still took 4 days to scan and back up 102 GB of 32 TB. Below are recent end-of-session statistics (the current backup started Friday and is still running):

07/09/2018 02:00:06 ANE4952I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects inspected: 20,276,912 (SESSION: 21423)
07/09/2018 02:00:06 ANE4954I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects backed up: 26,787 (SESSION: 21423)
07/09/2018 02:00:06 ANE4958I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects updated: 31 (SESSION: 21423)
07/09/2018 02:00:06 ANE4960I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects rebound: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4957I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects deleted: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4970I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects expired: 20,630 (SESSION: 21423)
07/09/2018 02:00:06 ANE4959I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects failed: 36 (SESSION: 21423)
07/09/2018 02:00:06 ANE4197I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects encrypted: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4965I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of subfile objects: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4914I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects grew: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4916I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of retries: 124 (SESSION: 21423)
07/09/2018 02:00:06 ANE4977I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of bytes inspected: 31.75 TB (SESSION: 21423)
07/09/2018 02:00:06 ANE4961I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of bytes transferred: 101.90 GB (SESSION: 21423)
07/09/2018 02:00:06 ANE4963I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Data transfer time: 115.78 sec (SESSION: 21423)
07/09/2018 02:00:06 ANE4966I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Network data transfer rate: 922,800.00 KB/sec (SESSION: 21423)
07/09/2018 02:00:06 ANE4967I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Aggregate data transfer rate: 271.46 KB/sec (SESSION: 21423)
07/09/2018 02:00:06 ANE4968I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Objects compressed by: 30% (SESSION: 21423)
07/09/2018 02:00:06 ANE4976I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total data reduction ratio: 99.69% (SESSION: 21423)
07/09/2018 02:00:06 ANE4969I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Subfile objects reduced by: 0% (SESSION: 21423)
07/09/2018 02:00:06 ANE4964I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Elapsed processing time: 109:19:48 (SESSION: 21423)

On Sun, Jul 15, 2018 at 7:30 PM Robert Talda wrote:

> Zoltan:
> Finally got a chance to answer you.
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan:

Finally got a chance to answer you. I :think: I understand what you are getting at…

First, some numbers, recalling that each of these nodes is one storage device:

Node1: 358,000,000+ files totaling 430 TB of primary occupied space
Node2: 302,000,000+ files totaling 82 TB of primary occupied space
Node3: 79,000,000+ files totaling 75 TB of primary occupied space
Node4: 1,000,000+ files totaling 75 TB of primary occupied space
Node5: 17,000,000+ files totaling 42 TB of primary occupied space

There are more, but I think this answers your initial question.

Restore requests are handled by the local system admin or, for lack of a better description, data admin. (Basically, the research area has a person dedicated to all the various data issues related to research grants, from including proper verbiage in grant requests to making sure the necessary protections are in place.)

We try to make it as simple as we can, because we do concentrate all the data in one node per storage device (usually a NAS). So restores are usually done directly from the node, while all backups are done through proxies. Generally, the restores are done without permissions so that the appropriate permissions can be applied to the restored data. (Often, the data is restored so a different user or set of users can work with it, so the original permissions aren't useful.)

There are some exceptions -- of course, as we work at universities, there are always exceptions -- and these we handle as best we can by providing proxy nodes with restricted privileges.

Let me know if I can provide more,
Bob

Robert Talda
EZ-Backup Systems Engineer
Cornell University
+1 607-255-8280
r...@cornell.edu

> On Jul 11, 2018, at 3:59 PM, Zoltan Forray wrote:
>
> Robert,
>
> Thanks for the insight/suggestions. Your scenario is similar to ours but
> on a larger scale when it comes to the amount of data/files to process,
> thus the issue (assuming such since you didn't list numbers).
Currently we > have 91 ISILON nodes totaling 140M objects and 230TB of data. The largest > (our troublemaker) has over 21M objects and 26TB of data (this is the one > that takes 4-5 days). dsminstr.log from a recently finished run shows it > only backed up 15K objects. > > We agree that this and other similarly larger nodes need to be broken up > into smaller/less objects to backup per node. But the owner of this large > one is balking since previously this was backed up via a solitary Windows > server using Journaling so everything finished in a day. > > We have never dealt with proxy nodes but might need to head in that > direction since our current method of allowing users to perform their own > restores relies on the now deprecated Web Client. Our current method is > numerous Windows VM servers with 20-30 nodes defined to each. > > How do you handle restore requests? > > On Wed, Jul 11, 2018 at 2:56 PM Robert Talda wrote: >
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Hey Zoltan,

Key points for backing up Isilon:

1. Each Isilon node is limited by its CPU/protocol stack rather than networking (other than the new G6 F800s).
2. To increase throughput to/from Isilon, increase the number of Isilon nodes you access via your clients.
3. To increase the Isilon nodes you access, you can either mount the storage multiple times from the same client using a different IP, or use TSM proxies.
4. Increase resourceutilization to 10 (max) to increase parallelisation.
5. Increase the max number of mount points (MAXNUMMP) to be bigger than the number of client machines X the resource utilization X the number of SP clients you run per client machine. This ensures each session is actively working and not waiting for a mount point.
6. Size your disk storage pool files so that you can have at least 2 X the max number of mount points. This is so that, should you fill your disk storage pool, you do not have lock contention between migration and backup. Ideally you should have enough disk pool storage to do a single run.

We have a setup where we need to do archives of up to 50 TB a day, and we do this using over 24 dsmc's running across 6 client VMs with a resource utilisation of 10.

HTH
Grant

From: ADSM: Dist Stor Manager on behalf of Zoltan Forray
Sent: Thursday, 12 July 2018 5:59 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

Robert,

Thanks for the insight/suggestions. Your scenario is similar to ours but on a larger scale when it comes to the amount of data/files to process, thus the issue (assuming such since you didn't list numbers). Currently we have 91 ISILON nodes totaling 140M objects and 230TB of data. The largest (our troublemaker) has over 21M objects and 26TB of data (this is the one that takes 4-5 days). dsminstr.log from a recently finished run shows it only backed up 15K objects.

We agree that this and other similarly large nodes need to be broken up into smaller nodes with fewer objects to back up per node.
But the owner of this large one is balking since previously this was backed up via a solitary Windows server using Journaling so everything finished in a day. We have never dealt with proxy nodes but might need to head in that direction since our current method of allowing users to perform their own restores relies on the now deprecated Web Client. Our current method is numerous Windows VM servers with 20-30 nodes defined to each. How do you handle restore requests? On Wed, Jul 11, 2018 at 2:56 PM Robert Talda wrote: > Zoltan, et al: > :IF: I understand the scenario you outline originally, here at Cornell > we are using two different approaches in backing up large storage arrays. > > 1. For backups of CIFS shares in our Shared File Share service hosted on a > NetApp device, we rely on a set of Powershell scripts to build a list of > shares to backup, then invoke up to 5 SP clients at a time, each client > backing up a share. As such, we are able to backup some 200+ shares on a > daily basis. I’m not sure this is a good match to your problem... > > 2. For backups of a large Dell array containing research data that does > seem to be a good match, I have defined a set of 10 proxy nodes and 240 > hourly schedules (once each hour for each proxy node) that allows us to > divide the Dell array up into 240 pieces - pieces that are controlled by > the specification of the “objects” in the schedule. That is, in your case, > instead of associating node to the schedule > ISILON-SOM-SOMDFS1 with object " \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*”, > I would instead have something like > Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR1 for object " \\ > rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRA\*” > Node PROXY2.ISILON associated to PROXY1.ISILON.HOUR1 for object " \\ > rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRB\*” > … > Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR2 for object " \\ > rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRB\SUBDIRA\*” > > And so on. 
For known large directories, slots of multiple hours are > allocated, up to the largest directory which is given its own proxy node > with one schedule, and hence 24 hours to back up. > > There are pros and cons to both of these, but they do enable us to perform > the backups. > > FWIW, > Bob > > Robert Talda > EZ-Backup Systems Engineer > Cornell University > +1 607-255-8280 > r...@cornell.edu > > > > On Jul 11, 2018, at 7:49 AM, Zoltan Forray wrote: > > > > I will need to translate to English but I gather it is talking about the > > RESOURCEUTILZATION / MAXNUMMP values. While we have increased MAXNUMMP > to > > 5 on the server (will try going higher), not sure how much good it would > > do since the ba
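Grant's sizing rules (points 5 and 6 in his list) amount to two multiplications. A quick sketch, using his own setup (6 client VMs, roughly 4 dsmc instances each, resourceutilization 10) as the assumed inputs; this is the thread's rule of thumb, not official IBM sizing guidance:

```python
def sizing(client_machines, dsmc_per_machine, resourceutilization):
    """Rule of thumb from the thread: MAXNUMMP should exceed
    machines x dsmc-instances-per-machine x RESOURCEUTILIZATION,
    and the disk storage pool should hold at least twice that
    many volumes to avoid migration/backup lock contention."""
    max_num_mp = client_machines * dsmc_per_machine * resourceutilization
    min_pool_volumes = 2 * max_num_mp
    return max_num_mp, min_pool_volumes

mp, vols = sizing(client_machines=6, dsmc_per_machine=4, resourceutilization=10)
print(mp, vols)  # 240 mount points, 480 disk-pool volumes
```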
Re: Looking for suggestions to deal with large backups not completing in 24-hours
> > On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas > > > wrote: > > > >> It is possible to da a parallel backup of file system parts. > >> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (german) > >> have a > >> look on page 10. > >> > >> --- > >> Jonas Jansen > >> > >> IT Center > >> Gruppe: Server & Storage > >> Abteilung: Systeme & Betrieb > >> RWTH Aachen University > >> Seffenter Weg 23 > >> 52074 Aachen > >> Tel: +49 241 80-28784 > >> Fax: +49 241 80-22134 > >> jan...@itc.rwth-aachen.de > >> www.itc.rwth-aachen.de > >> > >> -Original Message- > >> From: ADSM: Dist Stor Manager On Behalf Of Del > >> Hoobler > >> Sent: Monday, July 9, 2018 3:29 PM > >> To: ADSM-L@VM.MARIST.EDU > >> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups > >> not > >> completing in 24-hours > >> > >> They are a 3rd-party partner that offers an integrated Spectrum Protect > >> solution for large filer backups. > >> > >> > >> Del > >> > >> > >> > >> "ADSM: Dist Stor Manager" wrote on 07/09/2018 > >> 09:17:06 AM: > >> > >>> From: Zoltan Forray > >>> To: ADSM-L@VM.MARIST.EDU > >>> Date: 07/09/2018 09:17 AM > >>> Subject: Re: Looking for suggestions to deal with large backups not > >>> completing in 24-hours > >>> Sent by: "ADSM: Dist Stor Manager" > >>> > >>> Thanks Del. Very interesting. Are they a VAR for IBM? > >>> > >>> Not sure if it would work in the current configuration we are using to > >> back > >>> up ISILON. I have passed the info on. > >>> > >>> BTW, FWIW, when I copied/pasted the info, Chrome spell-checker > >> red-flagged > >>> on "The easy way to incrementally backup billons of objects" > (billions). > >>> So if you know anybody at the company, please pass it on to them. 
> >>>
> >>> On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote:
> >>>
> >>>> Another possible idea is to look at General Storage dsmISI MAGS:
> >>>>
> >>>> http://www.general-storage.com/PRODUCTS/products.html
> >>>>
> >>>> Del
> >>>>
> >>>> "ADSM: Dist Stor Manager" wrote on 07/05/2018
> >>>> 02:52:27 PM:
> >>>>
> >>>>> From: Zoltan Forray
> >>>>> To: ADSM-L@VM.MARIST.EDU
> >>>>> Date: 07/05/2018 02:53 PM
> >>>>> Subject: Looking for suggestions to deal with large backups not
> >>>>> completing in 24-hours
> >>>>> Sent by: "ADSM: Dist Stor Manager"
> >>>>>
> >>>>> As I have mentioned in the past, we have gone through large migrations to
> >>>>> DFS based storage on EMC ISILON hardware. As you may recall, we back up
> >>>>> these DFS mounts (about 90 at last count) using multiple Windows servers
> >>>>> that run multiple ISP nodes (about 30 each), and they access each DFS
> >>>>> mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname.
> >>>>>
> >>>>> This has led to lots of performance issues with backups, and some
> >>>>> departments are now complaining that their backups are running into
> >>>>> multiple days in some cases.
> >>>>>
> >>>>> One such case is a department with 2 nodes with over 30 million objects
> >>>>> each. In the past, their backups were able to finish quicker since
> >>>>> they were accessed via dedicated servers and were able to use Journaling to
> >>>>> reduce the scan times
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan, et al:

:IF: I understand the scenario you outlined originally, here at Cornell we are using two different approaches to backing up large storage arrays.

1. For backups of CIFS shares in our Shared File Share service hosted on a NetApp device, we rely on a set of PowerShell scripts to build a list of shares to back up, then invoke up to 5 SP clients at a time, each client backing up a share. As such, we are able to back up some 200+ shares on a daily basis. I'm not sure this is a good match for your problem...

2. For backups of a large Dell array containing research data that does seem to be a good match, I have defined a set of 10 proxy nodes and 240 hourly schedules (one each hour for each proxy node) that allow us to divide the Dell array up into 240 pieces -- pieces that are controlled by the specification of the "objects" in the schedule. That is, in your case, instead of associating the node to the schedule ISILON-SOM-SOMADFS1 with object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*", I would instead have something like:

Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR1 for object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRA\*"
Node PROXY2.ISILON associated to PROXY2.ISILON.HOUR1 for object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRB\*"
…
Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR2 for object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRB\SUBDIRA\*"

And so on. For known large directories, slots of multiple hours are allocated, up to the largest directory, which is given its own proxy node with one schedule and hence 24 hours to back up.

There are pros and cons to both of these, but they do enable us to perform the backups.

FWIW,
Bob

Robert Talda
EZ-Backup Systems Engineer
Cornell University
+1 607-255-8280
r...@cornell.edu

> On Jul 11, 2018, at 7:49 AM, Zoltan Forray wrote:
>
> I will need to translate to English but I gather it is talking about the
> RESOURCEUTILIZATION / MAXNUMMP values.
> While we have increased MAXNUMMP to 5 on the server (will try going higher),
> not sure how much good it would do since the backup schedule uses OBJECTS to
> point to a specific/single mountpoint/filesystem (see below) but it is worth
> trying to bump the RESOURCEUTILIZATION value on the client even higher...
>
> We have checked the dsminstr.log file and it is spending 92% of the time in
> PROCESS DIRS (no surprise)
>
> 7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
>             Policy Domain Name: DFS
>                  Schedule Name: ISILON-SOM-SOMADFS1
>                    Description: ISILON-SOM-SOMADFS1
>                         Action: Incremental
>                      Subaction:
>                        Options: -subdir=yes
>                        Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
>                       Priority: 5
>                Start Date/Time: 12/05/2017 08:30:00
>                       Duration: 1 Hour(s)
>     Maximum Run Time (Minutes): 0
>                 Schedule Style: Enhanced
>                         Period:
>                    Day of Week: Any
>                          Month: Any
>                   Day of Month: Any
>                  Week of Month: Any
>                     Expiration:
> Last Update by (administrator): ZFORRAY
>          Last Update Date/Time: 01/12/2018 10:30:48
>               Managing profile:
>
> On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas wrote:
>
>> It is possible to do a parallel backup of file system parts.
>> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German) - have
>> a look at page 10.
>>
>> ---
>> Jonas Jansen
>>
>> IT Center
>> Gruppe: Server & Storage
>> Abteilung: Systeme & Betrieb
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-28784
>> Fax: +49 241 80-22134
>> jan...@itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>>
>> -----Original Message-----
>> From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
>> Sent: Monday, July 9, 2018 3:29 PM
>> To: ADSM-L@VM.MARIST.EDU
>> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups
>> not completing in 24-hours
>>
>> They are a 3rd-party partner that offers an integrated Spectrum Protect
>> solution for large filer backups.
>> >> >> Del >> >> >> >> "ADSM: Dist Stor Manager" wrote on 07/09/2018 >> 09:17:06 AM: >> >>> From: Zoltan Forray >>> To: ADSM-L@VM.MARIST.EDU >>> Date: 07/09/2018 09:17 AM >>> Subject: Re: Looking for suggestions to deal with large backups not >>> completing in
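[Editor's note: the divide-and-conquer scheme Bob describes above - many proxy nodes, each owning a slice of the directory tree, running in parallel - can be sketched generically. This is a hypothetical Python sketch, not Cornell's actual scripts; the node names, UNC paths, and the `dsmc incremental -asnodename=... -subdir=yes` invocation pattern are illustrative assumptions.]

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

def partition(objects, n_workers):
    """Round-robin a list of backup objects across n_workers proxy slots."""
    slots = [[] for _ in range(n_workers)]
    for slot, obj in zip(itertools.cycle(slots), objects):
        slot.append(obj)
    return slots

def backup_slot(proxy_node, objects, dry_run=True):
    """Run one incremental per object under a proxy node.

    Node name and flags are hypothetical; dry_run only returns the
    command lines instead of invoking dsmc.
    """
    results = []
    for obj in objects:
        cmd = ["dsmc", "incremental", obj,
               f"-asnodename={proxy_node}", "-subdir=yes"]
        if dry_run:
            results.append(" ".join(cmd))
        else:
            results.append(subprocess.run(cmd).returncode)
    return results

if __name__ == "__main__":
    # Hypothetical subdirectory slices of one large share.
    dirs = [r"\\filer\share\DIRA\SUBDIRA",
            r"\\filer\share\DIRA\SUBDIRB",
            r"\\filer\share\DIRB\SUBDIRA"]
    slots = partition(dirs, 2)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(backup_slot, f"PROXY{i + 1}.ISILON", objs)
                   for i, objs in enumerate(slots)]
        for f in futures:
            print(f.result())
```

The payoff is the same as with the 240-schedule approach: directory scanning, the dominant cost (92% in PROCESS DIRS above), happens in parallel instead of serially.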
Re: Looking for suggestions to deal with large backups not completing in 24-hours
I will need to translate to English but I gather it is talking about the RESOURCEUTILIZATION / MAXNUMMP values. While we have increased MAXNUMMP to 5 on the server (will try going higher), not sure how much good it would do since the backup schedule uses OBJECTS to point to a specific/single mountpoint/filesystem (see below), but it is worth trying to bump the RESOURCEUTILIZATION value on the client even higher...

We have checked the dsminstr.log file and it is spending 92% of the time in PROCESS DIRS (no surprise)

7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
            Policy Domain Name: DFS
                 Schedule Name: ISILON-SOM-SOMADFS1
                   Description: ISILON-SOM-SOMADFS1
                        Action: Incremental
                     Subaction:
                       Options: -subdir=yes
                       Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
                      Priority: 5
               Start Date/Time: 12/05/2017 08:30:00
                      Duration: 1 Hour(s)
    Maximum Run Time (Minutes): 0
                Schedule Style: Enhanced
                        Period:
                   Day of Week: Any
                         Month: Any
                  Day of Month: Any
                 Week of Month: Any
                    Expiration:
Last Update by (administrator): ZFORRAY
         Last Update Date/Time: 01/12/2018 10:30:48
              Managing profile:

On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas wrote:

> It is possible to do a parallel backup of file system parts.
> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German) - have a
> look at page 10.
>
> ---
> Jonas Jansen
>
> IT Center
> Gruppe: Server & Storage
> Abteilung: Systeme & Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-28784
> Fax: +49 241 80-22134
> jan...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
> Sent: Monday, July 9, 2018 3:29 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups
> not completing in 24-hours
>
> They are a 3rd-party partner that offers an integrated Spectrum Protect
> solution for large filer backups.
> > > Del > > > > "ADSM: Dist Stor Manager" wrote on 07/09/2018 > 09:17:06 AM: > > > From: Zoltan Forray > > To: ADSM-L@VM.MARIST.EDU > > Date: 07/09/2018 09:17 AM > > Subject: Re: Looking for suggestions to deal with large backups not > > completing in 24-hours > > Sent by: "ADSM: Dist Stor Manager" > > > > Thanks Del. Very interesting. Are they a VAR for IBM? > > > > Not sure if it would work in the current configuration we are using to > back > > up ISILON. I have passed the info on. > > > > BTW, FWIW, when I copied/pasted the info, Chrome spell-checker > red-flagged > > on "The easy way to incrementally backup billons of objects" (billions). > > So if you know anybody at the company, please pass it on to them. > > > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > > > > > Another possible idea is to look at General Storage dsmISI MAGS: > > > > > > INVALID URI REMOVED > > > > u=http-3A__www.general-2Dstorage.com_PRODUCTS_products.html&d=DwIBaQ&c=jf_ia > SHvJObTbx- > > > > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75 > lwUZLmc_kYAQxroVCZQUCSs&s=25_psxEcE0fvxruxybvMJZzSZv- > > ach7r-VHXaLNVD_E&e= > > > > > > > > > Del > > > > > > > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > > > 02:52:27 PM: > > > > > > > From: Zoltan Forray > > > > To: ADSM-L@VM.MARIST.EDU > > > > Date: 07/05/2018 02:53 PM > > > > Subject: Looking for suggestions to deal with large backups not > > > > completing in 24-hours > > > > Sent by: "ADSM: Dist Stor Manager" > > > > > > > > As I have mentioned in the past, we have gone through large > migrations > > > to > > > > DFS based storage on EMC ISILON hardware. As you may recall, we > backup > > > > these DFS mounts (about 90 at last count) using multiple Windows > servers > > > > that run multiple ISP nodes (about 30-each) and they access each DFS > > > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > > > > > > > This has lead to lots of performance issue with backups and some > &
Re: Looking for suggestions to deal with large backups not completing in 24-hours
It is possible to do a parallel backup of file system parts.
https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German) - have a look at page 10.

---
Jonas Jansen

IT Center
Gruppe: Server & Storage
Abteilung: Systeme & Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-28784
Fax: +49 241 80-22134
jan...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

-----Original Message-----
From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
Sent: Monday, July 9, 2018 3:29 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups.

Del

"ADSM: Dist Stor Manager" wrote on 07/09/2018 09:17:06 AM:

> From: Zoltan Forray
> To: ADSM-L@VM.MARIST.EDU
> Date: 07/09/2018 09:17 AM
> Subject: Re: Looking for suggestions to deal with large backups not
> completing in 24-hours
> Sent by: "ADSM: Dist Stor Manager"
>
> Thanks Del. Very interesting. Are they a VAR for IBM?
>
> Not sure if it would work in the current configuration we are using to back
> up ISILON. I have passed the info on.
>
> BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged
> on "The easy way to incrementally backup billons of objects" (billions).
> So if you know anybody at the company, please pass it on to them.
> > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > > > Another possible idea is to look at General Storage dsmISI MAGS: > > > > INVALID URI REMOVED > u=http-3A__www.general-2Dstorage.com_PRODUCTS_products.html&d=DwIBaQ&c=jf_ia SHvJObTbx- > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75 lwUZLmc_kYAQxroVCZQUCSs&s=25_psxEcE0fvxruxybvMJZzSZv- > ach7r-VHXaLNVD_E&e= > > > > > > Del > > > > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > > 02:52:27 PM: > > > > > From: Zoltan Forray > > > To: ADSM-L@VM.MARIST.EDU > > > Date: 07/05/2018 02:53 PM > > > Subject: Looking for suggestions to deal with large backups not > > > completing in 24-hours > > > Sent by: "ADSM: Dist Stor Manager" > > > > > > As I have mentioned in the past, we have gone through large migrations > > to > > > DFS based storage on EMC ISILON hardware. As you may recall, we backup > > > these DFS mounts (about 90 at last count) using multiple Windows servers > > > that run multiple ISP nodes (about 30-each) and they access each DFS > > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > > > > > This has lead to lots of performance issue with backups and some > > > departments are now complain that their backups are running into > > > multiple-days in some cases. > > > > > > One such case in a department with 2-nodes with over 30-million objects > > for > > > each node. In the past, their backups were able to finish quicker since > > > they were accessed via dedicated servers and were able to use Journaling > > to > > > reduce the scan times. Unless things have changed, I believe Journling > > is > > > not an option due to how the files are accessed. > > > > > > FWIW, average backups are usually <50k files and <200GB once it finished > > > scanning. > > > > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since > > many > > > of these objects haven't been accessed in many years old. 
But as I > > > understand it, that won't work either given our current configuration. > > > > > > Given the current DFS configuration (previously CIFS), what can we do to > > > improve backup performance? > > > > > > So, any-and-all ideas are up for discussion. There is even discussion > > on > > > replacing ISP/TSM due to these issues/limitations. > > > > > > -- > > > *Zoltan Forray* > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > > > Xymon Monitor Administrator > > > VMware Administrator > > > Virginia Commonwealth University > > > UCC/Office of Technology Services > > > www.ucc.vcu.edu > > > zfor...@vcu.edu - 804-828-4807 > > > Don't be a phishing victim - VCU and other reputable organizations will > > > never use email to request that you reply with your password, social > > > security number or confidential per
Re: Looking for suggestions to deal with large backups not completing in 24-hours
They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups. Del "ADSM: Dist Stor Manager" wrote on 07/09/2018 09:17:06 AM: > From: Zoltan Forray > To: ADSM-L@VM.MARIST.EDU > Date: 07/09/2018 09:17 AM > Subject: Re: Looking for suggestions to deal with large backups not > completing in 24-hours > Sent by: "ADSM: Dist Stor Manager" > > Thanks Del. Very interesting. Are they a VAR for IBM? > > Not sure if it would work in the current configuration we are using to back > up ISILON. I have passed the info on. > > BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged > on "The easy way to incrementally backup billons of objects" (billions). > So if you know anybody at the company, please pass it on to them. > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > > > Another possible idea is to look at General Storage dsmISI MAGS: > > > > INVALID URI REMOVED > u=http-3A__www.general-2Dstorage.com_PRODUCTS_products.html&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75lwUZLmc_kYAQxroVCZQUCSs&s=25_psxEcE0fvxruxybvMJZzSZv- > ach7r-VHXaLNVD_E&e= > > > > > > Del > > > > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > > 02:52:27 PM: > > > > > From: Zoltan Forray > > > To: ADSM-L@VM.MARIST.EDU > > > Date: 07/05/2018 02:53 PM > > > Subject: Looking for suggestions to deal with large backups not > > > completing in 24-hours > > > Sent by: "ADSM: Dist Stor Manager" > > > > > > As I have mentioned in the past, we have gone through large migrations > > to > > > DFS based storage on EMC ISILON hardware. As you may recall, we backup > > > these DFS mounts (about 90 at last count) using multiple Windows servers > > > that run multiple ISP nodes (about 30-each) and they access each DFS > > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. 
> > > > > > This has lead to lots of performance issue with backups and some > > > departments are now complain that their backups are running into > > > multiple-days in some cases. > > > > > > One such case in a department with 2-nodes with over 30-million objects > > for > > > each node. In the past, their backups were able to finish quicker since > > > they were accessed via dedicated servers and were able to use Journaling > > to > > > reduce the scan times. Unless things have changed, I believe Journling > > is > > > not an option due to how the files are accessed. > > > > > > FWIW, average backups are usually <50k files and <200GB once it finished > > > scanning. > > > > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since > > many > > > of these objects haven't been accessed in many years old. But as I > > > understand it, that won't work either given our current configuration. > > > > > > Given the current DFS configuration (previously CIFS), what can we do to > > > improve backup performance? > > > > > > So, any-and-all ideas are up for discussion. There is even discussion > > on > > > replacing ISP/TSM due to these issues/limitations. > > > > > > -- > > > *Zoltan Forray* > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > > > Xymon Monitor Administrator > > > VMware Administrator > > > Virginia Commonwealth University > > > UCC/Office of Technology Services > > > www.ucc.vcu.edu > > > zfor...@vcu.edu - 804-828-4807 > > > Don't be a phishing victim - VCU and other reputable organizations will > > > never use email to request that you reply with your password, social > > > security number or confidential personal information. 
For more details > > > visit INVALID URI REMOVED > > > u=http-3A__phishing.vcu.edu_&d=DwIBaQ&c=jf_iaSHvJObTbx- > > > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=5bz_TktY3- > > > a432oKYronO-w1z- > > > ax8md3tzFqX9nGxoU&s=EudIhVvfUVx4-5UmfJHaRUzHCd7Agwk3Pog8wmEEpdA&e= > > > > > > > > -- > *Zoltan Forray* > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > Xymon Monitor Administrator > VMware Administrator > Virginia Commonwealth University > UCC/Office of Technology Services > www.ucc.vcu.edu > zfor...@vcu.edu - 804-828-4807 > Don't be a phishing victim - VCU and other reputable organizations will > never use email to request that you reply with your password, social > security number or confidential personal information. For more details > visit INVALID URI REMOVED > u=http-3A__phishing.vcu.edu_&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75lwUZLmc_kYAQxroVCZQUCSs&s=umTd28h- > GlxqSvNShsNIqm8D1PcanVk0HPcP5KTurKw&e= >
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Thanks Del. Very interesting. Are they a VAR for IBM? Not sure if it would work in the current configuration we are using to back up ISILON. I have passed the info on. BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged on "The easy way to incrementally backup billons of objects" (billions). So if you know anybody at the company, please pass it on to them. On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > Another possible idea is to look at General Storage dsmISI MAGS: > > http://www.general-storage.com/PRODUCTS/products.html > > > Del > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > 02:52:27 PM: > > > From: Zoltan Forray > > To: ADSM-L@VM.MARIST.EDU > > Date: 07/05/2018 02:53 PM > > Subject: Looking for suggestions to deal with large backups not > > completing in 24-hours > > Sent by: "ADSM: Dist Stor Manager" > > > > As I have mentioned in the past, we have gone through large migrations > to > > DFS based storage on EMC ISILON hardware. As you may recall, we backup > > these DFS mounts (about 90 at last count) using multiple Windows servers > > that run multiple ISP nodes (about 30-each) and they access each DFS > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > > > This has led to lots of performance issues with backups and some > > departments are now complaining that their backups are running into > > multiple-days in some cases. > > > > One such case is a department with 2 nodes with over 30 million objects > for > > each node. In the past, their backups were able to finish quicker since > > they were accessed via dedicated servers and were able to use Journaling > to > > reduce the scan times. Unless things have changed, I believe Journaling > is > > not an option due to how the files are accessed. > > > > FWIW, average backups are usually <50k files and <200GB once it finished > > scanning.
> > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since > many > > of these objects haven't been accessed in many years. But as I > > understand it, that won't work either given our current configuration. > > > > Given the current DFS configuration (previously CIFS), what can we do to > > improve backup performance? > > > > So, any-and-all ideas are up for discussion. There is even discussion > on > > replacing ISP/TSM due to these issues/limitations. > > > > -- > > *Zoltan Forray* > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > > Xymon Monitor Administrator > > VMware Administrator > > Virginia Commonwealth University > > UCC/Office of Technology Services > > www.ucc.vcu.edu > > zfor...@vcu.edu - 804-828-4807 > > Don't be a phishing victim - VCU and other reputable organizations will > > never use email to request that you reply with your password, social > > security number or confidential personal information. For more details > > visit http://phishing.vcu.edu/ > > > -- *Zoltan Forray* Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator Xymon Monitor Administrator VMware Administrator Virginia Commonwealth University UCC/Office of Technology Services www.ucc.vcu.edu zfor...@vcu.edu - 804-828-4807 Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Another possible idea is to look at General Storage dsmISI MAGS: http://www.general-storage.com/PRODUCTS/products.html Del "ADSM: Dist Stor Manager" wrote on 07/05/2018 02:52:27 PM: > From: Zoltan Forray > To: ADSM-L@VM.MARIST.EDU > Date: 07/05/2018 02:53 PM > Subject: Looking for suggestions to deal with large backups not > completing in 24-hours > Sent by: "ADSM: Dist Stor Manager" > > As I have mentioned in the past, we have gone through large migrations to > DFS based storage on EMC ISILON hardware. As you may recall, we backup > these DFS mounts (about 90 at last count) using multiple Windows servers > that run multiple ISP nodes (about 30-each) and they access each DFS > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > This has led to lots of performance issues with backups and some > departments are now complaining that their backups are running into > multiple-days in some cases. > > One such case is a department with 2 nodes with over 30 million objects for > each node. In the past, their backups were able to finish quicker since > they were accessed via dedicated servers and were able to use Journaling to > reduce the scan times. Unless things have changed, I believe Journaling is > not an option due to how the files are accessed. > > FWIW, average backups are usually <50k files and <200GB once it finished > scanning. > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many > of these objects haven't been accessed in many years. But as I > understand it, that won't work either given our current configuration. > > Given the current DFS configuration (previously CIFS), what can we do to > improve backup performance? > > So, any-and-all ideas are up for discussion. There is even discussion on > replacing ISP/TSM due to these issues/limitations. > > -- > *Zoltan Forray* > Spectrum Protect (p.k.a.
TSM) Software & Hardware Administrator > Xymon Monitor Administrator > VMware Administrator > Virginia Commonwealth University > UCC/Office of Technology Services > www.ucc.vcu.edu > zfor...@vcu.edu - 804-828-4807 > Don't be a phishing victim - VCU and other reputable organizations will > never use email to request that you reply with your password, social > security number or confidential personal information. For more details > visit http://phishing.vcu.edu/ >
Re: [EXTERNAL] Looking for suggestions to deal with large backups not completing in 24-hours
A couple years ago we decided to replace dozens and dozens of big Windows servers with a centralized Isilon NAS. The Windows servers, with tons of little files, were an ongoing pain to back up with TSM. Our decision was to NOT back up the Isilon to TSM or any other external program. Instead, we decided to use snapshots and replication to a DR Isilon. In other words, we made a conscious decision to stop using TSM to back up this data when we moved to Isilon. We took the opportunity to standardize backup policies to a single snapshot retention of just 32 days to help keep the snapshot disk space down. Other than watching free disk space and a periodic check of replication and snapshots, backup of this data is out of sight and out of mind. Rick -----Original Message----- From: ADSM: Dist Stor Manager On Behalf Of Zoltan Forray Sent: Thursday, July 5, 2018 2:52 PM To: ADSM-L@VM.MARIST.EDU Subject: [EXTERNAL] Looking for suggestions to deal with large backups not completing in 24-hours As I have mentioned in the past, we have gone through large migrations to DFS based storage on EMC ISILON hardware. As you may recall, we backup these DFS mounts (about 90 at last count) using multiple Windows servers that run multiple ISP nodes (about 30-each) and they access each DFS mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. This has led to lots of performance issues with backups and some departments are now complaining that their backups are running into multiple-days in some cases. One such case is a department with 2 nodes with over 30 million objects for each node. In the past, their backups were able to finish quicker since they were accessed via dedicated servers and were able to use Journaling to reduce the scan times. Unless things have changed, I believe Journaling is not an option due to how the files are accessed. FWIW, average backups are usually <50k files and <200GB once it finished scanning.
Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many of these objects haven't been accessed in many years. But as I understand it, that won't work either given our current configuration. Given the current DFS configuration (previously CIFS), what can we do to improve backup performance? So, any-and-all ideas are up for discussion. There is even discussion on replacing ISP/TSM due to these issues/limitations. -- *Zoltan Forray* Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator Xymon Monitor Administrator VMware Administrator Virginia Commonwealth University UCC/Office of Technology Services www.ucc.vcu.edu zfor...@vcu.edu - 804-828-4807 Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
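[Editor's note: Rick's regime above replaces external backup with snapshot retention - here a rolling window of 32 daily snapshots. The expiry logic such a policy implies can be expressed as a tiny sketch; in practice Isilon's SnapshotIQ handles expiration itself, so this is purely illustrative.]

```python
from datetime import date, timedelta

def expired_snapshots(snapshot_dates, today, retention_days=32):
    """Return the snapshot dates that fall outside the retention window.

    retention_days=32 mirrors the 32-day policy described in the post;
    anything older than today - retention_days is due for deletion.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)
```

With daily snapshots, the window caps snapshot count (and roughly bounds snapshot disk overhead) regardless of how long the cluster runs.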
Re: Looking for suggestions to deal with large backups not completing in 24-hours
We've implemented file count quotas in addition to our existing byte quotas to try to avoid this situation. You can improve some things (metadata on SSDs, maybe get an accelerator node if Isilon still offers those) but the fact is that metadata is expensive in terms of CPU (both client and server) and disk. We chose 1 million objects/TB of allocated disk space. We sort of compete with a storage system offered by our central IT organization, and picked a limit higher than what they would provide. To be honest, though, we're retiring our Isilon systems because the performance/scalability/cost ratios just aren't as great as they used to be. Our new storage is GPFS and mmbackup works much better with huge numbers of files, though it's still not great. In particular, the filelist generation is based around UNIX sort which is definitely a memory pig, though it can be split across multiple systems so can scale out pretty well. On Thu, Jul 05, 2018 at 02:52:27PM -0400, Zoltan Forray wrote: > As I have mentioned in the past, we have gone through large migrations to > DFS based storage on EMC ISILON hardware. As you may recall, we backup > these DFS mounts (about 90 at last count) using multiple Windows servers > that run multiple ISP nodes (about 30-each) and they access each DFS > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > This has led to lots of performance issues with backups and some > departments are now complaining that their backups are running into > multiple-days in some cases. > > One such case is a department with 2 nodes with over 30 million objects for > each node. In the past, their backups were able to finish quicker since > they were accessed via dedicated servers and were able to use Journaling to > reduce the scan times. Unless things have changed, I believe Journaling is > not an option due to how the files are accessed. > > FWIW, average backups are usually <50k files and <200GB once it finished > scanning.
> > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many > of these objects haven't been accessed in many years. But as I > understand it, that won't work either given our current configuration. > > Given the current DFS configuration (previously CIFS), what can we do to > improve backup performance? > > So, any-and-all ideas are up for discussion. There is even discussion on > replacing ISP/TSM due to these issues/limitations. > > -- > *Zoltan Forray* > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > Xymon Monitor Administrator > VMware Administrator > Virginia Commonwealth University > UCC/Office of Technology Services > www.ucc.vcu.edu > zfor...@vcu.edu - 804-828-4807 > Don't be a phishing victim - VCU and other reputable organizations will > never use email to request that you reply with your password, social > security number or confidential personal information. For more details > visit http://phishing.vcu.edu/ -- -- Skylar Thompson (skyl...@u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine
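[Editor's note: Skylar's 1 million objects/TB file-count quota above is just a ratio check, and is easy to sketch. The function names and the "TB = 10^12 bytes" convention are the editor's assumptions, not Skylar's actual tooling.]

```python
import os

def count_objects(root):
    """Count files and directories below root (root itself excluded)."""
    return sum(len(dirs) + len(files) for _, dirs, files in os.walk(root))

def objects_per_tb(object_count, allocated_bytes):
    """Object density per TB of allocated space (decimal TB = 10**12 bytes)."""
    return object_count / (allocated_bytes / 1e12)

def over_quota(object_count, allocated_bytes, limit=1_000_000):
    """True if a fileset exceeds the per-TB object limit (1M/TB as in the post)."""
    return objects_per_tb(object_count, allocated_bytes) > limit
```

For example, the 30-million-object node discussed in this thread would need at least 30 TB of allocated space to stay under such a limit; the quota effectively bounds metadata-scan time per TB of storage sold.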
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan, I kind of agree with Ung Yi. What is the purpose of your TSM backups? DR? Long-term retention for auditability/Sarbanes-Oxley/other regulation? It may well be that a daily or even more frequent snapshot regime might be the best way to get back that recently lost/deleted/corrupted file. Use a TSM backup of a weekly point-of-consistency snapshot as your long-term strategy. Of course a better option would be an embedded TSM client on the Isilon itself, but the commercial realities are that will never happen. Cheers Steve Steven Harris TSM Admin Canberra Australia -----Original Message----- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Yi, Ung Sent: Friday, 6 July 2018 6:36 AM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours Hello, I don't know much about Isilon. There might be a SAN-level snap backup option for Isilon. For our Data Domain, we replicate from the main site to the DR site, then take a snap at our DR site every night. Each snap is considered a backup. Thank you. -----Original Message----- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Zoltan Forray Sent: Thursday, July 05, 2018 2:52 PM To: ADSM-L@VM.MARIST.EDU Subject: Looking for suggestions to deal with large backups not completing in 24-hours As I have mentioned in the past, we have gone through large migrations to DFS based storage on EMC ISILON hardware. As you may recall, we backup these DFS mounts (about 90 at last count) using multiple Windows servers that run multiple ISP nodes (about 30-each) and they access each DFS mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. This has led to lots of performance issues with backups and some departments are now complaining that their backups are running into multiple-days in some cases. One such case is a department with 2 nodes with over 30 million objects for each node.
In the past, their backups were able to finish quicker since they were accessed via dedicated servers and were able to use Journaling to reduce the scan times. Unless things have changed, I believe Journaling is not an option due to how the files are accessed. FWIW, average backups are usually <50k files and <200GB once it finished scanning. Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many of these objects haven't been accessed in many years. But as I understand it, that won't work either given our current configuration. Given the current DFS configuration (previously CIFS), what can we do to improve backup performance? So, any-and-all ideas are up for discussion. There is even discussion on replacing ISP/TSM due to these issues/limitations. -- *Zoltan Forray* Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator Xymon Monitor Administrator VMware Administrator Virginia Commonwealth University UCC/Office of Technology Services www.ucc.vcu.edu zfor...@vcu.edu - 804-828-4807 Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
If you have received this email by mistake please delete it from your system; you should not copy the message or disclose its content to anyone. This electronic communication may contain general financial product advice but should not be relied upon or construed as a recommendation of any financial product. The information has been prepared without taking into account your objectives, financial situation or needs. You should consider the Product Disclosure Statement relating to the financial product and consult your financial adviser before making a decision about whether to acquire, hold or dispose of a financial product. For further details on the financial product please go to http://www.bt.com.au Past performance is not a reliable indicator of future performance.
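A hand-rolled alternative to the filer-side journaling discussed in the thread is to generate the changed-file list yourself from a snapshot tree and feed it to `dsmc selective -filelist=...`. Below is a minimal sketch (the UNC path and the 24-hour cutoff are hypothetical examples, and the `dsmc` invocation itself is not shown). Note the honest caveat: this still requires a full tree walk, which is exactly the scan cost the thread complains about, so it only helps if the walk is cheaper than the client's own incremental processing.

```python
import os
import time

def changed_files(snapshot_root, since_epoch):
    """Walk a snapshot tree and collect files modified after the last
    successful backup. The resulting list can be written one path per
    line and passed to 'dsmc selective -filelist=changed.txt'."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(snapshot_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since_epoch:
                    changed.append(path)
            except OSError:
                continue  # file vanished between listing and stat
    return changed

if __name__ == "__main__":
    # Hypothetical example: everything changed in the last 24 hours.
    last_backup = time.time() - 24 * 3600
    for p in changed_files(r"\\rams.adp.vcu.edu\departmentname", last_backup):
        print(p)
```

As Lars points out above for isi changelist and SnapDiff, any difference-based scheme like this is a 99.9% function: an mtime comparison misses deletions, renames, and permission-only changes, so periodic full incrementals are still required to catch what the diff missed.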
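The layout Zoltan describes, roughly 90 DFS mounts spread across proxy servers running about 30 ISP node instances each, can at least be balanced mechanically. A minimal sketch (share names and the instance count are made up for illustration) that round-robins department shares across node instances so each scheduled scan covers a similar number of mounts:

```python
def assign_shares(shares, node_count):
    """Round-robin department shares across ISP proxy node instances so
    each instance scans a similar number of DFS mounts."""
    buckets = [[] for _ in range(node_count)]
    for i, share in enumerate(sorted(shares)):
        buckets[i % node_count].append(share)
    return buckets

if __name__ == "__main__":
    # Hypothetical: 90 department shares spread over 3 node instances.
    shares = [f"\\\\rams.adp.vcu.edu\\dept{i:02d}" for i in range(90)]
    for n, bucket in enumerate(assign_shares(shares, 3)):
        print(f"instance {n}: {len(bucket)} shares")
```

Counting shares is only a crude proxy: real balancing should weight by object count, since a single department with 30 million objects will dominate whatever schedule it lands on regardless of how many small shares sit beside it.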