Re: Looking for suggestions to deal with large backups not completing in 24-hours
Hi Bjørn,

actually they improved isi changelist a lot with OneFS 8, and performance is no longer really that much of an issue - at least not with an 8-to-9-figure number of objects in the file system. My problem (and the main reason why we haven't integrated it into MAGS yet) is that it is a 99.9% kind of function. Just like with NetApp Snapdiff, you'd always have to recommend doing periodic full incrementals to catch whatever was missed when calculating the snapshot difference (and there usually is something). Just like with Snapdiff, you'll have the back and forth over whose problem it is if it doesn't work, and which combination of client and OS on the filer is supported.

At the end of the day you'll have to be able to run a full incremental within an acceptable period of time - and if you have to be able to do that anyway, why not make backup fast enough to run the real thing every day? And using a journal or snapshot difference for backup doesn't benefit restores one bit, of course.

Regards

Lars Henningsen
General Storage

> On 20. Jul 2018, at 15:48, Bjørn Nachtwey wrote:
> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Hi all,

yes, there's a special daemon that might be used -- in theory :-) In practice it worked only for small file systems ... and only if they are partially filled.

A guy from the concat company did some tests and told me they were totally disappointing, as this daemon consumes too many resources if you let it write a protocol file which you can use to identify changed, added and deleted files. But as far as I know it doesn't give a list of changed files; it just logs all changes to the files along with the kind of change.

@Lars (Henningsen): Do you know some more details?

best
Bjørn

Skylar Thompson wrote:
> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Sadly, no. I made a feature request for this years ago (back when Isilon was Isilon) but it didn't go anywhere. At this point, our days of running Isilon storage are numbered, and we'll be investing in DDN/GPFS for the foreseeable future, so I haven't really had leverage to push Dell/EMC/Isilon on the matter.

On Thu, Jul 19, 2018 at 11:31:06PM +, Harris, Steven wrote:
> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Is there no journaling/logging service on these Isilons that could be used to maintain a list of changed files and hand-roll a dsmc-selective-with-file-list process similar to what GPFS uses?

Cheers

Steve

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Richard Cowen
Sent: Friday, 20 July 2018 6:15 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Canary! I like it!

Richard

-----Original Message-----
From: ADSM: Dist Stor Manager On Behalf Of Skylar Thompson
Sent: Thursday, July 19, 2018 10:37 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

> [...]
Re: Looking for suggestions to deal with large backups not completing in 24-hours
There are a couple of ways we've gotten around this problem:

1. For NFS backups, we don't let TSM do partial incremental backups, even if we have the filesystem split up. Instead, we mount sub-directories of the filesystem root on our proxy nodes. This has the double advantage of letting us break the filesystem up into multiple TSM filespaces (giving us directory-level backup status reporting, and parallelism in TSM when we have COLLOCG=FILESPACE), and also parallelism at the NFS level when there are multiple NFS targets we can talk to (as is the case with Isilon).

2. For GPFS backups, in some cases we can set up independent filesets and let mmbackup process each as a separate filesystem, though we have some instances where the end users want an entire GPFS filesystem to have one inode space so they can do atomic moves as renames. In either case, though, mmbackup does its own "incremental" backups with filelists passed to "dsmc selective", which don't update the last-backup time on the TSM filespace. Our workaround has been to run mmbackup via a preschedule command, and have the actual TSM incremental backup be of an empty directory (I call them canary directories in our documentation) that's set as a virtual mountpoint. dsmc will only run the backup portion of its scheduled task if the preschedule command succeeds, so if mmbackup fails, the canary never gets backed up, and an alert will be raised.

On Wed, Jul 18, 2018 at 03:07:16PM +0200, Lars Henningsen wrote:
> [...]
--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
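For readers wanting to try the canary approach, here is a rough sketch of what the client-side wiring might look like. All paths, script names and the GPFS device name are invented for illustration; PRESCHEDULECMD, VIRTUALMOUNTPOINT and DOMAIN are standard client options, but the exact combination is only my reading of Skylar's description:

```shell
# dsm.sys fragment (sketch -- paths and names are hypothetical):
#
#   VIRTUALMOUNTPOINT /backup/canary
#   DOMAIN            /backup/canary
#   PRESCHEDULECMD    "/usr/local/bin/run-mmbackup.sh"
#
# /usr/local/bin/run-mmbackup.sh would contain something like:
#
#   #!/bin/sh
#   # the real backup: mmbackup drives "dsmc selective" itself
#   exec /usr/lpp/mmfs/bin/mmbackup /gpfs/fs0 -t incremental
#
# dsmc runs the scheduled incremental (of the empty canary directory)
# only if the preschedule command exits 0, so a failed mmbackup leaves
# the canary filespace's last-backup date stale -- which monitoring
# can alert on.
```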
Re: Looking for suggestions to deal with large backups not completing in 24-hours
@All

possibly the biggest issue when backing up massive file systems in parallel with multiple dsmc processes is expiration. Once you back up a directory with “subdir no”, a no-longer-existing directory object on that level is expired properly and becomes inactive. However, everything underneath it remains active and doesn’t expire (ever) unless you run a “full” incremental on the level above (with “subdir yes”) - and that kind of defeats the purpose of parallelisation.

Other pitfalls include: avoiding swapping; keeping log files consistent (dsmc isn’t thread-aware when logging - it assumes it is alone); handling the local dedup cache; updating backup timestamps for a file space on the server; distributing load evenly across multiple nodes on a scale-out filer; backing up from snapshots; chunking file systems into even parts automatically so you don’t end up with lots of small jobs and one big one; dynamically distributing load across multiple “proxies” if one isn’t enough; handling exceptions; handling directories with characters you can’t pass to dsmc via the command line; consolidating results in a single, comprehensible overview similar to the summary of a regular incremental; and being able to do it all in reverse for a massively parallel restore… the list is quite long.

We developed MAGS (as mentioned by Del) to cope with all that - and more. I can only recommend trying it out for free.

Regards

Lars Henningsen
General Storage
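The expiration pitfall Lars describes can be made concrete with a hypothetical layout (paths invented; the dsmc commands are shown as comments, not meant to be run as-is):

```shell
# Sketch only -- illustrates the expiration pitfall.
#
# Parallel workers, one per second-level directory:
#   dsmc incremental /fs/proj/a/ -subdir=yes
#   dsmc incremental /fs/proj/b/ -subdir=yes
# One extra non-recursive pass for the level above:
#   dsmc incremental /fs/proj/ -subdir=no
#
# Now suppose /fs/proj/b is deleted on the filer. The next "-subdir=no"
# pass expires the directory object "b" and makes it inactive -- but
# every file and directory *underneath* b stays ACTIVE on the server
# indefinitely, until someone runs a full recursive pass on the parent:
#   dsmc incremental /fs/proj/ -subdir=yes
```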
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
Hi Skylar,

Skylar Thompson wrote:
> [...]

Good point, even if my script works a little bit differently: at the moment the starting folder is not read from the "dsm.opt" file but given in the configuration file for my script, "dsmci.cfg". So one run can work for one node starting on a subfolder (done this way as Windows has no VIRTUALMOUNTPOINT option). Within this config file several starting folders can be declared, and in the first step my script creates a global list of all folders to be backed up "partially incremental".

=> well, I'm not sure if I check for multiple entries in that list
=> and if the nesting is done on a deeper level than the list is created from, I think I won't be aware of such a set-up

I will check this -- thanks for the advice!

best
Bjørn

--
-- Bjørn Nachtwey

Arbeitsgruppe "IT-Infrastruktur"
Tel.: +49 551 201-2181, E-Mail: bjoern.nacht...@gwdg.de
--
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: http://www.gwdg.de
Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: g...@gwdg.de
Service-Hotline: Tel.: +49 551 201-1523, E-Mail: supp...@gwdg.de
Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Norbert Lossau
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598
--
Zertifiziert nach ISO 9001
--
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
One thing to be aware of with partial incremental backups is the danger of backing up data multiple times if the mount points are nested. For instance:

/mnt/backup/some-dir
/mnt/backup/some-dir/another-dir

Under normal operation, a node with DOMAIN set to "/mnt/backup/some-dir /mnt/backup/some-dir/another-dir" will back up the contents of /mnt/backup/some-dir/another-dir as a separate filespace, *and also* back up another-dir as a subdirectory of the /mnt/backup/some-dir filespace. We reported this as a bug, and IBM pointed us at this flag, which can be passed as a scheduler option to prevent it: -TESTFLAG=VMPUNDERNFSENABLED

On Tue, Jul 17, 2018 at 04:12:17PM +0200, Bjørn Nachtwey wrote:
> [...]
--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
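A minimal options sketch of the nested-domain situation described above (the paths are from the example; everything else is a config fragment, not a tested setup):

```shell
# dsm.opt / dsm.sys fragment (sketch):
#
#   DOMAIN /mnt/backup/some-dir /mnt/backup/some-dir/another-dir
#
# Without intervention, another-dir is backed up twice: once as its own
# filespace, and once inside the some-dir filespace. The scheduler-side
# workaround IBM suggested, passed as an option to the client scheduler:
#
#   dsmc schedule -TESTFLAG=VMPUNDERNFSENABLED
```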
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
Hi Zoltan,

OK, I will translate my text, as there are some more approaches discussed :-)

Breaking the filesystems up into several nodes will work as long as the nodes are of sufficient size. I'm not sure if a PROXY node will solve the problem, because each "member node" will back up the whole mountpoint. You will need to do partial incremental backups. I expect you will do this based on folders, won't you? So, some questions:

1) How will you distribute the folders to the nodes?
2) How will you ensure new folders are processed by one of your "member nodes"? On our filers many folders are created and deleted, sometimes a whole bunch every day. So for me it was no option to maintain the option file manually. The approach from my script / "MAGS" does this somehow "automatically".
3) What happens if the folders don't grow evenly and all the big ones are backed up by one of your nodes? (OK, you can change the distribution or even add another node.)
4) Are you going to map each backup node to different nodes of the Isilon cluster to distribute the traffic / workload across the Isilon nodes?

best
Bjørn
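Questions 1 and 3 above are essentially a load-balancing problem. One naive answer, shown here purely as an illustration (this is not part of Bjørn's script): sort the folders by size and greedily assign each one to the currently least-loaded member node.

```shell
# Read "size name" lines on stdin, print "nodeN name" assignments.
# Greedy heuristic: biggest folder first, always onto the node with
# the smallest total so far.
assign_folders() {
    nodes=$1
    sort -rn | awk -v n="$nodes" '{
        best = 1
        for (i = 2; i <= n; i++)
            if (load[i] < load[best]) best = i
        load[best] += $1
        print "node" best, $2
    }'
}

# Example: three folders distributed over two member nodes.
printf '100 projects\n40 home\n30 scratch\n' | assign_folders 2
```

The example prints "node1 projects", then "node2 home" and "node2 scratch": the single big folder claims one node, and the smaller ones pile onto the other - exactly the imbalance question 3 warns about when one folder dominates.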
Re: Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
> -----Original Message-----
> From: ADSM: Dist Stor Manager On Behalf Of Zoltan Forray
> Sent: Wednesday, 11 July 2018 13:50
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours
>
> I will need to translate it to English, but I gather it is talking about the RESOURCEUTILIZATION / MAXNUMMP values. While we have increased MAXNUMMP to 5 on the server (will try going higher), I'm not sure how much good it would do, since the backup schedule uses OBJECTS to point to a specific/single mountpoint/filesystem (see below) - but it is worth trying to bump the RESOURCEUTILIZATION value on the client even higher...
>
> We have checked the dsminstr.log file and it is spending 92% of the time in PROCESS DIRS (no surprise).
>
> 7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
>             Policy Domain Name: DFS
>                  Schedule Name: ISILON-SOM-SOMADFS1
>                    Description: ISILON-SOM-SOMADFS1
>                         Action: Incremental
>                      Subaction:
>                        Options: -subdir=yes
>                        Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
>                       Priority: 5
>                Start Date/Time: 12/05/2017 08:30:00
>                       Duration: 1 Hour(s)
>     Maximum Run Time (Minutes): 0
>                 Schedule Style: Enhanced
>                         Period:
>                    Day of Week: Any
>                          Month: Any
>                   Day of Month: Any
>                  Week of Month: Any
>                     Expiration:
> Last Update by (administrator): ZFORRAY
>          Last Update Date/Time: 01/12/2018 10:30:48
>               Managing profile:
>
> On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas wrote:
> > It is possible to do a parallel backup of file system parts.
> > https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German)
> > have a look at page 10.
> > ---
> > Jonas Jansen
> >
> > IT Center
> > Group: Server & Storage
> > Department: Systems & Operations
> > RWTH Aachen University
> > Seffenter Weg 23
> > 52074 Aachen
> > Tel: +49 241 80-28784
> > Fax: +49 241 80-22134
> > jan...@itc.rwth-aachen.de
> > www.itc.rwth-aachen.de
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
> > Sent: Monday, July 9, 2018 3:29 PM
> > To: ADSM-L@VM.MARIST.EDU
> > Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours
> >
> > They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups.
> >
> > Del
> >
> > "ADSM: Dist Stor Manager" wrote on 07/09/2018 09:17:06 AM:
> > > From: Zoltan Forray
> > > To: ADSM-L@VM.MARIST.EDU
> > > Date: 07/09/2018 09:17 AM
> > > Subject: Re: Looking for suggestions to deal with large backups not completing in 24-hours
> > > Sent by: "ADSM: Dist Stor Manager"
> > >
> > > Thanks Del. Very interesting. Are they a VAR for IBM?
> > >
> > > Not sure if it would work in the current configuration we are using to back up ISILON. I have passed the info on.
> > >
> > > BTW, FWIW, when I copied/pasted the info, Chrome's spell-checker red-flagged "The easy way to incrementally backup billons of objects" (billions). So if you know anybody at the company, please pass it on to them.
> > >
> > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote:
> > > > Another possible idea is to look at General Storage dsmISI MAGS:
> > > > http://www.general-storage.com/PRODUCTS/products.html
Looking for suggestions to deal with large backups not completing in 24-hours: the GWDG solution briefly explained
Hi Zoltan,

I'll come back to the approach Jonas mentioned (as I'm the author of that text: thanks to Jonas for pointing to it ;-) ). The text is in German, of course, but the script has some comments in English and should be understandable -- I hope so :-)

The text first describes the problem everybody on this list will know: the treewalk takes more time than we have. TSM/ISP offers some options to speed it up, such as "-incrbydate", but they do not work properly. So for me the only solution was to parallelize the tree walk and do partial incremental backups.

I first tried to write it with BASH commands, but multithreading was not easy to implement and, second, it won't run on Windows -- yet our largest filers (500 TB - 1.2 PB) need to be accessed via CIFS to store the ACL information. My first steps with PowerShell on Windows cost lots of time and were disappointing. Using Perl made everything really easy, as it runs on Windows with the Strawberry Perl distribution, and within the script only a few if-conditions are needed to distinguish between Linux and Windows.

I did some tests on how deep into the file tree to dive: as the subfolders are of unequal size, diving just below the mount point and parallelizing on the folders of this "first level" mostly does not work well -- there's (nearly) always one folder taking all the time. On the other hand, diving into all levels takes a certain amount of additional time. I see the best performance using 3 to 4 levels and 4 to 6 parallel threads for each node. Due to separating users and for accounting, I have several nodes on such large file systems, so in total there are about 20 to 40 streams in parallel. Rudi Wüst (mentioned in my text) found that a p520 server running AIX 6 will support up to 2,000 parallel streams, but as Grant mentioned, with an Isilon system the filer will be the bottleneck.
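Bjørn's strategy can be sketched in a few lines: enumerate the directories a fixed number of levels below the mount point, then run one partial incremental per subtree in a thread pool. This is only a rough illustration, not his actual implementation (that is the dsmci Perl script); the dsmc invocation, depth, and thread counts here are placeholders:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def partition_dirs(root, depth):
    """Collect directories `depth` levels below root (or leaf directories
    that end sooner), so each one can be backed up as an independent subtree."""
    if depth == 0:
        return [root]
    subdirs = [e.path for e in os.scandir(root) if e.is_dir(follow_symlinks=False)]
    if not subdirs:
        return [root]  # leaf directory: back it up as-is
    parts = []
    for d in subdirs:
        parts.extend(partition_dirs(d, depth - 1))
    return parts

def backup_subtree(path):
    # Placeholder dsmc call: one partial incremental per subtree.
    # Real use needs node name, options, and error handling.
    return subprocess.call(["dsmc", "incremental", path + os.sep, "-subdir=yes"])

def parallel_backup(root, depth=3, threads=4):
    """Fan the partitioned subtrees out over a small thread pool,
    mirroring the 3-4 levels / 4-6 threads that worked best above."""
    parts = partition_dirs(root, depth)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(backup_subtree, parts))
```

Note that files sitting in the intermediate levels above the partition depth still need a separate non-recursive incremental, which this sketch omits.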
As mentioned by Del, you may also test a commercial product, "MAGS" by General Storage; it can address multiple Isilon nodes in parallel.

If there are any questions -- just ask, or have a look at the script: https://gitlab.gwdg.de/bnachtw/dsmci // even if the last commit is about 4 months old, the project is still in development ;-)

==> Maybe I should update the text from the "GWDG News" and translate it to English? Any interest?

Best
Bjørn

p.s. a result from the wild (weekly backup of a node from a 343 TB Quantum StorNext file system):

>> Process ID: 12988
Path processed: -
Start time: 2018-07-14 12:00
End time: 2018-07-15 06:07
total processing time: 3d 15h 59m 23s
total wallclock time: 18h 7m 30s
effective speedup: 4.855 using 6 parallel threads
data transfer time ratio: 3.575 %
-
Objects inspected: 92,061,596
Objects backed up: 9,774,876
Objects updated: 0
Objects deleted: 0
Objects expired: 7,696
Objects failed: 0
Bytes inspected: 52,818.242 (GB)
Bytes transferred: 5,063.620 (GB)
-
Number of Errors: 0
Number of Warnings: 43
# of severe Errors: 0
# Out-of-Space Errors: 0 <<

--
Bjørn Nachtwey
Arbeitsgruppe "IT-Infrastruktur"
Tel.: +49 551 201-2181, E-Mail: bjoern.nacht...@gwdg.de
--
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: http://www.gwdg.de
Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: g...@gwdg.de
Service-Hotline: Tel.: +49 551 201-1523, E-Mail: supp...@gwdg.de
Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Norbert Lossau
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598
--
Zertifiziert nach ISO 9001
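As a side note on the statistics in Bjørn's report: the "effective speedup" figure is simply the summed per-thread processing time divided by the wall-clock time of the run, and the published numbers can be reproduced exactly:

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a d/h/m/s duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

# Figures from the report: 3d 15h 59m 23s processing, 18h 7m 30s wall clock
processing = to_seconds(days=3, hours=15, minutes=59, seconds=23)
wallclock = to_seconds(hours=18, minutes=7, seconds=30)

speedup = processing / wallclock
print(round(speedup, 3))  # 4.855, matching the reported value for 6 threads
```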
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Robert,

Again, thanks for the information. It fills in a lot of missing pieces. From what I gather, you are probably doing backups via SAN, not via IP like we do. Plus, as you suggested, breaking up the backup targets into multiple filesystems/directories reduces the number of files each has to scan/manage. I am pushing this issue right now.

I have always been confused by the whole proxy process, but from what I gather it isn't that much different from what we are doing right now, except it gives you a central management point for restores and backups vs. us using the web client to give departments a way to manage their own restores. We could adapt our process to use proxies, the biggest hurdle being what you have accomplished ("*work with the system admins to split the backup*") and making managing the restores a function of the University Computer Center (where I work) vs. everyone doing their own thing. Until we get over this reconfiguration effort, we won't be able to move forward on the clients, since that would immediately kill the web client.

So, do I understand correctly that each of your 144 target nodes "-asnodename=DATANODE" is a Windows VM? If so, what specs are you using for each VM?

On Mon, Jul 16, 2018 at 11:15 AM Robert Talda wrote:

> Zoltan:
> I wish I could give you more details about the NAS/storage device
> connections, but either a) I'm not privy to that information; or b) I know
> it only as the SAN fabric. That is, our largest backups are from systems
> in our server farm that are part of the same SAN fabric as both the system
> running the SP client doing the backups AND the system hosting the TSM
> server. There is a 10 GB pipe connecting the two physical systems but that
> hasn't ever been the bottleneck. And the system running the SP client is a
> VM as well.
>
> Our bigger challenge was filesystems or shares with lots of files. This
> is where the proxy node strategy came into play.
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan:

I wish I could give you more details about the NAS/storage device connections, but either a) I'm not privy to that information, or b) I know it only as the SAN fabric. That is, our largest backups are from systems in our server farm that are part of the same SAN fabric as both the system running the SP client doing the backups AND the system hosting the TSM server. There is a 10 GB pipe connecting the two physical systems, but that hasn't ever been the bottleneck. And the system running the SP client is a VM as well.

Our bigger challenge was filesystems or shares with lots of files. This is where the proxy node strategy came into play. We were able to work with the system admins to split the backup of those filesystems into many smaller (in terms of number of files) backups that start deeper in the filesystem. That is, instead of running a backup against

\\rams\som\TSM\FC\*

we would have one backup running through PROXY.NODE1 for

\\rams\som\TSM\FC\dir1\*

while another was running through PROXY.NODE2 for

\\rams\som\TSM\FC\dir2\*

and so on and so forth. We did this using a set of client schedules that used the "objects" option to specify the directory in question:

Def sched DOMAIN PROXY.NODE1.HOUR01 action=incr options="-subdir=yes -asnodename=DATANODE" objects='"\\rams\som\TSM\FC\dir1\"' startt=01:00 dur=1 duru=hour

where DATANODE is the target for agent PROXY.NODE1. Currently, we are running up to 144 backups (6 proxy nodes, 24 hourly backups) for our largest devices.

HTH,
Bob

On Jul 16, 2018, at 8:29 AM, Zoltan Forray <zfor...@vcu.edu> wrote:

Robert,

Thanks for the extensive details. You back up 5 nodes with more data than we do for 90 nodes. So, my question is: what kind of connections do you have to your NAS/storage device to process that much data in such a short period of time?

I am not sure what benefit a proxy node would do for us, other than to manage multiple nodes from one connection/GUI -- or am I totally off base on this?
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Robert,

Thanks for the extensive details. You back up 5 nodes with more data than we do for 90 nodes. So, my question is: what kind of connections do you have to your NAS/storage device to process that much data in such a short period of time?

I am not sure what benefit a proxy node would do for us, other than to manage multiple nodes from one connection/GUI -- or am I totally off base on this?

Our current configuration is such: 7 Windows 2016 VMs (adding more to spread out the load). Each of these 7 VMs handles the backups for 5-30 nodes. Each node is a mountpoint for a user/department ISILON DFS mount, i.e. \\rams\som\TSM\FC\*, \\rams\som\TSM\UR\*, etc. FWIW, the reason we are using VMs is that the connection is actually faster than when we were using physical servers, since those only had gigabit NICs.

Even when we moved the biggest ISILON node (20,000,000+ files) to a new VM with only 4 other nodes, it still took 4 days to scan and back up 102 GB of 32 TB. Below are recent end-of-session statistics (the current backup started Friday and is still running):

07/09/2018 02:00:06 ANE4952I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects inspected: 20,276,912 (SESSION: 21423)
07/09/2018 02:00:06 ANE4954I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects backed up: 26,787 (SESSION: 21423)
07/09/2018 02:00:06 ANE4958I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects updated: 31 (SESSION: 21423)
07/09/2018 02:00:06 ANE4960I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects rebound: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4957I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects deleted: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4970I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects expired: 20,630 (SESSION: 21423)
07/09/2018 02:00:06 ANE4959I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects failed: 36 (SESSION: 21423)
07/09/2018 02:00:06 ANE4197I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects encrypted: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4965I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of subfile objects: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4914I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of objects grew: 0 (SESSION: 21423)
07/09/2018 02:00:06 ANE4916I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of retries: 124 (SESSION: 21423)
07/09/2018 02:00:06 ANE4977I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of bytes inspected: 31.75 TB (SESSION: 21423)
07/09/2018 02:00:06 ANE4961I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total number of bytes transferred: 101.90 GB (SESSION: 21423)
07/09/2018 02:00:06 ANE4963I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Data transfer time: 115.78 sec (SESSION: 21423)
07/09/2018 02:00:06 ANE4966I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Network data transfer rate: 922,800.00 KB/sec (SESSION: 21423)
07/09/2018 02:00:06 ANE4967I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Aggregate data transfer rate: 271.46 KB/sec (SESSION: 21423)
07/09/2018 02:00:06 ANE4968I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Objects compressed by: 30% (SESSION: 21423)
07/09/2018 02:00:06 ANE4976I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Total data reduction ratio: 99.69% (SESSION: 21423)
07/09/2018 02:00:06 ANE4969I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Subfile objects reduced by: 0% (SESSION: 21423)
07/09/2018 02:00:06 ANE4964I (Session: 21423, Node: ISILON-SOM-SOMADFS2) Elapsed processing time: 109:19:48 (SESSION: 21423)

On Sun, Jul 15, 2018 at 7:30 PM Robert Talda wrote:

> Zoltan:
> Finally got a chance to answer you.
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan:

Finally got a chance to answer you. I :think: I understand what you are getting at…

First, some numbers, recalling that each of these nodes is one storage device:

Node1: 358,000,000+ files totaling 430 TB of primary occupied space
Node2: 302,000,000+ files totaling 82 TB of primary occupied space
Node3: 79,000,000+ files totaling 75 TB of primary occupied space
Node4: 1,000,000+ files totaling 75 TB of primary occupied space
Node5: 17,000,000+ files totaling 42 TB of primary occupied space

There are more, but I think this answers your initial question.

Restore requests are handled by the local system admin or, for lack of a better description, data admin. (Basically, the research area has a person dedicated to all the various data issues related to research grants, from including proper verbiage in grant requests to making sure the necessary protections are in place.)

We try to make it as simple as we can, because we do concentrate all the data in one node per storage device (usually a NAS). So restores are usually done directly from the node, while all backups are done through proxies. Generally, the restores are done without permissions so that the appropriate permissions can be applied to the restored data. (Often, the data is restored so a different user or set of users can work with it, so the original permissions aren't useful.)

There are some exceptions -- of course, as we work at universities, there are always exceptions -- and these we handle as best we can by providing proxy nodes with restricted privileges.

Let me know if I can provide more,
Bob

Robert Talda
EZ-Backup Systems Engineer
Cornell University
+1 607-255-8280
r...@cornell.edu

> On Jul 11, 2018, at 3:59 PM, Zoltan Forray wrote:
>
> Robert,
>
> Thanks for the insight/suggestions. Your scenario is similar to ours but
> on a larger scale when it comes to the amount of data/files to process,
> thus the issue (assuming such since you didn't list numbers).
Currently we > have 91 ISILON nodes totaling 140M objects and 230TB of data. The largest > (our troublemaker) has over 21M objects and 26TB of data (this is the one > that takes 4-5 days). dsminstr.log from a recently finished run shows it > only backed up 15K objects. > > We agree that this and other similarly larger nodes need to be broken up > into smaller/less objects to backup per node. But the owner of this large > one is balking since previously this was backed up via a solitary Windows > server using Journaling so everything finished in a day. > > We have never dealt with proxy nodes but might need to head in that > direction since our current method of allowing users to perform their own > restores relies on the now deprecated Web Client. Our current method is > numerous Windows VM servers with 20-30 nodes defined to each. > > How do you handle restore requests? > > On Wed, Jul 11, 2018 at 2:56 PM Robert Talda wrote: >
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Hey Zoltan,

Key points for backing up Isilon:

1. Each Isilon node is limited by its CPU/protocol stack rather than networking (other than the new G6 F800s).
2. To increase throughput to/from Isilon, increase the number of Isilon nodes you access via your clients.
3. To increase the Isilon nodes you access, you can either mount the storage multiple times from the same client using a different IP, or use TSM proxies.
4. Increase resourceutilization to 10 (max) to increase parallelisation.
5. Increase the max number of mount points (MAXNUMMP) to be bigger than the number of client machines X the resource utilization X the number of SP clients you run per client machine. This ensures each session is actively working and not waiting for a mount point.
6. Size your disk storage pool files so that you can have at least 2 X the max number of mount points. This is so that, should you fill your disk storage pool, you do not have lock contention between migration and backup. Ideally you should have enough disk pool storage to do a single run.

We have a setup where we need to do archives of up to 50 TB a day, and we do this using over 24 dsmc's running across 6 client VMs with a resource utilisation of 10.

HTH
Grant

From: ADSM: Dist Stor Manager on behalf of Zoltan Forray
Sent: Thursday, 12 July 2018 5:59 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

Robert,

Thanks for the insight/suggestions. Your scenario is similar to ours but on a larger scale when it comes to the amount of data/files to process, thus the issue (assuming such since you didn't list numbers). Currently we have 91 ISILON nodes totaling 140M objects and 230TB of data. The largest (our troublemaker) has over 21M objects and 26TB of data (this is the one that takes 4-5 days). dsminstr.log from a recently finished run shows it only backed up 15K objects.

We agree that this and other similarly large nodes need to be broken up into smaller nodes with fewer objects to back up per node.
But the owner of this large one is balking since previously this was backed up via a solitary Windows server using Journaling so everything finished in a day. We have never dealt with proxy nodes but might need to head in that direction since our current method of allowing users to perform their own restores relies on the now deprecated Web Client. Our current method is numerous Windows VM servers with 20-30 nodes defined to each. How do you handle restore requests? On Wed, Jul 11, 2018 at 2:56 PM Robert Talda wrote: > Zoltan, et al: > :IF: I understand the scenario you outline originally, here at Cornell > we are using two different approaches in backing up large storage arrays. > > 1. For backups of CIFS shares in our Shared File Share service hosted on a > NetApp device, we rely on a set of Powershell scripts to build a list of > shares to backup, then invoke up to 5 SP clients at a time, each client > backing up a share. As such, we are able to backup some 200+ shares on a > daily basis. I’m not sure this is a good match to your problem... > > 2. For backups of a large Dell array containing research data that does > seem to be a good match, I have defined a set of 10 proxy nodes and 240 > hourly schedules (once each hour for each proxy node) that allows us to > divide the Dell array up into 240 pieces - pieces that are controlled by > the specification of the “objects” in the schedule. That is, in your case, > instead of associating node to the schedule > ISILON-SOM-SOMDFS1 with object " \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*”, > I would instead have something like > Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR1 for object " \\ > rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRA\*” > Node PROXY2.ISILON associated to PROXY1.ISILON.HOUR1 for object " \\ > rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRB\*” > … > Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR2 for object " \\ > rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRB\SUBDIRA\*” > > And so on. 
For known large directories, slots of multiple hours are > allocated, up to the largest directory which is given its own proxy node > with one schedule, and hence 24 hours to back up. > > There are pros and cons to both of these, but they do enable us to perform > the backups. > > FWIW, > Bob > > Robert Talda > EZ-Backup Systems Engineer > Cornell University > +1 607-255-8280 > r...@cornell.edu > > > > On Jul 11, 2018, at 7:49 AM, Zoltan Forray wrote: > > > > I will need to translate to English but I gather it is talking about the > > RESOURCEUTILZATION / MAXNUMMP values. While we have increased MAXNUMMP > to > > 5 on the server (will try going higher), not sure how much good it would > > do since the ba
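Grant's sizing rules (points 5 and 6 in his list) amount to two multiplications. A quick sketch, using his own setup (6 client VMs, roughly 4 dsmc instances each, resourceutilization 10) as the assumed inputs; this is the thread's rule of thumb, not official IBM sizing guidance:

```python
def sizing(client_machines, dsmc_per_machine, resourceutilization):
    """Rule of thumb from the thread: MAXNUMMP should exceed
    machines x dsmc-instances-per-machine x RESOURCEUTILIZATION,
    and the disk storage pool should hold at least twice that
    many volumes to avoid migration/backup lock contention."""
    max_num_mp = client_machines * dsmc_per_machine * resourceutilization
    min_pool_volumes = 2 * max_num_mp
    return max_num_mp, min_pool_volumes

mp, vols = sizing(client_machines=6, dsmc_per_machine=4, resourceutilization=10)
print(mp, vols)  # 240 mount points, 480 disk-pool volumes
```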
Re: Looking for suggestions to deal with large backups not completing in 24-hours
> > On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas > > > wrote: > > > >> It is possible to da a parallel backup of file system parts. > >> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (german) > >> have a > >> look on page 10. > >> > >> --- > >> Jonas Jansen > >> > >> IT Center > >> Gruppe: Server & Storage > >> Abteilung: Systeme & Betrieb > >> RWTH Aachen University > >> Seffenter Weg 23 > >> 52074 Aachen > >> Tel: +49 241 80-28784 > >> Fax: +49 241 80-22134 > >> jan...@itc.rwth-aachen.de > >> www.itc.rwth-aachen.de > >> > >> -Original Message- > >> From: ADSM: Dist Stor Manager On Behalf Of Del > >> Hoobler > >> Sent: Monday, July 9, 2018 3:29 PM > >> To: ADSM-L@VM.MARIST.EDU > >> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups > >> not > >> completing in 24-hours > >> > >> They are a 3rd-party partner that offers an integrated Spectrum Protect > >> solution for large filer backups. > >> > >> > >> Del > >> > >> > >> > >> "ADSM: Dist Stor Manager" wrote on 07/09/2018 > >> 09:17:06 AM: > >> > >>> From: Zoltan Forray > >>> To: ADSM-L@VM.MARIST.EDU > >>> Date: 07/09/2018 09:17 AM > >>> Subject: Re: Looking for suggestions to deal with large backups not > >>> completing in 24-hours > >>> Sent by: "ADSM: Dist Stor Manager" > >>> > >>> Thanks Del. Very interesting. Are they a VAR for IBM? > >>> > >>> Not sure if it would work in the current configuration we are using to > >> back > >>> up ISILON. I have passed the info on. > >>> > >>> BTW, FWIW, when I copied/pasted the info, Chrome spell-checker > >> red-flagged > >>> on "The easy way to incrementally backup billons of objects" > (billions). > >>> So if you know anybody at the company, please pass it on to them. 
> >>>
> >>> On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote:
> >>>
> >>>> Another possible idea is to look at General Storage dsmISI MAGS:
> >>>>
> >>>> http://www.general-storage.com/PRODUCTS/products.html
> >>>>
> >>>> Del
> >>>>
> >>>> "ADSM: Dist Stor Manager" wrote on 07/05/2018
> >>>> 02:52:27 PM:
> >>>>
> >>>>> From: Zoltan Forray
> >>>>> To: ADSM-L@VM.MARIST.EDU
> >>>>> Date: 07/05/2018 02:53 PM
> >>>>> Subject: Looking for suggestions to deal with large backups not
> >>>>> completing in 24-hours
> >>>>> Sent by: "ADSM: Dist Stor Manager"
> >>>>>
> >>>>> As I have mentioned in the past, we have gone through large migrations to
> >>>>> DFS based storage on EMC ISILON hardware. As you may recall, we back up
> >>>>> these DFS mounts (about 90 at last count) using multiple Windows servers
> >>>>> that run multiple ISP nodes (about 30 each), and they access each DFS
> >>>>> mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname.
> >>>>>
> >>>>> This has led to lots of performance issues with backups, and some
> >>>>> departments are now complaining that their backups are running into
> >>>>> multiple days in some cases.
> >>>>>
> >>>>> One such case is a department with 2 nodes with over 30 million objects
> >>>>> each. In the past, their backups were able to finish quicker since
> >>>>> they were accessed via dedicated servers and were able to use Journaling to
> >>>>> reduce the scan times
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan, et al:

:IF: I understand the scenario you outlined originally, here at Cornell we are using two different approaches to backing up large storage arrays.

1. For backups of CIFS shares in our Shared File Share service hosted on a NetApp device, we rely on a set of PowerShell scripts to build a list of shares to back up, then invoke up to 5 SP clients at a time, each client backing up a share. As such, we are able to back up some 200+ shares on a daily basis. I'm not sure this is a good match for your problem...

2. For backups of a large Dell array containing research data that does seem to be a good match, I have defined a set of 10 proxy nodes and 240 hourly schedules (one each hour for each proxy node) that allow us to divide the Dell array up into 240 pieces -- pieces that are controlled by the specification of the "objects" in the schedule. That is, in your case, instead of associating the node to the schedule ISILON-SOM-SOMADFS1 with object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*", I would instead have something like:

Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR1 for object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRA\*"
Node PROXY2.ISILON associated to PROXY2.ISILON.HOUR1 for object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRA\SUBDIRB\*"
…
Node PROXY1.ISILON associated to PROXY1.ISILON.HOUR2 for object "\\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\DIRB\SUBDIRA\*"

And so on. For known large directories, slots of multiple hours are allocated, up to the largest directory, which is given its own proxy node with one schedule and hence 24 hours to back up.

There are pros and cons to both of these, but they do enable us to perform the backups.

FWIW,
Bob

Robert Talda
EZ-Backup Systems Engineer
Cornell University
+1 607-255-8280
r...@cornell.edu

> On Jul 11, 2018, at 7:49 AM, Zoltan Forray wrote:
>
> I will need to translate to English but I gather it is talking about the
> RESOURCEUTILIZATION / MAXNUMMP values.
> While we have increased MAXNUMMP to 5 on the server (will try going higher),
> not sure how much good it would do since the backup schedule uses OBJECTS to
> point to a specific/single mountpoint/filesystem (see below) but it is worth
> trying to bump the RESOURCEUTILIZATION value on the client even higher...
>
> We have checked the dsminstr.log file and it is spending 92% of the time in
> PROCESS DIRS (no surprise)
>
> 7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
>             Policy Domain Name: DFS
>                  Schedule Name: ISILON-SOM-SOMADFS1
>                    Description: ISILON-SOM-SOMADFS1
>                         Action: Incremental
>                      Subaction:
>                        Options: -subdir=yes
>                        Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
>                       Priority: 5
>                Start Date/Time: 12/05/2017 08:30:00
>                       Duration: 1 Hour(s)
>     Maximum Run Time (Minutes): 0
>                 Schedule Style: Enhanced
>                         Period:
>                    Day of Week: Any
>                          Month: Any
>                   Day of Month: Any
>                  Week of Month: Any
>                     Expiration:
> Last Update by (administrator): ZFORRAY
>          Last Update Date/Time: 01/12/2018 10:30:48
>               Managing profile:
>
> On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas wrote:
>
>> It is possible to do a parallel backup of file system parts.
>> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German) - have
>> a look at page 10.
>>
>> ---
>> Jonas Jansen
>>
>> IT Center
>> Gruppe: Server & Storage
>> Abteilung: Systeme & Betrieb
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-28784
>> Fax: +49 241 80-22134
>> jan...@itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>>
>> -----Original Message-----
>> From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
>> Sent: Monday, July 9, 2018 3:29 PM
>> To: ADSM-L@VM.MARIST.EDU
>> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups
>> not completing in 24-hours
>>
>> They are a 3rd-party partner that offers an integrated Spectrum Protect
>> solution for large filer backups.
>> >> >> Del >> >> >> >> "ADSM: Dist Stor Manager" wrote on 07/09/2018 >> 09:17:06 AM: >> >>> From: Zoltan Forray >>> To: ADSM-L@VM.MARIST.EDU >>> Date: 07/09/2018 09:17 AM >>> Subject: Re: Looking for suggestions to deal with large backups not >>> completing in
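[Editor's note: the divide-and-conquer scheme Bob describes above - many proxy nodes, each owning a slice of the directory tree, running in parallel - can be sketched generically. This is a hypothetical Python sketch, not Cornell's actual scripts; the node names, UNC paths, and the `dsmc incremental -asnodename=... -subdir=yes` invocation pattern are illustrative assumptions.]

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

def partition(objects, n_workers):
    """Round-robin a list of backup objects across n_workers proxy slots."""
    slots = [[] for _ in range(n_workers)]
    for slot, obj in zip(itertools.cycle(slots), objects):
        slot.append(obj)
    return slots

def backup_slot(proxy_node, objects, dry_run=True):
    """Run one incremental per object under a proxy node.

    Node name and flags are hypothetical; dry_run only returns the
    command lines instead of invoking dsmc.
    """
    results = []
    for obj in objects:
        cmd = ["dsmc", "incremental", obj,
               f"-asnodename={proxy_node}", "-subdir=yes"]
        if dry_run:
            results.append(" ".join(cmd))
        else:
            results.append(subprocess.run(cmd).returncode)
    return results

if __name__ == "__main__":
    # Hypothetical subdirectory slices of one large share.
    dirs = [r"\\filer\share\DIRA\SUBDIRA",
            r"\\filer\share\DIRA\SUBDIRB",
            r"\\filer\share\DIRB\SUBDIRA"]
    slots = partition(dirs, 2)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(backup_slot, f"PROXY{i + 1}.ISILON", objs)
                   for i, objs in enumerate(slots)]
        for f in futures:
            print(f.result())
```

The payoff is the same as with the 240-schedule approach: directory scanning, the dominant cost (92% in PROCESS DIRS above), happens in parallel instead of serially.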
Re: Looking for suggestions to deal with large backups not completing in 24-hours
I will need to translate to English but I gather it is talking about the RESOURCEUTILIZATION / MAXNUMMP values. While we have increased MAXNUMMP to 5 on the server (will try going higher), not sure how much good it would do since the backup schedule uses OBJECTS to point to a specific/single mountpoint/filesystem (see below), but it is worth trying to bump the RESOURCEUTILIZATION value on the client even higher...

We have checked the dsminstr.log file and it is spending 92% of the time in PROCESS DIRS (no surprise)

7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
            Policy Domain Name: DFS
                 Schedule Name: ISILON-SOM-SOMADFS1
                   Description: ISILON-SOM-SOMADFS1
                        Action: Incremental
                     Subaction:
                       Options: -subdir=yes
                       Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
                      Priority: 5
               Start Date/Time: 12/05/2017 08:30:00
                      Duration: 1 Hour(s)
    Maximum Run Time (Minutes): 0
                Schedule Style: Enhanced
                        Period:
                   Day of Week: Any
                         Month: Any
                  Day of Month: Any
                 Week of Month: Any
                    Expiration:
Last Update by (administrator): ZFORRAY
         Last Update Date/Time: 01/12/2018 10:30:48
              Managing profile:

On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas wrote:

> It is possible to do a parallel backup of file system parts.
> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German) - have a
> look at page 10.
>
> ---
> Jonas Jansen
>
> IT Center
> Gruppe: Server & Storage
> Abteilung: Systeme & Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-28784
> Fax: +49 241 80-22134
> jan...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
> Sent: Monday, July 9, 2018 3:29 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups
> not completing in 24-hours
>
> They are a 3rd-party partner that offers an integrated Spectrum Protect
> solution for large filer backups.
> > > Del > > > > "ADSM: Dist Stor Manager" wrote on 07/09/2018 > 09:17:06 AM: > > > From: Zoltan Forray > > To: ADSM-L@VM.MARIST.EDU > > Date: 07/09/2018 09:17 AM > > Subject: Re: Looking for suggestions to deal with large backups not > > completing in 24-hours > > Sent by: "ADSM: Dist Stor Manager" > > > > Thanks Del. Very interesting. Are they a VAR for IBM? > > > > Not sure if it would work in the current configuration we are using to > back > > up ISILON. I have passed the info on. > > > > BTW, FWIW, when I copied/pasted the info, Chrome spell-checker > red-flagged > > on "The easy way to incrementally backup billons of objects" (billions). > > So if you know anybody at the company, please pass it on to them. > > > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > > > > > Another possible idea is to look at General Storage dsmISI MAGS: > > > > > > INVALID URI REMOVED > > > > u=http-3A__www.general-2Dstorage.com_PRODUCTS_products.html&d=DwIBaQ&c=jf_ia > SHvJObTbx- > > > > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75 > lwUZLmc_kYAQxroVCZQUCSs&s=25_psxEcE0fvxruxybvMJZzSZv- > > ach7r-VHXaLNVD_E&e= > > > > > > > > > Del > > > > > > > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > > > 02:52:27 PM: > > > > > > > From: Zoltan Forray > > > > To: ADSM-L@VM.MARIST.EDU > > > > Date: 07/05/2018 02:53 PM > > > > Subject: Looking for suggestions to deal with large backups not > > > > completing in 24-hours > > > > Sent by: "ADSM: Dist Stor Manager" > > > > > > > > As I have mentioned in the past, we have gone through large > migrations > > > to > > > > DFS based storage on EMC ISILON hardware. As you may recall, we > backup > > > > these DFS mounts (about 90 at last count) using multiple Windows > servers > > > > that run multiple ISP nodes (about 30-each) and they access each DFS > > > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > > > > > > > This has lead to lots of performance issue with backups and some > &
Re: Looking for suggestions to deal with large backups not completing in 24-hours
It is possible to do a parallel backup of file system parts.
https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German) - have a look at page 10.

---
Jonas Jansen

IT Center
Gruppe: Server & Storage
Abteilung: Systeme & Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-28784
Fax: +49 241 80-22134
jan...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

-----Original Message-----
From: ADSM: Dist Stor Manager On Behalf Of Del Hoobler
Sent: Monday, July 9, 2018 3:29 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups.

Del

"ADSM: Dist Stor Manager" wrote on 07/09/2018 09:17:06 AM:

> From: Zoltan Forray
> To: ADSM-L@VM.MARIST.EDU
> Date: 07/09/2018 09:17 AM
> Subject: Re: Looking for suggestions to deal with large backups not
> completing in 24-hours
> Sent by: "ADSM: Dist Stor Manager"
>
> Thanks Del. Very interesting. Are they a VAR for IBM?
>
> Not sure if it would work in the current configuration we are using to back
> up ISILON. I have passed the info on.
>
> BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged
> on "The easy way to incrementally backup billons of objects" (billions).
> So if you know anybody at the company, please pass it on to them.
> > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > > > Another possible idea is to look at General Storage dsmISI MAGS: > > > > INVALID URI REMOVED > u=http-3A__www.general-2Dstorage.com_PRODUCTS_products.html&d=DwIBaQ&c=jf_ia SHvJObTbx- > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75 lwUZLmc_kYAQxroVCZQUCSs&s=25_psxEcE0fvxruxybvMJZzSZv- > ach7r-VHXaLNVD_E&e= > > > > > > Del > > > > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > > 02:52:27 PM: > > > > > From: Zoltan Forray > > > To: ADSM-L@VM.MARIST.EDU > > > Date: 07/05/2018 02:53 PM > > > Subject: Looking for suggestions to deal with large backups not > > > completing in 24-hours > > > Sent by: "ADSM: Dist Stor Manager" > > > > > > As I have mentioned in the past, we have gone through large migrations > > to > > > DFS based storage on EMC ISILON hardware. As you may recall, we backup > > > these DFS mounts (about 90 at last count) using multiple Windows servers > > > that run multiple ISP nodes (about 30-each) and they access each DFS > > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > > > > > This has lead to lots of performance issue with backups and some > > > departments are now complain that their backups are running into > > > multiple-days in some cases. > > > > > > One such case in a department with 2-nodes with over 30-million objects > > for > > > each node. In the past, their backups were able to finish quicker since > > > they were accessed via dedicated servers and were able to use Journaling > > to > > > reduce the scan times. Unless things have changed, I believe Journling > > is > > > not an option due to how the files are accessed. > > > > > > FWIW, average backups are usually <50k files and <200GB once it finished > > > scanning. > > > > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since > > many > > > of these objects haven't been accessed in many years old. 
But as I > > > understand it, that won't work either given our current configuration. > > > > > > Given the current DFS configuration (previously CIFS), what can we do to > > > improve backup performance? > > > > > > So, any-and-all ideas are up for discussion. There is even discussion > > on > > > replacing ISP/TSM due to these issues/limitations. > > > > > > -- > > > *Zoltan Forray* > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > > > Xymon Monitor Administrator > > > VMware Administrator > > > Virginia Commonwealth University > > > UCC/Office of Technology Services > > > www.ucc.vcu.edu > > > zfor...@vcu.edu - 804-828-4807 > > > Don't be a phishing victim - VCU and other reputable organizations will > > > never use email to request that you reply with your password, social > > > security number or confidential per
Re: Looking for suggestions to deal with large backups not completing in 24-hours
They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups. Del "ADSM: Dist Stor Manager" wrote on 07/09/2018 09:17:06 AM: > From: Zoltan Forray > To: ADSM-L@VM.MARIST.EDU > Date: 07/09/2018 09:17 AM > Subject: Re: Looking for suggestions to deal with large backups not > completing in 24-hours > Sent by: "ADSM: Dist Stor Manager" > > Thanks Del. Very interesting. Are they a VAR for IBM? > > Not sure if it would work in the current configuration we are using to back > up ISILON. I have passed the info on. > > BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged > on "The easy way to incrementally backup billons of objects" (billions). > So if you know anybody at the company, please pass it on to them. > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > > > Another possible idea is to look at General Storage dsmISI MAGS: > > > > INVALID URI REMOVED > u=http-3A__www.general-2Dstorage.com_PRODUCTS_products.html&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75lwUZLmc_kYAQxroVCZQUCSs&s=25_psxEcE0fvxruxybvMJZzSZv- > ach7r-VHXaLNVD_E&e= > > > > > > Del > > > > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > > 02:52:27 PM: > > > > > From: Zoltan Forray > > > To: ADSM-L@VM.MARIST.EDU > > > Date: 07/05/2018 02:53 PM > > > Subject: Looking for suggestions to deal with large backups not > > > completing in 24-hours > > > Sent by: "ADSM: Dist Stor Manager" > > > > > > As I have mentioned in the past, we have gone through large migrations > > to > > > DFS based storage on EMC ISILON hardware. As you may recall, we backup > > > these DFS mounts (about 90 at last count) using multiple Windows servers > > > that run multiple ISP nodes (about 30-each) and they access each DFS > > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. 
> > > > > > This has lead to lots of performance issue with backups and some > > > departments are now complain that their backups are running into > > > multiple-days in some cases. > > > > > > One such case in a department with 2-nodes with over 30-million objects > > for > > > each node. In the past, their backups were able to finish quicker since > > > they were accessed via dedicated servers and were able to use Journaling > > to > > > reduce the scan times. Unless things have changed, I believe Journling > > is > > > not an option due to how the files are accessed. > > > > > > FWIW, average backups are usually <50k files and <200GB once it finished > > > scanning. > > > > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since > > many > > > of these objects haven't been accessed in many years old. But as I > > > understand it, that won't work either given our current configuration. > > > > > > Given the current DFS configuration (previously CIFS), what can we do to > > > improve backup performance? > > > > > > So, any-and-all ideas are up for discussion. There is even discussion > > on > > > replacing ISP/TSM due to these issues/limitations. > > > > > > -- > > > *Zoltan Forray* > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > > > Xymon Monitor Administrator > > > VMware Administrator > > > Virginia Commonwealth University > > > UCC/Office of Technology Services > > > www.ucc.vcu.edu > > > zfor...@vcu.edu - 804-828-4807 > > > Don't be a phishing victim - VCU and other reputable organizations will > > > never use email to request that you reply with your password, social > > > security number or confidential personal information. 
For more details > > > visit INVALID URI REMOVED > > > u=http-3A__phishing.vcu.edu_&d=DwIBaQ&c=jf_iaSHvJObTbx- > > > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=5bz_TktY3- > > > a432oKYronO-w1z- > > > ax8md3tzFqX9nGxoU&s=EudIhVvfUVx4-5UmfJHaRUzHCd7Agwk3Pog8wmEEpdA&e= > > > > > > > > -- > *Zoltan Forray* > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > Xymon Monitor Administrator > VMware Administrator > Virginia Commonwealth University > UCC/Office of Technology Services > www.ucc.vcu.edu > zfor...@vcu.edu - 804-828-4807 > Don't be a phishing victim - VCU and other reputable organizations will > never use email to request that you reply with your password, social > security number or confidential personal information. For more details > visit INVALID URI REMOVED > u=http-3A__phishing.vcu.edu_&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=0hq2JX5c3TEZNriHEs7Zf7HrkY2fNtONOrEOM8Txvk8&m=ofZM7gZ7p5GL1HFyHU75lwUZLmc_kYAQxroVCZQUCSs&s=umTd28h- > GlxqSvNShsNIqm8D1PcanVk0HPcP5KTurKw&e= >
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Thanks Del. Very interesting. Are they a VAR for IBM? Not sure if it would work in the current configuration we are using to back up ISILON. I have passed the info on. BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged on "The easy way to incrementally backup billons of objects" (billions). So if you know anybody at the company, please pass it on to them. On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler wrote: > Another possible idea is to look at General Storage dsmISI MAGS: > > http://www.general-storage.com/PRODUCTS/products.html > > > Del > > > "ADSM: Dist Stor Manager" wrote on 07/05/2018 > 02:52:27 PM: > > > From: Zoltan Forray > > To: ADSM-L@VM.MARIST.EDU > > Date: 07/05/2018 02:53 PM > > Subject: Looking for suggestions to deal with large backups not > > completing in 24-hours > > Sent by: "ADSM: Dist Stor Manager" > > > > As I have mentioned in the past, we have gone through large migrations > to > > DFS based storage on EMC ISILON hardware. As you may recall, we backup > > these DFS mounts (about 90 at last count) using multiple Windows servers > > that run multiple ISP nodes (about 30-each) and they access each DFS > > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > > > This has led to lots of performance issues with backups and some > > departments are now complaining that their backups are running into > > multiple-days in some cases. > > > > One such case is a department with 2 nodes with over 30 million objects > for > > each node. In the past, their backups were able to finish quicker since > > they were accessed via dedicated servers and were able to use Journaling > to > > reduce the scan times. Unless things have changed, I believe Journaling > is > > not an option due to how the files are accessed. > > > > FWIW, average backups are usually <50k files and <200GB once it finished > > scanning.
> > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since > many > > of these objects haven't been accessed in many years. But as I > > understand it, that won't work either given our current configuration. > > > > Given the current DFS configuration (previously CIFS), what can we do to > > improve backup performance? > > > > So, any-and-all ideas are up for discussion. There is even discussion > on > > replacing ISP/TSM due to these issues/limitations. > > > > -- > > *Zoltan Forray* > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > > Xymon Monitor Administrator > > VMware Administrator > > Virginia Commonwealth University > > UCC/Office of Technology Services > > www.ucc.vcu.edu > > zfor...@vcu.edu - 804-828-4807 > > Don't be a phishing victim - VCU and other reputable organizations will > > never use email to request that you reply with your password, social > > security number or confidential personal information. For more details > > visit http://phishing.vcu.edu/ > > > -- *Zoltan Forray* Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator Xymon Monitor Administrator VMware Administrator Virginia Commonwealth University UCC/Office of Technology Services www.ucc.vcu.edu zfor...@vcu.edu - 804-828-4807 Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Another possible idea is to look at General Storage dsmISI MAGS: http://www.general-storage.com/PRODUCTS/products.html Del "ADSM: Dist Stor Manager" wrote on 07/05/2018 02:52:27 PM: > From: Zoltan Forray > To: ADSM-L@VM.MARIST.EDU > Date: 07/05/2018 02:53 PM > Subject: Looking for suggestions to deal with large backups not > completing in 24-hours > Sent by: "ADSM: Dist Stor Manager" > > As I have mentioned in the past, we have gone through large migrations to > DFS based storage on EMC ISILON hardware. As you may recall, we backup > these DFS mounts (about 90 at last count) using multiple Windows servers > that run multiple ISP nodes (about 30-each) and they access each DFS > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > This has led to lots of performance issues with backups and some > departments are now complaining that their backups are running into > multiple-days in some cases. > > One such case is a department with 2 nodes with over 30 million objects for > each node. In the past, their backups were able to finish quicker since > they were accessed via dedicated servers and were able to use Journaling to > reduce the scan times. Unless things have changed, I believe Journaling is > not an option due to how the files are accessed. > > FWIW, average backups are usually <50k files and <200GB once it finished > scanning. > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many > of these objects haven't been accessed in many years. But as I > understand it, that won't work either given our current configuration. > > Given the current DFS configuration (previously CIFS), what can we do to > improve backup performance? > > So, any-and-all ideas are up for discussion. There is even discussion on > replacing ISP/TSM due to these issues/limitations. > > -- > *Zoltan Forray* > Spectrum Protect (p.k.a.
TSM) Software & Hardware Administrator > Xymon Monitor Administrator > VMware Administrator > Virginia Commonwealth University > UCC/Office of Technology Services > www.ucc.vcu.edu > zfor...@vcu.edu - 804-828-4807 > Don't be a phishing victim - VCU and other reputable organizations will > never use email to request that you reply with your password, social > security number or confidential personal information. For more details > visit http://phishing.vcu.edu/ >
Re: [EXTERNAL] Looking for suggestions to deal with large backups not completing in 24-hours
A couple years ago we decided to replace dozens and dozens of big Windows servers with a centralized Isilon NAS. The Windows servers, with tons of little files, were an ongoing pain to back up with TSM. Our decision was to NOT back up the Isilon to TSM or any other external program. Instead, we decided to use snapshots and replication to a DR Isilon. In other words, we made a conscious decision to stop using TSM to back up this data when we moved to Isilon. We took the opportunity to standardize backup policies to a single snapshot retention of just 32 days to help keep the snapshot disk space down. Other than watching free disk space and a periodic check of replication and snapshots, backup of this data is out of sight and out of mind. Rick -----Original Message----- From: ADSM: Dist Stor Manager On Behalf Of Zoltan Forray Sent: Thursday, July 5, 2018 2:52 PM To: ADSM-L@VM.MARIST.EDU Subject: [EXTERNAL] Looking for suggestions to deal with large backups not completing in 24-hours As I have mentioned in the past, we have gone through large migrations to DFS based storage on EMC ISILON hardware. As you may recall, we backup these DFS mounts (about 90 at last count) using multiple Windows servers that run multiple ISP nodes (about 30-each) and they access each DFS mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. This has led to lots of performance issues with backups and some departments are now complaining that their backups are running into multiple-days in some cases. One such case is a department with 2 nodes with over 30 million objects for each node. In the past, their backups were able to finish quicker since they were accessed via dedicated servers and were able to use Journaling to reduce the scan times. Unless things have changed, I believe Journaling is not an option due to how the files are accessed. FWIW, average backups are usually <50k files and <200GB once it finished scanning.
Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many of these objects haven't been accessed in many years. But as I understand it, that won't work either given our current configuration. Given the current DFS configuration (previously CIFS), what can we do to improve backup performance? So, any-and-all ideas are up for discussion. There is even discussion on replacing ISP/TSM due to these issues/limitations. -- *Zoltan Forray* Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator Xymon Monitor Administrator VMware Administrator Virginia Commonwealth University UCC/Office of Technology Services www.ucc.vcu.edu zfor...@vcu.edu - 804-828-4807 Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
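[Editor's note: Rick's regime above replaces external backup with snapshot retention - here a rolling window of 32 daily snapshots. The expiry logic such a policy implies can be expressed as a tiny sketch; in practice Isilon's SnapshotIQ handles expiration itself, so this is purely illustrative.]

```python
from datetime import date, timedelta

def expired_snapshots(snapshot_dates, today, retention_days=32):
    """Return the snapshot dates that fall outside the retention window.

    retention_days=32 mirrors the 32-day policy described in the post;
    anything older than today - retention_days is due for deletion.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)
```

With daily snapshots, the window caps snapshot count (and roughly bounds snapshot disk overhead) regardless of how long the cluster runs.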
Re: Looking for suggestions to deal with large backups not completing in 24-hours
We've implemented file count quotas in addition to our existing byte quotas to try to avoid this situation. You can improve some things (metadata on SSDs, maybe get an accelerator node if Isilon still offers those) but the fact is that metadata is expensive in terms of CPU (both client and server) and disk. We chose 1 million objects/TB of allocated disk space. We sort of compete with a storage system offered by our central IT organization, and picked a limit higher than what they would provide. To be honest, though, we're retiring our Isilon systems because the performance/scalability/cost ratios just aren't as great as they used to be. Our new storage is GPFS and mmbackup works much better with huge numbers of files, though it's still not great. In particular, the filelist generation is based around UNIX sort which is definitely a memory pig, though it can be split across multiple systems so can scale out pretty well. On Thu, Jul 05, 2018 at 02:52:27PM -0400, Zoltan Forray wrote: > As I have mentioned in the past, we have gone through large migrations to > DFS based storage on EMC ISILON hardware. As you may recall, we backup > these DFS mounts (about 90 at last count) using multiple Windows servers > that run multiple ISP nodes (about 30-each) and they access each DFS > mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. > > This has led to lots of performance issues with backups and some > departments are now complaining that their backups are running into > multiple-days in some cases. > > One such case is a department with 2 nodes with over 30 million objects for > each node. In the past, their backups were able to finish quicker since > they were accessed via dedicated servers and were able to use Journaling to > reduce the scan times. Unless things have changed, I believe Journaling is > not an option due to how the files are accessed. > > FWIW, average backups are usually <50k files and <200GB once it finished > scanning.
> > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many > of these objects haven't been accessed in many years. But as I > understand it, that won't work either given our current configuration. > > Given the current DFS configuration (previously CIFS), what can we do to > improve backup performance? > > So, any-and-all ideas are up for discussion. There is even discussion on > replacing ISP/TSM due to these issues/limitations. > > -- > *Zoltan Forray* > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator > Xymon Monitor Administrator > VMware Administrator > Virginia Commonwealth University > UCC/Office of Technology Services > www.ucc.vcu.edu > zfor...@vcu.edu - 804-828-4807 > Don't be a phishing victim - VCU and other reputable organizations will > never use email to request that you reply with your password, social > security number or confidential personal information. For more details > visit http://phishing.vcu.edu/ -- -- Skylar Thompson (skyl...@u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine
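[Editor's note: Skylar's 1 million objects/TB file-count quota above is just a ratio check, and is easy to sketch. The function names and the "TB = 10^12 bytes" convention are the editor's assumptions, not Skylar's actual tooling.]

```python
import os

def count_objects(root):
    """Count files and directories below root (root itself excluded)."""
    return sum(len(dirs) + len(files) for _, dirs, files in os.walk(root))

def objects_per_tb(object_count, allocated_bytes):
    """Object density per TB of allocated space (decimal TB = 10**12 bytes)."""
    return object_count / (allocated_bytes / 1e12)

def over_quota(object_count, allocated_bytes, limit=1_000_000):
    """True if a fileset exceeds the per-TB object limit (1M/TB as in the post)."""
    return objects_per_tb(object_count, allocated_bytes) > limit
```

For example, the 30-million-object node discussed in this thread would need at least 30 TB of allocated space to stay under such a limit; the quota effectively bounds metadata-scan time per TB of storage sold.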
Re: Looking for suggestions to deal with large backups not completing in 24-hours
Zoltan, I kind of agree with Ung Yi. What is the purpose of your TSM backups? DR? Long-term retention for auditability/Sarbanes-Oxley/other regulation? It may well be that a daily or even more frequent snapshot regime might be the best way to get back that recently lost/deleted/corrupted file. Use a TSM backup of a weekly point-of-consistency snapshot as your long-term strategy. Of course a better option would be an embedded TSM client on the Isilon itself, but the commercial realities are that will never happen. Cheers Steve Steven Harris TSM Admin Canberra Australia -----Original Message----- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Yi, Ung Sent: Friday, 6 July 2018 6:36 AM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours Hello, I don't know much about Isilon. There might be a SAN-level snap backup option for Isilon. For our Data Domain, we replicate from the main site to the DR site, then take a snap at our DR site every night. Each snap is considered a backup. Thank you. -----Original Message----- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Zoltan Forray Sent: Thursday, July 05, 2018 2:52 PM To: ADSM-L@VM.MARIST.EDU Subject: Looking for suggestions to deal with large backups not completing in 24-hours As I have mentioned in the past, we have gone through large migrations to DFS based storage on EMC ISILON hardware. As you may recall, we backup these DFS mounts (about 90 at last count) using multiple Windows servers that run multiple ISP nodes (about 30-each) and they access each DFS mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname. This has led to lots of performance issues with backups and some departments are now complaining that their backups are running into multiple-days in some cases. One such case is a department with 2 nodes with over 30 million objects for each node.
In the past, their backups were able to finish quicker since they were accessed via dedicated servers and were able to use Journaling to reduce the scan times. Unless things have changed, I believe Journaling is not an option due to how the files are accessed. FWIW, average backups are usually <50k files and <200GB once it finished scanning. Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many of these objects haven't been accessed in many years. But as I understand it, that won't work either given our current configuration. Given the current DFS configuration (previously CIFS), what can we do to improve backup performance? So, any-and-all ideas are up for discussion. There is even discussion on replacing ISP/TSM due to these issues/limitations. -- *Zoltan Forray* Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator Xymon Monitor Administrator VMware Administrator Virginia Commonwealth University UCC/Office of Technology Services www.ucc.vcu.edu zfor...@vcu.edu - 804-828-4807 Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
If you have received this email by mistake please delete it from your system; you should not copy the message or disclose its content to anyone. This electronic communication may contain general financial product advice but should not be relied upon or construed as a recommendation of any financial product. The information has been prepared without taking into account your objectives, financial situation or needs. You should consider the Product Disclosure Statement relating to the financial product and consult your financial adviser before making a decision about whether to acquire, hold or dispose of a financial product. For further details on the financial product please go to http://www.bt.com.au Past performance is not a reliable indicator of future performance.
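A hand-rolled alternative to the filer-side journaling discussed in the thread is to generate the changed-file list yourself from a snapshot tree and feed it to `dsmc selective -filelist=...`. Below is a minimal sketch (the UNC path and the 24-hour cutoff are hypothetical examples, and the `dsmc` invocation itself is not shown). Note the honest caveat: this still requires a full tree walk, which is exactly the scan cost the thread complains about, so it only helps if the walk is cheaper than the client's own incremental processing.

```python
import os
import time

def changed_files(snapshot_root, since_epoch):
    """Walk a snapshot tree and collect files modified after the last
    successful backup. The resulting list can be written one path per
    line and passed to 'dsmc selective -filelist=changed.txt'."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(snapshot_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since_epoch:
                    changed.append(path)
            except OSError:
                continue  # file vanished between listing and stat
    return changed

if __name__ == "__main__":
    # Hypothetical example: everything changed in the last 24 hours.
    last_backup = time.time() - 24 * 3600
    for p in changed_files(r"\\rams.adp.vcu.edu\departmentname", last_backup):
        print(p)
```

As Lars points out above for isi changelist and SnapDiff, any difference-based scheme like this is a 99.9% function: an mtime comparison misses deletions, renames, and permission-only changes, so periodic full incrementals are still required to catch what the diff missed.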
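The layout Zoltan describes, roughly 90 DFS mounts spread across proxy servers running about 30 ISP node instances each, can at least be balanced mechanically. A minimal sketch (share names and the instance count are made up for illustration) that round-robins department shares across node instances so each scheduled scan covers a similar number of mounts:

```python
def assign_shares(shares, node_count):
    """Round-robin department shares across ISP proxy node instances so
    each instance scans a similar number of DFS mounts."""
    buckets = [[] for _ in range(node_count)]
    for i, share in enumerate(sorted(shares)):
        buckets[i % node_count].append(share)
    return buckets

if __name__ == "__main__":
    # Hypothetical: 90 department shares spread over 3 node instances.
    shares = [f"\\\\rams.adp.vcu.edu\\dept{i:02d}" for i in range(90)]
    for n, bucket in enumerate(assign_shares(shares, 3)):
        print(f"instance {n}: {len(bucket)} shares")
```

Counting shares is only a crude proxy: real balancing should weight by object count, since a single department with 30 million objects will dominate whatever schedule it lands on regardless of how many small shares sit beside it.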