Re: Re. War stories: Restores > 200GB ?

2002-03-18 Thread Remco Post

Maybe the best idea I've heard until now: if you're planning a major upgrade,
switch on collocation on your storage pool first and then run a selective
(full) backup of your system. This will bring the restore time down to the
time it takes to mount about 8 tapes and read them.

(and switch collocation off again :)
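
A minimal sketch of Remco's sequence, with a hypothetical pool name
(TAPEPOOL) and file system (/home):

    # switch the tape pool to collocation by node (admin command)
    dsmadmc -id=admin -password=xxx "update stgpool TAPEPOOL collocate=yes"
    # full selective backup of the system, so current data lands together
    dsmc selective "/home/*" -subdir=yes
    # ...do the upgrade/restore, then switch collocation back off
    dsmadmc -id=admin -password=xxx "update stgpool TAPEPOOL collocate=no"

Note that collocation only affects newly written volumes, which is exactly
why the fresh selective backup matters here.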


-- 
With kind regards,

Remco Post

SARA - Stichting Academisch Rekencentrum Amsterdam
High Performance Computing   Tel. +31 20 592 8008   Fax. +31 20 668 3167

I really didn't foresee the Internet. But then, neither did the computer
industry. Not that that tells us very much of course - the computer industry
didn't even foresee that the century was going to end. -- Douglas Adams



Re: Re. War stories: Restores > 200GB ?

2002-03-17 Thread Daniel Sparrman
Hi

When reaching volumes over 200GB for a server, you need to find ways to
minimize the amount of data that has to be restored in case of a disaster.

If this is a fileserver, the most efficient way to speed up the restore time
of the whole server would be to implement HSM.

Normally, 40-60% of the data on a fileserver is older than 60 days.
Therefore, moving this information to tape, while still letting the users
see the files using, for example, Explorer, would result in only having to
restore the remaining 40-60% of the data in case of a disaster (the only
information that has to be restored is the data that has not been moved by
the HSM client).  The HSM client migrates data transparently to the users.
This means that even if the information is migrated, the user will still see
the information as if it were on disk.  The only difference is that it will
take a few more seconds to open the file.

This means that a restore that normally would take 6-8 hours would only take
3-4 hours.  Optimizing the performance of the client would probably save you
an hour.  However, if this is a critical machine, 1 hour isn't good enough.
Therefore, implementing HSM is the simplest and most efficient way to secure
the restore time of the server.

Best Regards

Daniel Sparrman

---
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Switchboard: 08-7549800
Mobile: 070-3992751
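
A rough sketch of what Daniel describes, assuming the Tivoli Space Manager
(HSM) client on a Unix fileserver; the mount point is a placeholder, and the
age criteria actually live in the management class, not on the command line:

    # put the file system under space management
    dsmmigfs add /export/home
    # migrate eligible (e.g. >60-day-old) files to the migration pool,
    # leaving stubs behind so users still see them in the directory tree
    dsmmigrate -Recursive /export/home

After a disaster you restore only the resident files and the stubs; the
migrated data stays on tape until a user actually opens a stub, which is
where Daniel's halved restore time comes from.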
-----"ADSM: Dist Stor Manager" [EMAIL PROTECTED] wrote: -----

To: [EMAIL PROTECTED]
From: "Don France (P.A.C.E.)" [EMAIL PROTECTED]
Sent by: "ADSM: Dist Stor Manager" [EMAIL PROTECTED]
Date: 03/17/2002 12:39AM
Subject: Re. War stories: Restores > 200GB ?

Re: Re. War stories: Restores > 200GB ?

2002-03-17 Thread Seay, Paul

On the ESS the write cache is not optional; you must use it.  That is where
all the write performance comes from.  It eliminates the RAID-5 write
penalty and basically turns the writes into RAID-3-style full-stripe writes,
with no reads before write, when writing sequentially.

Not sure what you mean by 2 write cycles.  RAID-1 has two write cycles.
RAID-5 has one write cycle to each of the drives in the stripe set unless all
the parity is on a single drive, but no one does it that way anymore (EMC
kind of does with RAID-S).  The issue with RAID-5 is the read before write
that is needed so you can recalculate the parity.  On sequential writes of
full internal 32K blocks (6 or 7, depending on the disk group) you do not
need to read back, because you are going to write the whole stripe.  So the
ESS calculates the parity in the SSA adapters and writes it to the disks as
the I/O occurs.
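
To make the write-cycle arithmetic concrete, here is the textbook RAID-5
small-write (read-modify-write) count versus a full-stripe write; this is
generic RAID-5 accounting, not anything ESS-specific:

    % new parity is computed from the old data and the old parity:
    p_{\mathrm{new}} = p_{\mathrm{old}} \oplus d_{\mathrm{old}} \oplus d_{\mathrm{new}}
    % small (partial-stripe) write: read old data + old parity,
    % then write new data + new parity:
    \mathrm{RMW\ cost} = 2~\mathrm{reads} + 2~\mathrm{writes} = 4~\mathrm{I/Os}
    % full-stripe write on a 7-data + 1-parity group: 8 writes carry
    % 7 data blocks:
    \mathrm{cost} = 8~\mathrm{writes} / 7~\mathrm{blocks} \approx 1.14~\mathrm{I/Os\ per\ block}

So Don's "TWO write cycles" counts the data write plus the parity write,
while Paul's point is that the two *reads* are the real RAID-5 penalty, and
sequential full-stripe writes avoid them entirely.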

The rest of the stuff in here is really good stuff.  Though backupsets are
extremely hard to manage for customers with only 3494/3590 libraries, because
the client cannot do the restore itself.  The reality is that restores run so
fast anyway that many of the issues backupsets are meant to eliminate do not
necessarily apply if you use collocation and parallel restores.
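
For reference, backupsets are generated on the server side; a hypothetical
example (the node name, set prefix, and device class here are invented):

    generate backupset NODE1 YEAREND devclass=FILEDEV

The catch Paul describes is that a client can restore directly from a
backupset only if it can read the media the set was written to, which is not
possible from the client side in a 3494/3590-only shop.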

-Original Message-
From: Don France (P.A.C.E.) [mailto:[EMAIL PROTECTED]]
Sent: Saturday, March 16, 2002 6:40 PM
To: [EMAIL PROTECTED]
Subject: Re. War stories: Restores > 200GB ?


There are several keys to speed when restoring a large number of files with
TSM (a command-level sketch of the first three items follows this list);
they are:
  1.. If using Windows NT/2000 or AIX, be sure to use DIRMC, storing the
primary pool on disk, migrating to FILE on disk, then copy-pooling both (this
avoids tape mounts for the directories that are not stored in the TSM db due
to ACLs);
  I've seen *two* centralized ways to implement DIRMC -- (1) using a
client-option-set, or (2) establishing the DIRMC management class as the one
with the longest retention (in each affected policy domain);
  2.. Restore the directories first, using -DIRSONLY (this minimizes NTFS
db-insert thrashing);
  3.. Consider multiple, parallel restores of high-level directories --
despite potential contention for tapes in common, you want to keep the data
flowing on at least one session to maximize restore speed;
  4.. Consider using CLASSIC restore, rather than no-query restore -- this
will minimize tape mounts, as classic restore analyzes which files to
request and has the server sort the tapes needed -- though tape mounts may
not be an issue with your high-performance configuration;
  5.. If you must use RAID-5, realize that you will spend TWO write cycles
for every write;  if using EMC RAID-S (or ESS), you may want to increase the
write cache to as large as allowed (or turn it off altogether).  Using 9 or
15 physical disks will help.

A client of mine just had a server disk failure last weekend;  it had local
disk configured with RAID-5 (a hardware RAID controller attached to a
Dell-Win2000 server) -- after addressing items 1 to 3, above, we were able to
saturate the 100Mbps network, achieving 10-15 GB/Hr for the entire restore --
the only delays incurred were attributable to tape mounts... this customer
had an over-committed silo, so tapes not in the silo had to be checked in on
demand.  316 GB restored in approx. 30 hours.  Their data was stored under 10
high-level directories, so we ran two restore sessions in parallel -- we only
had two tape drives -- and disabled other client schedules during this
exercise.
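
To make items 1-3 concrete, here is a minimal command-level sketch; the
option set (WINCLIENTS), management class (DIRMGMT), and directory names are
hypothetical, and option spellings vary a bit by client level:

    (item 1) push DIRMC to the clients centrally via a client option set:

        dsmadmc -id=admin -password=xxx "define clientopt WINCLIENTS dirmc DIRMGMT"

    (item 2) restore the directory tree first:

        dsmc restore -dirsonly "d:\*" -subdir=yes

    (item 3) then run one dsmc session per high-level directory, each in
    its own command window:

        dsmc restore "d:\users\*" -subdir=yes -replace=all
        dsmc restore "d:\shared\*" -subdir=yes -replace=all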

For your situation, 250 GB and millions of files, and assuming DIRMC (item
#1, above), you should be able to see 5 - 10 GB/Hr -- 50 hours at 5 GB/Hr,
25 hours at 10 GB/Hr.  So you are looking at two or three days, typically.

Large numbers of small files are the Achilles' heel of any file-based
backup/restore operation -- restore is the slowest part, since you are
fighting the file system of the client OS.  Because of the way file systems
traverse directories and reorganize branches on the fly, it's important to
minimize the re-org processing (in NTFS, by populating the branches with
leaves AFTER first creating all the branches).  We did some benchmarks and
compared notes with IBM;  with another client, we developed the basic
expectation that 2-7 GB/Hr was the standard for comparison purposes -- you
can exceed that number by observing the first 3 recommended configuration
items, above.

How to mitigate this:  (a) use image backup (now available for Unix, soon to
be available on Win2000) in concert with file-level progressive incremental;
and (b) limit your file server file systems to either 100 GB or X million
files, then start a separate file system or server upon reaching that
threshold... You need to test for your environment to determine what is the
acceptable standard to implement.
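
A sketch of mitigation (a), assuming one of the Unix clients with image
support that Don mentions; the mount point is a placeholder:

    # raw image backup of the whole volume -- one large object, few db entries
    dsmc backup image /export/home
    # ...combined with ordinary progressive incrementals for file-level recovery
    dsmc incremental /export/home

On restore you would lay the image down first, then apply the files changed
since the image from the incremental backups on top of it.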

Hope this helps.

Don France

Technical Architect - Tivoli Certified Consultant



Professional Association of Contract Employees (P.A.C.E.)
San Jose, CA