Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection" - how about a Billion files in 140 seconds?

2016-08-31 Thread Marc A Kaplan
When you write something like "mmbackup takes ages" - that lets us know 
how you feel, kinda. 

But we need some facts and data to make a determination if there is a real 
problem and whether and how it might be improved.

Just to do a "back of the envelope" estimate of how long backup operations 
"ought to" take - we'd need to know:
how many disks and/or SSDs, with what performance characteristics,
how many nodes, with what performance characteristics,
network "fabric(s)",

number of files to be scanned,
average number of files per directory,
GPFS blocksize(s) configured,

backup devices available, with speeds and feeds, etc., etc.
Most of these can be read straight off the cluster (see the sketch below).
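
For instance (a rough sketch; the file system name "fs1" is just a placeholder):

mmlscluster          # nodes in the cluster
mmlsfs fs1 -B -i     # block size and inode size configured for the file system
mmlsdisk fs1 -L      # NSDs, whether they hold data and/or metadata, and their pools
mmdf fs1             # capacity and inode usage per pool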

But anyway, just to throw some ballpark numbers "out there" to give you an idea 
of what is possible:

I can tell you that 20 months ago Sven and I benchmarked mmapplypolicy 
scanning 983 million files in 136 seconds!

The command looked like this:

mmapplypolicy /ibm/fs2-1m-p01/shared/Btt -g /ibm/fs2-1m-p01/tmp -d 7 -A 
256 -a 32 -n 8  -P /ghome/makaplan/sventests/milli.policy -I test -L 1 -N 
fastclients
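
The contents of milli.policy aren't shown here, but a minimal scan-only policy
for this kind of test would look something like the sketch below (the SHOW
clause is just an example):

/* stand-in scan-only policy - not the actual milli.policy */
RULE EXTERNAL LIST 'allfiles' EXEC ''
RULE 'listall' LIST 'allfiles' SHOW(VARCHAR(FILE_SIZE))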

fastclients was a set of 10 x86_64 commodity nodes

The fs2-1m-p01 file system was hosted on just two IBM GSS nodes and 
everything was on an Infiniband switch.

We packed about 7,000 files into each directory. (This admittedly may 
not be typical...)

This is NOT to say you could back up that many files that fast, but 
Spectrum Scale metadata scanning can be fast, even 
with relatively modest hardware resources.

YMMV ;-)

Marc of GPFS

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection" (Dominic Mueller-Wicke)

2016-08-31 Thread Lukas Hejtmanek
On Wed, Aug 31, 2016 at 07:52:38AM +0200, Dominic Mueller-Wicke01 wrote:
> Thanks for reading the paper. I agree that the restore of a large number of
> files is a challenge today. Restore is the focus area for future
> enhancements to the integration between IBM Spectrum Scale and IBM
> Spectrum Protect. If something becomes available that helps to improve the
> restore capabilities, the paper will be updated with that information.

I guess that one of the reasons restore is slow is this (from an strace of dsmc):
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases/stud/atl_en/_referencenotitsig",
F_OK) = -1 ENOENT (No such file or directory)
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases/stud/atl_en",
F_OK) = -1 ENOENT (No such file or directory)
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases/stud",
F_OK) = -1 ENOENT (No such file or directory)
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases",
F_OK) = -1 ENOENT (No such file or directory)
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases",
F_OK) = -1 ENOENT (No such file or directory)
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit",
F_OK) = -1 ENOENT (No such file or directory)
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home",
F_OK) = 0
[pid  9022] 
access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum", F_OK)
= 0

It seems that dsmc tests access again and again, all the way up to the root, for
each item in the file list if I set a different location to place the restored
files.
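
For anyone who wants to confirm this pattern on their own run, a rough sketch
(the filelist and restore target below are placeholders):

strace -f -e trace=access -o /tmp/dsmc.trace dsmc restore -filelist=/tmp/list.txt /exports/restored/
sed -n 's/.*access("\([^"]*\)".*/\1/p' /tmp/dsmc.trace | sort | uniq -c | sort -rn | head

which counts how many times each path gets re-checked.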

-- 
Lukáš Hejtmánek
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection" (Dominic Mueller-Wicke)

2016-08-30 Thread Dominic Mueller-Wicke01
/STXKQY/gpfsclustersfaq.html?view=kc#linuxq


Kevin D. Johnson, MBA, MAFM
Spectrum Computing, Senior Managing Consultant

IBM Certified Deployment Professional - Spectrum Scale V4.1.1
IBM Certified Deployment Professional - Cloud Object Storage V3.8
720.349.6199 - kevin...@us.ibm.com



 - Original message -
 From: Lukas Hejtmanek <xhejt...@ics.muni.cz>
 Sent by: gpfsug-discuss-boun...@spectrumscale.org
 To: gpfsug-discuss@spectrumscale.org
 Cc:
 Subject: [gpfsug-discuss] GPFS 3.5.0 on RHEL 6.8
 Date: Tue, Aug 30, 2016 4:39 PM

 Hello,

 does it work for anyone? As of kernel 2.6.32-642, GPFS 3.5.0 (including
 the latest patch 32) does start but does not mount any file system. The
 internal mount command gets stuck.

 --
 Lukáš Hejtmánek
 ___
 gpfsug-discuss mailing list
 gpfsug-discuss at spectrumscale.org
 http://gpfsug.org/mailman/listinfo/gpfsug-discuss



- Message from mark.berg...@uphs.upenn.edu on Tue, 30 Aug 2016 17:07:21 -0400 -

  To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
  Subject: Re: [gpfsug-discuss] GPFS 3.5.0 on RHEL 6.8

In the message dated: Tue, 30 Aug 2016 22:39:18 +0200,
The pithy ruminations from Lukas Hejtmanek on
<[gpfsug-discuss] GPFS 3.5.0 on RHEL 6.8> were:
=> Hello,

GPFS 3.5.0.[23..3-0] works for me under [CentOS|ScientificLinux] 6.8,
but only with kernel 2.6.32-573 and lower.

I've found kernel bugs in blk_cloned_rq_check_limits() in later kernel
revs that caused multipath errors, resulting in GPFS being unable to
find all NSDs and mount the filesystem.

I am not updating to a newer kernel until I'm certain this is resolved.
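
One blunt way to hold the kernel in the meantime on CentOS/RHEL 6 (just an
illustration, adjust to your own patching process):

echo "exclude=kernel*" >> /etc/yum.conf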

I opened a bug with CentOS:

 https://bugs.centos.org/view.php?id=10997

and began an extended discussion with the (RH & SUSE) developers of that
chunk of kernel code. I don't know if an upstream bug has been opened
by RH, but see:

 https://patchwork.kernel.org/patch/9140337/
=>
=> does it work for anyone? As of kernel 2.6.32-642, GPFS 3.5.0 (including
=> the latest patch 32) does start but does not mount any file system. The
=> internal mount command gets stuck.
=>
=> --
=> Lukáš Hejtmánek


--
Mark Bergman   voice: 215-746-4061

mark.berg...@uphs.upenn.edu  fax: 215-614-0266
http://www.cbica.upenn.edu/
IT Technical Director, Center for Biomedical Image Computing and Analytics
Department of Radiology University of Pennsylvania
  PGP Key: http://www.cbica.upenn.edu/sbia/bergman


- Message from Lukas Hejtmanek <xhejt...@ics.muni.cz> on Wed, 31 Aug 2016 00:02:50 +0200 -

  To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
  Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection


thank you for the paper, I appreciate it.

However, I wonder whether it could be extended a little. As it is titled
Petascale Data Protection, I think that at petabyte scale you have to deal with
millions (or rather hundreds of millions) of stored files, and this is
something where TSM does not scale well.

Could you give some hints:

On the backup side:
mmbackup takes ages for:
a) the scan (try to scan 500M files, even in parallel)
b) the backup - what if 10% of files get changed? The backup process can be
blocked for several days, as mmbackup cannot run in several instances on the
same file system, so you have to wait until one run of mmbackup finishes. How
long could it take at petascale?

On the restore side:
how can I restore e.g. 40 million files efficiently? dsmc restore '/path/*'
runs into serious trouble after, say, 20M files (maybe the wrong internal
structures are used); scanning 1000 more files then takes several minutes, so
the restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist
with a filelist consisting of 2.4M files. It ran for *two* days without
restoring even a single file, with dsmc consuming 100% CPU.

So any hints addressing these issues with a really large number of files would
be even more appreciated.

Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

2016-08-30 Thread Olaf Weiser

there're multiple dependencies; the performance of the MD scan is related to
several factors. As a rule of thumb: the total amount of IOPS you need to scan
your MD is highly dependent on the metadata blocksize, the inode size (assuming
the default 4K), and the total number of inodes ;-). The time it takes to answer
these IOs depends on your backend(s), on the parallelism and the nodes' hardware
resources, and finally on the network connectivity (latency, bandwidth).

To give some direction... we even have clusters using regular (old and
spinning) drives that are able to scan > 200 mio files within < 15 minutes.

From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" <aaron.s.knis...@nasa.gov>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 08/31/2016 06:01 AM
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Just want to add on to one of the points Sven touched on regarding metadata HW.
We have a modest SSD infrastructure for our metadata disks and we can scan 500M
inodes in parallel in about 5 hours if my memory serves me right (and I believe
we could go faster if we really wanted to). I think having solid metadata disks
(no pun intended) will really help with scan times.

From: Sven Oehme
Sent: 8/30/16, 7:25 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

so let's start with some simple questions.

when you say mmbackup takes ages, what version of GPFS code are you running?
how do you execute the mmbackup command? exact parameters would be useful.
what HW are you using for the metadata disks?
how much capacity (df -h) and how many inodes (df -i) do you have in the
file system you are trying to back up?

sven

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

2016-08-30 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Just want to add on to one of the points Sven touched on regarding metadata HW. 
We have a modest SSD infrastructure for our metadata disks and we can scan 500M 
inodes in parallel in about 5 hours if my memory serves me right (and I believe 
we could go faster if we really wanted to). I think having solid metadata disks 
(no pun intended) will really help with scan times.
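
As a rough sanity check (not a measured number): with the default 4 KiB inode
size, 500M inodes is roughly 1.9 TiB of inode data to read before any directory
blocks, e.g.

awk 'BEGIN { files=500e6; inode=4096; printf "%.2f TiB\n", files*inode/2^40 }'

so scan time is largely a question of how fast the metadata disks and the
scanning nodes can chew through that volume in parallel.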


From: Sven Oehme
Sent: 8/30/16, 7:25 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale 
Data Protection"
so let's start with some simple questions.

when you say mmbackup takes ages, what version of GPFS code are you running?
how do you execute the mmbackup command? exact parameters would be useful.
what HW are you using for the metadata disks?
how much capacity (df -h) and how many inodes (df -i) do you have in the
file system you are trying to back up?

sven


On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek <xhejt...@ics.muni.cz> wrote:
Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it.

However, I wonder whether it could be extended a little. As it is titled
Petascale Data Protection, I think that at petabyte scale you have to deal with
millions (or rather hundreds of millions) of stored files, and this is
something where TSM does not scale well.

Could you give some hints:

On the backup side:
mmbackup takes ages for:
a) the scan (try to scan 500M files, even in parallel)
b) the backup - what if 10% of files get changed? The backup process can be
blocked for several days, as mmbackup cannot run in several instances on the
same file system, so you have to wait until one run of mmbackup finishes. How
long could it take at petascale?

On the restore side:
how can I restore e.g. 40 million files efficiently? dsmc restore '/path/*'
runs into serious trouble after, say, 20M files (maybe the wrong internal
structures are used); scanning 1000 more files then takes several minutes, so
the restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist
with a filelist consisting of 2.4M files. It ran for *two* days without
restoring even a single file, with dsmc consuming 100% CPU.

So any hints addressing these issues with a really large number of files would
be even more appreciated.

--
Lukáš Hejtmánek
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

2016-08-30 Thread Sven Oehme
so let's start with some simple questions.

when you say mmbackup takes ages, what version of GPFS code are you running?
how do you execute the mmbackup command? exact parameters would be useful.
what HW are you using for the metadata disks?
how much capacity (df -h) and how many inodes (df -i) do you have in the
file system you are trying to back up?
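
something along these lines would cover most of it (the mount point and file
system name are placeholders):

rpm -q gpfs.base       # GPFS code level installed on the node
df -h /gpfs/fs1        # capacity
df -i /gpfs/fs1        # inodes
mmlsnsd -X             # NSDs and the devices/servers behind them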

sven


On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek wrote:

> Hello,
>
> On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> > Find the paper here:
> >
> > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection
>
> thank you for the paper, I appreciate it.
>
> However, I wonder whether it could be extended a little. As it is titled
> Petascale Data Protection, I think that at petabyte scale you have to deal
> with millions (or rather hundreds of millions) of stored files, and this is
> something where TSM does not scale well.
>
> Could you give some hints:
>
> On the backup side:
> mmbackup takes ages for:
> a) the scan (try to scan 500M files, even in parallel)
> b) the backup - what if 10% of files get changed? The backup process can be
> blocked for several days, as mmbackup cannot run in several instances on the
> same file system, so you have to wait until one run of mmbackup finishes.
> How long could it take at petascale?
>
> On the restore side:
> how can I restore e.g. 40 million files efficiently? dsmc restore '/path/*'
> runs into serious trouble after, say, 20M files (maybe the wrong internal
> structures are used); scanning 1000 more files then takes several minutes,
> so the restore never reaches those 40M files.
>
> Using filelists, the situation is even worse. I ran dsmc restore -filelist
> with a filelist consisting of 2.4M files. It ran for *two* days without
> restoring even a single file, with dsmc consuming 100% CPU.
>
> So any hints addressing these issues with a really large number of files
> would be even more appreciated.
>
> --
> Lukáš Hejtmánek
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

2016-08-30 Thread Lukas Hejtmanek
Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
> 
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it. 

However, I wonder whether it could be extended a little. As it is titled
Petascale Data Protection, I think that at petabyte scale you have to deal with
millions (or rather hundreds of millions) of stored files, and this is
something where TSM does not scale well.

Could you give some hints:

On the backup side:
mmbackup takes ages for:
a) the scan (try to scan 500M files, even in parallel)
b) the backup - what if 10% of files get changed? The backup process can be
blocked for several days, as mmbackup cannot run in several instances on the
same file system, so you have to wait until one run of mmbackup finishes. How
long could it take at petascale?
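
(For context: mmbackup can at least spread the scan and the dsmc sessions over
several nodes; a rough sketch with placeholder names, and the exact tuning
options depend on the GPFS release:

mmbackup /gpfs/fs1 -t incremental -N backupNodes -g /gpfs/fs1/.mmbackupWork

but it still runs as a single instance per file system.)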

On the restore side:
how can I restore e.g. 40 million files efficiently? dsmc restore '/path/*'
runs into serious trouble after, say, 20M files (maybe the wrong internal
structures are used); scanning 1000 more files then takes several minutes, so
the restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist
with a filelist consisting of 2.4M files. It ran for *two* days without
restoring even a single file, with dsmc consuming 100% CPU.
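
(A possible workaround, not verified at this scale: split the filelist and run
several dsmc sessions in parallel, e.g.

split -l 100000 filelist.txt /tmp/chunk.
ls /tmp/chunk.* | xargs -P4 -I{} dsmc restore -filelist={} /restore/target/

with the chunk size, parallelism and target path being arbitrary examples.)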

So any hints addressing these issues with a really large number of files would
be even more appreciated.

-- 
Lukáš Hejtmánek
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss