Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection" - how about a Billion files in 140 seconds?
When you write something like "mmbackup takes ages" - that lets us know how you feel, kinda. But we need some facts and data to determine whether there is a real problem and whether and how it might be improved.

Just to do a "back of the envelope" estimate of how long backup operations "ought to" take, we'd need to know: how many disks and/or SSDs with what performance characteristics, how many nodes with what performance characteristics, network fabric(s), number of files to be scanned, average number of files per directory, GPFS blocksize(s) configured, backup devices available with speeds and feeds, etc., etc.

But anyway, just to throw ballpark numbers out there to give you an idea of what is possible: I can tell you that about 20 months ago Sven and I benchmarked mmapplypolicy scanning 983 million files in 136 seconds! The command looked like this:

mmapplypolicy /ibm/fs2-1m-p01/shared/Btt -g /ibm/fs2-1m-p01/tmp -d 7 -A 256 -a 32 -n 8 -P /ghome/makaplan/sventests/milli.policy -I test -L 1 -N fastclients

fastclients was 10 x86_64 commodity nodes. The fs2-1m-p01 file system was hosted on just two IBM GSS nodes, and everything was on an InfiniBand switch. We packed about 7000 files into each directory (this admittedly may not be typical...).

This is NOT to say you could back up that many files that fast, but Spectrum Scale metadata scanning can be fast, even with relatively modest hardware resources. YMMV ;-)

Marc of GPFS

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
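For a rough sense of scale, the benchmark numbers above work out as follows (a back-of-envelope sketch; the per-node and per-directory figures are derived from the quantities Marc quotes, not measured separately):

```python
# Back-of-envelope arithmetic for the benchmark quoted above:
# 983 million files scanned in 136 seconds across 10 scan nodes,
# with ~7000 files packed into each directory.
files = 983_000_000
seconds = 136
nodes = 10
files_per_dir = 7000

total_rate = files / seconds            # aggregate scan rate
per_node_rate = total_rate / nodes      # average per scan node
directories = files / files_per_dir     # directories traversed

print(f"aggregate:   {total_rate:,.0f} files/s")
print(f"per node:    {per_node_rate:,.0f} files/s")
print(f"directories: {directories:,.0f}")
```

That is roughly 7.2 million files per second in aggregate, which is the yardstick against which "mmbackup takes ages" reports can be compared.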
Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection" (Dominic Mueller-Wicke)
On Wed, Aug 31, 2016 at 07:52:38AM +0200, Dominic Mueller-Wicke01 wrote:
> Thanks for reading the paper. I agree that the restore of a large number of
> files is a challenge today. The restore is the focus area for future
> enhancements for the integration between IBM Spectrum Scale and IBM
> Spectrum Protect. If something becomes available that helps to improve the
> restore capabilities, the paper will be updated with this information.

I guess that one of the reasons the restore is slow is this (strace dsmc):

[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases/stud/atl_en/_referencenotitsig", F_OK) = -1 ENOENT (No such file or directory)
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases/stud/atl_en", F_OK) = -1 ENOENT (No such file or directory)
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases/stud", F_OK) = -1 ENOENT (No such file or directory)
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases/atlases", F_OK) = -1 ENOENT (No such file or directory)
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit/atlases", F_OK) = -1 ENOENT (No such file or directory)
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home/jfeit", F_OK) = -1 ENOENT (No such file or directory)
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum/home", F_OK) = 0
[pid 9022] access("/exports/tape_tape/admin/restored/disk_error/1/VO_metacentrum", F_OK) = 0

It seems that dsmc tests access again and again, all the way up to the root, for each item in the file list if I set a different location to place the restored files.

--
Lukáš Hejtmánek
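The strace above suggests dsmc re-verifies every ancestor directory for every restored file. A toy model (illustrative numbers only, not a claim about dsmc internals) shows how that multiplies syscalls, and how remembering already-verified directories would collapse the count:

```python
# Toy model of the access() pattern seen in the strace above.
# Naive: for each of N files, probe every ancestor directory.
# Cached: remember directories already verified, so each unique
# directory is probed only once. (Illustrative only -- not dsmc code.)
def ancestors(path):
    """Return all ancestor directories of path, shallowest first."""
    parts = path.strip("/").split("/")[:-1]
    return ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]

# Hypothetical restore list: 10,000 files, all 5 directories deep.
files = [f"/restore/a/b/c/d/file{i}" for i in range(10_000)]

# Naive strategy: N files x depth D probes.
naive_calls = sum(len(ancestors(f)) for f in files)

# Cached strategy: one probe per unique directory.
seen = set()
cached_calls = 0
for f in files:
    for d in ancestors(f):
        if d not in seen:
            seen.add(d)
            cached_calls += 1

print(naive_calls, cached_calls)   # prints 50000 5
```

For this toy tree the naive pattern issues 50,000 probes where 5 would do; at 2.4M files the difference becomes days of wall-clock time, which is consistent with the behaviour reported later in this thread.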
Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection" (Dominic Mueller-Wicke)
/STXKQY/gpfsclustersfaq.html?view=kc#linuxq

Kevin D. Johnson, MBA, MAFM
Spectrum Computing, Senior Managing Consultant
IBM Certified Deployment Professional - Spectrum Scale V4.1.1
IBM Certified Deployment Professional - Cloud Object Storage V3.8
720.349.6199 - kevin...@us.ibm.com

- Original message -
From: Lukas Hejtmanek <xhejt...@ics.muni.cz>
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] GPFS 3.5.0 on RHEL 6.8
Date: Tue, Aug 30, 2016 4:39 PM

Hello,

does it work for anyone? As of kernel 2.6.32-642, GPFS 3.5.0 (including the latest patch 32) does start but does not mount any file system. The internal mount command gets stuck.

--
Lukáš Hejtmánek

- Message from mark.berg...@uphs.upenn.edu on Tue, 30 Aug 2016 17:07:21 -0400 -
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] GPFS 3.5.0 on RHEL 6.8

In the message dated: Tue, 30 Aug 2016 22:39:18 +0200, the pithy ruminations from Lukas Hejtmanek on <[gpfsug-discuss] GPFS 3.5.0 on RHEL 6.8> were:
=> Hello,

GPFS 3.5.0.[23..3-0] works for me under [CentOS|ScientificLinux] 6.8, but at kernel 2.6.32-573 and lower. I've found kernel bugs in blk_cloned_rq_check_limits() in later kernel revs that caused multipath errors, resulting in GPFS being unable to find all NSDs and mount the filesystem. I am not updating to a newer kernel until I'm certain this is resolved.

I opened a bug with CentOS: https://bugs.centos.org/view.php?id=10997 and began an extended discussion with the (RH & SUSE) developers of that chunk of kernel code. I don't know if an upstream bug has been opened by RH, but see: https://patchwork.kernel.org/patch/9140337/

=> does it work for anyone? As of kernel 2.6.32-642, GPFS 3.5.0 (including the
=> latest patch 32) does start but does not mount any file system. The internal
=> mount cmd gets stuck.
=>
=> --
=> Lukáš Hejtmánek

--
Mark Bergman  voice: 215-746-4061  fax: 215-614-0266
mark.berg...@uphs.upenn.edu  http://www.cbica.upenn.edu/
IT Technical Director, Center for Biomedical Image Computing and Analytics
Department of Radiology, University of Pennsylvania
PGP Key: http://www.cbica.upenn.edu/sbia/bergman

- Message from Lukas Hejtmanek <xhejt...@ics.muni.cz> on Wed, 31 Aug 2016 00:02:50 +0200 -
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it. However, I wonder whether it could be extended a little. As it has the title Petascale Data Protection, I think that at peta scale you have to deal with millions (well, rather hundreds of millions) of files, and this is something where TSM does not scale well. Could you give some hints?

On the backup side, mmbackup takes ages for:
a) scan (try to scan 500M files, even in parallel)
b) backup - what if 10 % of files get changed? The backup process can be blocked for several days, as mmbackup cannot run in several instances on the same file system, so you have to wait until one run of mmbackup finishes. How long could it take at petascale?

On the restore side: how can I restore, e.g., 40 million files efficiently? dsmc restore '/path/*' runs into serious trouble after, say, 20M files (maybe wrong internal structures used); moreover, scanning 1000 more files takes several minutes, so the dsmc restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist with a filelist consisting of 2.4M files. It ran for *two* days without restoring even a single file; dsmc was consuming 100 % CPU. So any hints addressing these issues with a really large number of files would be even more appreciated.

--
Lukáš Hejtmánek
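Given Mark's data point (working at kernel 2.6.32-573 and below, broken at -642), a quick guard before updating could compare the running kernel against the last known-good release. This is only a sketch; the threshold is taken from the report above and should be re-validated against the CentOS bug for your distribution:

```python
# Compare a RHEL/CentOS 6 kernel release string against the last
# kernel reported above as working with GPFS 3.5.0 (2.6.32-573).
import re

LAST_KNOWN_GOOD = (2, 6, 32, 573)

def kernel_tuple(release):
    # "2.6.32-642.4.2.el6.x86_64" -> (2, 6, 32, 642)
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release}")
    return tuple(int(x) for x in m.groups())

def is_known_good(release):
    """True if this kernel is at or below the last known-good rev."""
    return kernel_tuple(release) <= LAST_KNOWN_GOOD

print(is_known_good("2.6.32-573.el6.x86_64"))   # True
print(is_known_good("2.6.32-642.4.2.el6"))      # False
```

In practice you would feed this the output of `uname -r` (e.g. via `platform.release()`) before letting an update proceed.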
Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"
There are multiple dependencies; the performance of MD scan comes down to a few factors. As a rule of thumb, the total amount of IOPS you need to scan your MD is highly dependent on the metadata blocksize, the inode size (assuming the default 4K), and the total number of inodes ;-). The time it takes to answer these IOs depends on your backend(s), the parallelism, the nodes' hardware resources, and finally the network connectivity (latency, bandwidth). To give some direction: we even have clusters using regular (old and spinning) drives that are able to scan > 200 million files within < 15 minutes.

From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" <aaron.s.knis...@nasa.gov>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 08/31/2016 06:01 AM
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Just want to add on to one of the points Sven touched on regarding metadata HW. We have a modest SSD infrastructure for our metadata disks and we can scan 500M inodes in parallel in about 5 hours, if my memory serves me right (and I believe we could go faster if we really wanted to). I think having solid metadata disks (no pun intended) will really help with scan times.

From: Sven Oehme
Sent: 8/30/16, 7:25 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

so let's start with some simple questions. when you say mmbackup takes ages, what version of GPFS code are you running? how do you execute the mmbackup command? exact parameters would be useful. what HW are you using for the metadata disks? how much capacity (df -h) and how many inodes (df -i) do you have in the filesystem you try to back up?

sven

On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek <xhejt...@ics.muni.cz> wrote:

Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it. However, I wonder whether it could be extended a little. As it has the title Petascale Data Protection, I think that at peta scale you have to deal with millions (well, rather hundreds of millions) of files, and this is something where TSM does not scale well. Could you give some hints?

On the backup side, mmbackup takes ages for:
a) scan (try to scan 500M files even in parallel)
b) backup - what if 10 % of files get changed? The backup process can be blocked for several days, as mmbackup cannot run in several instances on the same file system, so you have to wait until one run of mmbackup finishes. How long could it take at petascale?

On the restore side: how can I restore, e.g., 40 million files efficiently? dsmc restore '/path/*' runs into serious trouble after, say, 20M files (maybe wrong internal structures used); moreover, scanning 1000 more files takes several minutes, so the dsmc restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist with a filelist consisting of 2.4M files. It ran for *two* days without restoring even a single file; dsmc was consuming 100 % CPU. So any hints addressing these issues with a really large number of files would be even more appreciated.

--
Lukáš Hejtmánek
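The rule of thumb above can be turned into a rough estimator. All inputs here are illustrative assumptions (especially the backend bandwidth) that you would replace with your own cluster's values:

```python
# Back-of-envelope metadata scan time: total inode bytes to read,
# divided by the sustained metadata read bandwidth of the backend.
# All figures below are illustrative assumptions, not measurements.
inodes = 500_000_000          # files/inodes to scan
inode_size = 4096             # bytes, GPFS default 4K inodes
md_bandwidth = 2 * 1024**3    # 2 GiB/s sustained metadata reads (assumed)

bytes_to_read = inodes * inode_size
seconds = bytes_to_read / md_bandwidth

print(f"metadata to read: {bytes_to_read / 1024**4:.1f} TiB")
print(f"estimated scan:   {seconds / 60:.0f} minutes")
```

With these assumed inputs, 500M inodes work out to roughly 1.9 TiB of metadata and a scan on the order of 16 minutes -- which is the same ballpark as the ">200 million files in <15 minutes" figure quoted above for a well-provisioned backend.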
Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"
Just want to add on to one of the points Sven touched on regarding metadata HW. We have a modest SSD infrastructure for our metadata disks and we can scan 500M inodes in parallel in about 5 hours, if my memory serves me right (and I believe we could go faster if we really wanted to). I think having solid metadata disks (no pun intended) will really help with scan times.

From: Sven Oehme
Sent: 8/30/16, 7:25 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

so let's start with some simple questions. when you say mmbackup takes ages, what version of GPFS code are you running? how do you execute the mmbackup command? exact parameters would be useful. what HW are you using for the metadata disks? how much capacity (df -h) and how many inodes (df -i) do you have in the filesystem you try to back up?

sven

On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek <xhejt...@ics.muni.cz> wrote:

Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it. However, I wonder whether it could be extended a little. As it has the title Petascale Data Protection, I think that at peta scale you have to deal with millions (well, rather hundreds of millions) of files, and this is something where TSM does not scale well. Could you give some hints?

On the backup side, mmbackup takes ages for:
a) scan (try to scan 500M files even in parallel)
b) backup - what if 10 % of files get changed? The backup process can be blocked for several days, as mmbackup cannot run in several instances on the same file system, so you have to wait until one run of mmbackup finishes. How long could it take at petascale?

On the restore side: how can I restore, e.g., 40 million files efficiently? dsmc restore '/path/*' runs into serious trouble after, say, 20M files (maybe wrong internal structures used); moreover, scanning 1000 more files takes several minutes, so the dsmc restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist with a filelist consisting of 2.4M files. It ran for *two* days without restoring even a single file; dsmc was consuming 100 % CPU. So any hints addressing these issues with a really large number of files would be even more appreciated.

--
Lukáš Hejtmánek
Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"
so let's start with some simple questions. when you say mmbackup takes ages, what version of GPFS code are you running? how do you execute the mmbackup command? exact parameters would be useful. what HW are you using for the metadata disks? how much capacity (df -h) and how many inodes (df -i) do you have in the filesystem you try to back up?

sven

On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek wrote:

> Hello,
>
> On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> > Find the paper here:
> >
> > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection
>
> thank you for the paper, I appreciate it.
>
> However, I wonder whether it could be extended a little. As it has the title
> Petascale Data Protection, I think that at peta scale you have to deal with
> millions (well, rather hundreds of millions) of files, and this is
> something where TSM does not scale well.
>
> Could you give some hints?
>
> On the backup side, mmbackup takes ages for:
> a) scan (try to scan 500M files even in parallel)
> b) backup - what if 10 % of files get changed? The backup process can be
> blocked for several days, as mmbackup cannot run in several instances on the
> same file system, so you have to wait until one run of mmbackup finishes.
> How long could it take at petascale?
>
> On the restore side: how can I restore, e.g., 40 million files efficiently?
> dsmc restore '/path/*' runs into serious trouble after, say, 20M files
> (maybe wrong internal structures used); moreover, scanning 1000 more files
> takes several minutes, so the dsmc restore never reaches those 40M files.
>
> Using filelists, the situation is even worse. I ran dsmc restore -filelist
> with a filelist consisting of 2.4M files. It ran for *two* days without
> restoring even a single file; dsmc was consuming 100 % CPU.
>
> So any hints addressing these issues with a really large number of files
> would be even more appreciated.
>
> --
> Lukáš Hejtmánek
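Sven's questions about capacity and inode counts map to `df -h` and `df -i`; the same numbers can also be pulled programmatically when collecting diagnostics (a small standard-library sketch; point it at your GPFS mount point instead of `/`):

```python
# Report capacity and inode usage for a filesystem -- the same data
# "df -h" and "df -i" would show for the mount point.
import os

def fs_report(mount="/"):
    st = os.statvfs(mount)
    total_bytes = st.f_blocks * st.f_frsize
    free_bytes = st.f_bavail * st.f_frsize
    total_inodes = st.f_files
    free_inodes = st.f_favail
    return total_bytes, free_bytes, total_inodes, free_inodes

total_b, free_b, total_i, free_i = fs_report("/")
print(f"capacity: {total_b / 1024**3:.1f} GiB total, "
      f"{free_b / 1024**3:.1f} GiB free")
print(f"inodes:   {total_i:,} total, {free_i:,} free")
```

The inode total and used count (f_files minus f_favail) give a first-order input for estimating mmbackup scan time, per the metadata discussion earlier in this thread.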
Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"
Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it.

However, I wonder whether it could be extended a little. As it has the title Petascale Data Protection, I think that at peta scale you have to deal with millions (well, rather hundreds of millions) of files, and this is something where TSM does not scale well. Could you give some hints?

On the backup side, mmbackup takes ages for:
a) scan (try to scan 500M files even in parallel)
b) backup - what if 10 % of files get changed? The backup process can be blocked for several days, as mmbackup cannot run in several instances on the same file system, so you have to wait until one run of mmbackup finishes. How long could it take at petascale?

On the restore side: how can I restore, e.g., 40 million files efficiently? dsmc restore '/path/*' runs into serious trouble after, say, 20M files (maybe wrong internal structures used); moreover, scanning 1000 more files takes several minutes, so the dsmc restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist with a filelist consisting of 2.4M files. It ran for *two* days without restoring even a single file; dsmc was consuming 100 % CPU.

So any hints addressing these issues with a really large number of files would be even more appreciated.

--
Lukáš Hejtmánek
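One workaround sometimes suggested for the filelist problem (a sketch only, not advice from IBM -- whether parallel `dsmc restore -filelist` sessions actually help depends on server-side contention) is to split the big filelist into chunks and run one dsmc session per chunk. The chunk size and file naming here are illustrative assumptions:

```python
# Split a large dsmc filelist into roughly equal chunk files, so that
# several "dsmc restore -filelist=chunk_NNNN.txt" sessions can be run
# in parallel. Chunk size and naming are illustrative assumptions.
import itertools

def split_filelist(path, chunk_size=100_000, prefix="chunk"):
    """Write chunk files of at most chunk_size lines; return their names."""
    names = []
    with open(path) as src:
        for i in itertools.count():
            lines = list(itertools.islice(src, chunk_size))
            if not lines:
                break
            name = f"{prefix}_{i:04d}.txt"
            with open(name, "w") as out:
                out.writelines(lines)
            names.append(name)
    return names
```

Each resulting chunk stays small enough that dsmc's per-session bookkeeping is bounded, and the chunks can be dispatched to multiple nodes or sessions.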