Re: [gpfsug-discuss] Question about inodes increase

2019-03-06 Thread Ryan Novosielski
They hadn’t asked, but neither is the process of raising the maximum, which 
could be what they’re asking about (might be some momentary performance hit — 
can’t recall, but I don’t believe it’s significant if so).

--

|| \\UTGERS,   |---*O*---
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

On Mar 6, 2019, at 06:14, Frederick Stock <sto...@us.ibm.com> wrote:

No.  It happens automatically and generally without notice to end users; that 
is, they do not see any noticeable pause in operations.  If you are asking the 
question because you are considering pre-allocating all of your inodes, I would 
advise you not to take that option.
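
If you want to keep an eye on it anyway, current inode allocation can be
inspected with something like the following (the file system name 'fs0' is a
placeholder; both commands are read-only):

   mmdf fs0 -F          # inode totals for the whole file system
   mmlsfileset fs0 -L   # per-fileset inode limits and allocations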

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com


- Original message -
From: "Mladen Portak" 
mailto:mladen.por...@hr.ibm.com>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Cc:
Subject: [gpfsug-discuss] Question about inodes incrise
Date: Wed, Mar 6, 2019 4:49 AM


Dear.

Is the process of increasing inodes disruptive?

Thank You


Mladen Portak
Lab Service SEE Storage Consultant
mladen.por...@hr.ibm.com
+385 91 6308 293


IBM Hrvatska d.o.o. za proizvodnju i trgovinu
Miramarska 23, 10 000 Zagreb, Hrvatska
Registered with the Commercial Court in Zagreb under no. 080011422
Share capital: HRK 788,000.00, paid in full
Director: Željka Tičić
Bank account with: RAIFFEISENBANK AUSTRIA d.d. Zagreb, Magazinska cesta 69, 
10 000 Zagreb, Croatia
IBAN: HR5424840081100396574 (SWIFT RZBHHR2X); OIB 43331467622

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] SLURM scripts/policy for data movement into a flash pool?

2019-03-06 Thread Alex Chekholko
Hi,

I have tried this before and I would like to temper your expectations.

If you use a placement policy to allow users to write any files into your
"small" pool (e.g. by directory), they will get E_NOSPC when your small
pool fills up.  And they will be confused because they can't see the pool
configuration, they just see a large filesystem with lots of space.  I
think there may now be an "overflow" policy but it will only work for new
files, not if someone keeps writing into an existing file in an existing
pool.

If you use a migration policy (even based on heat map) it is still a
periodic scheduled data movement and not anything that happens "on the
fly".  Also, "fileheat" only gets updated at some interval anyway.

If you use a migration policy to move data between pools, you may starve
users of I/O which will confuse your users because suddenly things are
slow.  I think there is now a QOS way to throttle your data migration.  I
guess it depends on how much of your disk I/O throughput is not used; if
your disks are already churning, migrations will just slow everything down.
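
As a rough illustration of that combination, a hedged sketch (the pool names
'data' and 'flash', file system 'fs0', the size cutoff and the QoS class are
all made-up placeholders; check the mmapplypolicy/mmchqos man pages for the
exact option spelling on your release):

   # policy file: promote the hottest smallish files into flash, but never
   # fill the flash pool past 80%
   cat > /tmp/promote.pol <<'EOF'
   RULE 'promote' MIGRATE FROM POOL 'data'
     WEIGHT(FILE_HEAT)
     TO POOL 'flash' LIMIT(80)
     WHERE KB_ALLOCATED <= 1048576
   EOF

   # run the data movement in the throttled 'maintenance' QoS class so user
   # I/O keeps priority (QoS has to be enabled on the file system first,
   # e.g. with mmchqos)
   mmapplypolicy fs0 -P /tmp/promote.pol -I yes --qos maintenance

Note that WEIGHT(FILE_HEAT) only does something useful if file-heat tracking
is enabled (fileHeatPeriodMinutes and friends via mmchconfig).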

Think of it less like a cache layer and more like two separate storage
locations.  If a bunch of jobs want to read the same files from your big
pool, it's probably faster to just have them read from the big pool
directly rather than have some kind of prologue job to read the data from
the big pool, write it into the small pool, then have the jobs read from
the small pool.

Also, my experience was with pool ratios of like 10%/90%, yours is more
like 2%/98%.  However, mine were with write-heavy workloads (typical
university environment with quickly growing capacity utilization).

Hope these anecdotes help.  Also, it could be that things work a bit
differently now in new versions.

Regards,
Alex


On Wed, Mar 6, 2019 at 3:13 AM Jake Carroll  wrote:

> Hi Scale-folk.
>
> I have an IBM ESS GH14S building block currently configured for my HPC
> workloads.
>
> I've got about 1PB of /scratch filesystem configured in mechanical
> spindles via GNR and about 20TB of SSD/flash sitting in another GNR
> filesystem at the moment. My intention is to destroy that stand-alone flash
> filesystem eventually and use storage pools coupled with GPFS policy to
> warm up workloads into that flash storage:
>
>
> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_storagepool.htm
>
> A little dated, but that kind of thing.
>
> Does anyone have any experience in this space in using flash storage
> inside a pool with pre/post flight SLURM scripts to puppeteer GPFS policy
> to warm data up?
>
> I had a few ideas for policy construction around file size, file count,
> file access intensity. Someone mentioned heat map construction and mmdiag
> --iohist to me the other day. Could use some background there.
>
> If anyone has any SLURM specific integration tips for the scheduler or
> pre/post flight bits for SBATCH, it'd be really very much appreciated.
>
> This array really does fly along and surpassed my expectations - but, I
> want to get the most out of it that I can for my users - and I think
> storage pool automation and good file placement management is going to be
> an important part of that.
>
> Thank you.
>
> -jc
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Fw: Question about inodes increase - how to increase non-disruptively - orig question by Mladen Portak on 3/6/19 - 09:46 GMT

2019-03-06 Thread John M Sing




Hi, all, Mladen,

(This is my first post to the GPFSug-discuss list.  I am an IBMer, the IBM
worldwide technical support Evangelist on Spectrum Scale/ESS, based in
Florida.  Apologies if my attachment URL is not permitted or if I did not
reply properly to tie my reply to the original poster - please let me know
if there are more instructions or rules for using GPFSug-discuss; I could
not find any such guidelines.)

-

Mladen,

Increasing or changing inodes in a GPFS/Spectrum Scale file system can be
done non-disruptively, within the boundaries of how GPFS / Spectrum Scale
works.

I wrote and delivered the following presentation on this topic back in 2013
in the GPFS V4.1 timeframe.  While older IBM technologies SONAS/V7000
Unified are the reason the preso was written, and the commands shown are
from those now-withdrawn products, the GPFS concepts involved  as far as I
know have not changed, and you can simply use the GPFS/Spectrum Scale
equivalent commands such as mmcrfs, mmcrfileset,  mmchfileset, etc to
allocate, add, or change inodes non-disruptively, within the boundaries of
how GPFS / Spectrum Scale works.   There's lots of diagrams.
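
As a concrete (hedged) illustration, on a current GPFS/Spectrum Scale system
those operations look roughly like this; the file system name, fileset name
and sizes below are made up, and both commands run against a mounted, in-use
file system:

   # raise the file-system-wide maximum, optionally preallocating some inodes
   mmchfs fs0 --inode-limit 300M:50M

   # raise the inode space of an independent fileset
   mmchfileset fs0 projects --inode-limit 2M:1M

   # verify allocated vs. maximum inodes
   mmdf fs0 -F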

Here is a Box link to download this 8.7MB  preso which anyone who has the
link can use to download :

https://ibm.box.com/shared/static/phn9dypcdbzyn2ei6hy2hc79lgmch904.ppt

This should apply to any Spectrum Scale / GPFS file system that is in the
Spectrum Scale V4.x or older format.  I would imagine a file system with
the newer Scale V5 variable sub-blocks has a modification to the above
schema.  I'd be interested to know what that is and how V5 users should
modify the above diagrams/information.

The PPT is handy because there is animation in Slideshow mode to better
explain (at least in my mind) how GPFS / Spectrum Scale V4.x and older
allocates inodes, and how you extend or under what circumstances you can
change the number of inodes in either a file system or an independent file
set.


This Box link, will expire on Dec 31, 2019.  If you are reading this post
past that date, just email me and I will be happy to reshare the preso with
you.

I wrote this up because I myself needed to remember inode allocation,
especially in light of how GPFS independent filesets work, should I ever
need to refer back to it.

Happy to hear feedback on the above preso from all of you out there.
Corrections/comments/update suggestions welcome.


Regards,

John M. Sing
Offering Evangelist, IBM Spectrum Scale, Elastic Storage Server, Spectrum
NAS
Venice, Florida
https://www.linkedin.com/in/johnsing/
jms...@us.ibm.com    office: 941-492-2998


-

Mladen Portak mladen.portak at hr.ibm.com  wrote on Wed Mar 6 09:49:13 GMT
2019

Dear.

Is the process of increasing inodes disruptive?

Thank You


Mladen Portak
Lab Service SEE Storage Consultant
mladen.portak at hr.ibm.com
+385 91 6308 293

IBM Hrvatska d.o.o. za proizvodnju i trgovinu
Miramarska 23, 10 000 Zagreb, Hrvatska
Registered with the Commercial Court in Zagreb under no. 080011422
Share capital: HRK 788,000.00, paid in full
Director: Željka Tičić
Bank account with: RAIFFEISENBANK AUSTRIA d.d. Zagreb, Magazinska cesta 69,
10 000 Zagreb, Croatia
IBAN: HR5424840081100396574 (SWIFT RZBHHR2X); OIB 43331467622


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Question about inodes increase - how to increase non-disruptively - orig question by Mladen Portak on 3/6/19 - 09:46 GMT

2019-03-06 Thread John M Sing

Upon further thought, it occurs to me that Spectrum Scale V5's introduction
of variable sub-blocks must by necessity have changed the inode calculation
that I describe below.  I would be interested to know how exactly, in
Spectrum Scale V5 formatted file systems, one may need to change the
information I document below.

I would imagine the pre-V5 file system format probably still uses the inode
allocation schema that I document below.


John Sing
IBM Offering Evangelist, Spectrum Scale, ESS
Venice FL





From:   John M Sing/Tampa/IBM
To: gpfsug-discuss@spectrumscale.org
Date:   03/06/2019 11:23 AM
Subject:Question about inodes increase - how to increase
non-disruptively - orig question by Mladen Portak on 3/6/19 -
09:46 GMT


Hi, all, Mladen,

(This is my first post to the GPFSug-discuss list.  I am an IBMer, the IBM
worldwide technical support Evangelist on Spectrum Scale/ESS, based in
Florida.  Apologies if my attachment is not permitted or if I did not reply
properly to tie my reply to the original poster - please let me know if
there are more instructions or rules for using GPFSug-discuss; I could not
find any such guidelines.)

-

Mladen,

Increasing or changing inodes in a GPFS/Spectrum Scale file system can be
done non-disruptively, within the boundaries of how GPFS / Spectrum Scale
works.

I wrote and delivered the following presentation on this topic back in 2013
in the GPFS V4.1 timeframe.  While older IBM technologies SONAS/V7000
Unified are the reason the preso was written, and the commands shown are
from those now-withdrawn products, the GPFS concepts involved  as far as I
know have not changed, and you can simply use the GPFS/Spectrum Scale
equivalent commands such as mmcrfs, mmcrfileset,  mmchfileset, etc to
allocate, add, or change inodes non-disruptively, within the boundaries of
how GPFS / Spectrum Scale works.   There's lots of diagrams.

[attachment
"sDS05_John_Sing_SONAS_V7000_GPFS_Unified_Independent_Filesets_Inode_Planning.ppt"
 deleted by John M Sing/Tampa/IBM]

The PPT is handy because there is animation in Slideshow mode to better
explain (at least in my mind) how GPFS allocates inodes, and how you extend
or under what circumstances you can change the number of inodes in either a
file system or an independent file set.

Here is a Box link to download this 8.7MB  preso, should the attachment not
come thru or be too big for the list.

https://ibm.box.com/shared/static/phn9dypcdbzyn2ei6hy2hc79lgmch904.ppt

This Box link, which anyone who has the link can use to download, will
expire on Dec 31, 2019.  If you are reading this post past that date, just
email me and I will be happy to reshare the preso with you.

I wrote this up because I myself needed to remember inode allocation,
especially in light of how GPFS independent filesets work, should I ever
need to refer back to it.

Happy to hear feedback on the above preso from all of you out there.
Corrections/comments/update suggestions welcome.


Regards,

John M. Sing
Offering Evangelist, IBM Spectrum Scale, Elastic Storage Server, Spectrum
NAS
Venice, Florida
https://www.linkedin.com/in/johnsing/
jms...@us.ibm.com    office: 941-492-2998


-

Mladen Portak mladen.portak at hr.ibm.com  wrote on Wed Mar 6 09:49:13 GMT
2019

Dear.

Is the process of increasing inodes disruptive?

Thank You


Mladen Portak
Lab Service SEE Storage Consultant
mladen.portak at hr.ibm.com
+385 91 6308 293

IBM Hrvatska d.o.o. za proizvodnju i trgovinu
Miramarska 23, 10 000 Zagreb, Hrvatska
Registered with the Commercial Court in Zagreb under no. 080011422
Share capital: HRK 788,000.00, paid in full
Director: Željka Tičić
Bank account with: RAIFFEISENBANK AUSTRIA d.d. Zagreb, Magazinska cesta 69,
10 000 Zagreb, Croatia
IBAN: HR5424840081100396574 (SWIFT RZBHHR2X); OIB 43331467622



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfsug-discuss mmxcp

2019-03-06 Thread Marc A Kaplan
Basically yes. If you can't find the scripts in 4.2 samples... You can 
copy them over from 5.x to the 4.2 system...  Should work except perhaps 
for some of the more esoteric find conditionals...



From:   "Edward Boyd" 
To: gpfsug-discuss@spectrumscale.org
Date:   03/06/2019 10:42 AM
Subject:Re: [gpfsug-discuss] gpfsug-discuss mmxcp
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Curious if this command would be suitable for migration from Scale 4.2 
file system to 5.x file system?  What is lost or left behind?

Edward L. Boyd ( Ed ), Client Technical Specialist
IBM Systems Storage Solutions
US Federal
407-271-9210 Office / Cell / Office / Text
eb...@us.ibm.com email

-gpfsug-discuss-boun...@spectrumscale.org wrote: -
To: gpfsug-discuss@spectrumscale.org
From: gpfsug-discuss-requ...@spectrumscale.org
Sent by: gpfsug-discuss-boun...@spectrumscale.org
Date: 03/06/2019 10:03AM
Subject: gpfsug-discuss Digest, Vol 86, Issue 11

Send gpfsug-discuss mailing list submissions to
gpfsug-discuss@spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-requ...@spectrumscale.org

You can reach the person managing the list at
gpfsug-discuss-ow...@spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

   1. Re: Migrating billions of files? mmfind ... mmxcp (Marc A Kaplan)


--

Message: 1
Date: Wed, 6 Mar 2019 10:01:57 -0500
From: "Marc A Kaplan" 
To: gpfsug main discussion list 
Cc: gpfsug-discuss-boun...@spectrumscale.org
Subject: Re: [gpfsug-discuss] Migrating billions of files? mmfind ...
mmxcp
Message-ID: <of18fdf6d8.c850134f-on852583b5.005243d0-852583b5.00529...@notes.na.collabserv.com>

Content-Type: text/plain; charset="us-ascii"

mmxcp may be in samples/ilm  if not, perhaps we can put it on an approved 
file sharing service ...


   + mmxcp script, for use with mmfind ... -xargs mmxcp ...
  Which makes parallelized file copy relatively easy and super fast!

Usage: /gh/bin/mmxcp -t target -p strip_count source_pathname1 
source_pathname2 ...

 Run "cp" in a  mmfind ... -xarg ... pipeline, e.g.

  mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight 
DIRECTORY_HASH -xargs mmxcp -t /target -p 2

 Options:
  -t target_path : Copy files to this path.
  -p strip_count : Remove this many directory names from the pathnames of 
the source files.
  -a  : pass -a to cp
  -v  : pass -v to cp





--

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 86, Issue 11
**

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Follow-up: migrating billions of files

2019-03-06 Thread Stephen Ulmer
In the case where tar -C doesn’t work, you can always use a subshell (I do this 
regularly):

tar -cf - . | ssh someguy@otherhost "(cd targetdir && tar -xvf -)"

Only use -v on one end. :)

Also, for parallel work that’s not designed that way, don't underestimate the 
-P option to GNU and BSD xargs! With the amount of stuff to be copied, making 
sure a subjob doesn’t finish right after you go home leaving a slot idle for 
several hours is a medium deal.
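
A minimal sketch of that pattern (source/target paths and the target host are
placeholders): one tar pipe per top-level directory, at most 8 running at a
time, so a finished subjob immediately frees its slot for the next one:

   ls /gpfs/source | xargs -P8 -I{} sh -c \
     'tar -C /gpfs/source -cf - "{}" | ssh target "tar -C /gpfs/target -xf -"'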

In Bob’s case, however, treating it like a DR exercise where users "restore" 
their own files by accessing them (using AFM instead of HSM) is probably the 
most convenient.

-- 
Stephen



> On Mar 6, 2019, at 8:13 AM, Uwe Falke <uwefa...@de.ibm.com> wrote:
> 
> Hi, in that case I'd open several tar pipes in parallel, maybe using 
> directories carefully selected, like 
> 
>   tar -c <source_dir> | ssh <target_host> "tar -x"
> 
> I am not quite sure whether "-C /" for tar works here ("tar -C / -x"), but 
> along these lines might be a good efficient method. target_hosts should be 
> all nodes having the target file system mounted, and you should start 
> those pipes on the nodes with the source file system. 
> It is best to start with the largest directories, and use some 
> master script to start the tar pipes controlled by semaphores to not 
> overload anything. 
> 
> 
> 
> Mit freundlichen Grüßen / Kind regards
> 
> 
> Dr. Uwe Falke
> 
> IT Specialist
> High Performance Computing Services / Integrated Technology Services / 
> Data Center Services
> ---
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefa...@de.ibm.com 
> ---
> IBM Deutschland Business & Technology Services GmbH / Management: 
> Thomas Wolter, Sven Schooß
> Registered office: Ehningen / Registration court: Amtsgericht Stuttgart, 
> HRB 17122 
> 
> 
> 
> 
> From:   "Oesterlin, Robert"  >
> To: gpfsug main discussion list  >
> Date:   06/03/2019 13:44
> Subject:[gpfsug-discuss] Follow-up: migrating billions of files
> Sent by:gpfsug-discuss-boun...@spectrumscale.org 
> 
> 
> 
> 
> Some of you had questions to my original post. More information:
> 
> Source:
> - Files are straight GPFS/Posix - no extended NFSV4 ACLs
> - A solution that requires $’s to be spent on software (ie, Aspera) isn’t 
> a very viable option
> - Both source and target clusters are in the same DC
> - Source is stand-alone NSD servers (bonded 10g-E) and 8gb FC SAN storage
> - Approx 40 file systems, a few large ones with 300M-400M files each, 
> others smaller
> - no independent file sets
> - migration must pose minimal disruption to existing users
> 
> Target architecture is a small number of file systems (2-3) on ESS with 
> independent filesets
> - Target (ESS) will have multiple 40gb-E links on each NSD server (GS4)
> 
> My current thinking is AFM with a pre-populate of the file space and 
> switch the clients over to have them pull data they need (most of the data 
> is older and less active) and them let AFM populate the rest in the 
> background.
> 
> 
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfsug-discuss mmxcp

2019-03-06 Thread Edward Boyd
Curious if this command would be suitable for migration from Scale 4.2 file
system to 5.x file system?  What is lost or left behind?

Edward L. Boyd ( Ed ), Client Technical Specialist
IBM Systems Storage Solutions
US Federal
407-271-9210 Office / Cell / Text
eb...@us.ibm.com

[...]

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Question about inodes increase

2019-03-06 Thread Sven Oehme
While Fred is right that in most cases you shouldn't see this, under heavy 
burst-create workloads before 5.0.2 you could even trigger out-of-space errors 
even though you have plenty of space in the filesystem (very hard to reproduce, 
so a normal end user is unlikely to hit it). To address the issue there have 
been significant enhancements in this area in 5.0.2. Prior to the changes, 
expansions under heavy load many times happened in the foreground (meaning the 
application waits for the expansion to finish before it proceeds), especially 
if many nodes create lots of files in parallel. Since the changes, you now see 
messages in the mmfs log on the filesystem manager when an expansion happens, 
with details including whether somebody had to wait for it or not. 

 

Sven

 

From:  on behalf of Mladen Portak 

Reply-To: gpfsug main discussion list 
Date: Wednesday, March 6, 2019 at 1:49 AM
To: 
Subject: [gpfsug-discuss] Question about inodes increase

 

Dear.

Is the process of increasing inodes disruptive?

Thank You


Mladen Portak
Lab Service SEE Storage Consultant
mladen.por...@hr.ibm.com
+385 91 6308 293


IBM Hrvatska d.o.o. za proizvodnju i trgovinu
Miramarska 23, 10 000 Zagreb, Hrvatska
Registered with the Commercial Court in Zagreb under no. 080011422
Share capital: HRK 788,000.00, paid in full
Director: Željka Tičić
Bank account with: RAIFFEISENBANK AUSTRIA d.d. Zagreb, Magazinska cesta 69, 
10 000 Zagreb, Croatia
IBAN: HR5424840081100396574 (SWIFT RZBHHR2X); OIB 43331467622

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Migrating billions of files? mmfind ... mmxcp

2019-03-06 Thread Simon Thompson
Last time this was mentioned, it doesn't do ACLs?

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of makap...@us.ibm.com 
[makap...@us.ibm.com]
Sent: 06 March 2019 15:01
To: gpfsug main discussion list
Cc: gpfsug-discuss-boun...@spectrumscale.org
Subject: Re: [gpfsug-discuss] Migrating billions of files? mmfind ... mmxcp

mmxcp may be in samples/ilm  if not, perhaps we can put it on an approved file 
sharing service ...


   + mmxcp script, for use with mmfind ... -xargs mmxcp ...
  Which makes parallelized file copy relatively easy and super fast!

Usage: /gh/bin/mmxcp -t target -p strip_count source_pathname1 source_pathname2 
...

 Run "cp" in a  mmfind ... -xarg ... pipeline, e.g.

  mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight 
DIRECTORY_HASH -xargs mmxcp -t /target -p 2

 Options:
  -t target_path : Copy files to this path.
  -p strip_count : Remove this many directory names from the pathnames of the 
source files.
  -a  : pass -a to cp
  -v  : pass -v to cp


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Migrating billions of files? mmfind ... mmxcp

2019-03-06 Thread Marc A Kaplan
mmxcp may be in samples/ilm  if not, perhaps we can put it on an approved 
file sharing service ...


   + mmxcp script, for use with mmfind ... -xargs mmxcp ...
  Which makes parallelized file copy relatively easy and super fast!

Usage: /gh/bin/mmxcp -t target -p strip_count source_pathname1 
source_pathname2 ...

 Run "cp" in a  mmfind ... -xarg ... pipeline, e.g.

  mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight 
DIRECTORY_HASH -xargs mmxcp -t /target -p 2

 Options:
  -t target_path : Copy files to this path.
  -p strip_count : Remove this many directory names from the pathnames of 
the source files.
  -a  : pass -a to cp
  -v  : pass -v to cp




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Memory accounting for processes writing to GPFS

2019-03-06 Thread Tomer Perry
It might be the case that AsynchronousFileChannel is actually doing mmap 
access to the files. Thus, the memory management will be completely 
different with GPFS compared to a local fs.
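
One hedged way to check that from the outside is to look at the process's
mappings; large file-backed regions would show up in pmap output with the
backing file path (the PID below is a placeholder):

   pmap -x 12345 | tail -30   # -x adds per-mapping RSS, so you can see how
                              # much of the huge VIRT is actually resident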

Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: t...@il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:+1 720 3422758
Israel Tel:  +972 3 9188625
Mobile: +972 52 2554625




From:   Jim Doherty 
To: gpfsug main discussion list 
Date:   06/03/2019 06:59
Subject:Re: [gpfsug-discuss] Memory accounting for processes 
writing to GPFS
Sent by:gpfsug-discuss-boun...@spectrumscale.org



For any process with a large number of threads the VMM size has become an 
imaginary number ever since the glibc change to allocate a heap per 
thread. 
I look to /proc/$pid/status to find the memory used by a proc  RSS + Swap 
+ kernel page tables. 

Jim

On Wednesday, March 6, 2019, 4:25:48 AM EST, Dorigo Alvise (PSI) 
 wrote: 


Hello to everyone,
Here at PSI we're observing something that in principle seems strange (at 
least to me).
We run a Java application writing to disk by means of a standard 
AsynchronousFileChannel, whose details I do not know.
There are two instances of this application: one runs on a node writing on 
a local drive, the other one runs writing on a GPFS mounted filesystem 
(this node is part of the cluster, no remote-mounting).

What we do see is that in the former the application has a lower sum 
VIRT+RES memory and the OS shows a really big cache usage; in the latter, 
OS's cache is negligible while VIRT+RES is very (even too) high (with VIRT 
very high).

So I wonder what the difference is... Writing into a GPFS mounted 
filesystem, as far as I understand, implies "talking" to the local mmfsd 
daemon, which fills up its own pagepool... and then the system will 
asynchronously handle these pages to be written to the real pdisks. But why 
does the Linux kernel account so much memory to the process itself? And why 
is this large amount of memory much more VIRT than RES?

thanks in advance,

   Alvise
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss





___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Follow-up: migrating billions of files

2019-03-06 Thread Uwe Falke
Hi, in that case I'd open several tar pipes in parallel, maybe using 
directories carefully selected, like 

  tar -c <source_dir> | ssh <target_host> "tar -x"

I am not quite sure whether "-C /" for tar works here ("tar -C / -x"), but 
along these lines might be a good efficient method. target_hosts should be 
all nodes having the target file system mounted, and you should start 
those pipes on the nodes with the source file system. 
It is best to start with the largest directories, and use some 
master script to start the tar pipes controlled by semaphores to not 
overload anything. 


 
Mit freundlichen Grüßen / Kind regards

 
Dr. Uwe Falke
 
IT Specialist
High Performance Computing Services / Integrated Technology Services / 
Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com
---
IBM Deutschland Business & Technology Services GmbH / Management: 
Thomas Wolter, Sven Schooß
Registered office: Ehningen / Registration court: Amtsgericht Stuttgart, 
HRB 17122 




From:   "Oesterlin, Robert" 
To: gpfsug main discussion list 
Date:   06/03/2019 13:44
Subject:[gpfsug-discuss] Follow-up: migrating billions of files
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Some of you had questions to my original post. More information:
 
Source:
- Files are straight GPFS/Posix - no extended NFSV4 ACLs
- A solution that requires $’s to be spent on software (ie, Aspera) isn’t 
a very viable option
- Both source and target clusters are in the same DC
- Source is stand-alone NSD servers (bonded 10g-E) and 8gb FC SAN storage
- Approx 40 file systems, a few large ones with 300M-400M files each, 
others smaller
- no independent file sets
- migration must pose minimal disruption to existing users
 
Target architecture is a small number of file systems (2-3) on ESS with 
independent filesets
- Target (ESS) will have multiple 40gb-E links on each NSD server (GS4)
 
My current thinking is AFM with a pre-populate of the file space and 
switch the clients over to have them pull data they need (most of the data 
is older and less active) and them let AFM populate the rest in the 
background.
 
 
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Memory accounting for processes writing to GPFS

2019-03-06 Thread Jim Doherty
For any process with a large number of threads the VMM size has become an 
imaginary number ever since the glibc change to allocate a heap per thread. 
I look to /proc/$pid/status to find the memory used by a proc: RSS + Swap + 
kernel page tables.
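
For example (placeholder PID), the relevant fields can be pulled and summed
like this:

   awk '/^VmRSS|^VmSwap|^VmPTE/ {print; kb+=$2} END {print "total:", kb, "kB"}' /proc/12345/status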

Jim

On Wednesday, March 6, 2019, 4:25:48 AM EST, Dorigo Alvise (PSI) 
 wrote:  
 
Hello to everyone,
Here at PSI we're observing something that in principle seems strange (at least
to me).
We run a Java application writing to disk by means of a standard
AsynchronousFileChannel, whose details I do not know.
There are two instances of this application: one runs on a node writing on a
local drive, the other one runs writing on a GPFS mounted filesystem (this node
is part of the cluster, no remote-mounting).

What we do see is that in the former the application has a lower VIRT+RES
memory sum and the OS shows a really big cache usage; in the latter, the OS's
cache is negligible while VIRT+RES is very (even too) high (with VIRT very
high).

So I wonder what the difference is... Writing into a GPFS mounted filesystem,
as far as I understand, implies "talking" to the local mmfsd daemon, which
fills up its own pagepool... and then the system will asynchronously handle
these pages to be written to the real pdisks. But why does the Linux kernel
account so much memory to the process itself? And why is this large amount of
memory much more VIRT than RES?

thanks in advance,

   Alvise
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Follow-up: migrating billions of files

2019-03-06 Thread Oesterlin, Robert
Some of you had questions to my original post. More information:

Source:
- Files are straight GPFS/Posix - no extended NFSV4 ACLs
- A solution that requires $’s to be spent on software (ie, Aspera) isn’t a 
very viable option
- Both source and target clusters are in the same DC
- Source is stand-alone NSD servers (bonded 10g-E) and 8gb FC SAN storage
- Approx 40 file systems, a few large ones with 300M-400M files each, others 
smaller
- no independent file sets
- migration must pose minimal disruption to existing users

Target architecture is a small number of file systems (2-3) on ESS with 
independent filesets
- Target (ESS) will have multiple 40gb-E links on each NSD server (GS4)

My current thinking is AFM with a pre-populate of the file space, then switch 
the clients over to have them pull the data they need (most of the data is 
older and less active) and then let AFM populate the rest in the background.
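
A hedged sketch of what the pre-populate step could look like (file system,
fileset and list-file names are placeholders; AFM mode and gateway sizing
obviously need their own planning):

   # build a list of files to pre-fetch (e.g. from a policy LIST rule), then
   # queue them on the AFM cache fileset
   mmafmctl fs1 prefetch -j cachefileset --list-file /tmp/prefetch.list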


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Migrating billions of files?

2019-03-06 Thread Andrew Beattie
Yes, and it's licensed based on the size of your network pipe.
 
Andrew Beattie
File and Object Storage Technical Specialist - A/NZ
IBM Systems - Storage
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com
 
 
- Original message -
From: "Frederick Stock" 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Migrating billions of files?
Date: Wed, Mar 6, 2019 9:20 PM

Does Aspera require a license?

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com

[...]

Re: [gpfsug-discuss] Migrating billions of files?

2019-03-06 Thread Frederick Stock
Does Aspera require a license? 
Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com

- Original message -
From: "Yaron Daniel" 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
Subject: Re: [gpfsug-discuss] Migrating billions of files?
Date: Wed, Mar 6, 2019 4:18 AM

Hi

U can also use today Aspera - which will replicate gpfs extended attr.

Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing
Files Globally
http://www.redbooks.ibm.com/redpieces/abstracts/redp5527.html?Open

Regards

Yaron Daniel

[...]

[gpfsug-discuss] SLURM scripts/policy for data movement into a flash pool?

2019-03-06 Thread Jake Carroll
Hi Scale-folk.

I have an IBM ESS GH14S building block currently configured for my HPC 
workloads.

I've got about 1PB of /scratch filesystem configured in mechanical spindles via 
GNR and about 20TB of SSD/flash sitting in another GNR filesystem at the 
moment. My intention is to destroy that stand-alone flash filesystem eventually 
and use storage pools coupled with GPFS policy to warm up workloads into that 
flash storage:

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_storagepool.htm

A little dated, but that kind of thing.

Does anyone have any experience in this space in using flash storage inside a 
pool with pre/post flight SLURM scripts to puppeteer GPFS policy to warm data 
up?

I had a few ideas for policy construction around file size, file count, file 
access intensity. Someone mentioned heat map construction and mmdiag --iohist 
to me the other day. Could use some background there.

If anyone has any SLURM specific integration tips for the scheduler or pre/post 
flight bits for SBATCH, it'd be really very much appreciated.

This array really does fly along and surpassed my expectations - but, I want to 
get the most out of it that I can for my users - and I think storage pool 
automation and good file placement management is going to be an important part 
of that.

Thank you.

-jc
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Migrating billions of files?

2019-03-06 Thread Chris Schlipalius
Hi Bob, so Simon has hit the nail on the head.

So it’s a challenge. We used dcp with multiple parallel threads per NSD with 
mmdsh - 2PB and millions of files. It’s worth a test as it does look after 
xattribs, but test it.
See https://github.com/hpc/dcp
Test the preserve:
 -p, --preserve

Preserve the original files' owner, group, permissions (including the setuid 
and setgid bits), time of last modification and time of last access. In case 
duplication of owner or group fails, the setuid and setgid bits are cleared.
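
A hedged example of the sort of invocation involved (rank count, host file
and paths are placeholders; recent dcp builds are MPI-launched):

   mpirun -np 64 --hostfile nsd_nodes dcp -p /gpfs/oldfs/data /gpfs/newfs/data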
---

We migrated between 12K storage FS a few years back.

My colleague also has tested 
https://www.nersc.gov/users/storage-and-file-systems/transferring-data/bbcp/  
or http://www.slac.stanford.edu/~abh/bbcp/ 

I hear it’s excellent with xattribs and recursive small-file copies.

I steer clear of rsync; different versions do not preserve xattribs, and this is 
a bit of an issue some have found.

Regards,
Chris Schlipalius
 
Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey 
Supercomputing Centre (CSIRO)
13 Burvill Court
Kensington  WA  6151
Australia



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Memory accounting for processes writing to GPFS

2019-03-06 Thread Dorigo Alvise (PSI)
Hello to everyone,
Here at PSI we're observing something that in principle seems strange (at least 
to me).
We run a Java application writing to disk by means of a standard 
AsynchronousFileChannel, whose details I do not know.
There are two instances of this application: one runs on a node writing on a 
local drive, the other one runs writing on a GPFS mounted filesystem (this node 
is part of the cluster, no remote-mounting).

What we do see is that in the former the application has a lower sum VIRT+RES 
memory and the OS shows a really big cache usage; in the latter, OS's cache is 
negligible while VIRT+RES is very (even too) high (with VIRT very high).

So I wonder what the difference is... Writing into a GPFS mounted filesystem, 
as far as I understand, implies "talking" to the local mmfsd daemon, which fills 
up its own pagepool... and then the system will asynchronously handle these 
pages to be written to the real pdisks. But why does the Linux kernel account so 
much memory to the process itself? And why is this large amount of memory much 
more VIRT than RES?

thanks in advance,

   Alvise
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Migrating billions of files?

2019-03-06 Thread Yaron Daniel
Hi

You can also use Aspera today - it will replicate GPFS extended attributes.

Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and 
Sharing Files Globally
http://www.redbooks.ibm.com/redpieces/abstracts/redp5527.html?Open


 
Regards
 


 
 
Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672   Fax: +972-3-916-5672   Mobile: +972-52-8395593
e-mail: y...@il.ibm.com



From:   Simon Thompson 
To: gpfsug main discussion list 
Date:   03/06/2019 11:08 AM
Subject:Re: [gpfsug-discuss] Migrating billions of files?
Sent by:gpfsug-discuss-boun...@spectrumscale.org



AFM doesn’t work well if you have dependent filesets though .. which we 
did for quota purposes.
 
Simon
 
From:  on behalf of 
"y...@il.ibm.com" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 

Date: Wednesday, 6 March 2019 at 09:01
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] Migrating billions of files?
 
Hi

What permissions you have ? Do u have only Posix , or also SMB attributes 
?

If only posix attributes you can do the following:

- rsync (which will work on different filesets/directories in parallel.
- AFM (but in case you need rollback - it will be problematic) 

 
Regards
 



 
 
Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672   Fax: +972-3-916-5672   Mobile: +972-52-8395593
e-mail: y...@il.ibm.com



From:"Oesterlin, Robert" 
To:gpfsug main discussion list 
Date:03/05/2019 11:57 PM
Subject:[gpfsug-discuss] Migrating billions of files?
Sent by:gpfsug-discuss-boun...@spectrumscale.org

 
I’m looking at migrating 3-4 billion files, maybe 3PB of data between GPFS 
clusters. Most of the files are small - 60% 8K or less. Ideally I’d like 
to copy at least 15-20M files per day - ideally 50M.
 
Any thoughts on how achievable this is? Or what to use? Either with AFM, 
mpifileutils, rsync.. other? Many of these files would be in 4k inodes. 
Destination is ESS.
 
 
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss





___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] suggestions for copying one GPFS file system into another

2019-03-06 Thread Yaron Daniel
Hi

You can also use Aspera today - it will replicate GPFS extended attributes.

Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and 
Sharing Files Globally
http://www.redbooks.ibm.com/redpieces/abstracts/redp5527.html?Open

In the past I used arsync - used for SONAS - I think this is now the 

 
Regards
 


 
 
Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672   Fax: +972-3-916-5672   Mobile: +972-52-8395593
e-mail: y...@il.ibm.com



From:   Simon Thompson 
To: gpfsug main discussion list 
Date:   03/05/2019 11:39 PM
Subject:Re: [gpfsug-discuss] suggestions for copying one GPFS file 
system into another
Sent by:gpfsug-discuss-boun...@spectrumscale.org



DDN also have a paid-for product for moving data (DataFlow). We 
found out about it after we did a massive data migration...

I can't comment on it other than being aware of it. I'm sure your local DDN 
sales person can help.

But if only IBM supported some sort of restripe to new block size, we 
wouldn't have to do this mass migration :-P

Simon 

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Simon Thompson 
[s.j.thomp...@bham.ac.uk]
Sent: 05 March 2019 16:38
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] suggestions for copying one GPFS file 
system into another

I wrote a patch to mpifileutils which will copy gpfs attributes, but when 
we played with it with rsync, something was obviously still different 
about the attrs from each, so use with care.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Ratliff, John 
[jdrat...@iu.edu]
Sent: 05 March 2019 16:21
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] suggestions for copying one GPFS file system 
into another

We use a GPFS file system for our computing clusters and we’re working on 
moving to a new SAN.

We originally tried AFM, but it didn’t seem to work very well. We tried to 
do a prefetch on a test policy scan of 100 million files, and after 24 
hours it hadn’t pre-fetched anything. It wasn’t clear what was happening. 
Some smaller tests succeeded, but the NFSv4 ACLs did not seem to be 
transferred.

Since then we started using rsync with the GPFS attrs patch. We have over 
600 million files and 700 TB. I split up the rsync tasks with lists of 
files generated by the policy engine and we transferred the original data 
in about 2 weeks. Now we’re working on final synchronization. I’d like to 
use one of the delete options to remove files that were sync’d earlier and 
then deleted. This can’t be combined with the files-from option, so it’s 
harder to break up the rsync tasks. Some of the directories I’m running 
this against have 30-150 million files each. This can take quite some time 
with a single rsync process.
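
A hedged sketch of that split-and-parallelize approach (list names, the
degree of parallelism and the paths are placeholders):

   # one rsync per policy-generated file list, up to 8 at a time
   ls /root/lists/filelist.* | xargs -P8 -I{} \
     rsync -aH --numeric-ids --files-from={} /gpfs/oldfs/ /gpfs/newfs/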

I’m also wondering if any of my rsync options are unnecessary. I was using 
avHAXS and numeric-ids. I’m thinking the A (acls) and X (xattrs) might be 
unnecessary with GPFS->GPFS. We’re only using NFSv4 GPFS ACLs. I don’t 
know if GPFS uses any xattrs that rsync would sync or not. Removing those 
two options removed several system calls, which should make it much 
faster, but I want to make sure I’m syncing correctly. Also, it seems 
there is a problem with the GPFS patch on rsync where it will always give 
an error trying to get GPFS attributes on a symlink, which means it 
doesn’t sync any symlinks when using that option. So you can rsync 
symlinks or GPFS attrs, but not both at the same time. This has led to me 
running two rsyncs, one to get all files and one to get all attributes.

Thanks for any ideas or suggestions.

John Ratliff | Pervasive Technology Institute | UITS | Research Storage - 
Indiana University | http://pti.iu.edu


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss







Re: [gpfsug-discuss] Migrating billions of files?

2019-03-06 Thread Simon Thompson
AFM doesn’t work well if you have dependent filesets though .. which we did for 
quota purposes.

Simon

From:  on behalf of "y...@il.ibm.com" 

Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Wednesday, 6 March 2019 at 09:01
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] Migrating billions of files?

Hi

What permissions you have ? Do u have only Posix , or also SMB attributes ?

If only posix attributes you can do the following:

- rsync (which will work on different filesets/directories in parallel.
- AFM (but in case you need rollback - it will be problematic)


Regards







Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672   Fax: +972-3-916-5672   Mobile: +972-52-8395593
e-mail: y...@il.ibm.com



From:"Oesterlin, Robert" 
To:gpfsug main discussion list 
Date:03/05/2019 11:57 PM
Subject:[gpfsug-discuss] Migrating billions of files?
Sent by:gpfsug-discuss-boun...@spectrumscale.org



I’m looking at migrating 3-4 billion files, maybe 3PB of data between GPFS 
clusters. Most of the files are small - 60% 8K or less. Ideally I’d like to 
copy at least 15-20M files per day - ideally 50M.



Any thoughts on how achievable this is? Or what to use? Either with AFM, 
mpifileutils, rsync.. other? Many of these files would be in 4k inodes. 
Destination is ESS.





Bob Oesterlin

Sr Principal Storage Engineer, Nuance

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Migrating billions of files?

2019-03-06 Thread Yaron Daniel
Hi

What permissions do you have? Do you have only POSIX, or also SMB attributes?

If you have only POSIX attributes you can do the following:

- rsync (which will work on different filesets/directories in parallel)
- AFM (but in case you need rollback - it will be problematic) 

 
Regards
 


 
 
Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672   Fax: +972-3-916-5672   Mobile: +972-52-8395593
e-mail: y...@il.ibm.com



From:   "Oesterlin, Robert" 
To: gpfsug main discussion list 
Date:   03/05/2019 11:57 PM
Subject:[gpfsug-discuss] Migrating billions of files?
Sent by:gpfsug-discuss-boun...@spectrumscale.org



I’m looking at migrating 3-4 billion files, maybe 3PB of data between GPFS 
clusters. Most of the files are small - 60% 8K or less. Ideally I’d like 
to copy at least 15-20M files per day - ideally 50M.
 
Any thoughts on how achievable this is? Or what to use? Either with AFM, 
mpifileutils, rsync.. other? Many of these files would be in 4k inodes. 
Destination is ESS.
 
 
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss