Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Chris Schlipalius
So at my last job we used to rsync data between Isilons across campus, and 
from Isilon to a Windows file cluster (and back).

I recommend using a dry run to generate a list of files and then using this 
list to drive rsync.

This also allows you to break the transfer up into batches, and to check 
whether files have changed before syncing (say, if your Isilon files are not 
read-only).

Also ensure you have a recent version of rsync that preserves extended 
attributes, and check your ACLs.

 

A dry run example:

https://unix.stackexchange.com/a/261372
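
As a rough sketch, assuming rsync 3.x on both ends (so -A/-X preserve ACLs
and extended attributes) and hypothetical mount points:

    # Dry run: print the relative names of files that would be transferred
    rsync -aAXn --out-format='%n' /mnt/isilon/data/ /gpfs/data/ > /tmp/filelist.txt

    # Split the list into batches and feed each batch back to rsync
    split -l 10000 /tmp/filelist.txt /tmp/batch.
    for b in /tmp/batch.*; do
        rsync -aAX --files-from="$b" /mnt/isilon/data/ /gpfs/data/
    done

Note that --files-from does not recurse into listed directories unless -r is
given explicitly, which is fine here because the dry run lists files
individually.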

 

I always felt more comfortable having a list of files before a sync….


Regards,

Chris Schlipalius

 

Team Lead, Data Storage Infrastructure, Supercomputing Platforms, Pawsey 
Supercomputing Centre (CSIRO)

1 Bryce Avenue

Kensington  WA  6151

Australia

 

Tel  +61 8 6436 8815 

Email  chris.schlipal...@pawsey.org.au

Web  www.pawsey.org.au



Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Andi Christiansen
Hi Jonathan,

Yes, you are correct! But we plan to resync this once or twice every week for 
the next 3-4 months to be sure everything is as it should be.

Right now we are focused on getting them synced up, and then we will run 
scheduled resyncs/checks once or twice a week depending on the data growth :)
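
A periodic check pass could be as simple as this sketch (paths are made up;
-c forces full checksum comparison, which is thorough but slow at this scale):

    # List anything that differs between source and target, copying nothing
    rsync -aAXnc --out-format='%n' /mnt/isilon/data/ /gpfs/data/ > /tmp/drift.txt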

Thanks
Andi Christiansen

> On 11/17/2020 2:53 PM Jonathan Buzzard  wrote:
> 
>  
> On 17/11/2020 11:51, Andi Christiansen wrote:
> > Hi all,
> > 
> > thanks for all the information, there were some interesting things
> > among it..
> > 
> > I kept on going with rsync and ended up making a file with all top
> > level user directories and splitting them into chunks of 347 per
> > rsync session (42,000-ish folders in total). Yesterday we had only 14
> > sessions with 3,000 folders in each, and that was too much work for one
> > rsync session..
> 
> Unless you use something similar to my DB suggestion, it is almost 
> inevitable that some of those rsync sessions are going to have issues, 
> and you will have no way to track it or even know it has happened unless 
> you do a single final giant catch-up/check rsync.
> 
> I should add that a copy of the SQLite DB is cover-your-backside 
> protection for when a user pops up claiming that you failed to transfer one 
> of their vitally important files six months down the line and the old 
> system has been turned off and scrapped.
> 
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Jonathan Buzzard

On 17/11/2020 15:55, Simon Thompson wrote:



>> Fortunately, we seem committed to GPFS so it might be we never have to do
>> another bulk transfer outside of the filesystem...

> Until you want to move a v3 or v4 created file-system to v5 block sizes :-)


You forget the v2-to-v3 switch for more than two billion files. Either 
that or you were not using it back then. Then there was v3.2 if you 
ever wanted to mount it on Windows.




> I hope we won't be doing that sort of thing again...



Yep, going to be recycling my scripts in the coming week for a v4-to-v5 
migration with a capacity upgrade on our DSS-G. That basically involves 
trashing the file system and restoring from backup.


Going to be using the "your data will be restored based on a metric of 
how many files and how much data you have" ploy again :-)


I too hope that will be the last time I have to do anything similar, but 
my experience of the last couple of decades says that is likely to be a 
forlorn hope :-(


I speculate that one day the 10,000-fileset limit will be lifted, but 
only if you reformat your file system...


JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Simon Thompson


>Fortunately, we seem committed to GPFS so it might be we never have to do
>another bulk transfer outside of the filesystem...

Until you want to move a v3 or v4 created file-system to v5 block sizes :-)

I hope we won't be doing that sort of thing again...

Simon



Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Skylar Thompson
On Tue, Nov 17, 2020 at 01:53:43PM +, Jonathan Buzzard wrote:
> On 17/11/2020 11:51, Andi Christiansen wrote:
> > Hi all,
> > 
> > thanks for all the information, there were some interesting things
> > among it..
> > 
> > I kept on going with rsync and ended up making a file with all top
> > level user directories and splitting them into chunks of 347 per
> > rsync session (42,000-ish folders in total). Yesterday we had only 14
> > sessions with 3,000 folders in each, and that was too much work for one
> > rsync session..
> 
> Unless you use something similar to my DB suggestion, it is almost inevitable
> that some of those rsync sessions are going to have issues, and you will have
> no way to track it or even know it has happened unless you do a single final
> giant catch-up/check rsync.
> 
> I should add that a copy of the SQLite DB is cover-your-backside protection
> for when a user pops up claiming that you failed to transfer one of their
> vitally important files six months down the line and the old system has been
> turned off and scrapped.

That's not a bad idea, and I like it more than the method I set up, where we
captured the output of find from both sides of the transfer and preserved it
for posterity; that obviously did require a hard-stop date on the source.

Fortunately, we seem committed to GPFS so it might be we never have to do
another bulk transfer outside of the filesystem...

-- 
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department (UW Medicine), System Administrator
-- Foege Building S046, (206)-685-7354
-- Pronouns: He/Him/His


Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Jonathan Buzzard

On 17/11/2020 11:51, Andi Christiansen wrote:

> Hi all,
> 
> thanks for all the information, there were some interesting things
> among it..
> 
> I kept on going with rsync and ended up making a file with all top
> level user directories and splitting them into chunks of 347 per
> rsync session (42,000-ish folders in total). Yesterday we had only 14
> sessions with 3,000 folders in each, and that was too much work for one
> rsync session..


Unless you use something similar to my DB suggestion, it is almost 
inevitable that some of those rsync sessions are going to have issues, 
and you will have no way to track it or even know it has happened unless 
you do a single final giant catch-up/check rsync.

I should add that a copy of the SQLite DB is cover-your-backside 
protection for when a user pops up claiming that you failed to transfer one 
of their vitally important files six months down the line and the old 
system has been turned off and scrapped.
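
A rough sketch of what that tracking could look like (the table layout and
paths are made up; the batch files come from splitting the transfer list):

    # Record the exit status of every rsync batch in a SQLite DB
    sqlite3 sync.db 'CREATE TABLE IF NOT EXISTS batches
                     (batch TEXT PRIMARY KEY, status INTEGER, finished TEXT);'
    for b in /tmp/batch.*; do
        rsync -aAX --files-from="$b" /mnt/isilon/data/ /gpfs/data/
        sqlite3 sync.db "INSERT OR REPLACE INTO batches
                         VALUES ('$b', $?, datetime('now'));"
    done

    # Re-run anything that did not exit cleanly; keep the DB for posterity
    sqlite3 sync.db 'SELECT batch FROM batches WHERE status != 0;'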



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Andi Christiansen
Hi Jan,

We are syncing ACLs, groups, owners, and timestamps as well :)

/Andi Christiansen

> On 11/17/2020 1:07 PM Jan-Frode Myklebust  wrote:
> 
> 
> Nice to see it working well!
> 
> But what about ACLs? Does your rsync pull in all needed metadata, or do 
> you also need to sync ACLs? Any plans for how to solve that?
> 
> On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen 
>  wrote:
> 
> > Hi all,
> > 
> > thanks for all the information, there were some interesting things 
> > among it..
> > 
> > I kept on going with rsync and ended up making a file with all top 
> > level user directories and splitting them into chunks of 347 per rsync 
> > session (42,000-ish folders in total). Yesterday we had only 14 sessions with 
> > 3,000 folders in each, and that was too much work for one rsync session..
> > 
> > I divided them out among all GPFS nodes so each fetches an area, 
> > actually doing that 3 times on each node, and that has now boosted 
> > the bandwidth usage from 3Gbit to around 16Gbit in total..
> > 
> > All nodes have been seen doing work above 7Gbit individually, which 
> > is actually near what I was expecting without any modifications to the 
> > NFS server or TCP tuning..
> > 
> > CPU is around 30-50% on each server, and mostly below or around 30%, 
> > so it seems like it could have handled a bit more sessions..
> > 
> > Small files are really a killer, but with all 96+ sessions we have 
> > now it's not often that all sessions are handling small files at the same 
> > time, so we have an average of about 10-12Gbit bandwidth usage.
> > 
> > Thanks, all! I'll keep you in mind if for some reason we see it 
> > slowing down again, but for now I think we will try to see if it will go the 
> > last mile with a bit more sessions on each :)
> > 
> > Best Regards
> > Andi Christiansen
> > 
> > > On 11/17/2020 9:57 AM Uwe Falke  wrote:
> > >
> > > 
> > > Hi, Andi, sorry, I took your 20Gbit as a sign of 2x10Gbps bonds, but
> > > it is over two nodes, so no bonding. But still, I'd expect that opening
> > > several TCP connections in parallel per source-target pair (like with
> > > several rsyncs per source node) would bear an advantage (and I still
> > > think NFS doesn't do that, but I could be wrong).
> > > If more nodes have access to the Isilon data they could also participate
> > > (and don't need NFS exports for that).
> > >
> > > Mit freundlichen Grüßen / Kind regards
> > >
> > > Dr. Uwe Falke
> > > IT Specialist
> > > Hybrid Cloud Infrastructure / Technology Consulting & 
> > Implementation
> > > Services
> > > +49 175 575 2877 Mobile
> > > Rathausstr. 7, 09111 Chemnitz, Germany
> > > uwefa...@de.ibm.com
> > >
> > > IBM Services
> > >
> > > IBM Data Privacy Statement
> > >
> > > IBM Deutschland Business & Technology Services GmbH
> > > Geschäftsführung: Sven Schooss, Stefan Hierl
> > > Sitz der Gesellschaft: Ehningen
> > > Registergericht: Amtsgericht Stuttgart, HRB 17122
> > >
> > >
> > >
> > > From:   Uwe Falke/Germany/IBM
> > > To: gpfsug main discussion list gpfsug-discuss@spectrumscale.org
> > > Date:   17/11/2020 09:50
> > > Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data
> > > from Isilon to Scale over NFS?
> > >
> > >
> > > Hi Andi,
> > >
> > > what about leaving NFS completely out and using rsync (multiple rsyncs
> > > in parallel, of course) directly between your source and target servers?
> > > I am not sure how many TCP connections (suppose it is NFS4) are opened
> > > in parallel between client and server; using a 2x bonded interface well
> > > requires at least two. That combined with the DB approach suggested by
> > > Jonathan to control the activity of the rsync streams would be my best
> > > guess.
> > > If you have many small files, the overhead might still kill you. Tarring
> > > them up into larger aggregates for transfer would help a lot, but then you
> > > must be sure they won't change, or you need to implement your own version
> > > control for that class of files.
> > >
> > > Mit freundlichen Grüßen / Kind regards
> > >
> > > Dr. Uwe Falke
> > > IT Specialist
> > > Hybrid Cloud Infrastructure / Technology Consulting & 
> > Implementation
> > > Services
> > > +49 175 575 2877 Mobile
> > > Rathausstr. 7, 09111 Chemnitz, 

Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Andi Christiansen
Hi all,

thanks for all the information, there were some interesting things among it..

I kept on going with rsync and ended up making a file with all top level user 
directories and splitting them into chunks of 347 per rsync session 
(42,000-ish folders in total). Yesterday we had only 14 sessions with 3,000 
folders in each, and that was too much work for one rsync session..
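
For reference, that chunking can be scripted along these lines (a sketch with
hypothetical paths; toplevels.txt holds the top-level directory names, one per
line, relative to the NFS mount):

    # 42,000-ish directories / 347 per chunk gives roughly 121 batch files
    split -l 347 toplevels.txt /tmp/chunk.

    # Run 3 rsync sessions at a time on this node, one chunk each
    # (-r is needed explicitly because --files-from disables implied recursion)
    ls /tmp/chunk.* | xargs -P 3 -I{} \
        rsync -aAXr --files-from={} /mnt/isilon/home/ /gpfs/home/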

I divided them out among all GPFS nodes so each fetches an area, actually 
doing that 3 times on each node, and that has now boosted the bandwidth 
usage from 3Gbit to around 16Gbit in total..

All nodes have been seen doing work above 7Gbit individually, which is 
actually near what I was expecting without any modifications to the NFS 
server or TCP tuning..

CPU is around 30-50% on each server, and mostly below or around 30%, so it 
seems like it could have handled a bit more sessions..

Small files are really a killer, but with all 96+ sessions we have now it's 
not often that all sessions are handling small files at the same time, so we 
have an average of about 10-12Gbit bandwidth usage.

Thanks, all! I'll keep you in mind if for some reason we see it slowing down 
again, but for now I think we will try to see if it will go the last mile 
with a bit more sessions on each :)

Best Regards
Andi Christiansen

> On 11/17/2020 9:57 AM Uwe Falke  wrote:
> 
>  
> Hi, Andi, sorry, I took your 20Gbit as a sign of 2x10Gbps bonds, but 
> it is over two nodes, so no bonding. But still, I'd expect that opening several 
> TCP connections in parallel per source-target pair (like with several 
> rsyncs per source node) would bear an advantage (and I still think NFS 
> doesn't do that, but I could be wrong). 
> If more nodes have access to the Isilon data they could also participate 
> (and don't need NFS exports for that).
> 
> Mit freundlichen Grüßen / Kind regards
> 
> Dr. Uwe Falke
> IT Specialist
> Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
> Services
> +49 175 575 2877 Mobile
> Rathausstr. 7, 09111 Chemnitz, Germany
> uwefa...@de.ibm.com
> 
> IBM Services
> 
> IBM Data Privacy Statement
> 
> IBM Deutschland Business & Technology Services GmbH
> Geschäftsführung: Sven Schooss, Stefan Hierl
> Sitz der Gesellschaft: Ehningen
> Registergericht: Amtsgericht Stuttgart, HRB 17122
> 
> 
> 
> From:   Uwe Falke/Germany/IBM
> To: gpfsug main discussion list 
> Date:   17/11/2020 09:50
> Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data 
> from Isilon to Scale over NFS?
> 
> 
> Hi Andi, 
> 
> what about leaving NFS completely out and using rsync (multiple rsyncs 
> in parallel, of course) directly between your source and target servers? 
> I am not sure how many TCP connections (suppose it is NFS4) are opened 
> in parallel between client and server; using a 2x bonded interface well 
> requires at least two. That combined with the DB approach suggested by 
> Jonathan to control the activity of the rsync streams would be my best 
> guess.
> If you have many small files, the overhead might still kill you. Tarring 
> them up into larger aggregates for transfer would help a lot, but then you 
> must be sure they won't change, or you need to implement your own version 
> control for that class of files.
> 
> Mit freundlichen Grüßen / Kind regards
> 
> Dr. Uwe Falke
> IT Specialist
> Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
> Services
> +49 175 575 2877 Mobile
> Rathausstr. 7, 09111 Chemnitz, Germany
> uwefa...@de.ibm.com
> 
> IBM Services
> 
> IBM Data Privacy Statement
> 
> IBM Deutschland Business & Technology Services GmbH
> Geschäftsführung: Sven Schooss, Stefan Hierl
> Sitz der Gesellschaft: Ehningen
> Registergericht: Amtsgericht Stuttgart, HRB 17122
> 
> 
> 
> 
> From:   Andi Christiansen 
> To: "gpfsug-discuss@spectrumscale.org" 
> 
> Date:   16/11/2020 20:44
> Subject: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data from 
> Isilon to Scale over NFS?
> Sent by:gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> Hi all, 
> 
> I have got a case where a customer wants 700TB migrated from Isilon to 
> Scale, and the only way for him is exporting the same directory over NFS from 
> two different nodes... 
> 
> As of now we are using multiple rsync processes on different parts of 
> folders within the main directory. This is really slow and will take 
> forever.. Right now there are 14 rsync processes spread across 3 nodes 
> fetching from 2.. 
> 
> Does anyone know of a way to speed it up? Right now we see from 1Gbit to 
> 3Gbit if we are lucky (total bandwidth), and there is a total of 30Gbit from 
> the Scale nodes and 20Gbit from Isilon, so we should be able to reach just 
> under 20Gbit... 
> 
> 
> If anyone has any ideas, they are welcome! 
> 
> 
> Thanks in advance 
> Andi Christiansen

Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Uwe Falke
Hi, Andi, sorry, I took your 20Gbit as a sign of 2x10Gbps bonds, but 
it is over two nodes, so no bonding. But still, I'd expect that opening several 
TCP connections in parallel per source-target pair (like with several 
rsyncs per source node) would bear an advantage (and I still think NFS 
doesn't do that, but I could be wrong). 
If more nodes have access to the Isilon data they could also participate 
(and don't need NFS exports for that).

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   Uwe Falke/Germany/IBM
To: gpfsug main discussion list 
Date:   17/11/2020 09:50
Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data 
from Isilon to Scale over NFS?


Hi Andi, 

what about leaving NFS completely out and using rsync (multiple rsyncs 
in parallel, of course) directly between your source and target servers? 
I am not sure how many TCP connections (suppose it is NFS4) are opened 
in parallel between client and server; using a 2x bonded interface well 
requires at least two. That combined with the DB approach suggested by 
Jonathan to control the activity of the rsync streams would be my best 
guess.
If you have many small files, the overhead might still kill you. Tarring 
them up into larger aggregates for transfer would help a lot, but then you 
must be sure they won't change, or you need to implement your own version 
control for that class of files.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122




From:   Andi Christiansen 
To: "gpfsug-discuss@spectrumscale.org" 

Date:   16/11/2020 20:44
Subject: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data from 
Isilon to Scale over NFS?
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi all, 

I have got a case where a customer wants 700TB migrated from Isilon to 
Scale, and the only way for him is exporting the same directory over NFS from 
two different nodes... 

As of now we are using multiple rsync processes on different parts of 
folders within the main directory. This is really slow and will take 
forever.. Right now there are 14 rsync processes spread across 3 nodes 
fetching from 2.. 

Does anyone know of a way to speed it up? Right now we see from 1Gbit to 
3Gbit if we are lucky (total bandwidth), and there is a total of 30Gbit from 
the Scale nodes and 20Gbit from Isilon, so we should be able to reach just 
under 20Gbit... 


If anyone has any ideas, they are welcome! 


Thanks in advance 
Andi Christiansen








Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

2020-11-17 Thread Uwe Falke
Hi Andi, 

what about leaving NFS completely out and using rsync (multiple rsyncs 
in parallel, of course) directly between your source and target servers? 
I am not sure how many TCP connections (suppose it is NFS4) are opened 
in parallel between client and server; using a 2x bonded interface well 
requires at least two. That combined with the DB approach suggested by 
Jonathan to control the activity of the rsync streams would be my best 
guess.
If you have many small files, the overhead might still kill you. Tarring 
them up into larger aggregates for transfer would help a lot, but then you 
must be sure they won't change, or you need to implement your own version 
control for that class of files.
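
For instance, something along these lines (host and paths are hypothetical;
with GNU tar, add --acls and --xattrs if that metadata must survive the trip):

    # Stream one user's small-file tree as a single tar pipe over ssh
    tar -C /ifs/home/user1 --acls --xattrs -cf - . \
        | ssh scalenode 'tar -C /gpfs/home/user1 --acls --xattrs -xpf -'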

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   Andi Christiansen 
To: "gpfsug-discuss@spectrumscale.org" 

Date:   16/11/2020 20:44
Subject: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data from 
Isilon to Scale over NFS?
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi all, 

I have got a case where a customer wants 700TB migrated from Isilon to 
Scale, and the only way for him is exporting the same directory over NFS from 
two different nodes... 

As of now we are using multiple rsync processes on different parts of 
folders within the main directory. This is really slow and will take 
forever.. Right now there are 14 rsync processes spread across 3 nodes 
fetching from 2.. 

Does anyone know of a way to speed it up? Right now we see from 1Gbit to 
3Gbit if we are lucky (total bandwidth), and there is a total of 30Gbit from 
the Scale nodes and 20Gbit from Isilon, so we should be able to reach just 
under 20Gbit... 


If anyone has any ideas, they are welcome! 


Thanks in advance 
Andi Christiansen




