Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
So at my last job we used to rsync data between Isilons across campus, and from Isilon to a Windows file cluster (and back). I recommend using a dry run to generate a list of files and then using that list to drive rsync. This also allows you to break the transfer up into batches, and to check whether files have changed before syncing (say, if your Isilon files are not read-only). Also ensure you have a recent version of rsync that preserves extended attributes, and check your ACLs.

A dry-run example: https://unix.stackexchange.com/a/261372

I always felt more comfortable having a list of files before a sync...

Regards,

Chris Schlipalius
Team Lead, Data Storage Infrastructure, Supercomputing Platforms,
Pawsey Supercomputing Centre (CSIRO)
1 Bryce Avenue
Kensington WA 6151
Australia
Tel +61 8 6436 8815
Email chris.schlipal...@pawsey.org.au
Web www.pawsey.org.au
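A minimal sketch of the dry-run-then-batch workflow described above (the NFS mount point, target path and batch size are hypothetical, not from the thread):

    # Dry run against the NFS-mounted source: list what would transfer,
    # one relative path per line (paths here are made up).
    rsync -an --out-format='%n' /mnt/isilon/export/ /gpfs/fs0/target/ > filelist.txt

    # Split the list into batches and drive one rsync per batch; each
    # batch can be re-run and verified independently.
    split -l 5000 filelist.txt batch.
    for b in batch.*; do
        rsync -a --files-from="$b" /mnt/isilon/export/ /gpfs/fs0/target/
    done

Having filelist.txt on disk before anything moves is exactly the "list of files before a sync" comfort factor mentioned above.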
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
Hi Jonathan,

Yes, you are correct! But we plan to resync this once or twice every week for the next 3-4 months, to be sure everything is as it should be. Right now we are focused on getting them synced up, and then we will run scheduled resyncs/checks once or twice a week depending on the data growth :)

Thanks
Andi Christiansen

> On 11/17/2020 2:53 PM Jonathan Buzzard wrote:
>
> Unless you use something similar to my DB suggestion it is almost
> inevitable that some of those rsync sessions are going to have issues,
> and you will have no way to track that, or even know it has happened,
> unless you do a single final giant catch-up/check rsync.
>
> I should add that a copy of the SQLite DB is cover-your-backside
> protection for when a user pops up claiming that you failed to transfer
> one of their vitally important files six months down the line and the
> old system has been turned off and scrapped.
>
> JAB.
> [...]
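For the scheduled part, a plain cron entry would do; a sketch only (schedule, paths and log location are illustrative, and --delete is appropriate only if deletions on the Isilon side should propagate to Scale):

    # Hypothetical crontab line: catch-up pass every Sunday at 02:00.
    # Add -c for an occasional checksum-verification run, at the cost
    # of reading every byte on both sides.
    0 2 * * 0  rsync -aAX --numeric-ids --delete /mnt/isilon/export/ /gpfs/fs0/target/ >> /var/log/isilon-resync.log 2>&1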
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
On 17/11/2020 15:55, Simon Thompson wrote:
>> Fortunately, we seem committed to GPFS so it might be we never have to
>> do another bulk transfer outside of the filesystem...
>
> Until you want to move a v3- or v4-created file-system to v5 block sizes.

You forget the v2-to-v3 switch for more than two billion files. Either that or you were not using it back then. Then there was the v3.2 one, if you ever wanted to mount it on Windows.

> I hope we won't be doing that sort of thing again...

Yep, going to be recycling my scripts in the coming week for a v4-to-v5 move with a capacity upgrade on our DSS-G. That basically involves trashing the file system and restoring from backup. Going to be doing the "your data will be restored based on a metric of how many files and how much data you have" ploy again :-)

I too hope that will be the last time I have to do anything similar, but my experience of the last couple of decades says that is likely to be a forlorn hope :-(

I speculate that one day the 10,000-fileset limit will be lifted, but only if you reformat your file system...

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
> Fortunately, we seem committed to GPFS so it might be we never have to
> do another bulk transfer outside of the filesystem...

Until you want to move a v3- or v4-created file-system to v5 block sizes.

I hope we won't be doing that sort of thing again...

Simon
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
On Tue, Nov 17, 2020 at 01:53:43PM +0000, Jonathan Buzzard wrote:
> Unless you use something similar to my DB suggestion it is almost
> inevitable that some of those rsync sessions are going to have issues,
> and you will have no way to track that, or even know it has happened,
> unless you do a single final giant catch-up/check rsync.
>
> I should add that a copy of the SQLite DB is cover-your-backside
> protection for when a user pops up claiming that you failed to transfer
> one of their vitally important files six months down the line and the
> old system has been turned off and scrapped.

That's not a bad idea, and I like it more than the method I set up, where we captured the output of find from both sides of the transfer and preserved it for posterity, though that obviously did require a hard-stop date on the source.

Fortunately, we seem committed to GPFS, so it might be we never have to do another bulk transfer outside of the filesystem...

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department (UW Medicine), System Administrator
-- Foege Building S046, (206)-685-7354
-- Pronouns: He/Him/His
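The two-sided find capture mentioned above can be as small as this sketch (paths are hypothetical; add an mtime field if timestamps must match too):

    # Manifest of relative path + size from each side, then compare.
    find /mnt/isilon/export -type f -printf '%P\t%s\n' | sort > source.manifest
    find /gpfs/fs0/target   -type f -printf '%P\t%s\n' | sort > target.manifest
    diff source.manifest target.manifest   # empty output means the trees agree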
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
On 17/11/2020 11:51, Andi Christiansen wrote:
> Hi all,
>
> Thanks for all the information, there were some interesting things
> among it.
>
> I kept on going with rsync and ended up making a file with all top-level
> user directories and splitting them into chunks of 347 per rsync session
> (42000-ish folders in total). Yesterday we had only 14 sessions, with
> 3000 folders each, and that was too much work for one rsync session.

Unless you use something similar to my DB suggestion, it is almost inevitable that some of those rsync sessions are going to have issues, and you will have no way to track that, or even know it has happened, unless you do a single final giant catch-up/check rsync.

I should add that a copy of the SQLite DB is cover-your-backside protection for when a user pops up claiming that you failed to transfer one of their vitally important files six months down the line, after the old system has been turned off and scrapped.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
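One way the SQLite tracking could look, as a rough sketch only (the schema, file names and paths are invented here, not Jonathan's actual scripts; the pipe-separated import breaks on paths containing '|' or quote characters):

    # Record every source file up front; mark files done as their batch
    # completes; anything still at done=0 afterwards needs chasing.
    sqlite3 transfer.db "CREATE TABLE IF NOT EXISTS files
        (path TEXT PRIMARY KEY, size INTEGER, done INTEGER DEFAULT 0);"

    find /mnt/isilon/export -type f -printf '%P|%s|0\n' > all-files.psv
    sqlite3 transfer.db ".import all-files.psv files"

    # After batch.0001 finishes cleanly, mark its files as transferred:
    while IFS= read -r f; do
        sqlite3 transfer.db "UPDATE files SET done=1 WHERE path='$f';"
    done < batch.0001

    # The cover-your-backside query:
    sqlite3 transfer.db "SELECT path FROM files WHERE done=0;"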
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
Hi Jan,

We are syncing ACLs, groups, owners and timestamps as well :)

/Andi Christiansen

> On 11/17/2020 1:07 PM Jan-Frode Myklebust wrote:
>
> Nice to see it working well!
>
> But what about ACLs? Does your rsync pull in all needed metadata, or do
> you also need to sync ACLs? Any plans for how to solve that?
>
> On Tue, Nov 17, 2020 at 12:52 PM Andi Christiansen wrote:
> > Hi all,
> >
> > thanks for all the information [...]
> >
> > i divided them out among all GPFS nodes to have them fetch an area
> > each, and actually doing that 3 times on each node has now boosted
> > the bandwidth usage from 3Gbit to around 16Gbit in total..
> > [...]
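The usual flag set for that metadata is sketched below; it assumes rsync 3.0 or later with ACL/xattr support on both ends, and the paths are hypothetical. Note that -A carries POSIX ACLs, so richer Isilon NFSv4/SMB ACLs may not survive an NFS-mediated copy and are worth spot-checking on the Scale side:

    # -a  recursion + owner, group, permissions, mtimes, symlinks
    # -A  preserve POSIX ACLs (needs ACL support on both sides)
    # -X  preserve extended attributes
    # --numeric-ids  keep raw UID/GID numbers rather than remapping by name
    rsync -aAX --numeric-ids /mnt/isilon/export/ /gpfs/fs0/target/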
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
Hi all,

Thanks for all the information, there were some interesting things among it.

I kept on going with rsync and ended up making a file with all top-level user directories and splitting them into chunks of 347 per rsync session (42000-ish folders in total). Yesterday we had only 14 sessions, with 3000 folders each, and that was too much work for one rsync session.

I divided them out among all GPFS nodes to have each fetch an area, and actually doing that 3 times on each node has now boosted the bandwidth usage from 3Gbit to around 16Gbit in total.

All nodes have been seen doing work above 7Gbit individually, which is actually near to what I was expecting without any modifications to the NFS server or TCP tuning.

CPU is around 30-50% on each server, and mostly below or around 30%, so it seems like it could have handled a bit more sessions.

Small files are really a killer, but with all 96+ sessions we have now it's not often that all sessions are handling small files at the same time, so we have an average of about 10-12Gbit bandwidth usage.

Thanks all! I'll keep you in mind if for some reason we see it slowing down again, but for now I think we will try to see if it will go the last mile with a bit more sessions on each :)

Best Regards
Andi Christiansen

> On 11/17/2020 9:57 AM Uwe Falke wrote:
>
> Hi, Andi, sorry, I just took your 20Gbit as a sign of a 2x10Gbps bond,
> but it is over two nodes, so no bonding. But still, I'd expect opening
> several TCP connections in parallel per source-target pair (like with
> several rsyncs per source node) would bear an advantage (and I still
> think NFS doesn't do that, but I could be wrong).
> If more nodes have access to the Isilon data they could also
> participate (and don't need NFS exports for that).
> [...]
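The split-and-distribute approach described above might look like this sketch (the chunk size of 347 and three-chunks-per-node figure are from the thread; paths and node layout are hypothetical). Note that --files-from disables the recursion implied by -a, so -r must be given explicitly when the list contains directories:

    # Build and split the list of top-level user directories.
    ls /mnt/isilon/export > toplevel.txt          # ~42000 entries
    split -l 347 toplevel.txt chunk.              # chunk.aa, chunk.ab, ...

    # On each GPFS node, pull e.g. three chunks concurrently.
    for c in chunk.aa chunk.ab chunk.ac; do
        rsync -aAXr --numeric-ids --files-from="$c" \
            /mnt/isilon/export/ /gpfs/fs0/target/ &
    done
    wait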
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
Hi, Andi, sorry, I just took your 20Gbit as a sign of a 2x10Gbps bond, but it is over two nodes, so no bonding. But still, I'd expect opening several TCP connections in parallel per source-target pair (like with several rsyncs per source node) would bear an advantage (and I still think NFS doesn't do that, but I could be wrong).

If more nodes have access to the Isilon data they could also participate (and don't need NFS exports for that).

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122

From: Uwe Falke/Germany/IBM
To: gpfsug main discussion list
Date: 17/11/2020 09:50
Subject: Re: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?

[...]
Re: [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
Hi Andi,

What about leaving NFS out completely and using rsync (multiple rsyncs in parallel, of course) directly between your source and target servers? I am not sure how many TCP connections (supposing it is NFSv4) are opened in parallel between client and server; using a 2x bonded interface well requires at least two. That, combined with the DB approach suggested by Jonathan to control the activity of the rsync streams, would be my best guess.

If you have many small files, the overhead might still kill you. Tarring them up into larger aggregates for transfer would help a lot, but then you must be sure they won't change, or you need to implement your own version control for that class of files.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122

From: Andi Christiansen
To: "gpfsug-discuss@spectrumscale.org"
Date: 16/11/2020 20:44
Subject: [EXTERNAL] [gpfsug-discuss] Migrate/synchronize data from Isilon to Scale over NFS?
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hi all,

I have got a case where a customer wants 700TB migrated from Isilon to Scale, and the only way for him is exporting the same directory over NFS from two different nodes...

As of now we are using multiple rsync processes on different parts of folders within the main directory. This is really slow and will take forever... Right now there are 14 rsync processes spread across 3 nodes, fetching from 2.

Does anyone know of a way to speed it up? Right now we see from 1Gbit to 3Gbit if we are lucky (total bandwidth), and there is a total of 30Gbit from the Scale nodes and 20Gbit from the Isilon, so we should be able to reach just under 20Gbit...

If anyone has any ideas, they are welcome!

Thanks in advance
Andi Christiansen
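The tar-aggregation idea above, as a sketch (hostname and paths are hypothetical; it assumes SSH access between the boxes and that the files do not change mid-copy; GNU tar needs --acls and --xattrs on both ends if that metadata must survive):

    # Stream a small-file-heavy subtree as one tar pipe instead of
    # millions of per-file round trips.
    tar -C /mnt/isilon/export -cf - projects/small-files |
        ssh scale01 'tar -C /gpfs/fs0/target -xpf -'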