On Fri, Nov 9, 2012 at 1:47 PM, Andrew Deason <[email protected]> wrote:
>
> > Yes.. I understand that. I was commenting on the slowness as compared
> > to rsyncing over NFS, for example, which takes 5 hours for the entire
> > tree when done from the top level of the tree. That tree contains 15
> > of the directories that I mentioned in my earlier post. So 15 * 24k
> > dirs.. and to answer the question, 232,974 files of small size for the
> > one subdirectory in question.
>
> I'm getting a little mixed up about when you're switching from talking
> about the 'NFS solution' vs the 'AFS solution'. Are you saying it took
> 1.5 hours to transfer 15 subdirs into AFS, where it took NFS 5 hours to
> transfer 24000 subdirs?
>

I am not being clear about the numbers I am talking about, it looks like...
:)

It takes about 5 hours to transfer/rsync the entire tree, with 15 subdirs,
each containing upwards of 200k files and 24k directories, when I am going
from a local volume to an NFSv4 mount. This NFSv4 mount is on the local
network, so there is not a network bottleneck per se. That's the generic
"fastest" case I am working against.

> > > What can possibly make this faster, then, is copying/creating
> > > files/directories in parallel. <snip>
> >
> > Yes, I routinely run 100's of parallel transfers using a combination
> > of tar and rsync.. tar gets the files over in raw form, and rsync mops
> > up behind. The rsync pass is to correct any problems with the tar
> > copy, and is run twice on a fixed list, generated at transfer time. I
> > have found that even when using a tuned rsync process designed to
> > improve transfer speeds, many parallel tar/untar processes from local
> > to NFSv4 followed by a "local" rsync to the same destination works
> > better for new files, when timeliness is important.
>
> So... are you just talking about the NFS transfers, here?
>

Right, however using tar and rsync to manage the transfers.
>
> With AFS, trying to rsync again afterwards is possibly much slower due
> to cache churn if the number of files in question is larger than the
> stat cache (as Jeff said earlier). You also cannot do more than 4
> simultaneous 'things' to the AFS server with current client releases
> (unless you fiddle with PAGs, or use some AFS-specific tools). So 100s
> of parallel transfers aren't really helpful.
>

That's excellent information! :) And thanks to you too, Jeff.

>
> So if you want to write the tooling for it, I think ideally what you
> would want is a cp/rsync-like tool that would copy files/dirs using a
> separate thread for each file/dir up to some configured limit, tracking
> dependencies so parent dirs are created first, possibly launching new
> processes in separate PAGs as you go. Or, you could use utilities or
> APIs that speak to AFS directly without going through the filesystem
> layer (like afscp/afsio), so you wouldn't need separate threads or
> processes.
>

afsio sounds like the perfect fit for this, actually... although I did have
some trouble writing with it. I checked manually with ls along the trees
to make sure all tokens were working, etc.

#:~/scripts# cat foo.sh | afsio write -file /afs/.realmname.com/home/timothy/foo.sh
afsio: No such file or directory (is dir /afs/.realmname.com/home/timothy in AFS?)

An afsio read succeeds from the RO tree:

#:~/scripts# afsio read -file /afs/realmname.com/home/timothy/OpenAFS-ServerDos.pdf > foo
#:~/scripts# md5sum /afs/realmname.com/home/timothy/OpenAFS-ServerDos.pdf
cd688648171672c1a5100aaa9e3186b5  /afs/realmname.com/home/timothy/OpenAFS-ServerDos.pdf
#:~/scripts# md5sum foo
cd688648171672c1a5100aaa9e3186b5  foo

But fails from the RW tree:

#:~/scripts# afsio read -file /afs/.realmname.com/home/timothy/OpenAFS-ServerDos.pdf > foo
afsio: No such file or directory (file not found: /afs/.realmname.com/home/timothy/OpenAFS-ServerDos.pdf)

And of course, I can't seem to write to the read-only tree, unsurprisingly:

#:~/scripts# cat foo.sh| afsio write -file /afs/realmname.com/home/timothy/foo.sh
afsio: server or network not responding (could not create file foo.sh in directory 536870922.1.1)

> Am I making any sense, here? What I describe above is the way to
> transfer lots of little files into AFS more quickly (and possibly other
> network filesystems depending on their consistency guarantees)
> regardless of other schemes like OSD in play. Something like that would
> be useful in general for people that want to copy a large tree or untar
> something into AFS. As far as I know it does not already exist, but I
> could be wrong.

You are making a LOAD of sense, Andrew, and.. as always.. your patience is
sincerely appreciated. I do believe I want to write up a system that will
accomplish this and make it relatively generic.. so once I have it done, I
can send it off for people to review and use :)

If I could have some help with getting afsio working properly, I think that
will be enough to get me going.

Thanks also to everyone else as well.. I am learning more about the
underpinnings of file systems working with AFS than I have in a very long
time!

--
Timothy Balcer / IT Services
Telmate / San Francisco, CA
Direct / (415) 300-4313
Customer Service / (800) 205-5510
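
For reference, here is a rough sketch of the cp-like tool Andrew describes
above: files are copied by a bounded pool of threads, and each directory is
created before any copy into it is queued. This is only an illustration
under stated assumptions, not an existing AFS utility. The name
parallel_copy and its parameters are hypothetical, it uses plain filesystem
calls rather than afsio, it simplifies the "thread per file/dir" idea by
creating directories in the walking thread, and it does not follow
symlinked directories. The default of 4 workers just mirrors the limit of 4
simultaneous requests mentioned above; all threads here run in one process,
so going past that limit would presumably still need separate PAGs.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def parallel_copy(src_root, dst_root, max_workers=4):
    """Copy the tree at src_root into dst_root, copying files concurrently.

    Hypothetical helper: directories are created by the walking thread
    before any copy into them is submitted, so a worker never sees a
    missing parent directory. Returns the number of files copied.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # os.walk is top-down by default, so a directory is always
        # visited (and created at the destination) before its children.
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel = os.path.relpath(dirpath, src_root)
            dst_dir = os.path.normpath(os.path.join(dst_root, rel))
            os.makedirs(dst_dir, exist_ok=True)
            for name in filenames:
                futures.append(pool.submit(
                    shutil.copy2,
                    os.path.join(dirpath, name),
                    os.path.join(dst_dir, name),
                ))
    # Leaving the with-block waits for every worker to finish; calling
    # result() re-raises the first copy error, if there was one.
    for future in futures:
        future.result()
    return len(futures)
```

Something like parallel_copy("/local/tree", "/afs/.../tree", max_workers=4)
would be the intended call, with both paths as placeholders. An rsync pass
afterwards would still be a separate step.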
