On Fri, Nov 9, 2012 at 1:47 PM, Andrew Deason <[email protected]> wrote:

>
> > Yes.. I understand that. I was commenting on the slowness as compared
> > to rsyncing over NFS, for example, which takes 5 hours for the entire
> > tree when done from the top level of the tree. That tree contains 15
> > of the directories that I mentioned in my earlier post. So 15 * 24k
> > dirs.. and to answer the question, 232,974 files of small size for the
> > one subdirectory in question.
>
> I'm getting a little mixed up about when you're switching from talking
> about the 'NFS solution' vs the 'AFS solution'. Are you saying it took
> 1.5 hours to transfer 15 subdirs into AFS, where it took NFS 5 hours to
> transfer 24000 subdirs?
>

It looks like I am not being clear about which numbers I am talking
about... :)

It takes about 5 hours to transfer/rsync the entire tree, with 15 subdirs,
each containing upwards of 200k files and 24k directories, when I am going
from a local volume to an NFSv4 mount. This NFSv4 mount is on the local
network, so there is not a network bottleneck per se. That's the generic
"fastest" case I am working against.
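
For a rough sense of scale, here is the arithmetic behind that baseline as a small Python sketch. It treats "upwards of 200k files" as exactly 200,000 per subdir, so the result is a lower bound, and the variable names are only for illustration.

```python
# Baseline from the figures above: local volume -> NFSv4 mount, whole tree.
subdirs = 15
files_per_subdir = 200_000   # "upwards of 200k", so this is a lower bound
dirs_per_subdir = 24_000
hours = 5

# Every file and directory is one object the transfer has to create.
objects = subdirs * (files_per_subdir + dirs_per_subdir)
rate = objects / (hours * 3600)

print(f"{objects:,} objects, about {rate:.0f} per second")
# prints: 3,360,000 objects, about 187 per second
```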



> > > What can possibly make this faster, then, is copying/creating
> > > files/directories in parallel. <snip>
> >
> > Yes, I routinely run 100's of parallel transfers using a combination
> > of tar and rsync.. tar gets the files over in raw form, and rsync mops
> > up behind.  The rsync pass is to correct any problems with the tar
> > copy, and is run twice on a fixed list, generated at transfer time. I
> > have found that even when using a tuned rsync process designed to
> > improve transfer speeds, many parallel tar/untar processes from local
> > to  NFSv4 followed by a "local" rsync to the same destination works
> > better for new files, when timeliness is important.
>
> So... are you just talking about the NFS transfers, here?
>

Right, though using tar and rsync together to manage the transfers.


>
> With AFS, trying to rsync again afterwards is possibly much slower due
> to cache churn if the number of files in question is larger than the
> stat cache (as Jeff said earlier). You also cannot do more than 4
> simultaneous 'things' to the AFS server with current client releases
> (unless you fiddle with PAGs, or use some AFS-specific tools). So 100s
> of parallel transfers aren't really helpful.
>

That's excellent information! :) And thanks to you too, Jeff.


>
> So if you want to write the tooling for it, I think ideally what you
> would want is a cp/rsync-like tool that would copy files/dirs using a
> separate thread for each file/dir up to some configured limit, tracking
> dependencies so parent dirs are created first, possibly launching new
> processes in separate PAGs as you go. Or, you could use utilities or
> APIs that speak to AFS directly without going through the filesystem
> layer (like afscp/afsio), so you wouldn't need separate threads or
> processes.
>

afsio sounds like the perfect fit for this, actually... although I did have
some trouble writing with it. I checked manually with ls along the trees to
make sure all tokens were working, etc.

#:~/scripts# cat foo.sh | afsio write -file /afs/.realmname.com/home/timothy/foo.sh
afsio: No such file or directory (is dir /afs/.realmname.com/home/timothy in AFS?)

An afsio read succeeds from the RO tree:

#:~/scripts# afsio read -file /afs/realmname.com/home/timothy/OpenAFS-ServerDos.pdf > foo
#:~/scripts# md5sum /afs/realmname.com/home/timothy/OpenAFS-ServerDos.pdf
cd688648171672c1a5100aaa9e3186b5  /afs/realmname.com/home/timothy/OpenAFS-ServerDos.pdf
#:~/scripts# md5sum foo
cd688648171672c1a5100aaa9e3186b5  foo

But fails from the RW tree:

#:~/scripts# afsio read -file /afs/.realmname.com/home/timothy/OpenAFS-ServerDos.pdf > foo
afsio: No such file or directory (file not found: /afs/.realmname.com/home/timothy/OpenAFS-ServerDos.pdf)

And of course, I can't seem to write to the read only tree, unsurprisingly:

#:~/scripts# cat foo.sh | afsio write -file /afs/realmname.com/home/timothy/foo.sh
afsio: server or network not responding (could not create file foo.sh in directory 536870922.1.1)


> Am I making any sense, here? What I describe above is the way to
> transfer lots of little files into AFS more quickly (and possibly other
> network filesystems depending on their consistency guarantees)
> regardless of other schemes like OSD in play. Something like that would
> be useful in general for people that want to copy a large tree or untar
> something into AFS. As far as I know it does not already exist, but I
> could be wrong.


You are making a LOAD of sense, Andrew, and.. as always.. your patience is
sincerely appreciated. I do believe I want to write up a system that will
accomplish this and make it relatively generic.. so once I have it done, I
can send it off for people to review and use :) If I could have some help
with getting afsio working properly, I think that will be enough to get me
going.
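
As a starting point, here is a minimal Python sketch of the kind of tool Andrew describes: each directory is created before anything inside it is queued, and files and directories are copied by a bounded pool of threads. The function name `parallel_copy_tree` and its parameters are made up for illustration. It uses ordinary filesystem calls only (no PAG handling, no afsio), and the default of 4 workers just mirrors the limit Andrew mentions for current clients. I have not measured it against AFS.

```python
import os
import shutil
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_copy_tree(src, dst, max_workers=4):
    """Copy the tree at src to dst with at most max_workers threads.

    Returns a list of (source_path, exception) pairs for entries that failed.
    Symlinks are not handled specially in this sketch.
    """
    errors = []
    pending = 1                    # tasks queued but not finished (1 = root)
    cond = threading.Condition()   # guards pending and errors

    def run(fn, s, d):
        nonlocal pending
        try:
            fn(s, d)
        except OSError as exc:
            with cond:
                errors.append((s, exc))
        finally:
            with cond:
                pending -= 1
                if pending == 0:
                    cond.notify_all()

    def submit(fn, s, d):
        nonlocal pending
        with cond:
            pending += 1
        pool.submit(run, fn, s, d)

    def copy_dir(s, d):
        # The directory exists before any of its children are queued,
        # which is the "parent dirs are created first" dependency.
        os.makedirs(d, exist_ok=True)
        with os.scandir(s) as entries:
            for entry in entries:
                target = os.path.join(d, entry.name)
                if entry.is_dir(follow_symlinks=False):
                    submit(copy_dir, entry.path, target)
                else:
                    submit(shutil.copy2, entry.path, target)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pool.submit(run, copy_dir, src, dst)
        with cond:
            cond.wait_for(lambda: pending == 0)
    return errors
```

Calling `parallel_copy_tree(source_dir, dest_dir)` returns an empty list when everything copied cleanly. Children are queued from inside the task that created their parent, so no worker ever blocks waiting on another, and the pool cannot deadlock at small worker counts.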

Thanks to everyone else as well.. I am learning more about the
underpinnings of file systems by working with AFS than I have in a very
long time!


-- 
Timothy Balcer / IT Services
Telmate / San Francisco, CA
Direct / (415) 300-4313
Customer Service / (800) 205-5510



