On Thu, 8 Nov 2012 22:48:56 -0800 Timothy Balcer <[email protected]> wrote:
> Well, unless I am missing something seriously obvious, for example it
> took 1.5 hours to rsync a subdirectory to an AFS volume that had not a
> lot of content, but many directories.

Creating lots of files is not fast. Due to the consistency guarantees of
AFS, you have to wait for at least a network RTT for every single file
you create. That is, a mkdir() call is going to take at least 50ms if
the server is 50ms away. Most/all recursive copying tools will wait for
that mkdir() to complete before doing anything else, so it's slow.

Arguably we could introduce something in 'fs storebehind' to perform
these operations asynchronously to the fileserver, but that has issues
(as mentioned in the 'fs storebehind' manpage). And, well, it doesn't
exist right now anyway, so that doesn't help you :)

What can make this faster, then, is copying/creating files and
directories in parallel. I'm not sure whether a tool exists that copies
files like that, but you could script it. Or, if the data is organized
in such a way that you can run one rsync/'cp -R'/etc. per top-level
directory, that could make it faster. That is, if you have 4 top-level
directories, running 4 recursive copies in parallel could make the
whole thing faster. (You can go higher than 4 transfers in parallel if
you do some fiddling, but I'm not going too deeply into that... let me
know if you want to know more.)

Also, I was assuming you're rsync'ing to an empty destination in AFS;
that is, just using rsync to copy stuff around. If you're actually
trying to synchronize a dir tree in AFS that's at least partially
populated, see Jeff's comments about stat caches and such.

> No, I am writing from a local audio/video server to a local repo,
> which needs to be very fast in order to service live streaming in
> parallel with write on a case by case basis.

It seems like you could just write to /foo during the stream capture,
and copy it to /afs/bar/baz when it's done.
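To illustrate, the "one recursive copy per top-level directory" idea
can be sketched as a small script. The paths here are hypothetical
(substitute your own source tree and AFS destination):

```shell
# A minimal sketch of parallel recursive copies, assuming the data lives
# under /local/data and the destination volume is mounted at
# /afs/example.com/dest -- both hypothetical paths, adjust for your cell.
SRC=/local/data
DST=/afs/example.com/dest

# Start one recursive copy per top-level entry, all at once, so each
# transfer's per-file round trips overlap with the others'.
parallel_copy() {
    for d in "$SRC"/*; do
        rsync -a "$d" "$DST/" &
    done
    wait   # block until every background rsync has finished
}
```

With 4 top-level directories this starts 4 transfers; capping or
raising the parallelism (e.g. with xargs -P) is the "fiddling"
mentioned above.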
But if the union mount scheme makes it easier for you, then okay :)

But I'm not sure I understand... above you describe directory trees
made up of a lot of directories of relatively small files. I would've
thought that video captures of a live stream would not be particularly
small... copying video to AFS sounds more like the "small number of
large files" use case, which is much more manageable. Is this a lot of
small video files or something?

> > To improve things, you can maybe try to reduce the number of volumes
> > that are changing. That is, if you are adding new data in batches, I
> > don't know if it's feasible for you to add that 'batch' of data by
> > creating a new volume instead of writing to existing volumes.
>
> That's feasible..... but what if, for example, vol1 is mounted at
> /afs/foo/home/bar and contains a thousand directories. The new
> content is a thousand more directories, but at the exact same level of
> the tree. How would I handle that? As far as I can tell, OpenAFS only
> allows a volume being mounted on its very own directory, and you can't
> nest them together like that.

Well, that's what I meant by "I don't know if it's feasible". If you
must add stuff to the same level of the dir hierarchy, instead of
putting it all under a new directory (e.g. "foo_2012-11-09/"), it's
harder. But, if you can create a new vol for each dir as you mention:

> How unfeasible would it be to create N volumes, where N >= 500 per
> shot? I would end up with many thousands of tiny volumes.. none of
> which I have trouble with, but would that be scalable? Let's assume I
> have spread out db and file servers in such a way as to equalize load.

I'm not sure what scalability issues you're expecting here; making
volumes smaller but more numerous is typically something you do to
improve scalability. We usually encourage more small volumes instead of
fewer big volumes. What I would guess you may run into:

- The speed of creating the volumes.
  I'm not actually sure how fast this goes, since creating a lot of
  volumes quickly isn't usually a concern... so you'll have to try it :)

- Fileserver startup/shutdown time for non-DAFS is somewhat heavily
  influenced by the number of volumes on the server; this becomes a
  significant issue when you start to have tens or hundreds of
  thousands of volumes on a server.

That second point is addressed by DAFS, which can handle at least a
million or so volumes per server rather quickly (a few seconds for
startup). I'm not sure if you know what DAFS is, but converting to
using it should be straightforward. There is a section about DAFS and
how to convert to it in appendix C of the Quick Start Guide:
<http://docs.openafs.org/QuickStartUnix/index.html#DAFS.html>.

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
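For what it's worth, the volume-per-directory scheme discussed above
could be batch-scripted roughly like this, using the standard vos
create / fs mkmount commands. The server name, partition letter, volume
naming scheme, and mount path are all hypothetical placeholders:

```shell
# Hypothetical names throughout: fileserver "fs1.example.com", vice
# partition "a", and a parent directory served from an existing volume
# mounted at /afs/example.com/media. Substitute your own.
SERVER=fs1.example.com
PART=a
PARENT=/afs/example.com/media
N=500

# Create one volume per new directory and mount each under the parent.
create_batch() {
    i=1
    while [ "$i" -le "$N" ]; do
        vol="media.batch1.$i"
        vos create "$SERVER" "$PART" "$vol"   # allocate the volume
        fs mkmount "$PARENT/dir$i" "$vol"     # splice it into the tree
        i=$((i + 1))
    done
}
```

Run create_batch once the names are filled in; timing a run like this
is also the easiest way to answer the "how fast does volume creation
go" question above.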
