On Tue, Nov 6, 2012 at 8:49 AM, Andrew Deason <[email protected]> wrote:
> On Tue, 6 Nov 2012 00:06:53 -0800
> Timothy Balcer <[email protected]> wrote:
>
> > I have a need to think about replicating large volumes (multigigabyte)
> > in large numbers (many terabytes of data total), to at least two other
> > servers besides the read-write volume, and to perform these releases
> > relatively frequently (much more than once a day, preferably)
>
> How much more frequently? Hourly? Some people do 4 times hourly (and
> maybe more) successfully.

Well, unless I am missing something seriously obvious, that may be hard to
reach: for example, it took 1.5 hours to rsync a subdirectory with not a lot
of content, but many directories, into an AFS volume. How frequently depends
on use, and on being able to release faster than the writes come in. I don't
have performance data on the writes yet, but that will change anyway; we are
going from 200+ clients to many more, which is why I am working with AFS in
the first place. The environment is a write-once, read-many situation.

> > Also, these other two (or more) read-only volumes for each read-write
> > volume will be remote volumes, transiting across relatively fat, but
> > less than gigabit, pipes (100+ megabits)
>
> Latency may matter more than bandwidth; do you know what it is?

Depending on the colo site, between 30 and 60 ms.

> > For the moment what I have decided to experiment with is a simple
> > system. My initial idea is to work the AFS read-only volume tree into
> > an AUFS union, with a local read-write partition in the mix. This way,
> > writes will be local, but I can periodically "flush" writes to the AFS
> > tree, double-check they have been written and released, and then
> > remove them from the local partition. This should maintain integrity
> > and high availability for the up-to-the-moment recordings, given I
> > RAID the local volume. Obviously, this still introduces a single point
> > of failure... so I'd like to flush as frequently as possible.
> > Incidentally, it seems you can NFS export such a union system fairly
> > simply.
>
> I'm not sure I understand the purpose of this; are you trying to write
> new data from all of the 'remote' locations, and you need those writes
> to 'finish' quickly?

No, I am writing from a local audio/video server to a local repo, which
needs to be very fast in order to service live streaming in parallel with
writes on a case-by-case basis. That local repo would be a R/W branch above
the AFS R/O branch, so:

    dirs=/Read-Write=rw:/afs/path/to/read-only=ro aufs /union

This way I can present /union to the application server as a read/write repo
for all its needs, including archival use, but still have AFS underneath for
replication and distribution. *sigh* I wish OSD was primetime :)

> > But, I feel as if I am missing something... it has become clear that
> > releasing is a pretty intensive operation, and if we're talking about
> > multiple gigabytes per release, I can imagine it being extremely
> > difficult. Is there a schema that I can use with OpenAFS that will
> > help alleviate this problem? Or perhaps another approach I am missing
> > that may solve it better?
>
> Eh, some people do that; it just reduces the benefit of the client-side
> caching. Every time you release a volume, the server tells clients that
> for all data in that volume, the client needs to check with the server
> to see if the cached data is different from what's actually in the
> volume. But that may not matter so much, especially for a small number
> of large files.

Well, that's the thing: this is a large number of small to medium-sized
files that are being written continuously. In addition, there is a quite
deep directory structure. I'm trying to get it flattened out to improve
scaling, but at the moment it is taking 1.5 hours to rsync a subdirectory
containing about 5G of data but 23,681 directories, for example. Releasing
is a whole 'nuther animal... ;-)
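For concreteness, the branch specification above corresponds to a mount
invocation roughly like the one below. This is only a sketch: the branch
paths are the placeholders from this thread, it assumes the aufs module is
loaded, and on some kernel/aufs versions the option is spelled br= rather
than dirs=.

```shell
# Union mount: local R/W branch stacked over the AFS R/O tree.
# Writes land in /Read-Write; reads fall through to the AFS read-only tree.
mount -t aufs -o dirs=/Read-Write=rw:/afs/path/to/read-only=ro none /union
```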
> To improve things, you can maybe try to reduce the number of volumes
> that are changing. That is, if you are adding new data in batches, I
> don't know if it's feasible for you to add that 'batch' of data by
> creating a new volume instead of writing to existing volumes.

That's feasible... but what if, for example, vol1 is mounted at
/afs/foo/home/bar and contains a thousand directories, and the new content
is a thousand more directories at the exact same level of the tree? How
would I handle that? As far as I can tell, OpenAFS only allows a volume to
be mounted on its very own directory, and you can't nest them together like
that. How unfeasible would it be to create N volumes, where N >= 500 per
shot? I would end up with many thousands of tiny volumes... none of which I
have trouble with, but would that be scalable? Let's assume I have spread
out db and file servers in such a way as to equalize load.

> And, of course, the release process may not be fast enough to actually
> do releases as quickly as you want. There are maybe some ways to ship
> around volume dumps yourself to get around that, and some pending
> improvements to the volserver that would help, but I would only think
> about that after you try the releases yourself.

The idea of doing R/W "checkpoint" volumes that I only have to release once
in a while after the first release is very appealing... if you can suggest a
solution to the problem above, I am all ears!! :) I would be VERY happy to
be able to allocate space, quota, and location, on the fly, in batchwise
operations.

> --
> Andrew Deason
> [email protected]
>
> _______________________________________________
> OpenAFS-info mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-info

--
Timothy Balcer
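P.S. For concreteness, the "new volume per batch" flow discussed above
might look something like the sketch below. The server names (afs1, afs2,
afs3), the partition (/vicepa), the rec.* volume naming scheme, and the
mount root are all made up for illustration; only the vos/fs subcommands
themselves are real. DRYRUN=1 (the default here) prints each command
instead of running it, so the sequence can be inspected without an AFS cell.

```shell
#!/bin/sh
# Batch-volume sketch: one new volume per batch of recordings, mounted as
# one more directory at the same level of the tree as the existing ones.
# DRYRUN=1 (default) echoes commands instead of executing them.
: "${DRYRUN:=1}"

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

BATCH=batch001

# Create a fresh R/W volume for this batch on a (hypothetical) fileserver.
run vos create afs1 /vicepa "rec.$BATCH"
# Mount it inside vol1, alongside the existing thousand directories.
run fs mkmount "/afs/foo/home/bar/$BATCH" "rec.$BATCH"
# Add R/O sites on the two remote servers, then do the one full release.
run vos addsite afs2 /vicepa "rec.$BATCH"
run vos addsite afs3 /vicepa "rec.$BATCH"
run vos release "rec.$BATCH"
```

One caveat worth noting: the new mount point lives in vol1's R/W volume, so
clients reading through the R/O path won't see the new directory until vol1
itself is released; after that, each batch volume can be released on its own
schedule.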
