> BTW: on decent machines an individual 1 GiB write does not make the user wait: on write, the data is first copied into the AFS file's mapping, and later into the cache file's mapping (the former step can be avoided by writing into the chunk files directly). On reads, the reader is woken up on every RX packet, ensuring streaming to the user. Here again, the double copy can be avoided.
What happens here depends on the VM model of the machine, and on how we interact with it. But on Linux, at least, this isn't strictly true. Here's how things work in 1.4:
There are two different codepaths: one for writes from the write() syscall, and one invoked when a page that is mmap'd gets written to. With write()s, what we currently do is prepare a page for the kernel; the kernel then takes care of copying the buffer passed by the user into that page, and lets us know when it has completed. We then take the data from that page and do a write() of it against the backing store. Only then do we return control to the user, who has had to wait whilst all of this occurs. In the background, the pdflush process then takes care of outputting this data to disk.
With mmap, things are a little different. pdflush is in charge of our writing and, at intervals, will call our writepage() operation on pages that the user has dirtied. This all happens completely behind the scenes. We then write the AFS dirty page out into the backing store (by using that store's write command), and it's scheduled for another background flush.
In 1.5 this is streamlined a little by working only at the page level, which avoids some context switches and copies. As I noted in an earlier email, we also do more in the background in order to get control back to the user quicker. One further optimisation is that we shouldn't be doing the write to the backing cache from the write() syscall at all. All write() is supposed to do is copy the data from the user into the filesystem's mapping and mark the page dirty; it should then be up to the pdflush process to move this out to the backing store. I intend to revisit this at some point, but my previous attempts have resulted in a cache manager that is very prone to deadlocks.
As you note, our Linux implementation creates two copies of the data: one in AFS's mapping, the other in the backing files. However, we cannot easily get rid of this duplication - there's no simple mechanism for bypassing the VM and 'writing into the chunk files directly'. Using direct I/O would be a possibility, but we'd need to handle doing this in the background, otherwise the user would end up having to wait until the chunk files actually made it to disk, and it would limit the range of filesystems we can use as a backing cache.
S.

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
