In message <[EMAIL PROTECTED]>, Rainer Toebbicke writes:

>Actually, I ran into this when told that running a 'find
>/usr/vice/cache ...' has been suspected to hang AFS. Luckily
>osi_UFSTruncate is about the only place where the i_sem is
>downed/upped correctly, so that wasn't it. But it illustrates that
>certain conventions should be taken seriously.
maybe. there is some doubt in my mind about the order of i_alloc_sem,
i_sem and the BKL. but it seems to be working, so it's best not to
change it unless it solves a problem.

>Oh? Easy! You need a farm of about 10 clients on Gb Ethernet, 4-5
>small servers on Gb as well with decently performing RAIDs and plenty
>of time:
>
>set up a directory /afs/.../$hostname for each client, about 10 2GB
>volumes per client mounted at /afs/.../$hostname/[0-9], and then run
>/afs/cern.ch/user/r/rtb/common/bin/disk_stress -rN500 \
>    /afs/.../$hostname/?
>on each client and wait. The problem usually manifests itself after a
>few days, sometimes 1-2 weeks. Survival after > 3 weeks on all clients
>is exceptional.

not a problem. we are not a small shop.

btw, you said this test "fails" but didn't indicate what the failure
mode is.

>Anyway, I'm knocking at various portions of the client code and listen
>if it sounds hollow. The setup described seems to look artificial but
>we've got enough traffic and reported oddities to suspect that it is
>also triggered by normal use. At what frequency - no idea.

have any changes to i_sem handling proven useful in fixing this
problem?

btw, addressing some of your other concerns in the previous message:

>One of the prominent occasions where this looks particular careless is
>in afs_linux_read() (osi_FlushPages) prior to calling
>generic_file_read(). With a printf() in osi_VM_FlushPages and a little
>mickey-mousing you can show that at least through this code path
>truncate_inode_pages() is called without the i_sem lock. My local

it should be safe to add an i_sem around the truncate_inode_pages()
call in osi_VM_FlushPages(). there is only one path to
osi_VM_FlushPages(), via osi_FlushPages().

>... Similar suspects like osi_VM_Truncate and
>osi_VM_FlushVCache have the same problem - a fast growing tree to
>trace back.

osi_VM_FlushVCache() is called as part of recycling an inode for use,
so it's typically called in the afs_lookup/afs_create/afs_mkdir code
paths. it also happens to get called during inode and dentry
revalidation. the parent dir's i_sem is held during these operations,
so the new inode's i_sem can probably be taken safely.

osi_VM_Truncate() typically happens when an inode's size changes.
again, very likely safe to just take i_sem here.

it would be pretty easy to just use down_trylock() to catch any paths
that might have a double lock and then fix these broken code paths.
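to make the osi_VM_FlushPages() point concrete, here's a rough sketch
of what holding i_sem around the flush could look like. this is a guess
at the shape of the code, not a patch against the actual tree: the
helper name is made up, and it assumes a 2.4/2.6-era kernel where
inode->i_sem is still a plain semaphore (later kernels renamed it).

#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/semaphore.h>

/* illustrative helper, not the real osi_VM_FlushPages(); "ip" is the
 * Linux inode backing the vcache whose pages are being flushed */
static void
flush_vcache_pages_locked(struct inode *ip)
{
    down(&ip->i_sem);                      /* follow the i_sem convention other truncate paths use */
    truncate_inode_pages(&ip->i_data, 0);  /* drop every cached page for this inode */
    up(&ip->i_sem);
}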

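and a sketch of the down_trylock() idea, again with a made-up helper
name and the same kernel-era assumption. a failed trylock can of course
just mean contention, but under the disk_stress setup above a path that
re-enters with i_sem already held will hang right after logging its
stack, which points straight at the code path that needs fixing.

#include <linux/fs.h>
#include <linux/kernel.h>
#include <asm/semaphore.h>

/* debugging acquire: warn and dump a stack trace if i_sem cannot be
 * taken immediately, then block as usual */
static void
afs_isem_debug_down(struct inode *ip)
{
    if (down_trylock(&ip->i_sem)) {
        printk(KERN_WARNING "afs: i_sem busy for inode %lu\n", ip->i_ino);
        dump_stack();
        down(&ip->i_sem);   /* a genuine double lock deadlocks here, trace already in the log */
    }
}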