Dan Pritts wrote:
> On Mon, Oct 16, 2006 at 05:05:20PM -0400, Jeffrey Altman wrote:
>> Danno:
>>
>> I suspect that if people filed more bug reports when problems were
>> experienced that things would get fixed faster. I understand that folks
>
> for the record (and I know *you* know this), our windows support folks
> have in fact filed some bugs, or at least contacted you directly,
> specifically regarding the delay issues. Also for the record you
> were apparently responsive but somehow along the way the problem
> didn't get solved.

Searching through the bug queue, I find that the only tickets submitted
from internet2.edu are from you. That is not to say that folks have not
submitted tickets from other organizations, but I would be unable to tie
them back to Internet2. If I had recognized the relationship I would
have contacted you directly when the problem was reported.

It is often the case that the information provided within a report is
not sufficient to reproduce a problem or even narrow it down. Problems
such as a deadlock or a panic are easy to fix. Problems involving
protocol behaviors are harder to identify but still not so hard to fix.
Problems that involve complex interactions between clients, networks,
file servers, and the stored metadata are the most challenging.

I have dozens of tickets in the openafs.org queue that have been marked
either stalled or resolved simply because the submitter stopped
providing responses to requests for more information.

If I am unable to reproduce a problem, cannot identify the cause from
the client side log files, and the file server logs are not accessible,
the easiest way for me to debug the problem is to be given remote access
to a system within the cell that is experiencing it. Therefore, if I
had seen an unresolvable ticket from internet2.edu I would certainly
have contacted you personally about it, because you would have been able
to facilitate access to the necessary information.

If you can provide the ticket number for the bug report I would be more
than happy to take another look at it.

> It may be that all of our problems have been either the long delays
> accessing AFS (which I think was due to the server bug that was fixed in
> 1.4.1), or the "missing data" problems, which I never bothered to report
> because i was sure it's just locking. (and, because try as we could,
> we could never reproduce it).
> Thanks for the info on the bug fixed
> in the upcoming release - it sounds like that will be an improvement,
> but I don't think it's related here, the users are editing the same file
> over and over.

Every edit of an office document produces a temporary file in the same
directory as the original. The edits are made to the temporary file
and the original is only updated after a "save" is executed. I suspect
that if the users have the 5 minute auto-save feature enabled they
would have triggered the bugs I described. If you have multiple people
editing in the same directory, the creation and deletion of temporary
files will result in callbacks to the clients that can also trigger the
callback break race conditions.

> Part of the problem is that my team is overloaded and didn't do proper
> followup with this beta customer in our organization - so they got very
> frustrated. I'm loath to frustrate them any further, eg, with -dev branch
> code that may have further problems, since they process my paycheck.

The fact that a release is labeled "stable" rather than "devel" doesn't
mean that the code is more likely to work. "Stable" simply means that
fewer changes will occur in its successors because no new features or
behavioral changes will be added to that branch. This doesn't prevent
bugs from being discovered or introduced in that code. What it does
mean is that if there is a bug that requires a major redesign of the
code in order to fix it, the bug fix may not go onto the "stable" branch
at all and may only be applied to the development branch.

There are many organizations which deploy code off of the development
branch for Windows but the stable branch for UNIX. If you are using
64-bit Windows you don't have a choice. If you are using Office
applications and require locking you don't have a choice. If you are
using files larger than 2GB you don't have a choice.
And now if you are using large numbers of temporary files in a common
directory from more than one machine you don't really have a choice.

Another thing to think about is that bugs on the devel branch get fixed
faster than bugs on the stable branch. Unlike a stable branch release,
a devel branch release does not have to work on all platforms or have
binaries built for all of them. Therefore, it is much easier for a fix
to be tested in the Windows client and for a release to be issued.

> Thanks also for the info on all the other afs access methods. Sounds
> like samba isn't the right solution for us. We may give the SFTPdrive
> thing a shot - I believe it will work, but it's not clear it enforces
> locking either.

Be sure to check what their implementation does. If it is a copy, local
edit, and replace model then for office documents which consist of a
single file you should be ok. If the office documents in question use
multiple files with links between them, then I suspect that your users
will require simultaneous access to the same file images with support
for byte range locking. As AFS does not support byte range locks in the
file server, if they are not simulated with full file locks by
SFTPdrive (or another access client) you will continue to have problems.

Jeffrey Altman
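
As a rough illustration of what "simulating byte range locks with full
file locks" could look like in an access client, here is a minimal
Python sketch. The class name, method names, and the in-memory table are
all hypothetical; this is not the SFTPdrive or OpenAFS implementation,
only one way such a mapping might be structured. The key assumption is
that the offset and length of each request are ignored and every range
lock is treated as a lock on the whole file.

```python
import threading


class WholeFileLockTable:
    """Hypothetical table that treats each byte range lock as a full file lock."""

    def __init__(self):
        self._guard = threading.Lock()
        # path -> {"exclusive": bool, "holders": {owner: number of ranges held}}
        self._locks = {}

    def lock_range(self, path, owner, offset, length, exclusive):
        """Return True if the lock is granted.

        offset and length are accepted but ignored, because the assumed
        back end only understands locks on an entire file.
        """
        with self._guard:
            entry = self._locks.get(path)
            if entry is None:
                self._locks[path] = {"exclusive": exclusive, "holders": {owner: 1}}
                return True
            holders = entry["holders"]
            if set(holders) == {owner}:
                # The sole holder may take more ranges. The file stays marked
                # exclusive once upgraded, which is conservative but safe.
                entry["exclusive"] = entry["exclusive"] or exclusive
                holders[owner] += 1
                return True
            if entry["exclusive"] or exclusive:
                # Any other owner conflicts, even on non-overlapping ranges.
                return False
            holders[owner] = holders.get(owner, 0) + 1
            return True

    def unlock_range(self, path, owner, offset, length):
        """Release one range; the full file lock goes away with the last one."""
        with self._guard:
            entry = self._locks.get(path)
            if entry is None or owner not in entry["holders"]:
                return
            entry["holders"][owner] -= 1
            if entry["holders"][owner] == 0:
                del entry["holders"][owner]
            if not entry["holders"]:
                del self._locks[path]
```

The cost of this approach is reduced concurrency: two users locking
different ranges of the same file will block each other. The benefit is
that no two clients can believe they both hold an exclusive lock.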
