Well, I guess it's the difference between bugs in the source code itself that we are about to ship as "v1.5", operational issues that affect TestGrid specifically, and operational issues that might typically affect other users.
If it is not a new bug in the source code since the v1.4.1 release, then it isn't a "regression" (people who are happily using v1.4.1 would not have a reason to refrain from upgrading to v1.5). If it is an operational issue that strikes rarely, doesn't cause great harm, and is easy to work around, then it isn't "critical". (This part is, of course, a judgment call.)

The investigation isn't complete, but so far it looks like the situation on the test grid where you can't create new directories is due to some combination of:

1. Limitations in the code that were already present in the v1.4.1 release (therefore not a regression): ticket #540

2. The limitation is that the testgrid web gateway (http://testgrid.allmydata.org:3567 ) is not handling misbehavior by some of the storage servers. That's a bug, but it probably won't affect lots of users. It can be "worked around" by fixing your storage servers.

3. The misbehaving storage servers are running TahoeLAFS-v1.3-r3747, which is older than the current stable v1.4.1 release. It's possible (but again, without a complete investigation I don't know if it is true) that the cause of the MemoryError in the storage server has been fixed since then.

So, I think the next steps are:

1. Investigate more. Are any other storage servers besides tahoebs5.allmydata.com bs5c2 misbehaving? Do the munin graphs of bs5c2 show any interesting pattern in memory usage or other statistics?

2. Upgrade bs5c2 and reboot it, probably making TestGrid usable again.

3. ? Maybe experiment with adding some sort of kludge to hard-shutdown in case of MemoryError.

I'm about to do #2, even though I don't want doing so to interfere with #1, and then I need to go to work.
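For what it's worth, here is a rough sketch of what the "hard-shutdown on MemoryError" kludge from step 3 might look like. This is not code from the Tahoe source tree: the name exit_on_memory_error and its exit_func and exit_code parameters are made up for illustration, and I haven't worked out where in the storage server such a wrapper would have to be hooked in for it to actually see the exception.

```python
import functools
import os
import sys


def exit_on_memory_error(func, exit_func=os._exit, exit_code=1):
    """Wrap func so that a MemoryError kills the process instead of
    leaving a half-working server running.

    exit_func is injectable (hypothetical parameter) so the behavior
    can be exercised without really terminating the interpreter.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except MemoryError:
            # Best-effort log line; logging may itself fail when
            # memory is exhausted, so ignore any error here.
            try:
                sys.stderr.write(
                    "MemoryError in %s; shutting down hard\n" % func.__name__)
            except Exception:
                pass
            # os._exit terminates immediately, skipping cleanup
            # handlers that might need memory we don't have.
            exit_func(exit_code)
    return wrapper
```

The idea is that a process supervisor (or an operator) restarts the dead server, which seems better than a server that stays up but answers requests wrongly. Whether that trade-off is right for storage servers is exactly the part that needs experimenting.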
:-)

Regards,

Zooko

tickets mentioned in this e-mail:

http://allmydata.org/trac/tahoe/ticket/540 # inappropriate "uncoordinated write error" after handling a server failure

_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
