On Mon, Feb 10, 2014 at 2:23 PM, Andrew Deason <[email protected]> wrote:
> On Mon, 10 Feb 2014 00:27:59 -0600
> Tracy Di Marco White <[email protected]> wrote:
>
> > Every night at midnight, we run 'vos backupsys'. For three nights in a
> > row, on one of the servers I've upgraded to 1.6.5 and dafs, I've been
> > getting the following errors, and it mostly stops being a fileserver.
> > Is this fixed in 1.6.6? Anyone else seeing it? This is on NetBSD
> > 6.1.3.
>
> I would guess you are the only one using NetBSD for a "real" fileserver,
> at least for DAFS. The errors you've posted indicate there are some
> problems with the mechanism that the fileserver and other processes
> use to communicate with each other, so it may be advisable to not trust
> DAFS on NetBSD with "real" data until it's known what's going on, as
> errors like this could possibly lead to corrupted volumes.
>

That's possible, certainly, depending on your definition of 'real'. I know
other people are using DAFS on NetBSD for fileservers. Personally, I've
only been doing it for a year or two.

> Do you know if this seems to happen immediately, or if 'vos backupsys'
> seems to correctly create some backup clones, and then eventually
> triggers this error? I (or someone else) will probably need to reproduce
> this to get a better idea of what's going on, but you can maybe save us
> some time with some more info:

It happens on one server of four, and it's most of the way through
creating backup volumes on that server when it starts. It is consistently
happening on one, and only one, server.

> > VolserLog
> > Sat Feb 8 00:02:42 2014 SYNC_ask: length field in response inconsistent
> > on circuit 'FSSYNC'
> > Sat Feb 8 00:02:42 2014 SYNC_ask: protocol communications failure on
> > circuit 'FSSYNC'; attempting reconnect to server
>
> This message says what one of the problems is, but isn't providing a lot
> of information.
> If it's convenient for you to apply a patch and rebuild,
> the following patch would give us a little more information in this
> situation (from gerrit 10829):
>
> <http://git.openafs.org/?p=openafs.git;a=patch;h=9604a45e94ed23a2941d0a7e11bfd892a0bd0bf7>
>

Sure, since I'm restarting just after midnight every night anyway.

> On Mon, 10 Feb 2014 12:15:08 -0600
> Tracy Di Marco White <[email protected]> wrote:
>
> > root 4129 0.0 0.2 46288 5124 ? Sl 7:46AM 0:00.02
> > /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
> > root 7155 0.0 1.2 85200 42424 ? Il 8:06AM 1:27.36
> > /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
>
> Do you have any idea why you have multiple davolserver processes running
> at once? Does BosLog maybe say anything about processes dying or
> anything? Could you provide a 'ps' listing of all afs server processes
> on that machine?
>

They aren't running at once. Those are three different days, three
different restarts. Restarting AFS is the only way I know of to make the
fileserver work again.

-Tracy
