On Mon, Feb 10, 2014 at 2:23 PM, Andrew Deason <[email protected]> wrote:

> On Mon, 10 Feb 2014 00:27:59 -0600
> Tracy Di Marco White <[email protected]> wrote:
>
> > Every night at midnight, we run 'vos backupsys'. For three nights in a
> > row, on one of the servers I've upgraded to 1.6.5 and dafs, I've been
> > getting the following errors, and it mostly stops being a fileserver.
> > Is this fixed in 1.6.6? Anyone else seeing it? This is on NetBSD
> > 6.1.3.
>
> I would guess you are the only one using NetBSD for a "real" fileserver,
> at least for DAFS. The errors you've posted indicate there are some
> problems with the mechanism that the fileserver and other processes
> use to communicate with each other, so it may be advisable to not trust
> DAFS on NetBSD with "real" data until it's known what's going on, as
> errors like this could possibly lead to corrupted volumes.
>

That's possible, certainly, depending on your definition of 'real'. I know
other people are using DAFS on NetBSD for fileservers. Personally,
I've only been doing it for a year or two.


> Do you know if this seems to happen immediately, or if 'vos backupsys'
> seems to correctly create some backup clones, and then eventually
> triggers this error? I (or someone else) will probably need to reproduce
> this to get a better idea of what's going on, but you can maybe save us
> some time with some more info:


It happens on one server of four, and only once 'vos backupsys' is most
of the way through creating backup volumes on that server. It has
consistently been that one server and no other.
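
To pin down how far into the run the first failure lands, the VolserLog
timestamps can be compared against the midnight start. This is a minimal
sketch, assuming every log entry begins with a ctime-style timestamp as in
the excerpt quoted below; the function name first_sync_failure is made up
for illustration and is not part of OpenAFS.

```python
from datetime import datetime


def first_sync_failure(lines):
    """Return the timestamp of the first SYNC_ask message, or None."""
    for line in lines:
        if "SYNC_ask:" not in line:
            continue
        # The timestamp is assumed to be the first five whitespace-separated
        # fields, e.g. "Sat Feb  8 00:02:42 2014".
        fields = line.split()
        if len(fields) < 5:
            continue
        stamp = " ".join(fields[:5])
        try:
            return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        except ValueError:
            # Not a timestamped entry after all; keep scanning.
            continue
    return None


sample = [
    "Sat Feb  8 00:02:42 2014 SYNC_ask:  length field in response "
    "inconsistent on circuit 'FSSYNC'",
]
```

Against a real log this would be called as first_sync_failure(open(path)),
where the location of VolserLog depends on how OpenAFS was installed.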


> > VolserLog
> > Sat Feb  8 00:02:42 2014 SYNC_ask:  length field in response inconsistent
> > on circuit 'FSSYNC'
> > Sat Feb  8 00:02:42 2014 SYNC_ask: protocol communications failure on
> > circuit 'FSSYNC'; attempting reconnect to server
>
> This message says what one of the problems is, but isn't providing a lot
> of information. If it's convenient for you to apply a patch and rebuild,
> the following patch would give us a little more information in this
> situation (from gerrit 10829):
>
> <http://git.openafs.org/?p=openafs.git;a=patch;h=9604a45e94ed23a2941d0a7e11bfd892a0bd0bf7>
>


Sure, since I'm restarting just after midnight every night anyway.

> On Mon, 10 Feb 2014 12:15:08 -0600
> Tracy Di Marco White <[email protected]> wrote:
>
> > root      4129  0.0  0.2 46288 5124 ?     Sl    7:46AM  0:00.02
> > /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
> > root      7155  0.0  1.2  85200  42424 ?     Il    8:06AM  1:27.36
> > /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
>
> Do you have any idea why you have multiple davolserver processes running
> at once? Does BosLog maybe say anything about processes dying or
> anything? Could you provide a 'ps' listing of all afs server processes
> on that machine?
>

It's not. Those are three different days, three different restarts.
Restarting afs is the only way I know of to make the fileserver work again.

-Tracy
