If you have core files from dasalvager and dafileserver, then the
processes have terminated abnormally. If you have an OpenAFS support
provider, I suggest you contact them with a support request.

Note that this mailing list is likely to be very quiet over the next
24 to 48 hours as the core developers are in transit due to the end
of the European AFS and Kerberos Conference.

If you do not have a support provider, please open a ticket in OpenAFS
RT by sending mail to [email protected]. Please include stack traces
obtained from the core files in the report. They will provide the first
clue as to what is failing, since nothing is evident in the log files.
Be sure to also look at the *.old log files.
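One way to collect those stack traces non-interactively is to run gdb in
batch mode against each core file. The Python sketch below is only an
illustration, not an OpenAFS tool: the script name afs_bt.py and the paths
in the usage line are hypothetical, and it assumes gdb is installed and that
the binary you pass is the exact build that produced the core (ideally with
debug symbols, otherwise the traces will be much less useful).

```python
import shutil
import subprocess
import sys


def gdb_backtrace_cmd(binary: str, core: str) -> list[str]:
    """Build a non-interactive gdb command line that prints a full
    backtrace (including local variables) for every thread in a core."""
    return [
        "gdb",
        "-batch",                           # run the -ex commands, then exit
        "-ex", "thread apply all bt full",  # backtrace of every thread
        binary,                             # must be the binary that dumped core
        core,
    ]


def collect_backtrace(binary: str, core: str) -> str:
    """Run gdb on one core file and return everything it printed."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb was not found in PATH")
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True,
        text=True,
        check=False,  # keep gdb's output even if it exits non-zero
    )
    return result.stdout + result.stderr


# Only act when invoked as: afs_bt.py BINARY CORE
if __name__ == "__main__" and len(sys.argv) == 3:
    print(collect_backtrace(sys.argv[1], sys.argv[2]))
```

For example (paths are placeholders for wherever your build installed
dafileserver and wrote its core):

    python3 afs_bt.py /path/to/dafileserver /path/to/core > dafileserver-bt.txt

Repeat with the dasalvager binary and one of its cores, and attach both
outputs to the ticket.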

Jeffrey Altman


On Thursday, October 18, 2012 10:40:31 PM, Jack Neely wrote:
> Folks,
>
> One of our AFS file servers crashed this afternoon.  OpenAFS 1.6.1 on
> RHEL 6 with kernel 2.6.32-279.9.1.el6.x86_64.  It looks like the
> salvager hung and eventually the dafileserver stopped responding to
> clients.
>
> We've rebooted, fsck'd the ext4 partitions, and finally ran
> dasalvager -force by hand to attempt to correctly salvage the server.
> In all cases once the dafs instance starts up, it serves requests, it
> dispatches a volume salvage or 4, all the salvager processes get stuck
> and we start all over again.  We've salvaged the server multiple times
> at this point -- our next hope is that we can restart the file server
> with the traditional file server process.  (BTW, 2 and 3 GiB cores from
> dafileserver and dasalvager abound.)
>
> SalsrvLog messages are usually along the lines of the following:
>
> 10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
> 10/18/2012 17:55:08 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
> 10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
> 10/18/2012 17:55:08 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
> 10/18/2012 17:55:11 SYNC_ask: too many / too latent fatal protocol errors on circuit 'FSSYNC'; giving up (tries 1 timeout 1350597266)
> 10/18/2012 17:55:11 FSYNC_askfs: internal FSSYNC protocol error 2
> 10/18/2012 17:55:11 AskOffline:  request for fileserver to take volume offline failed; trying again...
> 10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
> 10/18/2012 17:55:08 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
> 10/18/2012 17:55:11 SYNC_ask: too many / too latent fatal protocol errors on circuit 'FSSYNC'; giving up (tries 1 timeout 1350597265)
> 10/18/2012 17:55:11 FSYNC_askfs: internal FSSYNC protocol error 2
> 10/18/2012 17:55:11 AskOffline:  request for fileserver to take volume offline failed; trying again...
> 10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
>
> or
>
> 10/18/2012 22:20:49 dispatching child to salvage volume 540007729...
> 10/18/2012 22:19:33 SYNC_ask: No response on circuit 'FSSYNC'
> 10/18/2012 22:19:33 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
>
> and from FileLog (this looks like I'm restoring from backups)
>
> Thu Oct 18 22:25:30 2012 FSYNC_com:  invalid protocol version (2574739029)
> Thu Oct 18 22:25:30 2012 FSYNC_com:  invalid protocol version (3774863615)
> Thu Oct 18 22:25:30 2012 FSYNC_com:  invalid protocol version (944130375)
> Thu Oct 18 22:25:30 2012 Volume 539458481 now offline, must be salvaged.
> Thu Oct 18 22:25:30 2012 Scheduling salvage for volume 539458481 on part /vicepb over SALVSYNC
> Thu Oct 18 22:25:31 2012 nUsers == 0, but header not on LRU
> Thu Oct 18 22:25:31 2012 SYNC_getCom:  error receiving command
> Thu Oct 18 22:25:31 2012 Scheduling salvage for volume 539894230 on part /vicepb over SALVSYNC
> Thu Oct 18 22:25:31 2012 FSYNC_com:  read failed; dropping connection (cnt=103291)
> Thu Oct 18 22:25:37 2012 FSYNC_com:  invalid protocol version (2023862981)
>
> I've checked; all my binaries are from my 1.6.1 build. What's going on?
>
> Jack Neely
>