On Mon, 13 Jan 2014 15:00:41 +0100 (CET) Harald Barth <[email protected]> wrote:
> After that: This seems not to be self-healing either. On the sync site:
>
> Fri Jan 3 13:58:44 2014 assuming distant vote time 19270408 from
> 130.237.234.43 is an error; marking host down
> Mon Jan 13 14:48:42 2014 ubik: A Remote Server has addresses:
>
> Looks like I have to restart the server on the syncsite as well (so it
> forgets the bad vote time). And I'm not sure what 19270408 actually
> means. 223 days ago?

Sorry to further hijack Timothy's thread, but I guess he's not using it
anyway :)

19270408 is an error code, as I had intended that log message to
indicate. The error is:

$ translate_et 19270408
19270408 (rxk).8 = ticket contained unknown key version number

In that particular part of the ubik protocol, error codes are
indistinguishable from vote timestamps. A somewhat recent change
(v1.6.3) was done to provide a heuristic to see if something "looks
like" a timestamp or an error code, and to treat it accordingly. That
certainly does look like an error.

Before that change was introduced, the behavior in ubik was indeed
rather puzzling, since we would seem to not elect a sync site (since the
quorum immediately expires), even though all of the hosts are up and
reachable. That's probably not helping any confusion if such a version
is relevant at all.

The obvious question is whether you are changing your keys or something
while the processes are running, but I assume you think you're not doing
that. But it is possible to try monitoring it locally on each machine,
to see if the KeyFile/rxkad.keytab is changing or something. If we had
better logging, you could see what kvnos are actually in play.

--
Andrew Deason
[email protected]
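
For illustration only, here is a rough Python sketch of why 19270408 reads as an error code rather than a vote time. It assumes the usual com_err layout, where the low 8 bits hold the offset within an error table, and it uses a made-up plausibility check (`looks_like_vote_time`, with an arbitrary one-hour window) that is not the actual heuristic in ubik.

```python
import calendar

# Assumption: com_err-style codes keep the per-table offset in the low 8 bits.
ERRCODE_RANGE_BITS = 8


def split_com_err(code):
    """Split a com_err-style code into (table base, offset within the table)."""
    offset = code & ((1 << ERRCODE_RANGE_BITS) - 1)
    return code - offset, offset


def looks_like_vote_time(value, now, window=3600):
    """Hypothetical check: accept a value as a vote time only if it is
    within `window` seconds of `now`. The window is an arbitrary choice
    for this sketch, not the value ubik uses."""
    return abs(now - value) <= window


value = 19270408
base, offset = split_com_err(value)
print(base, offset)  # 19270400 8, matching "(rxk).8" from translate_et

# Roughly when this message was written (13 Jan 2014 14:00:41 UTC).
now = calendar.timegm((2014, 1, 13, 14, 0, 41))
print(looks_like_vote_time(value, now))    # False: decades away from "now"
print(looks_like_vote_time(now - 5, now))  # True: a vote from 5 seconds ago
```

Read as seconds, 19270408 is about 223 days, which is where the "223 days ago?" guess comes from; read as a timestamp it would land in 1970, so any check along these lines rejects it.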
