On Fri, Apr 16, 2010 at 12:19 PM, Marcus Watts <[email protected]> wrote:
> Derrick Brashear <[email protected]> sent:
>
>> Date:    Thu, 15 Apr 2010 23:02:33 EDT
>> To:      Russ Allbery <[email protected]>
>> cc:      [email protected]
>> From:    Derrick Brashear <[email protected]>
>> Subject: Re: [OpenAFS] Re: Ubik problem
>>
>> On Thu, Apr 15, 2010 at 9:13 PM, Russ Allbery <[email protected]> wrote:
>> > Andrew Deason <[email protected]> writes:
>> >> Atro Tossavainen <[email protected]> wrote:
>> >
>> >>> Derrick,
>> >>>
>> >>> > I'd suggest just using the IBM binary for the kaserver (and only the
>> >>> > kaserver) in your OpenAFS installation
>> >>>
>> >>> That's an interesting thought, but unfortunately it's nowhere near
>> >>> an option.  sunx86_ is quite simply not a supported platform for
>> >>> IBM AFS at all, even at 3.6 Patch 19 (August 2009).
>> >
>> >> Older OpenAFS releases could be another option, but I don't know how
>> >> useful of an answer that is. I'm not sure what could have caused that,
>> >> so I don't have a particular range in mind; maybe just earlier 1.4...
>> >> 1.4.9? 1.4.2?
>> >
>> > We were successfully running a 1.2.x version of kaserver on SPARC Solaris,
>> > and upgrading to 1.4.2 on Linux failed (albeit with different symptoms; it
>> > would just stop successfully giving out tickets for a while and then come
>> > back, regularly), so we stuck with 1.2.x on SPARC until we turned it off
>> > entirely.
>>
>> I'm pretty sure it "broke" between 1.2.11 and 1.4.1.
>>
>> -- 
>> Derrick
>
> Gah.  You made me drag out my kaserver notes!  Worse!  You made me
> *run* the thing!  Bad!  Bad!
>
> "broke" is a pretty vague description, so...
>
> From the previous descriptions, it sounds like there might be ubik sync issues.

That's not what I was referring to. I think it's between ubik database
reads and the clients.

> That could be caused either by problems in ubik, or unrelated problems
> that cause server crashes.  The reports do not include notes on any resulting
> core dumps, and the ubik problem reports clearly indicate another serious
> problem with server address determination.
>
> I experimented with building a version of 1.2.11, running it and using some
> of the diagnostic tools, followed by trying to run the resulting database with
> 1.4.12.  I certainly didn't thoroughly explore things.  I now have an interesting
> list of "problems".
>
> /1/ ubik_hdr.size got changed to be a short, not a long.  ntohl is wrong.  This
>        is in ubik proper as well as kaserver diagnostics.  Fortunately, this
>        doesn't seem to break too much.
> /2/ udebug address output byte swap issues.  Previously mentioned as fixed.
> /3/ kadb_check complains about a lot of stuff, and the output does not
>        make much sense.  A lot of this looks like endian issues, but
>        also I think this tool probably started as a temporary hack and
>        never well cleaned up.  The output was probably never really
>        "clean" in the first place.
> /4/ I never got kaserver to core dump (granted, I'm not pushing it real hard.)
>
> I think at least in some basic way, the kaserver in 1.4.12 still "works".
> So I am still curious as to what Derrick meant by "broke".
>
> possible generic action items,
> /1/ fix uhdr.size usage issues. (ntohs/htons not ntohl/htonl).
> /2/ fix kadb_check to produce correct output.  Should match on little
>        and big-endian machines.
> /3/ fix kadb_check to produce "better" output?
>
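
To illustrate action item /1/ above (ntohs/htons rather than ntohl/htonl for
a 16-bit field), here is a minimal sketch. It does not use the real ubik
header layout: HEADER_SIZE and the helper names are hypothetical, and the
byte swaps are written out by hand to mimic what ntohs and ntohl do on a
little-endian host, so the result is the same on whatever machine runs it.

```python
import struct

# Hypothetical value stored in a 16-bit "size" field; not taken from ubik.
HEADER_SIZE = 64


def raw_field_on_little_endian_host(value):
    """What a little-endian host loads from a 16-bit field stored in network order."""
    return struct.unpack("<H", struct.pack(">H", value))[0]


def ntohs_le(x):
    """16-bit byte swap: the effect of ntohs on a little-endian host."""
    return struct.unpack("<H", struct.pack(">H", x & 0xFFFF))[0]


def ntohl_le(x):
    """32-bit byte swap: the effect of ntohl on a little-endian host."""
    return struct.unpack("<I", struct.pack(">I", x & 0xFFFFFFFF))[0]


raw = raw_field_on_little_endian_host(HEADER_SIZE)  # 0x4000
right = ntohs_le(raw)                               # 16-bit swap recovers 64
wrong = ntohl_le(raw)                               # 32-bit swap gives 0x00400000
wrong_truncated = wrong & 0xFFFF                    # stored back into a short: 0

print(hex(raw), right, hex(wrong), wrong_truncated)
```

On a big-endian host both conversions are no-ops, so a mismatch of this kind
would only show up on little-endian machines.
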
> For Atro Tossavainen, I think my recommendations are:
> /1/ can he only run one source version of kaserver on all db hosts (not a
>        mixed ibm/openafs env),
> /2/ can he resolve the server setup such that when udebug is
>        run, it only reports "correct" IP addresses?  (Ideally only
>        the primary, but the other interfaces should be ok so long
>        as packets sent through them get to the same place.)
> /3/ can he resolve time so that he never sees "last beacon sent -3 secs ago"?,
>        ubik does care, even more than kerberos, about time.
> /4/ can he resolve his keyfile reference such that he never gets
>        "unknown key version number"?
>        (My suspicion, he's got path issues between differently built binaries.)

no, because i suspect 4 is the "real issue"


-- 
Derrick
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
