Thanks, just checked that and it does seem renewable (tested using
kinit -R). I'm running my code in two separate scenarios:

1) As part of a NiFi processor, which makes multiple Accumulo
connections using the same keytab, each of which currently has a
separate renewer thread
2) As part of a simple command line application - this seems to have
no problem running for > 10 hours (even before I added the periodic
renewal code)

Will add extra logging to #2 and try to shorten the expiry from 10
hours to 1 so I can see any difference in output.
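
For reference, a minimal sketch of what a periodic renewal loop with
extra logging might look like. PeriodicRelogin, ReloginCheck, guarded
and start are hypothetical names for illustration only, not part of
Accumulo or Hadoop. The ReloginCheck body stands in for the real call
(UserGroupInformation.checkTGTAndReloginFromKeytab, as discussed
further down the thread) so the sketch compiles without Hadoop on the
classpath.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; not part of Accumulo or Hadoop. */
final class PeriodicRelogin {
    private static final Logger LOG = Logger.getLogger(PeriodicRelogin.class.getName());

    /** Stand-in for the real relogin call, which may throw a checked exception. */
    interface ReloginCheck {
        void run() throws Exception;
    }

    /** Wraps the check so each attempt is logged and no Exception escapes. */
    static Runnable guarded(ReloginCheck check) {
        return () -> {
            try {
                LOG.info("Checking TGT / attempting relogin from keytab");
                check.run();
                LOG.info("Relogin check completed");
            } catch (Exception e) {
                // An exception escaping a scheduleAtFixedRate task suppresses all
                // later runs, so log it here and let the next period retry.
                LOG.log(Level.WARNING, "Relogin check failed; will retry next period", e);
            }
        };
    }

    /** Starts one daemon thread that runs the guarded check at a fixed period. */
    static ScheduledExecutorService start(ReloginCheck check, long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(guarded(check), period, period, unit);
        return ses;
    }
}
```

In a real client the check passed to start() would presumably be
something like
() -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(),
with a period well below the ticket lifetime. The try/catch matters
because a scheduled task that throws once is never run again, which
would look like a renewer thread that "seems to be running" but has
silently stopped.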

James

On 13 July 2017 at 16:05, Josh Elser <[email protected]> wrote:
> It may also be worth checking the configuration of the principal that
> you're using in your client. Depending on which one you're using and how it
> was created, it may not actually support renewals.
>
> A quick test is to just `kinit` and then `kinit -R`. You can view the
> explicit "configuration" for a principal using the `kadmin` console and the
> `getprinc <principal>` command. Be sure to check the krbtgt/<REALM>
> principal as well:
>
> e.g.
>
> kadmin.local:  getprinc jelser
> Principal: jelser@EXAMPLE.COM
> Maximum ticket life: 1 day 00:00:00
> Maximum renewable life: 7 days 00:00:00
>
> kadmin.local:  getprinc krbtgt/EXAMPLE.COM
> Principal: krbtgt/EXAMPLE.COM@EXAMPLE.COM
> Maximum ticket life: 1 day 00:00:00
> Maximum renewable life: 7 days 00:00:00
>
> If the krbtgt/$REALM principal does not have a non-zero renewable lifetime,
> any other principals created in that realm would also not be allowed to be
> renewed. Since you have the working "service" principals, you can
> cross-check those.
>
> On 7/13/17 10:56 AM, James Srinivasan wrote:
>>
>> Yup, I am indeed on HDP - thanks for the link. The services do log GSS
>> exceptions every ten hours, but seem to sufficiently recover
>> themselves. Having turned up logging on my client:
>>
>> 1) On client start, I see hadoop login messages
>> 2) After 8 hours (0.8*10 hours) when the renewal is expected to take
>> place, I don't see any hadoop login messages
>> 3) After 10 hours, I see GSS exceptions
>> 4) After each GSS exception, I see an attempt to renew but using
>> ticket cache, rather than keytab.
>>
>> Currently working on shortening the 10 hour expiry time so I can catch
>> it in a debugger!
>>
>> Thanks,
>>
>> James
>>
>>
>> On 13 July 2017 at 15:20, Josh Elser <[email protected]> wrote:
>>>
>>> If you're using Hortonworks' HDP, you would probably benefit from
>>> https://github.com/hortonworks/accumulo
>>>
>>> There is likely a git-tag for the exact version that you're running. The
>>> line numbers would match there.
>>>
>>> To be clear, if your services (e.g. TabletServers) aren't failing after
>>> 10hrs, you're not running into ACCUMULO-4069. Given my (limited)
>>> understanding, your problem is purely client-side. It's possible that the
>>> client-side RPC implementation isn't correctly handling the ticket
>>> re-login, but I know there is specifically code in there to handle the
>>> re-login case.
>>>
>>> The next step would be getting some debug logging from your application
>>> around UserGroupInformation or the JDK itself, or just spin up a trivial
>>> example with a small relogin window to reproduce the problem.
>>>
>>> On 7/12/17 3:48 PM, James Srinivasan wrote:
>>>>
>>>>
>>>> Yup, I'm going to spin up a vanilla 1.7.0 (maybe newer) install too to
>>>> see if it behaves any differently. There is at least one patch
>>>> included in their distro that isn't in the formal documentation, plus
>>>> it makes matching line numbers in logs to src code rather difficult.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>>
>>>> On 12 July 2017 at 20:37, Sean Busbey <[email protected]> wrote:
>>>>>
>>>>>
>>>>> Hi James!
>>>>>
>>>>> It sounds like you may need to chase things down with your vendor,
>>>>> since the precise combination of patches included will make looking at
>>>>> things hard for the community.
>>>>>
>>>>> On Wed, Jul 12, 2017 at 11:01 AM, James Srinivasan
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> So I've fired off a thread to perform the periodic
>>>>>> checkTGTAndReloginFromKeytab call which seems to be running, but the
>>>>>> connection still fails with GSS errors after precisely 10 hours.
>>>>>>
>>>>>> While I am running 1.7.0, it seems the vendor included the
>>>>>> ACCUMULO-4069 patch, and immediately after the exception is thrown I
>>>>>> see a log entry "Performing ticket-cache-based Kerberos re-login".
>>>>>> However, it should be using a keytab - have turned up the logging to
>>>>>> 11 and will leave running overnight...
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On 11 July 2017 at 16:17, Josh Elser <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Nope, you've got it exactly right! That's the code I would've pointed
>>>>>>> you at to copy :)
>>>>>>>
>>>>>>> If/when you do get to long-running MR jobs, see the
>>>>>>> "general.delegation.token.*" configuration properties in this
>>>>>>> table[1]. I think the docs are citing that one delegation token is
>>>>>>> valid for 7 days, but it's been a long time since writing/testing
>>>>>>> that code.
>>>>>>>
>>>>>>> - Josh
>>>>>>>
>>>>>>> [1] https://accumulo.apache.org/1.8/accumulo_user_manual.html#_server_configuration_2
>>>>>>>
>>>>>>> On 7/11/17 1:25 AM, James Srinivasan wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks both. I can't (easily) upgrade beyond 1.7.0, but have raised a
>>>>>>>> support case with our Hadoop distribution vendor.
>>>>>>>>
>>>>>>>> I'm not (yet) worried about expiration with MapReduce - for now I'll
>>>>>>>> try to keep such jobs to under 24h! Outside MR, sounds like I just
>>>>>>>> need to periodically call
>>>>>>>> UserGroupInformation.checkTGTAndReloginFromKeytab like
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/apache/accumulo/blob/master/server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java#L121
>>>>>>>>
>>>>>>>> Or is the TGT associated with an Accumulo KerberosToken separate?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> James
>>>>>>>>
>>>>>>>> On 11 July 2017 at 02:59, Josh Elser <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, you are (likely) not running into ACCUMULO-4069. What you've
>>>>>>>>> described sounds like your client's ticket expired. Accumulo does
>>>>>>>>> not spawn any ticket renewal on the behalf of clients.
>>>>>>>>>
>>>>>>>>> Hadoop's UGI code will automatically spawn a renewal thread when
>>>>>>>>> you log in using a ticket cache. This does not happen
>>>>>>>>> automatically when you use a keytab (I have no explanation as to
>>>>>>>>> why this is). This is the most likely cause of your error and
>>>>>>>>> something you need to correct in your application (spawn a thread
>>>>>>>>> to renew your application's ticket).
>>>>>>>>>
>>>>>>>>> If you are using MapReduce, you have yet another layer of
>>>>>>>>> indirection with DelegationTokens, but that's probably not what
>>>>>>>>> you're seeing (as DelegationTokens don't actually have a Kerberos
>>>>>>>>> TGT).
>>>>>>>>>
>>>>>>>>> On Mon, Jul 10, 2017 at 5:42 PM, Christopher <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It certainly sounds like the same issue. I'd recommend upgrading
>>>>>>>>>> to 1.7.3 (currently the latest 1.7 version) to pick up the fixes
>>>>>>>>>> for all the bugs we've found in that release line.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 10, 2017 at 5:50 AM James Srinivasan
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm using Accumulo 1.7.0 and finding that after some period of
>>>>>>>>>>> time (>8 hours, <3 days - happened over the weekend) my ingest
>>>>>>>>>>> fails with errors regarding "Failed to find any Kerberos tgt".
>>>>>>>>>>> My guess is that the ticket from the keytab has expired, and
>>>>>>>>>>> needs to be renewed - from memory, I had seen a Kerberos tgt
>>>>>>>>>>> renewer thread running in my client, so assumed it happened
>>>>>>>>>>> automagically. Is that the case? Perhaps I am hitting this bug?
>>>>>>>>>>> https://issues.apache.org/jira/browse/ACCUMULO-4069
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> James
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> busbey
