Thanks, just checked that and it does seem renewable (tested using kinit -R). I'm running my code in two separate scenarios:
1) As part of a NiFi processor, which currently makes multiple Accumulo connections using the same keytab, each of which currently has a separate renewer thread
2) As part of a simple command line application - this seems to have no problem running for > 10 hours (even before I added the periodic renewal code)

Will add extra logging to #2 and try to shorten the expiry from 10 hours to 1 so I can see any difference in output.

James

On 13 July 2017 at 16:05, Josh Elser <[email protected]> wrote:
> It also may be worth mentioning to check the principal's configuration that
> you're using in your client. Depending on which you're using and how it was
> created, it may not actually support renewals.
>
> A quick test is to just `kinit` and then `kinit -R`. You can view the
> explicit "configuration" for a principal using the `kadmin` console and the
> `getprinc <principal>` command. Be sure to check the krbtgt/<REALM>
> principal as well:
>
> e.g.
>
> kadmin.local: getprinc jelser
> Principal: jelser@EXAMPLE.COM
> Maximum ticket life: 1 day 00:00:00
> Maximum renewable life: 7 days 00:00:00
>
> kadmin.local: getprinc krbtgt/EXAMPLE.COM
> Principal: krbtgt/EXAMPLE.COM@EXAMPLE.COM
> Maximum ticket life: 1 day 00:00:00
> Maximum renewable life: 7 days 00:00:00
>
> If the krbtgt/$REALM principal does not have a non-zero renewable lifetime,
> any other principals created in that realm would also not be allowed to be
> renewed. Since you have the working "service" principals, you can
> cross-check those.
>
> On 7/13/17 10:56 AM, James Srinivasan wrote:
>>
>> Yup, I am indeed on HDP - thanks for the link. The services do log GSS
>> exceptions every ten hours, but seem to sufficiently recover
>> themselves.
>> Having turned up logging on my client:
>>
>> 1) On client start, I see hadoop login messages
>> 2) After 8 hours (0.8*10 hours) when the renewal is expected to take
>> place, I don't see any hadoop login messages
>> 3) After 10 hours, I see GSS exceptions
>> 4) After each GSS exception, I see an attempt to renew but using
>> ticket cache, rather than keytab.
>>
>> Currently working on shortening the 10 hour expiry time so I can catch
>> it in a debugger!
>>
>> Thanks,
>>
>> James
>>
>> On 13 July 2017 at 15:20, Josh Elser <[email protected]> wrote:
>>>
>>> If you're using Hortonworks' HDP, you would probably benefit from
>>> https://github.com/hortonworks/accumulo
>>>
>>> There is likely a git-tag for the exact version that you're running. The
>>> line numbers would match there.
>>>
>>> To be clear, if your services (e.g. TabletServers) aren't failing after
>>> 10hrs, you're not running into ACCUMULO-4069. Given my (limited)
>>> understanding, your problem is purely client-side. It's possible that the
>>> client-side RPC implementation isn't correctly handling the ticket
>>> re-login, but I know there is specifically code in there to handle the
>>> re-login case.
>>>
>>> The next step would be getting some debug logging from your application
>>> around UserGroupInformation or the JDK itself, or just spin up a trivial
>>> example with a small relogin window to reproduce the problem.
>>>
>>> On 7/12/17 3:48 PM, James Srinivasan wrote:
>>>>
>>>> Yup, I'm going to spin up a vanilla 1.7.0 (maybe newer) install too to
>>>> see if it behaves any differently. There is at least one patch
>>>> included in their distro that isn't in the formal documentation, plus
>>>> it makes matching line numbers in logs to src code rather difficult.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>>
>>>> On 12 July 2017 at 20:37, Sean Busbey <[email protected]> wrote:
>>>>>
>>>>> Hi James!
>>>>>
>>>>> It sounds like you may need to chase things down with your vendor,
>>>>> since the precise combination of patches included will make looking at
>>>>> things hard for the community.
>>>>>
>>>>> On Wed, Jul 12, 2017 at 11:01 AM, James Srinivasan
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> So I've fired off a thread to perform the periodic
>>>>>> checkTGTAndReloginFromKeytab call which seems to be running, but the
>>>>>> connection still fails with GSS errors after precisely 10 hours.
>>>>>>
>>>>>> While I am running 1.7.0, it seems the vendor included the
>>>>>> ACCUMULO-4069 patch, and immediately after the exception is thrown I
>>>>>> see a log entry "Performing ticket-cache-based Kerberos re-login".
>>>>>> However, it should be using a keytab - have turned up the logging to
>>>>>> 11 and will leave running overnight...
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On 11 July 2017 at 16:17, Josh Elser <[email protected]> wrote:
>>>>>>>
>>>>>>> Nope, you've got it exactly right! That's the code I would've pointed
>>>>>>> you at to copy :)
>>>>>>>
>>>>>>> If/when you do get to long-running MR jobs, see the
>>>>>>> "general.delegation.token.*" configuration properties in this
>>>>>>> table[1]. I think the docs are citing that one delegation token is
>>>>>>> valid for 7 days, but it's been a long time since writing/testing
>>>>>>> that code.
>>>>>>>
>>>>>>> - Josh
>>>>>>>
>>>>>>> [1]
>>>>>>> https://accumulo.apache.org/1.8/accumulo_user_manual.html#_server_configuration_2
>>>>>>>
>>>>>>> On 7/11/17 1:25 AM, James Srinivasan wrote:
>>>>>>>>
>>>>>>>> Thanks both. I can't (easily) upgrade beyond 1.7.0, but have raised
>>>>>>>> a support case with our Hadoop distribution vendor.
>>>>>>>>
>>>>>>>> I'm not (yet) worried about expiration with MapReduce - for now I'll
>>>>>>>> try to keep such jobs to under 24h!
>>>>>>>> Outside MR, sounds like I just
>>>>>>>> need to periodically call
>>>>>>>> UserGroupInformation.checkTGTAndReloginFromKeytab like
>>>>>>>>
>>>>>>>> https://github.com/apache/accumulo/blob/master/server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java#L121
>>>>>>>>
>>>>>>>> Or is the TGT associated with an Accumulo KerberosToken separate?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> James
>>>>>>>>
>>>>>>>> On 11 July 2017 at 02:59, Josh Elser <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> No, you are (likely) not running into ACCUMULO-4069. What you've
>>>>>>>>> described sounds like your client's ticket expired. Accumulo does
>>>>>>>>> not spawn any ticket renewal on the behalf of clients.
>>>>>>>>>
>>>>>>>>> Hadoop's UGI code will automatically spawn a renewal thread when
>>>>>>>>> you log in using a ticket cache. This does not happen automatically
>>>>>>>>> when you use a keytab (I have no explanation as to why this is).
>>>>>>>>> This is the most likely cause of your error and something you need
>>>>>>>>> to correct in your application (spawn a thread to renew your
>>>>>>>>> application's ticket).
>>>>>>>>>
>>>>>>>>> If you are using MapReduce, you have yet another layer of
>>>>>>>>> indirection with DelegationTokens, but that's probably not what
>>>>>>>>> you're seeing (as DelegationTokens don't actually have a Kerberos
>>>>>>>>> TGT).
>>>>>>>>>
>>>>>>>>> On Mon, Jul 10, 2017 at 5:42 PM, Christopher <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> It certainly sounds like the same issue. I'd recommend upgrading
>>>>>>>>>> to the latest 1.7.3 (currently the latest 1.7 version) to include
>>>>>>>>>> all the bugs we've found and fixed in that release line.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 10, 2017 at 5:50 AM James Srinivasan
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I'm using Accumulo 1.7.0 and finding that after some period of
>>>>>>>>>>> time (>8 hours, <3 days - happened over the weekend) my ingest
>>>>>>>>>>> fails with errors regarding "Failed to find any Kerberos tgt". My
>>>>>>>>>>> guess is that the ticket from the keytab has expired, and needs
>>>>>>>>>>> to be renewed - from memory, I had seen a Kerberos tgt renewer
>>>>>>>>>>> thread running in my client, so assumed it happened
>>>>>>>>>>> automagically. Is that the case? Perhaps I am hitting this bug?
>>>>>>>>>>> https://issues.apache.org/jira/browse/ACCUMULO-4069
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> James
>>>>>
>>>>> --
>>>>> busbey
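
For reference, the client-side approach discussed in this thread (a background thread that periodically calls checkTGTAndReloginFromKeytab, since UGI does not spawn a renewal thread for keytab logins) could be sketched roughly as below. This is only a sketch under assumptions, not the code from SecurityUtil or from anyone's client: the class name KeytabRelogin and its methods are hypothetical, the relogin action is passed in as a plain Runnable so the example has no Hadoop dependency, and the 0.8 factor is simply the "0.8*10 hours" figure quoted above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names are illustrative and not part of Hadoop or Accumulo. */
final class KeytabRelogin {

    private KeytabRelogin() {}

    /**
     * Time at which a renewal would be expected, taken as 80% of the way
     * through the ticket lifetime (the 0.8 factor mentioned in the thread).
     */
    static long refreshTimeMillis(long startMillis, long endMillis) {
        return startMillis + (endMillis - startMillis) * 8 / 10;
    }

    /**
     * Wraps the relogin action so that one failed attempt is logged rather than
     * propagated; an uncaught exception would cancel all later scheduled runs.
     */
    static Runnable guarded(Runnable relogin) {
        return () -> {
            try {
                relogin.run();
            } catch (RuntimeException e) {
                System.err.println("Kerberos relogin attempt failed: " + e);
            }
        };
    }

    /**
     * Runs the relogin action repeatedly on a single daemon thread.
     * Assumption: in a real client the action would wrap something like
     *   UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
     * and rethrow its IOException as an unchecked exception.
     */
    static ScheduledExecutorService start(Runnable relogin, long period, TimeUnit unit) {
        ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "keytab-relogin");
                t.setDaemon(true); // do not keep the JVM alive just for renewals
                return t;
            });
        executor.scheduleWithFixedDelay(guarded(relogin), period, period, unit);
        return executor;
    }
}
```

With a 10 hour ticket, refreshTimeMillis(0, 36000000) gives 28800000 ms, i.e. the 8 hour mark noted above. The check is meant to be cheap to call when the ticket is not yet close to expiry, so a period well under the ticket lifetime (minutes rather than hours) is a reasonable starting point. Whether one scheduler per process is enough when several connections share the same keytab login is an assumption worth verifying in the NiFi case.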
