Alternatively, you can schedule a cron job to run kinit every 20 hours or so,
just to renew the ticket before it expires.
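
A rough sketch of the same idea as a long-running loop instead of a crontab entry. The keytab path, principal, 24-hour ticket lifetime and the function names are all hypothetical. `kinit -k -t <keytab> <principal>` obtains a fresh ticket from a keytab (MIT Kerberos; an existing renewable ticket could instead be extended with `kinit -R`). This refreshes the local Kerberos ticket cache only. It does not touch HDFS delegation tokens.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence

# Hypothetical values: replace with the real keytab path and principal.
KEYTAB = "/etc/security/keytabs/user.keytab"
PRINCIPAL = "user@EXAMPLE.COM"
# 20 hours, leaving a margin inside an assumed 24-hour ticket lifetime.
RENEW_EVERY_S = 20 * 60 * 60


def kinit_command(keytab: str, principal: str) -> List[str]:
    """MIT Kerberos kinit: -k reads the key from a keytab, -t names the keytab file."""
    return ["kinit", "-k", "-t", keytab, principal]


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def renew_loop(
    run: Callable[[Sequence[str]], int] = run_subprocess,
    sleep: Callable[[float], None] = time.sleep,
    iterations: Optional[int] = None,
) -> int:
    """Run kinit, sleep, repeat; return how many kinit runs failed.

    iterations=None loops until the process is interrupted.
    """
    failures = 0
    done = 0
    while iterations is None or done < iterations:
        if run(kinit_command(KEYTAB, PRINCIPAL)) != 0:
            failures += 1
        done += 1
        sleep(RENEW_EVERY_S)
    return failures
```

Calling `renew_loop()` with no arguments runs kinit roughly every 20 hours until interrupted; the `run` and `sleep` parameters are injectable so the loop can be exercised without a KDC. A crontab hour field such as `*/20` fires at 00:00 and 20:00 each day, so a sleep loop is one way to get an actual 20-hour spacing.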

On Mon, Dec 2, 2013 at 9:12 AM, Rainer Toebbicke <[email protected]> wrote:

> Hello,
>
> I am trying to understand why my long-running MapReduce jobs stop after
> approximately 24 hours on a secure cluster.
> This is on Cloudera CDH 4.3.0, hence Hadoop 2.0.0, using MRv1 (not YARN),
> with authentication set to "kerberos". Testing with a short-lived Kerberos
> ticket (1h), I see that it gets renewed regularly. Still, the job crashes
> after 24 hours because the delegation token expires.
> On a test cluster with increased logging and a shortened
> dfs.namenode.delegation.token.renew-interval (for quicker debugging), I see
> that an immediate renewal of the delegation token fails, and then, after the
> expiry period, the NameNode log starts getting flooded.
> Detail:
> 2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for [email protected] (auth:TOKEN)
> 2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for [email protected] (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for mapred/[email protected] (auth:SIMPLE)
> 2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for mapred/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke
> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/[email protected] (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
>
> Is this as unhealthy as it looks? If the first (immediate) renewal fails, I
> assume later ones will share the same fate. Would that explain the 24-hour
> lifetime on the "real" cluster, and what could be the reason? Where does
> "nobody" come into play here?
>
> In any case, whether related or not, after
> dfs.namenode.delegation.token.renew-interval ms the following is logged a
> zillion times:
> 2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
> 2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>         ... 6 more
>
> Any ideas?
> Rainer
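
The rejected renewal in the first log excerpt reads like a renewer check: a delegation token records, when it is issued, which user may renew it, and the NameNode refuses a caller whose short name differs. A minimal model of that check, with hypothetical names (this is not Hadoop's API) and assuming a principal's short name is simply its first component, which simplifies Hadoop's `hadoop.security.auth_to_local` mapping:

```python
from dataclasses import dataclass


class AccessControlError(Exception):
    """Hypothetical stand-in for Hadoop's AccessControlException."""


@dataclass
class TokenIdentifier:
    owner: str    # user the token was issued for
    renewer: str  # short name allowed to renew it, fixed at issue time


def short_name(principal: str) -> str:
    """'mapred/host@REALM' -> 'mapred'.

    Simplification: keeps only the first component of the principal instead of
    applying auth_to_local rules.
    """
    return principal.split("@", 1)[0].split("/", 1)[0]


def check_renewer(token: TokenIdentifier, caller_principal: str) -> None:
    """Raise unless the caller's short name matches the token's renewer."""
    caller = short_name(caller_principal)
    if caller != token.renewer:
        raise AccessControlError(
            f"Client {caller} tries to renew a token "
            f"with renewer specified as {token.renewer}"
        )
```

With a token whose renewer is `nobody`, a renewal attempted as a hypothetical `mapred/jt.example.com@EXAMPLE.COM` is rejected with the same wording as the log. The model does not say why the token carried that renewer in the first place.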

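The roughly 24-hour lifetime matches the default of dfs.namenode.delegation.token.renew-interval (86400000 ms, i.e. 24 hours); the default dfs.namenode.delegation.token.max-lifetime is 604800000 ms (7 days). A simplified model of how expiry evolves, assuming those defaults and hypothetical names (not Hadoop's code): a token expires one renew interval after issue unless renewed, and each successful renewal moves expiry to one interval after the renewal, capped at the max lifetime.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000
RENEW_INTERVAL_MS = 24 * HOUR_MS    # default renew-interval: 86400000 ms
MAX_LIFETIME_MS = 7 * 24 * HOUR_MS  # default max-lifetime: 604800000 ms


class InvalidToken(Exception):
    """Hypothetical stand-in for SecretManager.InvalidToken."""


@dataclass
class DelegationToken:
    issue_ms: int
    expiry_ms: int    # token is rejected after this instant
    max_date_ms: int  # hard cap, no renewal goes past it


def issue(now_ms: int) -> DelegationToken:
    return DelegationToken(now_ms, now_ms + RENEW_INTERVAL_MS, now_ms + MAX_LIFETIME_MS)


def renew(tok: DelegationToken, now_ms: int) -> None:
    if now_ms > tok.max_date_ms:
        raise InvalidToken("cannot renew beyond max lifetime")
    # Extend by one interval from now, but never past the max lifetime.
    tok.expiry_ms = min(tok.max_date_ms, now_ms + RENEW_INTERVAL_MS)


def is_valid(tok: DelegationToken, now_ms: int) -> bool:
    return now_ms <= tok.expiry_ms
```

If every renewal is rejected, a token in this model is usable only until `issue_ms + RENEW_INTERVAL_MS`, which would be consistent with jobs dying after about 24 hours; renewed every 20 hours, it would last until the 7-day cap.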