[ 
https://issues.apache.org/jira/browse/YARN-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780172#comment-16780172
 ] 

Tarun Parimi commented on YARN-9224:
------------------------------------

[~rohithsharma], I attached a new patch that synchronizes the block in the 
getCachedTimelineClient method. I tested it on my local cluster by submitting 
around 10 Distributed Shell jobs/sec and haven't seen any errors so far. I also 
tested with the following sample code, which calls renew for multiple UGIs in 
separate threads. 

{code:java}
    // numberOfUsers and getUsers(...) are helpers that are not shown here.
    UserGroupInformation.loginUserFromKeytab("[email protected]",
        "/etc/security/keytabs/proxyuser.headless.keytab");
    final Configuration conf = new Configuration();
    UserGroupInformation proxyuser = UserGroupInformation.getCurrentUser();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();

    // Fetch one delegation token per user, keyed by that user's proxy UGI.
    Set<String> userSet = getUsers(numberOfUsers);
    final Map<UserGroupInformation, Token<TimelineDelegationTokenIdentifier>>
        tokens = new HashMap<>();
    for (String renewer : userSet) {
      tokens.put(UserGroupInformation.createProxyUser(renewer, proxyuser),
          client.getDelegationToken(renewer));
    }

    ExecutorService service = Executors.newFixedThreadPool(numberOfUsers);
    // All tasks wait on this latch so the renew calls start together.
    final CountDownLatch latch = new CountDownLatch(1);
    for (int i = 0; i < 2; i++) {
      for (final UserGroupInformation renewerUgi : tokens.keySet()) {
        // Note: an exception thrown inside run() is captured by the Future
        // that submit() returns; it is only visible through Future.get().
        service.submit(new Runnable() {
          @Override public void run() {
            try {
              latch.await();
              renewerUgi.doAs(new PrivilegedExceptionAction<Long>() {
                @Override public Long run() throws Exception {
                  return tokens.get(renewerUgi).renew(conf);
                }
              });
            } catch (Exception e) {
              throw new RuntimeException(e);
            }
          }
        });
      }
    }
    latch.countDown();
    service.shutdown();
    service.awaitTermination(1000, TimeUnit.SECONDS);
    client.stop();
{code}
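
For readers without the patch at hand, the idea of a synchronized cached 
client can be sketched roughly as below. This is only an illustration and not 
the contents of the attached patch: FakeClient, ClientCache, getCachedClient 
and runConcurrentRenewals are hypothetical names, the cache is assumed to be 
keyed by user name, and no Hadoop classes are used. The harness collects the 
Futures and calls get() on each, so a failure in any task is rethrown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CachedClientSketch {

  /** Hypothetical stand-in for an expensive client (not a Hadoop class). */
  static final class FakeClient {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String user;

    FakeClient(String user) {
      this.user = user;
      CREATED.incrementAndGet();  // count how many clients get built
    }

    long renew() {
      return System.currentTimeMillis();
    }
  }

  /** Hypothetical cache: at most one client per user, guarded by a lock. */
  static final class ClientCache {
    private final Map<String, FakeClient> clients = new HashMap<>();

    FakeClient getCachedClient(String user) {
      // The lookup and the insert happen under one lock, so two threads
      // asking for the same user cannot both create a client.
      synchronized (clients) {
        FakeClient c = clients.get(user);
        if (c == null) {
          c = new FakeClient(user);
          clients.put(user, c);
        }
        return c;
      }
    }
  }

  /** Runs rounds * users renewals concurrently; returns clients created. */
  static int runConcurrentRenewals(int users, int rounds) {
    FakeClient.CREATED.set(0);
    ClientCache cache = new ClientCache();
    ExecutorService pool = Executors.newFixedThreadPool(users);
    CountDownLatch start = new CountDownLatch(1);
    List<Future<Long>> results = new ArrayList<>();
    try {
      for (int r = 0; r < rounds; r++) {
        for (int u = 0; u < users; u++) {
          final String user = "user" + u;
          results.add(pool.submit(() -> {
            start.await();  // hold tasks until everything is submitted
            return cache.getCachedClient(user).renew();
          }));
        }
      }
      start.countDown();
      for (Future<Long> f : results) {
        f.get(30, TimeUnit.SECONDS);  // rethrows a failure from the task
      }
      return FakeClient.CREATED.get();
    } catch (Exception e) {
      throw new IllegalStateException("concurrent renewal failed", e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("clients created: " + runConcurrentRenewals(8, 2));
  }
}
```

With the lock in place, the number of clients created should equal the number 
of distinct users regardless of how many renewals run at once.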



> TimelineDelegationTokenIdentifier.Renewer contacts KDC for every renew/cancel 
> token operation
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-9224
>                 URL: https://issues.apache.org/jira/browse/YARN-9224
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.3
>            Reporter: Tarun Parimi
>            Priority: Major
>         Attachments: YARN-9224.001.patch, YARN-9224.002.patch, 
> YARN-9224.003.patch
>
>
> In a production cluster, we observed the active RM principal making 
> excessive requests to the KDC server. This should not normally happen for a 
> service principal.
> A tcpdump of the connections between the RM and the KDC showed that these 
> excessive requests were for the SPNEGO service HTTP/ats-host.example.com.
> The frequency of the requests also matched the log entry below in the RM.
> {code:java}
> 2019-01-09T03:41:56.048-0500 INFO 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl: Timeline service 
> address: http://ats-host.example.com:8188/ws/v1/timeline/ 
> {code}
> Looking at the code in TimelineDelegationTokenIdentifier.java, it seems this 
> KDC request for SPNEGO is made because a new timeline client instance is 
> created on every renew and cancel call.
> {code:java}
>     @SuppressWarnings("unchecked")
>     @Override
>     public long renew(Token<?> token, Configuration conf) throws IOException,
>         InterruptedException {
>       TimelineClient client = TimelineClient.createTimelineClient();
>       try {
>         client.init(conf);
>         client.start();
>         return client.renewDelegationToken(
>             (Token<TimelineDelegationTokenIdentifier>) token);
>       } catch (YarnException e) {
>         throw new IOException(e);
>       } finally {
>         client.stop();
>       }
>     }
>     @SuppressWarnings("unchecked")
>     @Override
>     public void cancel(Token<?> token, Configuration conf) throws IOException,
>         InterruptedException {
>       TimelineClient client = TimelineClient.createTimelineClient();
>       try {
>         client.init(conf);
>         client.start();
>         client.cancelDelegationToken(
>             (Token<TimelineDelegationTokenIdentifier>) token);
>       } catch (YarnException e) {
>         throw new IOException(e);
>       } finally {
>         client.stop();
>       }
>     }
> {code}


