It just seems like lazy code. You can see that, later, there is this:

{code}

        // Copy every token held by the current user, including the JobToken, into childUGI.
        for (Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
          childUGI.addToken(token);
        }

{code}

So eventually the JobToken gets added to the UGI that runs the task code.
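To make the token hand-off concrete, here is a toy model in plain Java. ToyUgi, Token, createChild and the "JOB_TOKEN" kind are hypothetical stand-ins, not Hadoop's UserGroupInformation or Token API. The sketch only assumes what the loop above does: the child UGI receives a copy of every token the current user holds.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, NOT Hadoop's API: a UGI reduced to a user name plus a token list. */
public class TokenPropagationDemo {

    // Hypothetical stand-in for org.apache.hadoop.security.token.Token<?>.
    record Token(String kind, String service) {}

    // Hypothetical stand-in for UserGroupInformation.
    static final class ToyUgi {
        final String userName;
        private final List<Token> tokens = new ArrayList<>();

        ToyUgi(String userName) {
            this.userName = userName;
        }

        void addToken(Token token) {
            tokens.add(token);
        }

        List<Token> getTokens() {
            return List.copyOf(tokens); // defensive copy, callers cannot mutate our list
        }
    }

    /** Creates a child UGI and copies every token the parent holds into it. */
    static ToyUgi createChild(ToyUgi parent, String childUser) {
        ToyUgi child = new ToyUgi(childUser);
        for (Token token : parent.getTokens()) {
            child.addToken(token);
        }
        return child;
    }

    public static void main(String[] args) {
        ToyUgi current = new ToyUgi("alice"); // hypothetical job user
        current.addToken(new Token("JOB_TOKEN", "job_201401071758_0002"));

        ToyUgi child = createChild(current, "alice");
        System.out.println(child.userName + " holds " + child.getTokens());
    }
}
```

The child is built fresh and only inherits tokens, so whatever user name it is created with is the identity the task code runs as.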

>  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002

This seems to be a problem. When the task tries to reach the NameNode, it
should do so as the user, not as the job ID. This is not just a logging issue;
I'd be surprised if jobs pass. Do you have permissions enabled on HDFS?

Oh, or is this in non-secure mode (i.e. without Kerberos)?
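A rough way to see why the NameNode ends up with the job ID: whichever identity wraps the RPC in doAs is the identity the server sees. The sketch below is a toy model under that assumption. The ThreadLocal-based doAs, nameNodeSeesCaller, the default "tasktracker" user and the user "alice" are all hypothetical; real UserGroupInformation wraps a JAAS Subject instead.

```java
import java.util.function.Supplier;

/**
 * Toy model, NOT Hadoop's API: the identity installed by doAs is the
 * identity a server sees for calls made inside the action.
 */
public class DoAsIdentityDemo {

    // Hypothetical per-thread "current user"; real UGI wraps a JAAS Subject instead.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "tasktracker");

    /** Runs action with user installed as the current user, then restores the previous one. */
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    /** Hypothetical server-side hook: reports which user the call arrived as. */
    static String nameNodeSeesCaller() {
        return CURRENT_USER.get();
    }

    public static void main(String[] args) {
        String jobId = "job_201401071758_0002";
        // Mirrors taskOwner = createRemoteUser(jobId): the caller becomes the job ID.
        System.out.println("as taskOwner: " + doAs(jobId, DoAsIdentityDemo::nameNodeSeesCaller));
        // The expected behaviour: the call arrives as the job's real user (name is hypothetical).
        System.out.println("as job user:  " + doAs("alice", DoAsIdentityDemo::nameNodeSeesCaller));
    }
}
```

In this model the first line reports the job ID and the second reports "alice". Group mapping on the server would then be asked about a user that does not exist on the host.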

+Vinod


On Jan 7, 2014, at 5:14 PM, Jian Fang <jian.fang.subscr...@gmail.com> wrote:

> Hi,
> 
> I looked at the Hadoop 1.x source code and found some logic that I could not
> understand.
> 
> In the org.apache.hadoop.mapred.Child class, there are two UGIs defined as
> follows:
> 
>     UserGroupInformation current = UserGroupInformation.getCurrentUser();
>     current.addToken(jt);
> 
>     UserGroupInformation taskOwner =
>         UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
>     taskOwner.addToken(jt);
> 
> But it is taskOwner that is actually passed as the UGI to the TaskTracker and
> then to HDFS. The first one is not referenced anywhere.
> 
>     final TaskUmbilicalProtocol umbilical = 
>       taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
>         @Override
>         public TaskUmbilicalProtocol run() throws Exception {
>           return (TaskUmbilicalProtocol) RPC.getProxy(TaskUmbilicalProtocol.class,
>               TaskUmbilicalProtocol.versionID,
>               address,
>               defaultConf);
>         }
>     });
> 
> What puzzles me is that the job ID is actually passed in as the user name to
> the TaskTracker. On the NameNode side, when it tries to map this nonexistent
> user name (i.e., the job ID) to a group, the lookup always returns an empty
> array. As a result, we constantly see annoying warning messages such as
> 
>  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002
> 
> Sometimes these warnings are logged so fast (hundreds or even thousands per
> second on a big cluster) that system performance degrades dramatically.
> 
> Could someone please explain why this logic was designed this way? Is there
> any benefit to using a nonexistent user for the group mapping? Or is this a bug?
> 
> Thanks in advance,
> 
> John
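The flood follows from the group lookup being repeated, and warned about, for every call from a user that has no groups. The toy model below uses hypothetical class and method names and is not Hadoop's Groups code. It counts one warning per failed lookup. It also adds an optional negative cache that remembers "no groups" results for a TTL. This is an assumed mitigation, not something taken from the Hadoop 1.x code discussed in this thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Toy model, NOT Hadoop's Groups code: every lookup for an unknown user
 * returns an empty list and logs a warning, so warnings scale with RPCs.
 * The optional negative cache (hypothetical) suppresses repeats for ttlMillis.
 */
public class GroupLookupDemo {

    private final Function<String, List<String>> backend; // e.g. an OS group lookup
    private final long ttlMillis;                          // 0 disables the negative cache
    private final Map<String, Long> negativeCache = new HashMap<>();
    int warnings = 0;

    GroupLookupDemo(Function<String, List<String>> backend, long ttlMillis) {
        this.backend = backend;
        this.ttlMillis = ttlMillis;
    }

    List<String> getGroups(String user, long nowMillis) {
        Long cachedAt = negativeCache.get(user);
        if (cachedAt != null && nowMillis - cachedAt < ttlMillis) {
            return List.of(); // recently seen with no groups: skip backend and warning
        }
        List<String> groups = backend.apply(user);
        if (groups.isEmpty()) {
            warnings++;
            System.err.println("WARN No groups available for user " + user);
            if (ttlMillis > 0) {
                negativeCache.put(user, nowMillis);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical backend: only "alice" exists as an OS user.
        Function<String, List<String>> os =
                u -> u.equals("alice") ? List.of("users") : List.of();
        String jobId = "job_201401071758_0002";

        GroupLookupDemo noCache = new GroupLookupDemo(os, 0);
        GroupLookupDemo withCache = new GroupLookupDemo(os, 5_000);
        for (int i = 0; i < 100; i++) {
            long now = i; // pretend one RPC per millisecond
            noCache.getGroups(jobId, now);
            withCache.getGroups(jobId, now);
        }
        System.out.println("warnings without cache: " + noCache.warnings);
        System.out.println("warnings with cache:    " + withCache.warnings);
    }
}
```

With one lookup per simulated millisecond for 100 ms, the uncached variant warns 100 times and the cached one warns once. The trade-off is that a user who gains a group is not noticed until the TTL expires.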

