[jira] [Commented] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969524#comment-15969524
 ] 

Michael Gummelt commented on MAPREDUCE-6876:


Yea, I really mean {{getSplits}}.

And that proposal sounds perfect.

> FileInputFormat.listStatus should not fetch delegation tokens
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-6876
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Michael Gummelt
>
> {{FileInputFormat.listStatus}} fetches delegation tokens: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213
> AFAICT, this is unnecessary.  {{listStatus}} doesn't delegate those tokens to 
> another process.  This is causing issues described in the attached Spark 
> Kerberos ticket, because {{TokenCache.obtainTokensForNameNodes}}, which is 
> used to fetch the delegation tokens, assumes that certain MapReduce 
> configuration variables are set, which isn't true in the Spark calling code.  
> This is a separate problem, but nonetheless it wouldn't have arisen if 
> {{listStatus}} weren't fetching delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969458#comment-15969458
 ] 

Michael Gummelt commented on MAPREDUCE-6876:


bq. The job submitting code does not know where the input lives nor how to grab 
tokens for it – that's the responsibility of the input format.

That's fine, but it should be factored out into a separate method that the job 
submission code can then delegate to.  {{listStatus}} does not require 
delegation tokens, so it shouldn't fetch delegation tokens.
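
One way to picture that factoring is the self-contained Java sketch below. It is not Hadoop code: {{SketchInputFormat}}, {{TokenStore}}, {{obtainTokens}} and the string "tokens" are hypothetical stand-ins, assumed only for illustration. The point is that listing has no credential side effects, while token acquisition is a separate method that only job submission code calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a credential store (not Hadoop's Credentials class). */
final class TokenStore {
    private final Set<String> tokens = new LinkedHashSet<>();

    void add(String token) {
        tokens.add(token);
    }

    int size() {
        return tokens.size();
    }
}

/** Hypothetical input format with listing and token acquisition kept apart. */
final class SketchInputFormat {
    private final List<String> inputPaths;

    SketchInputFormat(List<String> inputPaths) {
        this.inputPaths = new ArrayList<>(inputPaths);
    }

    /** Lists the inputs; deliberately touches no credentials. */
    List<String> listStatus() {
        return new ArrayList<>(inputPaths);
    }

    /** Separate entry point that job submission code can delegate to. */
    void obtainTokens(TokenStore store) {
        for (String path : inputPaths) {
            // One placeholder token per distinct filesystem (scheme + authority).
            store.add("token-for-" + filesystemOf(path));
        }
    }

    private static String filesystemOf(String path) {
        int schemeEnd = path.indexOf("://");
        if (schemeEnd < 0) {
            return "default";
        }
        int authorityEnd = path.indexOf('/', schemeEnd + 3);
        return authorityEnd < 0 ? path : path.substring(0, authorityEnd);
    }
}

final class TokenSplitDemo {
    public static void main(String[] args) {
        SketchInputFormat format = new SketchInputFormat(
                Arrays.asList("hdfs://nn1/data/a", "hdfs://nn1/data/b"));

        // A caller that only plans splits (the Spark case) stops here.
        System.out.println("inputs listed: " + format.listStatus().size());

        // Job submission code additionally asks for tokens to ship to tasks.
        TokenStore store = new TokenStore();
        format.obtainTokens(store);
        System.out.println("tokens obtained: " + store.size());
    }
}
```

With this shape the input format still knows where the input lives and how to get tokens for it, but a caller that never launches tasks is not forced through the token path.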




[jira] [Comment Edited] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969374#comment-15969374
 ] 

Michael Gummelt edited comment on MAPREDUCE-6876 at 4/14/17 6:48 PM:
-

bq. The input format must obtain the necessary tokens for the tasks to be able 
to access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just fetching split information.  It doesn't 
create tasks.  So it shouldn't need to fetch delegation tokens.  That should be 
the responsibility of the job submitting code. 

Client code that creates a {{FileInputFormat}} only to fetch split 
information, as we do in Spark, shouldn't need to fetch delegation tokens.

I'm not saying that delegation tokens aren't eventually needed for MapReduce 
jobs, it's just that this seems like the wrong place to fetch them.


was (Author: mgummelt):
bq. The input format must obtain the necessary tokens for the tasks to be able 
to access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just fetching split information.  It doesn't 
create tasks.  So it shouldn't need to fetch delegation tokens.  That should be 
the responsibility of the job submitting code. 

As it is, client code that is just creating a {{FileInputFormat}} in order to 
fetch split information, such as we do in Spark, wouldn't need to fetch 
delegation tokens.




[jira] [Comment Edited] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969374#comment-15969374
 ] 

Michael Gummelt edited comment on MAPREDUCE-6876 at 4/14/17 6:47 PM:
-

bq. The input format must obtain the necessary tokens for the tasks to be able 
to access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just fetching split information.  It doesn't 
create tasks.  So it shouldn't need to fetch delegation tokens.  That should be 
the responsibility of the job submitting code. 

As it is, client code that is just creating a {{FileInputFormat}} in order to 
fetch split information, such as we do in Spark, wouldn't need to fetch 
delegation tokens.


was (Author: mgummelt):
bq. The input format must obtain the necessary tokens for the tasks to be able 
to access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just return split information.  It don't create 
tasks.  So it shouldn't need to fetch delegation tokens.  That should be the 
responsibility of the job submitting code. 

As it is, client code that is just creating a {{FileInputFormat}} in order to 
fetch split information, such as we do in Spark, wouldn't need to fetch 
delegation tokens.




[jira] [Comment Edited] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969374#comment-15969374
 ] 

Michael Gummelt edited comment on MAPREDUCE-6876 at 4/14/17 6:42 PM:
-

bq. The input format must obtain the necessary tokens for the tasks to be able 
to access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just returning split information.  It doesn't 
create tasks.  So it shouldn't need to fetch delegation tokens.  That should be 
the responsibility of the job submitting code. 

As it is, client code that is just creating a {{FileInputFormat}} in order to 
fetch split information, such as we do in Spark, wouldn't need to fetch 
delegation tokens.


was (Author: mgummelt):
> The input format must obtain the necessary tokens for the tasks to be able to 
> access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just return split information.  It don't create 
tasks.  So it shouldn't need to fetch delegation tokens.  That should be the 
responsibility of the job submitting code. 

As it is, client code that is just creating a {{FileInputFormat}} in order to 
fetch split information, such as we do in Spark, wouldn't need to fetch 
delegation tokens.




[jira] [Commented] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969374#comment-15969374
 ] 

Michael Gummelt commented on MAPREDUCE-6876:


> The input format must obtain the necessary tokens for the tasks to be able to 
> access the input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just returning split information.  It doesn't 
create tasks.  So it shouldn't need to fetch delegation tokens.  That should be 
the responsibility of the job submitting code. 

As it is, client code that is just creating a {{FileInputFormat}} in order to 
fetch split information, such as we do in Spark, wouldn't need to fetch 
delegation tokens.




[jira] [Created] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens

2017-04-13 Thread Michael Gummelt (JIRA)
Michael Gummelt created MAPREDUCE-6876:
------------------------------------------

             Summary: FileInputFormat.listStatus should not fetch delegation tokens
                 Key: MAPREDUCE-6876
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Michael Gummelt


{{FileInputFormat.listStatus}} fetches delegation tokens: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213

AFAICT, this is unnecessary.  {{listStatus}} doesn't delegate those tokens to 
another process.  This is causing issues described in the attached Spark 
Kerberos ticket, because {{TokenCache.obtainTokensForNameNodes}}, which is used 
to fetch the delegation tokens, assumes that certain MapReduce configuration 
variables are set, which isn't true in the Spark calling code.  This is a 
separate problem, but nonetheless it wouldn't have arisen if {{listStatus}} 
weren't fetching delegation tokens.
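
The coupling can be illustrated with a small self-contained Java sketch. It does not use Hadoop: {{CoupledLister}}, the plain {{Map}} used as configuration, and the key {{example.renewer.principal}} are hypothetical names assumed for illustration, and the real checks inside {{TokenCache}} are not reproduced here. The sketch only shows the shape of the problem: a listing call that fetches a token as a side effect fails when a job-level setting is absent, even though the listing itself needs nothing from that setting.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CoupledLister {
    /** Hypothetical configuration key; stands in for a job-level setting. */
    static final String RENEWER_KEY = "example.renewer.principal";

    /** Stand-in token fetch that assumes the job configuration is populated. */
    static String obtainToken(Map<String, String> conf) throws IOException {
        String renewer = conf.get(RENEWER_KEY);
        if (renewer == null || renewer.isEmpty()) {
            throw new IOException("no renewer configured under " + RENEWER_KEY);
        }
        return "token-renewable-by-" + renewer;
    }

    /** Coupled variant: listing inherits every failure of the token fetch. */
    static List<String> listStatusWithTokens(Map<String, String> conf, List<String> paths)
            throws IOException {
        obtainToken(conf); // side effect that the listing itself does not use
        return paths;
    }

    /** Decoupled variant: listing depends only on the paths. */
    static List<String> listStatusOnly(List<String> paths) {
        return paths;
    }
}

final class CoupledListerDemo {
    public static void main(String[] args) {
        // A caller outside MapReduce job submission never sets the job-level key.
        Map<String, String> callerConf = new HashMap<>();
        List<String> paths = Arrays.asList("/data/part-0", "/data/part-1");

        try {
            CoupledLister.listStatusWithTokens(callerConf, paths);
        } catch (IOException e) {
            System.out.println("coupled listing failed: " + e.getMessage());
        }

        System.out.println("decoupled listing returned "
                + CoupledLister.listStatusOnly(paths).size() + " paths");
    }
}
```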


