[jira] [Commented] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.

2016-06-26 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350017#comment-15350017
 ] 

zhihai xu commented on MAPREDUCE-6727:
--

I uploaded a patch, MAPREDUCE-6727.000.patch, which adds a configuration 
"mapreduce.job.input.size.limit" to limit the input size of the MapReduce job. 
The default value is -1, which means no limit.
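
As an illustration only (not the contents of MAPREDUCE-6727.000.patch), a minimal sketch of how a submission-time check against such a limit could look; the property name comes from the comment above, while the helper class and method names are hypothetical:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputSizeLimitCheck {
  // Hypothetical helper: sums the length of all files under the input paths
  // and fails fast when the configured limit (in bytes) is exceeded.
  // A negative limit (the default, -1) disables the check.
  static void checkInputSizeLimit(Configuration conf, Path[] inputPaths)
      throws IOException {
    long limit = conf.getLong("mapreduce.job.input.size.limit", -1);
    if (limit < 0) {
      return; // no limit configured
    }
    long total = 0;
    for (Path p : inputPaths) {
      FileSystem fs = p.getFileSystem(conf);
      for (FileStatus status : fs.listStatus(p)) {
        total += status.getLen();
      }
    }
    if (total > limit) {
      throw new IOException("Total input size " + total
          + " exceeds mapreduce.job.input.size.limit=" + limit);
    }
  }
}
{code}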

> Add a configuration to limit the input size of the MapReduce job.
> -
>
> Key: MAPREDUCE-6727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6727.000.patch
>
>
> Add a configuration to limit the input size of the MapReduce job. It will be 
> useful for Hadoop admins to save Hadoop cluster resources by preventing users 
> from submitting bad MapReduce jobs or bad Hive queries. The default behavior 
> is no limit.






[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.

2016-06-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6727:
-
Status: Patch Available  (was: Open)

> Add a configuration to limit the input size of the MapReduce job.
> -
>
> Key: MAPREDUCE-6727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6727.000.patch
>
>
> Add a configuration to limit the input size of the MapReduce job. It will be 
> useful for Hadoop admins to save Hadoop cluster resources by preventing users 
> from submitting bad MapReduce jobs or bad Hive queries. The default behavior 
> is no limit.






[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.

2016-06-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6727:
-
Attachment: MAPREDUCE-6727.000.patch

> Add a configuration to limit the input size of the MapReduce job.
> -
>
> Key: MAPREDUCE-6727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6727.000.patch
>
>
> Add a configuration to limit the input size of the MapReduce job. It will be 
> useful for Hadoop admins to save Hadoop cluster resources by preventing users 
> from submitting bad MapReduce jobs or bad Hive queries. The default behavior 
> is no limit.






[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.

2016-06-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6727:
-
Attachment: (was: MAPREDUCE-6727.000.patch)

> Add a configuration to limit the input size of the MapReduce job.
> -
>
> Key: MAPREDUCE-6727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Add a configuration to limit the input size of the MapReduce job. It will be 
> useful for Hadoop admins to save Hadoop cluster resources by preventing users 
> from submitting bad MapReduce jobs or bad Hive queries. The default behavior 
> is no limit.






[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.

2016-06-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6727:
-
Attachment: MAPREDUCE-6727.000.patch

> Add a configuration to limit the input size of the MapReduce job.
> -
>
> Key: MAPREDUCE-6727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6727.000.patch
>
>
> Add a configuration to limit the input size of the MapReduce job. It will be 
> useful for Hadoop admins to save Hadoop cluster resources by preventing users 
> from submitting bad MapReduce jobs or bad Hive queries. The default behavior 
> is no limit.






[jira] [Created] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.

2016-06-26 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-6727:


 Summary: Add a configuration to limit the input size of the 
MapReduce job.
 Key: MAPREDUCE-6727
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Affects Versions: 2.8.0
Reporter: zhihai xu
Assignee: zhihai xu


Add a configuration to limit the input size of the MapReduce job. It will be 
useful for Hadoop admins to save Hadoop cluster resources by preventing users 
from submitting bad MapReduce jobs or bad Hive queries. The default behavior is 
no limit.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292646#comment-15292646
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

[~jianhe] thanks for reviewing and committing the patch!

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291961#comment-15291961
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

Also, it looks like the checkstyle issue is not a real issue, because all 
configuration definitions in MRJobConfig use "public static final" and this patch 
follows the same convention. I don't know why the checkstyle script reported this issue.

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.004.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291366#comment-15291366
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

The test failures are not related to my change. They are already reported at 
https://issues.apache.org/jira/browse/MAPREDUCE-6702

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290497#comment-15290497
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

Thanks [~jianhe]! These are good suggestions. I uploaded a new patch, 
MAPREDUCE-6696.003.patch, which addresses all your comments. Please review it, 
thanks.

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.003.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Comment Edited] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288083#comment-15288083
 ] 

zhihai xu edited comment on MAPREDUCE-6696 at 5/18/16 2:23 AM:
---

Thanks for the review [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was 
the very initial code for the task limit, but that check happens in the AM, so it 
still wastes some resources (the AM container). Yes, MRJobConfig.NUM_MAPS only 
gives a hint, but my patch is based on InputFormat.getSplits, which exactly 
matches the number of mappers of the MapReduce job:
{code}
   LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
  int maps = writeSplits(job, submitJobDir);
  conf.setInt(MRJobConfig.NUM_MAPS, maps);
  LOG.info("number of splits:" + maps);
{code} 
writeSplits calls InputFormat.getSplits and writeJobSplitMetaInfo to create the 
file "job.splitmetainfo". JobImpl, which runs in the AM, reads the file 
"job.splitmetainfo" by calling createSplits and readSplitMetaInfo to get the input 
split info.
{code}
   /** 
   * Logically split the set of input files for the job.  
   * 
   * Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.
   *
   * Note: The split is a logical split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be input-file-path, start, offset tuple.
   * 
   * @param job job configuration.
   * @param numSplits the desired number of splits, a hint.
   * @return an array of {@link InputSplit}s for the job.
   */
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}
My patch rejects the job during submission, which saves the AM container 
resource.


was (Author: zxu):
Thanks for the review [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was 
the very initial code for the task limit, but that check happens in the AM, so it 
still wastes some resources (the AM container). Yes, MRJobConfig.NUM_MAPS only 
gives a hint, but my patch is based on InputFormat.getSplits, which exactly 
matches the number of mappers of the MapReduce job:
{code}
   LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
  int maps = writeSplits(job, submitJobDir);
  conf.setInt(MRJobConfig.NUM_MAPS, maps);
  LOG.info("number of splits:" + maps);
{code} 
writeSplits calls InputFormat.getSplits.
{code}
   /** 
   * Logically split the set of input files for the job.  
   * 
   * Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.
   *
   * Note: The split is a logical split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be input-file-path, start, offset tuple.
   * 
   * @param job job configuration.
   * @param numSplits the desired number of splits, a hint.
   * @return an array of {@link InputSplit}s for the job.
   */
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}
My patch rejects the job during submission, which saves the AM container 
resource.

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288083#comment-15288083
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

Thanks for the review [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was 
the very initial code for the task limit, but that check happens in the AM, so it 
still wastes some resources (the AM container). Yes, MRJobConfig.NUM_MAPS only 
gives a hint, but my patch is based on InputFormat.getSplits, which exactly 
matches the number of mappers of the MapReduce job:
{code}
   LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
  int maps = writeSplits(job, submitJobDir);
  conf.setInt(MRJobConfig.NUM_MAPS, maps);
  LOG.info("number of splits:" + maps);
{code} 
writeSplits calls InputFormat.getSplits.
{code}
   /** 
   * Logically split the set of input files for the job.  
   * 
   * Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.
   *
   * Note: The split is a logical split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be input-file-path, start, offset tuple.
   * 
   * @param job job configuration.
   * @param numSplits the desired number of splits, a hint.
   * @return an array of {@link InputSplit}s for the job.
   */
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}
My patch rejects the job during submission, which saves the AM container 
resource.
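
For illustration, a minimal sketch (not the actual MAPREDUCE-6696 patch) of how such a check could be applied to the split count computed during submission; the property name mapreduce.job.max.map comes from the issue description, and the helper class and method names are hypothetical:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class MapLimitCheck {
  // Hypothetical helper: given the number of splits returned by
  // writeSplits/InputFormat.getSplits, reject the job at submission time,
  // before any AM container is allocated. -1 (the default) means no limit.
  static void checkMaxMapLimit(Configuration conf, int maps) throws IOException {
    int maxMaps = conf.getInt("mapreduce.job.max.map", -1);
    if (maxMaps >= 0 && maps > maxMaps) {
      throw new IOException("The number of map tasks " + maps
          + " exceeds the limit mapreduce.job.max.map=" + maxMaps);
    }
  }
}
{code}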

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Status: Open  (was: Patch Available)

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Status: Patch Available  (was: Open)

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Comment Edited] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287387#comment-15287387
 ] 

zhihai xu edited comment on MAPREDUCE-6696 at 5/17/16 7:50 PM:
---

All these test failures are not related to my changes:

TestMRCJCFileOutputCommitter passes in my local build; the failure for 
TestMRCJCFileOutputCommitter is due to a test environment problem:
{code}
2016-05-17 17:29:33,792 WARN  [main] util.NativeCodeLoader 
(NativeCodeLoader.java:(60)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable.
{code}
The TestMiniMRChildTask and TestMiniMRChildTask.testTaskOldEnv failures happened in 
launchContainer, which is already past the job submission phase that my code change affects.
{code}
2016-05-17 17:45:48,781 WARN  [ContainersLauncher #1] 
nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(245)) - Exception from 
container-launch with container ID: container_1463507138005_0001_01_02 and 
exit code: 127
ExitCodeException exitCode=127: nice: bash: No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
at org.apache.hadoop.util.Shell.run(Shell.java:850)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


was (Author: zxu):
All these test failures are not related to my changes:

TestMRCJCFileOutputCommitter passes in my local build; the failure for 
TestMRCJCFileOutputCommitter is due to a test environment problem:
2016-05-17 17:29:33,792 WARN  [main] util.NativeCodeLoader 
(NativeCodeLoader.java:(60)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable.

The TestMiniMRChildTask and TestMiniMRChildTask.testTaskOldEnv failures happened in 
launchContainer, which is already past the job submission phase that my code change affects.
2016-05-17 17:45:48,781 WARN  [ContainersLauncher #1] 
nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(245)) - Exception from 
container-launch with container ID: container_1463507138005_0001_01_02 and 
exit code: 127
ExitCodeException exitCode=127: nice: bash: No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
at org.apache.hadoop.util.Shell.run(Shell.java:850)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> 

[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.002.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287387#comment-15287387
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

All these test failures are not related to my changes:

TestMRCJCFileOutputCommitter passes in my local build; the failure for 
TestMRCJCFileOutputCommitter is due to a test environment problem:
2016-05-17 17:29:33,792 WARN  [main] util.NativeCodeLoader 
(NativeCodeLoader.java:(60)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable.

The TestMiniMRChildTask and TestMiniMRChildTask.testTaskOldEnv failures happened in 
launchContainer, which is already past the job submission phase that my code change affects.
2016-05-17 17:45:48,781 WARN  [ContainersLauncher #1] 
nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(245)) - Exception from 
container-launch with container ID: container_1463507138005_0001_01_02 and 
exit code: 127
ExitCodeException exitCode=127: nice: bash: No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
at org.apache.hadoop.util.Shell.run(Shell.java:850)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.001.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: (was: MAPREDUCE-6696.001.patch)

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.001.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: (was: MAPREDUCE-6696.001.patch)

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.001.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Assigned] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned MAPREDUCE-6696:


Assignee: zhihai xu

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284250#comment-15284250
 ] 

zhihai xu commented on MAPREDUCE-6696:
--

I attached a patch, MAPREDUCE-6696.000.patch, which adds a configuration 
"mapreduce.job.max.map" to limit the number of map tasks. The default value is 
-1, which means no limit.
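
For illustration, a hedged sketch of how a client could set this per-job cap when submitting a job; the property name comes from the comment above, and the job setup shown is a generic, hypothetical example rather than part of the patch:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MaxMapUsageExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Cap this job at 10,000 map tasks; -1 (the default) means no limit.
    conf.setInt("mapreduce.job.max.map", 10000);
    Job job = Job.getInstance(conf, "capped-job");
    // ... set input/output paths and mapper/reducer classes, then submit ...
  }
}
{code}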

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Status: Patch Available  (was: Open)

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6696:
-
Attachment: MAPREDUCE-6696.000.patch

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
> resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
> job with too many mappers may fail with OOM after running for long time. It 
> will be a big waste.






[jira] [Created] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-16 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-6696:


 Summary: Add a configuration to limit the number of map tasks 
allowed per job.
 Key: MAPREDUCE-6696
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Affects Versions: 2.8.0
Reporter: zhihai xu


Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
allowed per job. It will be useful for Hadoop admin to save Hadoop cluster 
resource by preventing users from submitting big mapreduce jobs. A mapredeuce 
job with too many mappers may fail with OOM after running for long time. It 
will be a big waste.






[jira] [Commented] (MAPREDUCE-6685) LocalDistributedCacheManager can have overlapping filenames

2016-04-26 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258854#comment-15258854
 ] 

zhihai xu commented on MAPREDUCE-6685:
--

Is this issue the same as MAPREDUCE-6441?

> LocalDistributedCacheManager can have overlapping filenames
> ---
>
> Key: MAPREDUCE-6685
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6685
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: MAPREDUCE-6685.001.patch, MAPREDUCE-6685.002.patch
>
>
> LocalDistributedCacheManager has this setup:
> bq. AtomicLong uniqueNumberGenerator = new 
> AtomicLong(System.currentTimeMillis());
> to create this temporary filename:
> bq. new FSDownload(localFSFileContext, ugi, conf, new Path(destPath,  
> Long.toString(uniqueNumberGenerator.incrementAndGet())), resource);
> when using LocalJobRunner. When two or more jobs start on the same machine, 
> it's possible for them to end up with the same timestamp, or with timestamps 
> close enough that two successive values are not sufficiently far apart.
> Given the assumptions:
> 1) Assume timestamp is the same. Then the most common starting random seed 
> will be the same.
> 2) Process ID will very likely be unique, but will likely be close in value.
> 3) Thread ID is not guaranteed to be unique.
> A unique ID based on PID as a seed (in addition to the timestamp) should be a 
> better unique identifier for temporary filenames.
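
As a purely illustrative sketch (not the committed MAPREDUCE-6685 fix), one way to fold the process ID into the seed so that two LocalJobRunner processes started in the same millisecond draw from different sequences; the class and method names here are hypothetical:
{code}
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;

public class UniqueLocalNames {
  // Seed combines the JVM process id with the current time. Note that the
  // RuntimeMXBean name is only conventionally "pid@hostname", so this is a
  // best-effort illustration rather than a guaranteed-unique scheme.
  private static final AtomicLong UNIQUE_NUMBER_GENERATOR =
      new AtomicLong(initialSeed());

  private static long initialSeed() {
    String jvmName = ManagementFactory.getRuntimeMXBean().getName();
    long pid;
    try {
      pid = Long.parseLong(jvmName.split("@")[0]);
    } catch (NumberFormatException e) {
      pid = jvmName.hashCode();
    }
    return (pid << 40) ^ System.currentTimeMillis();
  }

  static String nextLocalName() {
    return Long.toString(UNIQUE_NUMBER_GENERATOR.incrementAndGet());
  }
}
{code}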





[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-03-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182528#comment-15182528
 ] 

zhihai xu commented on MAPREDUCE-6622:
--

Committed it to 2.8 also.

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.8.0, 2.7.3, 2.9.0, 2.6.5
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the cached 
> jobs can be of varying size. This is generally not a problem when the job sizes 
> are uniform or small, but when the job sizes can be very large (say greater 
> than 250k tasks), the JHS heap size can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend all 
> its time in GC. However, since the cache is holding on to all the jobs, not 
> much heap space can be freed up.
> Since the total number of tasks loaded is directly proportional to the amount of 
> heap used, a property that caps the number of tasks allowed in the cache should 
> help prevent the JHS from locking up.
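
To make the idea concrete, a hedged sketch of a task-weighted cache (not the actual JobHistoryServer implementation) where the cap is the total number of loaded tasks rather than the number of jobs; the class names and the Guava-based approach are illustrative assumptions:
{code}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.Weigher;

public class TaskWeightedJobCache {
  /** Minimal stand-in for a loaded job; only its task count matters here. */
  static class LoadedJob {
    final String jobId;
    final int totalTasks;
    LoadedJob(String jobId, int totalTasks) {
      this.jobId = jobId;
      this.totalTasks = totalTasks;
    }
  }

  // Cap the cache by the sum of task counts instead of the number of jobs,
  // so a handful of 250k-task jobs cannot exhaust the heap on their own.
  static Cache<String, LoadedJob> build(long maxTotalTasks) {
    return CacheBuilder.newBuilder()
        .maximumWeight(maxTotalTasks)
        .weigher((Weigher<String, LoadedJob>) (id, job) -> job.totalTasks)
        .build();
  }
}
{code}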





[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-03-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6622:
-
Fix Version/s: 2.8.0

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.8.0, 2.7.3, 2.9.0, 2.6.5
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the cached 
> jobs can be of varying size. This is generally not a problem when the job sizes 
> are uniform or small, but when the job sizes can be very large (say greater 
> than 250k tasks), the JHS heap size can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend all 
> its time in GC. However, since the cache is holding on to all the jobs, not 
> much heap space can be freed up.
> Since the total number of tasks loaded is directly proportional to the amount of 
> heap used, a property that caps the number of tasks allowed in the cache should 
> help prevent the JHS from locking up.





[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-03-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6622:
-
Fix Version/s: 2.6.5
   2.7.3

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.7.3, 2.9.0, 2.6.5
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the cached 
> jobs can be of varying size. This is generally not a problem when the job sizes 
> are uniform or small, but when the job sizes can be very large (say greater 
> than 250k tasks), the JHS heap size can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend all 
> its time in GC. However, since the cache is holding on to all the jobs, not 
> much heap space can be freed up.
> Since the total number of tasks loaded is directly proportional to the amount of 
> heap used, a property that caps the number of tasks allowed in the cache should 
> help prevent the JHS from locking up.





[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-03-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182518#comment-15182518
 ] 

zhihai xu commented on MAPREDUCE-6622:
--

Committed it to both branch 2.6 and 2.7.

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the cached 
> jobs can be of varying size. This is generally not a problem when the job sizes 
> are uniform or small, but when the job sizes can be very large (say greater 
> than 250k tasks), the JHS heap size can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend all 
> its time in GC. However, since the cache is holding on to all the jobs, not 
> much heap space can be freed up.
> Since the total number of tasks loaded is directly proportional to the amount of 
> heap used, a property that caps the number of tasks allowed in the cache should 
> help prevent the JHS from locking up.





[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-03-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6622:
-
Target Version/s: 2.8.0, 2.7.3, 2.6.5  (was: 2.8.0)

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the 
> jobs can be of varying size.  This is generally not a problem when the job 
> sizes are uniform or small, but when jobs can be very large (say greater 
> than 250k tasks), the JHS heap usage can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend 
> all its time in GC.  However, since the cache is holding on to all the 
> jobs, not much heap space can be freed up.
> Because the total number of loaded tasks is directly proportional to the 
> amount of heap used, adding a property that caps the number of tasks 
> allowed in the cache should help prevent the JHS from locking up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-03-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179080#comment-15179080
 ] 

zhihai xu commented on MAPREDUCE-6622:
--

Thanks for the confirmation [~rchiang]! I will backport the patch to the 2.6.5 
and 2.7.3 branches after waiting a few days, if no one objects.

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the 
> jobs can be of varying size.  This is generally not a problem when the job 
> sizes are uniform or small, but when jobs can be very large (say greater 
> than 250k tasks), the JHS heap usage can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend 
> all its time in GC.  However, since the cache is holding on to all the 
> jobs, not much heap space can be freed up.
> Because the total number of loaded tasks is directly proportional to the 
> amount of heap used, adding a property that caps the number of tasks 
> allowed in the cache should help prevent the JHS from locking up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-02-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6622:
-
Priority: Critical  (was: Major)

> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the 
> jobs can be of varying size.  This is generally not a problem when the job 
> sizes are uniform or small, but when jobs can be very large (say greater 
> than 250k tasks), the JHS heap usage can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend 
> all its time in GC.  However, since the cache is holding on to all the 
> jobs, not much heap space can be freed up.
> Because the total number of loaded tasks is directly proportional to the 
> amount of heap used, adding a property that caps the number of tasks 
> allowed in the cache should help prevent the JHS from locking up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit

2016-02-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170516#comment-15170516
 ] 

zhihai xu commented on MAPREDUCE-6622:
--

This patch also fixed a memory leak caused by a race condition in 
{{CachedHistoryStorage.getFullJob}}. We can reproduce the leak by rapidly and 
repeatedly refreshing the JHS web page for a job with more than 40,000 
mappers. The race condition is that {{fileInfo.loadJob()}} takes a long time 
to load a job with more than 40,000 mappers; during that time 
{{fileInfo.loadJob()}} is called multiple times for the same job, because 
there is no synchronization between {{loadedJobCache.get(jobId)}} and 
{{loadJob(fileInfo)}}. You will see the used heap memory go up quickly. 
Looking at the heap dump, we found 56 {{CompletedJob}} instances for the same 
job ID, holding more than 2 million mappers in total (56 * 40,000). Based on 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html#build(com.google.common.cache.CacheLoader)
this won't be an issue for com.google.common.cache.LoadingCache:
{code}
If another thread is currently loading the value for this key, simply waits for 
that thread to finish and returns its loaded value
{code}
This looks like a critical issue to me. Should we backport this patch to the 
2.7.3 and 2.6.5 branches?
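
To make the LoadingCache point concrete, here is a standalone sketch (not code 
from the patch; the job ID and sleep are just placeholders for the slow 
history-file load) showing that concurrent lookups of the same key share a 
single load:
{code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleLoadDemo {
  static final AtomicInteger LOADS = new AtomicInteger();

  public static void main(String[] args) throws Exception {
    final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(5)
        .build(new CacheLoader<String, String>() {
          @Override
          public String load(String jobId) throws Exception {
            LOADS.incrementAndGet();  // count how many real loads happen
            Thread.sleep(2000);       // simulate a slow history-file parse
            return "CompletedJob(" + jobId + ")";
          }
        });

    Runnable refresh = new Runnable() {
      @Override
      public void run() {
        // The second caller blocks on the first load instead of loading again.
        cache.getUnchecked("job_1449861555000_0001");
      }
    };
    Thread t1 = new Thread(refresh);
    Thread t2 = new Thread(refresh);
    t1.start();
    t2.start();
    t1.join();
    t2.join();

    System.out.println("loads = " + LOADS.get());  // prints 1, not 2
  }
}
{code}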


> Add capability to set JHS job cache to a task-based limit
> -
>
> Key: MAPREDUCE-6622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, 
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, 
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, 
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the 
> jobs can be of varying size.  This is generally not a problem when the job 
> sizes are uniform or small, but when jobs can be very large (say greater 
> than 250k tasks), the JHS heap usage can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend 
> all its time in GC.  However, since the cache is holding on to all the 
> jobs, not much heap space can be freed up.
> Because the total number of loaded tasks is directly proportional to the 
> amount of heap used, adding a property that caps the number of tasks 
> allowed in the cache should help prevent the JHS from locking up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057625#comment-15057625
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks [~djp] and [~lewuathe]! I changed it to a blocker because that may help 
more people notice this potential performance issue.
+1 for the latest patch. Will commit it shortly.

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.
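
For illustration, the first item boils down to logging one summary line 
instead of one warning per entry that cannot be evicted yet. A rough, 
self-contained sketch of that pattern (the map and eviction check are 
simplified placeholders, not the actual patch):
{code}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class SummaryLogSketch {
  public static void main(String[] args) {
    // Stand-in for the JobListCache: job id -> finish time.
    Map<String, Long> cache = new LinkedHashMap<String, Long>();
    for (int i = 0; i < 10; i++) {
      cache.put("job_" + i, System.currentTimeMillis());
    }
    long cutoff = System.currentTimeMillis() - 60000L;  // pretend max-age

    int waiting = 0;
    String firstWaitingKey = null;
    Iterator<Map.Entry<String, Long>> it = cache.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (e.getValue() <= cutoff) {
        it.remove();                      // old enough: evict silently
      } else {
        if (waiting == 0) {
          firstWaitingKey = e.getKey();   // remember one example key
        }
        waiting++;                        // otherwise just count
      }
    }
    if (waiting > 0) {
      // One line instead of `waiting` lines.
      System.out.println("Waiting to remove " + waiting
          + " entries from JobListCache (first: " + firstWaitingKey
          + ") because they are not in done yet");
    }
  }
}
{code}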



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6436:
-
Issue Type: Improvement  (was: Bug)

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6436:
-
Fix Version/s: 2.6.4
   2.7.3
   2.8.0

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6436:
-
Target Version/s: 2.7.3, 2.6.4
Priority: Blocker  (was: Major)
 Description: 
Problem: 
HistoryFileManager.addIfAbsent produces large amount of logs if number of
cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
larger than mapreduce.jobhistory.joblist.cache.size by far.

Example:
For example, if the cache contains 5 entries in total and 10,000 entries
newer than mapreduce.jobhistory.max-age-ms where
mapreduce.jobhistory.joblist.cache.size is 2, HistoryFileManager.addIfAbsent
method produces 5 - 2 = 3 lines of "Waiting to remove  from
JobListCache because it is not in done yet" message.

It will attach a stacktrace.

Impact:
In addition to large disk consumption, this issue blocks JobHistory.getJob
long time and slows job execution down significantly because getJob is called
by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
multiple threads call scanIfNeeded simultaneously, one of them acquires lock
and the other threads are blocked until the first thread completes long-running
HistoryFileManager.addIfAbsent call.

Solution: 
* Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take too 
long time.
* Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
  scanning if another thread is already scanning. This changes semantics of
  some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
  because scanIfNeeded keep outdated state.
* Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
  not blocked by a loop at scale of tens of thousands.
 
This patch implemented the first item.


  was:

Problem: 
HistoryFileManager.addIfAbsent produces large amount of logs if number of
cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
larger than mapreduce.jobhistory.joblist.cache.size by far.

Example:
For example, if the cache contains 5 entries in total and 10,000 entries
newer than mapreduce.jobhistory.max-age-ms where
mapreduce.jobhistory.joblist.cache.size is 2, HistoryFileManager.addIfAbsent
method produces 5 - 2 = 3 lines of "Waiting to remove  from
JobListCache because it is not in done yet" message.

It will attach a stacktrace.

Impact:
In addition to large disk consumption, this issue blocks JobHistory.getJob
long time and slows job execution down significantly because getJob is called
by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
multiple threads call scanIfNeeded simultaneously, one of them acquires lock
and the other threads are blocked until the first thread completes long-running
HistoryFileManager.addIfAbsent call.

Solution: 
* Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take too 
long time.
* Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
  scanning if another thread is already scanning. This changes semantics of
  some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
  because scanIfNeeded keep outdated state.
* Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
  not blocked by a loop at scale of tens of thousands.
 
This patch implemented the first item.



> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time 

[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6436:
-
Target Version/s: 2.8.0, 2.7.3, 2.6.4  (was: 2.7.3, 2.6.4)

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057703#comment-15057703
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Committed it to trunk, branch-2, branch-2.6 and branch-2.7! Thanks [~lewuathe] 
for the contributions! Thanks [~djp] for the additional review!

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6436:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058917#comment-15058917
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks for the finding, [~aw]. I just realized that 2.8 has been branched out; 
will commit it to branch-2.8 shortly.

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059349#comment-15059349
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks [~lewuathe] for the suggestion! There is a task, 
{{MoveIntermediateToDoneRunnable}}, which calls scanIntermediateDirectory 
periodically, so most of the time the job will already be found in the 
{{jobListCache}} cache. Also, making scanIfNeeded asynchronous may change the 
behavior of RPC calls: job information that could be found before might no 
longer be found. I am thinking about another way to improve performance, which 
reduces how often scanIntermediateDirectory is called:
in getFileInfo, call scanOldDirsForJob before scanIntermediateDirectory as 
well, i.e. call scanOldDirsForJob twice: once before scanIntermediateDirectory 
and once after it.
{code}
  public HistoryFileInfo getFileInfo(JobId jobId) throws IOException {
    // FileInfo available in cache.
    HistoryFileInfo fileInfo = jobListCache.get(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }

    // Call scanOldDirsForJob before scanIntermediateDirectory.
    fileInfo = scanOldDirsForJob(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }

    // OK so scan the intermediate to be sure we did not lose it that way.
    scanIntermediateDirectory();
    fileInfo = jobListCache.get(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }

    // Intermediate directory does not contain job. Search through older ones.
    fileInfo = scanOldDirsForJob(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    return null;
  }
{code}


> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059594#comment-15059594
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Just committed it to branch-2.8!

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055461#comment-15055461
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks for updating the patch [~lewuathe]! The new patch looks good except for 
the checkstyle issues:
{code}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:265:
 Line is longer than 80 characters (found 97).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:267:
 Line is longer than 80 characters (found 102).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:268:
 Line is longer than 80 characters (found 118).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:271:
 Line is longer than 80 characters (found 94).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:272:
 Line is longer than 80 characters (found 114).
{code}
Could you fix the above checkstyle issues?

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055199#comment-15055199
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

[~lewuathe], thanks for working on this issue. About the patch: we don't need 
to calculate the count for the entries that are actually being removed.
Can we do all the counting in the {{else}} branch, e.g.:
{code}
if (firstValue.didMoveFail() &&
    firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
  ...
} else {
  if (firstValue.didMoveFail()) {
    if (moveFailedCount == 0) {
      firstMoveFailedKey = key;
    }
    moveFailedCount += 1;
  } else {
    if (inIntermediateCount == 0) {
      firstInIntermediateKey = key;
    }
    inIntermediateCount += 1;
  }
}
{code}

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6549) multibyte delimiters with LineRecordReader cause duplicate records

2015-11-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006294#comment-15006294
 ] 

zhihai xu commented on MAPREDUCE-6549:
--

Nice catch! But I think this issue is not related to MAPREDUCE-6481; it would 
still happen without MAPREDUCE-6481. I also think the same issue may happen for 
compressed input, while the attached patch only fixes the uncompressed case.

> multibyte delimiters with LineRecordReader cause duplicate records
> --
>
> Key: MAPREDUCE-6549
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 2.7.2
>Reporter: Dustin Cote
>Assignee: Wilfred Spiegelenburg
> Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch
>
>
> LineRecordReader currently produces duplicate records under certain 
> scenarios such as:
> 1) input string: "abc+++def++ghi++" 
> delimiter string: "+++" 
> test passes with all sizes of the split 
> 2) input string: "abc++def+++ghi++" 
> delimiter string: "+++" 
> test fails with a split size of 4 
> 3) input string: "abc+++def++ghi++" 
> delimiter string: "++" 
> test fails with a split size of 5 
> 4) input string: "abc+++defg++hij++" 
> delimiter string: "++" 
> test fails with a split size of 4 
> 5) input string: "abc++def+++ghi++" 
> delimiter string: "++" 
> test fails with a split size of 9 
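
For reproducing scenarios like the ones above, the multibyte delimiter and the 
tiny split size can be forced through standard MapReduce configuration keys. A 
sketch (the input path and job wiring are illustrative, not part of the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultibyteDelimiterRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Custom multibyte record delimiter used by LineRecordReader.
    conf.set("textinputformat.record.delimiter", "+++");
    // Shrink splits to a handful of bytes (e.g. 4) so that a delimiter can
    // straddle a split boundary, as in the failing cases above.
    conf.setLong(FileInputFormat.SPLIT_MAXSIZE, 4);

    Job job = Job.getInstance(conf, "multibyte-delimiter-repro");
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/tmp/abc-input.txt"));
    // ... set mapper/reducer/output as needed; counting the records read per
    // input file shows the duplicates described above.
  }
}
{code}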



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6535) TaskID default constructor results in NPE on toString()

2015-11-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991244#comment-14991244
 ] 

zhihai xu commented on MAPREDUCE-6535:
--

+1 to use TaskType.REDUCE as the default task type and make it compatible with 
MR1.

> TaskID default constructor results in NPE on toString()
> ---
>
> Key: MAPREDUCE-6535
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6535
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.6.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> This code will reproduce the issue:
> {code}
> new TaskAttemptID().toString();
> {code}
> The issue is that the default constructor leaves the type {{null}}.  The 
> {{get()}} in {{CharTaskTypesMaps.getRepresentingCharacter()}} then throws an 
> NPE on the null type key.
> The simplest solution would be to only call the {{get()}} on line 288 of 
> {{TaskID.java}} if {{type}} is not {{null}} and return some other literal 
> otherwise.  Since no part of the code is tripping on the NPE, what we choose 
> for the literal shouldn't matter.  How about "x"?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-24 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906493#comment-14906493
 ] 

zhihai xu commented on MAPREDUCE-6484:
--

Committed it to branch-2 and trunk.
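
For readers skimming the thread, the fallback described in the quoted 
description below can be observed with a small standalone check (a sketch, not 
part of the patch; the principal string is illustrative):
{code}
import java.net.InetAddress;
import org.apache.hadoop.security.SecurityUtil;

public class RenewerPrincipalCheck {
  public static void main(String[] args) throws Exception {
    // Illustrative principal with the _HOST pattern.
    String principalConfig = "rm/_HOST@EXAMPLE.COM";
    InetAddress unresolved = InetAddress.getByName("0.0.0.0");
    // getCanonicalHostName() on 0.0.0.0 stays "0.0.0.0", so SecurityUtil falls
    // back to the local canonical hostname, which is the wrong renewer
    // described below.
    String principal =
        SecurityUtil.getServerPrincipal(principalConfig, unresolved);
    System.out.println(principal);  // rm/<this machine's hostname>@EXAMPLE.COM
  }
}
{code}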

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renew 
> failure for renewer "nobody"  if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: When HA is enabled, 
> "yarn.resourcemanager.address" may not be set,  if 
> {{HOSTNAME_PATTERN}}("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used,  Based on the following code 
> at SecurityUtil.java, the local address will be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which cause the job fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renew 
> failure for renewer "nobody"  if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: When HA is enabled, 
> "yarn.resourcemanager.address" may not be set,  if 
> {{HOSTNAME_PATTERN}}("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used,  Based on the following code 
> at SecurityUtil.java, the local address will be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which cause the job fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 

[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905405#comment-14905405
 ] 

zhihai xu commented on MAPREDUCE-6484:
--

Thanks for the review, [~asuresh]! That is a good suggestion. I attached a new 
patch, MAPREDUCE-6484.001.patch, which addresses your comment.
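
For context, the general idea can be sketched as follows; this is an illustration 
only, not the content of MAPREDUCE-6484.001.patch, and the class and method names 
are invented: when RM HA is enabled, resolve the _HOST placeholder in 
"yarn.resourcemanager.principal" against a concrete per-RM address key instead of 
the usually-unset "yarn.resourcemanager.address", so the renewer is built from 
the RM host rather than from the local host.
{code}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.yarn.conf.HAUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RenewerSketch {
  /** Illustrative only: derive the RM principal to use as the token renewer. */
  static String getRmPrincipalForRenewer(Configuration conf) throws IOException {
    String principal = conf.get(YarnConfiguration.RM_PRINCIPAL);
    if (principal == null || !principal.contains("_HOST")) {
      return principal; // nothing to substitute
    }
    String addressKey = YarnConfiguration.RM_ADDRESS;
    if (HAUtil.isHAEnabled(conf)) {
      // Under HA the plain "yarn.resourcemanager.address" is typically unset,
      // so read the per-RM key, e.g. "yarn.resourcemanager.address.rm1",
      // to avoid the 0.0.0.0 default that SecurityUtil maps to the local host.
      String rmId = HAUtil.getRMHAIds(conf).iterator().next();
      addressKey = HAUtil.addSuffix(YarnConfiguration.RM_ADDRESS, rmId);
    }
    InetSocketAddress rmAddr = conf.getSocketAddr(addressKey,
        YarnConfiguration.DEFAULT_RM_ADDRESS, YarnConfiguration.DEFAULT_RM_PORT);
    return SecurityUtil.getServerPrincipal(principal, rmAddr.getHostName());
  }
}
{code}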

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: MAPREDUCE-6484.001.patch

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: MAPREDUCE-6484.001.patch

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: (was: MAPREDUCE-6484.001.patch)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Hadoop Flags: Reviewed

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905827#comment-14905827
 ] 

zhihai xu commented on MAPREDUCE-6484:
--

Thanks for the review, [~asuresh]! The new patch passed Jenkins. I will commit 
it tomorrow if no one objects.
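
For anyone who wants to double-check the behavior on a cluster, the renewer 
recorded in the HDFS delegation token identifier (which is what the NameNode 
compares at renew time) can be inspected with a small diagnostic sketch like the 
one below; it is illustrative only, not part of the patch, and the class name 
RenewerCheck is invented:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class RenewerCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Renewer to request; pass the same value your client would use.
    String renewer = args.length > 0 ? args[0] : "yarn";
    Credentials creds = new Credentials();
    FileSystem fs = FileSystem.get(conf);
    for (Token<?> token : fs.addDelegationTokens(renewer, creds)) {
      if (token.decodeIdentifier() instanceof AbstractDelegationTokenIdentifier) {
        AbstractDelegationTokenIdentifier id =
            (AbstractDelegationTokenIdentifier) token.decodeIdentifier();
        // Print the renewer actually embedded in the token identifier.
        System.out.println(token.getKind() + " renewer=" + id.getRenewer());
      }
    }
  }
}
{code}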

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 

[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877279#comment-14877279
 ] 

zhihai xu commented on MAPREDUCE-6484:
--

Moved the JIRA from YARN to MapReduce since all the changes are in the 
MapReduce project.

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-09-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails
> ---
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6460.000.patch
>
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails with the following logs:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at 
> org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: 
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-09-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876944#comment-14876944
 ] 

zhihai xu commented on MAPREDUCE-6460:
--

Committed it to branch-2 and trunk. 

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails
> ---
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6460.000.patch
>
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails with the following logs:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at 
> org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: 
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-09-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Hadoop Flags: Reviewed

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails
> ---
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6460.000.patch
>
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails with the following logs:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at 
> org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: 
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-09-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876919#comment-14876919
 ] 

zhihai xu commented on MAPREDUCE-6460:
--

Thanks for the review [~rkanter]! I will commit it shortly.

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails
> ---
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6460.000.patch
>
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails with the following logs:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at 
> org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: 
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: YARN-4187.000.patch

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Component/s: (was: resourcemanager)
 client

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Moved] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu moved YARN-4187 to MAPREDUCE-6484:


Component/s: (was: security)
 (was: resourcemanager)
 security
 resourcemanager
Key: MAPREDUCE-6484  (was: YARN-4187)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used instead, and based on the following 
> code in SecurityUtil.java, the local hostname then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 
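
To make the failure mode above concrete, here is a minimal, untested sketch 
that only exercises the SecurityUtil code quoted in the description. The 
principal value, hostnames and class name are illustrative assumptions, not 
taken from the patch or from any real cluster.

{code}
import java.net.InetAddress;
import org.apache.hadoop.security.SecurityUtil;

public class RenewerPrincipalSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative principal using the _HOST pattern, as in
    // "yarn.resourcemanager.principal"; the realm is made up.
    String rmPrincipalConfig = "yarn/_HOST@EXAMPLE.COM";

    // When RM HA is enabled and "yarn.resourcemanager.address" is left unset,
    // the client falls back to the default address 0.0.0.0:8032.
    InetAddress defaultRmAddr = InetAddress.getByName("0.0.0.0");

    // getServerPrincipal() -> replacePattern() sees "0.0.0.0" and substitutes
    // the *local* canonical hostname, i.e. the client machine, not the RM.
    String renewer = SecurityUtil.getServerPrincipal(rmPrincipalConfig, defaultRmAddr);
    System.out.println(renewer);
    // On a host named client01.example.com this prints something like
    // "yarn/client01.example.com@EXAMPLE.COM". If the NameNode's
    // hadoop.security.auth_to_local rules do not map that host, the renewer
    // resolves to "nobody" and renewal fails as in the stack trace above.
  }
}
{code}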

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: (was: YARN-4187.000.patch)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renewal 
> failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned is that, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}}("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805125#comment-14805125
 ] 

zhihai xu commented on MAPREDUCE-6481:
--

[~jlowe], thanks for the review and for committing the patch! This patch 
depends on MAPREDUCE-5948; I can apply it cleanly after applying 
MAPREDUCE-5948. Shall we add both MAPREDUCE-5948 and MAPREDUCE-6481 to the 
2.7.2 release?

> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> 
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6481.000.patch
>
>
> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give incomplete record: some characters cut off at the 
> end of record.
> # LineRecordReader may give wrong position/key information.
> The first issue only happens for Custom Delimiter, which is caused by the 
> following code at {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
> if (ambiguousByteCount > 0) {
>   str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>   //appending the ambiguous characters (refer case 2.2)
>   bytesConsumed += ambiguousByteCount;
>   ambiguousByteCount=0;
> }
> str.append(buffer, startPosn, appendLength);
> txtLength += appendLength;
>   }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will 
> be triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
> bufferSize is 10 and splitLength is 12, the correct record should be 
> "123456789a" with length 10, but we get incomplete record "123456789" with 
> length 9 from current code.
> The second issue can happen for both Custom Delimiter and Default Delimiter, 
> which is caused by the code in {{UncompressedSplitLineReader#readLine}}. 
> {{UncompressedSplitLineReader#readLine}} may report wrong size information at 
> some corner cases. The reason is {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last bytes read (bufferLength) is less than bufferSize, the previous 
> {{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
> instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
> For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
> "ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
> second splitLength is 4), the current code will give the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The Key for the third record is wrong: it should be 16 instead of 21, due to 
> the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
> second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
> That is why the key we get is 5 bytes larger than the correct one.
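
For the second issue, the accounting error can be seen with a tiny 
self-contained sketch. The numbers mirror the worked example above (a second 
fillBuffer() that returns only 5 of the 10-byte buffer); the class, method 
names and the exact buffer position are illustrative, and this is my reading 
of the report rather than the contents of MAPREDUCE-6481.000.patch.

{code}
public class UnusedBytesSketch {
  // 'lastRead' is how many bytes the last fillBuffer() actually returned,
  // 'bufferPosn' is the position the reader stopped at within that buffer.
  static int buggyUnused(int bufferSize, int bufferPosn) {
    return bufferSize - bufferPosn;      // what the current code computes
  }
  static int correctUnused(int lastRead, int bufferPosn) {
    return lastRead - bufferPosn;        // what the report says it should be
  }

  public static void main(String[] args) {
    int bufferSize = 10, lastRead = 5, bufferPosn = 5;  // illustrative values
    System.out.println("buggy   = " + buggyUnused(bufferSize, bufferPosn));   // 5
    System.out.println("correct = " + correctUnused(lastRead, bufferPosn));   // 0
    // The 5-byte overcount is exactly why the third record's key is reported
    // as 21 instead of 16 in the example above.
  }
}
{code}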



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876068#comment-14876068
 ] 

zhihai xu commented on MAPREDUCE-6481:
--

Thanks [~jlowe]! It is great that we have both MAPREDUCE-5948 and 
MAPREDUCE-6481 fixed in the 2.7.2 release.

> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> 
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: MAPREDUCE-6481.000.patch
>
>
> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give incomplete record: some characters cut off at the 
> end of record.
> # LineRecordReader may give wrong position/key information.
> The first issue only happens for Custom Delimiter, which is caused by the 
> following code at {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
> if (ambiguousByteCount > 0) {
>   str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>   //appending the ambiguous characters (refer case 2.2)
>   bytesConsumed += ambiguousByteCount;
>   ambiguousByteCount=0;
> }
> str.append(buffer, startPosn, appendLength);
> txtLength += appendLength;
>   }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will 
> be triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
> bufferSize is 10 and splitLength is 12, the correct record should be 
> "123456789a" with length 10, but we get incomplete record "123456789" with 
> length 9 from current code.
> The second issue can happen for both Custom Delimiter and Default Delimiter, 
> which is caused by the code in {{UncompressedSplitLineReader#readLine}}. 
> {{UncompressedSplitLineReader#readLine}} may report wrong size information at 
> some corner cases. The reason is {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last bytes read (bufferLength) is less than bufferSize, the previous 
> {{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
> instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
> For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
> "ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
> second splitLength is 4), the current code will give the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The Key for the third record is wrong: it should be 16 instead of 21, due to 
> the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
> second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
> That is why the key we get is 5 bytes larger than the correct one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: YARN-4187.000.patch

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renewal 
> failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned is that, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}}("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: (was: YARN-4187.000.patch)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renewal 
> failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned is that, when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}}("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6481:
-
Status: Patch Available  (was: Open)

> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> 
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: MAPREDUCE-6481.000.patch
>
>
> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give incomplete record: some characters cut off at the 
> end of record.
> # LineRecordReader may give wrong position/key information.
> The first issue only happens for Custom Delimiter, which is caused by the 
> following code at {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
> if (ambiguousByteCount > 0) {
>   str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>   //appending the ambiguous characters (refer case 2.2)
>   bytesConsumed += ambiguousByteCount;
>   ambiguousByteCount=0;
> }
> str.append(buffer, startPosn, appendLength);
> txtLength += appendLength;
>   }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will 
> be triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
> bufferSize is 10 and splitLength is 12, the correct record should be 
> "123456789a" with length 10, but we get incomplete record "123456789" with 
> length 9 from current code.
> The second issue can happen for both Custom Delimiter and Default Delimiter, 
> which is caused by the code in {{UncompressedSplitLineReader#readLine}}. 
> {{UncompressedSplitLineReader#readLine}} may report wrong size information at 
> some corner cases. The reason is {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last bytes read (bufferLength) is less than bufferSize, the previous 
> {{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
> instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
> For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
> "ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
> second splitLength is 4), the current code will give the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The Key for the third record is wrong: it should be 16 instead of 21, due to 
> the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
> second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
> That is why the key we get is 5 bytes larger than the correct one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6481:
-
Attachment: MAPREDUCE-6481.000.patch

> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> 
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: MAPREDUCE-6481.000.patch
>
>
> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give incomplete record: some characters cut off at the 
> end of record.
> # LineRecordReader may give wrong position/key information.
> The first issue only happens for Custom Delimiter, which is caused by the 
> following code at {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
> if (ambiguousByteCount > 0) {
>   str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>   //appending the ambiguous characters (refer case 2.2)
>   bytesConsumed += ambiguousByteCount;
>   ambiguousByteCount=0;
> }
> str.append(buffer, startPosn, appendLength);
> txtLength += appendLength;
>   }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will 
> be triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
> bufferSize is 10 and splitLength is 12, the correct record should be 
> "123456789a" with length 10, but we get incomplete record "123456789" with 
> length 9 from current code.
> The second issue can happen for both Custom Delimiter and Default Delimiter, 
> which is caused by the code in {{UncompressedSplitLineReader#readLine}}. 
> {{UncompressedSplitLineReader#readLine}} may report wrong size information at 
> some corner cases. The reason is {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last bytes read (bufferLength) is less than bufferSize, the previous 
> {{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
> instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
> For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
> "ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
> second splitLength is 4), the current code will give the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The Key for the third record is wrong: it should be 16 instead of 21, due to 
> the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
> second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
> That is why the key we get is 5 bytes larger than the correct one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-17 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791624#comment-14791624
 ] 

zhihai xu commented on MAPREDUCE-6481:
--

I attached a patch, MAPREDUCE-6481.000.patch, which should fix both issues. I 
added several test cases in the patch to cover all these corner cases; they 
will help avoid regressions in the future.

> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> 
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: MAPREDUCE-6481.000.patch
>
>
> LineRecordReader may give incomplete record and wrong position/key 
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give incomplete record: some characters cut off at the 
> end of record.
> # LineRecordReader may give wrong position/key information.
> The first issue only happens for Custom Delimiter, which is caused by the 
> following code at {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
> if (ambiguousByteCount > 0) {
>   str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>   //appending the ambiguous characters (refer case 2.2)
>   bytesConsumed += ambiguousByteCount;
>   ambiguousByteCount=0;
> }
> str.append(buffer, startPosn, appendLength);
> txtLength += appendLength;
>   }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will 
> be triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
> bufferSize is 10 and splitLength is 12, the correct record should be 
> "123456789a" with length 10, but we get incomplete record "123456789" with 
> length 9 from current code.
> The second issue can happen for both Custom Delimiter and Default Delimiter, 
> which is caused by the code in {{UncompressedSplitLineReader#readLine}}. 
> {{UncompressedSplitLineReader#readLine}} may report wrong size information at 
> some corner cases. The reason is {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last bytes read (bufferLength) is less than bufferSize, the previous 
> {{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
> instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
> For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
> "ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
> second splitLength is 4), the current code will give the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The Key for the third record is wrong: it should be 16 instead of 21, due to 
> the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
> second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
> That is why the key we get is 5 bytes larger than the correct one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-09-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: MAPREDUCE-6460.000.patch

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails
> ---
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6460.000.patch
>
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails with the following logs:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at 
> org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: 
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-09-17 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: (was: MAPREDUCE-6460.000.patch)

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails
> ---
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6460.000.patch
>
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> fails with the following logs:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at 
> org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: 
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
> Expected exception: 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6481:
-
Description: 
LineRecordReader may give incomplete record and wrong position/key information 
for uncompressed input sometimes.
There are two issues:
# LineRecordReader may give incomplete record: some characters cut off at the 
end of record.
# LineRecordReader may give wrong position/key information.

The first issue only happens for Custom Delimiter, which is caused by the 
following code at {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
if (ambiguousByteCount > 0) {
  str.append(recordDelimiterBytes, 0, ambiguousByteCount);
  //appending the ambiguous characters (refer case 2.2)
  bytesConsumed += ambiguousByteCount;
  ambiguousByteCount=0;
}
str.append(buffer, startPosn, appendLength);
txtLength += appendLength;
  }
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will be 
triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
bufferSize is 10 and splitLength is 12, the correct record should be 
"123456789a" with length 10, but we get incomplete record "123456789" with 
length 9 from current code.

The second issue can happen for both Custom Delimiter and Default Delimiter, 
which is caused by the code in {{UncompressedSplitLineReader#readLine}}. 
{{UncompressedSplitLineReader#readLine}} may report wrong size information at 
some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last bytes read (bufferLength) is less than bufferSize, the previous 
{{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
"ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
second splitLength is 4), the current code will give the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The Key for the third record is wrong: it should be 16 instead of 21, due to 
the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
That is why the key we get is 5 bytes larger than the correct one.

  was:
LineRecordReader may give incomplete record and wrong position/key information 
for uncompressed input sometimes.
There are two issues:
# LineRecordReader may give incomplete record: some characters cut off at the 
end of record.
# LineRecordReader may give wrong position/key information.

The first issue only happens for Custom Delimiter, which is caused by the 
following code at {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
if (ambiguousByteCount > 0) {
  str.append(recordDelimiterBytes, 0, ambiguousByteCount);
  //appending the ambiguous characters (refer case 2.2)
  bytesConsumed += ambiguousByteCount;
  ambiguousByteCount=0;
}
str.append(buffer, startPosn, appendLength);
txtLength += appendLength;
  }
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will be 
triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
bufferSize is 10 and splitLength is 12, the correct record should be 
"123456789a" with length 10, but we get incomplete record "123456789" with 
length 9 from current code.

The second issue can happen for both Custom Delimiter and Default Delimiter, 
which is caused by the code in {{UncompressedSplitLineReader#readLine}}.
{{UncompressedSplitLineReader#readLine}} may report wrong size information at 
some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last bytes read (bufferLength) is less than bufferSize, the previous 
{{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
"ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
second splitLength is 4), the current code will give the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The Key for the third record is wrong: it should be 16 instead of 21, due to 
the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
That is why the key we get is 5 bytes larger than the correct one.

[jira] [Created] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-16 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-6481:


 Summary: LineRecordReader may give incomplete record and wrong 
position/key information for uncompressed input sometimes.
 Key: MAPREDUCE-6481
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical


LineRecordReader may give incomplete record and wrong position/key information 
for uncompressed input sometimes.
There are two issues:
# LineRecordReader may give incomplete record: some characters cut off at the 
end of record.
# LineRecordReader may give wrong position/key information.
The first issue only happens for Custom Delimiter, which is caused by the 
following code at {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
if (ambiguousByteCount > 0) {
  str.append(recordDelimiterBytes, 0, ambiguousByteCount);
  //appending the ambiguous characters (refer case 2.2)
  bytesConsumed += ambiguousByteCount;
  ambiguousByteCount=0;
}
str.append(buffer, startPosn, appendLength);
txtLength += appendLength;
  }
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will be 
triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
bufferSize is 10 and splitLength is 12, the correct record should be 
"123456789a" with length 10, but we get incomplete record "123456789" with 
length 9 from current code.

The second issue can happen for both Custom Delimiter and Default Delimiter, 
which is caused by the code in {{UncompressedSplitLineReader#readLine}}.
{{UncompressedSplitLineReader#readLine}} may report wrong size information at 
some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last bytes read (bufferLength) is less than bufferSize, the previous 
{{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
"ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
second splitLength is 4), the current code will give the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The Key for the third record is wrong: it should be 16 instead of 21, due to 
the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
That is why the key we get is 5 bytes larger than the correct one.
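
As a rough way to exercise the first (custom-delimiter) case, the example 
input from this description can be fed straight to 
{{org.apache.hadoop.util.LineReader}} with a 10-byte buffer. This is an 
untested sketch: it uses the public LineReader API as I understand it, the 
class name is made up, and the expected output is taken from the report 
rather than from running the code.

{code}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class ReadCustomLineRepro {
  public static void main(String[] args) throws Exception {
    // Example from the report: delimiter "ab", buffer size 10, input
    // "123456789aab". The correct first record is "123456789a" (length 10);
    // a reader affected by the bug returns "123456789" (length 9).
    byte[] data = "123456789aab".getBytes(StandardCharsets.UTF_8);
    byte[] delimiter = "ab".getBytes(StandardCharsets.UTF_8);
    LineReader reader = new LineReader(new ByteArrayInputStream(data), 10, delimiter);
    try {
      Text record = new Text();
      int consumed = reader.readLine(record);  // reads up to the "ab" delimiter
      System.out.println("record='" + record + "' bytesConsumed=" + consumed);
    } finally {
      reader.close();
    }
  }
}
{code}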



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.

2015-09-16 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6481:
-
Description: 
LineRecordReader may give incomplete record and wrong position/key information 
for uncompressed input sometimes.
There are two issues:
# LineRecordReader may give incomplete record: some characters cut off at the 
end of record.
# LineRecordReader may give wrong position/key information.

The first issue only happens for Custom Delimiter, which is caused by the 
following code at {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
if (ambiguousByteCount > 0) {
  str.append(recordDelimiterBytes, 0, ambiguousByteCount);
  //appending the ambiguous characters (refer case 2.2)
  bytesConsumed += ambiguousByteCount;
  ambiguousByteCount=0;
}
str.append(buffer, startPosn, appendLength);
txtLength += appendLength;
  }
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will be 
triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
bufferSize is 10 and splitLength is 12, the correct record should be 
"123456789a" with length 10, but we get incomplete record "123456789" with 
length 9 from current code.

The second issue can happen for both Custom Delimiter and Default Delimiter, 
which is caused by the code in {{UncompressedSplitLineReader#readLine}}.
{{UncompressedSplitLineReader#readLine}} may report wrong size information at 
some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last bytes read (bufferLength) is less than bufferSize, the previous 
{{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
"ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
second splitLength is 4), the current code will give the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The Key for the third record is wrong: it should be 16 instead of 21, due to 
the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
That is why the key we get is 5 bytes larger than the correct one.

  was:
LineRecordReader may give incomplete record and wrong position/key information 
for uncompressed input sometimes.
There are two issues:
# LineRecordReader may give incomplete record: some characters cut off at the 
end of record.
# LineRecordReader may give wrong position/key information.
The first issue only happens for Custom Delimiter, which is caused by the 
following code at {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
if (ambiguousByteCount > 0) {
  str.append(recordDelimiterBytes, 0, ambiguousByteCount);
  //appending the ambiguous characters (refer case 2.2)
  bytesConsumed += ambiguousByteCount;
  ambiguousByteCount=0;
}
str.append(buffer, startPosn, appendLength);
txtLength += appendLength;
  }
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug will be 
triggered. For example, input is "123456789aab", Custom Delimiter is "ab", 
bufferSize is 10 and splitLength is 12, the correct record should be 
"123456789a" with length 10, but we get incomplete record "123456789" with 
length 9 from current code.

The second issue can happen for both Custom Delimiter and Default Delimiter, 
which is caused by the code in {{UncompressedSplitLineReader#readLine}}.
{{UncompressedSplitLineReader#readLine}} may report wrong size information at 
some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last bytes read (bufferLength) is less than bufferSize, the previous 
{{unusedBytes}} will be wrong: it should be {{bufferLength}} - {{bufferPosn}} 
instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value.
For example, if the input is "1234567890ab12ab345", the Custom Delimiter is 
"ab", bufferSize is 10 and there are two splits (first splitLength is 15, 
second splitLength is 4), the current code will give the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The Key for the third record is wrong: it should be 16 instead of 21, due to 
the wrong {{unusedBytes}}. {{fillBuffer}} read 10 bytes the first time; the 
second time it read only 5 bytes, which is 5 bytes less than the bufferSize. 
That is why the key we get is 5 bytes larger than the correct one.

[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-28 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties and try running a mapreduce job:
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are always failing.
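
A minimal sketch (untested) of the configuration reported to trigger the NPE, 
to make the reproduction steps above explicit. The two property names come 
from the report; the job setup is a placeholder and the class name is made up.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LocalEncryptedSpillRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The two properties named in the report.
    conf.set("mapreduce.framework.name", "local");
    conf.setBoolean("mapreduce.job.encrypted-intermediate-data", true);

    Job job = Job.getInstance(conf, "local-encrypted-spill-repro");
    // ... configure any small mapper/reducer, input and output paths here ...
    // Before the fix, the spill path (MapOutputBuffer.sortAndSpill ->
    // CryptoUtils.wrapIfNecessary) threw the NullPointerException shown above.
  }
}
{code}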



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-28 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Fix Version/s: 2.8.0

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-28 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720443#comment-14720443
 ] 

zhihai xu commented on MAPREDUCE-6452:
--

Thanks [~asuresh], [~ajithshetty] and [~mohdshahidkhan] for the review! Thanks 
[~bibinchundatt] for reporting this issue! Committed it to 2.8.0 and branch-2.

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717953#comment-14717953
 ] 

zhihai xu commented on MAPREDUCE-6452:
--

Thanks for the review [~asuresh], [~ajithshetty] and [~mohdshahidkhan]. These 
test failures are not related to the patch; all of these tests passed in my 
local build.
{code}
---
 T E S T S
---
Running org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.091 sec - in 
org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

---
 T E S T S
---
Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.402 sec - in 
org.apache.hadoop.mapreduce.security.TestJHSSecurity
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

---
 T E S T S
---
Running org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.584 sec - in 
org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
Results :
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
{code}

If there are no objections, I will commit it tomorrow.

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Hadoop Flags: Reviewed

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Attachment: (was: MAPREDUCE-6452.000.patch)

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Attachments: MAPREDUCE-6452.1.patch, TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Attachment: MAPREDUCE-6452.002.patch

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707800#comment-14707800
 ] 

zhihai xu commented on MAPREDUCE-6460:
--

The failure happens because the test didn't wait for the app attempt to be 
unregistered from ApplicationMasterService 
({{ApplicationMasterService#unregisterAttempt}}). The fix is to wait for the 
app to enter the {{RMAppState.KILLED}} state, which makes sure 
{{appAttempt.masterService.unregisterAttempt(appAttemptId)}} has been called. 
I uploaded the patch MAPREDUCE-6460.000.patch for review.
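
A rough sketch of the waiting pattern (assumed names: {{app}} is the {{RMApp}} under test, and the fragment sits inside a test method that declares {{throws Exception}}); this is not the actual patch:
{code}
// Poll until the RMApp reaches KILLED before asserting, so that
// ApplicationMasterService#unregisterAttempt has already been called.
long deadline = System.currentTimeMillis() + 10000;
while (app.getState() != RMAppState.KILLED
    && System.currentTimeMillis() < deadline) {
  Thread.sleep(100);
}
Assert.assertEquals(RMAppState.KILLED, app.getState());
{code}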

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch


 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
  FAILURE! - in 
 org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
   Time elapsed: 2.606 sec   FAILURE!
 java.lang.AssertionError: Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at 
 org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-6460:


 Summary: 
TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails 
with the following logs:
---
 T E S T S
---
Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec  
FAILURE! - in org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
  Time elapsed: 2.606 sec   FAILURE!
java.lang.AssertionError: Expected exception: 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
at 
org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)


Results :

Failed tests: 
  TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
Expected exception: 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException

Tests run: 24, Failures: 1, Errors: 0, Skipped: 0




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Component/s: test

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
  FAILURE! - in 
 org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
   Time elapsed: 2.606 sec   FAILURE!
 java.lang.AssertionError: Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at 
 org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: MAPREDUCE-6460.000.patch

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch


 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
  FAILURE! - in 
 org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
   Time elapsed: 2.606 sec   FAILURE!
 java.lang.AssertionError: Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at 
 org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: (was: MAPREDUCE-6460.000.patch)

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch


 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
  FAILURE! - in 
 org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
   Time elapsed: 2.606 sec   FAILURE!
 java.lang.AssertionError: Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at 
 org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Status: Patch Available  (was: Open)

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch


 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
  FAILURE! - in 
 org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
   Time elapsed: 2.606 sec   FAILURE!
 java.lang.AssertionError: Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at 
 org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: MAPREDUCE-6460.000.patch

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch


 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec 
  FAILURE! - in 
 org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
   Time elapsed: 2.606 sec   FAILURE!
 java.lang.AssertionError: Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at 
 org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 Expected exception: 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6440) Duplicate Key in Json Output for Job details

2015-08-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704395#comment-14704395
 ] 

zhihai xu commented on MAPREDUCE-6440:
--

Maybe change the name {{type}} to {{taskType}}, since the value comes from 
{{TaskType.toString()}}.

 Duplicate Key in Json Output for Job details
 

 Key: MAPREDUCE-6440
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6440
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Anushri
Assignee: Bibin A Chundatt
Priority: Minor

 Duplicate key in the JSON output for job details at the URL: 
 http://jhs_ip:jhs_port/ws/v1/history/mapreduce/jobs/job_id/tasks/task_id/attempts
 If the task type is REDUCE, the JSON output for this URL contains a duplicate 
 key for {{type}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Attachment: MAPREDUCE-6452.000.patch

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Ajith S
 Attachments: MAPREDUCE-6452.000.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6452:
-
Attachment: (was: MAPREDUCE-6452.000.patch)

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Ajith S
 Attachments: MAPREDUCE-6452.1.patch, TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned MAPREDUCE-6452:


Assignee: zhihai xu  (was: Ajith S)

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
 Attachments: MAPREDUCE-6452.000.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner

2015-08-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702646#comment-14702646
 ] 

zhihai xu commented on MAPREDUCE-6452:
--

Thanks [~ajithshetty] and [~mohdshahidkhan] for the clarification. I see that 
the commit "Fixing MR intermediate spills." 
(https://github.com/apache/hadoop/commit/6b710a42e00acca405e085724c89cda016cf7442)
changed:
{code}
private static byte[] getEncryptionKey() throws IOException {
  return TokenCache.getShuffleSecretKey(UserGroupInformation.getCurrentUser()
      .getCredentials());
}
{code}
to
{code}
private static byte[] getEncryptionKey() throws IOException {
  return TokenCache.getEncryptedSpillKey(UserGroupInformation.getCurrentUser()
      .getCredentials());
}
{code}

The change "Fixing MR intermediate spills." was added in the 2.7.1 release, and 
the stack trace in this JIRA must come from 2.7.1 or later, because
{code}
at org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
{code}
doesn't match the 2.7.0 code base.

 NPE when intermediate encrypt enabled for LocalRunner
 -

 Key: MAPREDUCE-6452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Ajith S
 Attachments: MAPREDUCE-6452.000.patch, MAPREDUCE-6452.1.patch, 
 TestLocalJobSubmission.java


 Enable the below properties try running mapreduce job
 mapreduce.framework.name=local
 mapreduce.job.encrypted-intermediate-data=true
 {code}
 2015-08-14 16:27:25,248 WARN  [Thread-21] mapred.LocalJobRunner 
 (LocalJobRunner.java:run(561)) - job_local473843898_0001
 java.lang.Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
 at 
 org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
 at 
 org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 Jobs are failing always



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

