[jira] [Commented] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350017#comment-15350017 ]

zhihai xu commented on MAPREDUCE-6727:
--------------------------------------

I uploaded a patch, MAPREDUCE-6727.000.patch, which adds a configuration "mapreduce.job.input.size.limit" to limit the input size of the MapReduce job. The default value is -1, which means no limit.

> Add a configuration to limit the input size of the MapReduce job.
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6727
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.8.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-6727.000.patch
>
> Add a configuration to limit the input size of the MapReduce job. It will be
> useful for Hadoop admins to save Hadoop cluster resources by preventing users
> from submitting bad MapReduce jobs or bad Hive queries. The default behavior
> is no limit.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
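The behavior described in the comment above (a positive limit checked at submission time, -1 meaning unlimited) can be sketched as follows. This is an illustrative sketch, not code from MAPREDUCE-6727.000.patch; the class and method names are hypothetical.

```java
// Hypothetical sketch of a submission-time input-size check as described
// above: a positive limit rejects oversized jobs before an AM container
// is allocated; -1 (the default) means no limit. Names are illustrative,
// not taken from the actual patch.
public class InputSizeLimitSketch {
  static final String INPUT_SIZE_LIMIT = "mapreduce.job.input.size.limit";
  static final long NO_LIMIT = -1L; // default value

  /** Returns true if the job's total input size is within the limit. */
  static boolean withinLimit(long totalInputBytes, long limitBytes) {
    return limitBytes == NO_LIMIT || totalInputBytes <= limitBytes;
  }

  public static void main(String[] args) {
    // No limit configured: any input size is accepted.
    System.out.println(withinLimit(5_000_000_000L, NO_LIMIT));
    // With a 1 GB limit, a 5 GB input would be rejected at submission.
    System.out.println(withinLimit(5_000_000_000L, 1_000_000_000L));
  }
}
```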
[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6727:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6727:
---------------------------------
    Attachment: MAPREDUCE-6727.000.patch
[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6727:
---------------------------------
    Attachment: (was: MAPREDUCE-6727.000.patch)
[jira] [Updated] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6727:
---------------------------------
    Attachment: MAPREDUCE-6727.000.patch
[jira] [Created] (MAPREDUCE-6727) Add a configuration to limit the input size of the MapReduce job.
zhihai xu created MAPREDUCE-6727:
------------------------------------

             Summary: Add a configuration to limit the input size of the MapReduce job.
                 Key: MAPREDUCE-6727
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6727
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: job submission
    Affects Versions: 2.8.0
            Reporter: zhihai xu
            Assignee: zhihai xu

Add a configuration to limit the input size of the MapReduce job. It will be useful for Hadoop admins to save Hadoop cluster resources by preventing users from submitting bad MapReduce jobs or bad Hive queries. The default behavior is no limit.
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292646#comment-15292646 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

[~jianhe] thanks for reviewing and committing the patch!

> Add a configuration to limit the number of map tasks allowed per job.
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6696
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.8.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch,
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks
> allowed per job. It will be useful for Hadoop admins to save Hadoop cluster
> resources by preventing users from submitting big MapReduce jobs. A MapReduce
> job with too many mappers may fail with OOM after running for a long time,
> which is a big waste.
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291961#comment-15291961 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

Also, the checkstyle issue does not look like a real issue: all configuration definitions in MRJobConfig use "public static final", and this patch follows the same convention. I don't know why the checkstyle script reported it.
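For context, the declaration style the comment refers to looks roughly like the fragment below. This is an illustrative sketch, not the actual MRJobConfig source; one plausible reason checkstyle flags it is that "public static final" is redundant on interface fields, where those modifiers are implicit.

```java
// Illustrative fragment of the MRJobConfig declaration style discussed
// above. The field names follow the JIRA discussion; this is not the
// actual Hadoop source. Checkstyle commonly flags "public static final"
// on interface fields because the modifiers are implicit there.
public class MRJobConfigStyleSketch {
  public static final String JOB_MAX_MAP = "mapreduce.job.max.map";
  public static final int DEFAULT_JOB_MAX_MAP = -1; // -1 means no limit

  public static void main(String[] args) {
    System.out.println(JOB_MAX_MAP + " (default " + DEFAULT_JOB_MAX_MAP + ")");
  }
}
```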
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6696:
---------------------------------
    Attachment: MAPREDUCE-6696.004.patch
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291366#comment-15291366 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

The test failures are not related to my change; they are already reported in https://issues.apache.org/jira/browse/MAPREDUCE-6702.
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290497#comment-15290497 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

Thanks, [~jianhe]! These are good suggestions. I uploaded a new patch, MAPREDUCE-6696.003.patch, which addresses all your comments. Please review it, thanks.
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6696:
---------------------------------
    Attachment: MAPREDUCE-6696.003.patch
[jira] [Comment Edited] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288083#comment-15288083 ]

zhihai xu edited comment on MAPREDUCE-6696 at 5/18/16 2:23 AM:
---------------------------------------------------------------

Thanks for the review, [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was the initial code for the task limit, but it runs in the AM, so it still wastes some resources (the AM container). And yes, MRJobConfig.NUM_MAPS only gives a hint, but my patch is based on InputFormat.getSplits, which exactly matches the number of mappers of the MapReduce job:

{code}
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);
{code}

writeSplits calls InputFormat.getSplits and writeJobSplitMetaInfo to create the file "job.splitmetainfo". JobImpl, called by the AM, reads "job.splitmetainfo" through createSplits and readSplitMetaInfo to get the input split info.

{code}
  /**
   * Logically split the set of input files for the job.
   *
   * Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.
   *
   * Note: The split is a logical split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be input-file-path, start, offset tuple.
   *
   * @param job job configuration.
   * @param numSplits the desired number of splits, a hint.
   * @return an array of {@link InputSplit}s for the job.
   */
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}

My patch rejects the job during submission, which saves the AM container resource.
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288083#comment-15288083 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

Thanks for the review, [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was the initial code for the task limit, but it runs in the AM, so it still wastes some resources (the AM container). And yes, MRJobConfig.NUM_MAPS only gives a hint, but my patch is based on InputFormat.getSplits, which exactly matches the number of mappers of the MapReduce job:

{code}
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);
{code}

writeSplits calls InputFormat.getSplits.

{code}
  /**
   * Logically split the set of input files for the job.
   *
   * Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.
   *
   * Note: The split is a logical split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be input-file-path, start, offset tuple.
   *
   * @param job job configuration.
   * @param numSplits the desired number of splits, a hint.
   * @return an array of {@link InputSplit}s for the job.
   */
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}

My patch rejects the job during submission, which saves the AM container resource.
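The submission-time rejection described in this discussion can be sketched as follows. This is an illustrative sketch using the configuration name from the JIRA ("mapreduce.job.max.map"); the class and method names are hypothetical, not the committed patch code.

```java
// Hypothetical sketch of the submission-time map-count check discussed
// above: writeSplits/InputFormat.getSplits yields the exact number of
// mappers, so a configured limit ("mapreduce.job.max.map", -1 = no
// limit) can reject the job before an AM container is allocated.
// Names are illustrative, not the committed patch code.
public class MapLimitSketch {
  static final int NO_LIMIT = -1;

  /** Throws if the exact split count exceeds the configured limit. */
  static void checkMapLimit(int maps, int maxMaps) {
    if (maxMaps != NO_LIMIT && maps > maxMaps) {
      throw new IllegalStateException("The number of map tasks (" + maps
          + ") exceeded the limit (" + maxMaps + ")");
    }
  }

  public static void main(String[] args) {
    checkMapLimit(100, NO_LIMIT); // no limit configured: accepted
    checkMapLimit(100, 500);      // under the limit: accepted
    try {
      checkMapLimit(1000, 500);   // over the limit: rejected at submission
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```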
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6696:
---------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6696:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Comment Edited] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287387#comment-15287387 ]

zhihai xu edited comment on MAPREDUCE-6696 at 5/17/16 7:50 PM:
---------------------------------------------------------------

All these test failures are unrelated to my changes. TestMRCJCFileOutputCommitter passes in my local build; its failure is due to a test environment problem:

{code}
2016-05-17 17:29:33,792 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.
{code}

The TestMiniMRChildTask and TestMiniMRChildTask.testTaskOldEnv failures happened in launchContainer, which is already past the job submission phase that my code change affects:

{code}
2016-05-17 17:45:48,781 WARN  [ContainersLauncher #1] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(245)) - Exception from container-launch with container ID: container_1463507138005_0001_01_02 and exit code: 127
ExitCodeException exitCode=127: nice: bash: No such file or directory
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
	at org.apache.hadoop.util.Shell.run(Shell.java:850)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6696:
---------------------------------
    Attachment: MAPREDUCE-6696.002.patch
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287387#comment-15287387 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

All these test failures are unrelated to my changes. TestMRCJCFileOutputCommitter passes in my local build; its failure is due to a test environment problem:

{code}
2016-05-17 17:29:33,792 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.
{code}

The TestMiniMRChildTask and TestMiniMRChildTask.testTaskOldEnv failures happened in launchContainer, which is already past the job submission phase that my code change affects:

{code}
2016-05-17 17:45:48,781 WARN  [ContainersLauncher #1] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(245)) - Exception from container-launch with container ID: container_1463507138005_0001_01_02 and exit code: 127
ExitCodeException exitCode=127: nice: bash: No such file or directory
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
	at org.apache.hadoop.util.Shell.run(Shell.java:850)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Attachment: MAPREDUCE-6696.001.patch
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Attachment: (was: MAPREDUCE-6696.001.patch)
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Attachment: MAPREDUCE-6696.001.patch
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Attachment: (was: MAPREDUCE-6696.001.patch)
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Attachment: MAPREDUCE-6696.001.patch
[jira] [Assigned] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu reassigned MAPREDUCE-6696: Assignee: zhihai xu
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284250#comment-15284250 ] zhihai xu commented on MAPREDUCE-6696: -- I attached a patch MAPREDUCE-6696.000.patch, which added a configuration "mapreduce.job.max.map" to limit the number of map tasks. The default value is -1, which means no limit.
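A limit like this is naturally enforced at job submission time, after the input splits are computed. The following is a hypothetical plain-Java sketch of such a check; the class and method names are illustrative, not the actual patch code, which would live in the job submission path and read the value from the job Configuration:

```java
// Hypothetical sketch of a submission-time map-task limit check.
// NO_LIMIT mirrors the default of -1 described in the comment above.
class MapLimitCheck {
    public static final int NO_LIMIT = -1; // default: no limit

    /**
     * Returns true if the job may be submitted, i.e. the number of
     * computed input splits does not exceed the configured limit
     * (e.g. "mapreduce.job.max.map"); a limit of -1 disables the check.
     */
    public static boolean withinLimit(int numSplits, int maxMaps) {
        return maxMaps == NO_LIMIT || numSplits <= maxMaps;
    }

    public static void main(String[] args) {
        System.out.println(withinLimit(100, -1)); // prints true (no limit)
        System.out.println(withinLimit(100, 50)); // prints false (over limit)
    }
}
```

In the real submitter, a failed check would presumably reject the job with an IOException rather than return a boolean; the boolean form just keeps the sketch easy to test.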
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6696: - Attachment: MAPREDUCE-6696.000.patch
[jira] [Created] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
zhihai xu created MAPREDUCE-6696: Summary: Add a configuration to limit the number of map tasks allowed per job. Key: MAPREDUCE-6696 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 2.8.0 Reporter: zhihai xu Add a configuration "mapreduce.job.max.map" to limit the number of map tasks allowed per job. It will be useful for Hadoop admins to save Hadoop cluster resources by preventing users from submitting big MapReduce jobs. A MapReduce job with too many mappers may fail with OOM after running for a long time. It will be a big waste. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6685) LocalDistributedCacheManager can have overlapping filenames
[ https://issues.apache.org/jira/browse/MAPREDUCE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258854#comment-15258854 ] zhihai xu commented on MAPREDUCE-6685: -- Is this issue the same as MAPREDUCE-6441? > LocalDistributedCacheManager can have overlapping filenames > --- > > Key: MAPREDUCE-6685 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6685 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ray Chiang >Assignee: Ray Chiang > Attachments: MAPREDUCE-6685.001.patch, MAPREDUCE-6685.002.patch > > > LocalDistributedCacheManager has this setup: > bq. AtomicLong uniqueNumberGenerator = new > AtomicLong(System.currentTimeMillis()); > to create this temporary filename: > bq. new FSDownload(localFSFileContext, ugi, conf, new Path(destPath, > Long.toString(uniqueNumberGenerator.incrementAndGet())), resource); > when using LocalJobRunner. When two or more jobs start on the same machine, > it's possible to end up with the same timestamp, or a large enough overlap > that two successive timestamps may not be sufficiently far apart. > Given the assumptions: > 1) Assume the timestamp is the same. Then the most common starting random seed > will be the same. > 2) The process ID will very likely be unique, but will likely be close in value. > 3) The thread ID is not guaranteed to be unique. > A unique ID based on the PID as a seed (in addition to the timestamp) should be a > better unique identifier for temporary filenames. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182528#comment-15182528 ] zhihai xu commented on MAPREDUCE-6622: -- Committed it to 2.8 also. > Add capability to set JHS job cache to a task-based limit > - > > Key: MAPREDUCE-6622 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.7.2 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Critical > Labels: supportability > Fix For: 2.8.0, 2.7.3, 2.9.0, 2.6.5 > > Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, > MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch, > MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch, > MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch, > MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch > > > When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the cached jobs > can be of varying size. This is generally not a problem when the job sizes > are uniform or small, but when the job sizes can be very large (say, greater > than 250k tasks), the JHS heap size can grow tremendously. > In cases where multiple jobs are very large, the JHS can lock up and > spend all its time in GC. However, since the cache is holding on to all the > jobs, not much heap space can be freed up. > Since the total number of tasks loaded is directly proportional to the > amount of heap used, setting a cap on the number of tasks allowed in the > cache should help prevent the JHS from locking up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
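The idea of bounding the cache by total task count rather than job count can be illustrated with a plain-Java sketch. This is hypothetical (the class `TaskWeightedCache` and its method names are invented, and the real JHS code is not structured this way); it only demonstrates the eviction policy described above, where each job's weight is its task count:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache of jobs bounded by the SUM of task
// counts, so one 250k-task job displaces many small ones, keeping heap
// usage roughly constant regardless of job size distribution.
class TaskWeightedCache {
    private final long maxTotalTasks; // cap on total tasks, not job count
    private long totalTasks = 0;
    // access-ordered, so iteration starts at the least recently used job
    private final LinkedHashMap<String, Integer> jobTasks =
        new LinkedHashMap<>(16, 0.75f, true);

    TaskWeightedCache(long maxTotalTasks) {
        this.maxTotalTasks = maxTotalTasks;
    }

    synchronized void put(String jobId, int numTasks) {
        Integer old = jobTasks.put(jobId, numTasks);
        if (old != null) {
            totalTasks -= old;
        }
        totalTasks += numTasks;
        // evict least recently used jobs until the total weight fits the cap
        Iterator<Map.Entry<String, Integer>> it = jobTasks.entrySet().iterator();
        while (totalTasks > maxTotalTasks && it.hasNext()) {
            Map.Entry<String, Integer> eldest = it.next();
            if (eldest.getKey().equals(jobId)) {
                continue; // never evict the job just inserted
            }
            totalTasks -= eldest.getValue();
            it.remove();
        }
    }

    synchronized int jobCount() { return jobTasks.size(); }
    synchronized long taskCount() { return totalTasks; }
}
```

For example, with a cap of 250,000 tasks, inserting a 100,000-task job and then a 200,000-task job evicts the first job, leaving one cached job with 200,000 tasks.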
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6622: - Fix Version/s: 2.8.0
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6622: - Fix Version/s: 2.6.5 2.7.3
[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182518#comment-15182518 ] zhihai xu commented on MAPREDUCE-6622: -- Committed it to both branch 2.6 and 2.7.
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6622: - Target Version/s: 2.8.0, 2.7.3, 2.6.5 (was: 2.8.0)
[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179080#comment-15179080 ] zhihai xu commented on MAPREDUCE-6622: -- Thanks for the confirmation [~rchiang]! I will backport the patch to the 2.6.5 and 2.7.3 branches after waiting several days, if no one objects.
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6622: - Priority: Critical (was: Major)
[jira] [Commented] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170516#comment-15170516 ] zhihai xu commented on MAPREDUCE-6622: -- This patch also fixed a memory leak caused by a race condition in {{CachedHistoryStorage.getFullJob}}. We can reproduce the leak by rapidly refreshing the JHS web page for a job with more than 40,000 mappers. The race is that {{fileInfo.loadJob()}} takes a long time to load a job with more than 40,000 mappers; during that time, {{fileInfo.loadJob()}} is called multiple times for the same job, because there is no synchronization between {{loadedJobCache.get(jobId)}} and {{loadJob(fileInfo)}}. You will see the used heap memory quickly go up. Looking at the heap dump, we found 56 {{CompletedJob}} instances for the same job ID, holding more than 2 million mappers in total (56 * 40,000). Based on http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html#build(com.google.common.cache.CacheLoader) this won't be an issue for com.google.common.cache.LoadingCache: {code} If another thread is currently loading the value for this key, simply waits for that thread to finish and returns its loaded value {code} This looks like a critical issue to me. Should we backport this patch to the 2.7.3 and 2.6.5 branches? 
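The single-load guarantee quoted above for Guava's LoadingCache can be demonstrated with a plain-JDK analogue: `ConcurrentHashMap.computeIfAbsent` also runs the loader at most once per key, with concurrent callers waiting for the in-flight load. This is an illustrative sketch, not the real JHS code; the names (`loadJob`, `job_1`) are invented:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Eight threads race to load the same "job". With an unsynchronized
// get-then-load sequence each could run the expensive load; with
// computeIfAbsent, the loader runs exactly once and the other threads
// wait for the in-flight load and reuse its result.
class SingleLoadDemo {
    public static final AtomicInteger loads = new AtomicInteger();
    public static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    static String loadJob(String jobId) {
        loads.incrementAndGet(); // stands in for an expensive CompletedJob load
        return "job-data:" + jobId;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> cache.computeIfAbsent("job_1", SingleLoadDemo::loadJob));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // computeIfAbsent applies the mapping function at most once per key
        System.out.println("loads=" + loads.get()); // prints loads=1
    }
}
```

The unsynchronized version of this race is exactly what produced 56 duplicate {{CompletedJob}} instances in the heap dump described above.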
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057625#comment-15057625 ] zhihai xu commented on MAPREDUCE-6436: -- Thanks [~djp] and [~lewuathe]! I changed it to a blocker because it may let more people notice this potential performance issue. +1 for the latest patch. Will commit it shortly. > JobHistory cache issue > -- > > Key: MAPREDUCE-6436 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Ryu Kobayashi >Assignee: Kai Sasaki > Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, > MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, > stacktrace2.txt, stacktrace3.txt > > > Problem: > HistoryFileManager.addIfAbsent produces a large amount of logs if the number of > cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes > far larger than mapreduce.jobhistory.joblist.cache.size. > Example: > For example, if the cache contains 5 entries in total and 10,000 entries > newer than mapreduce.jobhistory.max-age-ms, where > mapreduce.jobhistory.joblist.cache.size is 2, the > HistoryFileManager.addIfAbsent > method produces 5 - 2 = 3 lines of the "Waiting to remove from > JobListCache because it is not in done yet" message. A stacktrace is attached. > Impact: > In addition to large disk consumption, this issue blocks JobHistory.getJob for a > long time and slows job execution down significantly, because getJob is called > by RPCs such as HistoryClientService.HSClientProtocolHandler.getJobReport. > This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded > eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When > multiple threads call scanIfNeeded simultaneously, one of them acquires the lock > and the other threads are blocked until the first thread completes the > long-running > HistoryFileManager.addIfAbsent call. 
> Solution: > * Reduce the amount of logs so that HistoryFileManager.addIfAbsent doesn't take > too long. > * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips > scanning if another thread is already scanning. This changes the semantics of > some HistoryFileManager methods (such as getAllFileInfo and getFileInfo), > because scanIfNeeded may keep an outdated state. > * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are > not blocked by a loop at the scale of tens of thousands of entries. > > This patch implements the first item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
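The second solution item above — skip scanning when another thread is already scanning — could be sketched with an `AtomicBoolean` guard. This is a hypothetical illustration (the real `HistoryFileManager.UserLogDir.scanIfNeeded` uses a synchronized block and blocks callers, which is exactly the impact described above); `ScanGuard` and its methods are invented names:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical "skip if already scanning" guard: callers that lose the
// compareAndSet race return immediately with a possibly slightly stale
// view, instead of queueing behind a long-running directory scan.
class ScanGuard {
    private final AtomicBoolean scanning = new AtomicBoolean(false);
    private volatile int scans = 0;

    /**
     * Returns true if this caller performed the scan, false if it
     * skipped because another scan was already in flight.
     */
    boolean scanIfNeeded(Runnable scanDirectory) {
        if (!scanning.compareAndSet(false, true)) {
            return false; // someone else is scanning; do not block
        }
        try {
            scanDirectory.run();
            scans++;
            return true;
        } finally {
            scanning.set(false);
        }
    }

    int scanCount() { return scans; }
}
```

As the description notes, this trades strict freshness of getAllFileInfo/getFileInfo for not blocking RPC handler threads behind a scan.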
[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6436:
---------------------------------
    Issue Type: Improvement  (was: Bug)
[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6436:
---------------------------------
    Fix Version/s: 2.6.4
                   2.7.3
                   2.8.0
[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6436: - Target Version/s: 2.7.3, 2.6.4 Priority: Blocker (was: Major) Description: Problem: HistoryFileManager.addIfAbsent produces large amount of logs if number of cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes larger than mapreduce.jobhistory.joblist.cache.size by far. Example: For example, if the cache contains 5 entries in total and 10,000 entries newer than mapreduce.jobhistory.max-age-ms where mapreduce.jobhistory.joblist.cache.size is 2, HistoryFileManager.addIfAbsent method produces 5 - 2 = 3 lines of "Waiting to remove from JobListCache because it is not in done yet" message. It will attach a stacktrace. Impact: In addition to large disk consumption, this issue blocks JobHistory.getJob long time and slows job execution down significantly because getJob is called by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport. This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When multiple threads call scanIfNeeded simultaneously, one of them acquires lock and the other threads are blocked until the first thread completes long-running HistoryFileManager.addIfAbsent call. Solution: * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take too long time. * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips scanning if another thread is already scanning. This changes semantics of some HistoryFileManager methods (such as getAllFileInfo and getFileInfo) because scanIfNeeded keep outdated state. * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are not blocked by a loop at scale of tens of thousands. This patch implemented the first item. 
[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6436:
---------------------------------
    Target Version/s: 2.8.0, 2.7.3, 2.6.4  (was: 2.7.3, 2.6.4)
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057703#comment-15057703 ]

zhihai xu commented on MAPREDUCE-6436:
--------------------------------------

Committed it to trunk, branch-2, branch-2.6 and branch-2.7! Thanks [~lewuathe] for the contributions! Thanks [~djp] for the additional review!
[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6436:
---------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058917#comment-15058917 ]

zhihai xu commented on MAPREDUCE-6436:
--------------------------------------

Thanks for the finding, [~aw]! I just noticed we branched out 2.8. Will commit it to branch-2.8 shortly.
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059349#comment-15059349 ]

zhihai xu commented on MAPREDUCE-6436:
--------------------------------------

Thanks [~lewuathe] for the suggestion! There is a task {{MoveIntermediateToDoneRunnable}} which calls scanIntermediateDirectory periodically, so most of the time the job will be found in the cache {{jobListCache}}. Also, making scanIfNeeded asynchronous may change the behavior of RPC calls: job information that could be found before may no longer be found. I am thinking about another way to improve performance, which decreases the number of calls to scanIntermediateDirectory: in getFileInfo, add a scanOldDirsForJob call before scanIntermediateDirectory, i.e. call scanOldDirsForJob twice, once before scanIntermediateDirectory and once after it:
{code}
  public HistoryFileInfo getFileInfo(JobId jobId) throws IOException {
    // FileInfo available in cache.
    HistoryFileInfo fileInfo = jobListCache.get(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    // Call scanOldDirsForJob before scanIntermediateDirectory.
    fileInfo = scanOldDirsForJob(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    // OK so scan the intermediate to be sure we did not lose it that way
    scanIntermediateDirectory();
    fileInfo = jobListCache.get(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    // Intermediate directory does not contain job. Search through older ones.
    fileInfo = scanOldDirsForJob(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    return null;
  }
{code}
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059594#comment-15059594 ]

zhihai xu commented on MAPREDUCE-6436:
--------------------------------------

Just committed it to branch-2.8!
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055461#comment-15055461 ]

zhihai xu commented on MAPREDUCE-6436:
--------------------------------------

Thanks for updating the patch, [~lewuathe]! The new patch looks good except for the checkstyle issues:
{code}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:265: Line is longer than 80 characters (found 97).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:267: Line is longer than 80 characters (found 102).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:268: Line is longer than 80 characters (found 118).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:271: Line is longer than 80 characters (found 94).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:272: Line is longer than 80 characters (found 114).
{code}
Could you fix the above checkstyle issues?
[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue
[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055199#comment-15055199 ]

zhihai xu commented on MAPREDUCE-6436:
--------------------------------------

[~lewuathe], thanks for working on this issue. About the patch: we don't need to calculate the count for the entries being removed. Can we do all the calculations in the {{else}} section?
{code}
if (firstValue.didMoveFail()
    && firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
  ...
} else {
  if (firstValue.didMoveFail()) {
    if (moveFailedCount == 0) {
      firstMoveFailedKey = key;
    }
    moveFailedCount += 1;
  } else {
    if (inIntermediateCount == 0) {
      firstInIntermediateKey = key;
    }
    inIntermediateCount += 1;
  }
}
{code}
[jira] [Commented] (MAPREDUCE-6549) multibyte delimiters with LineRecordReader cause duplicate records
[ https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006294#comment-15006294 ] zhihai xu commented on MAPREDUCE-6549: -- Nice catch! But I think this issue is not related to MAPREDUCE-6481; without MAPREDUCE-6481, this issue would still happen. Also I think the same issue may happen for compressed input; the attached patch only fixes the issue for uncompressed input. > multibyte delimiters with LineRecordReader cause duplicate records > -- > > Key: MAPREDUCE-6549 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 2.7.2 >Reporter: Dustin Cote >Assignee: Wilfred Spiegelenburg > Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch > > > LineRecordReader currently produces duplicate records under certain > scenarios such as: > 1) input string: "abc+++def++ghi++" > delimiter string: "+++" > test passes with all sizes of the split > 2) input string: "abc++def+++ghi++" > delimiter string: "+++" > test fails with a split size of 4 > 3) input string: "abc+++def++ghi++" > delimiter string: "++" > test fails with a split size of 5 > 4) input string "abc+++defg++hij++" > delimiter string: "++" > test fails with a split size of 4 > 5) input string "abc++def+++ghi++" > delimiter string: "++" > test fails with a split size of 9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
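For reference, the records a correct reader must emit for these inputs are just the delimiter-separated tokens, independent of the split size. A minimal sketch of that ground truth (this is not the streaming LineRecordReader algorithm itself, which is where the split-boundary bug lives):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterRecordsSketch {
  // Ground truth: the records of an input are its delimiter-separated tokens,
  // regardless of how the file is cut into splits.
  public static String[] records(String input, String delimiter) {
    return input.split(Pattern.quote(delimiter));
  }

  public static void main(String[] args) {
    // Scenario 1 from the report: delimiter "+++".
    System.out.println(Arrays.toString(records("abc+++def++ghi++", "+++")));
    // Scenario 2: the buggy reader duplicates a record around the split
    // boundary, but the expected records are still just these two.
    System.out.println(Arrays.toString(records("abc++def+++ghi++", "+++")));
  }
}
```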
[jira] [Commented] (MAPREDUCE-6535) TaskID default constructor results in NPE on toString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991244#comment-14991244 ] zhihai xu commented on MAPREDUCE-6535: -- +1 to use TaskType.REDUCE as the default task type and make it compatible with MR1. > TaskID default constructor results in NPE on toString() > --- > > Key: MAPREDUCE-6535 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6535 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.6.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > > This code will reproduce the issue: > {code} > new TaskAttemptID().toString(); > {code} > The issue is that the default constructor leaves the type {{null}}. The > {{get()}} in {{CharTaskTypesMaps.getRepresentingCharacter()}} then throws an > NPE on the null type key. > The simplest solution would be to only call the {{get()}} on line 288 of > {{TaskID.java}} if {{type}} is not {{null}} and return some other literal > otherwise. Since no part of the code is tripping on the NPE, what we choose > for the literal shouldn't matter. How about "x"? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
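A minimal sketch of the null-guard option from the description (hypothetical, simplified stand-ins for TaskType and CharTaskTypesMaps; the direction endorsed in the comment above is to default the type to TaskType.REDUCE instead, for MR1 compatibility):

```java
import java.util.EnumMap;
import java.util.Map;

public class TaskIdNullGuardSketch {
  public enum TaskType { MAP, REDUCE }

  // Simplified stand-in for CharTaskTypesMaps' type-to-character mapping.
  private static final Map<TaskType, Character> REPRESENTING =
      new EnumMap<>(TaskType.class);
  static {
    REPRESENTING.put(TaskType.MAP, 'm');
    REPRESENTING.put(TaskType.REDUCE, 'r');
  }

  // Without the guard, a null type leads to an NPE (here via unboxing the
  // missing mapping; in TaskID via the null type key). With it, toString()
  // can fall back to the literal "x" suggested in the description.
  public static char representingCharacter(TaskType type) {
    return type == null ? 'x' : REPRESENTING.get(type);
  }

  public static void main(String[] args) {
    System.out.println(representingCharacter(null));            // 'x'
    System.out.println(representingCharacter(TaskType.REDUCE)); // 'r'
  }
}
```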
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906493#comment-14906493 ] zhihai xu commented on MAPREDUCE-6484: -- Committed it to branch-2 and trunk. > Yarn Client uses local address instead of RM address as token renewer in a > secure cluster when RM HA is enabled. > > > Key: MAPREDUCE-6484 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, security >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch > > > Yarn Client uses the local address instead of the RM address as the token renewer in a > secure cluster when RM HA is enabled. This will cause HDFS token renewal to fail > for renewer "nobody" if the rules from > {{hadoop.security.auth_to_local}} exclude the client address in the HDFS > {{DelegationTokenIdentifier}}. > The reason the local address is returned: when HA is enabled, > "yarn.resourcemanager.address" may not be set. If > {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", > the default address "0.0.0.0:8032" will be used, and based on the following code > in SecurityUtil.java, the local address will be used to replace "0.0.0.0". 
> {code} > private static String replacePattern(String[] components, String hostname) > throws IOException { > String fqdn = hostname; > if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) { > fqdn = getLocalHostName(); > } > return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + > components[2]; > } > static String getLocalHostName() throws UnknownHostException { > return InetAddress.getLocalHost().getCanonicalHostName(); > } > public static String getServerPrincipal(String principalConfig, > InetAddress addr) throws IOException { > String[] components = getComponents(principalConfig); > if (components == null || components.length != 3 > || !components[1].equals(HOSTNAME_PATTERN)) { > return principalConfig; > } else { > if (addr == null) { > throw new IOException("Can't replace " + HOSTNAME_PATTERN > + " pattern since client address is null"); > } > return replacePattern(components, addr.getCanonicalHostName()); > } > } > {code} > The following is the exception that causes the job to fail: > {code} > 15/09/12 16:27:24 WARN security.UserGroupInformation: > PriviledgedActionException as:t...@example.com (auth:KERBEROS) > cause:java.io.IOException: Failed to run job : yarn tries to renew a token > with renewer nobody > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975) > at > 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > {code}
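The failure mode in the quoted SecurityUtil code can be reproduced standalone. The sketch below copies the replacePattern logic from the description; the principal components "rm" and "EXAMPLE.COM" are illustrative values, not taken from a real cluster. With the default "0.0.0.0:8032" RM address, _HOST resolves to whatever machine runs the code, i.e. the client's own hostname:

```java
import java.net.InetAddress;
import java.util.Locale;

public class PrincipalPatternSketch {
  // Standalone copy of the replacePattern logic quoted in the description.
  public static String replacePattern(String[] components, String hostname)
      throws Exception {
    String fqdn = hostname;
    if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
      // The problematic fallback: the *client's* local hostname is substituted
      // instead of the RM's, so the client ends up as the token renewer host.
      fqdn = InetAddress.getLocalHost().getCanonicalHostName();
    }
    return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
  }

  public static void main(String[] args) throws Exception {
    // "rm/_HOST@EXAMPLE.COM" combined with the default 0.0.0.0 RM address:
    // the printed principal contains this machine's hostname, not the RM's.
    System.out.println(replacePattern(
        new String[] {"rm", "_HOST", "EXAMPLE.COM"}, "0.0.0.0"));
  }
}
```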
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484: - Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available)
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905405#comment-14905405 ] zhihai xu commented on MAPREDUCE-6484: -- thanks for the review [~asuresh]! That is a good suggestion. I attached a new patch MAPREDUCE-6484.001.patch, which addressed your comment.
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484: - Attachment: MAPREDUCE-6484.001.patch
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484: - Attachment: (was: MAPREDUCE-6484.001.patch)
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6484:
---------------------------------
    Hadoop Flags: Reviewed

> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled.
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6484
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, security
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
> The Yarn client uses the local address instead of the RM address as the
> token renewer in a secure cluster when RM HA is enabled. This causes HDFS
> token renewal to fail for renewer "nobody" if the rules from
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS
> {{DelegationTokenIdentifier}}.
> The local address is returned because, when HA is enabled,
> "yarn.resourcemanager.address" may not be set. If
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal",
> the default address "0.0.0.0:8032" is used, and based on the following code
> in SecurityUtil.java, the local hostname replaces "0.0.0.0":
> {code}
> private static String replacePattern(String[] components, String hostname)
>     throws IOException {
>   String fqdn = hostname;
>   if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>     fqdn = getLocalHostName();
>   }
>   return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
> }
>
> static String getLocalHostName() throws UnknownHostException {
>   return InetAddress.getLocalHost().getCanonicalHostName();
> }
>
> public static String getServerPrincipal(String principalConfig,
>     InetAddress addr) throws IOException {
>   String[] components = getComponents(principalConfig);
>   if (components == null || components.length != 3
>       || !components[1].equals(HOSTNAME_PATTERN)) {
>     return principalConfig;
>   } else {
>     if (addr == null) {
>       throw new IOException("Can't replace " + HOSTNAME_PATTERN
>           + " pattern since client address is null");
>     }
>     return replacePattern(components, addr.getCanonicalHostName());
>   }
> }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: PriviledgedActionException as:t...@example.com (auth:KERBEROS) cause:java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
>         at
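The substitution behavior described above can be sketched in isolation. The following is a minimal, self-contained sketch (the class name HostPatternDemo and the simplified error handling are ours; the real logic lives in Hadoop's SecurityUtil): when the configured address is the wildcard "0.0.0.0", the client's own hostname ends up in the renewer principal instead of the RM's.

```java
import java.net.InetAddress;
import java.util.Locale;

public class HostPatternDemo {
    // Simplified sketch of the _HOST substitution quoted above: the wildcard
    // address "0.0.0.0" is replaced with the *client's* canonical hostname,
    // so the renewer principal names the client machine, not the RM.
    public static String replacePattern(String[] components, String hostname)
            throws Exception {
        String fqdn = hostname;
        if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
            fqdn = InetAddress.getLocalHost().getCanonicalHostName();
        }
        return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
    }

    public static void main(String[] args) throws Exception {
        // Components of the principal "yarn/_HOST@EXAMPLE.COM".
        String[] parts = {"yarn", "_HOST", "EXAMPLE.COM"};
        // Correctly configured RM address: the principal names the RM host.
        System.out.println(replacePattern(parts, "rm1.example.com"));
        // prints yarn/rm1.example.com@EXAMPLE.COM
        // Default wildcard address: the principal names whatever host runs the client.
        System.out.println(replacePattern(parts, "0.0.0.0"));
    }
}
```

This is why the renewer later fails the auth_to_local mapping: the client host has no rule mapping it to the yarn user, so it degrades to "nobody".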
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905827#comment-14905827 ]

zhihai xu commented on MAPREDUCE-6484:
--------------------------------------
Thanks for the review [~asuresh]! The new patch passed jenkins. I will commit it tomorrow if no one objects.
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877279#comment-14877279 ]

zhihai xu commented on MAPREDUCE-6484:
--------------------------------------
Moved the JIRA from YARN to MapReduce since all the changes are in Map Reduce project.
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6460:
---------------------------------
      Resolution: Fixed
   Fix Version/s: 2.8.0
          Status: Resolved  (was: Patch Available)

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6460
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6460.000.patch
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails with the following logs:
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)  Time elapsed: 2.606 sec  <<< FAILURE!
> java.lang.AssertionError: Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>         at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests:
>   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0
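The "Expected exception" AssertionError in the trace above comes from JUnit 4's expected-exception handling: ExpectException.evaluate runs the test body and fails if the declared exception type is never thrown. Here is a plain-Java sketch of that contract (the class and method names are ours, not JUnit's internals):

```java
public class ExpectExceptionSketch {
    // Run the body and report failure unless the expected exception type is
    // thrown; returns null on success, an AssertionError describing the
    // failure otherwise. This mirrors the semantics behind the log above.
    public static AssertionError runExpecting(Class<? extends Throwable> expected,
                                              Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return null; // expected exception seen: test passes
            }
            return new AssertionError("Unexpected exception: " + t);
        }
        // The failure mode in the log: the body completed without throwing.
        return new AssertionError("Expected exception: " + expected.getName());
    }

    public static void main(String[] args) {
        AssertionError pass = runExpecting(IllegalStateException.class,
                () -> { throw new IllegalStateException("attempt not found"); });
        AssertionError fail = runExpecting(IllegalStateException.class, () -> { });
        System.out.println(pass == null);      // prints true
        System.out.println(fail.getMessage()); // prints Expected exception: java.lang.IllegalStateException
    }
}
```

So the test failure means the code under test returned normally where the test expected RMContainerAllocationException to be raised.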
[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876944#comment-14876944 ]

zhihai xu commented on MAPREDUCE-6460:
--------------------------------------
Committed it to branch-2 and trunk.
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6460:
---------------------------------
    Hadoop Flags: Reviewed
[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876919#comment-14876919 ]

zhihai xu commented on MAPREDUCE-6460:
--------------------------------------
Thanks for the review [~rkanter]! I will commit it shortly.
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6484:
---------------------------------
    Attachment: YARN-4187.000.patch
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Component/s: (was: resourcemanager)
             client

> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled.
>
>                 Key: MAPREDUCE-6484
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, security
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-4187.000.patch
>
> Yarn Client uses the local address instead of the RM address as the token
> renewer in a secure cluster when RM HA is enabled. This causes HDFS token
> renewal to fail for renewer "nobody" if the rules from
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned: when HA is enabled,
> "yarn.resourcemanager.address" may not be set. If
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal",
> the default address "0.0.0.0:8032" is used, and based on the following code
> in SecurityUtil.java, the local hostname is used to replace "0.0.0.0".
> {code}
> private static String replacePattern(String[] components, String hostname)
>     throws IOException {
>   String fqdn = hostname;
>   if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>     fqdn = getLocalHostName();
>   }
>   return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
> }
>
> static String getLocalHostName() throws UnknownHostException {
>   return InetAddress.getLocalHost().getCanonicalHostName();
> }
>
> public static String getServerPrincipal(String principalConfig,
>     InetAddress addr) throws IOException {
>   String[] components = getComponents(principalConfig);
>   if (components == null || components.length != 3
>       || !components[1].equals(HOSTNAME_PATTERN)) {
>     return principalConfig;
>   } else {
>     if (addr == null) {
>       throw new IOException("Can't replace " + HOSTNAME_PATTERN
>           + " pattern since client address is null");
>     }
>     return replacePattern(components, addr.getCanonicalHostName());
>   }
> }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: PriviledgedActionException as:t...@example.com (auth:KERBEROS) cause:java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
> 	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> 	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
> 	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> 	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> {code}
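The substitution path quoted above can be exercised in isolation. The sketch below is a simplified, self-contained re-creation of {{getServerPrincipal}}/{{replacePattern}}, not the actual SecurityUtil code: {{getComponents}} is reimplemented naively, and the local-hostname lookup is replaced by the made-up stand-in "client-host.example.com" because {{InetAddress.getLocalHost()}} is environment-dependent. It shows how the "0.0.0.0" HA default ends up replaced by the client's own host instead of the RM's:

```java
import java.util.Locale;

class PrincipalDemo {
    static final String HOSTNAME_PATTERN = "_HOST";

    // Splits "service/host@REALM" into {service, host, realm}; null if malformed.
    static String[] getComponents(String principal) {
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash < 0 || at < slash) {
            return null;
        }
        return new String[] {
            principal.substring(0, slash),
            principal.substring(slash + 1, at),
            principal.substring(at + 1)
        };
    }

    // Mirrors the quoted logic: "_HOST" is replaced by the given hostname,
    // and "0.0.0.0" falls back to a (stubbed) local hostname.
    static String getServerPrincipal(String principalConfig, String hostname) {
        String[] c = getComponents(principalConfig);
        if (c == null || !c[1].equals(HOSTNAME_PATTERN)) {
            return principalConfig;
        }
        String fqdn = hostname;
        if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
            fqdn = "client-host.example.com"; // stand-in for InetAddress.getLocalHost()
        }
        return c[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + c[2];
    }

    public static void main(String[] args) {
        // With a concrete RM address the RM principal is produced...
        System.out.println(getServerPrincipal("yarn/_HOST@EXAMPLE.COM", "rm1.example.com"));
        // ...but with the HA default "0.0.0.0:8032" the client host leaks in.
        System.out.println(getServerPrincipal("yarn/_HOST@EXAMPLE.COM", "0.0.0.0"));
    }
}
```

Principals without "_HOST" pass through unchanged, which is why the problem surfaces only when the pattern is used together with an unset RM address.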
[jira] [Moved] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu moved YARN-4187 to MAPREDUCE-6484:
Component/s: (was: security)
             (was: resourcemanager)
             security
             resourcemanager
        Key: MAPREDUCE-6484 (was: YARN-4187)
    Project: Hadoop Map/Reduce (was: Hadoop YARN)
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Attachment: (was: YARN-4187.000.patch)
[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805125#comment-14805125 ] zhihai xu commented on MAPREDUCE-6481:
--
[~jlowe], thanks for the review and for committing the patch! This patch depends on MAPREDUCE-5948; I can apply it cleanly after applying MAPREDUCE-5948. Shall we add both MAPREDUCE-5948 and MAPREDUCE-6481 to the 2.7.2 release?

> LineRecordReader may give incomplete record and wrong position/key
> information for uncompressed input sometimes.
>
>                 Key: MAPREDUCE-6481
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6481.000.patch
>
> LineRecordReader may give an incomplete record and wrong position/key
> information for uncompressed input sometimes.
> There are two issues:
> # LineRecordReader may give an incomplete record: some characters are cut
> off at the end of the record.
> # LineRecordReader may give wrong position/key information.
> The first issue happens only with a custom delimiter and is caused by the
> following code in {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
>   if (ambiguousByteCount > 0) {
>     str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>     // appending the ambiguous characters (refer case 2.2)
>     bytesConsumed += ambiguousByteCount;
>     ambiguousByteCount = 0;
>   }
>   str.append(buffer, startPosn, appendLength);
>   txtLength += appendLength;
> }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug is
> triggered. For example, if the input is "123456789aab", the custom delimiter
> is "ab", the bufferSize is 10, and the splitLength is 12, the correct record
> is "123456789a" with length 10, but the current code gives the incomplete
> record "123456789" with length 9.
> The second issue can happen for both custom and default delimiters and is
> caused by the code in {{UncompressedSplitLineReader#readLine}}, which may
> report wrong size information in some corner cases. The reason is
> {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last read (bufferLength) is less than bufferSize, the previous
> {{unusedBytes}} will be wrong: it should be {{bufferLength}} -
> {{bufferPosn}} instead of bufferSize - {{bufferPosn}}, so a larger value is
> returned.
> For example, if the input is "1234567890ab12ab345", the custom delimiter is
> "ab", the bufferSize is 10, and there are two splits (the first splitLength
> is 15 and the second is 4), the current code gives the following result:
> First record:  Key: 0   Value: "1234567890"
> Second record: Key: 12  Value: "12"
> Third record:  Key: 21  Value: "345"
> The key for the third record is wrong: it should be 16 instead of 21. This
> is due to the wrong {{unusedBytes}}: {{fillBuffer}} reads 10 bytes the first
> time but only 5 bytes the second time, which is 5 bytes less than the
> bufferSize. That is why the key we get is 5 bytes larger than the correct
> one.
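As a sanity check on the first example, the expected tokenization of "123456789aab" with custom delimiter "ab" can be reproduced with a naive, non-buffered scan. This is a hedged illustration of the expected result only, not the LineReader code, since the bug arises from buffered reading at split boundaries:

```java
class DelimiterDemo {
    // Naive record split: find the delimiter directly in the full input,
    // with no buffering and therefore no ambiguous-byte bookkeeping.
    static String firstRecord(String input, String delimiter) {
        int idx = input.indexOf(delimiter);
        return idx < 0 ? input : input.substring(0, idx);
    }

    public static void main(String[] args) {
        String rec = firstRecord("123456789aab", "ab");
        // The correct record keeps the trailing 'a': "123456789a", length 10.
        System.out.println(rec + " (length " + rec.length() + ")");
    }
}
```

The buggy {{readCustomLine}} path drops that trailing 'a' when {{appendLength}} is 0 while {{ambiguousByteCount}} is still pending.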
[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876068#comment-14876068 ] zhihai xu commented on MAPREDUCE-6481:
--
Thanks [~jlowe]! It is great that we have both MAPREDUCE-5948 and MAPREDUCE-6481 fixed in the 2.7.2 release.
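The correct keys for the position/key example ("1234567890ab12ab345" with delimiter "ab") can likewise be recomputed with a straightforward scan. This is illustrative only and deliberately avoids the buffered reader's bookkeeping, which is where the {{unusedBytes}} drift of {{bufferSize - bufferLength}} = 5 bytes creeps in:

```java
import java.util.ArrayList;
import java.util.List;

class KeyOffsetsDemo {
    // Correct start offset (the record "key") of each record, computed by
    // scanning the whole input for the delimiter instead of tracking
    // consumed bytes across buffer refills.
    static List<Integer> recordOffsets(String input, String delim) {
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos <= input.length()) {
            offsets.add(pos);
            int idx = input.indexOf(delim, pos);
            if (idx < 0) {
                break;
            }
            pos = idx + delim.length();
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Expected keys 0, 12, 16; the buggy accounting reports 21 for the third.
        System.out.println(recordOffsets("1234567890ab12ab345", "ab"));
    }
}
```

The 5-byte error on the third key matches the short second fill: 10 bytes read the first time, only 5 the second.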
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Attachment: YARN-4187.000.patch
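Since the description notes that "yarn.resourcemanager.address" may be left unset when RM HA is enabled, one possible mitigation (a hypothetical sketch only; the rm1/rm2 IDs and hostnames are examples, not from this issue) is to set the per-RM client addresses explicitly in yarn-site.xml so the "0.0.0.0:8032" default is never consulted:

```xml
<!-- Hypothetical yarn-site.xml fragment for an HA pair with RM IDs rm1 and rm2. -->
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>rm2.example.com:8032</value>
</property>
```

With concrete addresses configured, "_HOST" in "yarn.resourcemanager.principal" resolves to the RM host rather than falling back to the client's local hostname.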
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Attachment: (was: YARN-4187.000.patch)
[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6481:
-
Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6481: - Attachment: MAPREDUCE-6481.000.patch

> LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: MAPREDUCE-6481.000.patch
>
> LineRecordReader may sometimes give an incomplete record and wrong position/key information for uncompressed input. There are two issues:
> # LineRecordReader may give an incomplete record: some characters are cut off at the end of the record.
> # LineRecordReader may give wrong position/key information.
>
> The first issue only happens for a custom delimiter and is caused by the following code in {{LineReader#readCustomLine}}:
> {code}
> if (appendLength > 0) {
>   if (ambiguousByteCount > 0) {
>     str.append(recordDelimiterBytes, 0, ambiguousByteCount);
>     // appending the ambiguous characters (refer case 2.2)
>     bytesConsumed += ambiguousByteCount;
>     ambiguousByteCount = 0;
>   }
>   str.append(buffer, startPosn, appendLength);
>   txtLength += appendLength;
> }
> {code}
> If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug is triggered. For example, if the input is "123456789aab", the custom delimiter is "ab", bufferSize is 10 and splitLength is 12, the correct record should be "123456789a" with length 10, but the current code gives the incomplete record "123456789" with length 9.
>
> The second issue can happen for both custom and default delimiters and is caused by the code in {{UncompressedSplitLineReader#readLine}}, which may report wrong size information in some corner cases. The reason is {{unusedBytes}} in the following code:
> {code}
> bytesRead += unusedBytes;
> unusedBytes = bufferSize - getBufferPosn();
> bytesRead -= unusedBytes;
> {code}
> If the last read (bufferLength) is less than bufferSize, the previous {{unusedBytes}} is wrong: it should be {{bufferLength}} - {{bufferPosn}} instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value. For example, if the input is "1234567890ab12ab345", the custom delimiter is "ab", bufferSize is 10 and there are two splits (the first splitLength is 15 and the second is 4), the current code gives the following result:
> First record: Key:0 Value:"1234567890"
> Second record: Key:12 Value:"12"
> Third record: Key:21 Value:"345"
> The key for the third record is wrong: it should be 16 instead of 21, due to the wrong {{unusedBytes}}. {{fillBuffer}} reads 10 bytes the first time but only 5 bytes the second time, which is 5 bytes less than bufferSize; that is why the key we get is 5 bytes larger than the correct one.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
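The first bug can be illustrated with a small model of the append step described above. This is a hypothetical simplification (a StringBuilder stands in for Hadoop's Text, and only the fields involved in the bug are modeled), not the committed MAPREDUCE-6481 patch; the point is that the buffered ambiguous bytes must be flushed even when appendLength is 0:

```java
// Simplified model of the append step in LineReader#readCustomLine.
// Hypothetical sketch: only the fields involved in the bug are modeled.
class LineReaderFix {
  /**
   * Appends any buffered ambiguous delimiter-prefix bytes, then the new
   * buffer bytes. Returns how many previously unconsumed bytes were
   * flushed. Key change versus the buggy code: the ambiguous bytes are
   * flushed even when appendLength == 0.
   */
  static int append(StringBuilder str, byte[] recordDelimiterBytes,
                    int ambiguousByteCount,
                    byte[] buffer, int startPosn, int appendLength) {
    int bytesConsumed = 0;
    if (ambiguousByteCount > 0) {
      // Flush the delimiter prefix that was held back as "ambiguous";
      // the buggy code skipped this whole path when appendLength == 0,
      // silently dropping these bytes from the record.
      for (int i = 0; i < ambiguousByteCount; i++) {
        str.append((char) recordDelimiterBytes[i]);
      }
      bytesConsumed += ambiguousByteCount;
    }
    if (appendLength > 0) {
      for (int i = 0; i < appendLength; i++) {
        str.append((char) buffer[startPosn + i]);
      }
    }
    return bytesConsumed;
  }
}
```

In the "123456789aab" example, the record body "123456789" has already been appended and a single ambiguous 'a' is buffered; flushing it even though `appendLength` is 0 yields the correct record "123456789a".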
[jira] [Commented] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791624#comment-14791624 ] zhihai xu commented on MAPREDUCE-6481: -- I attached a patch, MAPREDUCE-6481.000.patch, which should fix both issues. I added several test cases to the patch to cover all of these corner cases; they will help avoid regressions in the future.

> LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
>
> Key: MAPREDUCE-6481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: MAPREDUCE-6481.000.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
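The arithmetic behind the second bug can be shown in isolation. In this sketch, `lastReadLength` is a hypothetical name for the byte count actually returned by the last fillBuffer() call (the real class tracks this differently); the buggy and fixed formulas are placed side by side:

```java
// Model of the bookkeeping bug in UncompressedSplitLineReader#readLine.
class SplitPosnFix {
  // Buggy: measures unused bytes against the buffer *capacity*.
  static int unusedBytesBuggy(int bufferSize, int bufferPosn) {
    return bufferSize - bufferPosn;
  }

  // Fixed: measures against what was actually read, so a short final
  // read does not inflate the count and shift later record keys.
  static int unusedBytesFixed(int lastReadLength, int bufferPosn) {
    return lastReadLength - bufferPosn;
  }
}
```

When the last fillBuffer() returns only 5 of a 10-byte buffer, the buggy formula over-counts by exactly bufferSize - lastReadLength = 5 bytes, which matches the key error in the example (21 reported instead of 16).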
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Attachment: MAPREDUCE-6460.000.patch

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6460.000.patch
>
> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails with the following logs:
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator) Time elapsed: 2.606 sec <<< FAILURE!
> java.lang.AssertionError: Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
>   at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Results :
> Failed tests: TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
> Tests run: 24, Failures: 1, Errors: 0, Skipped: 0

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Attachment: (was: MAPREDUCE-6460.000.patch)

> TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
>
> Key: MAPREDUCE-6460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6460.000.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6481: - Description:

LineRecordReader may sometimes give an incomplete record and wrong position/key information for uncompressed input. There are two issues:
# LineRecordReader may give an incomplete record: some characters are cut off at the end of the record.
# LineRecordReader may give wrong position/key information.

The first issue only happens for a custom delimiter and is caused by the following code in {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
  if (ambiguousByteCount > 0) {
    str.append(recordDelimiterBytes, 0, ambiguousByteCount);
    // appending the ambiguous characters (refer case 2.2)
    bytesConsumed += ambiguousByteCount;
    ambiguousByteCount = 0;
  }
  str.append(buffer, startPosn, appendLength);
  txtLength += appendLength;
}
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug is triggered. For example, if the input is "123456789aab", the custom delimiter is "ab", bufferSize is 10 and splitLength is 12, the correct record should be "123456789a" with length 10, but the current code gives the incomplete record "123456789" with length 9.

The second issue can happen for both custom and default delimiters and is caused by the code in {{UncompressedSplitLineReader#readLine}}, which may report wrong size information in some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last read (bufferLength) is less than bufferSize, the previous {{unusedBytes}} is wrong: it should be {{bufferLength}} - {{bufferPosn}} instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value. For example, if the input is "1234567890ab12ab345", the custom delimiter is "ab", bufferSize is 10 and there are two splits (the first splitLength is 15 and the second is 4), the current code gives the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The key for the third record is wrong: it should be 16 instead of 21, due to the wrong {{unusedBytes}}. {{fillBuffer}} reads 10 bytes the first time but only 5 bytes the second time, which is 5 bytes less than bufferSize; that is why the key we get is 5 bytes larger than the correct one.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
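As a sanity check on the example above, the correct record keys can be derived by scanning the input for delimiter positions. This throwaway helper (illustrative only, not Hadoop code) reproduces the expected keys 0, 12 and 16 for input "1234567890ab12ab345" with delimiter "ab":

```java
import java.util.ArrayList;
import java.util.List;

class RecordKeys {
  // Each record starts at offset 0 or right after the previous delimiter,
  // so the returned offsets are the correct LineRecordReader keys.
  static List<Integer> recordKeys(String input, String delim) {
    List<Integer> keys = new ArrayList<>();
    int pos = 0;
    while (pos <= input.length()) {
      keys.add(pos);                      // a record starts here
      int next = input.indexOf(delim, pos);
      if (next < 0) break;                // last record runs to end of input
      pos = next + delim.length();        // skip past the delimiter
    }
    return keys;
  }
}
```

For "1234567890ab12ab345" this yields [0, 12, 16], confirming that the third record's key should be 16, not the 21 the buggy bookkeeping reports.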
[jira] [Created] (MAPREDUCE-6481) LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes.
zhihai xu created MAPREDUCE-6481: Summary: LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes. Key: MAPREDUCE-6481 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6481 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical

LineRecordReader may sometimes give an incomplete record and wrong position/key information for uncompressed input. There are two issues:
# LineRecordReader may give an incomplete record: some characters are cut off at the end of the record.
# LineRecordReader may give wrong position/key information.

The first issue only happens for a custom delimiter and is caused by the following code in {{LineReader#readCustomLine}}:
{code}
if (appendLength > 0) {
  if (ambiguousByteCount > 0) {
    str.append(recordDelimiterBytes, 0, ambiguousByteCount);
    // appending the ambiguous characters (refer case 2.2)
    bytesConsumed += ambiguousByteCount;
    ambiguousByteCount = 0;
  }
  str.append(buffer, startPosn, appendLength);
  txtLength += appendLength;
}
{code}
If {{appendLength}} is 0 and {{ambiguousByteCount}} is not 0, this bug is triggered. For example, if the input is "123456789aab", the custom delimiter is "ab", bufferSize is 10 and splitLength is 12, the correct record should be "123456789a" with length 10, but the current code gives the incomplete record "123456789" with length 9.

The second issue can happen for both custom and default delimiters and is caused by the code in {{UncompressedSplitLineReader#readLine}}, which may report wrong size information in some corner cases. The reason is {{unusedBytes}} in the following code:
{code}
bytesRead += unusedBytes;
unusedBytes = bufferSize - getBufferPosn();
bytesRead -= unusedBytes;
{code}
If the last read (bufferLength) is less than bufferSize, the previous {{unusedBytes}} is wrong: it should be {{bufferLength}} - {{bufferPosn}} instead of {{bufferSize}} - {{bufferPosn}}, so it returns a larger value. For example, if the input is "1234567890ab12ab345", the custom delimiter is "ab", bufferSize is 10 and there are two splits (the first splitLength is 15 and the second is 4), the current code gives the following result:
First record: Key:0 Value:"1234567890"
Second record: Key:12 Value:"12"
Third record: Key:21 Value:"345"
The key for the third record is wrong: it should be 16 instead of 21, due to the wrong {{unusedBytes}}. {{fillBuffer}} reads 10 bytes the first time but only 5 bytes the second time, which is 5 bytes less than bufferSize; that is why the key we get is 5 bytes larger than the correct one.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Resolution: Fixed Status: Resolved (was: Patch Available)

NPE when intermediate encrypt enabled for LocalRunner

Key: MAPREDUCE-6452
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, TestLocalJobSubmission.java

Enable the properties below and try running a mapreduce job:
mapreduce.framework.name=local
mapreduce.job.encrypted-intermediate-data=true
{code}
2015-08-14 16:27:25,248 WARN [Thread-21] mapred.LocalJobRunner (LocalJobRunner.java:run(561)) - job_local473843898_0001
java.lang.Exception: java.lang.NullPointerException
  at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:463)
  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:523)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.crypto.CryptoOutputStream.init(CryptoOutputStream.java:92)
  at org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.init(CryptoFSDataOutputStream.java:31)
  at org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1492)
  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
  at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:244)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
{code}
Jobs always fail.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
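The trace shows CryptoOutputStream.init failing on a missing encryption key: with encrypted intermediate data enabled, CryptoUtils.wrapIfNecessary expects a spill key in the job credentials, and the local runner never seeded one. As an illustration of the kind of key that has to be provisioned before tasks run (a hypothetical sketch only, not the committed MAPREDUCE-6452 patch), a 128-bit spill key can be generated like this:

```java
import javax.crypto.KeyGenerator;
import java.security.NoSuchAlgorithmException;

class SpillKeyDemo {
  // Generate raw key material suitable for encrypting spilled map output.
  // The algorithm name and key length here are illustrative assumptions.
  static byte[] createSpillKey() {
    try {
      KeyGenerator gen = KeyGenerator.getInstance("HmacSHA1");
      gen.init(128);  // 128-bit key -> 16 bytes of key material
      return gen.generateKey().getEncoded();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The important design point is ordering: whatever generates this key must run before the first sortAndSpill, since the map output buffer wraps its spill streams at flush time.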
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Fix Version/s: 2.8.0

NPE when intermediate encrypt enabled for LocalRunner

Key: MAPREDUCE-6452
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
Fix For: 2.8.0
Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, TestLocalJobSubmission.java

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720443#comment-14720443 ] zhihai xu commented on MAPREDUCE-6452: -- Thanks [~asuresh], [~ajithshetty] and [~mohdshahidkhan] for the review! Thanks [~bibinchundatt] for reporting this issue! Committed it to 2.8.0 and branch-2.

NPE when intermediate encrypt enabled for LocalRunner

Key: MAPREDUCE-6452
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
Fix For: 2.8.0
Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, TestLocalJobSubmission.java

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717953#comment-14717953 ] zhihai xu commented on MAPREDUCE-6452: -- Thanks for the review [~asuresh], [~ajithshetty] and [~mohdshahidkhan]. None of these test failures are related to the patch; all of these tests passed in my local build.
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.091 sec - in org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.402 sec - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Running org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.584 sec - in org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
Results : Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
{code}
If there are no objections, I will commit it tomorrow.

NPE when intermediate encrypt enabled for LocalRunner

Key: MAPREDUCE-6452
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6452
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: zhihai xu
Attachments: MAPREDUCE-6452.002.patch, MAPREDUCE-6452.1.patch, TestLocalJobSubmission.java

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Hadoop Flags: Reviewed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Attachment: (was: MAPREDUCE-6452.000.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Attachment: MAPREDUCE-6452.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707800#comment-14707800 ] zhihai xu commented on MAPREDUCE-6460: -- The failure occurs because the test didn't wait for the app attempt to be unregistered from ApplicationMasterService (ApplicationMasterService#unregisterAttempt). The fix is to wait for the app to enter state {{RMAppState.KILLED}}, which guarantees that {{appAttempt.masterService.unregisterAttempt(appAttemptId)}} has been called. I uploaded the patch MAPREDUCE-6460.000.patch for review. TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails --- Key: MAPREDUCE-6460 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-6460.000.patch TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails with the following logs: --- T E S T S --- Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator) Time elapsed: 2.606 sec FAILURE! 
java.lang.AssertionError: Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Results : Failed tests: TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException Tests run: 24, Failures: 1, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
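The fix described in the comment — waiting for the app to reach {{RMAppState.KILLED}} before asserting — is an instance of the poll-until-state-or-timeout pattern. A minimal, self-contained sketch of that pattern follows; the class and method names here are illustrative, not Hadoop's API (Hadoop's test resource manager exposes a similar waitForState helper).

```java
import java.util.function.BooleanSupplier;

// Sketch of the "wait for state" pattern behind the fix: poll a
// condition until it becomes true or a timeout expires, instead of
// asserting immediately and racing the asynchronous state transition.
public class WaitFor {
  public static boolean waitFor(BooleanSupplier condition,
                                long timeoutMs, long intervalMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;               // e.g. the app reached the KILLED state
      }
      try {
        Thread.sleep(intervalMs);  // back off before polling again
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;              // treat interruption as "state not reached"
      }
    }
    return condition.getAsBoolean(); // one final check at the deadline
  }
}
```

In the test, the analogous wait blocks until unregisterAttempt has run on the RM side, so the subsequent heartbeat reliably triggers the expected RMContainerAllocationException.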
[jira] [Created] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
zhihai xu created MAPREDUCE-6460: Summary: TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails Key: MAPREDUCE-6460 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Component/s: test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Attachment: MAPREDUCE-6460.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Attachment: (was: MAPREDUCE-6460.000.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6460: - Attachment: MAPREDUCE-6460.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6440) Duplicate Key in Json Output for Job details
[ https://issues.apache.org/jira/browse/MAPREDUCE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704395#comment-14704395 ] zhihai xu commented on MAPREDUCE-6440: -- Maybe change the name {{type}} to {{taskType}}, because it comes from {{TaskType.toString()}}. Duplicate Key in Json Output for Job details Key: MAPREDUCE-6440 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6440 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Anushri Assignee: Bibin A Chundatt Priority: Minor Duplicate key in JSON output for job details for the URL: http://jhs_ip:jhs_port/ws/v1/history/mapreduce/jobs/job_id/tasks/task_id/attempts If the task type is REDUCE, the JSON output for this URL contains a duplicate key for {{type}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
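The shape of the reported problem can be sketched as follows. The field names and values below are illustrative, not copied from an actual JobHistoryServer response: the point is that a REDUCE attempt serializes the {{type}} key twice, which is invalid JSON and makes most parsers keep only one of the values.

```json
{
  "taskAttempt": {
    "id": "attempt_1439544207687_0001_r_000000_0",
    "type": "REDUCE",
    "type": "REDUCE",
    "state": "SUCCEEDED"
  }
}
```

Renaming the reduce-specific field to {{taskType}} (as suggested above) would make the two keys distinct and the output well-formed.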
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Attachment: MAPREDUCE-6452.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6452: - Attachment: (was: MAPREDUCE-6452.000.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu reassigned MAPREDUCE-6452:
------------------------------------

    Assignee: zhihai xu  (was: Ajith S)

> NPE when intermediate encrypt enabled for LocalRunner
> -----------------------------------------------------

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6452) NPE when intermediate encrypt enabled for LocalRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702646#comment-14702646 ]

zhihai xu commented on MAPREDUCE-6452:
--------------------------------------

Thanks [~ajithshetty] and [~mohdshahidkhan] for the clarification. I see that the commit "Fixing MR intermediate spills" (https://github.com/apache/hadoop/commit/6b710a42e00acca405e085724c89cda016cf7442) changed
{code}
  private static byte[] getEncryptionKey() throws IOException {
    return TokenCache.getShuffleSecretKey(UserGroupInformation.getCurrentUser()
        .getCredentials());
  }
{code}
to
{code}
  private static byte[] getEncryptionKey() throws IOException {
    return TokenCache.getEncryptedSpillKey(UserGroupInformation.getCurrentUser()
        .getCredentials());
  }
{code}
This change first shipped in the 2.7.1 release. The stack trace in this JIRA must come from 2.7.1 or later, because the frame
{code}
at org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:112)
{code}
does not match the 2.7.0 code base.
> NPE when intermediate encrypt enabled for LocalRunner
> -----------------------------------------------------

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
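The comment above points at the likely failure mode: the key lookup (now `TokenCache.getEncryptedSpillKey`) returns null because `LocalJobRunner` never stores an encrypted spill key in the job credentials, and `CryptoOutputStream` rejects a null key with a NullPointerException. A minimal, simplified sketch of that failure mode follows. This is plain Java, not Hadoop's actual `TokenCache`/`Credentials` classes, and the alias name is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the MAPREDUCE-6452 failure (not Hadoop's real classes):
// the spill-key lookup returns null when nothing populated the credentials,
// and cipher initialization then throws a NullPointerException.
public class SpillKeySketch {
    // Hypothetical alias under which the spill key would be stored.
    static final String SPILL_KEY_ALIAS = "MapReduceEncryptedSpillKey";

    // Stand-in for TokenCache.getEncryptedSpillKey: null when absent.
    static byte[] getEncryptedSpillKey(Map<String, byte[]> credentials) {
        return credentials.get(SPILL_KEY_ALIAS);
    }

    // Stand-in for CryptoOutputStream's constructor, which requires a key.
    static void initCipher(byte[] key) {
        if (key == null) {
            throw new NullPointerException("encryption key is null");
        }
    }

    // Fix direction: populate the spill key at job-submission time,
    // the way the YARN-based path does before tasks start spilling.
    static void setEncryptedSpillKey(Map<String, byte[]> credentials, byte[] key) {
        credentials.put(SPILL_KEY_ALIAS, key);
    }

    public static void main(String[] args) {
        Map<String, byte[]> credentials = new HashMap<>();

        // Without the key: lookup yields null and cipher init fails,
        // mirroring the NPE in the reported stack trace.
        boolean npeSeen = false;
        try {
            initCipher(getEncryptedSpillKey(credentials));
        } catch (NullPointerException e) {
            npeSeen = true;
        }

        // With the key populated up front, initialization succeeds.
        setEncryptedSpillKey(credentials, new byte[] {1, 2, 3});
        initCipher(getEncryptedSpillKey(credentials));

        System.out.println(npeSeen
            ? "NPE without key; ok once key is set"
            : "unexpected: no NPE without key");
    }
}
```

Under these assumptions, a fix would have the local runner generate and store the spill key during job setup, so the later lookup in the spill path never sees null; whether that matches the committed patch should be checked against the attached MAPREDUCE-6452 patches.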