[jira] [Reopened] (MAPREDUCE-6987) JHS Log Scanner and Cleaner blocked
[ https://issues.apache.org/jira/browse/MAPREDUCE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang reopened MAPREDUCE-6987:

> JHS Log Scanner and Cleaner blocked
> ---
>
> Key: MAPREDUCE-6987
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6987
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Affects Versions: 2.9.0, 3.0.0-alpha1
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
>
> {code}
> "Log Scanner/Cleaner #1" #81 prio=5 os_prio=0 tid=0x7fd6c010f000 nid=0x11db waiting on condition [0x7fd6aa859000]
>    java.lang.Thread.State: WAITING (parking)
>    at sun.misc.Unsafe.park(Native Method)
>    - parking to wait for <0xd6c88a80> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>    at org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
>    at org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at java.lang.Thread.run(Thread.java:745)
> "Log Scanner/Cleaner #0" #80 prio=5 os_prio=0 tid=0x7fd6c010c800 nid=0x11da waiting on condition [0x7fd6aa95a000]
>    java.lang.Thread.State: WAITING (parking)
>    at sun.misc.Unsafe.park(Native Method)
>    - parking to wait for <0xd6c8> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>    at org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
>    at org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at java.lang.Thread.run(Thread.java:745)
> {code}
> Both threads waiting on {{FutureTask.get()}} for infinite time after first execution

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
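The thread dump shows both pool workers parked in {{FutureTask.get()}} called from {{afterExecute}}. This follows from a general {{ScheduledThreadPoolExecutor}} property: a periodic {{ScheduledFutureTask}} is reset between runs rather than completing, so an after-execute hook that calls {{get()}} unconditionally parks the worker forever. The sketch below shows the safe pattern of checking {{isDone()}} first; the class and method names are illustrative, not Hadoop's actual fix.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class AfterExecuteGuard {
    // Safe afterExecute-style hook: only inspect the Future when it is
    // already done. A periodic ScheduledFutureTask never reports done
    // between runs, so an unconditional get() would block forever.
    static Throwable extractFailure(Runnable r, Throwable t) {
        if (t != null || !(r instanceof Future<?>)) {
            return t;
        }
        Future<?> f = (Future<?>) r;
        if (!f.isDone()) {
            return null; // periodic task between runs: do not block on get()
        }
        try {
            f.get(); // completed task: get() returns immediately
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the task's own failure, for logging
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        FutureTask<Void> ok = new FutureTask<>(() -> null);
        ok.run();
        System.out.println(extractFailure(ok, null)); // completed cleanly: null

        FutureTask<Void> boom =
            new FutureTask<>(() -> { throw new IllegalStateException("task failed"); });
        boom.run();
        System.out.println(extractFailure(boom, null).getMessage()); // task failed

        FutureTask<Void> pending = new FutureTask<>(() -> null); // never run
        System.out.println(extractFailure(pending, null)); // guard avoids blocking: null
    }
}
```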
[jira] [Resolved] (MAPREDUCE-6987) JHS Log Scanner and Cleaner blocked
[ https://issues.apache.org/jira/browse/MAPREDUCE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6987. Resolution: Duplicate Fix Version/s: (was: 3.1.0) (was: 3.0.0) (was: 2.9.0) > JHS Log Scanner and Cleaner blocked > --- > > Key: MAPREDUCE-6987 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6987 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > {code} > "Log Scanner/Cleaner #1" #81 prio=5 os_prio=0 tid=0x7fd6c010f000 > nid=0x11db waiting on condition [0x7fd6aa859000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xd6c88a80> (a > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) > at java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47) > at > org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > "Log Scanner/Cleaner #0" #80 prio=5 os_prio=0 tid=0x7fd6c010c800 > nid=0x11da waiting on condition [0x7fd6aa95a000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xd6c8> (a > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) > at 
java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47) > at > org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Both threads waiting on {{FutureTask.get()}} for infinite time after first > execution -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193832#comment-16193832 ] Andrew Wang commented on MAPREDUCE-5951: If it's going into 2.9.0, I think it's safe for 3.0.0 too. Please include it in branch-3.0 as well, thanks! > Add support for the YARN Shared Cache > - > > Key: MAPREDUCE-5951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5951-Overview.001.pdf, > MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, > MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, > MAPREDUCE-5951-trunk-020.patch, MAPREDUCE-5951-trunk-021.patch, > MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, > MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, > MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, > MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, > MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, > MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, > MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, > MAPREDUCE-5951-trunk-v9.patch > > > Implement the necessary changes so that the MapReduce application can > leverage the new YARN shared cache (i.e. YARN-1492). > Specifically, allow per-job configuration so that MapReduce jobs can specify > which set of resources they would like to cache (i.e. jobjar, libjars, > archives, files). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
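For reference, the per-job selection described above might look like the following in job configuration. The property name and value set are assumptions inferred from the feature description, not confirmed in this thread:

```xml
<!-- Assumed property name: select which resource types this job
     should publish to / resolve from the YARN shared cache. -->
<property>
  <name>mapreduce.job.sharedcache.mode</name>
  <value>jobjar,libjars,files,archives</value>
</property>
```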
[jira] [Updated] (MAPREDUCE-6925) CLONE - Make Counter limits consistent across JobClient, MRAppMaster, and YarnChild
[ https://issues.apache.org/jira/browse/MAPREDUCE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6925: --- Target Version/s: 2.9.0, 3.0.0 (was: 2.9.0, 3.0.0-beta1) > CLONE - Make Counter limits consistent across JobClient, MRAppMaster, and > YarnChild > --- > > Key: MAPREDUCE-6925 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6925 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, client, task >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > > Currently, counter limits "mapreduce.job.counters.*" handled by > {{org.apache.hadoop.mapreduce.counters.Limits}} are initialized > asymmetrically: on the client side, and on the AM, job.xml is ignored whereas > it's taken into account in YarnChild. > It would be good to make the Limits job-configurable, such that max > counters/groups is only increased when needed. With the current Limits > implementation relying on static constants, it's going to be challenging for > tools that submit jobs concurrently without resorting to class loading > isolation. > The patch that I am uploading is not perfect but demonstrates the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
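The asymmetric, static initialization described above can be illustrated with a toy sketch (not Hadoop's actual {{Limits}} class): once static state is populated from the first configuration seen in a JVM, settings from jobs submitted later in the same JVM are silently ignored, which is exactly what makes concurrent submitters hard to support without classloader isolation.

```java
public class StaticLimits {
    // Toy model of a limit held in static state: whichever job's
    // configuration initializes it first wins for the whole JVM.
    private static int maxCounters = -1;

    static void init(int configuredMax) {
        if (maxCounters < 0) {        // first caller wins; later configs ignored
            maxCounters = configuredMax;
        }
    }

    static int getMaxCounters() {
        return maxCounters;
    }

    public static void main(String[] args) {
        init(120);  // job A configures a 120-counter limit
        init(500);  // job B asks for 500 in the same JVM...
        System.out.println(getMaxCounters()); // ...but still sees 120
    }
}
```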
[jira] [Updated] (MAPREDUCE-6946) Moving logging APIs over to slf4j in hadoop-mapreduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6946: --- Target Version/s: 2.9.0, 3.0.0 (was: 2.9.0, 3.0.0-beta1) > Moving logging APIs over to slf4j in hadoop-mapreduce > - > > Key: MAPREDUCE-6946 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6946 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Akira Ajisaka > > MapReduce side of YARN-6712. This is an umbrella jira for MapReduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6960) Shuffle Handler prints disk error stack traces for every read failure.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6960: --- Fix Version/s: (was: 3.0.0) 3.0.0-beta1 > Shuffle Handler prints disk error stack traces for every read failure. > -- > > Key: MAPREDUCE-6960 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6960 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 2.9.0, 3.0.0-beta1, 3.1.0, 2.8.3 > > Attachments: MAPREDUCE-6960.001.patch > > > {code} > } catch (IOException e) { > LOG.error("Shuffle error :", e); > {code} > In cases where the read from a disk fails and throws a DiskErrorException, > the shuffle handler prints the entire stack trace for each and every one of > the failures causing the nodemanager logs to quickly fill up the disk. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
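One mitigation consistent with the report above: log disk-read failures as a single line and reserve full stack traces for unexpected IO errors, so routine disk errors cannot fill the nodemanager logs. The {{DiskErrorException}} below is a local stand-in for Hadoop's {{DiskChecker.DiskErrorException}} so the sketch stays self-contained.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

public class ShuffleErrorLogging {
    // Local stand-in for Hadoop's DiskChecker.DiskErrorException
    // (an assumption, used only to keep this sketch self-contained).
    static class DiskErrorException extends IOException {
        DiskErrorException(String msg) { super(msg); }
    }

    // Disk errors: one line, message only. Anything else: full stack trace.
    static String format(IOException e) {
        if (e instanceof DiskErrorException) {
            return "Shuffle error: " + e.getMessage();
        }
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        return "Shuffle error :\n" + sw;
    }

    public static void main(String[] args) {
        // common case: a terse one-liner instead of a multi-line trace
        System.out.println(format(new DiskErrorException("read failed on local dir")));
    }
}
```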
[jira] [Updated] (MAPREDUCE-6953) Skip the testcase testJobWithChangePriority if FairScheduler is used
[ https://issues.apache.org/jira/browse/MAPREDUCE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6953: --- Fix Version/s: (was: 3.0.0) 3.0.0-beta1 > Skip the testcase testJobWithChangePriority if FairScheduler is used > > > Key: MAPREDUCE-6953 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6953 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: client >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Fix For: 2.9.0, 3.0.0-beta1, 3.1.0 > > Attachments: MAPREDUCE-6953-001.patch > > > We run the unit tests with Fair Scheduler downstream. FS does not support > priorities at the moment, so TestMRJobs#testJobWithChangePriority fails. > Just add {{Assume.assumeFalse(usingFairScheduler);}} and JUnit will skip the > test. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162092#comment-16162092 ] Andrew Wang commented on MAPREDUCE-6892: Peter, do you mind adding a release note to this JIRA summarizing the impact for our end users? Thanks! > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, > MAPREDUCE-6892-006.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
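As the description notes, the killed-task count is not stored in the jhist completion event but can be derived from the totals. A trivial sketch of that derivation (the parameter names are illustrative, not the event schema):

```java
public class TaskCounts {
    // killed = scheduled - succeeded - failed, per the report's observation
    // that the completion event records only the successful counts.
    static int killed(int total, int succeeded, int failed) {
        return total - succeeded - failed;
    }

    public static void main(String[] args) {
        // 100 maps scheduled, 90 succeeded, 4 failed -> 6 were killed
        System.out.println(killed(100, 90, 4));
    }
}
```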
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161990#comment-16161990 ] Andrew Wang commented on MAPREDUCE-6870: Hi Erik, do you mind adding a release note summarizing the incompatibility? Would be nice for our end users. > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Resolved] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6941. Resolution: Not A Problem I'm going to close this based on Ray's analysis. Junping, if you disagree, please re-open the JIRA. > The default setting doesn't work for MapReduce job > -- > > Key: MAPREDUCE-6941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Junping Du >Priority: Blocker > > On the deployment of hadoop 3 cluster (based on current trunk branch) with > default settings, the MR job will get failed as following exceptions: > {noformat} > 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 > running in uber mode : false > 2017-08-16 13:00:03,847 INFO mapreduce.Job: map 0% reduce 0% > 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed > with state FAILED due to: Application application_1502913552390_0001 failed 2 > times due to AM Container for appattempt_1502913552390_0001_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from > container-launch. 
> Container id: container_1502913552390_0001_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:994) > at org.apache.hadoop.util.Shell.run(Shell.java:887) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is because mapreduce related jar are not added into yarn setup by > default. To make MR job run successful, we need to add following > configurations to yarn-site.xml now: > {noformat} > > yarn.application.classpath > > ... > /share/hadoop/mapreduce/*, > /share/hadoop/mapreduce/lib/* > ... > > {noformat} > But this config is not necessary for previous version of Hadoop. We should > fix this issue before beta release otherwise it will be a regression for > configuration changes. > This could be more like a YARN issue (if so, we should move), depends on how > we fix it finally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
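The {{noformat}} block in the description above appears to have lost its XML tags in transit. The intended yarn-site.xml entry presumably had the usual property shape (a reconstruction; the elided "..." placeholders and path prefixes are left exactly as given):

```xml
<property>
  <name>yarn.application.classpath</name>
  <value>
    ...
    /share/hadoop/mapreduce/*,
    /share/hadoop/mapreduce/lib/*
    ...
  </value>
</property>
```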
[jira] [Commented] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146431#comment-16146431 ] Andrew Wang commented on MAPREDUCE-6941: [~djp] is Ray's explanation satisfactory? Wondering if we can close this, it's one of two unassigned blockers right now. > The default setting doesn't work for MapReduce job > -- > > Key: MAPREDUCE-6941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Junping Du >Priority: Blocker > > On the deployment of hadoop 3 cluster (based on current trunk branch) with > default settings, the MR job will get failed as following exceptions: > {noformat} > 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 > running in uber mode : false > 2017-08-16 13:00:03,847 INFO mapreduce.Job: map 0% reduce 0% > 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed > with state FAILED due to: Application application_1502913552390_0001 failed 2 > times due to AM Container for appattempt_1502913552390_0001_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from > container-launch. 
> Container id: container_1502913552390_0001_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:994) > at org.apache.hadoop.util.Shell.run(Shell.java:887) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is because mapreduce related jar are not added into yarn setup by > default. To make MR job run successful, we need to add following > configurations to yarn-site.xml now: > {noformat} > > yarn.application.classpath > > ... > /share/hadoop/mapreduce/*, > /share/hadoop/mapreduce/lib/* > ... > > {noformat} > But this config is not necessary for previous version of Hadoop. We should > fix this issue before beta release otherwise it will be a regression for > configuration changes. > This could be more like a YARN issue (if so, we should move), depends on how > we fix it finally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142217#comment-16142217 ] Andrew Wang commented on MAPREDUCE-6941: Thanks Ray. Should we just close this then? Or are the docs still lacking in some way? > The default setting doesn't work for MapReduce job > -- > > Key: MAPREDUCE-6941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Junping Du >Priority: Blocker > > On the deployment of hadoop 3 cluster (based on current trunk branch) with > default settings, the MR job will get failed as following exceptions: > {noformat} > 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 > running in uber mode : false > 2017-08-16 13:00:03,847 INFO mapreduce.Job: map 0% reduce 0% > 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed > with state FAILED due to: Application application_1502913552390_0001 failed 2 > times due to AM Container for appattempt_1502913552390_0001_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from > container-launch. 
> Container id: container_1502913552390_0001_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:994) > at org.apache.hadoop.util.Shell.run(Shell.java:887) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is because mapreduce related jar are not added into yarn setup by > default. To make MR job run successful, we need to add following > configurations to yarn-site.xml now: > {noformat} > > yarn.application.classpath > > ... > /share/hadoop/mapreduce/*, > /share/hadoop/mapreduce/lib/* > ... > > {noformat} > But this config is not necessary for previous version of Hadoop. We should > fix this issue before beta release otherwise it will be a regression for > configuration changes. > This could be more like a YARN issue (if so, we should move), depends on how > we fix it finally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6901) Remove @deprecated tags from DistributedCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113648#comment-16113648 ] Andrew Wang commented on MAPREDUCE-6901: A little ping since this is marked as critical and the patch looks ready for review. [~jlowe] or [~rkanter]? > Remove @deprecated tags from DistributedCache > - > > Key: MAPREDUCE-6901 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6901 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: distributed-cache >Affects Versions: 3.0.0-alpha3 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Critical > Attachments: MAPREDUCE-6901.001.patch > > > Doing this as part of Hadoop 3 cleanup. > DistributedCache has been marked as deprecated forever to the point where the > change that did it isn't in Git. > I don't really have a preference for whether we remove it or not, but I'd > like to have a discussion and have it properly documented as a release note > for Hadoop 3 before we hit final release. At the very least we can have a > Release Note that will sum up whatever discussion we have here. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108021#comment-16108021 ] Andrew Wang commented on MAPREDUCE-6288: Thanks for handling the reverts Junping. I filed and linked MAPREDUCE-6924 so the reverts show up in the beta1 changelog, since I believe they were included in the alpha releases. > mapred job -status fails with AccessControlException > - > > Key: MAPREDUCE-6288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Priority: Blocker > Attachments: MAPREDUCE-6288.002.patch, MAPREDUCE-6288-gera-001.patch, > MAPREDUCE-6288.patch > > > After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred > job -status job_1427080398288_0001}} > {noformat} > Exception in thread "main" org.apache.hadoop.security.AccessControlException: > Permission denied: user=jenkins, access=EXECUTE, > inode="/user/history/done":mapred:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) > at
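For readers puzzling over the trace: HDFS applies POSIX-style permission checks, and reading the history file requires EXECUTE on every ancestor directory along the path. With {{/user/history/done}} owned by {{mapred:hadoop}} at {{drwxrwx---}} (0770), a user such as {{jenkins}} who is neither the owner nor in the {{hadoop}} group fails the traverse check. A rough sketch of the per-directory rule (not the NameNode's actual code):

```java
import java.util.List;

public class TraverseCheck {
    // POSIX-style EXECUTE check for one directory: owner bit if the caller
    // owns it, group bit if the caller is in the directory's group,
    // otherwise the "other" bit. Traversal requires this on every ancestor.
    static boolean canExecute(String user, List<String> groups,
                              String owner, String group, int mode) {
        if (user.equals(owner))     return (mode & 0100) != 0; // owner x bit
        if (groups.contains(group)) return (mode & 0010) != 0; // group x bit
        return (mode & 0001) != 0;                             // other x bit
    }

    public static void main(String[] args) {
        // /user/history/done is mapred:hadoop drwxrwx--- (0770); "jenkins"
        // is neither the owner nor in "hadoop", so traversal is denied,
        // matching the AccessControlException above.
        System.out.println(canExecute("jenkins", List.of("jenkins"),
                                      "mapred", "hadoop", 0770)); // false
    }
}
```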
[jira] [Created] (MAPREDUCE-6924) Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875
Andrew Wang created MAPREDUCE-6924:
--

             Summary: Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875
                 Key: MAPREDUCE-6924
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6924
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha1
            Reporter: Andrew Wang
            Assignee: Junping Du

Filing this JIRA so the reverts show up in the changelog.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Resolved] (MAPREDUCE-6924) Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875
[ https://issues.apache.org/jira/browse/MAPREDUCE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6924. Resolution: Fixed Fix Version/s: 3.0.0-beta1 Resolving this changelog tracking JIRA. Thanks to [~djp] for doing the reverts! > Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875 > --- > > Key: MAPREDUCE-6924 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6924 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Junping Du > Fix For: 3.0.0-beta1 > > > Filing this JIRA so the reverts show up in the changelog. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6734) Add option to distcp to preserve file path structure of source files at the destination
[ https://issues.apache.org/jira/browse/MAPREDUCE-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6734: --- Fix Version/s: (was: 3.0.0-alpha4) > Add option to distcp to preserve file path structure of source files at the > destination > --- > > Key: MAPREDUCE-6734 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6734 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 3.0.0-alpha2 > Environment: Software platform >Reporter: Frederick Tucker > Labels: distcp, newbie, patch > Attachments: MAPREDUCE-6734.3.0.0-alpha2.patch, > MAPREDUCE-6734.3.0.0-alpha2.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When copying files using distcp with globbed source files, all the matched > files in the glob are copied in a single flat directory. This causes > problems when the file structure at the source is important. It also is an > issue when there are two files matched in the glob with the same name because > it causes a duplicate file error at the target. I'd like to have an option > to preserve the file structure of the source files when globbing inputs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
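The flattening problem described in the issue above can be sketched outside distcp. The class and method names below are illustrative, not the API any eventual patch would add:

```java
/** Models how globbed distcp sources map to target names. */
public class GlobMapping {

    /** Current behavior: only the file name survives, so distinct sources can collide. */
    static String flatTarget(String dest, String source) {
        String name = source.substring(source.lastIndexOf('/') + 1);
        return dest + "/" + name;
    }

    /** Proposed option: keep the path relative to the glob's root directory. */
    static String preservedTarget(String dest, String sourceRoot, String source) {
        return dest + source.substring(sourceRoot.length());
    }

    public static void main(String[] args) {
        // Two globbed sources with the same file name collide under the flat
        // mapping (the "duplicate file error at the target"), but stay distinct
        // when the source structure is preserved.
        String a = "/logs/2017/01/part-0000";
        String b = "/logs/2017/02/part-0000";
        System.out.println(flatTarget("/backup", a).equals(flatTarget("/backup", b)));
        System.out.println(preservedTarget("/backup", "/logs", a));
    }
}
```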
[jira] [Updated] (MAPREDUCE-6697) Concurrent task limits should only be applied when necessary
[ https://issues.apache.org/jira/browse/MAPREDUCE-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6697: --- Resolution: Fixed Status: Resolved (was: Patch Available) I'm going to close this so I can roll a release, please re-open if you need a Jenkins run after. > Concurrent task limits should only be applied when necessary > > > Key: MAPREDUCE-6697 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6697 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Nathan Roberts > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: MAPREDUCE-6697-v1.patch > > > The concurrent task limit feature should only adjust the ANY portion of the > AM heartbeat ask when a limit is truly necessary, otherwise extraneous > containers could be allocated by the RM to the AM adding some overhead to > both. Specifying a concurrent task limit that is beyond the total number of > tasks in the job should be the same as asking for no limit. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
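The rule in the description above — a concurrent task limit at or beyond the job's total task count should behave like no limit at all — can be sketched as follows; the method name is illustrative, not the one used in the patch:

```java
/** Models when a concurrent task limit should actually shape the AM's ANY ask. */
public class TaskLimit {

    /**
     * Returns the cap to apply to the AM heartbeat's ANY resource ask.
     * A non-positive limit, or one at or beyond the total number of tasks,
     * is treated as "no limit", so the RM is not handed an extraneous
     * constraint that could cause it to allocate unwanted containers.
     */
    static int effectiveLimit(int configuredLimit, int totalTasks) {
        if (configuredLimit <= 0 || configuredLimit >= totalTasks) {
            return totalTasks; // no real limit: ask for everything outstanding
        }
        return configuredLimit;
    }
}
```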
[jira] [Updated] (MAPREDUCE-6829) Add peak memory usage counter for each task
[ https://issues.apache.org/jira/browse/MAPREDUCE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6829: --- Fix Version/s: 3.0.0-alpha3 > Add peak memory usage counter for each task > --- > > Key: MAPREDUCE-6829 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6829 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Reporter: Yufei Gu >Assignee: Miklos Szegedi > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: MAPREDUCE-6829.000.patch, MAPREDUCE-6829.001.patch, > MAPREDUCE-6829.002.patch, MAPREDUCE-6829.003.patch, MAPREDUCE-6829.004.patch, > MAPREDUCE-6829.005.patch > > > Each task has counters PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES, which > are snapshots of memory usage of that task. They are not sufficient for users > to understand peak memory usage by that task, e.g. in order to diagnose task > failures, tune job parameters or change application design. This new feature > will add two more counters for each task: PHYSICAL_MEMORY_BYTES_MAX and > VIRTUAL_MEMORY_BYTES_MAX. > This JIRA covers the same feature as MAPREDUCE-4710. I filed this new YARN > JIRA since MAPREDUCE-4710 is a pretty old one from the MR 1.x era; it more or > less assumes a branch-1 architecture and should be closed at this point.
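Since the existing counters are point-in-time snapshots, the proposed `*_MAX` counters reduce to a running maximum over every snapshot taken during the task's life. A minimal sketch, with a hypothetical tracker class (the counter names come from the issue description):

```java
/**
 * Models the proposed PHYSICAL_MEMORY_BYTES_MAX / VIRTUAL_MEMORY_BYTES_MAX
 * counters: a peak is the running maximum over periodic memory samples.
 */
public class PeakMemory {
    private long physicalMax;
    private long virtualMax;

    /** Called each time the task's memory usage is sampled. */
    void snapshot(long physicalBytes, long virtualBytes) {
        physicalMax = Math.max(physicalMax, physicalBytes);
        virtualMax = Math.max(virtualMax, virtualBytes);
    }

    long physicalMemoryBytesMax() { return physicalMax; }
    long virtualMemoryBytesMax()  { return virtualMax; }
}
```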
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950343#comment-15950343 ] Andrew Wang commented on MAPREDUCE-6288: Pinging this JIRA as it's still marked as a blocker for 3.x and unassigned. Is anyone planning on picking it up? > mapred job -status fails with AccessControlException > - > > Key: MAPREDUCE-6288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Priority: Blocker > Attachments: MAPREDUCE-6288.002.patch, MAPREDUCE-6288-gera-001.patch, > MAPREDUCE-6288.patch > > > After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred > job -status job_1427080398288_0001}} > {noformat} > Exception in thread "main" org.apache.hadoop.security.AccessControlException: > Permission denied: user=jenkins, access=EXECUTE, > inode="/user/history/done":mapred:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) > at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:257) > at
[jira] [Updated] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6873: --- Fix Version/s: (was: 3.0.0-beta1) 3.0.0-alpha3 > MR Job Submission Fails if MR framework application path not on defaultFS > - > > Key: MAPREDUCE-6873 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6873 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.6.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Minor > Fix For: 2.9.0, 2.7.4, 2.8.1, 3.0.0-alpha3 > > Attachments: MAPREDUCE-6873.000.patch > > > {{JobSubmitter#addMRFrameworkPathToDistributedCache()}} assumes that > {{mapreduce.framework.application.path}} has a FS which matches > {{fs.defaultFS}} which may not always be true. This is just a consequence of > using {{FileSystem.get(Configuration)}} instead of {{FileSystem.get(URI, > Configuration)}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
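The distinction the issue above draws — `FileSystem.get(Configuration)` versus `FileSystem.get(URI, Configuration)` — amounts to resolving the framework path against its own URI scheme rather than assuming `fs.defaultFS`. A self-contained model of that resolution, using a hypothetical helper (no Hadoop dependency):

```java
import java.net.URI;

/**
 * Models the fix: the filesystem for mapreduce.application.framework.path
 * must come from the path's own URI, falling back to fs.defaultFS only when
 * the path carries no scheme — the equivalent of calling
 * FileSystem.get(uri, conf) instead of FileSystem.get(conf).
 */
public class FrameworkPathFs {
    static String schemeFor(String frameworkPath, String defaultFsScheme) {
        String scheme = URI.create(frameworkPath).getScheme();
        return scheme != null ? scheme : defaultFsScheme;
    }
}
```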
[jira] [Updated] (MAPREDUCE-6101) on job submission, if input or output directories are encrypted, shuffle data should be encrypted at rest
[ https://issues.apache.org/jira/browse/MAPREDUCE-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6101: --- Status: Open (was: Patch Available) > on job submission, if input or output directories are encrypted, shuffle data > should be encrypted at rest > - > > Key: MAPREDUCE-6101 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6101 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission, security >Affects Versions: 2.6.0 >Reporter: Alejandro Abdelnur >Assignee: Arun Suresh > Attachments: MAPREDUCE-6101.1.patch, MAPREDUCE-6101.2.patch > > > Currently, shuffle data at-rest encryption has to be set explicitly to work. > If not set explicitly (ON or OFF) but the input or output HDFS > directories of the job are in an encryption zone, we should set it to ON.
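The tri-state behavior proposed above — explicit setting wins, otherwise infer from the encryption zones — can be sketched as follows. The method name is hypothetical; `null` stands in for "not configured":

```java
/**
 * Models the proposed default for shuffle data-at-rest encryption:
 * an explicit ON/OFF setting always wins; when the setting is absent,
 * encryption is turned on if the job's input or output HDFS directories
 * sit in an encryption zone.
 */
public class ShufflePolicy {

    /** explicitSetting == null means the user did not configure it. */
    static boolean encryptShuffle(Boolean explicitSetting,
                                  boolean inputEncrypted,
                                  boolean outputEncrypted) {
        if (explicitSetting != null) {
            return explicitSetting; // user's choice is never overridden
        }
        return inputEncrypted || outputEncrypted;
    }
}
```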
[jira] [Updated] (MAPREDUCE-6854) Each map task should create a unique temporary name that includes an object name
[ https://issues.apache.org/jira/browse/MAPREDUCE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6854: --- Target Version/s: 3.0.0-alpha3 (was: 3.0.0-alpha2) > Each map task should create a unique temporary name that includes an object > name > > > Key: MAPREDUCE-6854 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6854 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 3.0.0-alpha2 >Reporter: Gil Vernik > Labels: patch > Attachments: HADOOP-6854-001.patch, HADOOP-6854-002.patch > > > Consider an example: a local file "/data/a.txt" needs to be copied into > swift://container.service/data/a.txt > The way distcp works is that first it will upload "/data/a.txt" into > swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_00_0 > Upon completion distcp will move > swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_00_0 > into swift://container.mil01/data/a.txt > > The temporary file naming convention assumes that each map task will > sequentially create objects as swift://container.mil01/.distcp.tmp.attempt_ID > and then rename them to the final names. Most Hadoop ecosystem components > use object.name, which is part of the temporary name; however, distcp > doesn't use such an approach. > This JIRA proposes to add a configuration key indicating that temporary > objects will also include the object name as part of their temporary file > name. For example, "/data/a.txt" will be uploaded into > "swift://container.mil01/data/a.txt.distcp.tmp.attempt_local2036034928_0001_m_00_0" > "a.txt.distcp.tmp.attempt_local2036034928_0001_m_00_0" doesn't affect > flows in the access drivers, since "a.txt" is not considered a > sub-directory, so no special operations will be taken. > The benefits of the patch: > 1. Temp object names will be better distributed in object stores, since they > all have different prefixes. > 2. 
Sometimes it's not possible to debug what data was copied and what failed. > Sometimes temp files are not renamed; it would be much better if, given the > expected temp name, one could figure out which object names were copied. > 3. Different systems may expect > "a.txt.distcp.tmp.attempt_local2036034928_0001_m_00_0" and extract the value > before "distcp.tmp", thus getting the destination object name.
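The proposed convention, and the suffix-stripping that point 3 relies on, can be modeled directly. The marker string follows the `.distcp.tmp.` pattern shown in the description; the attempt id in the test is a made-up example:

```java
/**
 * Models the proposed temp-name convention: keep the destination object
 * name as a prefix of the temporary name, so a stranded temp object
 * reveals what it was meant to become.
 */
public class DistcpTempName {
    static final String MARKER = ".distcp.tmp.";

    /** "a.txt" + attempt id -> "a.txt.distcp.tmp.attempt_..." */
    static String tempName(String objectName, String attemptId) {
        return objectName + MARKER + attemptId;
    }

    /** Recover the destination name by cutting everything from the marker on. */
    static String destinationName(String tempName) {
        int i = tempName.indexOf(MARKER);
        return i < 0 ? tempName : tempName.substring(0, i);
    }
}
```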
[jira] [Updated] (MAPREDUCE-6857) Reduce number of exists() calls on the target object
[ https://issues.apache.org/jira/browse/MAPREDUCE-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6857: --- Target Version/s: 3.0.0-alpha3 (was: 3.0.0-alpha2) Fix Version/s: (was: 3.0.0-alpha2) > Reduce number of exists() calls on the target object > > > Key: MAPREDUCE-6857 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6857 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 3.0.0-alpha2 >Reporter: Gil Vernik > Attachments: HADOOP-6857-002.patch > > > CopyMapper.map(..) calls targetStatus = targetFS.getFileStatus(target). > A few steps later, RetriableFileCopyCommand.promoteTmpToTarget(..) will call > exists(target) again and delete the target if present. > The second exists() is useless: targetStatus already shows whether the target > is present (targetStatus==null means it is absent), so whether the target > object needs deleting can be decided from it. > The purpose of this patch is to delete the target object by using > targetStatus and thus avoid calling the exists() method.
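The optimization amounts to reusing the status the mapper already fetched. A sketch of that decision, with Hadoop's `FileStatus` stood in for by a plain nullable `Object` (method name hypothetical):

```java
/**
 * Models the proposed change in promoteTmpToTarget: the mapper already
 * fetched the target's status once, so the decision to delete an existing
 * target before the rename can be made from that cached value instead of
 * a second exists() round trip against the object store.
 */
public class PromoteDecision {

    /** True when an existing target must be removed before the rename. */
    static boolean deleteBeforeRename(Object cachedTargetStatus) {
        // cachedTargetStatus == null already proves the target is absent;
        // no further exists() call is needed either way.
        return cachedTargetStatus != null;
    }
}
```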
[jira] [Updated] (MAPREDUCE-6854) Each map task should create a unique temporary name that includes an object name
[ https://issues.apache.org/jira/browse/MAPREDUCE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6854: --- Fix Version/s: (was: 3.0.0-alpha2) > Each map task should create a unique temporary name that includes an object > name > > > Key: MAPREDUCE-6854 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6854 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 3.0.0-alpha2 >Reporter: Gil Vernik > Labels: patch > Attachments: HADOOP-6854-001.patch, HADOOP-6854-002.patch > > > Consider an example: a local file "/data/a.txt" needs to be copied into > swift://container.service/data/a.txt > The way distcp works is that first it will upload "/data/a.txt" into > swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_00_0 > Upon completion distcp will move > swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_00_0 > into swift://container.mil01/data/a.txt > > The temporary file naming convention assumes that each map task will > sequentially create objects as swift://container.mil01/.distcp.tmp.attempt_ID > and then rename them to the final names. Most Hadoop ecosystem components > use object.name, which is part of the temporary name; however, distcp > doesn't use such an approach. > This JIRA proposes to add a configuration key indicating that temporary > objects will also include the object name as part of their temporary file > name. For example, "/data/a.txt" will be uploaded into > "swift://container.mil01/data/a.txt.distcp.tmp.attempt_local2036034928_0001_m_00_0" > "a.txt.distcp.tmp.attempt_local2036034928_0001_m_00_0" doesn't affect > flows in the access drivers, since "a.txt" is not considered a > sub-directory, so no special operations will be taken. > The benefits of the patch: > 1. Temp object names will be better distributed in object stores, since they > all have different prefixes. > 2. 
Sometimes it's not possible to debug what data was copied and what failed. > Sometimes temp files are not renamed; it would be much better if, given the > expected temp name, one could figure out which object names were copied. > 3. Different systems may expect > "a.txt.distcp.tmp.attempt_local2036034928_0001_m_00_0" and extract the value > before "distcp.tmp", thus getting the destination object name.
[jira] [Updated] (MAPREDUCE-6734) Add option to distcp to preserve file path structure of source files at the destination
[ https://issues.apache.org/jira/browse/MAPREDUCE-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6734: --- Fix Version/s: (was: 3.0.0-alpha2) 3.0.0-alpha3 > Add option to distcp to preserve file path structure of source files at the > destination > --- > > Key: MAPREDUCE-6734 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6734 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 3.0.0-alpha2 > Environment: Software platform >Reporter: Frederick Tucker > Labels: distcp, newbie, patch > Fix For: 3.0.0-alpha3 > > Attachments: MAPREDUCE-6734.3.0.0-alpha2.patch, > MAPREDUCE-6734.3.0.0-alpha2.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When copying files using distcp with globbed source files, all the matched > files in the glob are copied in a single flat directory. This causes > problems when the file structure at the source is important. It also is an > issue when there are two files matched in the glob with the same name because > it causes a duplicate file error at the target. I'd like to have an option > to preserve the file structure of the source files when globbing inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6728: --- Resolution: Fixed Status: Resolved (was: Patch Available) Resolving this so it gets picked up in the 3.0.0-alpha2 release notes. Please reopen if/when you need a branch-2 precommit run. > Give fetchers hint when ShuffleHandler rejects a shuffling connection > - > > Key: MAPREDUCE-6728 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, > mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, > mapreduce6728.006.patch, MAPREDUCE-6728-branch-2.8.06.patch, > mapreduce6728.branch-2.8.patch, mapreduce6728.prelim.patch > > > If the # of open shuffle connections to a node goes over the max, ShuffleHandler > closes the connection immediately without giving fetchers any hint of the > reason, which causes fetchers to fail with exceptions > java.net.SocketException: Unexpected end of file from server > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266) > at > 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) > OR > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:196) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java > Such failures are counted as fetcher failures -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
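One plausible shape for the "hint" this issue asks for — an explicit status the fetcher can distinguish from a genuine network failure before the connection is closed — can be sketched as below. The status code and the method are illustrative only; the actual mechanism in the committed patch may well differ:

```java
/**
 * Models the idea of giving fetchers a hint: instead of dropping the
 * socket (which surfaces as "Unexpected end of file from server" or
 * "Connection reset" in the fetcher), a rejected shuffle request gets
 * an explicit status line it can react to, e.g. by backing off and
 * retrying rather than counting the attempt as a fetch failure.
 */
public class ShuffleReject {
    static String responseFor(int openConnections, int maxConnections) {
        if (openConnections >= maxConnections) {
            // An explicit signal the fetcher can tell apart from a real failure.
            return "HTTP/1.1 429 Too Many Requests";
        }
        return "HTTP/1.1 200 OK";
    }
}
```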
[jira] [Updated] (MAPREDUCE-6791) remove unnecessary dependency from hadoop-mapreduce-client-jobclient to hadoop-mapreduce-client-shuffle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6791: --- Release Note: An unnecessary dependency on hadoop-mapreduce-client-shuffle in hadoop-mapreduce-client-jobclient has been removed. > remove unnecessary dependency from hadoop-mapreduce-client-jobclient to > hadoop-mapreduce-client-shuffle > --- > > Key: MAPREDUCE-6791 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6791 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Minor > Labels: Incompatible > Fix For: 3.0.0-alpha2 > > Attachments: mapreduce6791.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6704) Container fail to launch for mapred application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752741#comment-15752741 ] Andrew Wang commented on MAPREDUCE-6704: Ping, are we getting close to resolving this JIRA? > Container fail to launch for mapred application > --- > > Key: MAPREDUCE-6704 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch, > ClusterSetup.html, MAPREDUCE-6704.0002.patch, MR-6704-branch2.8.tar.gz, > MR-6704-trunk-tempPatch.tar.gz, MR-6704-trunk.tar.gz, SingleCluster.html, > container-whitelist-env-wip.patch, temp.patch > > > Container fails to launch for mapred application. > As part of the launch script, the {{HADOOP_MAPRED_HOME}} default value is not > set. After > https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c, > {{HADOOP_MAPRED_HOME}} cannot be obtained from {{builder.environment()}} > since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false. > {noformat} > 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with > state FAILED due to: Application application_1462155939310_0004 failed 2 > times due to AM Container for appattempt_1462155939310_0004_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: Exception from container-launch. 
> Container id: container_1462155939310_0004_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:946) > at org.apache.hadoop.util.Shell.run(Shell.java:850) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Container exited with a non-zero exit code 1. Last 4096 bytes of stderr : > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; > support was removed in 8.0 > Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > Container exited with a non-zero exit code 1. Last 4096 bytes of stderr : > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; > support was removed in 8.0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6288: --- Target Version/s: 2.8.0, 3.0.0-beta1 (was: 2.8.0, 3.0.0-alpha2) I'm going to retarget this for 3.0.0-beta1 rather than 3.0.0-alpha2, since it won't be a regression from alpha1. Is someone actively working on this JIRA? If not, I'd like to revert this code out of trunk and branch-2 too rather than kicking the can down the road every release. > mapred job -status fails with AccessControlException > - > > Key: MAPREDUCE-6288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Blocker > Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, > MAPREDUCE-6288.patch > > > After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred > job -status job_1427080398288_0001}} > {noformat} > Exception in thread "main" org.apache.hadoop.security.AccessControlException: > Permission denied: user=jenkins, access=EXECUTE, > inode="/user/history/done":mapred:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) > at >
[jira] [Updated] (MAPREDUCE-6565) Configuration to use host name in delegation token service is not read from job.xml during MapReduce job execution.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6565: --- Fix Version/s: 3.0.0-alpha2 > Configuration to use host name in delegation token service is not read from > job.xml during MapReduce job execution. > --- > > Key: MAPREDUCE-6565 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6565 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chris Nauroth >Assignee: Li Lu > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: MAPREDUCE-6565-trunk.001.patch > > > By default, the service field of a delegation token is populated based on > server IP address. Setting {{hadoop.security.token.service.use_ip}} to > {{false}} changes this behavior to use host name instead of IP address. > However, this configuration property is not read from job.xml. Instead, it's > read from a separate {{Configuration}} instance created during static > initialization of {{SecurityUtil}}. This does not work correctly with > MapReduce jobs if the framework is distributed by setting > {{mapreduce.application.framework.path}} and the > {{mapreduce.application.classpath}} is isolated to avoid reading > core-site.xml from the cluster nodes. MapReduce tasks will fail to > authenticate to HDFS, because they'll try to find a delegation token based on > the NameNode IP address, even though at job submission time the tokens were > generated using the host name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
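The failure mode described above can be sketched without any Hadoop code: a class that snapshots configuration in a static initializer (as {{SecurityUtil}} does) is frozen at class-load time, so a per-job override loaded afterwards from job.xml is never observed. The class and map below are simplified stand-ins for illustration only, not the actual Hadoop implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Hadoop code) of the static-initialization pitfall:
// a SecurityUtil-style class that reads configuration once, in a static
// initializer, never observes per-job overrides loaded later from job.xml.
public class StaticConfigPitfall {

    // Stand-in for the cluster-wide defaults (core-site.xml).
    static final Map<String, String> SITE_DEFAULTS = new HashMap<>();

    static class SecurityUtilLike {
        // Snapshot taken when the class is first loaded; frozen thereafter.
        static final String USE_IP = SITE_DEFAULTS.getOrDefault(
                "hadoop.security.token.service.use_ip", "true");
    }

    public static String effectiveUseIp(Map<String, String> jobConf) {
        // The job.xml override is present in jobConf, but the static field
        // above was already frozen and ignores it.
        return SecurityUtilLike.USE_IP;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("hadoop.security.token.service.use_ip", "false");
        // Despite the job-level override, the frozen default wins:
        System.out.println(effectiveUseIp(jobConf)); // prints "true"
    }
}
```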
[jira] [Updated] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6288: --- Target Version/s: 2.8.0, 3.0.0-alpha2 (was: 2.8.0) > mapred job -status fails with AccessControlException > - > > Key: MAPREDUCE-6288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Blocker > Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, > MAPREDUCE-6288.patch > > > After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred > job -status job_1427080398288_0001}} > {noformat} > Exception in thread "main" org.apache.hadoop.security.AccessControlException: > Permission denied: user=jenkins, access=EXECUTE, > inode="/user/history/done":mapred:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) > at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:257) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1490) > at >
[jira] [Updated] (MAPREDUCE-6682) TestMRCJCFileOutputCommitter fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6682: --- Fix Version/s: (was: 3.0.0-alpha2) 3.0.0-alpha1 > TestMRCJCFileOutputCommitter fails intermittently > - > > Key: MAPREDUCE-6682 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6682 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Akira Ajisaka > Fix For: 3.0.0-alpha1 > > Attachments: MAPREDUCE-6682.00.patch, MAPREDUCE-6682.01.patch, > MAPREDUCE-6682.02.patch, MAPREDUCE-6682.03.patch, MAPREDUCE-6682.04.patch > > > {noformat} > java.lang.AssertionError: Output directory not empty expected:<0> but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.mapred.TestMRCJCFileOutputCommitter.testAbort(TestMRCJCFileOutputCommitter.java:153) > {noformat} > *PreCommit Report* > https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6434/testReport/
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668967#comment-15668967 ] Andrew Wang commented on MAPREDUCE-6288: Hi folks, anything more to be said about this JIRA? It's marked as a blocker, and there's been no action for over a year. > mapred job -status fails with AccessControlException > - > > Key: MAPREDUCE-6288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Blocker > Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, > MAPREDUCE-6288.patch > > > After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred > job -status job_1427080398288_0001}} > {noformat} > Exception in thread "main" org.apache.hadoop.security.AccessControlException: > Permission denied: user=jenkins, access=EXECUTE, > inode="/user/history/done":mapred:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) > at > org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at > org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) > at > org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) > at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:257) > at
[jira] [Updated] (MAPREDUCE-6467) Submitting streaming job is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6467: --- Target Version/s: 3.0.0-alpha2 (was: 3.0.0-alpha1) > Submitting streaming job is not thread safe > --- > > Key: MAPREDUCE-6467 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6467 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: job submission >Affects Versions: 2.7.1 >Reporter: jeremie simon >Assignee: Ivo Udelsmann >Priority: Minor > Labels: easyfix, streaming, thread-safety > Attachments: MAPREDUCE-6467.001.patch > > > The submission of the streaming job is not thread safe. > That is because the class StreamJob is using the OptionBuilder which is > itself not thread safe. > This can cause super tricky bugs. > An easy fix would be to simply create instances of Option through the normal > constructor and decorate the object if necessary. > This fix should be applied on the functions createOption and > createBoolOption.
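A minimal sketch of why builder classes backed by static mutable state (the pattern {{OptionBuilder}} uses) break under concurrent job submission, alongside the instance-based alternative the description suggests. The class and method names below are hypothetical stand-ins, not the Commons CLI API.

```java
// Illustrative sketch (hypothetical names, not Commons CLI) of a builder
// backed by static mutable state: two threads calling createOption() can
// interleave writes to the shared fields before create() reads them back.
public class StaticBuilderHazard {

    // Static mutable state shared by every caller -- the unsafe pattern.
    private static String argName;
    private static boolean hasArg;

    static String createOption(String name, boolean withArg) {
        argName = name;   // another thread may overwrite this...
        hasArg = withArg; // ...before create() below reads it back
        return create();
    }

    private static String create() {
        return argName + (hasArg ? "=<arg>" : "");
    }

    // The thread-safe alternative suggested in the issue: build each option
    // from plain local state, with nothing shared between callers.
    static String createOptionSafely(String name, boolean withArg) {
        return name + (withArg ? "=<arg>" : "");
    }

    public static void main(String[] args) {
        System.out.println(createOption("input", true));    // "input=<arg>"
        System.out.println(createOptionSafely("output", false)); // "output"
    }
}
```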
[jira] [Commented] (MAPREDUCE-6704) Container fail to launch for mapred application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596201#comment-15596201 ] Andrew Wang commented on MAPREDUCE-6704: Folks, is there any progress we can make on this JIRA? That this doesn't work out of the box anymore has been very surprising to our users. I'd like to get it fixed for alpha2 if possible. > Container fail to launch for mapred application > --- > > Key: MAPREDUCE-6704 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch > > > Containers fail to launch for MapReduce applications. > As part of the launch script, the {{HADOOP_MAPRED_HOME}} default value is not set. > After > https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c > {{HADOOP_MAPRED_HOME}} can no longer be obtained from {{builder.environment()}} > since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false. > {noformat} > 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with > state FAILED due to: Application application_1462155939310_0004 failed 2 > times due to AM Container for appattempt_1462155939310_0004_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1462155939310_0004_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:946) > at org.apache.hadoop.util.Shell.run(Shell.java:850) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Container exited with a non-zero exit code 1. Last 4096 bytes of stderr : > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; > support was removed in 8.0 > Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > Container exited with a non-zero exit code 1. Last 4096 bytes of stderr : > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; > support was removed in 8.0 > {noformat}
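The inheritance behaviour at the heart of this bug can be illustrated with the plain JDK {{ProcessBuilder}}: its environment map starts as a copy of the parent's, but once cleared (the effect of constructing the command executor with inherit set to false), variables such as {{HADOOP_MAPRED_HOME}} are gone unless the launcher re-populates them explicitly. This is an illustrative sketch, not the NodeManager code path.

```java
import java.util.Map;

// Minimal sketch (plain JDK, no YARN code) of the environment-inheritance
// issue: a ProcessBuilder's environment is seeded from the parent process,
// but clearing it -- the inherit=false analogue -- drops every variable,
// so the child would launch without HADOOP_MAPRED_HOME unless it is re-set.
public class EnvInheritSketch {

    static boolean childSeesVar(boolean inherit, String key) {
        ProcessBuilder pb = new ProcessBuilder("true");
        Map<String, String> env = pb.environment(); // copy of the parent env
        env.put(key, "/opt/hadoop");                // parent-provided setting
        if (!inherit) {
            env.clear(); // inherit=false analogue: everything is dropped
        }
        return env.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(childSeesVar(true, "HADOOP_MAPRED_HOME"));  // true
        System.out.println(childSeesVar(false, "HADOOP_MAPRED_HOME")); // false
    }
}
```

Note the processes are never started; only the environment map handling is exercised.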
[jira] [Commented] (MAPREDUCE-6536) hadoop-pipes doesn't use maven properties for openssl
[ https://issues.apache.org/jira/browse/MAPREDUCE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583966#comment-15583966 ] Andrew Wang commented on MAPREDUCE-6536: Ping on this JIRA. Looks like it's pretty close, should we target for alpha2? > hadoop-pipes doesn't use maven properties for openssl > - > > Key: MAPREDUCE-6536 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6536 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: pipes >Affects Versions: 3.0.0-alpha1 > Environment: OS X >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Attachments: HADOOP-12518.00.patch, HADOOP-12518.01.patch, > HADOOP-12518.02.patch, HADOOP-12518.03.patch, MAPREDUCE-6536.04.patch > > > hadoop-common has some maven properties that are used to define where OpenSSL > lives. hadoop-pipes should also use them so we can enable automated testing.
[jira] [Resolved] (MAPREDUCE-5506) Hadoop-1.1.1 occurs ArrayIndexOutOfBoundsException with MultithreadedMapRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-5506. Resolution: Won't Fix Resolving as WONTFIX since mr1 has been removed. > Hadoop-1.1.1 occurs ArrayIndexOutOfBoundsException with MultithreadedMapRunner > -- > > Key: MAPREDUCE-5506 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5506 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: 1.1.1 > Environment: RHEL 6.3 x86_64 >Reporter: sam liu >Priority: Blocker > > After I set: > - 'jobConf.setMapRunnerClass(MultithreadedMapRunner.class);' in MR app > - 'mapred.map.multithreadedrunner.threads = 2' in mapred-site.xml > A simple MR app failed as its Map task encountered > ArrayIndexOutOfBoundsException as below(please ignore the line numbers in the > exception as I added some log print codes): > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1331) > at java.io.DataOutputStream.write(DataOutputStream.java:101) > at org.apache.hadoop.io.Text.write(Text.java:282) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1060) > at > org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:591) > at study.hadoop.mapreduce.sample.WordCount$Map.map(WordCount.java:41) > at study.hadoop.mapreduce.sample.WordCount$Map.map(WordCount.java:1) > at > org.apache.hadoop.mapred.lib.MultithreadedMapRunner$MapperInvokeRunable.run(MultithreadedMapRunner.java:231) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919) > at 
java.lang.Thread.run(Thread.java:738) > And the exception happens on line 'System.arraycopy(b, off, kvbuffer, > bufindex, len)' in MapTask.java#MapOutputBuffer#Buffer#write(). When the > exception occurs, 'b.length=4' but 'len=9'. > Btw, if I set 'mapred.map.multithreadedrunner.threads = 1', no exception > happened. So it should be an issue caused by multiple threads.
[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6792: --- Target Version/s: 2.9.0, 3.0.0-alpha2 (was: 2.9.0, 3.0.0-alpha1) > Allow user's full principal name as owner of MapReduce staging directory in > JobSubmissionFiles#JobStagingDir() > -- > > Key: MAPREDUCE-6792 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6792 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Reporter: Santhosh G Nayak >Assignee: Santhosh G Nayak > Attachments: MAPREDUCE-6792.1.patch > > > Background - > Currently, {{JobSubmissionFiles#JobStagingDir()}} assumes that file owner > returned as part of {{FileSystem#getFileStatus()}} is always user's short > principal name, which is true for HDFS. But, some file systems which are HDFS > compatible like [Azure Data Lake Store (ADLS) > |https://azure.microsoft.com/en-in/services/data-lake-store/] and work in > multi tenant environment can have users with same names belonging to > different domains. For example, {{us...@company1.com}} and > {{us...@company2.com}}. It will be ambiguous, if > {{FileSystem#getFileStatus()}} returns only the user's short principal name > (without domain name) as the owner of the file/directory. > The following code block allows only short user principal name as owner. It > simply fails saying that ownership on the staging directory is not as > expected, if owner returned by the {{FileStatus#getOwner()}} is not equal to > short principal name of the current user. 
> {code} > String realUser; > String currentUser; > UserGroupInformation ugi = UserGroupInformation.getLoginUser(); > realUser = ugi.getShortUserName(); > currentUser = UserGroupInformation.getCurrentUser().getShortUserName(); > if (fs.exists(stagingArea)) { > FileStatus fsStatus = fs.getFileStatus(stagingArea); > String owner = fsStatus.getOwner(); > if (!(owner.equals(currentUser) || owner.equals(realUser))) { > throw new IOException("The ownership on the staging directory " + > stagingArea + " is not as expected. " + > "It is owned by " + owner + ". The directory must " + > "be owned by the submitter " + currentUser + " or " + > "by " + realUser); > } > {code} > The proposal is to remove the strict restriction on short principal name by > allowing the user's full principal name as owner of staging area directory in > {{JobSubmissionFiles#JobStagingDir()}}.
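The proposed relaxation boils down to one extra comparison in the ownership check. The following is a hedged sketch of that logic in isolation; the class and method names are invented for illustration and are not taken from the attached patch.

```java
// Hedged sketch of the proposal above (hypothetical names, not the patch):
// accept the staging-directory owner if it matches either the short user
// name or the full Kerberos principal, instead of the short name alone.
public class StagingOwnerCheck {

    static boolean ownerAcceptable(String owner,
                                   String shortUserName,
                                   String fullPrincipalName) {
        return owner.equals(shortUserName) || owner.equals(fullPrincipalName);
    }

    public static void main(String[] args) {
        // HDFS-style short-name owner still passes...
        System.out.println(
            ownerAcceptable("alice", "alice", "alice@EXAMPLE.COM"));
        // ...and an ADLS-style full-principal owner is no longer rejected.
        System.out.println(
            ownerAcceptable("alice@EXAMPLE.COM", "alice", "alice@EXAMPLE.COM"));
        // A different user is still refused.
        System.out.println(
            ownerAcceptable("bob", "alice", "alice@EXAMPLE.COM"));
    }
}
```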
[jira] [Updated] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells
[ https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6458: --- Target Version/s: 3.0.0-alpha2 (was: 3.0.0-alpha1) > Figure out the way to pass build-in classpath (files in distributed cache, > etc.) from parent to spawned shells > -- > > Key: MAPREDUCE-6458 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Dustin Cote > Attachments: MAPREDUCE-6458.00.patch > > > In MAPREDUCE-6454 (target for branch-2.x), we provide a way with constraints > to pass built-in classpath from parent to child shell, via HADOOP_CLASSPATH, > so jars in distributed cache can still work in child tasks. In trunk, we may > think some way different, like: involve additional env var to safely pass > build-in classpath.
[jira] [Updated] (MAPREDUCE-6729) Accurately compute the test execute time in DFSIO
[ https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6729: --- Flags: (was: Important) > Accurately compute the test execute time in DFSIO > - > > Key: MAPREDUCE-6729 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: benchmarks, performance, test >Affects Versions: 2.9.0 >Reporter: mingleizhang >Assignee: mingleizhang >Priority: Minor > Labels: performance, test > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: MAPREDUCE-6729.001.patch, MAPREDUCE-6729.002.patch > > > When using DFSIO as a distributed I/O benchmark, writing many files to disk > or reading them back can produce imprecise results. The current code deletes > the old output files inside the timed section before running a job, which > adds extra time to the measurement and skews the reported execution time and > throughput when there are many files. We should restructure this so the > cleanup is excluded from the measured time.
> {code} > public static void testWrite() throws Exception { > FileSystem fs = cluster.getFileSystem(); > long tStart = System.currentTimeMillis(); > bench.writeTest(fs); // this line of code will cause extra time > consumption because of fs.delete(*,*) by the writeTest method > long execTime = System.currentTimeMillis() - tStart; > bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime); > } > private void writeTest(FileSystem fs) throws IOException { > Path writeDir = getWriteDir(config); > fs.delete(getDataDir(config), true); > fs.delete(writeDir, true); > runIOTest(WriteMapper.class, writeDir); > } > {code}  > [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]
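The fix direction amounts to moving the delete out of the timed window. A self-contained sketch with a fake clock (the stub methods stand in for {{fs.delete}} and {{runIOTest}}, and the millisecond costs are arbitrary) shows the measured time then covering only the benchmark I/O:

```java
// Hedged sketch (stubbed, no Hadoop dependency) of the restructuring the
// issue asks for: do the deletes before taking tStart, so the reported
// execTime covers only the actual benchmark I/O, not the cleanup.
public class DfsioTimingSketch {

    static long now = 0; // fake clock, in milliseconds

    static void deleteOutputDirs() { now += 40;  } // stand-in for fs.delete
    static void writeFiles()       { now += 100; } // stand-in for runIOTest

    static long timedWriteExcludingCleanup() {
        deleteOutputDirs();   // cleanup moved outside the measured window
        long tStart = now;    // System.currentTimeMillis() stand-in
        writeFiles();
        return now - tStart;  // pure I/O time: the 40 ms of cleanup is excluded
    }

    public static void main(String[] args) {
        System.out.println(timedWriteExcludingCleanup()); // prints 100
    }
}
```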
[jira] [Updated] (MAPREDUCE-6701) application master log can not be available when clicking jobhistory's am logs link
[ https://issues.apache.org/jira/browse/MAPREDUCE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6701: --- Flags: Patch (was: Patch,Important) > application master log can not be available when clicking jobhistory's am > logs link > --- > > Key: MAPREDUCE-6701 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6701 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 2.9.0 >Reporter: chenyukang >Assignee: Haibo Chen >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha1 > > Attachments: yarn5041.001.patch, yarn5041.002.patch > > > In the history server webapp, the application master logs link is wrong. It shows "No > logs available for container container_1462419429440_0003_01_01". It > directs to the wrong nodemanager HTTP port instead of the node manager's > container management port. I think YARN-4701 introduced this bug
[jira] [Commented] (MAPREDUCE-6714) Refactor UncompressedSplitLineReader.fillBuffer()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447501#comment-15447501 ] Andrew Wang commented on MAPREDUCE-6714: FYI for git greppers, this was typo'd as MAPREDUCE-6741 in the message. > Refactor UncompressedSplitLineReader.fillBuffer() > - > > Key: MAPREDUCE-6714 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6714 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6714.001.patch > > > MAPREDUCE-6635 made this change: > {code} > - maxBytesToRead = Math.min(maxBytesToRead, > -(int)(splitLength - totalBytesRead)); > + long leftBytesForSplit = splitLength - totalBytesRead; > + // check if leftBytesForSplit exceed Integer.MAX_VALUE > + if (leftBytesForSplit <= Integer.MAX_VALUE) { > +maxBytesToRead = Math.min(maxBytesToRead, (int)leftBytesForSplit); > + } > {code} > The result is one more comparison than necessary and code that's a little > convoluted. The code can be simplified as: > {code} > long leftBytesForSplit = splitLength - totalBytesRead; > if (leftBytesForSplit < maxBytesToRead) { > maxBytesToRead = (int)leftBytesForSplit; > } > {code} > The comparison will auto promote {{maxBytesToRead}}, making it safe.
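The auto-promotion claim is easy to verify in isolation: in {{leftBytesForSplit < maxBytesToRead}} the {{int}} operand undergoes binary numeric promotion to {{long}}, so no overflow is possible, and the cast only narrows a value already known to fit in an {{int}}. A self-contained check of the simplified form (the wrapper method below is invented for the demonstration):

```java
// Standalone check of the promotion argument: the simplified comparison is
// overflow-safe even when the remaining split bytes exceed Integer.MAX_VALUE,
// because the int operand is promoted to long before comparing.
public class PromotionCheck {

    static int clampBytesToRead(int maxBytesToRead, long splitLength,
                                long totalBytesRead) {
        long leftBytesForSplit = splitLength - totalBytesRead;
        if (leftBytesForSplit < maxBytesToRead) { // int promoted to long here
            maxBytesToRead = (int) leftBytesForSplit; // value fits: safe cast
        }
        return maxBytesToRead;
    }

    public static void main(String[] args) {
        // Remaining bytes far larger than an int: the cap is left untouched.
        System.out.println(
            clampBytesToRead(65536, 5L * Integer.MAX_VALUE, 0)); // 65536
        // Near the end of the split: only the remaining bytes are read.
        System.out.println(clampBytesToRead(65536, 1000, 990));  // 10
    }
}
```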
[jira] [Reopened] (MAPREDUCE-6462) JobHistoryServer to support JvmPauseMonitor as a service
[ https://issues.apache.org/jira/browse/MAPREDUCE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reopened MAPREDUCE-6462: > JobHistoryServer to support JvmPauseMonitor as a service > > > Key: MAPREDUCE-6462 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6462 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Minor > Attachments: 0001-MAPREDUCE-6462.patch, 0002-MAPREDUCE-6462.patch, > HADOOP-12321-003.patch, HADOOP-12321-005-aggregated.patch > > > As JvmPauseMonitor is made as an AbstractService, subsequent method changes > are needed in all places that use the monitor.
[jira] [Resolved] (MAPREDUCE-6462) JobHistoryServer to support JvmPauseMonitor as a service
[ https://issues.apache.org/jira/browse/MAPREDUCE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6462. Resolution: Duplicate Fix Version/s: (was: 2.9.0) > JobHistoryServer to support JvmPauseMonitor as a service > > > Key: MAPREDUCE-6462 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6462 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Minor > Attachments: 0001-MAPREDUCE-6462.patch, 0002-MAPREDUCE-6462.patch, > HADOOP-12321-003.patch, HADOOP-12321-005-aggregated.patch > > > As JvmPauseMonitor is made as an AbstractService, subsequent method changes > are needed in all places that use the monitor.
[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447492#comment-15447492 ]

Andrew Wang commented on MAPREDUCE-6454:
----------------------------------------

Hi folks, I noticed this JIRA is not present in trunk, though Vinod's comment says:

{quote}
Committed this to trunk, branch-2, 2.7 and 2.6. Thanks Junping.
{quote}

Based on the above discussion, I think this was only intended for branch-2 and thus git is correct, but I would appreciate clarification.

> MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-6454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Fix For: 2.7.2, 2.6.2
>
> Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch,
> MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch
>
> We already set lib jars on distributed-cache to CLASSPATH. However, in some
> corner cases (like: MR local mode, Hive Map side local join, etc.), we need
> these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching
> runjar process.
[jira] [Updated] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6458:
-----------------------------------
    Target Version/s: 3.0.0-alpha1 (was: )

> Figure out the way to pass build-in classpath (files in distributed cache,
> etc.) from parent to spawned shells
> --------------------------------------------------------------------------
>
> Key: MAPREDUCE-6458
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Dustin Cote
> Attachments: MAPREDUCE-6458.00.patch
>
> In MAPREDUCE-6454 (target for branch-2.x), we provide a way with constraints
> to pass built-in classpath from parent to child shell, via HADOOP_CLASSPATH,
> so jars in distributed cache can still work in child tasks. In trunk, we may
> do something different, like introducing an additional env var to safely pass
> the built-in classpath.
[jira] [Updated] (MAPREDUCE-6704) Container fail to launch for mapred application
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6704:
-----------------------------------
    Target Version/s: 3.0.0-alpha2 (was: 3.0.0-alpha1)

> Container fail to launch for mapred application
> -----------------------------------------------
>
> Key: MAPREDUCE-6704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Blocker
> Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch
>
> Container fail to launch for mapred application.
> As part of the launch script, the {{HADOOP_MAPRED_HOME}} default value is not set.
> After
> https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c
> {{HADOOP_MAPRED_HOME}} can no longer be read from {{builder.environment()}}
> since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false.
> {noformat}
> 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with
> state FAILED due to: Application application_1462155939310_0004 failed 2
> times due to AM Container for appattempt_1462155939310_0004_02 exited
> with exitCode: 1
> Failing this attempt. Diagnostics: Exception from container-launch.
> Container id: container_1462155939310_0004_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
> at org.apache.hadoop.util.Shell.run(Shell.java:850)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier;
> support was removed in 8.0
> Error: Could not find or load main class
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier;
> support was removed in 8.0
> {noformat}
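The root cause described in MAPREDUCE-6704 above (inherit set to false in {{DefaultContainerExecutor#buildCommandExecutor}}) can be illustrated with plain java.lang.ProcessBuilder. This is a hedged, self-contained sketch, not Hadoop code; the variable value is an assumption:

```java
import java.util.Map;

public class EnvInheritDemo {
    public static void main(String[] args) {
        ProcessBuilder pb = new ProcessBuilder("env");
        Map<String, String> env = pb.environment();
        env.clear();  // equivalent of "inherit = false": child sees none of the parent env
        // Without the put() below, the launched container would not see the
        // variable at all, which is why the launch script cannot resolve it:
        env.put("HADOOP_MAPRED_HOME", "/opt/hadoop");  // illustrative value
        System.out.println(env.get("HADOOP_MAPRED_HOME")); // prints /opt/hadoop
    }
}
```

In other words, once the environment map is cleared, every variable the child needs (such as HADOOP_MAPRED_HOME) must be copied in explicitly.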
[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-4683:
-----------------------------------
    Target Version/s: 3.0.0-alpha2 (was: 3.0.0-alpha1)

> We need to fix our build to create/distribute
> hadoop-mapreduce-client-core-tests.jar
> ---------------------------------------------
>
> Key: MAPREDUCE-4683
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: build
> Reporter: Arun C Murthy
> Assignee: Akira Ajisaka
> Priority: Critical
> Attachments: MAPREDUCE-4683.patch
>
> We need to fix our build to create/distribute
> hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253
[jira] [Updated] (MAPREDUCE-4522) DBOutputFormat Times out on large batch inserts
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-4522:
-----------------------------------
    Fix Version/s: (was: 3.0.0-alpha1)

> DBOutputFormat Times out on large batch inserts
> -----------------------------------------------
>
> Key: MAPREDUCE-4522
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4522
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task-controller
> Affects Versions: 0.20.205.0
> Reporter: Nathan Jarus
> Assignee: Shyam Gavulla
> Labels: newbie
> Attachments: MAPREDUCE-4522.001.patch
>
> In DBRecordWriter#close(), progress is never updated. In large batch inserts,
> this can cause the reduce task to time out due to the amount of time it takes
> the SQL engine to process that insert.
> Potential solutions I can see:
> * Don't batch inserts; do the insert when DBRecordWriter#write() is called
> (awful)
> * Spin up a thread in DBRecordWriter#close() and update progress in that.
> (gross)
> I can provide code for either if you're interested.
[jira] [Updated] (MAPREDUCE-6313) Audit/optimize tests in hadoop-mapreduce-client-jobclient
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6313:
-----------------------------------
    Fix Version/s: (was: 3.0.0-alpha1)

> Audit/optimize tests in hadoop-mapreduce-client-jobclient
> ---------------------------------------------------------
>
> Key: MAPREDUCE-6313
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6313
> Project: Hadoop Map/Reduce
> Issue Type: Test
> Reporter: Allen Wittenauer
> Assignee: nijel
> Labels: newbie
>
> The tests in this package take an extremely long time to run, with some tests
> taking 15-20 minutes on their own. It would be worthwhile to verify and
> optimize any tests in this package in order to reduce patch testing time or
> perhaps even splitting the package up.
[jira] [Updated] (MAPREDUCE-6274) [Rumen] Support compact property description in configuration XML
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6274:
-----------------------------------
    Fix Version/s: (was: 3.0.0-alpha1)

> [Rumen] Support compact property description in configuration XML
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6274
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6274
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tools/rumen
> Reporter: Kengo Seki
> Assignee: Shen Yinjie
> Labels: newbie, rumen
>
> HADOOP-6964 made it possible to define configuration properties using XML
> attributes, but Rumen has its own configuration parsers and they don't
> recognize XML attributes. So it would be better to support the new
> description.
> We can simply apply the same modification as HADOOP-6964 to Rumen, but it
> might be worth considering making the parse function common (also with a
> part of o.a.h.conf.Configuration.loadResource(), if possible), because Rumen
> has similar code in JobConfigurationParser and ParsedConfigFile.
[jira] [Updated] (MAPREDUCE-4695) Fix LocalRunner on trunk after MAPREDUCE-3223 broke it
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-4695:
-----------------------------------
    Component/s: test

> Fix LocalRunner on trunk after MAPREDUCE-3223 broke it
> ------------------------------------------------------
>
> Key: MAPREDUCE-4695
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4695
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0-alpha1
> Reporter: Harsh J
> Assignee: Harsh J
> Priority: Blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: MAPREDUCE-4695.patch, MAPREDUCE-4695.patch
>
> MAPREDUCE-3223 removed mapreduce.cluster.local.dir property from
> mapred-default.xml (since NM local dirs are now used) but failed to counter
> that LocalJobRunner, etc. still use it.
> {code}
> mr-3223.txt:- mapreduce.cluster.local.dir
> mr-3223.txt-- ${hadoop.tmp.dir}/mapred/local
> {code}
> All local job tests have been failing since then.
> This JIRA is to reintroduce it or provide an equivalent new config for fixing
> it.
[jira] [Updated] (MAPREDUCE-3149) add a test to verify that buildDTAuthority works for cases with no authority.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-3149:
-----------------------------------
    Component/s: test

> add a test to verify that buildDTAuthority works for cases with no authority.
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-3149
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3149
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0-alpha
> Reporter: John George
> Assignee: John George
> Fix For: 3.0.0-alpha1
>
> Attachments: HADOOP-7602.patch
>
> Add a test to verify that buildDTAuthority works for cases with no Authority.
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-2632:
-----------------------------------
    Target Version/s: (was: )
    Release Note: A partitioner is now only created if there are multiple reducers.

I added a release note based on my understanding of this patch, please update if something's off.

> Avoid calling the partitioner when the numReduceTasks is 1.
> -----------------------------------------------------------
>
> Key: MAPREDUCE-2632
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.23.0
> Reporter: Ravi Teja Ch N V
> Assignee: Sunil G
> Fix For: 3.0.0-alpha1
>
> Attachments: 0001-MAPREDUCE-2632.patch, MAPREDUCE-2632-1.patch,
> MAPREDUCE-2632.patch, mr-2632-2.patch, mr-2632-3.patch, mr-2632-4.patch
>
> We can avoid the call to the partitioner when the number of reducers is 1.
> This will avoid the unnecessary computations by the partitioner.
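For context on the release-noted behavior of MAPREDUCE-2632 above: with a single reducer every record lands in partition 0, so the partitioner need not be consulted per record. A simplified illustration of that short-circuit (an assumption-laden sketch, not the actual MapTask code):

```java
public class SinglePartitionDemo {
    // Minimal stand-in for Hadoop's Partitioner interface.
    interface Partitioner<K> {
        int getPartition(K key, int numPartitions);
    }

    // Skip the partitioner entirely when there is at most one reducer:
    // 0 for a single reducer, -1 for a map-only job (no reducers).
    static <K> int partitionFor(K key, int numReduceTasks, Partitioner<K> p) {
        if (numReduceTasks <= 1) {
            return numReduceTasks - 1; // partitioner never invoked
        }
        return p.getPartition(key, numReduceTasks);
    }

    public static void main(String[] args) {
        Partitioner<String> hash =
            (k, n) -> (k.hashCode() & Integer.MAX_VALUE) % n;
        System.out.println(partitionFor("a", 1, hash)); // prints 0
        System.out.println(partitionFor("a", 4, hash)); // prints 1 ("a".hashCode() % 4)
    }
}
```

The saving is per record: a user-supplied partitioner can be arbitrarily expensive, and with one reducer its answer is always the same.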
[jira] [Updated] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6223:
-----------------------------------
    Target Version/s: (was: )
    Hadoop Flags: Reviewed (was: Incompatible change,Reviewed)

> TestJobConf#testNegativeValueForTaskVmem failures
> -------------------------------------------------
>
> Key: MAPREDUCE-6223
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0-alpha1
> Reporter: Gera Shegalov
> Assignee: Varun Saxena
> Fix For: 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch,
> MAPREDUCE-6223.003.patch, MAPREDUCE-6223.004.patch, MAPREDUCE-6223.005.patch,
> MAPREDUCE-6223.006.patch
>
> {code}
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<<
> FAILURE! - in org.apache.hadoop.conf.TestJobConf
> testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time
> elapsed: 0.089 sec <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<-1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
> {code}
[jira] [Updated] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6628:
-----------------------------------
    Affects Version/s: 2.6.4
    Target Version/s: 2.8.0

> Potential memory leak in CryptoOutputStream
> -------------------------------------------
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: security
> Affects Versions: 2.6.4
> Reporter: Mariappan Asokan
> Assignee: Mariappan Asokan
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch,
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch
>
> There is a potential memory leak in {{CryptoOutputStream.java}}. It
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that get
> freed when the {{close()}} method is called. Most of the time, {{close()}}
> is called. However, when writing to an intermediate Map output file or
> the spill files in {{MapTask}}, {{close()}} is never called, since doing so
> would close the underlying stream, which is not desirable. There is a single
> underlying physical stream that contains multiple logical streams, one per
> partition of Map output.
> By default the amount of memory allocated per byte buffer is 128 KB and so
> the total memory allocated is 256 KB. This may not sound like much. However,
> if the number of partitions (or number of reducers) is large (in the
> hundreds) and/or there are spill files created in {{MapTask}}, this can grow
> into a few hundred MB.
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal
> garbage-collected heap, and so their impact upon the memory footprint of an
> application might not be obvious. It is therefore recommended that direct
> buffers be allocated primarily for large, long-lived buffers that are subject
> to the underlying system's native I/O operations. In general it is best to
> allocate direct buffers only when they yield a measureable gain in program
> performance.
> {quote}
> It is not clear to me whether there is any benefit of allocating direct byte
> buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU
> overhead in moving data from {{outBuffer}} to a temporary byte array as per
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
>  * If underlying stream supports {@link ByteBuffer} write in future, needs
>  * refine here.
>  */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in
> OS parlance), it is not clear whether it will yield any measurable
> performance gain.
> The fix would be to allocate a ByteBuffer on the heap for inBuffer and wrap a
> byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, the
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} as demanded by the
> {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can
> create a new constructor to {{CryptoOutputStream}} and pass a boolean flag
> {{ownOutputStream}} to indicate whether the underlying stream will be owned
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method
> will close the underlying stream. Otherwise, when {{close()}} is called only
> the direct byte buffers will be freed and the underlying stream will not be
> closed.
> The scope of changes for this fix will be somewhat wider. We need to modify
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}}
> as well to pass the ownership flag mentioned above.
> I can post a patch for either of the above. I welcome any other ideas from
> developers to fix this issue.
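Possible Fix - 2 above can be sketched as a wrapper stream that releases its buffers on {{close()}} but closes the wrapped stream only when it owns it. This is a hypothetical illustration of the proposed ownOutputStream flag, not the actual CryptoOutputStream API; class and field names are assumptions:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class OwnedStream extends FilterOutputStream {
    private final boolean ownOutputStream;
    // Direct buffers like the ones the issue describes (128 KB each).
    private ByteBuffer inBuffer = ByteBuffer.allocateDirect(128 * 1024);
    private ByteBuffer outBuffer = ByteBuffer.allocateDirect(128 * 1024);

    public OwnedStream(OutputStream out, boolean ownOutputStream) {
        super(out);
        this.ownOutputStream = ownOutputStream;
    }

    @Override
    public void close() throws IOException {
        // Always drop the buffer references so the direct memory can be reclaimed.
        inBuffer = null;
        outBuffer = null;
        if (ownOutputStream) {
            super.close(); // close the wrapped stream only when we own it
        }
        // Otherwise the wrapped stream stays open for the next logical
        // stream (e.g. the next partition of Map output).
    }
}
```

With this shape, MapTask-style callers would pass false so per-partition wrappers can be closed (freeing 256 KB each) without closing the shared physical stream.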
[jira] [Commented] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270264#comment-15270264 ]

Andrew Wang commented on MAPREDUCE-6628:
----------------------------------------

Thanks for sticking with this for so long Mariappan. The stream-related changes overall look good to me. One naming nit, could we call the boolean "closeWrapperStream" rather than "ownOutputStream"? I think that's more descriptive. The test should also be JUnit4 rather than JUnit3.

Can someone more familiar with the MR side review the MapTask and unit test changes? It'd also be good to get confirmation about the overall idea from an MR person.

> Potential memory leak in CryptoOutputStream
> -------------------------------------------
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: security
> Reporter: Mariappan Asokan
> Assignee: Mariappan Asokan
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch,
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch
>
> There is a potential memory leak in {{CryptoOutputStream.java}}. It
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that get
> freed when the {{close()}} method is called. Most of the time, {{close()}}
> is called. However, when writing to an intermediate Map output file or
> the spill files in {{MapTask}}, {{close()}} is never called, since doing so
> would close the underlying stream, which is not desirable. There is a single
> underlying physical stream that contains multiple logical streams, one per
> partition of Map output.
> By default the amount of memory allocated per byte buffer is 128 KB and so
> the total memory allocated is 256 KB. This may not sound like much. However,
> if the number of partitions (or number of reducers) is large (in the
> hundreds) and/or there are spill files created in {{MapTask}}, this can grow
> into a few hundred MB.
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal
> garbage-collected heap, and so their impact upon the memory footprint of an
> application might not be obvious. It is therefore recommended that direct
> buffers be allocated primarily for large, long-lived buffers that are subject
> to the underlying system's native I/O operations. In general it is best to
> allocate direct buffers only when they yield a measureable gain in program
> performance.
> {quote}
> It is not clear to me whether there is any benefit of allocating direct byte
> buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU
> overhead in moving data from {{outBuffer}} to a temporary byte array as per
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
>  * If underlying stream supports {@link ByteBuffer} write in future, needs
>  * refine here.
>  */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in
> OS parlance), it is not clear whether it will yield any measurable
> performance gain.
> The fix would be to allocate a ByteBuffer on the heap for inBuffer and wrap a
> byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, the
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} as demanded by the
> {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can
> create a new constructor to {{CryptoOutputStream}} and pass a boolean flag
> {{ownOutputStream}} to indicate whether the underlying stream will be owned
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method
> will close the underlying stream. Otherwise, when {{close()}} is called only
> the direct byte buffers will be freed and the underlying stream will not be
> closed.
> The scope of changes for this fix will be somewhat wider. We need to modify
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}}
> as well to pass the ownership flag mentioned above.
> I can post a patch for either of the above. I welcome any other ideas from
> developers to fix this issue.
[jira] [Commented] (MAPREDUCE-6526) Remove usage of metrics v1 from hadoop-mapreduce
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267721#comment-15267721 ]

Andrew Wang commented on MAPREDUCE-6526:
----------------------------------------

Still LGTM :) +1

> Remove usage of metrics v1 from hadoop-mapreduce
> ------------------------------------------------
>
> Key: MAPREDUCE-6526
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6526
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Akira AJISAKA
> Assignee: Akira AJISAKA
> Priority: Blocker
> Attachments: MAPREDUCE-6526.00.patch, MAPREDUCE-6526.01.patch,
> MAPREDUCE-6526.02.patch, MAPREDUCE-6526.03.patch
>
> LocalJobRunnerMetrics and ShuffleClientMetrics are still using metrics v1. We
> should remove these metrics or rewrite them to use metrics v2.
[jira] [Updated] (MAPREDUCE-6537) Include hadoop-pipes examples in the release tarball
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6537:
-----------------------------------
    Affects Version/s: (was: 3.0.0)
                       2.8.0

> Include hadoop-pipes examples in the release tarball
> ----------------------------------------------------
>
> Key: MAPREDUCE-6537
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6537
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: pipes
> Affects Versions: 2.8.0
> Reporter: Allen Wittenauer
> Assignee: Kai Sasaki
> Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: HADOOP-12381.00.patch
>
> Hadoop pipes examples are built but never packaged.
[jira] [Updated] (MAPREDUCE-6537) Include hadoop-pipes examples in the release tarball
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6537:
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.8.0
    Target Version/s: 2.8.0 (was: 3.0.0)
    Status: Resolved (was: Patch Available)

This affected branch-2 too, so I committed back through branch-2.8. Thanks again Kai for the patch, Allen for finding this issue.

> Include hadoop-pipes examples in the release tarball
> ----------------------------------------------------
>
> Key: MAPREDUCE-6537
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6537
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: pipes
> Affects Versions: 2.8.0
> Reporter: Allen Wittenauer
> Assignee: Kai Sasaki
> Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: HADOOP-12381.00.patch
>
> Hadoop pipes examples are built but never packaged.
[jira] [Updated] (MAPREDUCE-6537) Include hadoop-pipes examples in the release tarball
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated MAPREDUCE-6537:
-----------------------------------
    Summary: Include hadoop-pipes examples in the release tarball (was: hadoop pipes examples aren't in the mvn package tar ball)

> Include hadoop-pipes examples in the release tarball
> ----------------------------------------------------
>
> Key: MAPREDUCE-6537
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6537
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: pipes
> Affects Versions: 3.0.0
> Reporter: Allen Wittenauer
> Assignee: Kai Sasaki
> Priority: Blocker
> Attachments: HADOOP-12381.00.patch
>
> Hadoop pipes examples are built but never packaged.
[jira] [Commented] (MAPREDUCE-6537) hadoop pipes examples aren't in the mvn package tar ball
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267636#comment-15267636 ]

Andrew Wang commented on MAPREDUCE-6537:
----------------------------------------

Tested the before and after, LGTM. Will commit shortly, thanks for the contribution [~lewuathe]!

> hadoop pipes examples aren't in the mvn package tar ball
> --------------------------------------------------------
>
> Key: MAPREDUCE-6537
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6537
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: pipes
> Affects Versions: 3.0.0
> Reporter: Allen Wittenauer
> Assignee: Kai Sasaki
> Priority: Blocker
> Attachments: HADOOP-12381.00.patch
>
> Hadoop pipes examples are built but never packaged.
[jira] [Commented] (MAPREDUCE-6526) Remove usage of metrics v1 from hadoop-mapreduce
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267590#comment-15267590 ]

Andrew Wang commented on MAPREDUCE-6526:
----------------------------------------

+1 LGTM, thanks Akira!

> Remove usage of metrics v1 from hadoop-mapreduce
> ------------------------------------------------
>
> Key: MAPREDUCE-6526
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6526
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Akira AJISAKA
> Assignee: Akira AJISAKA
> Priority: Blocker
> Attachments: MAPREDUCE-6526.00.patch, MAPREDUCE-6526.01.patch,
> MAPREDUCE-6526.02.patch
>
> LocalJobRunnerMetrics and ShuffleClientMetrics are still using metrics v1. We
> should remove these metrics or rewrite them to use metrics v2.
[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176874#comment-15176874 ]

Andrew Wang commented on MAPREDUCE-2632:
----------------------------------------

[~kasha] mind adding some release notes for this change? Doing some 3.0.0-related cleanup.

> Avoid calling the partitioner when the numReduceTasks is 1.
> -----------------------------------------------------------
>
> Key: MAPREDUCE-2632
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.23.0
> Reporter: Ravi Teja Ch N V
> Assignee: Sunil G
> Fix For: 3.0.0
>
> Attachments: 0001-MAPREDUCE-2632.patch, MAPREDUCE-2632-1.patch,
> MAPREDUCE-2632.patch, mr-2632-2.patch, mr-2632-3.patch, mr-2632-4.patch
>
> We can avoid the call to the partitioner when the number of reducers is 1.
> This will avoid the unnecessary computations by the partitioner.
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
    [ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176873#comment-15176873 ]

Andrew Wang commented on MAPREDUCE-6223:
----------------------------------------

Same question here as I just posed on MAPREDUCE-6234, do we need to mark this change as incompatible if it's only present with MAPREDUCE-5785, which is already marked incompatible and only in 3.0.0?

> TestJobConf#testNegativeValueForTaskVmem failures
> -------------------------------------------------
>
> Key: MAPREDUCE-6223
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0
> Reporter: Gera Shegalov
> Assignee: Varun Saxena
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch,
> MAPREDUCE-6223.003.patch, MAPREDUCE-6223.004.patch, MAPREDUCE-6223.005.patch,
> MAPREDUCE-6223.006.patch
>
> {code}
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<<
> FAILURE! - in org.apache.hadoop.conf.TestJobConf
> testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time
> elapsed: 0.089 sec <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<-1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
> {code}
[jira] [Commented] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176871#comment-15176871 ] Andrew Wang commented on MAPREDUCE-6234: Should this change be marked incompatible? Sounds like it's fixing an issue only presented by MAPREDUCE-5785, which is already marked incompatible and only checked into trunk. > TestHighRamJob fails due to the change in MAPREDUCE-5785 > > > Key: MAPREDUCE-6234 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/gridmix, mrv2 >Affects Versions: 3.0.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Fix For: 3.0.0 > > Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, > MAPREDUCE-6234.003.patch > > > TestHighRamJob fails by this. > {code} > --- > T E S T S > --- > Running org.apache.hadoop.mapred.gridmix.TestHighRamJob > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< > FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob > testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) > Time elapsed: 1.102 sec <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<-1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) > at > org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6637) Testcase Failure : TestFileInputFormat.testSplitLocationInfo
[ https://issues.apache.org/jira/browse/MAPREDUCE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155215#comment-15155215 ] Andrew Wang commented on MAPREDUCE-6637: +1 LGTM thanks Brahma! Committing shortly. > Testcase Failure : TestFileInputFormat.testSplitLocationInfo > > > Key: MAPREDUCE-6637 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6637 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: MAPREDUCE-6637.patch > > > Following testcase is failing after HADOOP-12810 > {noformat} > FAILED: org.apache.hadoop.mapred.TestFileInputFormat.testSplitLocationInfo[0] > Error Message: > expected:<2> but was:<1> > Stack Trace: > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestFileInputFormat.testSplitLocationInfo(TestFileInputFormat.java:115) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6637) Testcase Failure : TestFileInputFormat.testSplitLocationInfo
[ https://issues.apache.org/jira/browse/MAPREDUCE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-6637: --- Resolution: Fixed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Pushed to trunk, branch-2, branch-2.8, branch-2.7 for 2.7.3. Thanks for the find and fix, Brahma! > Testcase Failure : TestFileInputFormat.testSplitLocationInfo > > > Key: MAPREDUCE-6637 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6637 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.7.3 > > Attachments: MAPREDUCE-6637.patch > > > Following testcase is failing after HADOOP-12810 > {noformat} > FAILED: org.apache.hadoop.mapred.TestFileInputFormat.testSplitLocationInfo[0] > Error Message: > expected:<2> but was:<1> > Stack Trace: > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestFileInputFormat.testSplitLocationInfo(TestFileInputFormat.java:115) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream
[ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137493#comment-15137493 ] Andrew Wang commented on MAPREDUCE-6628: [~hitliuyi], any thoughts on this one? > Potential memory leak in CryptoOutputStream > --- > > Key: MAPREDUCE-6628 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security >Reporter: Mariappan Asokan >Assignee: Mariappan Asokan > > There is a potential memory leak in {{CryptoOutputStream.java}}. It > allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that get > freed when the {{close()}} method is called. Most of the time, {{close()}} > is called. However, when writing to the intermediate Map output file or > the spill files in {{MapTask}}, {{close()}} is never called, since calling it > would close the underlying stream, which is not desirable. There is a single > underlying physical stream that contains multiple logical streams, one per > partition of Map output. > By default the amount of memory allocated per byte buffer is 128 KB, so > the total memory allocated is 256 KB. This may not sound like much. However, if > the number of partitions (or number of reducers) is large (in the hundreds) > and/or there are spill files created in {{MapTask}}, this can grow into a few > hundred MB. > I can think of two ways to address this issue: > h2. Possible Fix - 1 > According to JDK documentation: > {quote} > The contents of direct buffers may reside outside of the normal > garbage-collected heap, and so their impact upon the memory footprint of an > application might not be obvious. It is therefore recommended that direct > buffers be allocated primarily for large, long-lived buffers that are subject > to the underlying system's native I/O operations. In general it is best to > allocate direct buffers only when they yield a measureable gain in program > performance. 
> {quote} > It is not clear to me whether there is any benefit of allocating direct byte > buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU > overhead in moving data from {{outBuffer}} to a temporary byte array as per > the following code in {{CryptoOutputStream.java}}. > {code} > /* > * If underlying stream supports {@link ByteBuffer} write in future, needs > * refine here. > */ > final byte[] tmp = getTmpBuf(); > outBuffer.get(tmp, 0, len); > out.write(tmp, 0, len); > {code} > Even if the underlying stream supports direct byte buffer IO (or direct IO in > OS parlance), it is not clear whether it will yield any measurable > performance gain. > The fix would be to allocate a ByteBuffer on the heap for inBuffer and wrap a > byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, the > {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} as demanded by the > {{encrypt()}} method in {{Encryptor}}. > h2. Possible Fix - 2 > Assuming that we want to keep the buffers as direct byte buffers, we can > create a new constructor to {{CryptoOutputStream}} and pass a boolean flag > {{ownOutputStream}} to indicate whether the underlying stream will be owned > by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method > will close the underlying stream. Otherwise, when {{close()}} is called only > the direct byte buffers will be freed and the underlying stream will not be > closed. > The scope of changes for this fix will be somewhat wider. We need to modify > {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} > as well to pass the ownership flag mentioned above. > I can post a patch for either of the above. I welcome any other ideas from > developers to fix this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
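For reference, the difference between the current direct allocations and the Fix-1 proposal can be shown with plain `java.nio`. The 128 KB size mirrors the description; the buffer names are illustrative and this is not the actual {{CryptoOutputStream}} code:

```java
import java.nio.ByteBuffer;

public class BufferAllocDemo {
    static final int BUFFER_SIZE = 128 * 1024; // default per-buffer size per the description

    public static void main(String[] args) {
        // Current behavior: native (off-heap) memory, released only when
        // close() runs -- the source of the leak described above when
        // close() is never called.
        ByteBuffer direct = ByteBuffer.allocateDirect(BUFFER_SIZE);

        // Fix-1 proposal: heap-backed buffers. These are reclaimed by the
        // garbage collector like any other object, and wrapping a byte[]
        // keeps satisfying APIs (such as Encryptor.encrypt()) that demand
        // a ByteBuffer.
        ByteBuffer inBuffer = ByteBuffer.allocate(BUFFER_SIZE);
        ByteBuffer outBuffer = ByteBuffer.wrap(new byte[BUFFER_SIZE]);

        System.out.println(direct.isDirect());    // true
        System.out.println(inBuffer.isDirect());  // false
        System.out.println(outBuffer.hasArray()); // true: backing array is accessible
    }
}
```

A heap-backed `outBuffer` also removes the copy into a temporary byte array quoted above, since the backing array can be handed to `out.write()` directly.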
[jira] [Commented] (MAPREDUCE-6455) Unable to use surefire 2.18
[ https://issues.apache.org/jira/browse/MAPREDUCE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717082#comment-14717082 ] Andrew Wang commented on MAPREDUCE-6455: LGTM! Thanks Charlie, I'll revert the original and commit this one. Unable to use surefire 2.18 - Key: MAPREDUCE-6455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6455 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.1 Reporter: Charlie Helin Assignee: Charlie Helin Fix For: 3.0.0 Attachments: mr-6455.1.patch, mr-6455.2.patch, mr-6455.2.patch, mr-6455.3.patch, mr-6455.4.patch There are some compelling features in later versions of surefire which let one exclude/include tests based on the content of a file, re-run test cases, etc. However, Surefire 2.18 also introduced https://issues.apache.org/jira/browse/SUREFIRE-649, which changed the convention of null properties to empty string values (""). This only applies to forked tests such as the MapReduce tests, and causes a couple of them to fail because of functionality that is directly or indirectly dependent on the value being null. One such example is Configuration.substituteVars() and TaskLog.getBaseLogDir(). substituteVars() shows the issue when getProperty returns an empty String, skipping the getRaw(var) expression. One way to work around this could be
{code}
if (val == null || val.isEmpty()) {
  String raw = getRaw(var);
  if (raw != null) {
    // raw contains a value, otherwise default to whatever System.getProperty
    // returned, since it could be an empty string
    val = raw;
  }
}
{code}
Similarly for getBaseLogDir(): when it returns an empty string, the semantics of java.io.File differ depending on whether the parent is null or "". A null value is interpreted as new File(file), whereas "" is interpreted as new File(defaultParent /* "/" */, file). This could simply be addressed with
{code}
static String getBaseLogDir() {
  String logDir = System.getProperty("hadoop.log.dir");
  // there is a difference in how null and "" are treated as a parent
  // directory when creating a file
  return logDir == null || logDir.isEmpty() ? null : logDir;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
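The null-vs-empty parent behavior of java.io.File described above can be demonstrated directly (Unix path separators assumed; the file name is illustrative):

```java
import java.io.File;

public class FileParentDemo {
    public static void main(String[] args) {
        // A null parent behaves as if only the child path was given ...
        File noParent = new File((String) null, "container.log");

        // ... while an empty-string parent is resolved against the
        // filesystem's default parent directory ("/" on Unix), silently
        // producing an absolute path.
        File emptyParent = new File("", "container.log");

        System.out.println(noParent.getPath());    // container.log
        System.out.println(emptyParent.getPath()); // /container.log (on Unix)
    }
}
```

This is exactly why a test that used to see `hadoop.log.dir` as unset (null) breaks when surefire starts handing it an empty string: the resulting paths are relative in one case and rooted at `/` in the other.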
[jira] [Commented] (MAPREDUCE-6455) Unable to use surefire 2.18
[ https://issues.apache.org/jira/browse/MAPREDUCE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715842#comment-14715842 ] Andrew Wang commented on MAPREDUCE-6455: Nice, this pom change fixes it? Only nit is that we should keep test.build.dir with its comment, broken by the reordering, i.e.:
{noformat}
<!-- TODO: all references in testcases should be updated to this default -->
<test.build.dir>${test.build.dir}</test.build.dir>
{noformat}
Unable to use surefire 2.18 - Key: MAPREDUCE-6455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6455 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.1 Reporter: Charlie Helin Assignee: Charlie Helin Fix For: 3.0.0 Attachments: mr-6455.1.patch, mr-6455.2.patch, mr-6455.2.patch, mr-6455.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6455) Unable to use surefire 2.18
[ https://issues.apache.org/jira/browse/MAPREDUCE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709967#comment-14709967 ] Andrew Wang commented on MAPREDUCE-6455: Sorry if I'm missing something, but IIUC that surefire change affects parsing of java system properties when running a test. Why are the fixes happening in Configuration, vs. in a test class or the pom or something? Unable to use surefire 2.18 - Key: MAPREDUCE-6455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6455 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.1 Reporter: Charlie Helin Assignee: Charlie Helin Fix For: 3.0.0 Attachments: mr-6455.1.patch, mr-6455.2.patch, mr-6455.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6455) Unable to use surefire 2.18
[ https://issues.apache.org/jira/browse/MAPREDUCE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710053#comment-14710053 ] Andrew Wang commented on MAPREDUCE-6455: I talked with [~chelin] about this offline, seems pretty complex. Charlie's current thinking is that surefire is somehow passing some properties incorrectly when running in fork mode, leading to some expected variables like hadoop.log.dir being unset, and then us running into this surefire behavior change. That sounds like a more fundamental issue than null vs. {{""}}. The cleaner fix seems like setting these variables properly rather than relying on null/{{""}}/default parsing, and will avoid modifying non-test and non-pom code. Thanks again to [~chelin] for the discussion and working on this issue. Unable to use surefire 2.18 - Key: MAPREDUCE-6455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6455 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.1 Reporter: Charlie Helin Assignee: Charlie Helin Fix For: 3.0.0 Attachments: mr-6455.1.patch, mr-6455.2.patch, mr-6455.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
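Setting the variables explicitly would mean passing concrete values through surefire rather than leaning on null/empty/default parsing. A sketch of what that looks like in a pom; the property value shown is illustrative, not the actual Hadoop pom entry:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <systemPropertyVariables>
      <!-- An explicit value means forked tests never hit the null-vs-""
           ambiguity. The path here is illustrative only. -->
      <hadoop.log.dir>${project.build.directory}/log</hadoop.log.dir>
    </systemPropertyVariables>
  </configuration>
</plugin>
```

With this in place, code like TaskLog.getBaseLogDir() always sees a real directory, regardless of how surefire serializes unset properties.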
[jira] [Resolved] (MAPREDUCE-6171) The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone
[ https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6171. Resolution: Duplicate Fix Version/s: 2.7.0 Duping this to HADOOP-11341 since Dian reports that it fixes this issue. Thanks again Dian/Arun for finding and working on this. The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone --- Key: MAPREDUCE-6171 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Dian Fu Fix For: 2.7.0 The visibilities of the distributed cache files and archives are currently determined by the permissions of these files or archives. The following is the logic of method isPublic() in class ClientDistributedCacheManager:
{code}
static boolean isPublic(Configuration conf, URI uri,
    Map<URI, FileStatus> statCache) throws IOException {
  FileSystem fs = FileSystem.get(uri, conf);
  Path current = new Path(uri.getPath());
  //the leaf level file should be readable by others
  if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
    return false;
  }
  return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
}
{code}
At the NodeManager side, the yarn user is used to download public files, and the user who submits the job is used to download private files. In normal cases, there is no problem with this. However, if the files are located in an encryption zone (HDFS-6134) and the yarn user is disallowed by KMS from fetching the DataEncryptionKey (DEK) of this encryption zone, the download of the file will fail. 
You can reproduce this issue with the following steps (assume you submit the job with user testUser): # create a clean cluster which has the HDFS cryptographic FileSystem feature # create directory /data/ in HDFS and make it an encryption zone with keyName testKey # configure KMS so that only user testUser can decrypt the DEK of key testKey:
{code}
<property>
  <name>key.acl.testKey.DECRYPT_EEK</name>
  <value>testUser</value>
</property>
{code}
# execute job teragen with user testUser:
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar teragen 1 /data/terasort-input"
{code}
# execute job terasort with user testUser:
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar terasort /data/terasort-input /data/terasort-output"
{code}
You will see logs like this at the job submitter's console:
{code}
INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due to: Application application_1416860917658_0002 failed 2 times due to AM Container for appattempt_1416860917658_0002_02 exited with exitCode: -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException: User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name [testKey]!!
{code}
The initial idea to solve this issue is to modify the logic in ClientDistributedCacheManager.isPublic to also consider whether the file is in an encryption zone. If it is in an encryption zone, the file should be considered private. Then at the NodeManager side, the user who submits the job will be used to fetch the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
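The permission walk that isPublic() performs (leaf readable by others, every ancestor executable by others) can be modeled on a local filesystem with java.nio.file. This is a stand-in for the HDFS check, not the Hadoop code, and the fix proposed in the description would add one further condition on top: even a world-readable file is treated as private when it sits inside an encryption zone, so it is downloaded as the submitting user rather than as yarn.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class VisibilityDemo {
    // Mirrors the shape of ClientDistributedCacheManager.isPublic():
    // the file itself must be world-readable, and every ancestor directory
    // world-executable. Local-filesystem stand-in for the HDFS walk.
    static boolean isPublic(Path file) throws IOException {
        if (!hasOther(file, PosixFilePermission.OTHERS_READ)) {
            return false;
        }
        for (Path dir = file.toAbsolutePath().getParent();
             dir != null; dir = dir.getParent()) {
            if (!hasOther(dir, PosixFilePermission.OTHERS_EXECUTE)) {
                return false;
            }
        }
        return true;
    }

    static boolean hasOther(Path p, PosixFilePermission perm) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(p);
        return perms.contains(perm);
    }
}
```

A hypothetical encryption-zone-aware version would simply short-circuit: if the path lies under an encryption zone, return false before the permission checks.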
[jira] [Moved] (MAPREDUCE-6041) Fix TestOptionsParser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang moved HDFS-6872 to MAPREDUCE-6041: -- Component/s: (was: security) (was: namenode) security Fix Version/s: (was: fs-encryption (HADOOP-10150 and HDFS-6134)) fs-encryption Target Version/s: fs-encryption (was: fs-encryption (HADOOP-10150 and HDFS-6134)) Affects Version/s: (was: fs-encryption (HADOOP-10150 and HDFS-6134)) Key: MAPREDUCE-6041 (was: HDFS-6872) Project: Hadoop Map/Reduce (was: Hadoop HDFS) Fix TestOptionsParser - Key: MAPREDUCE-6041 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6041 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption Attachments: HDFS-6872.001.patch Error Message expected:...argetPathExists=true[]} but was:...argetPathExists=true[, preserveRawXattrs=false]} Stacktrace org.junit.ComparisonFailure: expected:...argetPathExists=true[]} but was:...argetPathExists=true[, preserveRawXattrs=false]} at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.tools.TestOptionsParser.testToString(TestOptionsParser.java:361) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6041) Fix TestOptionsParser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104387#comment-14104387 ] Andrew Wang commented on MAPREDUCE-6041: While doing the CHANGES.TXT update, I noticed this was an HDFS JIRA in the MR CHANGES.txt, so I moved this to a MAPREDUCE JIRA. Fix TestOptionsParser - Key: MAPREDUCE-6041 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6041 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Charles Lamb Assignee: Charles Lamb Fix For: fs-encryption Attachments: HDFS-6872.001.patch Error Message expected:...argetPathExists=true[]} but was:...argetPathExists=true[, preserveRawXattrs=false]} Stacktrace org.junit.ComparisonFailure: expected:...argetPathExists=true[]} but was:...argetPathExists=true[, preserveRawXattrs=false]} at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.tools.TestOptionsParser.testToString(TestOptionsParser.java:361) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6040) Automatically use /.reserved/raw when run by the superuser
Andrew Wang created MAPREDUCE-6040: -- Summary: Automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: fs-encryption Reporter: Andrew Wang Assignee: Charles Lamb On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. Naturally, we'd also want a flag to disable this behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-6008) Update distcp docs to include new option that suppresses preservation of RAW.* namespace extended attributes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6008. Resolution: Not a Problem We took care of the docs in parent JIRA, no need for this one. Update distcp docs to include new option that suppresses preservation of RAW.* namespace extended attributes Key: MAPREDUCE-6008 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6008 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Update the docs to include this new option. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089876#comment-14089876 ] Andrew Wang commented on MAPREDUCE-6007: Nice work, this is way simpler. I think we're pretty close. * In the md.vm file, let's scratch the change to the table, I think the section is enough by itself. * Not sure we're fully qualifying relative paths correctly. I wrote a small test which I expected to work. Could you confirm? I think we just need to qualify the src paths with the src FileSystem first.
{code}
@Test
public void testWorkingDir() throws Exception {
  final Path wd = fs.getWorkingDirectory();
  try {
    fs.setWorkingDirectory(new Path("/.reserved/raw/"));
    doTestPreserveRawXAttrs("raw/src", "raw/dest", "-px", true, true,
        DistCpConstants.SUCCESS);
  } finally {
    fs.setWorkingDirectory(wd);
  }
}
{code}
Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved -- Key: MAPREDUCE-6007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, MAPREDUCE-6007.003.patch As part of the Data at Rest Encryption work (HDFS-6134), we need to create a new option for distcp which causes raw.* namespace extended attributes to not be preserved. See the doc in HDFS-6509 for details. The default for this option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089878#comment-14089878 ] Andrew Wang commented on MAPREDUCE-6007: Eh, I looked at the test output, and it's complaining that raw/src doesn't exist. I guess distcp doesn't support relative paths? In that case, +1 pending the doc change. Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved -- Key: MAPREDUCE-6007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, MAPREDUCE-6007.003.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089890#comment-14089890 ] Andrew Wang commented on MAPREDUCE-6007: Okay, so I need to stop rushing this :) If you fix my above test by removing the raw path components, you'll see that the target path isn't being qualified before being checked. Try adding this near the top of SimpleCopyListing#validatePaths:
{code}
// Qualify the target path before checking
targetPath = targetFS.makeQualified(targetPath);
final boolean targetIsReservedRaw =
    Path.getPathWithoutSchemeAndAuthority(targetPath).toString().
        startsWith(HDFS_RESERVED_RAW_DIRECTORY_NAME);
{code}
Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved -- Key: MAPREDUCE-6007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, MAPREDUCE-6007.003.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088300#comment-14088300 ] Andrew Wang commented on MAPREDUCE-6007: bq. the only typo above is the last line which should be no raw xattrs are preserved If none of these flags are specified, AFAIK neither non-raw nor raw xattrs are preserved, i.e. no xattrs. Yes? bq. I convinced myself that a relative path could never be relative to /.reserved/raw since you can't set your working directory to that. AFAIK you can set your wd to whatever you want, and you can have .. in absolute paths too. We need to make sure that this path is fully normalized if we're doing a prefix check. Paths from a FileStatus are normalized, but paths coming from the user (like the ones coming out of a DistCpOptions) are suspect. setTargetPathExists has one of these suspect checks. Doc * This is hard to read, could we expand this into a separate section and a new table? I'd particularly like to see a fuller explanation of what happens with different dst options. CopyListing * Let's improve the InvalidInputException message. Paths don't really "specify" something; you could say "starts with" or something similar instead. We should also print the target path. * I don't quite understand this error either, why is a {{/.r/r}} src and {{-pd}} not okay? The exception also mentions the target not starting with {{/.r/r}}, but that's not part of the if check. * Line longer than 80 chars * I expected to see a check along the lines of: if ({{-p}} || {{-px}}) and !{{-pd}} and src is {{/.r/r}}, then also check that the dst supports xattrs and is {{/.r/r}}. I wish there was a way to test that it's HDFS too, but looking for dest having {{/.r/r}} is probably good enough. CopyMapper * Can we expand the block comment to say that toCopyListingFileStatus is used to filter xattrs, and passing copyXAttrs in twice is okay because we already did it earlier? The double passing looks weird, though logically correct. 
DistCp:
* I really don't like setting the DISABLERAWXATTRS flag in setTargetPathExists, since the expectation is that Options flags are set by the user. This method is also not named such that doing this there makes sense. We have the target path via the DistCpOptions, so let's be explicit and verbose with the checks instead. This is quite possibly why the CopyListing check is confusing to me.
* To expand on the above, -px means preserving all xattrs, while -pxd means preserving non-raw xattrs. Then we have {{toCopyListingFileStatus}}, where the {{preserveXAttrs}} parameter actually means preserve non-raw xattrs. This is also definitely confusing...
DistCpOptionSwitch:
* XATTR is not a standard capitalization style; let's lower-case it as xattr here. XAttr isn't standard either, but that ship has sailed.
Test
* I'd like tests for weird src and dst paths, i.e. relative or containing ..s
* We could also test the no-preserve-flags behavior, that no xattrs at all are preserved.
Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved -- Key: MAPREDUCE-6007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch As part of the Data at Rest Encryption work (HDFS-6134), we need to create a new option for distcp which causes raw.* namespace extended attributes to not be preserved. See the doc in HDFS-6509 for details. The default for this option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
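On the normalization point raised in this thread: a prefix check against /.reserved/raw is only safe once a user-supplied path has been normalized, because {{..}} components defeat a naive startsWith. A self-contained illustration using java.nio.file (plain Java, not Hadoop's own Path class):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrefixCheck {
  static final String RESERVED_RAW = "/.reserved/raw";

  // Naive check: defeated by un-normalized paths containing ".."
  static boolean naiveIsReservedRaw(String p) {
    return p.startsWith(RESERVED_RAW);
  }

  // Safer check: normalize first, as the review suggests doing for
  // user-supplied paths (e.g. ones coming out of a DistCpOptions)
  static boolean normalizedIsReservedRaw(String p) {
    Path norm = Paths.get(p).normalize();
    return norm.toString().startsWith(RESERVED_RAW);
  }

  public static void main(String[] args) {
    String sneaky = "/data/../.reserved/raw/file";
    System.out.println(naiveIsReservedRaw(sneaky));      // false: misses it
    System.out.println(normalizedIsReservedRaw(sneaky)); // true
  }
}
```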
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085543#comment-14085543 ] Andrew Wang commented on MAPREDUCE-6007:
Hi Charles, thanks for the patch. I had to take notes while reviewing this patch; the behavior is kind of complicated. We have a variety of flags that can be specified, and the destination FS can have different levels of support. It'd be very useful to specify this behavior in gory detail in the DistCp documentation. Check me on this though:
Options:
{noformat}
-px : preserve raw and non-raw xattrs
-pr : no xattrs are preserved
-p  : preserve raw xattrs
-pxr: preserve non-raw xattrs
    : no xattrs are preserved
{noformat}
Behavior with a given src and dst, varying levels of dst support:
* raw src, raw dst: the options apply as specified above
* raw src, non-raw dst, dst supports xattrs but no {{/.reserved/raw}}: we will fail to set raw xattrs at runtime.
* raw src, dst doesn't support xattrs: if {{-pX}} is specified, throws an exception. Else, silently discards raw xattrs.
Some discussion on the above:
* If the src is {{/.reserved/raw}}, the user is expecting preservation of raw xattrs when {{-p}} or {{-pX}} is specified. In this scenario, we should test that the dest is {{/.reserved/raw}} and that it's present on the dstFS.
* There might be other weird cases; I haven't thought through all of them.
Some code review comments:
Misc:
- We have both {{noPreserveRaw}} and {{preserveRaw}} booleans; can we standardize on one everywhere? I'd like a negative one, call it {{disableRaw}} or {{excludeRaw}}, since it better captures the meaning of the flag. {{exclude}} feels a bit better IMO, but it looks like {{-pe}} is taken.
- What's the expected behavior when the dest doesn't support xattrs or reserved raw, or supports xattrs but not reserved raw?
- CopyListing: this is where we'd also test to see if the destFS has a /.reserved/raw directory
- CopyMapper: two periods in the block comment
Documentation:
- I don't want to tie raw preservation just to encryption, since we might also use it for compression. How about this instead:
{quote}
d: disable preservation of raw namespace extended attributes ... raw namespace extended attributes are preserved by default if supported. Specifying -pd disables preservation of these xattrs.
{quote}
- As noted above, it'd be good to have the expected preservation behavior laid out in the distcp documentation.
DistCp:
{code}
if (!Path.getPathWithoutSchemeAndAuthority(target).toString().
{code}
What if the target is a relative path here?
Test:
- Any reason this isn't part of the existing XAttr test? They seem pretty similar, and you also added a PXD test to the existing test.
- Don't need to do makeFilesAndDirs in the BeforeClass
- Doesn't there need to be a non-raw attribute set so you can test some of these combinations?
- Can we test what happens when the dest FS doesn't support xattrs or raw xattrs?
Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved -- Key: MAPREDUCE-6007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Attachments: MAPREDUCE-6007.001.patch As part of the Data at Rest Encryption work (HDFS-6134), we need to create a new option for distcp which causes raw.* namespace extended attributes to not be preserved. See the doc in HDFS-6509 for details. The default for this option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
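Putting the quoted doc text together with the later clarification in this thread (-px preserves all xattrs, -pxd only non-raw, and the d flag disables raw preservation), the flag combinations can be tabulated as a small sketch. This is one reading of the discussion, not the patch's actual code:

```java
import java.util.EnumSet;

public class PreserveFlags {
  enum XAttr { RAW, NON_RAW }

  // p = any -p given, x = x in the preserve flags, d = d in the preserve
  // flags. The thread notes the exact semantics were still under review,
  // so treat this as illustrative only.
  static EnumSet<XAttr> preserved(boolean p, boolean x, boolean d) {
    EnumSet<XAttr> out = EnumSet.noneOf(XAttr.class);
    if (!p) return out;            // no -p at all: no xattrs preserved
    if (x) out.add(XAttr.NON_RAW); // x: keep non-raw xattrs
    if (!d) out.add(XAttr.RAW);    // raw kept by default; d disables it
    return out;
  }
}
```

So -px yields {RAW, NON_RAW}, -pxd yields {NON_RAW}, -p alone yields {RAW}, and -pd yields nothing.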
[jira] [Commented] (MAPREDUCE-5971) move the default options for distcp -p to DistCpOptionSwitch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063917#comment-14063917 ] Andrew Wang commented on MAPREDUCE-5971: Hi Charles, thanks for working on this, I took a quick look: Since we're adding a new getDefaultValue to DistCpOptionSwitch, shouldn't we make the handling of default values in CustomParser generic as well? Right now using the default value is still special cased only for PRESERVE_STATUS. Maybe build a map with the default values in CustomParser? move the default options for distcp -p to DistCpOptionSwitch Key: MAPREDUCE-5971 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5971 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: trunk Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: MAPREDUCE-5971.001.patch The default preserve flags for distcp -p are embedded in the OptionsParser code. Refactor to co-locate them with the actual flag initialization. -- This message was sent by Atlassian JIRA (v6.2#6252)
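The generic default-value handling suggested above could look something like this sketch; the option key and default string below are placeholders, not DistCp's real values:

```java
import java.util.HashMap;
import java.util.Map;

public class OptionDefaults {
  // Hypothetical generic default handling, as the review suggests:
  // instead of special-casing PRESERVE_STATUS in the parser, keep a map
  // of per-option defaults populated from the option switches.
  private static final Map<String, String> DEFAULTS = new HashMap<>();
  static {
    DEFAULTS.put("p", "ugp"); // placeholder, not the real -p default
  }

  static String valueOrDefault(String opt, String parsed) {
    return (parsed == null || parsed.isEmpty())
        ? DEFAULTS.getOrDefault(opt, "")
        : parsed;
  }
}
```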
[jira] [Commented] (MAPREDUCE-5971) move the default options for distcp -p to DistCpOptionSwitch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064171#comment-14064171 ] Andrew Wang commented on MAPREDUCE-5971: +1 pending, thanks charles move the default options for distcp -p to DistCpOptionSwitch Key: MAPREDUCE-5971 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5971 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: trunk Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: MAPREDUCE-5971.001.patch, MAPREDUCE-5971.002.patch The default preserve flags for distcp -p are embedded in the OptionsParser code. Refactor to co-locate them with the actual flag initialization. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5971) Move the default options for distcp -p to DistCpOptionSwitch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-5971: --- Summary: Move the default options for distcp -p to DistCpOptionSwitch (was: move the default options for distcp -p to DistCpOptionSwitch) Move the default options for distcp -p to DistCpOptionSwitch Key: MAPREDUCE-5971 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5971 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: trunk Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: MAPREDUCE-5971.001.patch, MAPREDUCE-5971.002.patch The default preserve flags for distcp -p are embedded in the OptionsParser code. Refactor to co-locate them with the actual flag initialization. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5971) Move the default options for distcp -p to DistCpOptionSwitch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-5971: --- Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks Charles Move the default options for distcp -p to DistCpOptionSwitch Key: MAPREDUCE-5971 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5971 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: trunk Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Fix For: 2.6.0 Attachments: MAPREDUCE-5971.001.patch, MAPREDUCE-5971.002.patch The default preserve flags for distcp -p are embedded in the OptionsParser code. Refactor to co-locate them with the actual flag initialization. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002373#comment-14002373 ] Andrew Wang commented on MAPREDUCE-5867: Hey Devraj, I think TestKillAMPreemptionPolicy.java was committed with CRLFs rather than LFs, which messes up {{git diff}} for those of us using the git mirror. Do you mind fixing this? Thanks. Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy -- Key: MAPREDUCE-5867 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Sunil G Assignee: Sunil G Fix For: 3.0.0 Attachments: MapReduce-5867-updated.patch, MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, Yarn-1980.1.patch I configured KillAMPreemptionPolicy for My Application Master and tried to check preemption of queues. In one scenario I have seen below NPE in my AM 014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267) at java.lang.Thread.run(Thread.java:662) I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5867) Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/MAPREDUCE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002412#comment-14002412 ] Andrew Wang commented on MAPREDUCE-5867: Actually, nevermind, I fixed it myself. I learned something new about SVN, apparently we should be doing svn propset svn:eol-style native file on new files (thanks cmccabe for the tip). I ran {{dos2unix}} to convert the newlines too. Possible NPE in KillAMPreemptionPolicy related to ProportionalCapacityPreemptionPolicy -- Key: MAPREDUCE-5867 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5867 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Sunil G Assignee: Sunil G Fix For: 3.0.0 Attachments: MapReduce-5867-updated.patch, MapReduce-5867-updated.patch, MapReduce-5867.2.patch, MapReduce-5867.3.patch, Yarn-1980.1.patch I configured KillAMPreemptionPolicy for My Application Master and tried to check preemption of queues. In one scenario I have seen below NPE in my AM 014-04-24 15:11:08,860 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.app.rm.preemption.KillAMPreemptionPolicy.preempt(KillAMPreemptionPolicy.java:57) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:662) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:246) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:267) at java.lang.Thread.run(Thread.java:662) I was using 2.2.0 and merged MAPREDUCE-5189 to see how AM preemption works. -- This message was sent by Atlassian JIRA (v6.2#6252)
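The fix described above can be reproduced end-to-end with a scratch file. Here {{tr}} stands in for dos2unix so the example is self-contained; on a real checkout you would run {{dos2unix <file>}} and also {{svn propset svn:eol-style native <file>}} on newly added files so future commits stay LF:

```shell
# File as committed, with CRLF line endings
printf 'line1\r\nline2\r\n' > /tmp/TestCrlf.java
# Strip the carriage returns (this is what dos2unix does)
tr -d '\r' < /tmp/TestCrlf.java > /tmp/TestCrlf.lf
mv /tmp/TestCrlf.lf /tmp/TestCrlf.java
```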
[jira] [Created] (MAPREDUCE-5790) Default map hprof profile options do not work
Andrew Wang created MAPREDUCE-5790: -- Summary: Default map hprof profile options do not work Key: MAPREDUCE-5790 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Environment: java version 1.6.0_31 Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Reporter: Andrew Wang I have an MR job doing the following: {code} Job job = Job.getInstance(conf); // Enable profiling job.setProfileEnabled(true); job.setProfileTaskRange(true, 0); job.setProfileTaskRange(false, 0); {code} When I run this job, some of my map tasks fail with this error message: {noformat} org.apache.hadoop.util.Shell$ExitCodeException: /data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh: line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 1/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout 2/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr : bad substitution {noformat} It looks like ${mapreduce.task.profile.params} is not getting subbed in correctly. -- This message was sent by Atlassian JIRA (v6.2#6252)
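The trailing "bad substitution" is the shell's own complaint: bash cannot expand a parameter whose name contains dots, so when the literal ${mapreduce.task.profile.params} placeholder reaches launch_container.sh unsubstituted, the script aborts. A one-line reproduction:

```shell
# bash rejects parameter names containing dots with "bad substitution",
# which is exactly the error at the end of the container log above.
bash -c 'echo ${mapreduce.task.profile.params}' 2>&1 || true
```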
[jira] [Created] (MAPREDUCE-5620) distcp1 -delete fails when target directory contains files with percent signs
Andrew Wang created MAPREDUCE-5620: -- Summary: distcp1 -delete fails when target directory contains files with percent signs Key: MAPREDUCE-5620 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5620 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Andrew Wang Assignee: Andrew Wang Debugging a distcp1 issue, it fails to delete extra files in the target directory when there is a percent sign in the filename. I'm pretty sure this is an issue with how percent encoding is handled in FsShell (reproduced with just hadoop fs -rmr), but we can also fix this in distcp1 by using FileSystem instead of FsShell. This is what distcp2 does. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (MAPREDUCE-5620) distcp1 -delete fails when target directory contains files with percent signs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-5620. Resolution: Invalid Turns out this was due to running distcp1 with hadoop 2's FsShell. I couldn't repro this on a pure branch-1 setup, so resolving as invalid. distcp1 -delete fails when target directory contains files with percent signs - Key: MAPREDUCE-5620 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5620 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Andrew Wang Assignee: Andrew Wang Debugging a distcp1 issue, it fails to delete extra files in the target directory when there is a percent sign in the filename. I'm pretty sure this is an issue with how percent encoding is handled in FsShell (reproduced with just hadoop fs -rmr), but we can also fix this in distcp1 by using FileSystem instead of FsShell. This is what distcp2 does. -- This message was sent by Atlassian JIRA (v6.1#6144)
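For reference, here is why a bare percent sign trips up URI-style path parsing, which is one plausible mechanism for the hadoop 2 FsShell behavior seen here (assumption: the shell's path handling went through java.net.URI; the issue above was resolved without pinning this down):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PercentPath {
  // Returns whether the given path parses as a URI. A '%' that is not
  // followed by two hex digits is an invalid escape and is rejected.
  static boolean parsesAsUri(String path) {
    try {
      new URI(path);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(parsesAsUri("/dir/plainfile")); // true
    System.out.println(parsesAsUri("/dir/file%zz"));   // false: bad escape
  }
}
```

Going through the FileSystem API directly, as distcp2 does, sidesteps this kind of string-level reinterpretation.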
[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf
[ https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766901#comment-13766901 ] Andrew Wang commented on MAPREDUCE-5379: Thanks Karthik, the patch looks good to me. As I'm not well-versed in the ways of MR, it'd be good to get confirmation from someone else as well. Include token tracking ids in jobconf - Key: MAPREDUCE-5379 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission, security Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, MAPREDUCE-5379.patch, mr-5379-3.patch HDFS-4680 enables audit logging delegation tokens. By storing the tracking ids in the job conf, we can enable tracking what files each job touches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5193) A few MR tests use block sizes which are smaller than the default minimum block size
[ https://issues.apache.org/jira/browse/MAPREDUCE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-5193: --- Attachment: mapreduce-5193-1.patch ATM told me it was okay to poach this, so here's a patch. It sets the min block size to 0 in the /src/test/resource {{hdfs-site.xml}}, which is the same fix we used for the HDFS tests. I ran the failed tests from the MAPREDUCE-5156 patch successfully. Looking at the daily build, most of the other components are fine. I also ran the tests in the skipped components {{hs-plugin}} and examples successfully, so hopefully it'll fix everything. A few MR tests use block sizes which are smaller than the default minimum block size Key: MAPREDUCE-5193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5193 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.5-beta Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: MAPREDUCE-5156.1.patch, mapreduce-5193-1.patch HDFS-4305 introduced a new configurable minimum block size of 1MB. A few MR tests deliberately set much smaller block sizes. This JIRA is to update those tests to fix these failing tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)
Andrew Wang created MAPREDUCE-5033: -- Summary: mapred shell script should respect usage flags (--help -help -h) Key: MAPREDUCE-5033 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y help flags. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-5033: --- Attachment: mapreduce-5033-1.patch Little patch attached. Tested manually by running the mapred script. mapred shell script should respect usage flags (--help -help -h) Key: MAPREDUCE-5033 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: mapreduce-5033-1.patch Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y help flags. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
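A minimal sketch of the kind of flag handling such a patch adds to the shell script; the helper name and usage text are placeholders, not the actual patch:

```shell
print_usage() {
  echo "Usage: mapred [--help] COMMAND"
}

# Respect the normal Unix-y help flags before dispatching COMMAND.
case "${1:-}" in
  --help|-help|-h)
    print_usage
    exit 0
    ;;
esac
```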
[jira] [Updated] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated MAPREDUCE-5033: --- Status: Patch Available (was: Open) mapred shell script should respect usage flags (--help -help -h) Key: MAPREDUCE-5033 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: mapreduce-5033-1.patch Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y help flags. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (MAPREDUCE-5026) For shortening the time of TaskTracker heartbeat, decouple the statics collection operations
[ https://issues.apache.org/jira/browse/MAPREDUCE-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang moved HDFS-4527 to MAPREDUCE-5026: -- Component/s: (was: performance) tasktracker performance Fix Version/s: (was: 1.1.1) 1.1.1 Target Version/s: (was: 1.1.1) Affects Version/s: (was: 1.1.1) 1.1.1 Key: MAPREDUCE-5026 (was: HDFS-4527) Project: Hadoop Map/Reduce (was: Hadoop HDFS) For shortening the time of TaskTracker heartbeat, decouple the statics collection operations Key: MAPREDUCE-5026 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5026 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, tasktracker Affects Versions: 1.1.1 Reporter: sam liu Labels: patch Fix For: 1.1.1 Attachments: HDFS-4527.patch Original Estimate: 24h Remaining Estimate: 24h In each heartbeat of TaskTracker, it will calculate some system statics, like the free disk space, available virtual/physical memory, cpu usage, etc. However, it's not necessary to calculate all the statics in every heartbeat, and this will consume many system resource and impace the performance of TaskTracker heartbeat. Furthermore, the characteristics of system properties(disk, memory, cpu) are different and it's better to collect their statics in different intervals. To reduce the latency of TaskTracker heartbeat, one solution is to decouple all the system statics collection operations from it, and issue separate threads to do the statics collection works when the TaskTracker starts. The threads could be three: the first one is to collect cpu related statics in a short interval; the second one is to collect memory related statics in a normal interval; the third one is to collect disk related statics in a long interval. And all the interval could be customized by the parameter mapred.stats.collection.interval in the mapred-site.xml. At last, the heartbeat could get values of system statics from the memory directly. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5026) For shortening the time of TaskTracker heartbeat, decouple the statics collection operations
[ https://issues.apache.org/jira/browse/MAPREDUCE-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586136#comment-13586136 ] Andrew Wang commented on MAPREDUCE-5026: Hi Sam, Thanks for the patch. I moved your issue to MAPREDUCE, since the TaskTracker isn't a component of HDFS. A few minor comments: * Please rename Statics to Statistics in the code. * Could you provide some performance numbers, to quantify the before and after improvement? For shortening the time of TaskTracker heartbeat, decouple the statics collection operations Key: MAPREDUCE-5026 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5026 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, tasktracker Affects Versions: 1.1.1 Reporter: sam liu Labels: patch Fix For: 1.1.1 Attachments: HDFS-4527.patch Original Estimate: 24h Remaining Estimate: 24h In each heartbeat of TaskTracker, it will calculate some system statics, like the free disk space, available virtual/physical memory, cpu usage, etc. However, it's not necessary to calculate all the statics in every heartbeat, and this will consume many system resource and impace the performance of TaskTracker heartbeat. Furthermore, the characteristics of system properties(disk, memory, cpu) are different and it's better to collect their statics in different intervals. To reduce the latency of TaskTracker heartbeat, one solution is to decouple all the system statics collection operations from it, and issue separate threads to do the statics collection works when the TaskTracker starts. The threads could be three: the first one is to collect cpu related statics in a short interval; the second one is to collect memory related statics in a normal interval; the third one is to collect disk related statics in a long interval. And all the interval could be customized by the parameter mapred.stats.collection.interval in the mapred-site.xml. 
At last, the heartbeat could get values of system statics from the memory directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
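The three-thread design described in the issue can be sketched with a scheduled executor: background tasks refresh cached statistics at their own intervals, and the heartbeat path only reads the cache. Intervals, names, and the probe methods below are illustrative placeholders, not the attached patch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class StatsCollector {
  final AtomicLong cpuStat = new AtomicLong();
  final AtomicLong memStat = new AtomicLong();
  final AtomicLong diskStat = new AtomicLong();
  final ScheduledExecutorService pool = Executors.newScheduledThreadPool(3);

  void start() {
    // Cheap, fast-changing stats refresh often; expensive ones rarely.
    pool.scheduleAtFixedRate(() -> cpuStat.set(sampleCpu()), 0, 1, TimeUnit.SECONDS);
    pool.scheduleAtFixedRate(() -> memStat.set(sampleMem()), 0, 5, TimeUnit.SECONDS);
    pool.scheduleAtFixedRate(() -> diskStat.set(sampleDisk()), 0, 30, TimeUnit.SECONDS);
  }

  // Heartbeat path: no sampling, just cached reads from memory.
  long[] snapshot() {
    return new long[] { cpuStat.get(), memStat.get(), diskStat.get() };
  }

  void stop() {
    pool.shutdownNow();
  }

  // Placeholders for the real cpu/memory/disk probes.
  long sampleCpu()  { return 1; }
  long sampleMem()  { return 2; }
  long sampleDisk() { return 3; }
}
```

This also makes the requested before/after measurement straightforward: time snapshot() against a version that probes inline.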