[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321929#comment-15321929 ] Hadoop QA commented on MAPREDUCE-6690: -- (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 20s | trunk passed |
| +1 | compile | 1m 30s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 23s | trunk passed |
| +1 | findbugs | 1m 8s | trunk passed |
| +1 | javadoc | 0m 35s | trunk passed |
| 0 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 1m 30s | the patch passed |
| +1 | javac | 1m 30s | the patch passed |
| +1 | checkstyle | 0m 31s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | mvneclipse | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 1m 20s | the patch passed |
| -1 | javadoc | 0m 19s | hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 2 new + 2508 unchanged - 1 fixed = 2510 total (was 2509) |
| -1 | unit | 2m 1s | hadoop-mapreduce-client-core in the patch failed. |
| -1 | unit | 114m 22s | hadoop-mapreduce-client-jobclient in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 135m 26s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.tools.TestCLI |
| | hadoop.mapred.TestMRCJCFileOutputCommitter |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809100/MAPREDUCE-6690-trunk-v4.patch |
| JIRA Issue | MAPREDUCE-6690 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml |
| uname | Linux f6e7eb5194a4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1500a0a |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| javadoc |
[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321825#comment-15321825 ] Hadoop QA commented on MAPREDUCE-6690: -- (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 4s | Docker failed to build yetus/hadoop:2c91fd8. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809100/MAPREDUCE-6690-trunk-v4.patch |
| JIRA Issue | MAPREDUCE-6690 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6543/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated. > Limit the number of resources a single map reduce job can submit for > localization > - > > Key: MAPREDUCE-6690 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6690 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: MAPREDUCE-6690-trunk-v1.patch, > MAPREDUCE-6690-trunk-v2.patch, MAPREDUCE-6690-trunk-v3.patch, > MAPREDUCE-6690-trunk-v4.patch > > > Users will sometimes submit a large number of resources to be localized as > part of a single map reduce job. This can cause issues with YARN localization > that destabilize the cluster and potentially impact other user jobs. These > resources are specified via the files, libjars, archives and jobjar command > line arguments or directly through the configuration (i.e. the distributed cache > api). The resources specified could be too large in multiple dimensions: > # Total size > # Number of files > # Size of an individual resource (i.e.
a large fat jar) > We would like to encourage good behavior on the client side by having the > option of enforcing resource limits along the above dimensions. > There should be a separate effort to enforce limits at the YARN layer on the > server side, but this jira is only covering the map reduce layer on the > client side. In practice, having these client side limits will get us a long > way towards preventing these localization anti-patterns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
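The three dimensions listed in the description (total size, number of files, size of an individual resource) amount to a single validation pass over the requested resources before anything is uploaded. A minimal sketch of such a client-side check, in Python for illustration only; the function and parameter names are hypothetical and are not the configuration keys the patch introduces:

```python
# Hypothetical fail-fast limit check: walk the resources a job asked to
# localize and raise before any upload work happens if a limit is exceeded.
def check_localization_limits(resources, max_total_bytes=None,
                              max_resource_count=None,
                              max_single_resource_bytes=None):
    """resources: iterable of (path, size_in_bytes) tuples.

    Raises ValueError on a violated limit, mirroring the fail-fast
    behavior a job client would want at submit time. A limit of None
    means "unenforced".
    """
    total = 0
    count = 0
    for path, size in resources:
        count += 1
        total += size
        # Per-resource limit catches the "large fat jar" case.
        if max_single_resource_bytes is not None and size > max_single_resource_bytes:
            raise ValueError("resource %s is %d bytes, over the per-resource limit %d"
                             % (path, size, max_single_resource_bytes))
    # Count limit catches the "wide load" case (many small files).
    if max_resource_count is not None and count > max_resource_count:
        raise ValueError("%d resources requested, over the limit %d"
                         % (count, max_resource_count))
    if max_total_bytes is not None and total > max_total_bytes:
        raise ValueError("total resource size %d bytes, over the limit %d"
                         % (total, max_total_bytes))
```

Checking on the client, before upload, is what distinguishes this from a server-side YARN check: the job fails before any bytes reach the staging area.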
[jira] [Updated] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated MAPREDUCE-6690: Attachment: MAPREDUCE-6690-trunk-v4.patch V4 attached. # Fixed checkstyle/javadoc. # Fixed TestMRJobs failures (test only changes).
[jira] [Commented] (MAPREDUCE-6712) Support grouping values for reducer on java-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321658#comment-15321658 ] He Tianyi commented on MAPREDUCE-6712: -- Actually, in my experiments (in-house workload), converting strings back and forth is not the bottleneck (it does not make a difference with typedbytes). But just grouping values makes a simple reducer 20% faster (for both text and typedbytes). Also, many users use C/C++ to implement the mapper/reducer, which I think can be more efficient than java/scala (smaller memory footprint, less gc, no virtual call overhead, better SIMD support, etc.). > Support grouping values for reducer on java-side > > > Key: MAPREDUCE-6712 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6712 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: He Tianyi >Priority: Minor > > In hadoop streaming, with TextInputWriter, the reducer program receives each > line representing a (k, v) tuple from {{stdin}}, in which values with an > identical key are not grouped. > This brings some inefficiency, especially for runtimes based on an interpreter > (e.g. cpython), coming from: > A. the user program has to compare each key with the previous one (but on the java side, > records already come to the reducer in groups), > B. the user program has to perform {{read}}, then {{find}} or {{split}} on each > record, even if there are multiple values with an identical key, > C. if the key is long, this apparently introduces inefficiency for > caching. > Suppose we need another InputWriter. But this is not enough, since the > interface of {{InputWriter}} defines {{writeKey}} and {{writeValue}}, not > {{writeValues}}. Though we could compare keys in a custom InputWriter and group > them, this is also inefficient. Some other changes are also needed.
[jira] [Comment Edited] (MAPREDUCE-6712) Support grouping values for reducer on java-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321658#comment-15321658 ] He Tianyi edited comment on MAPREDUCE-6712 at 6/8/16 11:34 PM: --- Actually, in my experiments (in-house workload), converting strings back and forth is not the bottleneck (it does not make a difference with typedbytes). But just grouping values makes a simple reducer 20% faster (for both text and typedbytes). Also, many users use C/C++ to implement the mapper/reducer, which I think can be more efficient than java/scala (smaller memory footprint, less gc, better SIMD support, etc.). was (Author: he tianyi): Actually in my experiements (in-house workload) turning strings back and forth is not the bottleneck (does not make a difference with typedbytes). But just grouping values make a simple reducer 20% faster (for both text and typedbytes). Also, many users are using C/C++ to implement mapper/reducer which I think is possible to be more efficient than java/scala (smaller memory footprint, less gc, no virtual call overhead, better SIMD support, etc.).
[jira] [Updated] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-6240: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks to [~kamrul] for the initial patch and [~chris.douglas] and [~ajithshetty] for reviews. > Hadoop client displays confusing error message > -- > > Key: MAPREDUCE-6240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.7.0 >Reporter: Mohammad Kamrul Islam >Assignee: Gera Shegalov > Fix For: 2.9.0 > > Attachments: MAPREDUCE-6240-gera.001.patch, > MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, > MAPREDUCE-6240.003.patch, MAPREDUCE-6240.004.patch, MAPREDUCE-6240.1.patch > > > Hadoop client often throws exception with "java.io.IOException: Cannot > initialize Cluster. Please check your configuration for > mapreduce.framework.name and the correspond server addresses". > This is a misleading and generic message for any cluster initialization > problem. It takes a lot of debugging hours to identify the root cause. The > correct error message could resolve this problem quickly. > In one such instance, Oozie log showed the following exception while the > root cause was CNF that Hadoop client didn't return in the exception. > {noformat} > JA009: Cannot initialize Cluster. Please check your configuration for > mapreduce.framework.name and the correspond server addresses. 
> at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:281) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Cannot initialize Cluster. Please check your > configuration for mapreduce.framework.name and the correspond server > addresses. 
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) > at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82) > at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75) > at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) > at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449) > at > org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) > at > org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at > org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) > ... 10 more > {noformat}
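The failure mode in this issue is a generic "Cannot initialize Cluster" message that hides the real root cause (here, a ClassNotFound). As an illustration of the general pattern rather than the actual Hadoop fix, a short Python sketch of an initializer that records each provider's failure instead of swallowing it, so the root cause survives into the final error:

```python
# Illustrative only: try each framework provider in turn; if all fail,
# report every provider's own error instead of a generic message.
class ClusterInitError(IOError):
    pass


def initialize(providers):
    failures = []
    for provider in providers:
        try:
            return provider()
        except Exception as e:  # record, don't swallow
            name = getattr(provider, "__name__", repr(provider))
            failures.append("%s: %r" % (name, e))
    raise ClusterInitError(
        "Cannot initialize Cluster; provider failures: " + "; ".join(failures))
```

Naming the provider alongside its exception is the same idea Gera describes below: keeping the provider class name in the message makes the otherwise generic error actionable.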
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321394#comment-15321394 ] Hudson commented on MAPREDUCE-6240: --- SUCCESS: Integrated in Hadoop-trunk-Commit #9932 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9932/]) MAPREDUCE-6240. Hadoop client displays confusing error message. (gera) (gera: rev 0af96a1c08594c809ecb254cee4f60dd22399772) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Cluster.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestClientProtocolProviderImpls.java
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321375#comment-15321375 ] Gera Shegalov commented on MAPREDUCE-6240: -- Actually sorry I misunderstood the comment by Ajith. It reminds me that I left extra wrapping for suppressed exceptions as an artifact of using MultiIOExceptions in prior patches. I would normally get rid of them, but I find adding the provider class name to the message actually quite useful.
[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321330#comment-15321330 ] Jason Lowe commented on MAPREDUCE-6690: --- bq. I assumed that YARN-5192 would implement the check as part of the submit call so that the client gets immediate feedback. Note that YARN-5192 cannot do the check on application submit. An application submit only requires the resources necessary to get the ApplicationMaster localized. Subsequent containers for the application could have a completely different set of resources, and they won't be available in the application submission context for validation at submit time. MapReduce is an app framework that happens to localize all resources for all containers, but other application frameworks do not always do this. bq. I would like to find a way, however, to try to keep the two settings in sync if possible. Agreed it would be annoying for admins to have to keep these in sync, assuming nobody would ever want to configure the YARN limit higher than the MapReduce limit. bq. What about having the RM offer up its resource limits through a call? That would be one way to tackle it. There have been cases in the past where it would have been nice for clients to be able to query config settings via the central daemons (i.e.: namenode, resourcemanager, etc.) rather than assume the local settings in hdfs-site.xml or yarn-site.xml are the same as what the central daemon is using. That's a somewhat open-ended API change for YARN with backwards-compatibility concerns going forward, but maybe it's time we hammered out whether or not we're going to do it on a YARN JIRA and if not, what clients/users are supposed to do to better keep the client and the server in sync. 
[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321316#comment-15321316 ] Daniel Templeton commented on MAPREDUCE-6690: - Thanks for the clarification, [~jlowe]. I assumed that YARN-5192 would implement the check as part of the submit call so that the client gets immediate feedback. The point that I forgot about, though, is that regardless, the submit only happens after the resources have been uploaded to HDFS. Given that this check specifically targets wide loads, the cases where the server-side check would reject the submit are exactly the ones that would waste the most time on the upload. I now see the light. I would like to find a way, however, to keep the two settings in sync if possible. I've seen cases, such as the number of concurrent moves in the HDFS mover, where the limit is set on both the client and server sides, and it ends up confusing customers. What about having the RM offer up its resource limits through a call? The client could then query the RM's limits and apply those.
[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321305#comment-15321305 ] Jason Lowe commented on MAPREDUCE-6690: --- Implementing the check in MapReduce allows for fast failure and more accurate, informative errors to the client. The check in MapReduce can prevent an unnecessary upload of one or more resources to the staging area in HDFS, because the client knows the job is going to fail anyway. Also, YARN-5192 will only be able to detect the error when a container starts to localize on a node and asks for a resource set that violates the limits. Since MapReduce localizes everything for all containers (including the AM), it will fail under YARN-5192 as soon as the AM tries to run on a node, but it might take a while for the AM to get scheduled. As for error reporting, if the violation comes from one or more files that were submitted locally, then the paths reported via a YARN-5192 check will be HDFS staging directories rather than the local paths the client originally specified. The error also will not be reported to the client submitting the job unless it hangs around to monitor the job after submission. With this check the job client will get the error directly when it tries to submit. If we don't care much about these differences then we can just go with the YARN-5192 implementation.
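The three dimensions listed in the issue description (number of files, total size, size of a single resource) lend themselves to a simple submit-time check on the client, as Jason describes. A minimal sketch of such a check, assuming hypothetical limit semantics (the class name, the "-1 means unlimited" convention, and the configuration-key comments are illustrative, not the actual patch):

```java
import java.util.List;

// Sketch of a client-side localization limit check (hypothetical names;
// the actual patch may use different configuration keys and messages).
// A limit of -1 means "unlimited", mirroring common Hadoop conventions.
class LocalResourceLimitsChecker {
    private final int maxResources;    // assumed: max number of files to localize
    private final long maxTotalBytes;  // assumed: total size across all resources
    private final long maxSingleBytes; // assumed: size of any one resource (e.g. a fat jar)

    LocalResourceLimitsChecker(int maxResources, long maxTotalBytes, long maxSingleBytes) {
        this.maxResources = maxResources;
        this.maxTotalBytes = maxTotalBytes;
        this.maxSingleBytes = maxSingleBytes;
    }

    /** Returns null if the resource set is acceptable, otherwise an error message. */
    String check(List<Long> resourceSizes) {
        if (maxResources >= 0 && resourceSizes.size() > maxResources) {
            return "too many resources: " + resourceSizes.size() + " > " + maxResources;
        }
        long total = 0;
        for (long size : resourceSizes) {
            if (maxSingleBytes >= 0 && size > maxSingleBytes) {
                return "resource too large: " + size + " > " + maxSingleBytes;
            }
            total += size;
        }
        if (maxTotalBytes >= 0 && total > maxTotalBytes) {
            return "total size too large: " + total + " > " + maxTotalBytes;
        }
        return null; // within limits, safe to upload to the staging area
    }
}
```

Running such a check before anything is copied to the staging area is what enables the fast-failure and local-path error reporting discussed above.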
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321191#comment-15321191 ] Gera Shegalov commented on MAPREDUCE-6240: -- Thanks [~ajithshetty] and [~chris.douglas] for the comments. imo including additional exceptions only distracts from the suppressed root cause by making exception chains longer. Committing as is. > Hadoop client displays confusing error message > -- > > Key: MAPREDUCE-6240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.7.0 >Reporter: Mohammad Kamrul Islam >Assignee: Gera Shegalov > Attachments: MAPREDUCE-6240-gera.001.patch, > MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, > MAPREDUCE-6240.003.patch, MAPREDUCE-6240.004.patch, MAPREDUCE-6240.1.patch > > > Hadoop client often throws exception with "java.io.IOException: Cannot > initialize Cluster. Please check your configuration for > mapreduce.framework.name and the correspond server addresses". > This is a misleading and generic message for any cluster initialization > problem. It takes a lot of debugging hours to identify the root cause. The > correct error message could resolve this problem quickly. > In one such instance, Oozie log showed the following exception while the > root cause was CNF that Hadoop client didn't return in the exception. > {noformat} > JA009: Cannot initialize Cluster. Please check your configuration for > mapreduce.framework.name and the correspond server addresses. 
> at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:281) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Cannot initialize Cluster. Please check your > configuration for mapreduce.framework.name and the correspond server > addresses. 
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) > at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82) > at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75) > at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) > at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449) > at > org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) > at > org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at > org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) > ... 10 more > {noformat}
[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321189#comment-15321189 ] Daniel Templeton commented on MAPREDUCE-6690: - Please suffer me a dumb question: assuming that YARN-5192 is implemented, why do we also need this JIRA? Doesn't having two settings to do the same thing from different ends make the system needlessly confusing?
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320699#comment-15320699 ] Chris Douglas commented on MAPREDUCE-6240: -- +1 lgtm bq. would creating the IOException inside the catch block be better? The suppressed exceptions are the interesting part. The code is easier to read as-is (IMO), but either way is fine.
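The "suppressed exceptions" being discussed follow the standard Java 7+ pattern of attaching each provider's failure to the summary IOException via Throwable.addSuppressed, so the real root cause (e.g. a ClassNotFoundException) survives in the reported stack trace. A minimal sketch of that pattern, with hypothetical names rather than the actual Cluster.initialize code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: each provider's failure is attached as a
// suppressed exception on the summary IOException, so the root cause
// survives. Names are hypothetical, not the actual Cluster internals.
class ClusterInitSketch {
    interface Provider { Object create() throws Exception; }

    static Object initialize(List<Provider> providers) throws IOException {
        List<Exception> failures = new ArrayList<>();
        for (Provider p : providers) {
            try {
                Object client = p.create();
                if (client != null) {
                    return client; // first provider that works wins
                }
            } catch (Exception e) {
                failures.add(e); // remember why this provider was skipped
            }
        }
        IOException summary = new IOException(
            "Cannot initialize Cluster. Please check your configuration for "
            + "mapreduce.framework.name and the correspond server addresses.");
        for (Exception f : failures) {
            summary.addSuppressed(f); // the interesting part: keep every root cause
        }
        throw summary;
    }
}
```

Because the suppressed exceptions are attached where the summary is thrown, creating the IOException inside the catch block would add nothing: the informative frames are in the suppressed traces, not in where the summary object was constructed.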
[jira] [Commented] (MAPREDUCE-6712) Support grouping values for reducer on java-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320692#comment-15320692 ] Daniel Templeton commented on MAPREDUCE-6712: - Hadoop Streaming is limited by the fact that all intermediate data are passed as strings. In most cases the cost of translating those strings back into the intended data types makes Hadoop Streaming so much slower than Java MapReduce that tuning the Hadoop Streaming implementation won't make a significant dent. Turning strings into numbers is expensive. Using interpreted languages is expensive. If you want better performance you should consider Java MapReduce, or better yet, Spark, e.g. pyspark. > Support grouping values for reducer on java-side > > > Key: MAPREDUCE-6712 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6712 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: He Tianyi >Priority: Minor > > In hadoop streaming, with TextInputWriter, the reducer program receives each > line representing a (k, v) tuple from {{stdin}}, in which values with > identical keys are not grouped. > This brings some inefficiency, especially for runtimes based on an interpreter > (e.g. cpython), coming from: > A. the user program has to compare each key with the previous one (but on the java side, > records already come to the reducer in groups), > B. the user program has to perform {{read}}, then {{find}} or {{split}} on each > record, even if there are multiple values with an identical key, > C. if the key is long, this apparently introduces caching inefficiency. > Suppose we need another InputWriter. But this is not enough, since the > interface of {{InputWriter}} defines {{writeKey}} and {{writeValue}}, not > {{writeValues}}. We could compare keys in a custom InputWriter and group > them, but this is also inefficient. Some other changes are also needed.
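Points A and B above can be made concrete: a streaming reducer must re-split every line and compare each key against the previous one, work the Java side has already done by the time it hands a Java reducer a grouped Iterable of values. A small sketch of that per-record bookkeeping (illustrative only, counting values per key from sorted tab-separated lines):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the per-record work a streaming reducer repeats for every
// line: split off the key, compare it with the previous key, and close
// out the previous group when the key changes. On the Java side this
// grouping has already happened before reduce() is called.
class StreamingGroupingSketch {
    static Map<String, Integer> countPerKey(List<String> sortedLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String prevKey = null;
        int count = 0;
        for (String line : sortedLines) {
            String key = line.split("\t", 2)[0]; // split on every record (point B)
            if (!key.equals(prevKey)) {          // compare on every record (point A)
                if (prevKey != null) {
                    counts.put(prevKey, count);  // close out the previous group
                }
                prevKey = key;
                count = 0;
            }
            count++;
        }
        if (prevKey != null) {
            counts.put(prevKey, count);          // close out the final group
        }
        return counts;
    }
}
```

In an interpreted runtime every one of these splits and comparisons happens in user code per record, which is exactly the overhead the proposed grouped InputWriter would avoid.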
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320667#comment-15320667 ] Ajith S commented on MAPREDUCE-6240: Hi, thanks for the patch. Just a small concern: would creating the IOException inside the catch block be better? The stack trace would then indicate the line where the exception object is created.
[jira] [Commented] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320652#comment-15320652 ] Daniel Templeton commented on MAPREDUCE-6542: - Thanks for the updated patch, [~piaoyu zhang]! Looks like I wasn't clear in my comment about {{dateFormat}}, and I apologize for that. I meant to consider making it {{DATE_FORMAT}}, not {{fastDateFormat}}. :) Looking at it again, I'm OK with either {{dateFormat}} or {{DATE_FORMAT}}. My issue with {{fastDateFormat}} is this line: {code} this.fastDateFormat = FastDateFormat. getInstance("d-MMM-yyyy HH:mm:ss", tz); {code} which would be better formatted as: {code} this.fastDateFormat = FastDateFormat.getInstance("d-MMM-yyyy HH:mm:ss", tz); {code} If you're going to cut a new patch to fix that formatting, it's probably better instead to go back to {{dateFormat}} or switch to {{DATE_FORMAT}}. Sorry for the confusion. > HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe > - > > Key: MAPREDUCE-6542 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 2.2.0, 2.7.1 > Environment: CentOS6.5 Hadoop >Reporter: zhangyubiao >Assignee: zhangyubiao > Attachments: MAPREDUCE-6542-v10.patch, MAPREDUCE-6542-v11.patch, > MAPREDUCE-6542-v12.patch, MAPREDUCE-6542-v13.patch, MAPREDUCE-6542-v2.patch, > MAPREDUCE-6542-v3.patch, MAPREDUCE-6542-v4.patch, MAPREDUCE-6542-v5.patch, > MAPREDUCE-6542-v6.patch, MAPREDUCE-6542-v7.patch, MAPREDUCE-6542-v8.patch, > MAPREDUCE-6542-v9.patch, MAPREDUCE-6542.patch > > > I used SimpleDateFormat to parse the JobHistory file before: > {code} > private static final SimpleDateFormat dateFormat = > new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); > public static String getJobDetail(JobInfo job) { > StringBuffer jobDetails = new StringBuffer(""); > SummarizedJob ts = new SummarizedJob(job); > jobDetails.append(job.getJobId().toString().trim()).append("\t"); > jobDetails.append(job.getUsername()).append("\t"); > jobDetails.append(job.getJobname().replaceAll("\\n", > "")).append("\t"); > jobDetails.append(job.getJobQueueName()).append("\t"); > jobDetails.append(job.getPriority()).append("\t"); > jobDetails.append(job.getJobConfPath()).append("\t"); > jobDetails.append(job.getUberized()).append("\t"); > > jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t"); > > jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t"); > > jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t"); >return jobDetails.toString(); > } > {code} > But when I queried the SubmitTime and LaunchTime in Hive and compared them against the JobHistory file times, I found that the submitTime and launchTime were wrong. > Finally, I changed to FastDateFormat to handle the time format and the times became right
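The underlying hazard in this report is that SimpleDateFormat carries mutable state, so a shared static instance produces corrupted timestamps under concurrent use, while an immutable formatter is safe to share. The patch uses Commons Lang's FastDateFormat; as a JDK-only illustration of the same shape (DateTimeFormatter here is a stand-in, not what the patch uses), formatting an epoch-millis job timestamp might look like:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// SimpleDateFormat keeps mutable calendar state, so a shared static
// instance is unsafe when getJobDetail runs concurrently. The JDK's
// DateTimeFormatter, like Commons Lang's FastDateFormat, is immutable
// and therefore safe to keep as a shared constant. (Illustrative
// sketch; the UTC zone choice is an assumption for determinism.)
class JobTimeFormatter {
    private static final DateTimeFormatter DATE_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
            .withZone(ZoneId.of("UTC"));

    static String format(long epochMillis) {
        return DATE_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Any number of threads can call format concurrently with no synchronization, which is exactly the property the shared SimpleDateFormat lacked.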
[jira] [Commented] (MAPREDUCE-6666) Support MultiThreads in a Map and Distribution of files in NNBench
[ https://issues.apache.org/jira/browse/MAPREDUCE-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320569#comment-15320569 ] Brahma Reddy Battula commented on MAPREDUCE-6666: - bq. Failed junit tests hadoop.mapred.TestMRCJCFileOutputCommitter It's unrelated to this jira and is tracked by MAPREDUCE-6682. checkstyle: the indentations are in line with the existing NNBench help message; if we want to fix them, we would need to fix the full help message. > Support MultiThreads in a Map and Distribution of files in NNBench > -- > > Key: MAPREDUCE-6666 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6666 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: MAPREDUCE-6666-01.patch, MAPREDUCE-6666-02.patch, > MAPREDUCE-6666-03.patch, MAPREDUCE-6666-04.patch > > > Support Distribution of files to multiple directories generated by NNBench.
[jira] [Commented] (MAPREDUCE-6666) Support MultiThreads in a Map and Distribution of files in NNBench
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320468#comment-15320468 ] Hadoop QA commented on MAPREDUCE-: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | 
{color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: The patch generated 6 new + 123 unchanged - 22 fixed = 129 total (was 145) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 112m 38s {color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 123m 42s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapred.TestMRCJCFileOutputCommitter | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:2c91fd8 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12808895/MAPREDUCE--04.patch | | JIRA Issue | MAPREDUCE- | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 576e7c34cbcc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 723432b | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6542/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt | | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6542/artifact/patchprocess/whitespace-eol.txt | | unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6542/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt | | unit test logs |
[jira] [Commented] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320354#comment-15320354 ] Hadoop QA commented on MAPREDUCE-6542: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 46s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s {color} | {color:green} root: The patch generated 0 new + 122 unchanged - 3 fixed = 122 total (was 125) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 52s {color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 55s {color} | {color:red} hadoop-mapreduce-client-core in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 55m 37s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapreduce.tools.TestCLI | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:2c91fd8 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12808886/MAPREDUCE-6542-v13.patch | | JIRA Issue | MAPREDUCE-6542 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 78674aa23dba 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 723432b | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6541/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt | | unit test logs | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6541/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt | | Test Results |
[jira] [Updated] (MAPREDUCE-6666) Support MultiThreads in a Map and Distribution of files in NNBench
[ https://issues.apache.org/jira/browse/MAPREDUCE-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated MAPREDUCE-6666: Attachment: MAPREDUCE-6666-04.patch > Support MultiThreads in a Map and Distribution of files in NNBench > -- > > Key: MAPREDUCE-6666 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6666 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: MAPREDUCE-6666-01.patch, MAPREDUCE-6666-02.patch, MAPREDUCE-6666-03.patch, MAPREDUCE-6666-04.patch > > > Support distribution of the files generated by NNBench to multiple directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6542) HistoryViewer uses SimpleDateFormat, but SimpleDateFormat is not thread-safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320321#comment-15320321 ] zhangyubiao commented on MAPREDUCE-6542: MAPREDUCE-6542-v13.patch for review > HistoryViewer uses SimpleDateFormat, but SimpleDateFormat is not thread-safe > - > > Key: MAPREDUCE-6542 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 2.2.0, 2.7.1 > Environment: CentOS6.5 Hadoop >Reporter: zhangyubiao >Assignee: zhangyubiao > Attachments: MAPREDUCE-6542-v10.patch, MAPREDUCE-6542-v11.patch, MAPREDUCE-6542-v12.patch, MAPREDUCE-6542-v13.patch, MAPREDUCE-6542-v2.patch, MAPREDUCE-6542-v3.patch, MAPREDUCE-6542-v4.patch, MAPREDUCE-6542-v5.patch, MAPREDUCE-6542-v6.patch, MAPREDUCE-6542-v7.patch, MAPREDUCE-6542-v8.patch, MAPREDUCE-6542-v9.patch, MAPREDUCE-6542.patch > > > I used SimpleDateFormat to format the JobHistory file times before:
> {code}
> private static final SimpleDateFormat dateFormat =
>     new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>
> public static String getJobDetail(JobInfo job) {
>   StringBuffer jobDetails = new StringBuffer("");
>   SummarizedJob ts = new SummarizedJob(job);
>   jobDetails.append(job.getJobId().toString().trim()).append("\t");
>   jobDetails.append(job.getUsername()).append("\t");
>   jobDetails.append(job.getJobname().replaceAll("\\n", "")).append("\t");
>   jobDetails.append(job.getJobQueueName()).append("\t");
>   jobDetails.append(job.getPriority()).append("\t");
>   jobDetails.append(job.getJobConfPath()).append("\t");
>   jobDetails.append(job.getUberized()).append("\t");
>   jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t");
>   return jobDetails.toString();
> }
> {code}
> But when I queried the SubmitTime and LaunchTime in Hive and compared them with the JobHistory file times, I found that the submitTime and launchTime were wrong. Finally, I switched to FastDateFormat to format the times, and the times became correct.
[jira] [Updated] (MAPREDUCE-6542) HistoryViewer uses SimpleDateFormat, but SimpleDateFormat is not thread-safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangyubiao updated MAPREDUCE-6542: --- Attachment: MAPREDUCE-6542-v13.patch > HistoryViewer uses SimpleDateFormat, but SimpleDateFormat is not thread-safe > - > > Key: MAPREDUCE-6542 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 2.2.0, 2.7.1 > Environment: CentOS6.5 Hadoop >Reporter: zhangyubiao >Assignee: zhangyubiao > Attachments: MAPREDUCE-6542-v10.patch, MAPREDUCE-6542-v11.patch, MAPREDUCE-6542-v12.patch, MAPREDUCE-6542-v13.patch, MAPREDUCE-6542-v2.patch, MAPREDUCE-6542-v3.patch, MAPREDUCE-6542-v4.patch, MAPREDUCE-6542-v5.patch, MAPREDUCE-6542-v6.patch, MAPREDUCE-6542-v7.patch, MAPREDUCE-6542-v8.patch, MAPREDUCE-6542-v9.patch, MAPREDUCE-6542.patch > > > I used SimpleDateFormat to format the JobHistory file times before:
> {code}
> private static final SimpleDateFormat dateFormat =
>     new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>
> public static String getJobDetail(JobInfo job) {
>   StringBuffer jobDetails = new StringBuffer("");
>   SummarizedJob ts = new SummarizedJob(job);
>   jobDetails.append(job.getJobId().toString().trim()).append("\t");
>   jobDetails.append(job.getUsername()).append("\t");
>   jobDetails.append(job.getJobname().replaceAll("\\n", "")).append("\t");
>   jobDetails.append(job.getJobQueueName()).append("\t");
>   jobDetails.append(job.getPriority()).append("\t");
>   jobDetails.append(job.getJobConfPath()).append("\t");
>   jobDetails.append(job.getUberized()).append("\t");
>   jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t");
>   return jobDetails.toString();
> }
> {code}
> But when I queried the SubmitTime and LaunchTime in Hive and compared them with the JobHistory file times, I found that the submitTime and launchTime were wrong. Finally, I switched to FastDateFormat to format the times, and the times became correct.
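The root cause above is that SimpleDateFormat keeps mutable internal state, so a shared static instance silently corrupts output under concurrent formatting. The patch switches to FastDateFormat; on Java 8+, the JDK's own java.time.format.DateTimeFormatter is an equally immutable, thread-safe alternative. A minimal sketch under that assumption (the class name JobTimes and the UTC zone choice are illustrative, not from the patch):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class JobTimes {
    // DateTimeFormatter is immutable and thread-safe, so a single shared
    // instance is safe across concurrent HistoryViewer callers, unlike
    // a static SimpleDateFormat.
    private static final DateTimeFormatter DATE_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                         .withZone(ZoneId.of("UTC"));

    // Job times in MR are epoch milliseconds (e.g. JobInfo#getSubmitTime()).
    static String format(long epochMillis) {
        return DATE_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(0L)); // prints 1970-01-01 00:00:00
    }
}
```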
[jira] [Commented] (MAPREDUCE-6711) JobImpl fails to handle preemption events on state COMMITTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320129#comment-15320129 ] Prabhu Joseph commented on MAPREDUCE-6711: -- [~gtCarrera9] Hi Li, I have a patch to fix this. Can you assign this JIRA to me? > JobImpl fails to handle preemption events on state COMMITTING > - > > Key: MAPREDUCE-6711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6711 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Li Lu > > When an MR app was preempted in the COMMITTING state, we saw the following exceptions in its log:
> {code}
> ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at COMMITTING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
> and
> {code}
> ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_MAP_TASK_RESCHEDULED at COMMITTING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
> It seems we need to handle these preemption-related events while the job is being committed.
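The usual remedy for this class of error is to register the racing events as explicit no-op ("ignore") transitions for the COMMITTING state, so they no longer fall through to InvalidStateTransitonException. A self-contained sketch of that pattern, in plain Java rather than Hadoop's actual StateMachineFactory API (states and events are trimmed to those in the log; JOB_COMMIT_COMPLETED stands in for whatever event actually ends the commit):

```java
import java.util.EnumSet;

public class CommitStateSketch {
    enum JobState { RUNNING, COMMITTING, SUCCEEDED }
    enum JobEvent { JOB_TASK_ATTEMPT_COMPLETED, JOB_MAP_TASK_RESCHEDULED, JOB_COMMIT_COMPLETED }

    // Events that can legitimately race with the commit when containers are
    // preempted; swallowing them keeps the dispatcher from logging an
    // invalid-transition error.
    private static final EnumSet<JobEvent> IGNORED_WHILE_COMMITTING =
        EnumSet.of(JobEvent.JOB_TASK_ATTEMPT_COMPLETED,
                   JobEvent.JOB_MAP_TASK_RESCHEDULED);

    static JobState handle(JobState state, JobEvent event) {
        if (state == JobState.COMMITTING) {
            if (event == JobEvent.JOB_COMMIT_COMPLETED) {
                return JobState.SUCCEEDED;           // normal commit completion
            }
            if (IGNORED_WHILE_COMMITTING.contains(event)) {
                return state;                        // no-op: stay in COMMITTING
            }
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + state);
        }
        return state; // transitions for other states elided in this sketch
    }
}
```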
[jira] [Created] (MAPREDUCE-6712) Support grouping values for reducer on java-side
He Tianyi created MAPREDUCE-6712: Summary: Support grouping values for reducer on java-side Key: MAPREDUCE-6712 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6712 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/streaming Reporter: He Tianyi Priority: Minor In Hadoop Streaming, with TextInputWriter, the reducer program receives from {{stdin}} one line per (k, v) tuple, and values with an identical key are not grouped. This causes some inefficiency, especially for interpreter-based runtimes (e.g. CPython), because: A. the user program has to compare each key with the previous one (while on the Java side, records already arrive at the reducer in groups); B. the user program has to perform {{read}}, then {{find}} or {{split}}, on every record, even when there are multiple values with the same key; C. if the key is long, this also hurts caching. So we would need another InputWriter. But that alone is not enough, since the {{InputWriter}} interface defines {{writeKey}} and {{writeValue}}, not {{writeValues}}. We could compare keys in a custom InputWriter and group them there, but that is also inefficient, so some other changes are needed as well.
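Since records reaching the reducer are already key-sorted by the shuffle, a writeValues-style writer would only need to buffer consecutive records that share a key and flush them as one line (key, then all values, tab-separated). A hypothetical sketch of that grouping logic (the class and method names are illustrative; this is not the existing InputWriter API):

```java
import java.util.ArrayList;
import java.util.List;

public class GroupedWriterSketch {
    // Groups consecutive {key, value} records that share a key into one
    // line of the form: key \t v1 \t v2 ... Relies on the input being
    // key-sorted, as MR shuffle output is.
    static List<String> group(List<String[]> sorted) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = null;
        String prevKey = null;
        for (String[] kv : sorted) {
            if (!kv[0].equals(prevKey)) {
                if (line != null) {
                    lines.add(line.toString()); // flush the finished group
                }
                line = new StringBuilder(kv[0]);
                prevKey = kv[0];
            }
            line.append('\t').append(kv[1]);
        }
        if (line != null) {
            lines.add(line.toString()); // flush the last group
        }
        return lines;
    }
}
```

With this framing the reducer script reads one line per key and splits it once, instead of comparing keys on every record.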