[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321929#comment-15321929
 ] 

Hadoop QA commented on MAPREDUCE-6690:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s {color} | {color:red} hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 2 new + 2508 unchanged - 1 fixed = 2510 total (was 2509) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 1s {color} | {color:red} hadoop-mapreduce-client-core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 114m 22s {color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 26s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.tools.TestCLI |
|   | hadoop.mapred.TestMRCJCFileOutputCommitter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809100/MAPREDUCE-6690-trunk-v4.patch |
| JIRA Issue | MAPREDUCE-6690 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  xml  |
| uname | Linux f6e7eb5194a4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1500a0a |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| javadoc | 

[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321825#comment-15321825
 ] 

Hadoop QA commented on MAPREDUCE-6690:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 4s {color} | {color:red} Docker failed to build yetus/hadoop:2c91fd8. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809100/MAPREDUCE-6690-trunk-v4.patch |
| JIRA Issue | MAPREDUCE-6690 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6543/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Limit the number of resources a single map reduce job can submit for 
> localization
> -
>
> Key: MAPREDUCE-6690
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6690
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: MAPREDUCE-6690-trunk-v1.patch, 
> MAPREDUCE-6690-trunk-v2.patch, MAPREDUCE-6690-trunk-v3.patch, 
> MAPREDUCE-6690-trunk-v4.patch
>
>
> Users will sometimes submit a large number of resources to be localized as 
> part of a single map reduce job. This can cause issues with YARN localization 
> that destabilize the cluster and potentially impact other users' jobs. These 
> resources are specified via the files, libjars, archives and jobjar command 
> line arguments or directly through the configuration (i.e. the distributed 
> cache api). The resources specified could be too large in multiple dimensions:
> # Total size
> # Number of files
> # Size of an individual resource (e.g. a large fat jar)
> We would like to encourage good behavior on the client side by having the 
> option of enforcing resource limits along the above dimensions.
> There should be a separate effort to enforce limits at the YARN layer on the 
> server side, but this jira only covers the map reduce layer on the 
> client side. In practice, having these client side limits will get us a long 
> way towards preventing these localization anti-patterns.
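
The three dimensions listed in the description can be sketched as a single client-side check. This is only an illustration under stated assumptions, not the code in the attached patches: the class name, the constructor parameters, the convention that a non-positive limit disables a check, and the choice of IllegalArgumentException are all hypothetical.

```java
import java.util.Map;

/** Hypothetical client-side limit check; names and semantics are illustrative only. */
final class LocalizationLimitSketch {
    private final int maxResources;     // assumption: <= 0 disables the count check
    private final long maxTotalBytes;   // assumption: <= 0 disables the total-size check
    private final long maxSingleBytes;  // assumption: <= 0 disables the per-resource check

    LocalizationLimitSketch(int maxResources, long maxTotalBytes, long maxSingleBytes) {
        this.maxResources = maxResources;
        this.maxTotalBytes = maxTotalBytes;
        this.maxSingleBytes = maxSingleBytes;
    }

    /**
     * Validates a map of resource name to size in bytes and throws
     * IllegalArgumentException naming the first limit that is exceeded.
     */
    void check(Map<String, Long> resourceSizes) {
        if (maxResources > 0 && resourceSizes.size() > maxResources) {
            throw new IllegalArgumentException("Too many resources: "
                + resourceSizes.size() + " > " + maxResources);
        }
        long total = 0;
        for (Map.Entry<String, Long> e : resourceSizes.entrySet()) {
            long size = e.getValue();
            if (maxSingleBytes > 0 && size > maxSingleBytes) {
                throw new IllegalArgumentException("Resource too large: "
                    + e.getKey() + " is " + size + " bytes > " + maxSingleBytes);
            }
            total += size;
            if (maxTotalBytes > 0 && total > maxTotalBytes) {
                throw new IllegalArgumentException("Total size too large: "
                    + total + " bytes > " + maxTotalBytes);
            }
        }
    }
}
```

Running such a check before anything is uploaded means an oversized submission is rejected without spending time on the upload.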



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated MAPREDUCE-6690:

Attachment: MAPREDUCE-6690-trunk-v4.patch

V4 attached.
# Fixed checkstyle/javadoc.
# Fixed TestMRJobs failures (test only changes).







[jira] [Commented] (MAPREDUCE-6712) Support grouping values for reducer on java-side

2016-06-08 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321658#comment-15321658
 ] 

He Tianyi commented on MAPREDUCE-6712:
--

Actually, in my experiments (in-house workload), converting strings back and 
forth is not the bottleneck (it makes no difference with typedbytes). But just 
grouping values makes a simple reducer 20% faster (for both text and 
typedbytes). 
Also, many users implement their mapper/reducer in C/C++, which I think can be 
more efficient than java/scala (smaller memory footprint, less gc, no virtual 
call overhead, better SIMD support, etc.). 

> Support grouping values for reducer on java-side
> 
>
> Key: MAPREDUCE-6712
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6712
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: He Tianyi
>Priority: Minor
>
> In hadoop streaming, with TextInputWriter, the reducer program receives each 
> line representing a (k, v) tuple from {{stdin}}, in which values with an 
> identical key are not grouped.
> This brings some inefficiency, especially for interpreter-based runtimes 
> (e.g. cpython), coming from:
> A. the user program has to compare each key with the previous one (but on the 
> java side, records already come to the reducer in groups),
> B. the user program has to perform {{read}}, then {{find}} or {{split}} on 
> each record, even if there are multiple values with an identical key,
> C. if the key is long, this apparently introduces inefficiency for 
> caching.
> I suppose we need another InputWriter. But this is not enough, since the 
> interface of {{InputWriter}} defines {{writeKey}} and {{writeValue}}, not 
> {{writeValues}}. We could compare keys in a custom InputWriter and group 
> them, but this is also inefficient. Some other changes are also needed.
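
To make the idea of grouping on the java side concrete, here is a minimal sketch that collapses consecutive key/value lines into one line per key. It is not the streaming InputWriter API: the class, the method, and the tab-joined output format are hypothetical, and the format would be ambiguous for values that themselves contain tabs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of grouping sorted "key\tvalue" lines by key. */
final class GroupedLinesSketch {
    /**
     * Collapses consecutive lines that share a key into a single line of the
     * form key\tvalue1\tvalue2... Input is assumed to be sorted by key, as
     * records are when they reach a reducer.
     */
    static List<String> group(List<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        StringBuilder row = null;
        for (String line : sortedLines) {
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);
            if (row == null || !key.equals(currentKey)) {
                // Key changed: flush the previous group and start a new one.
                if (row != null) {
                    out.add(row.toString());
                }
                currentKey = key;
                row = new StringBuilder(key);
            }
            row.append('\t').append(value);
        }
        if (row != null) {
            out.add(row.toString());
        }
        return out;
    }
}
```

With output shaped like this, the user program reads each key once and no longer has to compare it against the previous record.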






[jira] [Comment Edited] (MAPREDUCE-6712) Support grouping values for reducer on java-side

2016-06-08 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321658#comment-15321658
 ] 

He Tianyi edited comment on MAPREDUCE-6712 at 6/8/16 11:34 PM:
---

Actually, in my experiments (in-house workload), converting strings back and 
forth is not the bottleneck (it makes no difference with typedbytes). But just 
grouping values makes a simple reducer 20% faster (for both text and 
typedbytes). 
Also, many users implement their mapper/reducer in C/C++, which I think can be 
more efficient than java/scala (smaller memory footprint, less gc, better SIMD 
support, etc.). 


was (Author: he tianyi):
Actually, in my experiments (in-house workload), converting strings back and 
forth is not the bottleneck (it makes no difference with typedbytes). But just 
grouping values makes a simple reducer 20% faster (for both text and 
typedbytes). 
Also, many users implement their mapper/reducer in C/C++, which I think can be 
more efficient than java/scala (smaller memory footprint, less gc, no virtual 
call overhead, better SIMD support, etc.). 







[jira] [Updated] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-6240:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks to [~kamrul] for the initial patch and 
[~chris.douglas] and [~ajithshetty] for reviews.

> Hadoop client displays confusing error message
> --
>
> Key: MAPREDUCE-6240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Gera Shegalov
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6240-gera.001.patch, 
> MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
> MAPREDUCE-6240.003.patch, MAPREDUCE-6240.004.patch, MAPREDUCE-6240.1.patch
>
>
> The Hadoop client often throws an exception with "java.io.IOException: Cannot 
> initialize Cluster. Please check your configuration for 
> mapreduce.framework.name and the correspond server addresses".
> This is a misleading and generic message for any cluster initialization 
> problem. It takes many hours of debugging to identify the root cause. A 
> correct error message could resolve this problem quickly.
> In one such instance, the Oozie log showed the following exception, while the 
> root cause was a CNF that the Hadoop client didn't return in the exception.
> {noformat}
> JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>     at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
>     at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:281)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>     at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
>     at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
>     at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
>     at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
>     ... 10 more
> {noformat}






[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321394#comment-15321394
 ] 

Hudson commented on MAPREDUCE-6240:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #9932 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9932/])
MAPREDUCE-6240. Hadoop client displays confusing error message. (gera) (gera: 
rev 0af96a1c08594c809ecb254cee4f60dd22399772)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Cluster.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestClientProtocolProviderImpls.java








[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321375#comment-15321375
 ] 

Gera Shegalov commented on MAPREDUCE-6240:
--

Actually sorry I misunderstood the comment by Ajith. 

It reminds me that I left extra wrapping for suppressed exceptions as an 
artifact of using MultiIOExceptions in prior patches. I would normally get rid 
of them, but I find adding the provider class name to the message actually 
quite useful.
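
For readers unfamiliar with the pattern, wrapping each provider failure and attaching it as a suppressed exception could look roughly like the sketch below. This is not the committed change to Cluster.java: the Provider interface, the String stand-in for a client, and the message texts are hypothetical; only Throwable.addSuppressed and getSuppressed are standard Java.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical illustration of reporting per-provider failures as suppressed exceptions. */
final class ProviderInitSketch {
    /** Stand-in for a provider that may fail to create a client (here just a String). */
    interface Provider {
        String create() throws Exception;
    }

    /**
     * Tries each provider in order and returns the first non-null client.
     * If none succeeds, throws one IOException that carries every individual
     * failure, wrapped with the provider class name, as a suppressed exception.
     */
    static String initialize(List<Provider> providers) throws IOException {
        IOException failure = new IOException("Cannot initialize Cluster.");
        for (Provider p : providers) {
            try {
                String client = p.create();
                if (client != null) {
                    return client;
                }
            } catch (Exception e) {
                // The wrapper adds the provider class name; the root cause stays attached.
                failure.addSuppressed(new IOException(
                    "Failed to use " + p.getClass().getName() + " due to error: ", e));
            }
        }
        throw failure;
    }
}
```

A caller that logs the exception then sees each provider's root cause, such as a missing class, instead of only the generic message.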







[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321330#comment-15321330
 ] 

Jason Lowe commented on MAPREDUCE-6690:
---

bq. I assumed that YARN-5192 would implement the check as part of the submit 
call so that the client gets immediate feedback.

Note that YARN-5192 cannot do the check on application submit.  An application 
submit only requires the resources necessary to get the ApplicationMaster 
localized.  Subsequent containers for the application could have a completely 
different set of resources, and they won't be available in the application 
submission context for validation at submit time.  MapReduce is an app 
framework that happens to localize all resources for all containers, but other 
application frameworks do not always do this.

bq.  I would like to find a way, however, to try to keep the two settings in 
sync if possible.

Agreed it would be annoying for admins to have to keep these in sync, assuming 
nobody would ever want to configure the YARN limit higher than the MapReduce 
limit.

bq. What about having the RM offer up its resource limits through a call?

That would be one way to tackle it.  There have been cases in the past where it 
would have been nice for clients to be able to query config settings via the 
central daemons (i.e.: namenode, resourcemanager, etc.) rather than assume the 
local settings in hdfs-site.xml or yarn-site.xml are the same as what the 
central daemon is using.  That's a somewhat open-ended API change for YARN with 
backwards-compatibility concerns going forward, but maybe it's time we hammered 
out whether or not we're going to do it on a YARN JIRA and if not, what 
clients/users are supposed to do to better keep the client and the server in 
sync.









[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321316#comment-15321316
 ] 

Daniel Templeton commented on MAPREDUCE-6690:
-

Thanks for the clarification, [~jlowe].  I assumed that YARN-5192 would 
implement the check as part of the submit call so that the client gets 
immediate feedback.  The point that I forgot about, though, is that, regardless, 
the submit only happens after the resources have been uploaded to HDFS.  Given 
that this check specifically targets wide loads, the cases where the 
server-side check would reject the submit are exactly the ones that will waste 
the most time with the upload.

I now see the light.  I would like to find a way, however, to try to keep the 
two settings in sync if possible.  I've seen cases, such as the number of 
concurrent moves in the HDFS mover, where the limit is set on both the client 
and server sides, and it ends up confusing customers.  What about having the RM 
offer up its resource limits through a call?  The client could then query the 
RM's limits and apply those.







[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321305#comment-15321305
 ] 

Jason Lowe commented on MAPREDUCE-6690:
---

Implementing the check in MapReduce allows for fast-failure and more 
accurate/informative errors to the client.  The check in MapReduce can prevent 
an unnecessary upload of one or more resources to the staging area in HDFS 
because the client knows the job is going to fail anyway.  Also YARN-5192 will 
only be able to detect the error when a container starts to localize on a node 
that asks for a resource set that violates the limits.  Since MapReduce 
localizes everything for all containers (including the AM) it will fail under 
YARN-5192 as soon as the AM tries to run on a node, but it might take a while 
for the AM to get scheduled.  As for error reporting, if the violation comes 
from one or more files that were submitted locally then the paths via a 
YARN-5192 check will be for HDFS staging directories rather than the local path 
the client originally specified.  The error also will not be reported to the 
job client submitting the job unless it hangs around to monitor the job after 
submission.  With this check the job client will get the error directly when it 
tries to submit.

If we don't care much about these differences then we can just go with the 
YARN-5192 implementation.
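
The fast-failure argument above can be sketched as a check that runs on the client before anything is uploaded to the staging area. This is only an illustration of the idea, not the code in the attached patches: the class name `ResourceLimits`, its fields, and the use of a path-to-size map are all assumptions made for the example.

```java
import java.util.Map;

/** Hypothetical client-side limits; these names are not taken from the patch. */
final class ResourceLimits {
    private final int maxResources;     // cap on the number of resources
    private final long maxTotalBytes;   // cap on their combined size
    private final long maxSingleBytes;  // cap on any single resource

    ResourceLimits(int maxResources, long maxTotalBytes, long maxSingleBytes) {
        this.maxResources = maxResources;
        this.maxTotalBytes = maxTotalBytes;
        this.maxSingleBytes = maxSingleBytes;
    }

    /**
     * Checks resources (local path -> size in bytes) before anything is
     * uploaded, and reports the path exactly as the user specified it.
     */
    void check(Map<String, Long> resources) {
        if (resources.size() > maxResources) {
            throw new IllegalArgumentException("Too many resources: "
                + resources.size() + " (limit " + maxResources + ")");
        }
        long total = 0;
        for (Map.Entry<String, Long> entry : resources.entrySet()) {
            long size = entry.getValue();
            if (size > maxSingleBytes) {
                throw new IllegalArgumentException("Resource too large: "
                    + entry.getKey() + " is " + size + " bytes (limit "
                    + maxSingleBytes + ")");
            }
            total += size;
        }
        if (total > maxTotalBytes) {
            throw new IllegalArgumentException("Resources too large in total: "
                + total + " bytes (limit " + maxTotalBytes + ")");
        }
    }
}
```

Because the check sees the paths the user typed, the error message can name them directly, which is the reporting difference described above relative to a check that only sees HDFS staging paths.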







[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321191#comment-15321191
 ] 

Gera Shegalov commented on MAPREDUCE-6240:
--

Thanks [~ajithshetty] and [~chris.douglas] for the comments.
imo including additional exceptions only distracts from the suppressed root 
cause by making exception chains longer. Committing as is. 

> Hadoop client displays confusing error message
> --
>
> Key: MAPREDUCE-6240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-6240-gera.001.patch, 
> MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
> MAPREDUCE-6240.003.patch, MAPREDUCE-6240.004.patch, MAPREDUCE-6240.1.patch
>
>
> The Hadoop client often throws an exception with "java.io.IOException: Cannot 
> initialize Cluster. Please check your configuration for 
> mapreduce.framework.name and the correspond server addresses".
> This message is generic and misleading: it is used for any cluster 
> initialization problem, so identifying the root cause can take many hours of 
> debugging. A more specific error message would resolve such problems quickly.
> In one such instance, the Oozie log showed the following exception, while the 
> root cause was a ClassNotFoundException (CNF) that the Hadoop client did not 
> include in the exception.
> {noformat}
>  JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>     at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
>     at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:281)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>     at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
>     at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
>     at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
>     at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
>     ... 10 more
> {noformat}






[jira] [Commented] (MAPREDUCE-6690) Limit the number of resources a single map reduce job can submit for localization

2016-06-08 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321189#comment-15321189
 ] 

Daniel Templeton commented on MAPREDUCE-6690:
-

Please suffer me a dumb question: assuming that YARN-5192 is implemented, why 
do we also need this JIRA?  Doesn't having two settings to do the same thing 
from different ends make the system needlessly confusing?







[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320699#comment-15320699
 ] 

Chris Douglas commented on MAPREDUCE-6240:
--

+1 lgtm

bq. would creating IOException inside catch block will be better?
The suppressed exceptions are the interesting part. The code is easier to read 
as-is (IMO), but either way is fine.
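
For readers unfamiliar with suppressed exceptions, here is a minimal sketch of the pattern under discussion, using the standard `Throwable.addSuppressed` call. It is not the patch itself: `SuppressedDemo`, `firstWorking`, and the message text are made up for illustration. The `IOException` is created once before the loop (rather than inside the catch block), and each individual failure is attached to it.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

final class SuppressedDemo {
    /**
     * Tries each candidate in turn and returns the first result. If every
     * candidate fails, throws a single IOException that carries each
     * individual failure as a suppressed exception.
     */
    static <T> T firstWorking(List<Callable<T>> candidates) throws IOException {
        // Created up front, so its stack trace points here, not at a catch block.
        IOException failure =
            new IOException("Cannot initialize: no candidate succeeded");
        for (Callable<T> candidate : candidates) {
            try {
                return candidate.call();
            } catch (Exception e) {
                // Keep the real cause visible instead of dropping it.
                failure.addSuppressed(e);
            }
        }
        throw failure;
    }
}
```

A caller that prints the stack trace of the resulting exception sees each attached failure listed under a "Suppressed:" heading, so a root cause such as a `ClassNotFoundException` is no longer hidden behind the generic message.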







[jira] [Commented] (MAPREDUCE-6712) Support grouping values for reducer on java-side

2016-06-08 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320692#comment-15320692
 ] 

Daniel Templeton commented on MAPREDUCE-6712:
-

Hadoop Streaming is limited by the fact that all intermediate data are passed 
as strings.  In most cases the cost of translating those strings back into the 
intended data types makes Hadoop Streaming so much slower than Java MapReduce 
that tuning the Hadoop Streaming implementation won't make a significant dent.  
Turning strings into numbers is expensive.  Using interpreted languages is 
expensive.  If you want better performance you should consider Java MapReduce, 
or better yet, Spark, e.g. pyspark. 

> Support grouping values for reducer on java-side
> 
>
> Key: MAPREDUCE-6712
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6712
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: He Tianyi
>Priority: Minor
>
> In Hadoop Streaming with TextInputWriter, the reducer program reads one line 
> per (k, v) tuple from {{stdin}}; values that share a key are not grouped.
> This is inefficient, especially for interpreter-based runtimes (e.g. 
> CPython), because:
> A. the user program has to compare each key with the previous one (even 
> though, on the Java side, records already reach the reducer in groups),
> B. the user program has to perform a {{read}}, then a {{find}} or {{split}}, 
> on every record, even when many values share the same key,
> C. if keys are long, repeating them on every line hurts caching.
> We would presumably need another InputWriter. But that is not enough, because 
> the {{InputWriter}} interface defines {{writeKey}} and {{writeValue}}, not 
> {{writeValues}}. A custom InputWriter could compare keys and group the values 
> itself, but that is also inefficient, so some other changes are needed as well.
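
To make the request concrete, the sketch below shows what a grouped line could look like if the Java side, which already holds a key together with all of its values, wrote them out once per key. The wire format (key, then each value, tab-separated) and the names `GroupedLineFormat` and `formatGroup` are hypothetical; nothing here is an existing Hadoop Streaming API, and the sketch assumes values contain no tabs or newlines.

```java
import java.util.Arrays;

final class GroupedLineFormat {
    /**
     * Hypothetical wire format: the key followed by every value for that key,
     * tab-separated, on a single line. The key is written only once.
     */
    static String formatGroup(String key, Iterable<String> values) {
        StringBuilder line = new StringBuilder(key);
        for (String value : values) {
            line.append('\t').append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // One line for the whole group instead of three "apple\t<value>" lines.
        System.out.println(formatGroup("apple", Arrays.asList("1", "4", "2")));
    }
}
```

With such a format the reducer program would do one read and one split per key rather than per record, and would not need to compare each key with the previous one.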






[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320667#comment-15320667
 ] 

Ajith S commented on MAPREDUCE-6240:


Hi

Thanks for the patch. Just a small concern: would creating the IOException 
inside the catch block be better, since the stack trace indicates the line where 
the Exception object is created?







[jira] [Commented] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe

2016-06-08 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320652#comment-15320652
 ] 

Daniel Templeton commented on MAPREDUCE-6542:
-

Thanks for the updated patch, [~piaoyu zhang]!  Looks like I wasn't clear in my 
comment about {{dateFormat}}, and I apologize for that.  I meant to consider 
making it {{DATE_FORMAT}}, not {{fastDateFormat}}. :)  Looking at it again, I'm 
OK with either {{dateFormat}} or {{DATE_FORMAT}}.  My issue with 
{{fastDateFormat}} is this line:

{code}
this.fastDateFormat = FastDateFormat.
    getInstance("d-MMM-yyyy HH:mm:ss", tz);
{code}

which would be better formatted as:

{code}
this.fastDateFormat =
    FastDateFormat.getInstance("d-MMM-yyyy HH:mm:ss", tz);
{code}

If you're going to cut a new patch to fix that formatting, it's probably better 
instead to go back to {{dateFormat}} or switch to {{DATE_FORMAT}}.  Sorry for 
the confusion.

> HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
> -
>
> Key: MAPREDUCE-6542
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.2.0, 2.7.1
> Environment: CentOS6.5 Hadoop  
>Reporter: zhangyubiao
>Assignee: zhangyubiao
> Attachments: MAPREDUCE-6542-v10.patch, MAPREDUCE-6542-v11.patch, 
> MAPREDUCE-6542-v12.patch, MAPREDUCE-6542-v13.patch, MAPREDUCE-6542-v2.patch, 
> MAPREDUCE-6542-v3.patch, MAPREDUCE-6542-v4.patch, MAPREDUCE-6542-v5.patch, 
> MAPREDUCE-6542-v6.patch, MAPREDUCE-6542-v7.patch, MAPREDUCE-6542-v8.patch, 
> MAPREDUCE-6542-v9.patch, MAPREDUCE-6542.patch
>
>
> Previously I used SimpleDateFormat when parsing the JobHistory file:
> {code}
> // Shared by all callers, but SimpleDateFormat is not thread-safe.
> private static final SimpleDateFormat dateFormat =
>     new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>
> public static String getJobDetail(JobInfo job) {
>   StringBuffer jobDetails = new StringBuffer("");
>   SummarizedJob ts = new SummarizedJob(job);
>   jobDetails.append(job.getJobId().toString().trim()).append("\t");
>   jobDetails.append(job.getUsername()).append("\t");
>   jobDetails.append(job.getJobname().replaceAll("\\n", "")).append("\t");
>   jobDetails.append(job.getJobQueueName()).append("\t");
>   jobDetails.append(job.getPriority()).append("\t");
>   jobDetails.append(job.getJobConfPath()).append("\t");
>   jobDetails.append(job.getUberized()).append("\t");
>   jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t");
>   return jobDetails.toString();
> }
> {code}
> But when I queried the submit time and launch time in Hive and compared them 
> with the times in the JobHistory file, I found that submitTime and launchTime 
> were wrong.
> Finally, I changed to FastDateFormat to format the times, and the times 
> became correct.
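
As a self-contained illustration of the same fix using only the JDK, the sketch below swaps in `java.time.format.DateTimeFormatter`, which is documented as immutable and thread-safe, in place of the commons-lang `FastDateFormat` used in the patch. The class name `JobTimeFormat`, the method `format`, and the pattern string are assumptions made for the example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class JobTimeFormat {
    // Immutable and thread-safe, so one shared static instance is fine,
    // unlike a shared SimpleDateFormat.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats a timestamp in epoch milliseconds for the given time zone. */
    static String format(long epochMillis, ZoneId zone) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }
}
```

Passing the zone explicitly keeps the output independent of the JVM default time zone, which also makes the formatting easier to test.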






[jira] [Commented] (MAPREDUCE-6666) Support MultiThreads in a Map and Distribution of files in NNBench

2016-06-08 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320569#comment-15320569
 ] 

Brahma Reddy Battula commented on MAPREDUCE-6666:
-

bq. Failed junit tests   hadoop.mapred.TestMRCJCFileOutputCommitter
This failure is unrelated to this JIRA and is tracked in MAPREDUCE-6682.

Checkstyle: the indentation is in line with the existing NNBench help message. 
If we want to fix it, we may need to fix the full help message.

> Support MultiThreads in a Map and Distribution of files in NNBench
> --
>
> Key: MAPREDUCE-6666
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6666
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: MAPREDUCE-6666-01.patch, MAPREDUCE-6666-02.patch, 
> MAPREDUCE-6666-03.patch, MAPREDUCE-6666-04.patch
>
>
> Support Distribution of files to multiple directories generated by NNBench.






[jira] [Commented] (MAPREDUCE-6666) Support MultiThreads in a Map and Distribution of files in NNBench

2016-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320468#comment-15320468
 ] 

Hadoop QA commented on MAPREDUCE-6666:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:
 The patch generated 6 new + 123 unchanged - 22 fixed = 129 total (was 145) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 112m 38s 
{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 123m 42s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestMRCJCFileOutputCommitter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12808895/MAPREDUCE-6666-04.patch
 |
| JIRA Issue | MAPREDUCE-6666 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 576e7c34cbcc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 723432b |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6542/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6542/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6542/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
 |
| unit test logs |  

[jira] [Commented] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe

2016-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320354#comment-15320354
 ] 

Hadoop QA commented on MAPREDUCE-6542:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 46s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s {color} | {color:green} root: The patch generated 0 new + 122 unchanged - 3 fixed = 122 total (was 125) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 52s {color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 55s {color} | {color:red} hadoop-mapreduce-client-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 37s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.tools.TestCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12808886/MAPREDUCE-6542-v13.patch |
| JIRA Issue | MAPREDUCE-6542 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 78674aa23dba 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 723432b |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6541/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt |
| unit test logs |  https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6541/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt |
|  Test Results | 

[jira] [Updated] (MAPREDUCE-6666) Support MultiThreads in a Map and Distribution of files in NNBench

2016-06-08 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated MAPREDUCE-6666:

Attachment: MAPREDUCE-6666-04.patch

> Support MultiThreads in a Map and Distribution of files in NNBench
> --
>
> Key: MAPREDUCE-6666
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6666
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: MAPREDUCE-6666-01.patch, MAPREDUCE-6666-02.patch, 
> MAPREDUCE-6666-03.patch, MAPREDUCE-6666-04.patch
>
>
> Support Distribution of files to multiple directories generated by NNBench.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe

2016-06-08 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320321#comment-15320321
 ] 

zhangyubiao commented on MAPREDUCE-6542:


MAPREDUCE-6542-v13.patch for review

> HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
> -
>
> Key: MAPREDUCE-6542
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.2.0, 2.7.1
> Environment: CentOS6.5 Hadoop  
>Reporter: zhangyubiao
>Assignee: zhangyubiao
> Attachments: MAPREDUCE-6542-v10.patch, MAPREDUCE-6542-v11.patch, 
> MAPREDUCE-6542-v12.patch, MAPREDUCE-6542-v13.patch, MAPREDUCE-6542-v2.patch, 
> MAPREDUCE-6542-v3.patch, MAPREDUCE-6542-v4.patch, MAPREDUCE-6542-v5.patch, 
> MAPREDUCE-6542-v6.patch, MAPREDUCE-6542-v7.patch, MAPREDUCE-6542-v8.patch, 
> MAPREDUCE-6542-v9.patch, MAPREDUCE-6542.patch
>
>
> I previously used SimpleDateFormat when parsing the JobHistory file:
> {code}
> // Shared static instance; SimpleDateFormat is not thread-safe.
> private static final SimpleDateFormat dateFormat =
>     new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>
> public static String getJobDetail(JobInfo job) {
>   StringBuffer jobDetails = new StringBuffer("");
>   SummarizedJob ts = new SummarizedJob(job);
>   jobDetails.append(job.getJobId().toString().trim()).append("\t");
>   jobDetails.append(job.getUsername()).append("\t");
>   jobDetails.append(job.getJobname().replaceAll("\\n", "")).append("\t");
>   jobDetails.append(job.getJobQueueName()).append("\t");
>   jobDetails.append(job.getPriority()).append("\t");
>   jobDetails.append(job.getJobConfPath()).append("\t");
>   jobDetails.append(job.getUberized()).append("\t");
>   jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t");
>   jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t");
>   return jobDetails.toString();
> }
> {code}
> But when I queried the SubmitTime and LaunchTime in Hive and compared them 
> with the times in the JobHistory file, I found that the submitTime and 
> launchTime were wrong.
> Finally, I switched to FastDateFormat to format the times, and the times 
> became correct.
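
The failure mode described above comes from sharing one mutable SimpleDateFormat across threads. As a minimal sketch of a thread-safe alternative, the example below uses java.time.DateTimeFormatter from the JDK instead of the FastDateFormat mentioned in the report, so that it stays self-contained. The class name JobTimeFormatter, the format helper, and the explicit time-zone parameter are illustrative assumptions and are not taken from any MAPREDUCE-6542 patch.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hadoop: formats job timestamps with a
// formatter that is safe to share between threads.
final class JobTimeFormatter {
    // DateTimeFormatter is immutable and thread-safe, so a single static
    // instance can be used concurrently without synchronization.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private JobTimeFormatter() {
    }

    // Formats a timestamp given in epoch milliseconds for the given zone.
    static String format(long epochMillis, ZoneId zone) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }

    public static void main(String[] args) throws InterruptedException {
        ZoneId utc = ZoneId.of("UTC");
        Runnable task = () -> {
            for (int i = 0; i < 3; i++) {
                // Concurrent calls share FORMAT but hold no mutable state.
                System.out.println(format(i * 86_400_000L, utc));
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

Another common option, if SimpleDateFormat must be kept, is to give each thread its own instance through a ThreadLocal rather than sharing a static one.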






[jira] [Updated] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe

2016-06-08 Thread zhangyubiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangyubiao updated MAPREDUCE-6542:
---
Attachment: MAPREDUCE-6542-v13.patch







[jira] [Commented] (MAPREDUCE-6711) JobImpl fails to handle preemption events on state COMMITTING

2016-06-08 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320129#comment-15320129
 ] 

Prabhu Joseph commented on MAPREDUCE-6711:
--

[~gtCarrera9] Hi Li, I have a patch to fix this. Can you assign this JIRA to me?

> JobImpl fails to handle preemption events on state COMMITTING
> -
>
> Key: MAPREDUCE-6711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>
> When an MR app was preempted in the COMMITTING state, we saw the following 
> exceptions in its log:
> {code}
> ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at COMMITTING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
> and 
> {code}
> ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_MAP_TASK_RESCHEDULED at COMMITTING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
> It seems we need to handle those preemption-related events while the job is 
> being committed.
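
The two exceptions above follow the same pattern: an event arrives in a state that has no transition registered for it. The toy state machine below is only a sketch of that pattern and of one possible remedy (registering self-transitions so late task events are ignored during commit). It is not Hadoop's StateMachineFactory or JobImpl; the class ToyJobStateMachine, the START_COMMIT and COMMIT_DONE events, and the constructor flag are hypothetical names introduced for illustration. Whether ignoring these events is the right fix for MAPREDUCE-6711 is an assumption, not something the report confirms.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model of a table-driven job state machine (illustrative only).
final class ToyJobStateMachine {
    enum State { RUNNING, COMMITTING, SUCCEEDED }

    enum Event {
        JOB_TASK_ATTEMPT_COMPLETED, // event name taken from the log above
        JOB_MAP_TASK_RESCHEDULED,   // event name taken from the log above
        START_COMMIT,               // hypothetical
        COMMIT_DONE                 // hypothetical
    }

    private final Map<State, Map<Event, State>> transitions =
        new EnumMap<State, Map<Event, State>>(State.class);
    private State current = State.RUNNING;

    ToyJobStateMachine(boolean ignoreTaskEventsWhileCommitting) {
        for (State s : State.values()) {
            transitions.put(s, new EnumMap<Event, State>(Event.class));
        }
        add(State.RUNNING, Event.JOB_TASK_ATTEMPT_COMPLETED, State.RUNNING);
        add(State.RUNNING, Event.JOB_MAP_TASK_RESCHEDULED, State.RUNNING);
        add(State.RUNNING, Event.START_COMMIT, State.COMMITTING);
        add(State.COMMITTING, Event.COMMIT_DONE, State.SUCCEEDED);
        if (ignoreTaskEventsWhileCommitting) {
            // Self-transitions: the events are accepted but change nothing.
            add(State.COMMITTING, Event.JOB_TASK_ATTEMPT_COMPLETED, State.COMMITTING);
            add(State.COMMITTING, Event.JOB_MAP_TASK_RESCHEDULED, State.COMMITTING);
        }
    }

    private void add(State from, Event event, State to) {
        transitions.get(from).put(event, to);
    }

    // Applies the event, or fails if no transition is registered for it.
    State handle(Event event) {
        State next = transitions.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        ToyJobStateMachine job = new ToyJobStateMachine(true);
        job.handle(Event.START_COMMIT);
        job.handle(Event.JOB_MAP_TASK_RESCHEDULED); // ignored while committing
        System.out.println(job.handle(Event.COMMIT_DONE));
    }
}
```

Constructing the machine with the flag set to false reproduces the reported behaviour in miniature: the same late event raises an exception instead of being ignored.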






[jira] [Created] (MAPREDUCE-6712) Support grouping values for reducer on java-side

2016-06-08 Thread He Tianyi (JIRA)
He Tianyi created MAPREDUCE-6712:


 Summary: Support grouping values for reducer on java-side
 Key: MAPREDUCE-6712
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6712
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/streaming
Reporter: He Tianyi
Priority: Minor


In Hadoop streaming with TextInputWriter, the reducer program receives each 
(k, v) tuple as a separate line on {{stdin}}, and values with an identical key 
are not grouped.
This is inefficient, especially for interpreter-based runtimes (e.g. CPython), 
for the following reasons:
A. the user program has to compare each key with the previous one (although on 
the Java side, records already reach the reducer in groups),
B. the user program has to perform a {{read}}, then a {{find}} or {{split}}, on 
each record, even if there are multiple values with an identical key,
C. if keys are long, this evidently makes caching less efficient.

I suppose we need another InputWriter. But that alone is not enough, since the 
{{InputWriter}} interface defines {{writeKey}} and {{writeValue}}, not 
{{writeValues}}. We could compare keys in a custom InputWriter and group 
them there, but that is also inefficient. Some other changes are needed as well.
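
To make points A and B concrete, the sketch below shows the per-line work a streaming reducer currently has to do to rebuild groups from sorted input. It is written in Java for consistency with the rest of this thread, although the affected user programs are typically scripts. The class StreamingGroupingSketch and the method groupSortedLines are hypothetical, and a tab is assumed as the key/value separator.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative only: regroups sorted "key<TAB>value" lines by key, the way
// a streaming reducer must do today on its own side.
final class StreamingGroupingSketch {
    private StreamingGroupingSketch() {
    }

    static List<Map.Entry<String, List<String>>> groupSortedLines(List<String> lines) {
        List<Map.Entry<String, List<String>>> groups =
            new ArrayList<Map.Entry<String, List<String>>>();
        String previousKey = null;
        List<String> values = null;
        for (String line : lines) {
            // Point B: every record needs a find/split, even for repeated keys.
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);
            // Point A: every key is compared with the previous one.
            if (!key.equals(previousKey)) {
                values = new ArrayList<String>();
                groups.add(new AbstractMap.SimpleEntry<String, List<String>>(key, values));
                previousKey = key;
            }
            values.add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a\t1", "a\t2", "b\t3");
        for (Map.Entry<String, List<String>> group : groupSortedLines(input)) {
            System.out.println(group.getKey() + " -> " + group.getValue());
        }
    }
}
```

If grouped values were written by the Java side, as the issue proposes, the key comparison and the repeated key parsing in this loop would no longer be needed in the user program.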



