[jira] [Commented] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size

2017-12-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283072#comment-16283072
 ] 

Hadoop QA commented on MAPREDUCE-7022:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 42s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 12 new + 1144 unchanged - 3 fixed = 1156 total (was 1147) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 20s{color} | {color:red} hadoop-mapreduce-client-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 51s{color} | {color:green} hadoop-mapreduce-client-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 24s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}146m 48s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 31s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m  7s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestTaskProgressReporter |
|   | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
|   | hadoop.mapreduce.v2.TestUberAM |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 

[jira] [Created] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size

2017-12-07 Thread Johan Gustavsson (JIRA)
Johan Gustavsson created MAPREDUCE-7022:
---

 Summary: Fast fail rogue jobs based on task scratch dir size
 Key: MAPREDUCE-7022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7022
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Reporter: Johan Gustavsson


MAPREDUCE-6489 introduced options to kill rogue tasks based on the amount 
written to local disk. In our environment, where we mainly run Hive-based 
jobs, we noticed that this counter and the size of the local scratch dirs were 
very different. We had tasks where the BYTES_WRITTEN counter was at 300 GB and 
others where it was at 10 TB, both producing around 200 GB on local disk, so 
it didn't help us much. To extend this feature, tasks should monitor the local 
scratch dir size and fail if they pass the limit. In these cases the tasks 
should not be retried either; instead the job should fast fail.
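
As a rough illustration of the proposed check, the following minimal Java sketch measures the bytes actually present under a scratch directory and compares them against a limit. The class and method names (ScratchDirSizeCheck, directorySizeBytes, exceedsLimit) and the convention that a negative limit disables the check are assumptions made for this example; they are not taken from the attached patch or from Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Hadoop or the patch. */
final class ScratchDirSizeCheck {

  private ScratchDirSizeCheck() {}

  /** Sums the sizes, in bytes, of all regular files found under dir. */
  static long directorySizeBytes(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths
          .filter(Files::isRegularFile)
          .mapToLong(p -> {
            try {
              return Files.size(p);
            } catch (IOException e) {
              // A spill file may be deleted between listing and stat;
              // count it as zero bytes instead of failing the check.
              return 0L;
            }
          })
          .sum();
    }
  }

  /**
   * Returns true when the directory currently holds more than limitBytes.
   * A negative limit disables the check (an assumption of this sketch).
   */
  static boolean exceedsLimit(Path dir, long limitBytes) throws IOException {
    return limitBytes >= 0 && directorySizeBytes(dir) > limitBytes;
  }
}
```

A task could call exceedsLimit periodically on its scratch directory and report a fatal, non-retriable failure when it returns true. Measuring the directory rather than a write counter matters here because files that are overwritten or deleted still add to BYTES_WRITTEN but no longer occupy disk.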



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size

2017-12-07 Thread Johan Gustavsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated MAPREDUCE-7022:

Affects Version/s: 2.7.0
                   2.8.0
                   2.9.0
           Status: Patch Available  (was: Open)

> Fast fail rogue jobs based on task scratch dir size
> ---
>
> Key: MAPREDUCE-7022
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7022
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.9.0, 2.8.0, 2.7.0
>Reporter: Johan Gustavsson
> Attachments: MAPREDUCE-7022.001.patch
>
>
> MAPREDUCE-6489 introduced options to kill rogue tasks based on the amount 
> written to local disk. In our environment, where we mainly run Hive-based 
> jobs, we noticed that this counter and the size of the local scratch dirs 
> were very different. We had tasks where the BYTES_WRITTEN counter was at 
> 300 GB and others where it was at 10 TB, both producing around 200 GB on 
> local disk, so it didn't help us much. To extend this feature, tasks should 
> monitor the local scratch dir size and fail if they pass the limit. In these 
> cases the tasks should not be retried either; instead the job should fast fail.






[jira] [Updated] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size

2017-12-07 Thread Johan Gustavsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated MAPREDUCE-7022:

Attachment: MAPREDUCE-7022.001.patch







[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8

2017-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6165:
---
Fix Version/s: (was: 2.7.6)
               2.7.5

> [JDK8] TestCombineFileInputFormat failed on JDK8
> 
>
> Key: MAPREDUCE-6165
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Akira Ajisaka
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1, 2.7.5
>
> Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, 
> MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, 
> MAPREDUCE-6165-reproduce.patch
>
>
> The error msg:
> {noformat}
> testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 2.487 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.failNotEquals(Assert.java:329)
>   at junit.framework.Assert.assertEquals(Assert.java:78)
>   at junit.framework.Assert.assertEquals(Assert.java:234)
>   at junit.framework.Assert.assertEquals(Assert.java:241)
>   at junit.framework.TestCase.assertEquals(TestCase.java:409)
>   at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
> testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 0.985 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.failNotEquals(Assert.java:329)
>   at junit.framework.Assert.assertEquals(Assert.java:78)
>   at junit.framework.Assert.assertEquals(Assert.java:234)
>   at junit.framework.Assert.assertEquals(Assert.java:241)
>   at junit.framework.TestCase.assertEquals(TestCase.java:409)
>   at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
> {noformat}






[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2017-12-07 Thread jiayuhan-it (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660
 ] 

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:45 AM:
--

cc [~asuresh] and [~kkaranasos]. Please help to review. 


was (Author: magicbunny):
cc [~asuresh] and @Konstantinos Karanasos
Please help to review. 

> Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
> 
>
> Key: MAPREDUCE-7017
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7017
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 3.0.0-alpha4
>Reporter: jiayuhan-it
> Attachments: MAPREDUCE-7017.001.patch
>
>
>   MRAppMaster uses {{TaskAttemptImpl::resolveHosts}} to determine the 
> dataLocalHosts for each task. When the locations of a data split are given 
> as IP addresses, this calls {{InetAddress::getByName}} many times 
> (taskNum * dfsReplication), and most of the function calls are redundant. 
> When the job has a large number of tasks and DNS resolution is not fast 
> enough, this stage takes a long time before the job starts running.
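
One way to avoid the redundant lookups described above is to memoize resolution results per address. The sketch below is only an illustration under that assumption: CachingHostResolver, usingDns, and resolve are hypothetical names, and the attached patch may take a different approach. The lookup function is injected so that the caching behaviour can be exercised without real DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical memoizing resolver for illustration; not the patch itself. */
final class CachingHostResolver {

  // Maps each address string already seen to its resolved host name.
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> lookup;

  CachingHostResolver(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  /** Resolver backed by InetAddress; falls back to the input on failure. */
  static CachingHostResolver usingDns() {
    return new CachingHostResolver(src -> {
      try {
        return InetAddress.getByName(src).getHostName();
      } catch (UnknownHostException e) {
        return src;
      }
    });
  }

  /** Invokes the underlying lookup at most once per distinct address. */
  String resolve(String src) {
    return cache.computeIfAbsent(src, lookup);
  }
}
```

With such a cache the number of DNS lookups is bounded by the number of distinct addresses (roughly the number of datanodes) rather than by taskNum * dfsReplication.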






[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2017-12-07 Thread jiayuhan-it (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660
 ] 

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:42 AM:
--

cc [~asuresh] and @Konstantinos Karanasos
Please help to review. 


was (Author: magicbunny):
cc @Arun Suresh and @Konstantinos Karanasos
Please help to review. 







[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2017-12-07 Thread jiayuhan-it (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660
 ] 

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:42 AM:
--

cc @Arun Suresh and @Konstantinos Karanasos
Please help to review. 


was (Author: magicbunny):
cc https://issues.apache.org/jira/secure/ViewProfile.jspa?name=asuresh Suresh 
and Konstantinos Karanasos
Please help to review. 







[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2017-12-07 Thread jiayuhan-it (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660
 ] 

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:37 AM:
--

cc https://issues.apache.org/jira/secure/ViewProfile.jspa?name=asuresh Suresh 
and Konstantinos Karanasos
Please help to review. 


was (Author: magicbunny):
cc @Arun Suresh and Konstantinos Karanasos
Please help to review. 







[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2017-12-07 Thread jiayuhan-it (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660
 ] 

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:36 AM:
--

cc @Arun Suresh and Konstantinos Karanasos
Please help to review. 


was (Author: magicbunny):
cc Arun Suresh and Konstantinos Karanasos
Please help to review. 







[jira] [Commented] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2017-12-07 Thread jiayuhan-it (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660
 ] 

jiayuhan-it commented on MAPREDUCE-7017:


cc Arun Suresh and Konstantinos Karanasos
Please help to review. 



