[jira] [Commented] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size
[ https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283072#comment-16283072 ]

Hadoop QA commented on MAPREDUCE-7022:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 50s | Maven dependency ordering for branch |
| +1 | mvninstall | 15m 24s | trunk passed |
| +1 | compile | 1m 42s | trunk passed |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 1m 56s | trunk passed |
| +1 | shadedclient | 12m 4s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 28s | trunk passed |
| +1 | javadoc | 1m 25s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 47s | the patch passed |
| +1 | compile | 1m 37s | the patch passed |
| +1 | javac | 1m 37s | the patch passed |
| -0 | checkstyle | 0m 42s | hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 12 new + 1144 unchanged - 3 fixed = 1156 total (was 1147) |
| +1 | mvnsite | 1m 42s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 30s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 56s | the patch passed |
| +1 | javadoc | 1m 20s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 3m 20s | hadoop-mapreduce-client-core in the patch failed. |
| +1 | unit | 0m 51s | hadoop-mapreduce-client-common in the patch passed. |
| -1 | unit | 8m 24s | hadoop-mapreduce-client-app in the patch failed. |
| -1 | unit | 146m 48s | hadoop-mapreduce-client-jobclient in the patch failed. |
| -1 | asflicense | 0m 31s | The patch generated 1 ASF License warnings. |
| | | 216m 7s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestTaskProgressReporter |
| | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
| | hadoop.mapreduce.v2.TestUberAM |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639
[jira] [Created] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size
Johan Gustavsson created MAPREDUCE-7022:
---

Summary: Fast fail rogue jobs based on task scratch dir size
Key: MAPREDUCE-7022
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7022
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Reporter: Johan Gustavsson

MAPREDUCE-6489 introduced options to kill rogue tasks based on how much they write to local disk. In our environment, where we mainly run Hive-based jobs, we noticed that this counter and the size of the local scratch dirs were very different. We had tasks where the BYTES_WRITTEN counter was at 300 GB and tasks where it was at 10 TB, both producing around 200 GB on local disk, so the counter didn't help us much. To extend this feature, tasks should monitor the size of their local scratch dir and fail if it passes the limit. In these cases the tasks should not be retried; instead, the job should fail fast.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
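Editor's sketch, not taken from this thread or from MAPREDUCE-7022.001.patch: the check proposed in the issue (measure the task's local scratch dir and compare it with a limit) might look roughly like the Java below. The class name `ScratchDirLimitSketch`, both method names, and the convention that a negative limit disables the check are assumptions made for illustration only. How the actual patch measures the directory, or how it makes the job fail fast, is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper; names and behavior are illustrative assumptions. */
class ScratchDirLimitSketch {

  /** Sums the sizes of all regular files found under dir, including subdirectories. */
  static long directorySize(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.filter(Files::isRegularFile)
          .mapToLong(p -> {
            try {
              return Files.size(p);
            } catch (NoSuchFileException e) {
              // The file was deleted between listing and sizing; count it as empty.
              return 0L;
            } catch (IOException e) {
              throw new UncheckedIOException(e);
            }
          })
          .sum();
    }
  }

  /** True when sizeBytes is strictly above limitBytes; a negative limit disables the check. */
  static boolean exceedsLimit(long sizeBytes, long limitBytes) {
    return limitBytes >= 0 && sizeBytes > limitBytes;
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a task scratch dir: a temp directory holding one 2048-byte file.
    Path scratch = Files.createTempDirectory("scratch-sketch");
    Files.write(scratch.resolve("spill.out"), new byte[2048]);

    long size = directorySize(scratch);
    System.out.println("scratch size in bytes: " + size);
    System.out.println("over a 1024-byte limit: " + exceedsLimit(size, 1024));
  }
}
```

A task could call `directorySize` periodically and report a fatal, non-retriable error once `exceedsLimit` returns true. Walking a large scratch dir is itself disk I/O, so the polling interval would be a trade-off.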
[jira] [Updated] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size
[ https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Gustavsson updated MAPREDUCE-7022:

Affects Version/s: 2.7.0 2.8.0 2.9.0
Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size
[ https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Gustavsson updated MAPREDUCE-7022:

Attachment: MAPREDUCE-7022.001.patch
[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-6165:
---

Fix Version/s: (was: 2.7.6) 2.7.5

> [JDK8] TestCombineFileInputFormat failed on JDK8
>
> Key: MAPREDUCE-6165
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Wei Yan
> Assignee: Akira Ajisaka
> Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1, 2.7.5
> Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch
>
> The error msg:
> {noformat}
> testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.failNotEquals(Assert.java:329)
>   at junit.framework.Assert.assertEquals(Assert.java:78)
>   at junit.framework.Assert.assertEquals(Assert.java:234)
>   at junit.framework.Assert.assertEquals(Assert.java:241)
>   at junit.framework.TestCase.assertEquals(TestCase.java:409)
>   at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
> testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.failNotEquals(Assert.java:329)
>   at junit.framework.Assert.assertEquals(Assert.java:78)
>   at junit.framework.Assert.assertEquals(Assert.java:234)
>   at junit.framework.Assert.assertEquals(Assert.java:241)
>   at junit.framework.TestCase.assertEquals(TestCase.java:409)
>   at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
> {noformat}
[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660 ]

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:45 AM:
--

cc [~asuresh] and [~kkaranasos]. Please help review.

was (Author: magicbunny):
cc [~asuresh] and @Konstantinos Karanasos Please help to review.

> Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
>
> Key: MAPREDUCE-7017
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7017
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am
> Affects Versions: 3.0.0-alpha4
> Reporter: jiayuhan-it
> Attachments: MAPREDUCE-7017.001.patch
>
> MRAppMaster uses {{TaskAttemptImpl::resolveHosts}} to determine the
> dataLocalHosts for each task when the location of a data split is given as an
> IP address. This calls {{InetAddress::getByName}} many times
> (taskNum * dfsReplication), and most of these calls are redundant.
> When the job has a large number of tasks and DNS resolution is slow, this
> stage takes a long time before the job starts running.
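Editor's sketch, not taken from MAPREDUCE-7017.001.patch: one common way to avoid the redundant lookups described in the issue is to memoize the result per IP string, so that each distinct address is resolved once no matter how many tasks or replicas refer to it. The class name `HostResolutionCacheSketch`, the injected `resolver` function, and the fallback of returning the input string when resolution fails are all assumptions made for illustration. Whether the actual patch uses a cache of this shape is not stated in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical memoizing wrapper around a host lookup; illustrative only. */
class HostResolutionCacheSketch {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> resolver;

  HostResolutionCacheSketch(Function<String, String> resolver) {
    this.resolver = resolver;
  }

  /** Returns the cached host for src, invoking the resolver only on the first request. */
  String resolve(String src) {
    return cache.computeIfAbsent(src, resolver);
  }

  /** Looks up a host name via InetAddress; returns the input unchanged if it cannot be resolved. */
  static String lookup(String src) {
    try {
      return InetAddress.getByName(src).getHostName();
    } catch (UnknownHostException e) {
      return src;
    }
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    HostResolutionCacheSketch hosts = new HostResolutionCacheSketch(ip -> {
      lookups.incrementAndGet();
      return lookup(ip);
    });

    // Simulate 1000 tasks whose splits all live on the same three replicas.
    String[] replicas = {"127.0.0.1", "127.0.0.2", "127.0.0.3"};
    for (int task = 0; task < 1000; task++) {
      for (String ip : replicas) {
        hosts.resolve(ip);
      }
    }
    System.out.println("resolve() calls: 3000, underlying lookups: " + lookups.get());
  }
}
```

With a cache like this, the number of underlying lookups is bounded by the number of distinct addresses rather than by taskNum * dfsReplication. A long-lived process would also need to decide how long cached entries stay valid; this sketch never evicts them.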
[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660 ]

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:42 AM:
--

cc [~asuresh] and @Konstantinos Karanasos Please help to review.

was (Author: magicbunny):
cc @Arun Suresh and @Konstantinos Karanasos Please help to review.
[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660 ]

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:42 AM:
--

cc @Arun Suresh and @Konstantinos Karanasos Please help to review.

was (Author: magicbunny):
cc https://issues.apache.org/jira/secure/ViewProfile.jspa?name=asuresh Suresh and Konstantinos Karanasos Please help to review.
[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660 ]

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:37 AM:
--

cc https://issues.apache.org/jira/secure/ViewProfile.jspa?name=asuresh Suresh and Konstantinos Karanasos Please help to review.

was (Author: magicbunny):
cc @Arun Suresh and Konstantinos Karanasos Please help to review.
[jira] [Comment Edited] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660 ]

jiayuhan-it edited comment on MAPREDUCE-7017 at 12/7/17 10:36 AM:
--

cc @Arun Suresh and Konstantinos Karanasos Please help to review.

was (Author: magicbunny):
cc Arun Suresh and Konstantinos Karanasos Please help to review.
[jira] [Commented] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281660#comment-16281660 ]

jiayuhan-it commented on MAPREDUCE-7017:

cc Arun Suresh and Konstantinos Karanasos Please help to review.