[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700044#comment-16700044 ]

Akira Ajisaka commented on MAPREDUCE-6190:
------------------------------------------

{quote}
This problem has existed in our cluster for a year and occurred once in about a month. We finally found that it was a disk problem that led to a long container localization time, up to a few hours. We added a parameter of container start-up timeout to actively fail task with problematic start-up.
{quote}
Nice catch. This approach seems good to me, and thanks [~uranus] for your patch. Some comments:
* You need to document the new parameter in mapred-default.xml.
* Can we make the type of {{count}} AtomicBoolean? Using the type long with a synchronized method seems overkill.
* Would you make the newly added public methods package-private if possible?
* Would you fix the checkstyle warnings?
* Would you add {{@SuppressWarnings("unchecked")}} to the new test case?
* Would you extend the task timeout to 1000ms or longer to avoid a task timeout in the new test case? If the value is 100ms and the sleep is 100ms, a task timeout can occur.

Minor comment:
{code:java}
while (iterator.hasNext()) {
  Map.Entry entry = iterator.next();
  assertEquals(0, entry.getValue().count.longValue());
}
{code}
The while-loop can be replaced with a for-each statement.
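The two code-level suggestions in the review comment above (an AtomicBoolean instead of a synchronized long, and a for-each loop instead of an explicit iterator) might look roughly like the sketch below. The patch itself is not shown in this thread, so {{AttemptState}}, {{markReported}} and {{countReported}} are hypothetical names, and the sketch assumes the field only needs to record whether something happened rather than how many times.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

final class ReviewSketch {

  // Hypothetical stand-in for the per-attempt record in the patch.
  // An AtomicBoolean gives thread-safe reads and writes without a
  // synchronized method.
  static final class AttemptState {
    private final AtomicBoolean reported = new AtomicBoolean(false);

    void markReported() {
      reported.set(true);
    }

    boolean isReported() {
      return reported.get();
    }
  }

  // A for-each over the entry set replaces the explicit Iterator loop.
  static int countReported(Map<String, AttemptState> attempts) {
    int n = 0;
    for (Map.Entry<String, AttemptState> entry : attempts.entrySet()) {
      if (entry.getValue().isReported()) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) {
    Map<String, AttemptState> attempts = new LinkedHashMap<>();
    attempts.put("attempt_0", new AttemptState());
    attempts.put("attempt_1", new AttemptState());
    attempts.get("attempt_1").markReported();
    System.out.println(countReported(attempts)); // prints 1
  }
}
```

If the real field has to count events rather than flag them, an AtomicLong would be the closer analogue; the loop shape stays the same either way.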
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700049#comment-16700049 ]

Akira Ajisaka commented on MAPREDUCE-6190:
------------------------------------------

The test failure is not related to the patch. This failure is tracked by MAPREDUCE-7162.

> MR Job is stuck because of one mapper stuck in STARTING
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6190
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1
>            Reporter: Ankit Malhotra
>            Assignee: Zhaohui Xin
>            Priority: Major
>         Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch
>
>
> Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster
> with map reduce jobs on YARN.
> We had a job stuck for hours because one of the mappers never started up
> fully. Basically, the map task had 2 attempts, the first one failed and the
> AM tried to schedule a second one and the second attempt was stuck on STATE:
> STARTING, STATUS: NEW. A node never got assigned and the task along with the
> job was stuck indefinitely.
> The AM logs had this being logged again and again:
> {code}
> 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1408745633994_450952_02_003807
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption successful attempt_1408745633994_450952_r_48_1000
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1408745633994_450952_r_50_1000
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 finalReduceMemLimit:1720320 netScheduledMapMem:2560 netScheduledReduceMem:1722880
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
> {code}
> On killing the task manually, the AM started up the task again, scheduled and
> ran it successfully completing the task and the job with it.
> Some quick code grepping led us here:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-app/2.3.0/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#397
> But still dont quite understand why this would happen once in a while and why
> the job would suddenly be ok once the stuck task is manually killed.
> Note: Other jobs succeed on the cluster while this job is stuck.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7162) MapReduce unit test is broken
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700051#comment-16700051 ]

Akira Ajisaka commented on MAPREDUCE-7162:
------------------------------------------

I think it's fine to add the flush method just before {{out.writeBytes("\n")}}.

> MapReduce unit test is broken
> -----------------------------
>
>                 Key: MAPREDUCE-7162
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7162
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Zhaohui Xin
>            Assignee: Zhaohui Xin
>            Priority: Critical
>         Attachments: MAPREDUCE-7162.001.patch, MAPREDUCE-7162.002.patch
>
>
> Mapreduce unit test is broken by
> https://issues.apache.org/jira/browse/MAPREDUCE-7158 .
> *I think we should keep the data consistent to avoid corruption when output,
> so I roll back the previous code and attach the patch.*
> Broken location _is
> org.apache.hadoop.mapreduce.jobhistory.TestEvents#testEvents._
> {code:java}
> org.codehaus.jackson.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name
>  at [Source: java.io.DataInputStream@25618e91; line: 23, column: 418]
>  at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
>  at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
>  at org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
>  at org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1446)
>  at org.codehaus.jackson.impl.Utf8StreamParser.parseFieldName(Utf8StreamParser.java:1410)
>  at org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1283)
>  at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
>  at org.apache.avro.io.JsonDecoder.doArrayNext(JsonDecoder.java:367)
>  at org.apache.avro.io.JsonDecoder.arrayNext(JsonDecoder.java:361)
>  at org.apache.avro.io.ValidatingDecoder.arrayNext(ValidatingDecoder.java:189)
>  at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:222)
>  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>  at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>  at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
>  at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>  at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>  at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:101)
>  at org.apache.hadoop.mapreduce.jobhistory.TestEvents.testEvents(TestEvents.java:177)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
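The suggestion above to flush just before {{out.writeBytes("\n")}} can be illustrated with plain java.io, as a rough sketch rather than the actual Hadoop code. A BufferedWriter stands in for whatever buffering encoder sits on top of the stream (an assumption; the patched class is not shown in this thread), and the record text and the names {{FlushBeforeNewline}} and {{write}} are made up for the example. When the buffered layer is not flushed first, the newline written directly to the underlying stream overtakes the record.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class FlushBeforeNewline {

  // Writes one record through a buffering layer, then a newline straight
  // to the underlying stream. Returns everything that reached the sink.
  static String write(boolean flushFirst) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(sink);
      // Stand-in for a buffering encoder layered over the same stream.
      Writer encoder = new BufferedWriter(
          new OutputStreamWriter(out, StandardCharsets.UTF_8));

      encoder.write("{\"type\":\"JOB_SUBMITTED\"}");
      if (flushFirst) {
        encoder.flush(); // push the record out before the separator
      }
      out.writeBytes("\n"); // bypasses the encoder's buffer
      encoder.flush();      // anything still buffered arrives only now
      out.flush();
      return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // With the flush, the record precedes the newline.
    System.out.println(write(true).endsWith("}\n"));    // prints true
    // Without it, the newline lands ahead of the record.
    System.out.println(write(false).startsWith("\n"));  // prints true
  }
}
```

A misplaced line feed of this kind is consistent with the "CTRL-CHAR, code 10" parse error in the stack trace, although the thread does not spell out the exact mechanism.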
[jira] [Commented] (MAPREDUCE-7162) MapReduce unit test is broken
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700053#comment-16700053 ]

Hadoop QA commented on MAPREDUCE-7162:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 5s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 14s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-7162 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12949635/MAPREDUCE-7162.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a31bd4297bae 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33e0df4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7543/testReport/ |
| Max. process+thread count | 1607 (vs. ulimit of 1) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7543/console |
| Power
[jira] [Updated] (MAPREDUCE-7162) MapReduce unit test is broken
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaohui Xin updated MAPREDUCE-7162:
-----------------------------------
    Attachment: MAPREDUCE-7162.003.patch
[jira] [Updated] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated MAPREDUCE-7162:
-------------------------------------
    Summary: TestEvents#testEvents fails (was: MapReduce unit test is broken)
[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700060#comment-16700060 ]

Zhaohui Xin commented on MAPREDUCE-7162:
----------------------------------------

[~ajisakaa], added new patch. :D
[jira] [Updated] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaohui Xin updated MAPREDUCE-6190:
---
Attachment: MAPREDUCE-6190.003.patch

> MR Job is stuck because of one mapper stuck in STARTING
> ---
>
> Key: MAPREDUCE-6190
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1
> Reporter: Ankit Malhotra
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch,
> MAPREDUCE-6190.003.patch
>
>
> Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster
> with map reduce jobs on YARN.
> We had a job stuck for hours because one of the mappers never started up
> fully. Basically, the map task had 2 attempts, the first one failed and the
> AM tried to schedule a second one and the second attempt was stuck on STATE:
> STARTING, STATUS: NEW. A node never got assigned and the task along with the
> job was stuck indefinitely.
> The AM logs had this being logged again and again:
> {code}
> 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1408745633994_450952_02_003807
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption successful attempt_1408745633994_450952_r_48_1000
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1408745633994_450952_r_50_1000
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 finalReduceMemLimit:1720320 netScheduledMapMem:2560 netScheduledReduceMem:1722880
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
> {code}
> On killing the task manually, the AM started up the task again, scheduled and
> ran it successfully completing the task and the job with it.
> Some quick code grepping led us here:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-app/2.3.0/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#397
> But still dont quite understand why this would happen once in a while and why
> the job would suddenly be ok once the stuck task is manually killed.
> Note: Other jobs succeed on the cluster while this job is stuck.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
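For anyone triaging a similar hang, the counters in the "After Scheduling" and "Before Scheduling" lines quoted in the report can be pulled out mechanically. The following is a minimal sketch only, not part of Hadoop or of any attached patch: the class name AllocatorLogLine, both method names, and the looksStarved heuristic are hypothetical and were chosen for illustration. It assumes the log line format shown in the report, reads the Name:number pairs from one line, and flags the pattern visible there (a map scheduled, no map assigned, reducers still holding containers).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a Hadoop class.
public class AllocatorLogLine {
    // Matches "Name:123" pairs such as "ScheduledMaps:1".
    // The timestamp is not matched because a letter must precede the colon.
    private static final Pattern COUNTER = Pattern.compile("([A-Za-z]+):(\\d+)");

    /** Extracts every Name:number pair from one allocator log line, in order. */
    public static Map<String, Long> parseCounters(String line) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = COUNTER.matcher(line);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    /**
     * Assumed heuristic, not an official diagnostic: a map is scheduled,
     * no map is assigned, and reducers are holding containers.
     */
    public static boolean looksStarved(Map<String, Long> c) {
        return c.getOrDefault("ScheduledMaps", 0L) > 0
                && c.getOrDefault("AssignedMaps", 0L) == 0
                && c.getOrDefault("AssignedReds", 0L) > 0;
    }

    public static void main(String[] args) {
        // Log line copied from the report above.
        String line = "2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] "
                + "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: "
                + "PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 "
                + "AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 "
                + "ContRel:798 HostLocal:2944 RackLocal:155";
        Map<String, Long> counters = parseCounters(line);
        System.out.println(counters);
        System.out.println("looksStarved = " + looksStarved(counters));
    }
}
```

A single flagged line proves nothing on its own; the report describes the same counters repeating for hours, so a check like this would only be meaningful when applied across many consecutive lines.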
[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700150#comment-16700150 ]

Zhaohui Xin edited comment on MAPREDUCE-6190 at 11/27/18 10:00 AM:
---
Hi, [~ajisakaa], I added new patch. :D

was (Author: uranus):
Hi, [~ajisakaa], added new patch. :D
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700150#comment-16700150 ]

Zhaohui Xin commented on MAPREDUCE-6190:

Hi, [~ajisakaa], added new patch. :D
[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700221#comment-16700221 ]

Hadoop QA commented on MAPREDUCE-7162:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 51s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 56s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-7162 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12949640/MAPREDUCE-7162.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4619835dbcef 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2730ead |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7544/testReport/ |
| Max. process+thread count | 1027 (vs. ulimit of 1) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7544/console |
| Pow
[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700229#comment-16700229 ]

Akira Ajisaka commented on MAPREDUCE-7162:
--

+1, committing this.

> TestEvents#testEvents fails
> ---
>
> Key: MAPREDUCE-7162
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7162
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Critical
> Attachments: MAPREDUCE-7162.001.patch, MAPREDUCE-7162.002.patch,
> MAPREDUCE-7162.003.patch
>
>
> Mapreduce unit test is broken by
> https://issues.apache.org/jira/browse/MAPREDUCE-7158 .
> *I think we should keep the data consistent to avoid corruption when output,
> so I roll back the previous code and attach the patch.*
> Broken location _is
> org.apache.hadoop.mapreduce.jobhistory.TestEvents#testEvents._
> {code:java}
> org.codehaus.jackson.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name
>  at [Source: java.io.DataInputStream@25618e91; line: 23, column: 418]
>     at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
>     at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
>     at org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
>     at org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1446)
>     at org.codehaus.jackson.impl.Utf8StreamParser.parseFieldName(Utf8StreamParser.java:1410)
>     at org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1283)
>     at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
>     at org.apache.avro.io.JsonDecoder.doArrayNext(JsonDecoder.java:367)
>     at org.apache.avro.io.JsonDecoder.arrayNext(JsonDecoder.java:361)
>     at org.apache.avro.io.ValidatingDecoder.arrayNext(ValidatingDecoder.java:189)
>     at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:222)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:101)
>     at org.apache.hadoop.mapreduce.jobhistory.TestEvents.testEvents(TestEvents.java:177)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
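The exception text in the quoted trace says the parser met a raw control character, code 10 (line feed), inside a field name. JSON requires control characters below U+0020 to be escaped when they appear inside a string. The sketch below illustrates that rule in plain Java so it can be run without Jackson or Avro on the classpath. It is not Jackson's implementation and says nothing about what the MAPREDUCE-7158 change or the attached patch actually did; the class RawControlCharFinder and its method are hypothetical names used only for this illustration.

```java
// Hypothetical helper for illustration; not part of Jackson, Avro, or Hadoop.
public class RawControlCharFinder {
    /**
     * Returns the index of the first raw control character (code below 0x20)
     * found inside a double-quoted JSON string, or -1 if there is none.
     * Whitespace between tokens, outside of strings, is ignored.
     */
    public static int firstRawControlChar(String json) {
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char ch = json.charAt(i);
            if (!inString) {
                if (ch == '"') {
                    inString = true;
                }
                continue;
            }
            if (ch < 0x20) {
                return i;            // raw control character inside a string
            }
            if (escaped) {
                escaped = false;     // character after a backslash is consumed
            } else if (ch == '\\') {
                escaped = true;
            } else if (ch == '"') {
                inString = false;    // closing quote
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Raw line feed (code 10) inside the field name, as in the error message.
        String broken = "{\"na\nme\": 1}";
        // Same name with the line feed written as backslash + 'n'.
        String escapedForm = "{\"na\\nme\": 1}";
        System.out.println("broken:  " + firstRawControlChar(broken));
        System.out.println("escaped: " + firstRawControlChar(escapedForm));
    }
}
```

Running a check like this over a serialized history file would only locate the offending position; which writer setting produced the raw line feed still has to be read from the patch itself.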
[jira] [Updated] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated MAPREDUCE-7162:
-
Component/s: test
             jobhistoryserver
[jira] [Updated] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated MAPREDUCE-7162:
-
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 3.2.1
               3.3.0
               3.1.2
Status: Resolved (was: Patch Available)

Committed this to trunk, branch-3.2, and branch-3.1. Thank you, [~uranus]!
[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700273#comment-16700273 ]

Hudson commented on MAPREDUCE-7162:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15508 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15508/])
MAPREDUCE-7162. TestEvents#testEvents fails. Contributed by Zhaohui Xin. (aajisaka: rev 1aad99a71813660b83628cacfed393d0b3a123cc)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700256#comment-16700256 ]

Zhaohui Xin commented on MAPREDUCE-7162:
----------------------------------------

[~ajisakaa], Thanks for reviewing! :D
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700305#comment-16700305 ]

Hadoop QA commented on MAPREDUCE-6190:
--------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 40s | trunk passed |
| +1 | compile | 1m 49s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 8s | trunk passed |
| +1 | shadedclient | 12m 31s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 46s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 1m 46s | the patch passed |
| +1 | javac | 1m 46s | the patch passed |
| -0 | checkstyle | 0m 47s | hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 5 new + 548 unchanged - 2 fixed = 553 total (was 550) |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 4s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 51s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 4m 23s | hadoop-mapreduce-client-core in the patch passed. |
| -1 | unit | 10m 12s | hadoop-mapreduce-client-app in the patch failed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 72m 13s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.jobhistory.TestEvents |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-6190 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12949649/MAPREDUCE-6190.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 4d60f522dbc1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x8
[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.
[ https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700657#comment-16700657 ]

Kuhu Shukla commented on MAPREDUCE-7164:
----------------------------------------

Thoughts? [~jlowe], [~jeagles]

> FileOutputCommitter does not report progress while merging paths.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7164
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>   Affects Versions: 3.0.3, 2.8.5, 2.9.2
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: MAPREDUCE-7164.001.patch
>
>
> In cases where the rename and merge path logic takes more time than usual, the committer does not report progress and can cause job failure. This behavior was not present in Hadoop 1.x. This JIRA will fix it so that the old behavior for 1.x is restored.
[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700685#comment-16700685 ]

Wangda Tan commented on MAPREDUCE-7162:
---------------------------------------

Apologies for introducing the issue. Thanks [~uranus] and [~ajisakaa] for getting the issue resolved.
[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700687#comment-16700687 ]

Peter Bacsko commented on MAPREDUCE-7152:
-----------------------------------------

OK, I can confirm that this is not a bug. Closing this ticket.

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7152
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7152-NMAdminEnvPOC_POC01.patch, MAPREDUCE-7152-lazyEval_POC01.patch
>
>
> {{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default in Hadoop (as part of {{mapreduce.admin.user.env}} and {{yarn.app.mapreduce.am.user.env}}), and is passed as an environment variable from the AM container to task containers in the container launch context.
> In cases where {{HADOOP_COMMON_HOME}} differs between the AM node and the task node, tasks will fail to load the native library. A reliable way to fix this is to add {{LD_LIBRARY_PATH}} to {{yarn.nodemanager.admin-env}} instead.
> Another approach is to perform a lazy evaluation of {{LD_LIBRARY_PATH}} on the NM side.
[jira] [Updated] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated MAPREDUCE-7152:
------------------------------------
    Resolution: Not A Bug
        Status: Resolved  (was: Patch Available)
[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.
[ https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701077#comment-16701077 ]

Jason Lowe commented on MAPREDUCE-7164:
---------------------------------------

Thanks for the patch! I think it would be fine to downcast where needed, guarded by {{instanceof(Progressable)}} checks, and to skip the progress update if the context is not progressable. That way, if someone uses file output committer algorithm v1, which does _not_ have a progress indicator (since this occurs in the AM rather than in task attempts), it still does the right thing. Similarly, if something ends up calling the JobContext form of the constructor but does pass a context that is Progressable, then it also continues to do the right thing.

A simple utility function that takes the JobContext, does the instance check, and calls progress if possible would make this a lot cleaner, since there would be only one place that needs to do the downcast.
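The utility function suggested in the comment on MAPREDUCE-7164 could look roughly like the sketch below. It is only an illustration, not the code in the attached patch: the nested {{JobContext}} and {{Progressable}} interfaces are local stand-ins for the Hadoop types so that the snippet compiles on its own, and {{ProgressSketch}}, {{PlainContext}}, {{TaskContext}} and {{progressIfPossible}} are hypothetical names.

```java
// Minimal sketch, assuming simplified stand-ins for the Hadoop types.
class ProgressSketch {
  // Stand-in for org.apache.hadoop.util.Progressable (assumption: one method).
  interface Progressable {
    void progress();
  }

  // Stand-in for the job context; the real interface has many more methods.
  interface JobContext {
  }

  // A context that cannot report progress, such as a job-level context
  // used when the commit runs in the AM.
  static final class PlainContext implements JobContext {
  }

  // A context that can report progress, such as a task attempt context.
  static final class TaskContext implements JobContext, Progressable {
    int progressCalls = 0;

    @Override
    public void progress() {
      progressCalls++;
    }
  }

  // Hypothetical helper: the single place that does the instance check and
  // the downcast. Returns true if progress was reported, false if skipped.
  static boolean progressIfPossible(JobContext context) {
    if (context instanceof Progressable) {
      ((Progressable) context).progress();
      return true;
    }
    return false;
  }
}
```

With a helper like this, the rename and merge loops could call it once per path and would not need to know which kind of context they were given.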
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701278#comment-16701278 ]

Akira Ajisaka commented on MAPREDUCE-6190:
------------------------------------------

Thank you for the update. Some additional comments:
* The time unit of the new parameter is milliseconds, so the parameter name should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private Boolean}} for {{reported}} because it is accessible from {{TaskHeartbeatHandler}} without using the {{setReported()}} method.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to {{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?

> MR Job is stuck because of one mapper stuck in STARTING
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6190
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>   Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1
>            Reporter: Ankit Malhotra
>            Assignee: Zhaohui Xin
>            Priority: Major
>         Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch, MAPREDUCE-6190.003.patch
>
>
> Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster with map reduce jobs on YARN.
> We had a job stuck for hours because one of the mappers never started up fully. Basically, the map task had 2 attempts: the first one failed, the AM tried to schedule a second one, and the second attempt was stuck on STATE: STARTING, STATUS: NEW. A node never got assigned, and the task along with the job was stuck indefinitely.
> The AM logs had this being logged again and again:
> {code}
> 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1408745633994_450952_02_003807
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption successful attempt_1408745633994_450952_r_48_1000
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1408745633994_450952_r_50_1000
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 finalReduceMemLimit:1720320 netScheduledMapMem:2560 netScheduledReduceMem:1722880
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
> {code}
> On killing the task manually, the AM started up the task again, scheduled it, and ran it successfully, completing the task and the job with it.
> Some quick code grepping led us here:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-app/2.3.0/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#397
> But we still don't quite understand why this would happen once in a while, or why the job would suddenly be OK once the stuck task is manually killed.
> Note: Other jobs succeed on the cluster while this job is stuck.
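The {{AtomicBoolean}} suggestion from the review comment on MAPREDUCE-6190 might look roughly like the sketch below. This is not the patch itself: {{HeartbeatSketch}}, {{AttemptRecord}} and {{markReported}} are hypothetical names standing in for {{TaskHeartbeatHandler}} and its per-attempt record, whose real shape is not shown in this thread. The sketch relies on two Java facts: an enclosing class can read a private field of its nested class, and {{AtomicBoolean#compareAndSet}} makes the flag update atomic without a synchronized setter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch, assuming a handler class with a nested per-attempt record.
class HeartbeatSketch {
  // Hypothetical per-attempt record kept by the handler.
  static final class AttemptRecord {
    // private final: the reference never changes, only the value inside it.
    private final AtomicBoolean reported = new AtomicBoolean(false);

    boolean isReported() {
      return reported.get();
    }
  }

  // The enclosing class touches the private field directly, so no
  // setReported() method is needed. Returns true only for the first caller
  // that flips the flag from false to true.
  static boolean markReported(AttemptRecord record) {
    return record.reported.compareAndSet(false, true);
  }
}
```

A test against such a record can then use the shorter assertion form from the review, for example {{assertFalse(record.isReported())}} before the first heartbeat.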
[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701278#comment-16701278 ]

Akira Ajisaka edited comment on MAPREDUCE-6190 at 11/28/18 1:50 AM:
--------------------------------------------------------------------

Thank you for the update. Some additional comments:
* The time unit of the new parameter is milliseconds, so the parameter name should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private Boolean}} for {{reported}} because it is accessible from {{TaskHeartbeatHandler}} without using the {{setReported()}} method, and that way {{reported}} is not synchronized.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to {{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?

was (Author: ajisakaa):
Thank you for the update. Some additional comments:
* The time unit of the new parameter is millisecond, so the parameter name should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private Boolean}} for {{reported}} because it is accessible from {{TaskHeartbeatHandler}} without using {{setReported()}} method and that way {{reported}] is not synchronized.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to {{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?
[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701278#comment-16701278 ] Akira Ajisaka edited comment on MAPREDUCE-6190 at 11/28/18 1:50 AM: Thank you for the update. Some additional comments: * The time unit of the new parameter is millisecond, so the parameter name should include ms. * {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private. * I prefer using {{private final AtomicBoolean}} rather than {{private Boolean}} for {{reported}} because it is accessible from {{TaskHeartbeatHandler}} without using {{setReported()}} method and that way {{reported}] is not synchronized. * {{assertEquals(false, entry.getValue().isReported())}} can be simplified to {{assertFalse(entry.getValue().isReported())}}. * Would you fix the checkstyle warnings? was (Author: ajisakaa): Thank you for the update. Some additional comments: * The time unit of the new parameter is millisecond, so the parameter name should include ms. * {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private. * I prefer using {{private final AtomicBoolean}} rather than {{private Boolean}} for {{reported}} because it is accessible from {{TaskHeartbeatHandler}} without using {{setReported()}} method. * {{assertEquals(false, entry.getValue().isReported())}} can be simplified to {{assertFalse(entry.getValue().isReported()}}. * Would you fix the checkstyle warnings? > MR Job is stuck because of one mapper stuck in STARTING > --- > > Key: MAPREDUCE-6190 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1 >Reporter: Ankit Malhotra >Assignee: Zhaohui Xin >Priority: Major > Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch, > MAPREDUCE-6190.003.patch > > > Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster > with map reduce jobs on YARN. 
> We had a job stuck for hours because one of the mappers never started up > fully. Basically, the map task had 2 attempts, the first one failed and the > AM tried to schedule a second one and the second attempt was stuck on STATE: > STARTING, STATUS: NEW. A node never got assigned and the task along with the > job was stuck indefinitely. > The AM logs had this being logged again and again: > {code} > 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received > completed container container_1408745633994_450952_02_003807 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption > successful attempt_1408745633994_450952_r_48_1000 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all > scheduled reduces:0 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting > attempt_1408745633994_450952_r_50_1000 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating > schedule, headroom=0 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: > completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 > finalReduceMemLimit:1720320 netScheduledMapMem:2560 > netScheduledReduceMem:1722880 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0 > 2014-12-09 19:25:13,353 INFO 
[RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: > PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 > AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 > ContRel:798 HostLocal:2944 RackLocal:155 > 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 > AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 > ContRel:798 HostLocal:2944 RackLocal:155 > 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculatin
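The {{reported}} suggestion in the review comment above can be sketched as follows. This is only an illustration: the class name {{AttemptRecord}} and its methods are hypothetical, since the patch itself is not quoted in this thread. The point is that a {{private final AtomicBoolean}} stays safe even when the enclosing class touches the field directly, so no synchronized setter is needed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical holder class; the name AttemptRecord is an assumption
// made for illustration and does not come from the patch.
class AttemptRecord {
  // The reference is final and the value is updated atomically,
  // so reads and writes need no synchronized accessor.
  private final AtomicBoolean reported = new AtomicBoolean(false);

  // Package-private accessors, in the spirit of the review comments.
  boolean isReported() {
    return reported.get();
  }

  void markReported() {
    reported.set(true);
  }

  public static void main(String[] args) {
    AttemptRecord record = new AttemptRecord();
    System.out.println(record.isReported()); // false: nothing reported yet
    record.markReported();
    System.out.println(record.isReported()); // true: flag has been set
  }
}
```

In a JUnit test, the initial state would then be checked with {{assertFalse(record.isReported())}} rather than {{assertEquals(false, ...)}}.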
[jira] [Updated] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated MAPREDUCE-6190: --- Attachment: MAPREDUCE-6190.004.patch > MR Job is stuck because of one mapper stuck in STARTING > --- > > Key: MAPREDUCE-6190 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1 >Reporter: Ankit Malhotra >Assignee: Zhaohui Xin >Priority: Major > Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch, > MAPREDUCE-6190.003.patch, MAPREDUCE-6190.004.patch > > > Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster > with map reduce jobs on YARN. > We had a job stuck for hours because one of the mappers never started up > fully. Basically, the map task had 2 attempts, the first one failed and the > AM tried to schedule a second one and the second attempt was stuck on STATE: > STARTING, STATUS: NEW. A node never got assigned and the task along with the > job was stuck indefinitely. 
> The AM logs had this being logged again and again: > {code} > 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received > completed container container_1408745633994_450952_02_003807 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption > successful attempt_1408745633994_450952_r_48_1000 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all > scheduled reduces:0 > 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting > attempt_1408745633994_450952_r_50_1000 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating > schedule, headroom=0 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: > completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 > finalReduceMemLimit:1720320 netScheduledMapMem:2560 > netScheduledReduceMem:1722880 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0 > 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: > PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 > AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 > ContRel:798 HostLocal:2944 RackLocal:155 > 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] > 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 > AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 > ContRel:798 HostLocal:2944 RackLocal:155 > 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating > schedule, headroom=0 > {code} > On killing the task manually, the AM started up the task again, scheduled and > ran it successfully, completing the task and the job with it. > Some quick code grepping led us here: > http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-app/2.3.0/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#397 > But we still don't quite understand why this would happen once in a while and why > the job would suddenly be ok once the stuck task is manually killed. > Note: Other jobs succeed on the cluster while this job is stuck. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701375#comment-16701375 ]
Zhaohui Xin commented on MAPREDUCE-6190:
[~ajisakaa], thanks for the review. All comments are resolved, and I added a new patch. :D
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701378#comment-16701378 ]
Akira Ajisaka commented on MAPREDUCE-6190:
Is there any specific reason to use {{AtomicBoolean.compareAndSet(false, true)}}? I'm thinking {{AtomicBoolean.set(true)}} is sufficient. Otherwise I'm +1 pending Jenkins. Thanks!
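To make the question above concrete, here is a small sketch of how the two {{AtomicBoolean}} calls differ. The class and method names are hypothetical. Both calls leave the flag {{true}}; the only observable difference is that {{compareAndSet(false, true)}} also tells the caller whether this call was the one that flipped the flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; the names are assumptions for illustration only.
class FirstHeartbeatFlag {
  // Unconditional write: always leaves the flag true.
  static boolean markWithSet(AtomicBoolean flag) {
    flag.set(true);
    return flag.get();
  }

  // Conditional write: returns true only for the call that changes false to true.
  static boolean markWithCas(AtomicBoolean flag) {
    return flag.compareAndSet(false, true);
  }

  public static void main(String[] args) {
    AtomicBoolean viaSet = new AtomicBoolean(false);
    markWithSet(viaSet);
    markWithSet(viaSet);

    AtomicBoolean viaCas = new AtomicBoolean(false);
    boolean first = markWithCas(viaCas);
    boolean second = markWithCas(viaCas);

    // Prints: true true true false
    System.out.println(viaSet.get() + " " + viaCas.get() + " " + first + " " + second);
  }
}
```

If the return value is ignored, the two variants end in the same state, which is why {{set(true)}} is described as sufficient.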
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701384#comment-16701384 ]
Zhaohui Xin commented on MAPREDUCE-6190:
I think the code is easier to understand if the {{reported}} value is updated explicitly when the task heartbeats.
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701399#comment-16701399 ]
Hadoop QA commented on MAPREDUCE-6190:
| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 31s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 42s | trunk passed |
| +1 | compile | 1m 56s | trunk passed |
| +1 | checkstyle | 0m 53s | trunk passed |
| +1 | mvnsite | 1m 16s | trunk passed |
| +1 | shadedclient | 13m 47s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 41s | trunk passed |
| +1 | javadoc | 0m 44s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 2s | the patch passed |
| +1 | compile | 1m 48s | the patch passed |
| +1 | javac | 1m 48s | the patch passed |
| -0 | checkstyle | 0m 46s | hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 2 new + 548 unchanged - 2 fixed = 550 total (was 550) |
| +1 | mvnsite | 1m 4s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 0s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 53s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 4m 23s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | unit | 10m 7s | hadoop-mapreduce-client-app in the patch passed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 74m 24s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-6190 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12949789/MAPREDUCE-6190.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 6f4b41fc8d8c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/pr
[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701384#comment-16701384 ]
Zhaohui Xin edited comment on MAPREDUCE-6190 at 11/28/18 5:11 AM:
I think the code is easier to understand if the {{reported}} value is updated explicitly when the task sends its first heartbeat.
was (Author: uranus):
I think the code is easier to understand if the {{reported}} value is updated explicitly when the task heartbeats.
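The idea discussed in this thread, flipping a flag on the first heartbeat and applying a separate timeout to attempts that have never heartbeated, might look roughly like the sketch below. Everything here is an assumption made for illustration: the class, field, and parameter names are hypothetical and are not taken from the patch, which is not quoted in these messages.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from the actual patch.
class StuckAttemptSketch {
  static final class Attempt {
    private final AtomicBoolean reported = new AtomicBoolean(false);
    private volatile long lastSeenMs;

    Attempt(long registeredAtMs) {
      this.lastSeenMs = registeredAtMs;
    }

    // Called on every heartbeat; only the first call flips the flag.
    void heartbeat(long nowMs) {
      reported.compareAndSet(false, true);
      lastSeenMs = nowMs;
    }
  }

  // An attempt that never heartbeated is judged against the stuck timeout;
  // one that has heartbeated is judged against the ordinary task timeout.
  // A non-positive limit disables the corresponding check.
  static boolean timedOut(Attempt a, long nowMs, long taskTimeoutMs, long stuckTimeoutMs) {
    long limitMs = a.reported.get() ? taskTimeoutMs : stuckTimeoutMs;
    return limitMs > 0 && nowMs - a.lastSeenMs > limitMs;
  }

  public static void main(String[] args) {
    Attempt attempt = new Attempt(0L);
    // Never heartbeated and 1500 ms have passed with a 1000 ms stuck timeout.
    System.out.println(timedOut(attempt, 1500L, 10000L, 1000L));
    attempt.heartbeat(1600L);
    // After a heartbeat, the longer task timeout applies.
    System.out.println(timedOut(attempt, 3000L, 10000L, 1000L));
  }
}
```

With this shape, an attempt whose container never finishes starting can be failed without shortening the timeout for attempts that are running normally.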
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701405#comment-16701405 ]
Akira Ajisaka commented on MAPREDUCE-6190:
Thank you for clarifying. Would you fix the checkstyle warnings? I'm +1 if that is addressed.
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701409#comment-16701409 ] Zhaohui Xin commented on MAPREDUCE-6190: These are the only checkstyle warnings; I think we can ignore them. :D
{code:java}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:361: static final String TASK_STUCK_TIMEOUT_MS =:3: Redundant 'static' modifier. [RedundantModifier]
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:363: static final long DEFAULT_TASK_STUCK_TIMEOUT_MS =:3: Redundant 'static' modifier. [RedundantModifier]
{code}
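The two RedundantModifier warnings quoted in this comment come from the fact that MRJobConfig is a Java interface, and fields declared in an interface are implicitly public, static and final. The following is a minimal sketch of that language rule, not the patch itself: `ExampleJobConfig`, the property key string and the default value are hypothetical stand-ins chosen for illustration, and only the two constant names are taken from the checkstyle output.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an interface such as MRJobConfig.
// The key string and the default value below are illustrative assumptions.
interface ExampleJobConfig {
  // Written the way checkstyle flags it: 'static' (and 'final') are already implied.
  static final String TASK_STUCK_TIMEOUT_MS_VERBOSE = "example.task.stuck.timeout-ms";

  // Equivalent declarations without the redundant modifiers.
  String TASK_STUCK_TIMEOUT_MS = "example.task.stuck.timeout-ms";
  long DEFAULT_TASK_STUCK_TIMEOUT_MS = 0L;
}

class RedundantModifierDemo {
  // Returns true if the named field of ExampleJobConfig is public, static and final.
  static boolean isPublicStaticFinal(String fieldName) {
    try {
      Field field = ExampleJobConfig.class.getField(fieldName);
      int modifiers = field.getModifiers();
      return Modifier.isPublic(modifiers)
          && Modifier.isStatic(modifiers)
          && Modifier.isFinal(modifiers);
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("Unknown field: " + fieldName, e);
    }
  }

  public static void main(String[] args) {
    // Both spellings report the same modifiers via reflection.
    System.out.println(isPublicStaticFinal("TASK_STUCK_TIMEOUT_MS_VERBOSE"));
    System.out.println(isPublicStaticFinal("TASK_STUCK_TIMEOUT_MS"));
    System.out.println(isPublicStaticFinal("DEFAULT_TASK_STUCK_TIMEOUT_MS"));
  }
}
```

Under this reading, dropping the explicit modifiers would silence the warnings without changing behavior, since the compiled fields carry the same modifiers either way.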
[jira] [Updated] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated MAPREDUCE-6190: --- Attachment: MAPREDUCE-6190.005.patch
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701460#comment-16701460 ] Zhaohui Xin commented on MAPREDUCE-6190: (y)Agreed. I resubmitted new patch.:D
[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701460#comment-16701460 ] Zhaohui Xin edited comment on MAPREDUCE-6190 at 11/28/18 6:52 AM: -- (y) Agreed. I resubmitted new patch.:D was (Author: uranus): (y)Agreed. I resubmitted new patch.:D
[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701455#comment-16701455 ] Akira Ajisaka commented on MAPREDUCE-6190: -- Even if the warnings are not actual problems, let's fix them where possible and keep the number of warnings small. If there are too many warnings, it is hard to tell whether a given warning is a false positive.
[jira] [Created] (MAPREDUCE-7165) mapred-site.xml is misformatted in single node setup document
Akira Ajisaka created MAPREDUCE-7165: Summary: mapred-site.xml is misformatted in single node setup document Key: MAPREDUCE-7165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7165 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Reporter: Akira Ajisaka In https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node, there are two configuration tags in mapred-site.xml.
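The report does not say how the two configuration tags are arranged in the documented snippet, so the following is only a rough sketch of how such a file could be checked, using the JDK's built-in XML parser. The class and method names are hypothetical. The sketch assumes that a valid mapred-site.xml has exactly one `configuration` root element: two sibling roots make the file ill-formed XML, while a nested pair parses but yields a count of two.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Hadoop.
class ConfigurationTagCheck {
  // Returns the number of <configuration> elements in the given XML text,
  // or -1 if the text is not well-formed XML (for example, two root elements).
  static int countConfigurationElements(String xml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing them to stderr.
      builder.setErrorHandler(new DefaultHandler());
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getElementsByTagName("configuration").getLength();
    } catch (SAXException e) {
      return -1;
    } catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String single = "<configuration><property>"
        + "<name>mapreduce.framework.name</name><value>yarn</value>"
        + "</property></configuration>";
    String siblings = "<configuration></configuration><configuration></configuration>";
    String nested = "<configuration><configuration></configuration></configuration>";

    System.out.println(countConfigurationElements(single));
    System.out.println(countConfigurationElements(siblings));
    System.out.println(countConfigurationElements(nested));
  }
}
```

A result other than 1 would indicate the kind of misformatting described in the summary, whichever arrangement the documentation actually contains.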