[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700044#comment-16700044
 ] 

Akira Ajisaka commented on MAPREDUCE-6190:
--

{quote}This problem has existed in our cluster for a year and occurred about 
once a month. We finally found that a disk problem was causing very long 
container localization times, up to a few hours. We added a container start-up 
timeout parameter to actively fail tasks with problematic start-up.
{quote}
Nice catch. This approach seems good to me, and thanks [~uranus] for your 
patch. Some comments:
 * You need to document the new parameter in mapred-default.xml.
 * Can we make {{count}} an AtomicBoolean? A long guarded by synchronized 
methods seems like overkill (see the sketch after this list).
 * Would you make the newly added public methods package-private if possible?
 * Would you fix the checkstyle warnings?
 * Would you add {{@SuppressWarnings("unchecked")}} to the new test case?
 * Would you extend the task timeout to 1000ms or longer to avoid a task 
timeout in the new test case? If the timeout is 100ms and the sleep is also 
100ms, a task timeout can occur.
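
For illustration, a minimal sketch of the AtomicBoolean idea, assuming 
{{count}} only needs to record whether the start-up timeout has fired; the 
class and method names are placeholders, not the actual patch code:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

class LaunchTimeoutTracker {
  // Instead of a long guarded by synchronized methods, a single atomic
  // flag suffices when all we track is "has the timeout fired?".
  private final AtomicBoolean timedOut = new AtomicBoolean(false);

  // compareAndSet makes the false -> true transition race-free without a
  // monitor lock; it returns true only for the first caller.
  boolean markTimedOut() {
    return timedOut.compareAndSet(false, true);
  }

  boolean isTimedOut() {
    return timedOut.get();
  }
}
{code}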

Minor comment:
{code:java}
while (iterator.hasNext()) {
  Map.Entry entry = iterator.next();
  assertEquals(0, entry.getValue().count.longValue());
}
{code}
The while-loop can be replaced with a for-each statement.
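
For reference, a sketch of the for-each form; the map variable and its generic 
types here are assumed for illustration:
{code:java}
// taskInfoMap and TaskInfo are placeholder names for the map under test.
for (Map.Entry<String, TaskInfo> entry : taskInfoMap.entrySet()) {
  assertEquals(0, entry.getValue().count.longValue());
}
{code}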

> MR Job is stuck because of one mapper stuck in STARTING
> ---
>
> Key: MAPREDUCE-6190
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1
>Reporter: Ankit Malhotra
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch
>
>
> Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster 
> with map reduce jobs on YARN.
> We had a job stuck for hours because one of the mappers never started up 
> fully. Basically, the map task had 2 attempts, the first one failed and the 
> AM tried to schedule a second one and the second attempt was stuck on STATE: 
> STARTING, STATUS: NEW. A node never got assigned and the task along with the 
> job was stuck indefinitely.
> The AM logs had this being logged again and again:
> {code}
> 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_1408745633994_450952_02_003807
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption 
> successful attempt_1408745633994_450952_r_48_1000
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all 
> scheduled reduces:0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting 
> attempt_1408745633994_450952_r_50_1000
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
> completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 
> finalReduceMemLimit:1720320 netScheduledMapMem:2560 
> netScheduledReduceMem:1722880
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 
> ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 
> ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=0
> {code}
> On killing the task manually, the AM started up the task again, scheduled and 
> ran it successfully, completing the task and the job with it.
> Some quick code grepping led us here:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-app/2.3.0/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#397
> But we still don't quite understand why this happens once in a while and why 
> the job is suddenly OK once the stuck task is manually killed.
> Note: Other jobs succeed on the cluster while this job is stuck.

[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700049#comment-16700049
 ] 

Akira Ajisaka commented on MAPREDUCE-6190:
--

The test failure is not related to the patch. This failure is tracked by 
MAPREDUCE-7162.




[jira] [Commented] (MAPREDUCE-7162) MapReduce unit test is broken

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700051#comment-16700051
 ] 

Akira Ajisaka commented on MAPREDUCE-7162:
--

I think it's fine to add a flush just before {{out.writeBytes("\n")}}; a sketch 
of that ordering is below.
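
A minimal sketch of that ordering, assuming Avro's {{DatumWriter}} and 
{{Encoder}} writing into the {{DataOutputStream}} {{out}}; the variable names 
mirror the discussion and may differ from the actual EventWriter code:
{code:java}
// writer:  org.apache.avro.io.DatumWriter<Event>
// encoder: org.apache.avro.io.Encoder (JSON)
// out:     java.io.DataOutputStream
void writeEvent(Event wrapper) throws IOException {
  writer.write(wrapper, encoder); // encode the event as JSON into the buffer
  encoder.flush();                // push buffered JSON bytes to out first...
  out.writeBytes("\n");           // ...so the record separator comes last
}
{code}
If the flush happens after the newline instead, buffered JSON can land on the 
wrong side of the separator, which is how an unescaped CTRL-CHAR (code 10) 
ends up inside a JSON name in the stack trace below.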

> MapReduce unit test is broken
> -
>
> Key: MAPREDUCE-7162
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7162
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Critical
> Attachments: MAPREDUCE-7162.001.patch, MAPREDUCE-7162.002.patch
>
>
> A MapReduce unit test was broken by 
> https://issues.apache.org/jira/browse/MAPREDUCE-7158 .
> *I think we should keep the output data consistent to avoid corruption, so I 
> rolled back the previous code and attached a patch.*
> The broken test is 
> _org.apache.hadoop.mapreduce.jobhistory.TestEvents#testEvents._
> {code:java}
> org.codehaus.jackson.JsonParseException: Illegal unquoted character 
> ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in 
> name
> at [Source: java.io.DataInputStream@25618e91; line: 23, column: 418]
> at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
> at 
> org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
> at 
> org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1446)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.parseFieldName(Utf8StreamParser.java:1410)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1283)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
> at org.apache.avro.io.JsonDecoder.doArrayNext(JsonDecoder.java:367)
> at org.apache.avro.io.JsonDecoder.arrayNext(JsonDecoder.java:361)
> at org.apache.avro.io.ValidatingDecoder.arrayNext(ValidatingDecoder.java:189)
> at 
> org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:222)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at 
> org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:101)
> at 
> org.apache.hadoop.mapreduce.jobhistory.TestEvents.testEvents(TestEvents.java:177)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (MAPREDUCE-7162) MapReduce unit test is broken

2018-11-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700053#comment-16700053
 ] 

Hadoop QA commented on MAPREDUCE-7162:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
5s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-7162 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949635/MAPREDUCE-7162.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a31bd4297bae 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33e0df4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7543/testReport/ |
| Max. process+thread count | 1607 (vs. ulimit of 1) |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7543/console |

[jira] [Updated] (MAPREDUCE-7162) MapReduce unit test is broken

2018-11-27 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated MAPREDUCE-7162:
---
Attachment: MAPREDUCE-7162.003.patch




[jira] [Updated] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated MAPREDUCE-7162:
-
Summary: TestEvents#testEvents fails  (was: MapReduce unit test is broken)




[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700060#comment-16700060
 ] 

Zhaohui Xin commented on MAPREDUCE-7162:


[~ajisakaa], added new patch. :D




[jira] [Updated] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated MAPREDUCE-6190:
---
Attachment: MAPREDUCE-6190.003.patch




[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700150#comment-16700150
 ] 

Zhaohui Xin edited comment on MAPREDUCE-6190 at 11/27/18 10:00 AM:
---

Hi, [~ajisakaa], I added new patch. :D


was (Author: uranus):
Hi, [~ajisakaa], added new patch. :D




[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700150#comment-16700150
 ] 

Zhaohui Xin commented on MAPREDUCE-6190:


Hi, [~ajisakaa], added new patch. :D




[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700221#comment-16700221
 ] 

Hadoop QA commented on MAPREDUCE-7162:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
51s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-7162 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949640/MAPREDUCE-7162.003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4619835dbcef 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2730ead |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7544/testReport/ |
| Max. process+thread count | 1027 (vs. ulimit of 1) |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7544/console |

[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700229#comment-16700229
 ] 

Akira Ajisaka commented on MAPREDUCE-7162:
--

+1, committing this.




[jira] [Updated] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated MAPREDUCE-7162:
-
Component/s: test
 jobhistoryserver

> TestEvents#testEvents fails
> ---
>
> Key: MAPREDUCE-7162
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7162
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: MAPREDUCE-7162.001.patch, MAPREDUCE-7162.002.patch, 
> MAPREDUCE-7162.003.patch
>
>
> Mapreduce unit test is broken by 
> https://issues.apache.org/jira/browse/MAPREDUCE-7158 . 
> *I think we should keep the data consistent to avoid corruption when output, 
> so I roll back the previous code and attach the patch.*
> Broken location _is 
> org.apache.hadoop.mapreduce.jobhistory.TestEvents#testEvents._
> {code:java}
> org.codehaus.jackson.JsonParseException: Illegal unquoted character 
> ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in 
> name
> at [Source: java.io.DataInputStream@25618e91; line: 23, column: 418]
> at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
> at 
> org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
> at 
> org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1446)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.parseFieldName(Utf8StreamParser.java:1410)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1283)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
> at org.apache.avro.io.JsonDecoder.doArrayNext(JsonDecoder.java:367)
> at org.apache.avro.io.JsonDecoder.arrayNext(JsonDecoder.java:361)
> at org.apache.avro.io.ValidatingDecoder.arrayNext(ValidatingDecoder.java:189)
> at 
> org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:222)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at 
> org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:101)
> at 
> org.apache.hadoop.mapreduce.jobhistory.TestEvents.testEvents(TestEvents.java:177)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
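
To make the failure mode concrete, here is a standalone sketch (illustrative 
only, not the actual EventWriter code) that provokes the same Jackson error: 
a raw newline (CTRL-CHAR, code 10) inside a JSON name must be escaped, and 
Jackson 1.x rejects it exactly as in the trace above.

{code:java}
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;

public class UnescapedCtrlCharDemo {
  public static void main(String[] args) throws Exception {
    // The Java escape \n puts a real, unescaped newline into the JSON text,
    // which is illegal inside a JSON name.
    String badJson = "{\"bad\nname\": 1}";
    JsonParser parser = new JsonFactory().createJsonParser(badJson);
    while (parser.nextToken() != null) {
      // Throws JsonParseException: Illegal unquoted character
      // ((CTRL-CHAR, code 10)): has to be escaped using backslash
      // to be included in name
    }
  }
}
{code}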






[jira] [Updated] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated MAPREDUCE-7162:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.1
   3.3.0
   3.1.2
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-3.2, and branch-3.1. Thank you, [~uranus]!







[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700273#comment-16700273
 ] 

Hudson commented on MAPREDUCE-7162:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15508 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15508/])
MAPREDUCE-7162. TestEvents#testEvents fails. Contributed by Zhaohui Xin. 
(aajisaka: rev 1aad99a71813660b83628cacfed393d0b3a123cc)
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java








[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700256#comment-16700256
 ] 

Zhaohui Xin commented on MAPREDUCE-7162:


[~ajisakaa], thanks for reviewing! :D







[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700305#comment-16700305
 ] 

Hadoop QA commented on MAPREDUCE-6190:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} 
hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 5 new + 
548 unchanged - 2 fixed = 553 total (was 550) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
23s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 12s{color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.jobhistory.TestEvents |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-6190 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949649/MAPREDUCE-6190.003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 4d60f522dbc1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x8

[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-27 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700657#comment-16700657
 ] 

Kuhu Shukla commented on MAPREDUCE-7164:


Thoughts? [~jlowe], [~jeagles]

> FileOutputCommitter does not report progress while merging paths.
> -
>
> Key: MAPREDUCE-7164
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5, 2.9.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: MAPREDUCE-7164.001.patch
>
>
> In cases where the rename-and-merge path logic takes longer than usual, the 
> committer does not report progress, which can cause job failure. This 
> behavior was not present in Hadoop 1.x. This JIRA will fix it so that the 
> old 1.x behavior is restored.






[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700685#comment-16700685
 ] 

Wangda Tan commented on MAPREDUCE-7162:
---

Apologies for introducing the issue. Thanks [~uranus] and [~ajisakaa] for 
getting it resolved.







[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-11-27 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700687#comment-16700687
 ] 

Peter Bacsko commented on MAPREDUCE-7152:
-

OK, I can confirm that this is not a bug. Closing this ticket.

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> 
>
> Key: MAPREDUCE-7152
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7152-NMAdminEnvPOC_POC01.patch, 
> MAPREDUCE-7152-lazyEval_POC01.patch
>
>
> {{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default 
> in Hadoop (as part of {{mapreduce.admin.user.env}} and 
> {{yarn.app.mapreduce.am.user.env}}), and passed as an environment variable 
> from AM container to task containers in the container launch context.
> In cases where {{HADOOP_COMMON_HOME}} differs between the AM node and a task 
> node, tasks will fail to load the native library. A reliable way to fix this 
> is to add {{LD_LIBRARY_PATH}} to {{yarn.nodemanager.admin-env}} instead.
> Another approach is to perform a lazy evaluation of {{LD_LIBRARY_PATH}} on 
> the NM side.
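
A minimal sketch of the admin-env approach (an illustrative yarn-site.xml 
snippet, not a committed change; the {{MALLOC_ARENA_MAX}} entry shown is the 
stock default that the new entry is appended to):

{code:xml}
<!-- Illustrative sketch (yarn-site.xml on each NodeManager): because the
     value is resolved on the task node, $HADOOP_COMMON_HOME expands locally
     instead of being inherited verbatim from the AM's environment. -->
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
{code}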






[jira] [Updated] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-11-27 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated MAPREDUCE-7152:

Resolution: Not A Bug
Status: Resolved  (was: Patch Available)







[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-27 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701077#comment-16701077
 ] 

Jason Lowe commented on MAPREDUCE-7164:
---

Thanks for the patch!  I think it would be fine to downcast where necessary, 
with {{instanceof Progressable}} checks, skipping the progress update if the 
context is not progressable.  That way, if someone uses file output committer 
algorithm v1, which does _not_ have a progress indicator (since this occurs 
in the AM rather than in task attempts), it still does the right thing.  
Similarly, if something ends up calling the JobContext form of the 
constructor but passes a context that is Progressable, it also continues to 
do the right thing.  A simple utility function that takes the JobContext, 
does the instance check, and calls progress when possible would make this a 
lot cleaner, since there would be only one place that does the downcast (a 
sketch follows below).
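
A minimal sketch of such a utility, assuming progressable contexts implement 
{{org.apache.hadoop.util.Progressable}} (the helper name is illustrative, not 
from the patch):

{code:java}
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.util.Progressable;

final class ProgressUtil {  // illustrative name, not the actual patch
  private ProgressUtil() {
  }

  /**
   * Report progress if the given context supports it; otherwise do nothing.
   * Safe for both task-attempt contexts (which are Progressable) and plain
   * JobContexts (e.g. the v1 committer running in the AM).
   */
  static void maybeProgress(JobContext context) {
    if (context instanceof Progressable) {
      ((Progressable) context).progress();
    }
  }
}
{code}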








[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701278#comment-16701278
 ] 

Akira Ajisaka commented on MAPREDUCE-6190:
--

Thank you for the update. Some additional comments:

* The time unit of the new parameter is milliseconds, so the parameter name 
should include "ms".
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private 
Boolean}} for {{reported}} because it is accessible from 
{{TaskHeartbeatHandler}} without using the {{setReported()}} method (see the 
sketch after this list).
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to 
{{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?
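
A minimal sketch of what these suggestions amount to (class shape and names 
are assumptions for illustration, not the actual patch):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class TaskHeartbeatHandlerSketch {

  // ReportTime as an inner class: the enclosing handler can touch the
  // private field directly, so no synchronized setReported() is needed.
  private static class ReportTime {
    private final AtomicBoolean reported = new AtomicBoolean(false);
  }

  private final ConcurrentMap<String, ReportTime> runningAttempts =
      new ConcurrentHashMap<>();

  void progressing(String attemptId) {
    ReportTime time = runningAttempts.get(attemptId);
    if (time != null) {
      time.reported.set(true);  // immediately visible to other threads
    }
  }

  // Package-private, as suggested above.
  ConcurrentMap<String, ReportTime> getRunningAttempts() {
    return runningAttempts;
  }
}
{code}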

> MR Job is stuck because of one mapper stuck in STARTING
> ---
>
> Key: MAPREDUCE-6190
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6190
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.1
>Reporter: Ankit Malhotra
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: MAPREDUCE-6190.001.patch, MAPREDUCE-6190.002.patch, 
> MAPREDUCE-6190.003.patch
>
>
> Trying to figure out a weird issue we started seeing on our CDH5.1.0 cluster 
> with map reduce jobs on YARN.
> We had a job stuck for hours because one of the mappers never started up 
> fully. Basically, the map task had 2 attempts, the first one failed and the 
> AM tried to schedule a second one and the second attempt was stuck on STATE: 
> STARTING, STATUS: NEW. A node never got assigned and the task along with the 
> job was stuck indefinitely.
> The AM logs had this being logged again and again:
> {code}
> 2014-12-09 19:25:12,347 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_1408745633994_450952_02_003807
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce preemption 
> successful attempt_1408745633994_450952_r_48_1000
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all 
> scheduled reduces:0
> 2014-12-09 19:25:13,352 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting 
> attempt_1408745633994_450952_r_50_1000
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
> completedMapPercent 0.99968 totalMemLimit:1722880 finalMapMemLimit:2560 
> finalReduceMemLimit:1720320 netScheduledMapMem:2560 
> netScheduledReduceMem:1722880
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2014-12-09 19:25:13,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:77 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 
> ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,353 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:78 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:673 CompletedMaps:3124 CompletedReds:0 ContAlloc:4789 
> ContRel:798 HostLocal:2944 RackLocal:155
> 2014-12-09 19:25:14,359 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=0
> {code}
> On killing the task manually, the AM started up the task again, scheduled and 
> ran it successfully completing the task and the job with it.
> Some quick code grepping led us here:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-app/2.3.0/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#397
> But still dont quite understand why this would happen once in a while and why 
> the job would suddenly be ok once the stuck task is manually killed.
> Note: Other jobs succeed on the cluster while this job is stuck.




[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701278#comment-16701278
 ] 

Akira Ajisaka edited comment on MAPREDUCE-6190 at 11/28/18 1:50 AM:


Thank you for the update. Some additional comments:

* The time unit of the new parameter is millisecond, so the parameter name 
should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private 
Boolean}} for {{reported}} because it is accessible from 
{{TaskHeartbeatHandler}} without using {{setReported()}} method and that way 
{{reported}} is not synchronized.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to 
{{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?


was (Author: ajisakaa):
Thank you for the update. Some additional comments:

* The time unit of the new parameter is millisecond, so the parameter name 
should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private 
Boolean}} for {{reported}} because it is accessible from 
{{TaskHeartbeatHandler}} without using {{setReported()}} method and that way 
{{reported}} is not synchronized.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to 
{{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?


[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701278#comment-16701278
 ] 

Akira Ajisaka edited comment on MAPREDUCE-6190 at 11/28/18 1:50 AM:


Thank you for the update. Some additional comments:

* The time unit of the new parameter is millisecond, so the parameter name 
should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private 
Boolean}} for {{reported}} because it is accessible from 
{{TaskHeartbeatHandler}} without using {{setReported()}} method and that way 
{{reported}} is not synchronized.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to 
{{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?


was (Author: ajisakaa):
Thank you for the update. Some additional comments:

* The time unit of the new parameter is millisecond, so the parameter name 
should include ms.
* {{TaskHeartbeatHandler#getRunningAttempts}} can be package-private.
* I prefer using {{private final AtomicBoolean}} rather than {{private 
Boolean}} for {{reported}} because it is accessible from 
{{TaskHeartbeatHandler}} without using {{setReported()}} method.
* {{assertEquals(false, entry.getValue().isReported())}} can be simplified to 
{{assertFalse(entry.getValue().isReported())}}.
* Would you fix the checkstyle warnings?


[jira] [Updated] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated MAPREDUCE-6190:
---
Attachment: MAPREDUCE-6190.004.patch







[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701375#comment-16701375
 ] 

Zhaohui Xin commented on MAPREDUCE-6190:


[~ajisakaa], thanks for the review; all comments are resolved. I attached a new patch. :D







[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701378#comment-16701378
 ] 

Akira Ajisaka commented on MAPREDUCE-6190:
--

Is there any specific reason to use {{AtomicBoolean.compareAndSet(false, 
true)}}? I'm thinking {{AtomicBoolean.set(true)}} is sufficient. Otherwise I'm 
+1 pending Jenkins. Thanks!
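
For reference, a minimal illustration of the difference (an illustrative 
sketch, not code from the patch): {{compareAndSet(false, true)}} additionally 
tells the caller whether this call performed the false-to-true transition, 
which only matters if some side effect must happen exactly once.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class SetVsCompareAndSet {
  public static void main(String[] args) {
    // set(true): enough when callers only ever read the final value.
    AtomicBoolean reported = new AtomicBoolean(false);
    reported.set(true);

    // compareAndSet(false, true): returns true only for the call that
    // actually flipped the flag, enabling act-exactly-once logic.
    AtomicBoolean flag = new AtomicBoolean(false);
    System.out.println(flag.compareAndSet(false, true));  // true  (flipped)
    System.out.println(flag.compareAndSet(false, true));  // false (already set)
  }
}
{code}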



[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701384#comment-16701384
 ] 

Zhaohui Xin commented on MAPREDUCE-6190:


I think it is easier to understand if we update the report value explicitly when 
the task heartbeats.
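
A minimal sketch of that idea (names are hypothetical, not from the patch): refresh a timestamp whenever the task heartbeats, and let a periodic monitor flag attempts that have gone quiet.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the report value is refreshed explicitly from
// the heartbeat handler, so a monitor thread can detect attempts that
// stop (or never start) reporting.
class HeartbeatTracker {
  private final AtomicLong lastReportMs =
      new AtomicLong(System.currentTimeMillis());

  // Called on every task heartbeat.
  void onHeartbeat() {
    lastReportMs.set(System.currentTimeMillis());
  }

  // A periodic check can then fail attempts that exceeded the timeout.
  boolean isStuck(long timeoutMs) {
    return System.currentTimeMillis() - lastReportMs.get() > timeoutMs;
  }
}
{code}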



[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701399#comment-16701399
 ] 

Hadoop QA commented on MAPREDUCE-6190:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 46s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 2 new + 548 unchanged - 2 fixed = 550 total (was 550) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  0s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 23s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m  7s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-6190 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12949789/MAPREDUCE-6190.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 6f4b41fc8d8c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/pr

[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701384#comment-16701384
 ] 

Zhaohui Xin edited comment on MAPREDUCE-6190 at 11/28/18 5:11 AM:
--

I think it is easier to understand if we update the report value explicitly when 
the task heartbeats for the first time.


was (Author: uranus):
I think it is easier to understand to update the report value explicitly when 
the task heartbeat. 




[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701405#comment-16701405
 ] 

Akira Ajisaka commented on MAPREDUCE-6190:
--

Thank you for clarifying. Would you fix the checkstyle warnings? I'm +1 if that 
is addressed.




[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701409#comment-16701409
 ] 

Zhaohui Xin commented on MAPREDUCE-6190:


These are the only checkstyle warnings; I think we can ignore them. :D
{code:java}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:361:  static final String TASK_STUCK_TIMEOUT_MS =:3: Redundant 'static' modifier. [RedundantModifier]
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java:363:  static final long DEFAULT_TASK_STUCK_TIMEOUT_MS =:3: Redundant 'static' modifier. [RedundantModifier]
{code}
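
For context, fields declared in a Java interface are implicitly public, static, and final, which is why checkstyle's RedundantModifier rule flags the explicit {{static}}. A minimal illustration (the interface name and values below are placeholders, not the real constants):

{code:java}
public interface ModifierExample {
  // Explicit "static final" on an interface field triggers
  // checkstyle's RedundantModifier warning.
  static final String FLAGGED_CONSTANT = "placeholder";

  // Semantically identical declaration, no warning.
  String EQUIVALENT_CONSTANT = "placeholder";
}
{code}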
 



[jira] [Updated] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated MAPREDUCE-6190:
---
Attachment: MAPREDUCE-6190.005.patch




[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701460#comment-16701460
 ] 

Zhaohui Xin commented on MAPREDUCE-6190:


(y) Agreed. I resubmitted a new patch. :D




[jira] [Comment Edited] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701460#comment-16701460
 ] 

Zhaohui Xin edited comment on MAPREDUCE-6190 at 11/28/18 6:52 AM:
--

(y) Agreed. I resubmitted a new patch. :D


was (Author: uranus):
(y)Agreed. I resubmitted new patch.:D




[jira] [Commented] (MAPREDUCE-6190) MR Job is stuck because of one mapper stuck in STARTING

2018-11-27 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701455#comment-16701455
 ] 

Akira Ajisaka commented on MAPREDUCE-6190:
--

If the warnings are not actual problems, let's fix them where possible and keep 
the number of warnings small. If there are too many warnings, it is hard to 
tell whether a given warning is a false positive or not.




[jira] [Created] (MAPREDUCE-7165) mapred-site.xml is misformatted in single node setup document

2018-11-27 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created MAPREDUCE-7165:


 Summary: mapred-site.xml is misformatted in single node setup 
document
 Key: MAPREDUCE-7165
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7165
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Reporter: Akira Ajisaka


In 
https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node,
 there are two configuration tags in mapred-site.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org