[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2021-08-03 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7011:
--
Fix Version/s: 2.10.2

I cherry-picked this to 2.10, since we are seeing this there.

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Christopher Douglas
>Assignee: Christopher Douglas
>Priority: Trivial
> Fix For: 3.0.1, 2.10.2
>
> Attachments: MAPREDUCE-7011.000.patch
>
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.
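> For illustration, a minimal sketch (names are illustrative, not the actual
> Hadoop implementation) of the ancestor-permission walk that
> {{ClientDistributedCacheManager::isPublic}} relies on, and which the test's
> scratch directory must satisfy:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.permission.FsAction;
>
> class AncestorPermissionSketch {
>   // True only if every ancestor directory grants execute to "other";
>   // a single closed ancestor makes the dependency non-PUBLIC.
>   static boolean ancestorsWorldExecutable(FileSystem fs, Path p)
>       throws IOException {
>     for (Path cur = p.getParent(); cur != null; cur = cur.getParent()) {
>       FsAction other = fs.getFileStatus(cur).getPermission().getOtherAction();
>       if (!other.implies(FsAction.EXECUTE)) {
>         return false;
>       }
>     }
>     return true;
>   }
> }
> {code}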






[jira] [Reopened] (MAPREDUCE-7203) TestRuntimeEstimators fails intermittent

2021-07-07 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened MAPREDUCE-7203:
---

Sorry! I was in the wrong window!

> TestRuntimeEstimators fails intermittent
> 
>
> Key: MAPREDUCE-7203
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7203
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
>
> TestRuntimeEstimators fails intermittently.
> {code}
> [ERROR] 
> testExponentialEstimator(org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators)
>   Time elapsed: 9.637 s  <<< FAILURE!
> java.lang.AssertionError: We got the wrong number of successful speculations. 
> expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.coreTestEstimator(TestRuntimeEstimators.java:243)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator(TestRuntimeEstimators.java:257)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Updated] (MAPREDUCE-7203) TestRuntimeEstimators fails intermittent

2021-07-07 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7203:
--
Fix Version/s: (was: 3.3.2)
   (was: 3.2.3)
   (was: 2.10.2)
   (was: 3.4.0)

> TestRuntimeEstimators fails intermittent
> 
>
> Key: MAPREDUCE-7203
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7203
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TestRuntimeEstimators fails intermittently.
> {code}
> [ERROR] 
> testExponentialEstimator(org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators)
>   Time elapsed: 9.637 s  <<< FAILURE!
> java.lang.AssertionError: We got the wrong number of successful speculations. 
> expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.coreTestEstimator(TestRuntimeEstimators.java:243)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator(TestRuntimeEstimators.java:257)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Resolved] (MAPREDUCE-7203) TestRuntimeEstimators fails intermittent

2021-07-07 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved MAPREDUCE-7203.
---
Fix Version/s: 3.3.2
   3.2.3
   2.10.2
   3.4.0
   Resolution: Fixed

> TestRuntimeEstimators fails intermittent
> 
>
> Key: MAPREDUCE-7203
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7203
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
>
> TestRuntimeEstimators fails intermittently.
> {code}
> [ERROR] 
> testExponentialEstimator(org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators)
>   Time elapsed: 9.637 s  <<< FAILURE!
> java.lang.AssertionError: We got the wrong number of successful speculations. 
> expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.coreTestEstimator(TestRuntimeEstimators.java:243)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator(TestRuntimeEstimators.java:257)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-07-07 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376831#comment-17376831
 ] 

Eric Payne commented on MAPREDUCE-7353:
---

+1. Will commit now.
Thanks very much, [~BilwaST].

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch, MAPREDUCE-7353.002.patch
>
>
> Job fails as tasks fail due to too many fetch failures 
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
>   Line 74093: 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-07-07 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376772#comment-17376772
 ] 

Eric Payne commented on MAPREDUCE-7353:
---

Thanks a lot, [~BilwaST], for the patch update. The code and UT LGTM. I want to 
run a few more tests in my environment, but once I've done that I'll commit.

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch, MAPREDUCE-7353.002.patch
>
>
> Job fails as tasks fail due to too many fetch failures 
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-23 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368405#comment-17368405
 ] 

Eric Payne commented on MAPREDUCE-7353:
---

[~BilwaST], the changes LGTM. Would it be possible to add unit tests, perhaps 
to {{TestTaskAttempt}}?

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> Job fails as tasks fail due to too many fetch failures 
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_00_0
>  | 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-17 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364967#comment-17364967
 ] 

Eric Payne commented on MAPREDUCE-7353:
---

[~BilwaST], thanks for raising this. I have encountered a similar situation. 
This would be important to fix. I will try to look at this early next week. I 
appreciate your patience.

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> Job fails as tasks fail due to too many fetch failures 
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> 

[jira] [Comment Edited] (MAPREDUCE-7227) Fix job staging directory residual problem in a big yarn cluster composed of multiple independent hdfs clusters

2021-01-25 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271674#comment-17271674
 ] 

Eric Payne edited comment on MAPREDUCE-7227 at 1/25/21, 9:00 PM:
-

[~luoyuan], I'm sorry for the long delay.
bq. I set up two HDFS clusters, one named 'test-hdfs', another one named 
'alg-hdfs'; the test-hdfs cluster also runs YARN.
So, IIUC, there is one YARN instance, but 2 HDFS instances, and YARN can use 
either one? And then each nodemanager would be configured to talk to both HDFS 
namenodes?

I'm not certain YARN can support that. Each application in YARN can talk to 
another HDFS instance by specifying the full scheme to the namenode, -but I 
have not heard of the use case you have described.-
Is this setup a typical HDFS HA configuration?



was (Author: eepayne):
[~luoyuan], I'm sorry for the long delay.
bq. I set up two HDFS clusters, one named 'test-hdfs', another one named 
'alg-hdfs'; the test-hdfs cluster also runs YARN.
So, IIUC, there is one YARN instance, but 2 HDFS instances, and YARN can use 
either one? And then each nodemanager would be configured to talk to both HDFS 
namenodes?

I'm not certain YARN can support that. Each application in YARN can talk to 
another HDFS instance by specifying the full scheme to the namenode, but I have 
not heard of the use case you have described.


> Fix job staging directory residual problem in a big yarn cluster composed of 
> multiple independent hdfs clusters
> ---
>
> Key: MAPREDUCE-7227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7227
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 2.6.0, 2.7.0, 3.1.2
>Reporter: Yuan LUO
>Assignee: Yuan LUO
>Priority: Major
> Attachments: 1.png, 2.png, HADOOP-MAPREDUCE-7227.001.patch, 
> HADOOP-MAPREDUCE-7227.002.patch, HADOOP-MAPREDUCE-7227.003.patch, 
> HADOOP-MAPREDUCE-7227.004.patch, HADOOP-MAPREDUCE-7227.005.patch, 
> Process_Analysis.png
>
>
> Our YARN cluster is made up of several independent HDFS clusters, and the 
> 'fs.defaultFS' of every HDFS cluster is different. When a user submits a job 
> to the YARN cluster, if the 'fs.defaultFS' seen by the client and by the 
> NodeManager are inconsistent, the job staging dir can't be cleaned up by the 
> AppMaster, because in these conditions two job staging dirs are produced, one 
> by the client and one by the AppMaster. So we can modify the AppMaster to 
> create the job staging dir using the client's 'fs.defaultFS' (sketched below).
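> To make the proposed fix concrete, a hedged sketch; the config key 
> {{mapreduce.client.submit.defaultfs}} is hypothetical, invented here for 
> illustration, and this is not the attached patch:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> class StagingDirSketch {
>   // Qualify the staging dir against the *client's* default FS rather than
>   // the NodeManager-local fs.defaultFS, so client and AM agree on one dir.
>   static Path clientStagingDir(Configuration conf) throws Exception {
>     String clientFs = conf.get("mapreduce.client.submit.defaultfs", // hypothetical key
>         conf.get("fs.defaultFS"));
>     Path staging = new Path(conf.get("yarn.app.mapreduce.am.staging-dir",
>         "/tmp/hadoop-yarn/staging"));
>     FileSystem fs = new Path(clientFs).getFileSystem(conf);
>     return fs.makeQualified(staging);
>   }
> }
> {code}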






[jira] [Commented] (MAPREDUCE-7227) Fix job staging directory residual problem in a big yarn cluster composed of multiple independent hdfs clusters

2021-01-25 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271674#comment-17271674
 ] 

Eric Payne commented on MAPREDUCE-7227:
---

[~luoyuan], I'm sorry for the long delay.
bq. I set up two HDFS clusters, one named 'test-hdfs', another one named 
'alg-hdfs'; the test-hdfs cluster also runs YARN.
So, IIUC, there is one YARN instance, but 2 HDFS instances, and YARN can use 
either one? And then each nodemanager would be configured to talk to both HDFS 
namenodes?

I'm not certain YARN can support that. Each application in YARN can talk to 
another HDFS instance by specifying the full scheme to the namenode, but I have 
not heard of the use case you have described.


> Fix job staging directory residual problem in a big yarn cluster composed of 
> multiple independent hdfs clusters
> ---
>
> Key: MAPREDUCE-7227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7227
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 2.6.0, 2.7.0, 3.1.2
>Reporter: Yuan LUO
>Assignee: Yuan LUO
>Priority: Major
> Attachments: 1.png, 2.png, HADOOP-MAPREDUCE-7227.001.patch, 
> HADOOP-MAPREDUCE-7227.002.patch, HADOOP-MAPREDUCE-7227.003.patch, 
> HADOOP-MAPREDUCE-7227.004.patch, HADOOP-MAPREDUCE-7227.005.patch, 
> Process_Analysis.png
>
>
> Our YARN cluster is made up of several independent HDFS clusters, and the 
> 'fs.defaultFS' of every HDFS cluster is different. When a user submits a job 
> to the YARN cluster, if the 'fs.defaultFS' seen by the client and by the 
> NodeManager are inconsistent, the job staging dir can't be cleaned up by the 
> AppMaster, because in these conditions two job staging dirs are produced, one 
> by the client and one by the AppMaster. So we can modify the AppMaster to 
> create the job staging dir using the client's 'fs.defaultFS'.






[jira] [Commented] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268715#comment-17268715
 ] 

Eric Payne commented on MAPREDUCE-7314:
---

[~BilwaST], on what version are you seeing this?

> Job will hang if NM is restarted while its running
> --
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> This is due to three different reasons:
>  # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
>  # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
> kill the current attempt that is assigned to the container. That is because 
> the task attempt is not updated in the ContainerLauncherImpl#Container class.
>  # A container gets assigned to a task attempt even when the container has 
> stopped running, i.e. its container-completed event has already been 
> processed. This is because we add the reuse-container map to the allocated 
> list: makeRemoteRequest gets the same container in the allocation response, 
> whereas the RM has sent the same container in the finished-container list. To 
> avoid this we need to make sure the allocated list doesn't contain any 
> containers which are finished (see the sketch below).
> Test credits: [~Rajshree]
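> To illustrate point 3, a hedged sketch against the public YARN records API 
> (not the actual patch):
> {code:java}
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
> import org.apache.hadoop.yarn.api.records.Container;
> import org.apache.hadoop.yarn.api.records.ContainerId;
> import org.apache.hadoop.yarn.api.records.ContainerStatus;
>
> class PruneFinishedSketch {
>   // Drop allocated containers whose IDs the RM also reported as finished,
>   // so a completed container is never handed to a new task attempt.
>   static List<Container> pruneFinished(List<Container> allocated,
>       List<ContainerStatus> finished) {
>     Set<ContainerId> done = new HashSet<>();
>     for (ContainerStatus s : finished) {
>       done.add(s.getContainerId());
>     }
>     List<Container> usable = new ArrayList<>();
>     for (Container c : allocated) {
>       if (!done.contains(c.getId())) {
>         usable.add(c);
>       }
>     }
>     return usable;
>   }
> }
> {code}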






[jira] [Updated] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-27 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7277:
--
Fix Version/s: 3.4.0
   2.10.1
   3.2.2
   3.1.4
   2.9.3
   3.3.0
   2.8.6
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~jeagles]. I've committed to 2.8, 2.9, 2.10, 3.1, 3.2 and trunk

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Fix For: 2.8.6, 3.3.0, 2.9.3, 3.1.4, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch, 
> MAPREDUCE-7277.002.patch, MAPREDUCE-7277.003.patch, 
> MAPREDUCE-7277.004-branch-2.10.patch, MAPREDUCE-7277.004.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-27 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094010#comment-17094010
 ] 

Eric Payne commented on MAPREDUCE-7277:
---

[~jeagles], there are a couple of findbugs warnings. Should we be concerned?

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch, 
> MAPREDUCE-7277.002.patch, MAPREDUCE-7277.003.patch, 
> MAPREDUCE-7277.004-branch-2.10.patch, MAPREDUCE-7277.004.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Comment Edited] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-27 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093910#comment-17093910
 ] 

Eric Payne edited comment on MAPREDUCE-7277 at 4/27/20, 8:30 PM:
-

[~jeagles], we want this backported to 2.8, correct? I think patterns such as 
the following won't build in 2.10 because LOG.debug(...) expects the second 
parameter to be of type Throwable.
{code}
LOG.debug("IndexCache HIT: MapId {} found", mapId);
{code}
How would you like to proceed?


was (Author: eepayne):
[~jeagles], we want this backported to 2.10, correct? I think patterns such as 
the following won't build in 2.10 because LOG.debug(...) expects the second 
parameter to be of type Throwable.
{code}
LOG.debug("IndexCache HIT: MapId {} found", mapId);
{code}
How would you like to proceed?

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch, 
> MAPREDUCE-7277.002.patch, MAPREDUCE-7277.003.patch, MAPREDUCE-7277.004.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-27 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093910#comment-17093910
 ] 

Eric Payne commented on MAPREDUCE-7277:
---

[~jeagles], we want this backported to 2.10, correct? I think patterns such as 
the following won't build in 2.10 because LOG.debug(...) expects the second 
parameter to be of type Throwable.
{code}
LOG.debug("IndexCache HIT: MapId {} found", mapId);
{code}
How would you like to proceed?
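
For reference, a branch-2-compatible form would presumably need a guarded 
concatenation, e.g.:
{code:java}
// commons-logging has no {} placeholders, so guard and concatenate instead
if (LOG.isDebugEnabled()) {
  LOG.debug("IndexCache HIT: MapId " + mapId + " found");
}
{code}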

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch, 
> MAPREDUCE-7277.002.patch, MAPREDUCE-7277.003.patch, MAPREDUCE-7277.004.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-27 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093847#comment-17093847
 ] 

Eric Payne commented on MAPREDUCE-7277:
---

Thanks [~jeagles] for your explanations, comments, and updating the patch.

+1 from me.

I will commit this afternoon.

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch, 
> MAPREDUCE-7277.002.patch, MAPREDUCE-7277.003.patch, MAPREDUCE-7277.004.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-23 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090762#comment-17090762
 ] 

Eric Payne commented on MAPREDUCE-7277:
---

bq. In the following code, if ever mapId is in queue but not in cache, I think 
totalMemoryUsed could still be out of sync
I think I've convinced myself that this risk is essentially nil, since the 
{{mapId}}'s {{IndexInfo}} will not be in the cache if {{mapId}} is not in the 
queue.

I have an additional question, however.
- {{IndexCache#readIndexFileToCache}} / {{IndexCache#removeMap}}:
-- Is it possible for {{removeMap(mapId)}} to get called after 
{{readIndexFileToCache}} adds {{mapId}}'s IndexInfo to the cache but before it 
adds {{mapId}} to the queue? If so, the following code will leave its IndexInfo 
in the cache (and in the queue after {{readIndexFileToCache}} switches back in):
{code:title=IndexCache#removeMap}
if (!queue.remove(mapId)) {
  LOG.debug("Map ID {} not found in queue", mapId);
  return;
}
{code}
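
Spelled out, the interleaving in question would be (a hypothetical trace, 
assuming {{cache.put}} precedes {{queue.add}} in {{readIndexFileToCache}}):
{code:java}
// T1 (readIndexFileToCache): cache.put(mapId, info)  // info now cached
// T2 (removeMap):            queue.remove(mapId)     // fails: not queued yet
// T2:                        returns early, info left in the cache
// T1:                        queue.add(mapId)        // info now orphaned until
//                            freeIndexInformation() evicts it under pressure
{code}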


> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-22 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090063#comment-17090063
 ] 

Eric Payne commented on MAPREDUCE-7277:
---

Thanks a lot [~jeagles] for raising this issue and providing a patch. I am 
still going through the test code, but here are my thoughts so far:

- IndexCache#readIndexFileToCache
 -- Why is {{checkTotalMemoryUsed()}} called in the finally block? The return 
value is not checked and AFAICT, it doesn't have any side effects.
 - IndexCache#removeMap
 -- In the following code, if {{mapId}} isn't in {{queue}}, does it 
necessarily follow that it is not in {{cache}}? I think the answer is yes, 
right? It only gets put in the {{queue}} once it's in the {{cache}}.
{code:java}
  public void removeMap(String mapId) throws IOException {
if (!queue.remove(mapId)) {
  LOG.debug("Map ID {} not found in queue", mapId);
  return;
}
 ...
  }
{code}

 - IndexCache#freeIndexInformation:
 -- In the following code, if ever {{mapId}} is in {{queue}} but not in 
{{cache}}, I think {{totalMemoryUsed}} could still be out of sync, because by 
the time freeIndexInformation is called, {{mapId}}'s indexinfo size should have 
already been added to {{totalMemoryUsed}}. But that should never happen, right? 
{{cache}} gets updated first and then {{queue}}, so if {{mapId}} is in 
{{queue}}, it should also be in {{cache}}.
{code:java}
  private synchronized void freeIndexInformation() throws IOException {
while (totalMemoryUsed.get() > totalMemoryAllowed) {
  String mapId = queue.remove();
  IndexInformation info = cache.remove(mapId);
  if (info == null) {
LOG.warn("Map ID " + mapId + " not found in cache");
continue;
  }
  ...
}
  }
{code}

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch
>
>
> It was observed recently in a nodemanager OOM that the memory was filled with 
> SpillRecords. However, the IndexCache was only 15% full (1.5MB used on a 10MB 
> configured cache size). In particular, it was noted that the bookkeeping 
> variable totalMemoryUsed was out of sync with the contents of the cache, 
> showing 96% full, thereby drastically reducing the effectiveness of the cache.






[jira] [Comment Edited] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-13 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082631#comment-17082631
 ] 

Eric Payne edited comment on MAPREDUCE-7272 at 4/13/20, 8:37 PM:
-

I committed to trunk, branch-3.3, branch-3.2, branch-3.1, branch-2.10, 
branch-2.9, and branch-2.8


was (Author: eepayne):
I committed to trunk, branch-3.3, branch-3.2, branch-3.1, branch-2.10, and 
branch-2.8

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 2.8.6, 3.3.0, 2.9.3, 3.1.4, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272-branch-2.10.004.patch, MAPREDUCE-7272.001.patch, 
> MAPREDUCE-7272.002.patch, MAPREDUCE-7272.003.patch, MAPREDUCE-7272.004.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} causes bloating in log files. 
> On every call, the listener uses {{LOG.info()}} to print out the progress of 
> the {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to have a log print of task progress, it is 
> still excessive to log the progress in every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: means that 
> the listener will log the progress every 2 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log at whichever of {{delta.threshold}} and 
> {{wait.interval-seconds}} is reached first. 
>    Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.
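> A minimal sketch of the throttling rule described above (class, field, and 
> method names are illustrative, not the patch's actual code):
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> class ProgressLogThrottleSketch {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(ProgressLogThrottleSketch.class);
>   private double lastLoggedProgress = 0.0;
>   private long lastLogTimeMs = 0L;
>
>   // Log when the progress delta or the elapsed wait crosses its limit;
>   // DEBUG overrides both flags and logs every update, per the description.
>   synchronized void maybeLogProgress(String attemptId, double progress,
>       double deltaThreshold, long waitIntervalMs) {
>     long now = System.currentTimeMillis();
>     boolean deltaReached = progress - lastLoggedProgress >= deltaThreshold;
>     boolean waitedEnough = now - lastLogTimeMs >= waitIntervalMs;
>     if (LOG.isDebugEnabled() || deltaReached || waitedEnough) {
>       LOG.info("Progress of TaskAttempt " + attemptId + " is : " + progress);
>       lastLoggedProgress = progress;
>       lastLogTimeMs = now;
>     }
>   }
> }
> {code}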






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-13 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7272:
--
Fix Version/s: 3.4.0
   2.9.3

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 2.8.6, 3.3.0, 2.9.3, 3.1.4, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272-branch-2.10.004.patch, MAPREDUCE-7272.001.patch, 
> MAPREDUCE-7272.002.patch, MAPREDUCE-7272.003.patch, MAPREDUCE-7272.004.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} causes bloating in log files. 
> On every call, the listener uses {{LOG.info()}} to print out the progress of 
> the {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> concluded that while it is helpful to log task progress, logging it on 
> every update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: means that 
> the listener will log the progress every 2 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log when whichever of {{delta.threshold}} and 
> {{wait.interval-seconds}} is reached first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-13 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7272:
--
Fix Version/s: 2.10.1
   3.2.2
   3.1.4
   3.3.0
   2.8.6
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I committed to trunk, branch-3.3, branch-3.2, branch-3.1, branch-2.10, and 
branch-2.8

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 2.8.6, 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272-branch-2.10.004.patch, MAPREDUCE-7272.001.patch, 
> MAPREDUCE-7272.002.patch, MAPREDUCE-7272.003.patch, MAPREDUCE-7272.004.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> concluded that while it is helpful to log task progress, logging it on 
> every update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: means that 
> the listener will log the progress every 2 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log when whichever of {{delta.threshold}} and 
> {{wait.interval-seconds}} is reached first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-13 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082512#comment-17082512
 ] 

Eric Payne commented on MAPREDUCE-7272:
---

+1. [~ahussein], thanks for providing fixes for this problem.
[~ebadger], do you want me to commit this?

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272-branch-2.10.004.patch, MAPREDUCE-7272.001.patch, 
> MAPREDUCE-7272.002.patch, MAPREDUCE-7272.003.patch, MAPREDUCE-7272.004.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> concluded that while it is helpful to log task progress, logging it on 
> every update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: means that 
> the listener will log the progress every 2 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log when whichever of {{delta.threshold}} and 
> {{wait.interval-seconds}} is reached first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080635#comment-17080635
 ] 

Eric Payne commented on MAPREDUCE-7272:
---

Thanks for the new patch [~ahussein]!

It looks like findbugs is still complaining about {{ConcurrentHashMap may not 
be atomic}}
{code:title=TaskAttemptListenerImpl#statusUpdate:379}
if (logPair == null) {
  logPair = new TaskProgressLogPair(taskAttemptID);
  taskAttemptLogProgressStamps.put(taskAttemptID, logPair);
}
{code}
Maybe need to synchronize around this code?
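
For reference, one way to make that check-then-put atomic without an explicit 
synchronized block is {{ConcurrentHashMap#putIfAbsent}} (a sketch against the 
snippet quoted above, not a reviewed patch):

{code:java}
// Sketch only: putIfAbsent makes the insert atomic, so two handler
// threads racing on the same attempt cannot both install a fresh pair.
TaskProgressLogPair logPair = taskAttemptLogProgressStamps.get(taskAttemptID);
if (logPair == null) {
  TaskProgressLogPair fresh = new TaskProgressLogPair(taskAttemptID);
  logPair = taskAttemptLogProgressStamps.putIfAbsent(taskAttemptID, fresh);
  if (logPair == null) {
    logPair = fresh; // our insert won the race
  }
}
{code}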

Also, there are 2 additional findbugs warnings about the TaskAttemptID type. 
Not sure why findbugs is complaining about that.

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272.001.patch, MAPREDUCE-7272.002.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> concluded that while it is helpful to log task progress, logging it on 
> every update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: means that 
> the listener will log the progress every 2 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log when whichever of {{delta.threshold}} and 
> {{wait.interval-seconds}} is reached first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080031#comment-17080031
 ] 

Eric Payne commented on MAPREDUCE-7272:
---

{quote}
{noformat}
if (LOG.isDebugEnabled()) {
{noformat}
I believe this is unnecessary. Pretty sure the {{log.debug}} will only log if 
debug is enabled. 
{quote}
[~ebadger], while this is true, the convention is to surround the {{LOG.debug}} 
with {{if (LOG.isDebugEnabled())}} to avoid the performance hit from 
constructing the string only to throw it away if debugging is disabled.
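
For illustration, the guarded form next to the parameterized form (sketch 
only, reusing the variables from the snippet quoted in the description 
below):

{code:java}
// The guard skips the string concatenation entirely when DEBUG is off:
if (LOG.isDebugEnabled()) {
  LOG.debug("Progress of TaskAttempt " + taskAttemptID + " is : "
      + taskStatus.getProgress());
}

// SLF4J-style placeholders defer the formatting until the level check
// passes, so the explicit guard is only needed when computing the
// arguments themselves is expensive:
LOG.debug("Progress of TaskAttempt {} is : {}", taskAttemptID,
    taskStatus.getProgress());
{code}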

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> concluded that while it is helpful to log task progress, logging it on 
> every update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: means that 
> the listener will log the progress every 3 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log when whichever of {{delta.threshold}} and 
> {{wait.delta.time}} is reached first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080012#comment-17080012
 ] 

Eric Payne commented on MAPREDUCE-7272:
---

Thanks [~ahussein] for reporting this issue and providing the patches.

Please address the findbugs warnings from branch-2.10 build.

Also, I have a few minor comments:
- TaskAttemptListenerImpl#ConcurrentHashMap:
-- I think the all-caps naming standard is reserved for static final 
constants. I don't think TASK_ATTEMPT_PROGRESS_LOG_STAMPS is a constant.
-- NIT: Comment for static class TaskAttemptProgressLogPair: 'and the progress' 
should be on the same line
- MRJobConfUtil#convertTaskProgressToFactor
-- NIT: redundant "the" -- "the the"


> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> concluded that while it is helpful to log task progress, logging it on 
> every update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: means that 
> the listener will log the progress every 3 minutes. This is helpful for 
> long-running tasks that take a long time to reach the delta threshold. 
> Default is 1 minute.
> The listener will log when whichever of {{delta.threshold}} and 
> {{wait.delta.time}} is reached first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-29 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7079:
--
Fix Version/s: 2.10.1
   3.2.2
   3.1.4
   3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-3.2, branch-3.1, and branch-2.10.
Thanks [~ahussein].

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch, MAPREDUCE-7079.010.patch
>
>
> {{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. 
> The problem is due to an incorrect while condition that evaluates to false 
> on the first iteration of the loop.
> {code:java}
>  scheduledExecutor.shutdown();
>   boolean interrupted = false;
>   long currentTime = System.currentTimeMillis();
>   while (!scheduledExecutor.isShutdown()
>   && System.currentTimeMillis() > currentTime + 1000l && 
> !interrupted) {
> try {
>   Thread.sleep(20);
> } catch (InterruptedException e) {
>   interrupted = true;
> }
>   }
> {code}
> The expression "{{System.currentTimeMillis() > currentTime + 1000L}}" is 
> false because currentTime was just initialized with 
> {{System.currentTimeMillis()}}. As a result, the thread won't wait until 
> the executor is terminated. Instead, it will force a shutdown immediately. 
> (A corrected sketch of the intended wait follows the quoted description 
> below.)
> *TestMRIntermediateDataEncryption is failing in precommit builds*
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}
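
For reference, a minimal sketch of the intended wait described above (the 
committed patch, quoted in a later comment on this thread, polls with 
{{awaitTermination}} instead). Two assumptions here: the comparison is 
flipped to "<", and the check uses {{isTerminated()}} rather than 
{{isShutdown()}}, since {{isShutdown()}} is already true right after 
{{shutdown()}} is called:

{code:java}
scheduledExecutor.shutdown();
boolean interrupted = false;
long deadline = System.currentTimeMillis() + 1000L; // assumed ~1s budget
while (!scheduledExecutor.isTerminated()
    && System.currentTimeMillis() < deadline && !interrupted) {
  try {
    Thread.sleep(20);
  } catch (InterruptedException e) {
    interrupted = true;
  }
}
{code}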



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-28 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025412#comment-17025412
 ] 

Eric Payne commented on MAPREDUCE-7079:
---

It backports cleanly to branch-2.10.
I'll wait for additional comments and then, if no objections, commit tomorrow.

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch, MAPREDUCE-7079.010.patch
>
>
> {{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. 
> The problem is due to an incorrect while condition that evaluates to false 
> on the first iteration of the loop.
> {code:java}
>  scheduledExecutor.shutdown();
>   boolean interrupted = false;
>   long currentTime = System.currentTimeMillis();
>   while (!scheduledExecutor.isShutdown()
>   && System.currentTimeMillis() > currentTime + 1000l && 
> !interrupted) {
> try {
>   Thread.sleep(20);
> } catch (InterruptedException e) {
>   interrupted = true;
> }
>   }
> {code}
> The expression "{{System.currentTimeMillis() > currentTime + 1000L}}" is 
> false because currentTime was just initialized with 
> {{System.currentTimeMillis()}}. As a result, the thread won't wait until 
> the executor is terminated. Instead, it will force a shutdown immediately.
> *TestMRIntermediateDataEncryption is failing in precommit builds*
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-28 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025405#comment-17025405
 ] 

Eric Payne commented on MAPREDUCE-7079:
---

[~ahussein],
+1. Latest patch LGTM.
I assume we want to pull this back to branch-2.10. I will check to see if it 
comes back cleanly.

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch, MAPREDUCE-7079.010.patch
>
>
> {{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. 
> The problem is due to an incorrect while condition that evaluates to false 
> on the first iteration of the loop.
> {code:java}
>  scheduledExecutor.shutdown();
>   boolean interrupted = false;
>   long currentTime = System.currentTimeMillis();
>   while (!scheduledExecutor.isShutdown()
>   && System.currentTimeMillis() > currentTime + 1000l && 
> !interrupted) {
> try {
>   Thread.sleep(20);
> } catch (InterruptedException e) {
>   interrupted = true;
> }
>   }
> {code}
> The expression "{{System.currentTimeMillis() > currentTime + 1000L}}" is 
> false because currentTime was just initialized with 
> {{System.currentTimeMillis()}}. As a result, the thread won't wait until 
> the executor is terminated. Instead, it will force a shutdown immediately.
> *TestMRIntermediateDataEncryption is failing in precommit builds*
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-24 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023281#comment-17023281
 ] 

Eric Payne commented on MAPREDUCE-7079:
---

Thanks [~ahussein] for providing this patch! Just a couple of comments:

{{TestMRIntermediateDataEncryption}}:
- testTitle is not used
- Is FORCE_JVM_SECURITY_EGD necessary? It's always 'true' and can't be 
overridden.
- in {{runMergeTest}}: Not sure why it's necessary to catch and re-throw the 
same exception. The log message seems unnecessary since the entire exception 
should be printed at the top level. Can the catch and re-throw just be removed?
- I'm okay with adding the workaround for the entry problem. Can you please 
provide the link to the JIRA that addresses the root cause?


> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch
>
>
> {{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. 
> The problem is due to an incorrect while condition that evaluates to false 
> on the first iteration of the loop.
> {code:java}
>  scheduledExecutor.shutdown();
>   boolean interrupted = false;
>   long currentTime = System.currentTimeMillis();
>   while (!scheduledExecutor.isShutdown()
>   && System.currentTimeMillis() > currentTime + 1000l && 
> !interrupted) {
> try {
>   Thread.sleep(20);
> } catch (InterruptedException e) {
>   interrupted = true;
> }
>   }
> {code}
> The expression "{{System.currentTimeMillis() > currentTime + 1000L}}" is 
> false because currentTime was just initialized with 
> {{System.currentTimeMillis()}}. As a result, the thread won't wait until 
> the executor is terminated. Instead, it will force a shutdown immediately.
> *TestMRIntermediateDataEncryption is failing in precommit builds*
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-22 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021306#comment-17021306
 ] 

Eric Payne commented on MAPREDUCE-7079:
---

bq. 1000l is 1000 followed by "l".
Thanks [~ahussein], I missed that. I'll continue reviewing.

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-15 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016276#comment-17016276
 ] 

Eric Payne commented on MAPREDUCE-7079:
---

Hi [~ahussein]. I'm still working my way through the comments above and trying 
to understand the code. However, I have one comment to start with.

In {{JobHistory}}, the old code looks like it loops for about 10 seconds before 
the timeout kicks in (unless the shutdown completes or it's interrupted):
{code:title=JobHistory#serviceStop}
  while (!scheduledExecutor.isShutdown()
  && System.currentTimeMillis() > currentTime + 1000l && !interrupted) {
try {
  Thread.sleep(20);
{code}
However, if I'm reading this right, the new code has a timeout of 1 second:
{code:title=JobHistory#serviceStop}
  int retryCnt = 50;
  try {
while (!scheduledExecutor.awaitTermination(20,
TimeUnit.MILLISECONDS)) {
  if (--retryCnt == 0) {
{code}
{{20ms timeout X 50 retries = 1000 ms}}
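
Illustrative arithmetic only (assuming the 20 ms poll interval stays): the 
retry count follows directly from whatever total budget is intended.

{code:java}
long budgetMs = 10_000L; // assumed target budget; adjust to the intent
long pollMs = 20L;       // poll interval from the new code above
int retryCnt = (int) (budgetMs / pollMs); // 500 retries for a 10s budget
{code}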

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7227) Fix job staging directory residual problem in a big yarn cluster composed of multiple independent hdfs clusters

2019-08-22 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913358#comment-16913358
 ] 

Eric Payne commented on MAPREDUCE-7227:
---

[~luoyuan], I am having trouble reproducing this issue. Can you provide 
detailed steps along with config details?

Here is what I did and the results of my test:
- Ran a sleep job on gateway for cluster1, but added the following property to 
the command line of the sleep job:
-- {{-Dfs.defaultFS='hdfs://cluster2-nn.mycompany.com'}}
- I performed a file listing in both 
{{hdfs://cluster1-nn.mycompany.com/user/me/.staging}} and 
{{hdfs://cluster2-nn.mycompany.com/user/me/.staging}}
- The sleep job ran on cluster1 and the job staging dir was in cluster2's 
{{/user/me/.staging/}}
- After the job completed, the job staging dir was gone from cluster2's 
{{/user/me/.staging}}

Any clarification would be appreciated.

> Fix job staging directory residual problem in a big yarn cluster composed of 
> multiple independent hdfs clusters
> ---
>
> Key: MAPREDUCE-7227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7227
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 2.6.0, 2.7.0, 3.1.2
>Reporter: Yuan LUO
>Assignee: Yuan LUO
>Priority: Major
> Attachments: HADOOP-MAPREDUCE-7227.001.patch, 
> HADOOP-MAPREDUCE-7227.002.patch, HADOOP-MAPREDUCE-7227.003.patch, 
> HADOOP-MAPREDUCE-7227.004.patch
>
>
> Our yarn cluster is made up of several independent hdfs clusters, and the 
> 'default.FS' is different in each hdfs cluster. When a user submits a job 
> to the yarn cluster, and the 'default.FS' between the client and the 
> nodemanager is inconsistent, the job staging dir can't be cleaned up by 
> the AppMaster, because in our setup the client and the appmaster each 
> produce their own job staging dir. So we can modify the AppMaster to use 
> the client's 'default.FS' to create the job staging dir.
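
For illustration, why the two staging dirs can land on different clusters: a 
path with no scheme resolves against {{fs.defaultFS}}, so a client and an 
AppMaster configured with different defaults qualify the same staging path 
against different namenodes. A sketch with core FileSystem APIs only (the 
host name and job id are made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingDirDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://cluster1-nn.example.com");
    Path staging = new Path("/user/me/.staging/job_0001"); // no scheme
    FileSystem fs = staging.getFileSystem(conf); // resolved via fs.defaultFS
    // With cluster2's default set instead, the same call would qualify
    // the path against cluster2's namenode.
    System.out.println(fs.makeQualified(staging));
    // -> hdfs://cluster1-nn.example.com/user/me/.staging/job_0001
  }
}
{code}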



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7227) Fix job staging directory residual problem in a big yarn cluster composed of multiple independent hdfs clusters

2019-08-09 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903892#comment-16903892
 ] 

Eric Payne commented on MAPREDUCE-7227:
---

[~luoyuan], thank you for raising this issue and bringing it to our attention.

The fix in patch 004 doesn't look quite right to me. I feel that there should 
be a way to solve this within the MRAppMaster. Please give me some time to 
investigate.

> Fix job staging directory residual problem in a big yarn cluster composed of 
> multiple independent hdfs clusters
> ---
>
> Key: MAPREDUCE-7227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7227
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 2.6.0, 2.7.0, 3.1.2
>Reporter: Yuan LUO
>Assignee: Yuan LUO
>Priority: Major
> Attachments: HADOOP-MAPREDUCE-7227.001.patch, 
> HADOOP-MAPREDUCE-7227.002.patch, HADOOP-MAPREDUCE-7227.003.patch, 
> HADOOP-MAPREDUCE-7227.004.patch
>
>
> Our yarn cluster is made up of several independent hdfs clusters, and the 
> 'default.FS' is different in each hdfs cluster. When a user submits a job 
> to the yarn cluster, and the 'default.FS' between the client and the 
> nodemanager is inconsistent, the job staging dir can't be cleaned up by 
> the AppMaster, because in our setup the client and the appmaster each 
> produce their own job staging dir. So we can modify the AppMaster to use 
> the client's 'default.FS' to create the job staging dir.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-3801) org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator fails intermittently

2018-09-18 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-3801:
--
   Resolution: Fixed
Fix Version/s: 2.8.6
   3.1.2
   2.9.2
   3.0.4
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently
> --
>
> Key: MAPREDUCE-3801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.0.4, 2.9.2, 3.1.2, 2.8.6
>
> Attachments: MAPREDUCE-3801.001.patch, 
> TEST-org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.xml, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators-output.txt, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.txt
>
>
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-3801) org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator fails intermittently

2018-09-18 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619665#comment-16619665
 ] 

Eric Payne commented on MAPREDUCE-3801:
---

Thanks for the patch [~jlowe].

+1

Will commit shortly

> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently
> --
>
> Key: MAPREDUCE-3801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-3801.001.patch, 
> TEST-org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.xml, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators-output.txt, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.txt
>
>
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367455#comment-16367455
 ] 

Eric Payne edited comment on MAPREDUCE-7053 at 5/7/18 1:15 PM:
---

Thanks [~jlowe].

I committed MAPREDUCE-7053.001.patch to trunk, and cherry-picked to branch-3.1, 
branch-3.0, and -branch-3.0.1-.
 I committed MAPREDUCE-7053-branch-2.001.patch to branch-2, branch-2.9 and 
branch-2.8


was (Author: eepayne):
Thanks [~jlowe].

I committed MAPREDUCE-7053.001.patch to trunk, and cherry-picked to branch-3.1, 
branch-3.0, and branch-3.0.1.
I committed MAPREDUCE-7053-branch-2.001.patch to branch-2, branch-2.9 and 
branch-2.8

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.3
>
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after the task has been 
> removed from the list of known tasks but before the thread dump signal 
> arrives then the task can exit with a "org.apache.hadoop.mapred.Task: Parent 
> died." message and no thread dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465895#comment-16465895
 ] 

Eric Payne commented on MAPREDUCE-7053:
---

bq. Thanks for the work here. I noticed that you reverted it from 3.0.2, but 
per your comment above, it's in branch-3.0.1.
[~yzhangal], it was reverted from branch-3.0.1 as well. Sorry about the 
confusion.


> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.3
>
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after the task has been 
> removed from the list of known tasks but before the thread dump signal 
> arrives then the task can exit with a "org.apache.hadoop.mapred.Task: Parent 
> died." message and no thread dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-02-16 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7053:
--
   Resolution: Fixed
Fix Version/s: 3.0.2
   2.8.4
   2.9.1
   2.10.0
   3.1.0
   Status: Resolved  (was: Patch Available)

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after the task has been 
> removed from the list of known tasks but before the thread dump signal 
> arrives then the task can exit with a "org.apache.hadoop.mapred.Task: Parent 
> died." message and no thread dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-02-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367455#comment-16367455
 ] 

Eric Payne commented on MAPREDUCE-7053:
---

Thanks [~jlowe].

I committed MAPREDUCE-7053.001.patch to trunk, and cherry-picked to branch-3.1, 
branch-3.0, and branch-3.0.1.
I committed MAPREDUCE-7053-branch-2.001.patch to branch-2, branch-2.9 and 
branch-2.8

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after the task has been 
> removed from the list of known tasks but before the thread dump signal 
> arrives then the task can exit with a "org.apache.hadoop.mapred.Task: Parent 
> died." message and no thread dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-02-15 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366263#comment-16366263
 ] 

Eric Payne commented on MAPREDUCE-7053:
---

Thanks [~jlowe] for fixing this problem, and thanks [~pbacsko] for the review.

+1. The patch LGTM.

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after the task has been 
> removed from the list of known tasks but before the thread dump signal 
> arrives then the task can exit with a "org.apache.hadoop.mapred.Task: Parent 
> died." message and no thread dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (MAPREDUCE-7033) Map outputs implicitly rely on permissive umask for shuffle

2018-02-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-7033:
--
   Resolution: Fixed
Fix Version/s: 3.0.1
   3.1.0
   Status: Resolved  (was: Patch Available)

> Map outputs implicitly rely on permissive umask for shuffle
> ---
>
> Key: MAPREDUCE-7033
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7033
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.1.0, 3.0.1
>
> Attachments: MAPREDUCE-7033.001.patch, MAPREDUCE-7033.002.patch
>
>
> Map tasks do not explicitly set the permissions of their output files for 
> shuffle.  In a secure cluster the shuffle service is running as a different 
> user than the map task, so the output files require group readability in 
> order to serve up the data during the shuffle phase.  If the user's UNIX 
> umask is too restrictive (e.g.: 077) then the map task's file.out and 
> file.out.index permissions can be too restrictive to allow the shuffle 
> handler to access them.
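
For illustration, a minimal sketch of the kind of explicit permission call the fix needs (not necessarily the committed patch): force the output files to 0640 so group members, such as the NM user serving the shuffle, can read them regardless of the submitter's umask.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

class ShufflePermissionSketch {
  // Explicitly set file.out / file.out.index to 0640 so the shuffle handler
  // (a different user in the same group on a secure cluster) can read them,
  // independent of the submitter's umask.
  static void makeGroupReadable(Configuration conf, Path mapOutput)
      throws java.io.IOException {
    FileSystem localFs = FileSystem.getLocal(conf);
    localFs.setPermission(mapOutput, new FsPermission((short) 0640));
  }
}
{code}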



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (MAPREDUCE-7033) Map outputs implicitly rely on permissive umask for shuffle

2018-01-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347646#comment-16347646
 ] 

Eric Payne commented on MAPREDUCE-7033:
---

+1
[~jlowe] thanks for taking the time to fix this issue. I will commit shortly.

> Map outputs implicitly rely on permissive umask for shuffle
> ---
>
> Key: MAPREDUCE-7033
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7033
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: MAPREDUCE-7033.001.patch, MAPREDUCE-7033.002.patch
>
>
> Map tasks do not explicitly set the permissions of their output files for 
> shuffle.  In a secure cluster the shuffle service is running as a different 
> user than the map task, so the output files require group readability in 
> order to serve up the data during the shuffle phase.  If the user's UNIX 
> umask is too restrictive (e.g.: 077) then the map task's file.out and 
> file.out.index permissions can be too restrictive to allow the shuffle 
> handler to access them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (MAPREDUCE-6976) mapred job -set-priority claims to set priority higher than yarn.cluster.max-application-priority

2017-10-05 Thread Eric Payne (JIRA)
Eric Payne created MAPREDUCE-6976:
-

 Summary: mapred job -set-priority claims to set priority higher 
than yarn.cluster.max-application-priority
 Key: MAPREDUCE-6976
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6976
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.8.1, 2.9.0, 3.1.0
Reporter: Eric Payne
Priority: Minor


With {{yarn.cluster.max-application-priority}} set to 20 and 
{{job_1507226760578_0002}} running at priority 0, run the following command:
{noformat}
$ mapred job -set-priority job_1507226760578_0002 21
Changed job priority.
{noformat}
The above command sets {{job_1507226760578_0002}} to priority 20. If 
{{job_1507226760578_0002}} is already at 20, the command does nothing.

Compare this behavior to running the {{yarn application -updatePriority}} 
command:
{code}
$ yarn application -updatePriority 21 -appId application_1507226760578_0002
Updating priority of an aplication application_1507226760578_0002
Updated priority of an application  application_1507226760578_0002 to cluster 
max priority OR keeping old priority as application is in final states
{code}
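
A hedged sketch of the behavior the MR CLI arguably should have (a hypothetical helper, not the actual JobClient code): clamp the request to the cluster max and say so, instead of printing "Changed job priority." for a value that was never applied.
{code}
class PriorityClampSketch {
  // Hypothetical helper: cap the request at
  // yarn.cluster.max-application-priority and tell the user, instead of
  // claiming the exact requested priority was set.
  static int clampAndReport(int requested, int clusterMax) {
    if (requested > clusterMax) {
      System.out.println("Requested priority " + requested
          + " exceeds the cluster max; setting priority to " + clusterMax);
      return clusterMax;
    }
    return requested;
  }
}
{code}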



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Updated] (MAPREDUCE-6960) Shuffle Handler prints disk error stack traces for every read failure.

2017-09-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6960:
--
   Resolution: Fixed
Fix Version/s: 2.8.3
   3.1.0
   3.0.0
   2.9.0
   Status: Resolved  (was: Patch Available)

> Shuffle Handler prints disk error stack traces for every read failure.
> --
>
> Key: MAPREDUCE-6960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6960
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.9.0, 3.0.0, 3.1.0, 2.8.3
>
> Attachments: MAPREDUCE-6960.001.patch
>
>
> {code}
>  } catch (IOException e) {
>   LOG.error("Shuffle error :", e);
> {code}
> In cases where a read from disk fails and throws a DiskErrorException, 
> the shuffle handler prints the entire stack trace for each and every one of 
> the failures, causing the nodemanager logs to quickly fill up the disk. 
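
A minimal sketch of the quieter handling (the SLF4J logger here is an assumption; the real handler may use a different logging facade): log disk errors as a single line and keep full stack traces only for unexpected failures.
{code}
import java.io.IOException;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ShuffleErrorLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ShuffleErrorLogSketch.class);

  // Disk read errors are frequent enough that one line suffices; keep the
  // full stack trace only for unexpected failures.
  static void logShuffleError(IOException e) {
    if (e instanceof DiskErrorException) {
      LOG.error("Shuffle error: " + e.getMessage()); // no stack trace
    } else {
      LOG.error("Shuffle error:", e);
    }
  }
}
{code}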



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (MAPREDUCE-6960) Shuffle Handler prints disk error stack traces for every read failure.

2017-09-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171910#comment-16171910
 ] 

Eric Payne commented on MAPREDUCE-6960:
---

Thanks [~kshukla]. Patch LGTM. Will commit to 3.1.0, 3.0.0, 2.9, and 2.8.
+1

> Shuffle Handler prints disk error stack traces for every read failure.
> --
>
> Key: MAPREDUCE-6960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6960
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: MAPREDUCE-6960.001.patch
>
>
> {code}
>  } catch (IOException e) {
>   LOG.error("Shuffle error :", e);
> {code}
> In cases where a read from disk fails and throws a DiskErrorException, 
> the shuffle handler prints the entire stack trace for each and every one of 
> the failures, causing the nodemanager logs to quickly fill up the disk. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (MAPREDUCE-6960) Shuffle Handler prints disk error stack traces for every read failure.

2017-09-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171872#comment-16171872
 ] 

Eric Payne commented on MAPREDUCE-6960:
---

Sure [~kshukla], I'll take a look.

> Shuffle Handler prints disk error stack traces for every read failure.
> --
>
> Key: MAPREDUCE-6960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6960
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: MAPREDUCE-6960.001.patch
>
>
> {code}
>  } catch (IOException e) {
>   LOG.error("Shuffle error :", e);
> {code}
> In cases where a read from disk fails and throws a DiskErrorException, 
> the shuffle handler prints the entire stack trace for each and every one of 
> the failures, causing the nodemanager logs to quickly fill up the disk. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer

2017-09-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171858#comment-16171858
 ] 

Eric Payne commented on MAPREDUCE-6958:
---

Thanks [~jlowe]
The branch-2.8 patch LGTM.
+1

> Shuffle audit logger should log size of shuffle transfer
> 
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6958.001.patch, MAPREDUCE-6958.002.patch, 
> MAPREDUCE-6958.003.patch, MAPREDUCE-6958-branch-2.002.patch, 
> MAPREDUCE-6958-branch-2.8.002.patch
>
>
> The shuffle audit logger currently logs the job ID and reducer ID but nothing 
> about the size of the requested transfer.  It calculates this as part of the 
> HTTP response headers, so it would be trivial to log the response size.  This 
> would be very valuable for debugging network traffic storms from the shuffle 
> handler.
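
As a sketch of what such an audit line could look like (logger name and format are assumptions, not the committed patch), the content length already computed for the HTTP response headers is simply threaded into the log call:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ShuffleAuditSketch {
  private static final Logger AUDITLOG =
      LoggerFactory.getLogger("ShuffleHandler.audit");

  // The handler already computes the total content length for the HTTP
  // response headers, so the audit line can carry it essentially for free.
  static void logShuffle(String jobId, String reducerId, long contentLength) {
    if (AUDITLOG.isDebugEnabled()) {
      AUDITLOG.debug("shuffle for " + jobId + " reducer " + reducerId
          + " length " + contentLength);
    }
  }
}
{code}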



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (MAPREDUCE-6944) MR job hangs forever when some NMs are unstable for some time

2017-08-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141978#comment-16141978
 ] 

Eric Payne commented on MAPREDUCE-6944:
---

[~daemon], what version of Hadoop are you running?

> MR job hangs forever when some NMs are unstable for some time
> --
>
> Key: MAPREDUCE-6944
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6944
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Reporter: YunFan Zhou
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> We encountered several jobs in our production environment where unstable NMs 
> caused one *MAP* of the job to get stuck, so the job could not finish properly.
> However, the problem we encountered is different from the one mentioned in 
> [https://issues.apache.org/jira/browse/MAPREDUCE-6513], because in our 
> scenario none of the *MR REDUCEs* start executing.
> But when I manually kill the hung *MAP*, the job finishes normally.
> {noformat}
> 2017-08-17 12:25:06,548 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start 
> threshold not met. completedMapsForReduceSlowstart 15564
> 2017-08-17 12:25:07,555 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_e84_1502793246072_73922_01_015700
> 2017-08-17 12:25:07,556 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=
> 2017-08-17 12:25:07,556 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start 
> threshold not met. completedMapsForReduceSlowstart 15564
> 2017-08-17 12:25:07,556 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:1009 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:15563 CompletedReds:0 ContAlloc:15723 ContRel:26 
> HostLocal:4575 RackLocal:8121
> {noformat}
> {noformat}
> 2017-08-17 14:49:41,793 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:1009 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:1 
> AssignedReds:0 CompletedMaps:15563 CompletedReds:0 ContAlloc:15724 ContRel:26 
> HostLocal:4575 RackLocal:8121
> 2017-08-17 14:49:41,794 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Applying ask 
> limit of 1 for priority:5 and capability:
> 2017-08-17 14:49:41,799 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1502793246072_73922: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=4236
> 2017-08-17 14:49:41,799 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=
> 2017-08-17 14:49:41,799 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start 
> threshold not met. completedMapsForReduceSlowstart 15564
> 2017-08-17 14:49:42,805 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
> containers 1
> 2017-08-17 14:49:42,805 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigning 
> container Container: [ContainerId: 
> container_e84_1502793246072_73922_01_015726, NodeId: 
> bigdata-hdp-apache1960.xg01.diditaxi.com:8041, NodeHttpAddress: 
> bigdata-hdp-apache1960.xg01.diditaxi.com:8042, Resource:  vCores:1>, Priority: 5, Token: Token { kind: ContainerToken, service: 
> 10.93.111.36:8041 }, ] to fast fail map
> 2017-08-17 14:49:42,805 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned from 
> earlierFailedMaps
> 2017-08-17 14:49:42,805 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_e84_1502793246072_73922_01_015726 to 
> attempt_1502793246072_73922_m_012103_5
> 2017-08-17 14:49:42,805 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=
> 2017-08-17 14:49:42,805 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start 
> threshold not met. completedMapsForReduceSlowstart 15564
> {noformat}

[jira] [Updated] (MAPREDUCE-6801) Fix flaky TestKill.testKillJob()

2017-01-04 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6801:
--
Fix Version/s: 2.8.0

Thanks [~haibochen] for the fix. I have backported this to branch-2.8.

> Fix flaky TestKill.testKillJob()
> 
>
> Key: MAPREDUCE-6801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: mapreduce6801.001.patch, mapreduce6801.002.patch
>
>
> TestKill.testKillJob often fails for the same reason with the following error 
> message:
> {code}
> 1 tests failed.
> FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob
> Error Message:
> Task state not correct expected: but was:
> Stack Trace:
> java.lang.AssertionError: Task state not correct expected: but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
> {code}
> The root cause is that when the job is in KILLED state from an external view, 
> TaskKillEvents and TaskAttemptKillEvents placed on the event loop queue may 
> not have been processed by the dispatcher thread.
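
A self-contained sketch of the usual cure for this kind of flakiness: poll for the expected state instead of asserting immediately after the kill call returns. Hadoop's {{GenericTestUtils.waitFor}} provides the same thing; this standalone version just shows the idea.
{code}
import java.util.function.Supplier;

final class WaitForStateSketch {
  // Poll a condition instead of asserting immediately; the async dispatcher
  // gets time to deliver the queued TaskKill/TaskAttemptKill events.
  static void waitFor(Supplier<Boolean> check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.get()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("Timed out waiting for expected task state");
      }
      Thread.sleep(intervalMs);
    }
  }
}
{code}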



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-6675) TestJobImpl.testUnusableNode failed

2017-01-04 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6675:
--
Fix Version/s: 2.8.0

Thanks [~haibochen] for the fix. I have backported this to branch-2.8.

> TestJobImpl.testUnusableNode failed 
> 
>
> Key: MAPREDUCE-6675
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6675
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1
>
> Attachments: mapreduce6675.001.patch
>
>
> TestJobImpl#testUnusableNodeTransition is flaky.
> 2016-02-13 09:16:42 Running 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
> 2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time 
> elapsed: 8.324 sec <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
> 2016-02-13 09:16:50 
> testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)
>   Time elapsed: 5.165 sec  <<< FAILURE!
> 2016-02-13 09:16:50 java.lang.AssertionError: expected: but 
> was:
> 2016-02-13 09:16:50   at org.junit.Assert.fail(Assert.java:88)
> 2016-02-13 09:16:50   at org.junit.Assert.failNotEquals(Assert.java:743)
> 2016-02-13 09:16:50   at org.junit.Assert.assertEquals(Assert.java:118)
> 2016-02-13 09:16:50   at org.junit.Assert.assertEquals(Assert.java:144)
> 2016-02-13 09:16:50   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
> 2016-02-13 09:16:50   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 Results :
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 Failed tests: 
> 2016-02-13 09:16:50   
> TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 
> expected: but was:
> 2016-02-13 09:16:50 
> 2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.
> Looking at the code, a JobUpdatedNodesEvent is handled by putting a 
> TaskAttemptKill event on the async dispatcher queue and returning immediately, 
> but that event might not have been processed by the time all JobTaskEvents 
> are seen by the job (the jobTaskSucceeded events are handed to the Job 
> immediately without going through the dispatcher). Therefore, there is a 
> slight chance that the job will see all three succeeded attempts and 
> transition to the Committing state before the taskAttemptKill event is handled 
> by the dispatcher. A committing job will reject later JobTaskEvents, 
> transition to the InternalError state, and cause the test to fail.
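
A sketch of one way to make such a test deterministic, assuming the test can be wired to YARN's {{DrainDispatcher}} test utility: drain the event queue before asserting the job state.
{code}
import org.apache.hadoop.yarn.event.DrainDispatcher;

class DrainBeforeAssertSketch {
  // DrainDispatcher is a real YARN test utility; wiring it into TestJobImpl
  // is the assumption here. await() blocks until the event queue is empty,
  // so the queued TaskAttemptKill event has been handled before any assert.
  static void drainThenAssert(DrainDispatcher dispatcher) {
    dispatcher.await();
    // ... assertJobState(job, ...) would follow here ...
  }
}
{code}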



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Created] (MAPREDUCE-6812) Capacity Scheduler: Support user-specific minimum user limit percent

2016-11-16 Thread Eric Payne (JIRA)
Eric Payne created MAPREDUCE-6812:
-

 Summary: Capacity Scheduler: Support user-specific minimum user 
limit percent
 Key: MAPREDUCE-6812
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6812
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: yarn, capacity-sched
Reporter: Eric Payne


Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
property is per queue. A cluster admin should be able to set the minimum user 
limit percent on a per-user basis within the queue.

This functionality is needed so that when intra-queue preemption is enabled 
(YARN-4945 / YARN-2113), some users can be deemed as more important than other 
users, and resources from VIP users won't be as likely to be preempted.

For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
{{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
this:

{code}
<property>
  <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
  <value>25</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
  <value>75</value>
</property>
{code}

NOTE: This should be implemented in such a way that user-limit-percent 
intra-queue preemption (YARN-2113) is not affected.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-06-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317290#comment-15317290
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

bq. The patch doesn't resolve automatically for branch-2 and 2.8. It is 
straightforward and I will resolve it for those two branches.
[~mingma],
I did see that, but I was hoping it was straightforward enough that it didn't 
need a separate patch. Thanks for doing the extra work for the cherry-pick.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.012.patch, 
> MAPREDUCE-5044.013.patch, MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-06-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316754#comment-15316754
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

- FindBugs warning is not related. It pertains to 
{{org.apache.hadoop.yarn.api.records.ResourceRequest}} / 
{{ResourceRequest.java:[line 361]}}, which was not changed by this patch.
- Checkstyle warnings are as I expected (see my comment, above).
- Unit test failures all pass in my local environment for {{TestLogsCLI}}, 
which intermittently fails both with and without this patch, and 
{{TestYarnClient}}, which fails consistently both with and without the patch.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.012.patch, 
> MAPREDUCE-5044.013.patch, MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-06-03 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.013.patch

{{MAPREDUCE-5044.013.patch}} addresses most of the checkstyle warnings from the 
previous pre-commit build. The one exception is for those flagged in 
{{TestMRJobs.java}}. It triggers the alert because of an inner assignment 
in the following for loop:
{code}
for (String line; (line = syslogReader.readLine()) != null; ) {
...
}
{code}
The code could be changed to something like the following, but I think that 
would be more awkward:
{code}
String line = syslogReader.readLine();
while (line != null) {
...
  line = syslogReader.readLine();
}
{code}

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.012.patch, 
> MAPREDUCE-5044.013.patch, MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-06-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313187#comment-15313187
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

Thanks [~aw]. I will look into those warnings.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.012.patch, 
> MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, 
> MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-06-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312847#comment-15312847
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

I looked at the unit test failures from the pre-commit build. They all succeed 
in my local build environment except for TestYarnClient, which fails 
intermittently in trunk, both with and without this patch.

[~mingma], when you have some time, please have a look at the latest patch.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.012.patch, 
> MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, 
> MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-06-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.012.patch

The new test case ({{TestMRJobs#testThreadDumpOnTaskTimeout}}), when run with 
{{TestUberAM}}, detected that a timeout did not cause a thread dump within an 
uber AM. So, I added code in {{LocalContainerLauncher}} in the latest patch 
({{MAPREDUCE-5044.012.patch}}) to handle the timeout event.

Instead of having the uber AM connect to the NM, which would then send the QUIT 
signal back to the uber AM, I chose to dump the stack directly from the uber 
AM. I chose to use {{ThreadMXBean#dumpAllThreads}} even though there was 
already a Hadoop {{ReflectionUtils#printThreadInfo}} method which would create 
a dump. The reason is that the output of {{ThreadMXBean#dumpAllThreads}} much 
more closely resembles a standard thread stack dump than does the output of 
{{ReflectionUtils#printThreadInfo}}.
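
For reference, a minimal standalone version of that in-process dump (plain JDK APIs, not the patch itself):
{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class UberThreadDumpSketch {
  // ThreadMXBean#dumpAllThreads(lockedMonitors, lockedSynchronizers) yields
  // output close to a standard SIGQUIT stack dump, without a round trip to
  // the NM.
  static void dumpAllThreads() {
    ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
    for (ThreadInfo info : mxBean.dumpAllThreads(true, true)) {
      System.out.print(info.toString());
    }
  }
}
{code}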

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.012.patch, 
> MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, 
> MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304512#comment-15304512
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

Of the unit tests that failed in the precommit build, all pass for me in my 
local build environment except for 2:
- {{TestMiniMRChildTask}} fails in trunk with or without 
{{MAPREDUCE-5044.011.patch}}
- {{TestUberAM}} succeeds in trunk and fails with {{MAPREDUCE-5044.011.patch}}.
This is because {{TestUberAM}} extends {{TestMRJobs}}, to which I added the 
test {{testThreadDumpOnTaskTimeout}}. 
{{TestMRJobs#testThreadDumpOnTaskTimeout}} is having issues. I will fix and 
upload a new patch.


> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Eric Payne
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.v01.patch, 
> MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, 
> MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-25 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Status: Patch Available  (was: Open)

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.v01.patch, 
> MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, 
> MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-25 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Status: Open  (was: Patch Available)

It looks like the pre-commit build failed:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6528/console
{quote}
Slave went offline during the build
ERROR: Connection was broken: java.io.IOException: Sorry, this connection is 
closed.
{quote}

Cancelling patch and re-submitting patch.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.v01.patch, 
> MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, 
> MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-25 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.011.patch

Thanks for the continuing review, [~mingma]. I made the suggested changes. 
Please find them in attachment {{MAPREDUCE-5044.011.patch}}.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.011.patch, MAPREDUCE-5044.v01.patch, 
> MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, 
> MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-21 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.010.patch

Thank you very much, [~mingma], for your review and suggestions:
{quote}
- rename {{signalContainer}} to {{signalToContainer}}
{quote}
Done.
{quote}
-{{ContainerManagerImpl}}. It might be cleaner to abstract the common 
signal container code to a function used for both {{AM -> NM}} and {{RM -> NM}} 
cases.
{quote}
Done.
{quote}
-{{TaskAttemptImpl#PreemptedTransition}}. Given it is called only when the 
attempt is preempted, event.getType() == TaskAttemptEventType.TA_TIMED_OUT can 
be replaced by false.
{quote}
Very true. Good catch.
{quote}
-It will be useful to add an end-to-end new unit test, which can be found 
in Gera's original patch.
{quote}
Done.
{quote}
-Nit: {{ContainerLauncherImpl}}. Return value of 
{{getContainerManagementProtocol().signalContainer}} isn't used and can be 
removed.
{quote}
Done.
{quote}
-Nit: {{ContainerLauncherEvent}} has indent format issue.
{quote}
Done.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.010.patch, MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291977#comment-15291977
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

[~mingma], thank you for your reply and explanation.
{quote}
- signalContainers was initially suggested as an ordered list of 
signalContainer. So it could include requests from the same container or 
requests from different containers. It is true that the only use case we know 
of so far is to include requests from the same container.
{quote}
In that case, do we want to call it something like {{signalsToContainers}}? I'm 
open to ideas.

{quote}
- Will the required in the protocol buffer definition create any issue if we do 
rolling upgrade from 2.8 to 2.9 and the 2.9 MR AM might send a list of 
SignalContainerCommandProto to 2.8 NM? Maybe 2.8 NM just discards the message, 
not a big deal. Regardless, that is a separate issue that we don't need to 
address it here.
{quote}
Yes, this is a concern and something we need to look into more deeply and keep 
in mind.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, 
> MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289975#comment-15289975
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

[~mingma], thank you very much for the comments. I have one question:
{quote}
- ... it might be useful to rename signalContainer to signalContainers so that 
we don't need to modify the API later, which means some new structure like 
SignalContainersRequest. What is your take?
{quote}

I would rather not rename {{signalContainer}} to {{signalContainers}} because 
{{signalContainers}} sounds to me like the purpose is to send one signal to 
multiple containers rather than to send multiple signals to one container. 
Calling it {{signalsContainer}} (plural {{signals}}) also sounds awkward. So, I 
think {{signalContainer}} is the best option.

Regarding {{SignalContainerRequest}}, if we want the {{signalContainer}} API to 
be fully compatible with sending multiple signals, I think 
{{SignalContainerRequest}} would need to add an interface for 
{{SignalContainerRequest#newInstance}} that included both pause and a list of 
signals. Maybe something like this:
{code}
public static SignalContainerRequest newInstance(ContainerId containerId,
    int pause, Iterable<SignalContainerCommand> signals) {
...
}
{code}
I think it would be best to add that interface to {{SignalContainerRequest}} in 
the future when we are ready to implement the rest of the "sending multiple 
signals" feature. Thoughts?


> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, 
> MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-05-16 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.009.patch

[~jlowe], [~jira.shegalov], [~mingma], [~xgong],
Patch 008 no longer applied to trunk. I upmerged the patch and am attaching 
MAPREDUCE-5044.009.patch. Can I please ask one of you to look at it?

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, 
> MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, 
> MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-6678) Allow ShuffleHandler readahead without drop-behind

2016-05-10 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6678:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks [~nroberts]. I committed these changes to trunk, branch-2, and 
branch-2.8.

> Allow ShuffleHandler readahead without drop-behind
> --
>
> Key: MAPREDUCE-6678
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6678
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead 
> (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the 
> ShuffleHandler.
> It would be beneficial if these were separately configurable. 
> - Running without readahead can lead to significant seek storms caused by 
> large numbers of sendfiles() competing with one another.
> - However, running with drop-behind can also lead to seek storms because 
> there are cases where the server can successfully write the shuffle bytes to 
> the network, BUT the client doesn't want the bytes right now (MergeManager 
> wants to WAIT is an example) so it ignores them and asks for them again a bit 
> later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on 
> mapreduce.shuffle.readahead.bytes==0, leaving 
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.
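
A sketch of the decoupled policy described above, using the two configuration keys named in the description (the 4 MB default here is illustrative):
{code}
import org.apache.hadoop.conf.Configuration;

class FadviseConfigSketch {
  // Readahead (POSIX_FADV_WILLNEED) is keyed off the readahead byte count,
  // with 0 disabling it; drop-behind (POSIX_FADV_DONTNEED) stays under
  // mapreduce.shuffle.manage.os.cache.
  static boolean readaheadEnabled(Configuration conf) {
    return conf.getLong("mapreduce.shuffle.readahead.bytes",
        4L * 1024 * 1024) > 0;
  }

  static boolean dropBehindEnabled(Configuration conf) {
    return conf.getBoolean("mapreduce.shuffle.manage.os.cache", true);
  }
}
{code}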



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (MAPREDUCE-6678) Allow ShuffleHandler readahead without drop-behind

2016-05-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276694#comment-15276694
 ] 

Eric Payne commented on MAPREDUCE-6678:
---

Thanks, [~nroberts], for raising this issue and providing a patch.

bq. Tested this patch on a 10-node cluster using terasort. Verified using 
strace that nodemanager is issuing correct WILLNEED without DONTNEED.

I recognize that it's difficult to produce a unit test for the patch. Would it 
be possible for you to post a very brief justification of that?

Otherwise, patch looks good to me.
+1

> Allow ShuffleHandler readahead without drop-behind
> --
>
> Key: MAPREDUCE-6678
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6678
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead 
> (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the 
> ShuffleHandler.
> It would be beneficial if these were separately configurable. 
> - Running without readahead can lead to significant seek storms caused by 
> large numbers of sendfiles() competing with one another.
> - However, running with drop-behind can also lead to seek storms because 
> there are cases where the server can successfully write the shuffle bytes to 
> the network, BUT the client doesn't want the bytes right now (MergeManager 
> wants to WAIT is an example) so it ignores them and asks for them again a bit 
> later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on 
> mapreduce.shuffle.readahead.bytes==0, leaving 
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-04-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.008.patch

[~jira.shegalov], [~mingma], [~xgong], [~jlowe], 
Upmerged patch and attaching MAPREDUCE-5044.008.patch.



> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.v01.patch, 
> MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, 
> MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, 
> MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, 
> Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression related errors.

2016-04-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245966#comment-15245966
 ] 

Eric Payne commented on MAPREDUCE-6633:
---

Thanks [~shahrs87]. I cherry-picked this back to 2.7.

> AM should retry map attempts if the reduce task encounters compression 
> related errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Fix For: 2.7.3
>
> Attachments: MAPREDUCE-6633.patch
>
>
> When a reduce task encounters compression-related errors, the AM doesn't retry 
> the corresponding map task.
> In one case we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at 
> com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression-related errors.

2016-04-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6633:
--
Fix Version/s: (was: 2.8.0)
   2.7.3

> AM should retry map attempts if the reduce task encounters compression-related 
> errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Fix For: 2.7.3
>
> Attachments: MAPREDUCE-6633.patch
>
>
> When the reduce task encounters compression-related errors, the AM doesn't 
> retry the corresponding map task.
> In one of the cases we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at 
> com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6649) getFailureInfo not returning any failure info

2016-04-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1529#comment-1529
 ] 

Eric Payne commented on MAPREDUCE-6649:
---

[~ebadger], I have checked this fix into trunk, branch-2 and branch-2.8. It 
looks like it will need a separate patch if we want it to go into branch-2.7.

> getFailureInfo not returning any failure info
> -
>
> Key: MAPREDUCE-6649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6649.001.patch, MAPREDUCE-6649.002.patch
>
>
> The following command does not produce any failure info as to why the job 
> failed. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar 
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
>  sleep -Dmapreduce.jobtracker.split.metainfo.maxsize=10 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1 -rt 1
> {noformat}
> {noformat}
> 2016-03-07 10:34:58,112 INFO  [main] mapreduce.Job 
> (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0004 failed with 
> state FAILED due to: 
> {noformat}
> By contrast, here is a command and the associated command-line output for a 
> failed job that gives the correct failure info. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar 
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
>  sleep -Dyarn.app.mapreduce.am.command-opts=-goober 
> -Dmapreduce.job.queuename=default -m 20 -r 0 -mt 3
> {noformat}
> {noformat}
> 2016-03-07 10:30:13,103 INFO  [main] mapreduce.Job 
> (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0003 failed with 
> state FAILED due to: Application application_1457364518683_0003 failed 3 
> times due to AM Container for appattempt_1457364518683_0003_03 exited 
> with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1457364518683_0003_03_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6649) getFailureInfo not returning any failure info

2016-04-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244429#comment-15244429
 ] 

Eric Payne commented on MAPREDUCE-6649:
---

Thanks [~ebadger] for finding and fixing this problem.

+1. The fix looks good to me. Will commit.
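
For context, a sketch of the client-side call whose output this restores, assuming the org.apache.hadoop.mapreduce API (the helper name is illustrative, not from the patch):

{code}
import org.apache.hadoop.mapreduce.Job;

// Illustrative helper; assumes a fully configured Job.
public class FailureInfoSketch {
  public static void reportIfFailed(Job job) throws Exception {
    job.waitForCompletion(false); // false: don't echo progress to the console
    if (!job.isSuccessful()) {
      // Before this fix, getFailureInfo() could come back empty for
      // failures like the split-metainfo case shown below.
      System.err.println("Failed due to: " + job.getStatus().getFailureInfo());
    }
  }
}
{code}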

> getFailureInfo not returning any failure info
> -
>
> Key: MAPREDUCE-6649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6649.001.patch, MAPREDUCE-6649.002.patch
>
>
> The following command does not produce any failure info as to why the job 
> failed. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar 
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
>  sleep -Dmapreduce.jobtracker.split.metainfo.maxsize=10 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1 -rt 1
> {noformat}
> {noformat}
> 2016-03-07 10:34:58,112 INFO  [main] mapreduce.Job 
> (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0004 failed with 
> state FAILED due to: 
> {noformat}
> By contrast, here is a command and the associated command-line output for a 
> failed job that gives the correct failure info. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar 
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
>  sleep -Dyarn.app.mapreduce.am.command-opts=-goober 
> -Dmapreduce.job.queuename=default -m 20 -r 0 -mt 3
> {noformat}
> {noformat}
> 2016-03-07 10:30:13,103 INFO  [main] mapreduce.Job 
> (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0003 failed with 
> state FAILED due to: Application application_1457364518683_0003 failed 3 
> times due to AM Container for appattempt_1457364518683_0003_03 exited 
> with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1457364518683_0003_03_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression-related errors.

2016-04-09 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6633:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> AM should retry map attempts if the reduce task encounters compression-related 
> errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6633.patch
>
>
> When the reduce task encounters compression-related errors, the AM doesn't 
> retry the corresponding map task.
> In one of the cases we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at 
> com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression-related errors.

2016-04-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233181#comment-15233181
 ] 

Eric Payne commented on MAPREDUCE-6633:
---

{quote}
In this case the decompressor threw RuntimeException 
(ArrayIndexOutOfBoundsException is a subclass).
If we had re-run the map on another node, the job would have succeeded.
...
I understand your concern, but I think it's a good change.
{quote}
Thanks [~shahrs87]. It would be ideal to come up with a subset that covers 
only the exceptions that could actually be thrown, but I agree that the change 
is fine as it is.
+1
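
To make the shape of the agreed change concrete, here is a self-contained sketch (illustrative names, not the Fetcher patch verbatim): widen the catch around the per-map shuffle so a RuntimeException from a decompressor is handled like an IOException from a corrupt stream, i.e. reported against the map so it can be re-fetched or re-run.

{code}
import java.io.IOException;
import java.io.InputStream;

// Illustrative shape of the fix; not the Fetcher code verbatim.
interface MapOutputSink {
  void shuffle(InputStream in) throws IOException;
}

class FetchSketch {
  // Returns true if the fetch failed and must be reported against the map.
  static boolean copyOneMapOutput(MapOutputSink sink, InputStream in) {
    try {
      sink.shuffle(in);
      return false;
    } catch (IOException | RuntimeException e) {
      // e.g. ArrayIndexOutOfBoundsException from LzoDecompressor#setInput
      // when a bad drive on the map-side node produced garbage bytes
      return true; // caller marks the copy failed; the AM may re-run the map
    }
  }
}
{code}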

> AM should retry map attempts if the reduce task encounters compression-related 
> errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: MAPREDUCE-6633.patch
>
>
> When the reduce task encounters compression-related errors, the AM doesn't 
> retry the corresponding map task.
> In one of the cases we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at 
> com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6334) Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler

2016-04-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224413#comment-15224413
 ] 

Eric Payne commented on MAPREDUCE-6334:
---

[~vishal.rajan], what version of Hadoop are you running?

> Fetcher#copyMapOutput is leaking usedMemory upon IOException during 
> InMemoryMapOutput shuffle handler
> -
>
> Key: MAPREDUCE-6334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6334
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Blocker
> Fix For: 2.7.1, 2.6.2
>
> Attachments: MAPREDUCE-6334.001.patch, MAPREDUCE-6334.002.patch
>
>
> We are seeing this happen when
> - an NM's disk goes bad during the creation of map output(s)
> - the reducer's fetcher can read the shuffle header and reserve the memory
> - but gets an IOException when trying to shuffle for InMemoryMapOutput
> - shuffle fetch retry is enabled
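
The invariant behind the fix, sketched with hypothetical names (the actual patch touches the fetcher and merge-manager paths): memory reserved for an in-memory map output must be returned on every failure path, otherwise usedMemory climbs until later reservations stall.

{code}
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the reserve/unreserve invariant; not the patch code.
class ReserveSketch {
  private long usedMemory;

  void fetchToMemory(InputStream in, long size) throws IOException {
    usedMemory += size;              // reserve before the shuffle
    boolean committed = false;
    try {
      readFully(in, size);
      committed = true;
    } finally {
      if (!committed) {
        usedMemory -= size;          // unreserve on IOException or any abort
      }
    }
  }

  private static void readFully(InputStream in, long size) throws IOException {
    byte[] buf = new byte[8192];
    for (long left = size; left > 0; ) {
      int n = in.read(buf, 0, (int) Math.min(buf.length, left));
      if (n < 0) throw new IOException("premature EOF during shuffle");
      left -= n;
    }
  }
}
{code}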



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression-related errors.

2016-03-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219077#comment-15219077
 ] 

Eric Payne commented on MAPREDUCE-6633:
---

Thanks [~shahrs87] for reporting this issue and providing a patch.

Overall, the patch looks good. I am a little nervous about re-fetching for 
_any_ exception. If there is a runtime exception on the reducer (memory error, 
NPE, etc.), maps would be re-run unnecessarily. I do understand that the risk 
of that is low, and in any case no data would be lost, just a little time and 
some wasted resources. What are your thoughts?

> AM should retry map attempts if the reduce task encounters compression-related 
> errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: MAPREDUCE-6633.patch
>
>
> When the reduce task encounters compression-related errors, the AM doesn't 
> retry the corresponding map task.
> In one of the cases we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at 
> com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-03-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198447#comment-15198447
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

[~mingma], [~xgong], [~jlowe], [~jira.shegalov], did you have a chance to look 
at this patch? I would really appreciate some feedback.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-02-20 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Target Version/s: 2.8.0, 2.7.3  (was: 2.8.0)
  Status: Patch Available  (was: Open)

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-02-20 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-5044:
--
Attachment: MAPREDUCE-5044.v07.local.patch

Thanks, [~jira.shegalov] for all of the work already done on this JIRA.

I have upmerged the latest patch and integrated it with the 
{{SignalContainerRequest}} that was added as part of YARN-445 and its children.

[~mingma], [~xgong], [~jlowe], [~jira.shegalov], would you please take a look?

I would like to see functionality in this JIRA implemented. We occasionally see 
containers time out, and it would be good if users could have direct feedback 
in the form of a jstack to help them debug their applications.

IIUC, YARN-445 and its children put in place the infrastructure for a {{Client 
-> RM -> NM -> Container}} signal path. However, in order to automatically dump 
the jstack when a container times out, we still need an {{AM -> NM -> 
Container}} signal path. This JIRA (MAPREDUCE-5044 along with YARN-1515) adds 
this signal path along with the ability to send multiple signals per call.

I think sending multiple signals per call could be split into a separate JIRA.


> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 
> 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-02-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154866#comment-15154866
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

Thanks, [~jira.shegalov]. Would it be okay if I upmerged 
{{MAPREDUCE-5044.v06.patch}} and integrated it with the 
{{SignalContainerRequest}} that was added as part of YARN-445 and its children?


> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen 
> Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

2016-02-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144582#comment-15144582
 ] 

Eric Payne commented on MAPREDUCE-5044:
---

Hi [~jira.shegalov]. I would like to see this functionality implemented. We 
occasionally see containers time out, and it would be good if users could have 
direct feedback in the form of a jstack to help them debug their applications.

I have been coming up to speed on the work that's already been committed in 
this area under YARN-445 and its children. IIUC, YARN-445 and its children put 
in place the infrastructure for a {{Client -> RM -> NM -> Container}} signal 
path. On the other hand, this JIRA (along with YARN-1515) implements an {{AM -> 
NM -> Container}} signal path and the ability to send multiple signals per call.

It seems that these pieces could possibly be split into separate JIRAs. Either 
way, I think that a lot of what has been done in this JIRA could be used to add 
the interface to {{ContainerManagementProtocol}} that would allow the AM to 
prompt the NM to signal the container to dump its stack prior to killing the 
container on a timeout.

Is there a possibility that this JIRA will move forward? Ideally, we would like 
it all ported back to 2.7. Please let me know if there's anything I can do.

> Have AM trigger jstack on task attempts that timeout before killing them
> 
>
> Key: MAPREDUCE-5044
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, 
> MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, 
> MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen 
> Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack 
> output via SIGQUIT before killing the task attempt.  This would be invaluable 
> for helping users debug their hung tasks, especially if they do not have 
> shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6473) Job submission can take a long time during Cluster initialization

2016-01-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094249#comment-15094249
 ] 

Eric Payne commented on MAPREDUCE-6473:
---

bq. In the above scenario when thread 2 comes along and finds that the provider 
list is null, since thread 1 is already in the synchronized block, thread 2 
will wait
[~kshukla], Yes, you are correct. I had missed that. Thanks for the 
clarification.

+1 (non-binding)

> Job submission can take a long time during Cluster initialization
> -
>
> Key: MAPREDUCE-6473
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6473
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: 99%ile.png, MAPREDUCE-6473-PerfTest.txt, 
> MAPREDUCE-6473-v1.patch, MAPREDUCE-6473-v2.patch, MAPREDUCE-6473-v3.patch, 
> MAPREDUCE-6473-v4.patch, MAPREDUCE-6473-v5.patch, MAPREDUCE-6473-v6.patch, 
> MAPREDUCE-6473-v7.patch, avgtime.png
>
>
> During initialization in Cluster.java, the framework provider classes are 
> loaded inside a sync block, which can considerably increase job submission 
> time when the number of submissions is high. The aim is to safely reduce the 
> time spent in this sync block to improve performance.
> {noformat}
> synchronized (frameworkLoader) {
>   for (ClientProtocolProvider provider : frameworkLoader) {
> LOG.debug("Trying ClientProtocolProvider : "
> + provider.getClass().getName());
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6473) Job submission can take a long time during Cluster initialization

2016-01-11 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092785#comment-15092785
 ] 

Eric Payne commented on MAPREDUCE-6473:
---

[~kshukla], Thanks for documenting this problem and providing the patch for it.

It looks like the following could happen in {{Cluster.java}}:
- thread 1 enters {{initProviderList}} and begins to process the entries in 
{{frameworkLoader}}
- thread 2 enters {{initProviderList}}, sees that {{providerList}} is null, and 
exits. Then, in {{initialize}}, no processing for {{providerList}} would happen 
and it would throw an IOE when {{clientProtocolProvider}} is null. Am I missing 
something?
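
For reference while reading this exchange, the double-checked pattern under discussion looks roughly like this (hypothetical shape, not the committed Cluster.java): with the volatile publish and the second check under the lock, a thread that finds the list null either builds it or waits; it never proceeds with a null list.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical shape of the double-checked initialization being reviewed.
class ProviderListSketch {
  private volatile List<String> providerList; // stands in for the providers

  List<String> getProviderList() {
    if (providerList == null) {          // fast path: no lock once initialized
      synchronized (this) {
        if (providerList == null) {      // re-check under the lock
          providerList = Collections.unmodifiableList(loadProviders());
        }
      }
    }
    return providerList;
  }

  private static List<String> loadProviders() {
    // in Cluster.java this iterates the ServiceLoader of ClientProtocolProvider
    return Arrays.asList("YarnClientProtocolProvider",
        "LocalClientProtocolProvider");
  }
}
{code}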


> Job submission can take a long time during Cluster initialization
> -
>
> Key: MAPREDUCE-6473
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6473
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: 99%ile.png, MAPREDUCE-6473-PerfTest.txt, 
> MAPREDUCE-6473-v1.patch, MAPREDUCE-6473-v2.patch, MAPREDUCE-6473-v3.patch, 
> MAPREDUCE-6473-v4.patch, MAPREDUCE-6473-v5.patch, MAPREDUCE-6473-v6.patch, 
> avgtime.png
>
>
> During initialization in Cluster.java, the framework provider classes are 
> loaded inside a sync block, which can considerably increase job submission 
> time when the number of submissions is high. The aim is to safely reduce the 
> time spent in this sync block to improve performance.
> {noformat}
> synchronized (frameworkLoader) {
>   for (ClientProtocolProvider provider : frameworkLoader) {
> LOG.debug("Trying ClientProtocolProvider : "
> + provider.getClass().getName());
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-2011) Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup

2016-01-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved MAPREDUCE-2011.
---
Resolution: Won't Fix

[~knoguchi], here are [~jlowe]'s comments from an offline discussion:
I think the distributed cache already behaves the way you desire, at least in 
YARN. When a resource request arrives at the nodemanager, it tries to lookup 
the local resource info based on that request. If it finds it (i.e.: a hit in 
the cache) then it just increments the refcount of the resource – I don't see 
any attempt to stat HDFS to verify it's still there in HDFS. The only time I 
see the timestamp of the request compared with HDFS is when it tries to 
download the resource from HDFS.

> Reduce number of getFileStatus call made from every 
> task(TaskDistributedCache) setup
> 
>
> Key: MAPREDUCE-2011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache
>Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 dist cache files and very short-lived 
> tasks, resulting in 500 map tasks launched per second and 10,000 
> getFileStatus calls to the namenode. The namenode can handle this, but asking 
> to see if we can reduce this somehow.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2015-10-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942414#comment-14942414
 ] 

Eric Payne commented on MAPREDUCE-6451:
---

+1 (non-binding)

Thanks [~kshukla] for reworking the patch.

The Release Audit warning seems to be quite unrelated. Also, as you pointed 
out, the checkstyle warnings are not things that you have control over.


> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, 
> MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
>
> When used with the dynamic strategy, DistCp does not update chunkFilePath and 
> other static variables for any job after the first. This is seen when 
> DistCp::run() is used. 
> A single copy succeeds, but subsequent jobs finish successfully without any 
> real copying. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2015-10-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940334#comment-14940334
 ] 

Eric Payne commented on MAPREDUCE-6451:
---

[~kshukla], Thanks for providing this fix! It looks good in general, but I have 
a few suggestions.

For the checkstyle warnings, please document the ones you will not be fixing 
and the reason why. None of them are much of a problem, but I personally would 
like to see the following fixes (both are in {{DynamicInputFormat.java}}):
- Please put the left parenthesis on the previous line
{code}
+  public  DynamicInputChunkContext getChunkContext
+  (Configuration configuration) throws IOException{
{code}
- I know it's just whitespace, but it does look a little awkward, so if you 
could, please change the indentation:
{code}
+DistCpUtils.getFileSize(chunkFilePath,
+chunkContext.getConfiguration()), null), taskAttemptContext);
   }
{code}

In {{TestDynamicInputFormat.java}}:
- I like the assertions to include a string with an error message that is 
provided when the assertion fails. I recognize that the other assertions in 
this file don't use that format, but I think it helps when running the tests. 
So, for example, something like this:
{code}
+Assert.assertTrue("Contexts from different DynamicInputChunkContext 
objects should be different.", !firstContext.equals(thirdContext));
{code}
- I didn't find any unit tests for the original functionality that got moved 
from {{DynamicInputChunk}} to {{DynamicInputChunkContext}}. If they don't 
exist, can you please open a separate JIRA to cover that?
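
The failure mode behind this JIRA, reduced to a toy (names hypothetical, not the DistCp code): state parked in static fields outlives the first job, so a second run silently reuses stale chunk paths; hanging the state off a per-job context object, as the patch does, is the shape of the fix.

{code}
// Toy illustration of the bug pattern and the fix's shape; not DistCp code.
class StaticChunkState {
  static String chunkFilePath;        // set for job 1, never refreshed: the bug
}

class ChunkContextSketch {
  private final String chunkFilePath; // one instance per job: the fix

  ChunkContextSketch(String jobStagingDir) {
    this.chunkFilePath = jobStagingDir + "/chunkDir";
  }

  String getChunkFilePath() {
    return chunkFilePath;
  }
}
{code}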


> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, 
> MAPREDUCE-6451-v3.patch
>
>
> When used with the dynamic strategy, DistCp does not update chunkFilePath and 
> other static variables for any job after the first. This is seen when 
> DistCp::run() is used. 
> A single copy succeeds, but subsequent jobs finish successfully without any 
> real copying. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-09-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739504#comment-14739504
 ] 

Eric Payne commented on MAPREDUCE-5870:
---

{quote}
As discussed if option-2 is fine, I will raise a separate ticket in YARN to 
handle RM-AM update of priority (through heartbeat). 
And I will separate the JobStatus priority update from this ticket for now.
{quote}
Thanks, [~sunilg]. If I understand correctly, you will update the patch for 
this JIRA so that the AM will not query the RM for its job priority, and then 
in another JIRA, make the change to have the RM tell the AM its priority as 
part of a heartbeat ack. Is that correct? I just want to make sure that this 
JIRA doesn't add that extra load on the RM for an AM JobStatus query.

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, 0004-MAPREDUCE-5870.patch, 
> 0005-MAPREDUCE-5870.patch, 0006-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job Priority can be set from client side as below [Configuration and api].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-09-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737076#comment-14737076
 ] 

Eric Payne commented on MAPREDUCE-5870:
---

{quote}
There's only two ways I can currently think of to get around that. Either we 
need to obsolete the priority field from the job status and provide a separate 
call to get the priority when the client really wants to know it, or we have to 
find a way for the AM to know its job priority so it can return it in its 
JobStatus responses. For the latter we could have the RM send it down in 
heartbeat responses, but there would be a delay between when a client updates 
the priority and the AM reports the updated value.
{quote}
[~jlowe] and [~sunilg],
I would vote for the second option. That is, update the priority in the RM and 
then tell the AM about it when it heartbeats in. I think it would be fine to 
have a short delay between when the priority is updated and when the AM knows 
about it. I can't foresee a use case where the client would atomically need to 
know about the change. The one thing that might cause confusion is if the 
client sets the priority and immediately reads it back, getting the old value; 
as long as this behavior is well documented, it should be fine.

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, 0004-MAPREDUCE-5870.patch, 
> 0005-MAPREDUCE-5870.patch, 0006-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job Priority can be set from client side as below [Configuration and api].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-08-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703789#comment-14703789
 ] 

Eric Payne commented on MAPREDUCE-5870:
---

[~sunilg], 
bq. As YARN-4014 is getting in, I will handle mapred cli command also to set 
priority at run time in same ticket.
Are you saying that you will be combining pieces of this JIRA (MAPREDUCE-5870) 
with YARN-4014? Can you please clarify which pieces?

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job Priority can be set from client side as below [Configuration and api].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-08-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14663133#comment-14663133
 ] 

Eric Payne commented on MAPREDUCE-5870:
---

Hi [~sunilg],
One thing I forgot to mention was that I had to add the following line to 
{{YARNRunner.java}} to get it to compile.
{code}
import org.apache.hadoop.yarn.api.records.Priority;
{code}

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job Priority can be set from client side as below [Configuration and api].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-08-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662064#comment-14662064
 ] 

Eric Payne commented on MAPREDUCE-5870:
---

[~sunilg], for what it's worth, I have downloaded the latest patch (version 
003) and tested and verified it in conjunction with the changes that were made 
for YARN-2003.

I performed the following sleep jobs with 10 tasks each. My one-node cluster 
can run 5 containers at once.
- I submit sleep job1 to the default queue, setting 
{{-Dmapreduce.job.priority=LOW}}
- Job1 starts running 5 containers and has 5 tasks pending.
- I submit sleep job2 to the default queue, setting 
{{-Dmapreduce.job.priority=HIGH}}
- All 10 job2 tasks are pending.
- Once tasks from job1 complete, job2 gets the containers. Although job1 has 5 
tasks pending, the number of running tasks for job1 remains 0 until job2 has no 
more pending tasks and job2's running tasks begin to complete.
- At that point, job1's tasks begin again to receive containers.

I also verified that you can specify {{-Dmapreduce.job.priority=_number_}}, and 
the container allocations go to the higher numbered jobs.
Finally, I verified that if you make the priority higher than the cluster max, 
it silently sets the job priority to cluster max.

So, the bottom line is LGTM :-)
+1
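
For completeness, the two routes exercised above in a minimal sketch (JobPriority and Job#setPriority are the API named in the description below; the class name is illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobPriority;

// Minimal sketch of both ways to set a job's priority.
public class PrioritySketch {
  public static Job highPriorityJob() throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.job.priority", "HIGH"); // config route (what -D sets)
    Job job = Job.getInstance(conf, "sleep-high");
    job.setPriority(JobPriority.HIGH);          // API route
    return job;
  }
}
{code}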

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job Priority can be set from client side as below [Configuration and api].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6420) Interrupted Exception in LocalContainerLauncher should be logged in warn/info level

2015-06-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608298#comment-14608298
 ] 

Eric Payne commented on MAPREDUCE-6420:
---

[~lichangleo], thank you for fixing this issue and putting up the patch.

+1, LGTM

> Interrupted Exception in LocalContainerLauncher should be logged in warn/info 
> level
> ---
>
> Key: MAPREDUCE-6420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6420
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: MAPREDUCE-6420.1.patch
>
>
> Interrupted Exception in LocalContainerLauncher should be logged at warn/info 
> level instead of error, because it won't fail the job. Otherwise it will 
> cause confusion during debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-06-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574711#comment-14574711
 ] 

Eric Payne commented on MAPREDUCE-6174:
---

Thank you, [~jira.shegalov]!

> Combine common stream code into parent class for InMemoryMapOutput and 
> OnDiskMapOutput.
> ---
>
> Key: MAPREDUCE-6174
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: BB2015-05-RFC
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
> MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.006.patch, 
> MAPREDUCE-6174.007.patch, MAPREDUCE-6174.v1.txt
>
>
> Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
> similar things with regards to IFile streams.
> In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
> different from 3rd-party implementations, this JIRA will make them subclass a 
> common class (see 
> https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-06-03 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6174:
--
Attachment: MAPREDUCE-6174.007.patch

Thanks [~jira.shegalov].
{quote}
We can argue, leave it as a reminder it was just used for tests actually, or 
remove it because of the new real (non)use. Leaving it up to you.
{quote}
I have attached version 007.

I took out the {{@VisibleForTesting}} from all of the constructors, even 
the old one, since it is not needed.

> Combine common stream code into parent class for InMemoryMapOutput and 
> OnDiskMapOutput.
> ---
>
> Key: MAPREDUCE-6174
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: BB2015-05-RFC
> Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
> MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.006.patch, 
> MAPREDUCE-6174.007.patch, MAPREDUCE-6174.v1.txt
>
>
> Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
> similar things with regards to IFile streams.
> In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
> different from 3rd-party implementations, this JIRA will make them subclass a 
> common class (see 
> https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-06-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569056#comment-14569056
 ] 

Eric Payne commented on MAPREDUCE-6174:
---

Thanks, [~jira.shegalov], for your review.
bq. The second deprecated constructor should also delegate for clarity:
Agreed. I will make this change. Good catch.

One additional question: Should the new constructor be {{@VisibleForTesting}}? 
It seems to be fine without it, but I'd like to know your opinion.

> Combine common stream code into parent class for InMemoryMapOutput and 
> OnDiskMapOutput.
> ---
>
> Key: MAPREDUCE-6174
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: BB2015-05-RFC
> Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
> MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.006.patch, 
> MAPREDUCE-6174.v1.txt
>
>
> Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
> similar things with regards to IFile streams.
> In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
> different from 3rd-party implementations, this JIRA will make them subclass a 
> common class (see 
> https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-06-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6174:
--
Status: Patch Available  (was: Open)

> Combine common stream code into parent class for InMemoryMapOutput and 
> OnDiskMapOutput.
> ---
>
> Key: MAPREDUCE-6174
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.6.0, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: BB2015-05-RFC
> Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
> MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.006.patch, 
> MAPREDUCE-6174.v1.txt
>
>
> Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
> similar things with regards to IFile streams.
> In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
> different from 3rd-party implementations, this JIRA will make them subclass a 
> common class (see 
> https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-06-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567430#comment-14567430
 ] 

Eric Payne commented on MAPREDUCE-6174:
---

The checkstyle warning is because the new {{OnDiskMapOutput}} constructor has 8 
parameters instead of the checkstyle-approved 7. The deprecated constructor had 
10, so my contention is that the new one is going in the right direction.

> Combine common stream code into parent class for InMemoryMapOutput and 
> OnDiskMapOutput.
> ---
>
> Key: MAPREDUCE-6174
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: BB2015-05-RFC
> Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
> MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.006.patch, 
> MAPREDUCE-6174.v1.txt
>
>
> Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
> similar things with regards to IFile streams.
> In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
> different from 3rd-party implementations, this JIRA will make them subclass a 
> common class (see 
> https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-05-30 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6174:
--
Attachment: MAPREDUCE-6174.006.patch

Thank you, [~jira.shegalov], for your detailed feedback. I have made the 
suggested changes with the latest 006 patch.

 Combine common stream code into parent class for InMemoryMapOutput and 
 OnDiskMapOutput.
 ---

 Key: MAPREDUCE-6174
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.0.0, 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
  Labels: BB2015-05-RFC
 Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
 MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.006.patch, 
 MAPREDUCE-6174.v1.txt


 Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
 similar things with regards to IFile streams.
 In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
 different from 3rd-party implementations, this JIRA will make them subclass a 
 common class (see 
 https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-05-23 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6174:
--
Attachment: MAPREDUCE-6174.005.patch

{quote}
This call to the OnDiskMapOutputConstructor could either
# calculate the outputPath from within MergeManagerImpl
# or, since MergeManagerImpl is the only one calling this constructor and 
MergeManagerImpl is being modified anyway, MergeManagerImpl could use the 
second constructor and the first one could be eliminated.
{quote}
[~jira.shegalov], I chose option 2 :-) Would you mind taking a look and telling 
me what you think?
Thanks
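
For illustration, a rough sketch of what option 2 looks like from the caller's side; all names here are placeholders rather than the actual {{MergeManagerImpl}} code:

{code:java}
import java.io.IOException;

// Placeholder stand-in for Hadoop's MapOutputFile path helper.
class PathHelperSketch {
  String getInputFileForWrite(String mapId, long size) throws IOException {
    return "/local/spill/" + mapId + ".out"; // placeholder path logic
  }
}

// Placeholder on-disk output; after option 2, only the path-taking
// constructor remains.
class DiskOutputSketch {
  DiskOutputSketch(String mapId, long size, int fetcher, String outputPath) {
    // open outputPath for writing the fetched map output
  }
}

class MergeManagerSketch {
  private final PathHelperSketch mapOutputFile = new PathHelperSketch();

  DiskOutputSketch reserveOnDisk(String mapId, long size, int fetcher)
      throws IOException {
    // The caller derives the destination path itself and invokes the single
    // remaining constructor; the path-computing constructor is eliminated.
    String outputPath = mapOutputFile.getInputFileForWrite(mapId, size);
    return new DiskOutputSketch(mapId, size, fetcher, outputPath);
  }
}
{code}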

 Combine common stream code into parent class for InMemoryMapOutput and 
 OnDiskMapOutput.
 ---

 Key: MAPREDUCE-6174
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.0.0, 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
  Labels: BB2015-05-RFC
 Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
 MAPREDUCE-6174.004.patch, MAPREDUCE-6174.005.patch, MAPREDUCE-6174.v1.txt


 Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
 similar things with regards to IFile streams.
 In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
 different from 3rd-party implementations, this JIRA will make them subclass a 
 common class (see 
 https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546877#comment-14546877
 ] 

Eric Payne commented on MAPREDUCE-6174:
---

Thanks [~jira.shegalov], for your detailed suggestions. One question:
{quote}
- We can remove unused parameters {{reduceId}} and {{mapOutputFile}} from the 
{{OnDiskMapOutput}} constructors.
{quote}
In the first constructor for {{OnDiskMapOutput}}, 
{{mapOutputFile#getInputFileForWrite}} is actually used to compute the 
{{outputPath}} passed to the second constructor.

{{MergeManagerImpl}} appears to be the only place that calls the first 
constructor for {{OnDiskMapOutput}}. This call to the {{OnDiskMapOutput}} 
constructor (see the sketch after this list) could either
# calculate the {{outputPath}} from within {{MergeManagerImpl}}
# or, since {{MergeManagerImpl}} is the only one calling this constructor and 
{{MergeManagerImpl}} is being modified anyway, {{MergeManagerImpl}} could use 
the second constructor and the first one could be eliminated.
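
For illustration, a sketch of the status quo being described, where the first constructor exists only to derive {{outputPath}} and chain to the second; placeholder names, not the real signatures:

{code:java}
import java.io.IOException;

// Placeholder stand-in for Hadoop's MapOutputFile helper.
class PathHelper {
  String getInputFileForWrite(String mapId, long size) throws IOException {
    return "/local/spill/" + mapId + ".out"; // placeholder path logic
  }
}

class OnDiskDelegationSketch {

  // First constructor: mapOutputFile is "unused" except to compute the
  // destination path, which is then handed to the second constructor.
  OnDiskDelegationSketch(String mapId, PathHelper mapOutputFile, long size,
      int fetcher) throws IOException {
    this(mapId, size, fetcher,
        mapOutputFile.getInputFileForWrite(mapId, size));
  }

  // Second constructor: takes the already-computed path directly.
  OnDiskDelegationSketch(String mapId, long size, int fetcher,
      String outputPath) {
    // open outputPath for writing, etc.
  }
}
{code}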

 Combine common stream code into parent class for InMemoryMapOutput and 
 OnDiskMapOutput.
 ---

 Key: MAPREDUCE-6174
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.0.0, 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
  Labels: BB2015-05-RFC
 Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, 
 MAPREDUCE-6174.004.patch, MAPREDUCE-6174.v1.txt


 Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
 similar things with regards to IFile streams.
 In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
 different from 3rd-party implementations, this JIRA will make them subclass a 
 common class (see 
 https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

