[jira] [Created] (MAPREDUCE-7370) Parallelize MultipleOutputs#close call

2021-11-29 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7370:


 Summary: Parallelize MultipleOutputs#close call
 Key: MAPREDUCE-7370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7370
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Ravuri Sushma sree


MultipleOutputs#close takes a long time when there are a lot of files to close 
and each close has high latency. Parallelize the MultipleOutputs#close call to 
improve the speed.

{code}
  public void close() throws IOException {
    for (RecordWriter writer : recordWriters.values()) {
      writer.close(null);
    }
  }
{code}
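
One possible shape for the parallel version is sketched below. It is only an illustration under stated assumptions: `ParallelCloser` and `closeAll` are hypothetical names, `java.io.Closeable` stands in for `RecordWriter` so the sketch runs without Hadoop on the classpath, and the thread count would presumably come from a new configuration property that this issue does not name.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hadoop: closes a set of writers concurrently.
// Closeable is a stand-in for RecordWriter.
class ParallelCloser {

  static void closeAll(Collection<? extends Closeable> writers, int threads)
      throws IOException {
    if (writers.isEmpty()) {
      return;
    }
    int poolSize = Math.max(1, Math.min(threads, writers.size()));
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      // Submit one close task per writer.
      List<Future<Void>> pending = new ArrayList<>();
      for (Closeable writer : writers) {
        Callable<Void> task = () -> {
          writer.close();
          return null;
        };
        pending.add(pool.submit(task));
      }
      // Wait for every close to finish; report the first failure and
      // attach any later ones as suppressed exceptions.
      IOException first = null;
      for (Future<Void> f : pending) {
        try {
          f.get();
        } catch (ExecutionException e) {
          Throwable cause = e.getCause();
          IOException io = (cause instanceof IOException)
              ? (IOException) cause : new IOException(cause);
          if (first == null) {
            first = io;
          } else {
            first.addSuppressed(io);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while closing writers", e);
        }
      }
      if (first != null) {
        throw first;
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

Unlike the sequential loop, this sketch keeps closing the remaining writers when one close fails and only then rethrows, which is a design choice rather than something the issue asks for.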



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close

2021-11-18 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7369:


 Summary: MapReduce tasks timing out when spends more time on 
MultipleOutputs#close
 Key: MAPREDUCE-7369
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


MapReduce tasks time out when they spend a long time in MultipleOutputs#close. 
MultipleOutputs#close takes longer when there are multiple files to be 
closed and closing a stream has high latency.

{code}
2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1634949471086_61268_m_001115_0: 
AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs
{code}

The MapReduce task timeout can be increased, but it is hard to choose the right 
value. The timeout can be disabled by setting it to 0, but then hanging tasks 
might never get killed.

The tasks send a ping every 3 seconds, but the ApplicationMaster does not honor 
it. It expects status information, which is not sent during 
MultipleOutputs#close. This jira is to add a config that makes the 
ApplicationMaster count the ping from a task as part of its Task Liveliness Check.
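
The effect of such a switch can be illustrated with a small sketch. Everything here is hypothetical: `LivenessMonitor`, its methods, and the `pingCountsAsProgress` flag are illustrative names, not the ApplicationMaster's actual classes or the config key the fix would introduce.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the ApplicationMaster's liveness tracking.
class LivenessMonitor {
  private final long timeoutMs;
  // The proposed switch: when true, a bare ping also refreshes liveness.
  private final boolean pingCountsAsProgress;
  private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

  LivenessMonitor(long timeoutMs, boolean pingCountsAsProgress) {
    this.timeoutMs = timeoutMs;
    this.pingCountsAsProgress = pingCountsAsProgress;
  }

  void register(String attemptId, long nowMs) {
    lastSeen.put(attemptId, nowMs);
  }

  // A full status update always refreshes the attempt.
  void statusUpdate(String attemptId, long nowMs) {
    lastSeen.put(attemptId, nowMs);
  }

  // A ping refreshes the attempt only when the switch is enabled.
  void ping(String attemptId, long nowMs) {
    if (pingCountsAsProgress) {
      lastSeen.put(attemptId, nowMs);
    }
  }

  boolean isTimedOut(String attemptId, long nowMs) {
    Long seen = lastSeen.get(attemptId);
    return seen != null && nowMs - seen > timeoutMs;
  }
}
```

With the flag off, an attempt that only pings while it sits in a long close is still declared dead after the timeout; with the flag on, it stays alive as long as pings keep arriving.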











[jira] [Created] (MAPREDUCE-7355) Fix MRApps#getStagingAreaDir to fetch it from Job Configuration

2021-06-21 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7355:


 Summary: Fix MRApps#getStagingAreaDir to fetch it from Job 
Configuration
 Key: MAPREDUCE-7355
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When JobClient (running as the yarn user) uses RM_DELEGATION_TOKEN (owner: oozie) 
to submit the job, the client uses /mapreducestaging/yarn as the staging directory 
whereas MRAppMaster uses /mapreducestaging/oozie. This leads to the failure below:

{code}
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: No such file or directory.
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
{code}

MRApps#getStagingAreaDir can rely on the job configuration property 
mapreduce.job.dir to avoid this issue.
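
A minimal sketch of that idea follows. It is not the MRApps code: `StagingDirResolver` is a hypothetical class, a plain `Map` stands in for the Hadoop `Configuration`, and it assumes (based on the path in the stack trace above) that the job directory is the staging area plus the job ID, so the staging area is the parent of mapreduce.job.dir.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for the Hadoop job Configuration.
class StagingDirResolver {
  static final String JOB_DIR_KEY = "mapreduce.job.dir";

  // Prefer the directory the client actually submitted to; fall back to
  // the per-user default only when the job directory is not configured.
  static String stagingAreaDir(Map<String, String> conf,
                               String stagingRoot, String user) {
    String jobDir = conf.get(JOB_DIR_KEY);
    if (jobDir != null && !jobDir.isEmpty()) {
      // Assumed layout: <staging area>/<job id>
      int slash = jobDir.lastIndexOf('/');
      if (slash > 0) {
        return jobDir.substring(0, slash);
      }
    }
    return stagingRoot + "/" + user + "/.staging";
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(JOB_DIR_KEY, "/mapreducestaging/yarn/.staging/job_1624284676187_0003");
    // The AM runs as oozie but still resolves the client's staging area.
    System.out.println(stagingAreaDir(conf, "/mapreducestaging", "oozie"));
  }
}
```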






[jira] [Created] (MAPREDUCE-7351) CleanupJob when handling SIGTERM signal

2021-06-12 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7351:


 Summary: CleanupJob when handling SIGTERM signal
 Key: MAPREDUCE-7351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Currently, the MR CleanupJob runs when the job either succeeds or fails, but it 
is not run when the job is killed. This leaves all the temporary folders under 
the output path.
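
As a rough illustration of cleaning up on SIGTERM, the sketch below registers a JVM shutdown hook (which the JVM runs on SIGTERM) that deletes a temporary directory. This is an assumption about one way to do it, not the change made for this issue; `TempDirCleanup` and its methods are hypothetical, and the local filesystem stands in for the job's output filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: removes a temporary directory when the JVM shuts down.
class TempDirCleanup {

  // Delete children before parents by walking in reverse path order.
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

  // Shutdown hooks run on normal exit and on SIGTERM, but not on SIGKILL.
  static Thread installHook(Path tempDir) {
    Thread hook = new Thread(() -> {
      try {
        deleteRecursively(tempDir);
      } catch (IOException e) {
        System.err.println("Cleanup of " + tempDir + " failed: " + e);
      }
    }, "cleanup-on-shutdown");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }
}
```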






[jira] [Created] (MAPREDUCE-7238) TestMRJobs.testThreadDumpOnTaskTimeout fails

2019-08-31 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7238:


 Summary: TestMRJobs.testThreadDumpOnTaskTimeout fails
 Key: MAPREDUCE-7238
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestMRJobs.testThreadDumpOnTaskTimeout fails

{code}
[ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 43.282 s  <<< FAILURE!
java.lang.AssertionError: No thread dump
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:745)
{code}






[jira] [Created] (MAPREDUCE-7231) hadoop-mapreduce-client-jobclient fails with timeout

2019-08-13 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7231:


 Summary: hadoop-mapreduce-client-jobclient fails with timeout
 Key: MAPREDUCE-7231
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7231
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph
 Attachments: Maven_TestCase_Report.txt

hadoop-mapreduce-client-jobclient fails with timeout

{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
{code}






[jira] [Created] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-13 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7230:


 Summary: TestHSWebApp.testLogsViewSingle fails
 Key: MAPREDUCE-7230
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestHSWebApp.testLogsViewSingle fails.

{code}
[ERROR] 
testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  Time 
elapsed: 0.294 s  <<< FAILURE!
Argument(s) are different! Wanted:
printWriter.write(
"Logs not available for container_10_0001_01_01. Aggregation may not be 
complete, Check back later or try the nodemanager at localhost:1234"
);
-> at 
org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
Actual invocations have different arguments:
printWriter.print(
"http://www.w3.org/TR/html4/strict.dtd;>"
);
-> at 
org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
printWriter.write(
"http://www.w3.org/TR/html4/strict.dtd;>"
);
-> at java.io.PrintWriter.print(PrintWriter.java:617)
printWriter.write(
"http://www.w3.org/TR/html4/strict.dtd;>",
0,
90
);
-> at java.io.PrintWriter.write(PrintWriter.java:473)
printWriter.println(

);
-> at 
org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
printWriter.print(
" at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
printWriter.write(
" at java.io.PrintWriter.print(PrintWriter.java:603)
printWriter.write(
"

[jira] [Created] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails

2019-06-10 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7217:


 Summary: TestMRTimelineEventHandling.testMRTimelineEventHandling 
fails
 Key: MAPREDUCE-7217
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


*TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*

{code:java}

[ERROR] testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)  Time elapsed: 46.337 s  <<< FAILURE!
org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
    at org.junit.Assert.assertEquals(Assert.java:115)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Created] (MAPREDUCE-7216) TeraSort Job Fails on S3

2019-06-07 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7216:


 Summary: TeraSort Job Fails on S3
 Key: MAPREDUCE-7216
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


The TeraSort job fails on S3 with the exception below. TeraSort creates the 
output path and writes the partition file there, but DirectoryStagingCommitter 
expects the output path not to exist.


{code}
9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with state FAILED due to: Job setup failed : org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job as Task committer attempt_1559891760159_0011_m_00_0: Destination path exists and committer conflict resolution mode is "fail"
    at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
    at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

Creating the partition file in /tmp or some other directory fixes the issue.







[jira] [Created] (MAPREDUCE-7203) TestRuntimeEstimators fails intermittent

2019-05-10 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7203:


 Summary: TestRuntimeEstimators fails intermittent
 Key: MAPREDUCE-7203
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7203
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestRuntimeEstimators fails intermittently.

{code}
[ERROR] testExponentialEstimator(org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators)  Time elapsed: 9.637 s  <<< FAILURE!
java.lang.AssertionError: We got the wrong number of successful speculations. expected:<3> but was:<1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:834)
    at org.junit.Assert.assertEquals(Assert.java:645)
    at org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.coreTestEstimator(TestRuntimeEstimators.java:243)
    at org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator(TestRuntimeEstimators.java:257)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Created] (MAPREDUCE-7201) Make Job History File Permissions configurable

2019-05-03 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7201:


 Summary: Make Job History File Permissions configurable
 Key: MAPREDUCE-7201
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 3.2.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows 
configuring the intermediate user directory permission, but the jhist file 
permissions are still not changed.






[jira] [Created] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-10 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7184:


 Summary: TestJobCounters#getFileSize can ignore crc file
 Key: MAPREDUCE-7184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestJobCounters test cases are failing in trunk while validating the input 
file sizes against the BYTES_READ reported by the job. The crc files are 
counted in getFileSize whereas the job's FileInputFormat ignores them.
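
A size helper that skips checksum files could look roughly like the sketch below. It is not the test's actual getFileSize: `InputSize` is a hypothetical class, java.nio is used instead of the Hadoop FileSystem API, and checksum files are assumed to be recognizable by a `.crc` suffix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: sums the sizes of the data files in a directory,
// skipping checksum side files (assumed to end in ".crc").
class InputSize {

  static long totalSizeIgnoringCrc(Path dir) throws IOException {
    try (Stream<Path> files = Files.list(dir)) {
      return files
          .filter(Files::isRegularFile)
          .filter(p -> !p.getFileName().toString().endsWith(".crc"))
          .mapToLong(p -> {
            try {
              return Files.size(p);
            } catch (IOException e) {
              throw new UncheckedIOException(e);
            }
          })
          .sum();
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }
}
```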








[jira] [Created] (MAPREDUCE-7145) Improve ShuffleHandler Logging

2018-09-26 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7145:


 Summary: Improve ShuffleHandler Logging
 Key: MAPREDUCE-7145
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7145
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


ShuffleHandler logs that the spill file was not found when the real problem is 
a permission denied error, which is misleading.

{code}
try {
  spill = SecureIOUtils.openForRandomRead(spillfile, "r", user, null);
} catch (FileNotFoundException e) {
  LOG.info(spillfile + " not found");
  return null;
}
{code}

SecureIOUtils.openForRandomRead should log "Permission denied" or "No such 
file or directory" instead of the generic "file not found".
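
The distinction being asked for can be illustrated with java.nio, which reports the two cases as separate exception types. This is only a sketch of the idea, not a change to SecureIOUtils or ShuffleHandler; `OpenDiagnostics` and `describeOpenFailure` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: tries to open a file and, on failure, returns a
// message that names the actual reason instead of a generic "not found".
class OpenDiagnostics {

  static Optional<String> describeOpenFailure(Path file) {
    try (InputStream in = Files.newInputStream(file)) {
      return Optional.empty(); // opened fine, nothing to report
    } catch (NoSuchFileException e) {
      return Optional.of(file + ": No such file or directory");
    } catch (AccessDeniedException e) {
      return Optional.of(file + ": Permission denied");
    } catch (IOException e) {
      return Optional.of(file + ": " + e);
    }
  }
}
```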






[jira] [Created] (MAPREDUCE-7087) NNBench shows invalid Avg exec time and Avg Lat

2018-04-25 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7087:


 Summary: NNBench shows invalid Avg exec time and Avg Lat
 Key: MAPREDUCE-7087
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7087
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


NNBench shows invalid Avg exec time and Avg Lat values when there are zero 
successful file operations. It is better not to show them than to show invalid 
numbers.

{code}
18/04/25 09:57:33 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 
Infinity
18/04/25 09:57:33 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
{code}
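
Values like these are what Java floating-point division by zero produces: a non-zero total divided by zero gives Infinity, and zero divided by zero gives NaN. A guard along the following lines would suppress the averages when nothing succeeded. `AvgStats` and its methods are hypothetical names, not NNBench code.

```java
import java.util.OptionalDouble;

// Hypothetical helper: only produces an average when at least one
// operation succeeded, so Infinity and NaN are never reported.
class AvgStats {

  static OptionalDouble average(long totalMs, long successfulOps) {
    return successfulOps > 0
        ? OptionalDouble.of((double) totalMs / successfulOps)
        : OptionalDouble.empty();
  }

  // Prints the line only when there is a meaningful value to show.
  static void log(String label, OptionalDouble avg) {
    avg.ifPresent(v -> System.out.println(label + ": " + v));
  }

  public static void main(String[] args) {
    System.out.println((double) 1234 / 0L); // unguarded: Infinity
    System.out.println(0.0 / 0L);           // unguarded: NaN
    log("Avg exec time (ms): Create/Write/Close", average(1234, 0)); // prints nothing
    log("Avg Lat (ms): Create/Write", average(100, 4));
  }
}
```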






[jira] [Created] (MAPREDUCE-7071) Bypass the Fetcher and read directly from the local filesystem if source Mapper ran on the same host

2018-03-29 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7071:


 Summary: Bypass the Fetcher and read directly from the local 
filesystem if source Mapper ran on the same host
 Key: MAPREDUCE-7071
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7071
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: task
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When the source mapper and the reducer are on the same host, bypass the 
Fetcher and read the map output directly from the local filesystem.






[jira] [Created] (MAPREDUCE-7045) JobListCache grows unlimited when the jobs are failed to move to done directory

2018-01-29 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7045:


 Summary: JobListCache grows unlimited when the jobs are failed to 
move to done directory
 Key: MAPREDUCE-7045
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7045
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


When jobs fail to move to the done directory for some reason, such as a 
permission issue, the JobListCache grows without bound with all the failed jobs, 
and addIfAbsent() has to scan all the cached items.






[jira] [Created] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2017-12-18 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7026:


 Summary: Shuffle Fetcher does not log the actual error message 
thrown by ShuffleHandler
 Key: MAPREDUCE-7026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


A job fails because its reduce tasks fail to fetch map output: the 
NodeManager ShuffleHandler fails to serve the map outputs with an 
IOException like the one below. ShuffleHandler sends the actual error message in 
the response inside sendError(), but the Fetcher does not log this message.

Logs from NodeManager ShuffleHandler:

{code}
2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers :
java.io.IOException: Error Reading IndexFile
    at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
    at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index did not match expected owner 'bde'
    at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
    at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
    at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
    at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
{code}

The Fetcher instead logs the following, without the actual error message:

{code}
2017-12-18 10:10:17,688 INFO [IPC Server 

[jira] [Created] (MAPREDUCE-6993) Provide additional aggregated task stats at the Map / Reduce level

2017-10-27 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6993:


 Summary: Provide additional aggregated task stats at the Map / 
Reduce level
 Key: MAPREDUCE-6993
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6993
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


The MapReduce ApplicationMaster can log aggregated task stats for the Map / 
Reduce stages, like the ones below, which will make debugging easier. This is 
similar to what Tez provides in TEZ-930.

firstTaskStartTime
firstTasksToStart
lastTaskFinishTime
lastTasksToFinish
minTaskDuration
maxTaskDuration 
avgTaskDuration
numSuccessfulTasks
shortestDurationTasks
longestDurationTasks
numFailedTaskAttempts
numKilledTaskAttempts
numCompletedTasks
numSucceededTasks
numKilledTasks
numFailedTasks
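
A few of these aggregates can be computed as in the sketch below. The names `TaskStatsSketch` and `TaskRun` are hypothetical and the input is a plain list of start and finish times, not the ApplicationMaster's task objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;

// Hypothetical sketch: derives min/max/avg duration and first/last
// timestamps from a list of finished tasks.
class TaskStatsSketch {

  static final class TaskRun {
    final String id;
    final long startMs;
    final long finishMs;

    TaskRun(String id, long startMs, long finishMs) {
      this.id = id;
      this.startMs = startMs;
      this.finishMs = finishMs;
    }

    long durationMs() {
      return finishMs - startMs;
    }
  }

  // Covers minTaskDuration, maxTaskDuration and avgTaskDuration in one pass.
  static LongSummaryStatistics durationStats(List<TaskRun> tasks) {
    return tasks.stream().mapToLong(TaskRun::durationMs).summaryStatistics();
  }

  static long firstTaskStartTime(List<TaskRun> tasks) {
    return tasks.stream().mapToLong(t -> t.startMs).min().orElse(-1L);
  }

  static long lastTaskFinishTime(List<TaskRun> tasks) {
    return tasks.stream().mapToLong(t -> t.finishMs).max().orElse(-1L);
  }

  public static void main(String[] args) {
    List<TaskRun> tasks = Arrays.asList(
        new TaskRun("m_0", 0, 10),
        new TaskRun("m_1", 5, 35),
        new TaskRun("m_2", 8, 28));
    LongSummaryStatistics s = durationStats(tasks);
    System.out.println("minTaskDuration=" + s.getMin()
        + " maxTaskDuration=" + s.getMax()
        + " avgTaskDuration=" + s.getAverage());
  }
}
```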






[jira] [Created] (MAPREDUCE-6981) Map Progress is misleading for Distcp job

2017-10-11 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6981:


 Summary: Map Progress is misleading for Distcp job
 Key: MAPREDUCE-6981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6981
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Priority: Minor


The progress displayed by the client when running a Distcp job is misleading. 
The Map progress reaches 100% before the map tasks finish. The issue can be 
reproduced by running Distcp with multiple huge files. 

JobImpl returns progress 1.0 when either the task finishes or the task progress 
is 1.0. The MapTask of Distcp gets its progress from SequenceFileRecordReader, 
which appears to update the progress after reading the list of files and does 
not account for the time taken to copy the files to the destination.

{code}
17/10/11 13:33:29 INFO mapreduce.Job:  map 100% reduce 0%
17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed 
successfully
{code}

The map progress is displayed as 100% at 17/10/11 13:33:29, whereas the last map 
task finishes at 2017-10-11 13:34:45:

{code}
2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1506610341926_0016_m_02 Task Transitioned from RUNNING to SUCCEEDED
{code}

Attaching the client and application logs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6975) Logging task counters

2017-10-05 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6975:


 Summary: Logging task counters 
 Key: MAPREDUCE-6975
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Logging the counters for each task at the end of its syslog will make debugging 
easier with just the application logs. 






[jira] [Created] (MAPREDUCE-6951) Jobs fails when mapreduce.jobhistory.webapp.address is in wrong format

2017-09-07 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6951:


 Summary: Jobs fails when mapreduce.jobhistory.webapp.address is in 
wrong format
 Key: MAPREDUCE-6951
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6951
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


MapReduce jobs fail with the exception below when 
mapreduce.jobhistory.webapp.address is not in host:port format, for example 
when a user has set it to 19888.

{code}
java.util.NoSuchElementException
    at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
    at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.getApplicationWebURLOnJHSWithoutScheme(MRWebAppUtil.java:130)
    at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.getApplicationWebURLOnJHSWithScheme(MRWebAppUtil.java:156)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.doUnregistration(RMCommunicator.java:218)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.unregister(RMCommunicator.java:188)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStop(RMCommunicator.java:268)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStop(RMContainerAllocator.java:297)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStop(MRAppMaster.java:888)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1667)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1168)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:603)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:651)
{code}
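
The NoSuchElementException from an iterator is consistent with code that 
splits the address on ':' and reads two tokens without checking that both 
exist. The following is a minimal sketch of that failure and of a validating 
alternative. It is not the MRWebAppUtil code: the class and method names are 
hypothetical, the host name is a made-up example, and a plain String.split 
stands in for whatever splitter the real code uses.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;

/** Hypothetical sketch: unchecked versus checked parsing of a host:port value. */
final class HostPortCheck {

    /** Unchecked parse: throws NoSuchElementException for a value such as "19888". */
    static String portUnchecked(String address) {
        Iterator<String> it = Arrays.asList(address.split(":")).iterator();
        it.next();         // host
        return it.next();  // port; fails when the value contains no ':'
    }

    /** Checked parse: empty unless the value is host:port with a port in 0..65535. */
    static Optional<Integer> portChecked(String address) {
        int idx = address.lastIndexOf(':');
        if (idx <= 0 || idx == address.length() - 1) {
            return Optional.empty();  // missing host, missing port, or no ':' at all
        }
        try {
            int port = Integer.parseInt(address.substring(idx + 1));
            return (port >= 0 && port <= 65535) ? Optional.of(port) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(portChecked("jhs.example.com:19888")); // well-formed value
        System.out.println(portChecked("19888"));                 // port only, rejected
        try {
            portUnchecked("19888");
        } catch (NoSuchElementException e) {
            System.out.println("unchecked parse failed: " + e);
        }
    }
}
```

Validating the value up front would let the job report a configuration error 
instead of failing during ApplicationMaster unregistration.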









[jira] [Created] (MAPREDUCE-6886) Job History File Permissions configurable

2017-05-11 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6886:


 Summary: Job History File Permissions configurable
 Key: MAPREDUCE-6886
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6886
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Prabhu Joseph


Currently the MapReduce job history files are written with 770 permissions, so 
they can be accessed only by the job user or by other users in the hadoop 
group. Some customers have users who are not part of the hadoop group but need 
to access these history files. We can make the permissions configurable, for 
example 770 (strict) or 755 (all), with 770 as the default.
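
A minimal sketch of what such a setting could look like, assuming a 
hypothetical property name (example.jobhistory.file.permissions) and using 
java.util.Properties in place of the Hadoop Configuration class. It is only an 
illustration of the proposal, not an existing Hadoop option.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Properties;
import java.util.Set;

/** Hypothetical sketch: resolve history file permissions from a config value. */
final class HistoryPermissionSketch {
    // Assumption: this key is invented for the example and does not exist in Hadoop.
    static final String KEY = "example.jobhistory.file.permissions";
    static final String DEFAULT = "770";

    /** Converts a three-digit octal mode such as "755" to "rwxr-xr-x". */
    static String toSymbolic(String octal) {
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) {
            int bits = c - '0';
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    /** Accepts only the two modes named in the proposal; falls back to 770 when unset. */
    static Set<PosixFilePermission> resolve(Properties conf) {
        String octal = conf.getProperty(KEY, DEFAULT).trim();
        if (!octal.equals("770") && !octal.equals("755")) {
            throw new IllegalArgumentException("Unsupported permissions: " + octal);
        }
        return PosixFilePermissions.fromString(toSymbolic(octal));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(PosixFilePermissions.toString(resolve(conf))); // default
        conf.setProperty(KEY, "755");
        System.out.println(PosixFilePermissions.toString(resolve(conf))); // relaxed
    }
}
```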






[jira] [Resolved] (MAPREDUCE-6884) YARN ContainerLocalizer logs are missing

2017-05-04 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph resolved MAPREDUCE-6884.
--
Resolution: Fixed

Duplicate

> YARN ContainerLocalizer logs are missing
> 
>
> Key: MAPREDUCE-6884
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6884
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> YARN LCE ContainerLocalizer runs as a separate process, and its logs and error 
> messages are not captured. We need to redirect them to stdout or to a separate 
> log file to help debug localization issues.






[jira] [Created] (MAPREDUCE-6884) YARN ContainerLocalizer logs are missing

2017-05-04 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6884:


 Summary: YARN ContainerLocalizer logs are missing
 Key: MAPREDUCE-6884
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6884
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Prabhu Joseph


YARN LCE ContainerLocalizer runs as a separate process, and its logs and error 
messages are not captured. We need to redirect them to stdout or to a separate 
log file to help debug localization issues.








[jira] [Created] (MAPREDUCE-6797) Improvement in the fix of Mapreduce-6684

2016-10-19 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6797:


 Summary: Improvement in the fix of Mapreduce-6684
 Key: MAPREDUCE-6797
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.4.0, 2.8.0
Reporter: Prabhu Joseph
Priority: Critical


Description:

There is one more piece of code in HistoryFileManager where the synchronized 
block on HistoryFileInfo needs to be removed. We hit the JobHistoryServer 
contention issue in our environment, where the stack trace (attached) shows 
HistoryFileManager$JobListCache.addIfAbsent waiting unnecessarily to lock 
HistoryFileInfo.

The synchronization on isMovePending and didMoveFail has already been removed 
by MAPREDUCE-6684.

{code}
HistoryFileInfo firstValue = cache.get(key);
synchronized (firstValue) {  // <--- synchronized is not needed here
  if (firstValue.isMovePending()) {
    if (firstValue.didMoveFail() &&
        firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
      cache.remove(key);
      //Now lets try to delete it
      try {
        firstValue.delete();
      } catch (IOException e) {
        LOG.error("Error while trying to delete history files" +
            " that could not be moved to done.", e);
      }
    } else {
      LOG.warn("Waiting to remove " + key
          + " from JobListCache because it is not in done yet.");
    }
  } else {
    cache.remove(key);
  }
}

{code}
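
A minimal sketch of why the monitor can be dropped, assuming the state checked 
by isMovePending and didMoveFail is held in a volatile field so that readers 
see the latest value without locking. All class, enum and method names below 
are hypothetical stand-ins, not the HistoryFileManager code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: cache eviction that reads file state without a monitor. */
final class LockFreeEvictionSketch {
    enum State { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

    /** Stand-in for a history file entry; the state is volatile, so reads need no lock. */
    static final class FileInfo {
        private volatile State state = State.IN_INTERMEDIATE;
        final long finishTime;

        FileInfo(long finishTime) { this.finishTime = finishTime; }

        boolean isMovePending() {
            return state == State.IN_INTERMEDIATE || state == State.MOVE_FAILED;
        }
        boolean didMoveFail() { return state == State.MOVE_FAILED; }
        void setState(State s) { state = s; }
    }

    /** Returns true if the entry was removed from the cache. */
    static boolean evictIfPossible(ConcurrentMap<String, FileInfo> cache,
                                   String key, long cutoff) {
        FileInfo first = cache.get(key);
        if (first == null) {
            return false;
        }
        if (first.isMovePending()) {
            if (first.didMoveFail() && first.finishTime <= cutoff) {
                return cache.remove(key, first);  // failed move and older than cutoff
            }
            return false;                         // still waiting to be moved to done
        }
        return cache.remove(key, first);          // already in done
    }

    public static void main(String[] args) {
        ConcurrentMap<String, FileInfo> cache = new ConcurrentHashMap<>();
        FileInfo info = new FileInfo(100L);
        cache.put("job_1", info);
        System.out.println(evictIfPossible(cache, "job_1", 200L)); // move still pending
        info.setState(State.IN_DONE);
        System.out.println(evictIfPossible(cache, "job_1", 200L)); // now removable
    }
}
```

Because no thread holds a monitor on the entry, a slow operation on one entry 
cannot block threads that only need to inspect its state.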


Note: the stack trace is from hadoop-2.4.0; the problem exists in the latest 
Hadoop as well.

{code}
"2144820863@qtp-313351300-38156" daemon prio=10 tid=0x01e13800 nid=0xf133 waiting for monitor entry [0x7f7c1d8dd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
    - waiting to lock <0x00040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
    - locked <0x000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
{code}






[jira] [Created] (MAPREDUCE-6530) Jobtracker is slow when more JT UI requests

2015-10-29 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6530:


 Summary: Jobtracker is slow when more JT UI requests
 Key: MAPREDUCE-6530
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6530
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Prabhu Joseph
Priority: Blocker


JobTracker is slow when a huge number of jobs are running and 30 connections 
are established to the info port to view job status and counters.

hadoop job -list took 4m22.412s

We took jstack traces and found most of the server threads waiting on the 
JobTracker object, while the thread that holds the lock on JobTracker waits for 
the ResourceBundles class lock.

"retireJobs" prio=10 tid=0x7f2345200800 nid=0x11c1 waiting for
monitor entry [0x7f22e3499000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
- waiting to lock <0x000197cc6218> (a java.lang.Class for
org.apache.hadoop.mapreduce.util.ResourceBundles)
at
org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
at
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
at
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
at
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
at
org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
- locked <0x0007f8411608> (a org.apache.hadoop.mapred.Counters)
at
org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
at
org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
at
org.apache.hadoop.mapred.JobTracker$RetireJobs.addToCache(JobTracker.java:657)
- locked <0x9644ae08> (a
org.apache.hadoop.mapred.JobTracker$RetireJobs)
at
org.apache.hadoop.mapred.JobTracker$RetireJobs.run(JobTracker.java:769)
- locked <0x964c5550> (a
org.apache.hadoop.mapred.FairScheduler)
- locked <0x9644a9d0> (a java.util.Collections$SynchronizedMap)
- locked <0x962ac660> (a org.apache.hadoop.mapred.JobTracker)
at java.lang.Thread.run(Thread.java:745)


The ResourceBundles class lock is held most of the time by the JT GUI thread 
(jobtracker_jsp), which is calling getMapCounters().


"926410165@qtp-1732070199-56" daemon prio=10 tid=0x7f232c4df000 nid=0x27c0
runnable [0x7f22db7bf000]
   java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x00061a49ede0> (a java.util.MissingResourceException)
at java.lang.Throwable.<init>(Throwable.java:287)
at java.lang.Exception.<init>(Exception.java:84)
at java.lang.RuntimeException.<init>(RuntimeException.java:80)
at
java.util.MissingResourceException.<init>(MissingResourceException.java:85)
at
java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1499)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1322)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:1028)
at
org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
at
org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
- locked <0x000197cc6218> (a java.lang.Class for
org.apache.hadoop.mapreduce.util.ResourceBundles)
at
org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
at
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
at
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
at
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
at
org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
- locked <0x0007ed1024b8> (a org.apache.hadoop.mapred.Counters)
at
org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
at
org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
at org.apache.hadoop.mapred.JSPUtil.generateJobTable(JSPUtil.java:436)
at
org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:202)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)