from:"Peter Bacsko \(JIRA\)"

[jira] [Created] (MAPREDUCE-6831) Flaky test TestJobImpl.testKilledDuringKillAbort

2017-01-09 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6831:
---

 Summary: Flaky test TestJobImpl.testKilledDuringKillAbort
 Key: MAPREDUCE-6831
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6831
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mrv2
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The test case TestJobImpl.testKilledDuringKillAbort() is flaky.

Example of a failure:

{noformat:title=Error Message}
expected: but was:
{noformat}
{noformat:title=Stack Trace}
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:978)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testKilledDuringKillAbort(TestJobImpl.java:516)
{noformat}
{noformat:title=Standard Output}
2016-12-12 00:26:29,724 INFO  [Thread-12] event.AsyncDispatcher 
(AsyncDispatcher.java:register(202)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2016-12-12 00:26:29,729 INFO  [Thread-12] event.AsyncDispatcher 
(AsyncDispatcher.java:register(202)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob
2016-12-12 00:26:29,729 INFO  [Thread-12] event.AsyncDispatcher 
(AsyncDispatcher.java:register(202)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$2a4993a5
2016-12-12 00:26:29,730 INFO  [Thread-12] event.AsyncDispatcher 
(AsyncDispatcher.java:register(202)) - Registering class 
org.apache.hadoop.mapreduce.jobhistory.EventType for class 
org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$2a4993a5
2016-12-12 00:26:29,730 INFO  [Thread-12] event.AsyncDispatcher 
(AsyncDispatcher.java:register(202)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$2a4993a5
2016-12-12 00:26:29,730 INFO  [Thread-12] impl.JobImpl 
(JobImpl.java:setup(1523)) - Adding job token for job_123456789_0001 to 
jobTokenSecretManager
2016-12-12 00:26:29,731 WARN  [Thread-12] impl.JobImpl 
(JobImpl.java:setup(1529)) - Shuffle secret key missing from job credentials. 
Using job token secret as shuffle secret.
2016-12-12 00:26:29,733 INFO  [Thread-12] impl.JobImpl 
(JobImpl.java:makeUberDecision(1294)) - Not uberizing job_123456789_0001 
because: not enabled;
2016-12-12 00:26:29,734 INFO  [Thread-12] impl.JobImpl 
(JobImpl.java:createMapTasks(1551)) - Input size for job job_123456789_0001 
= 0. Number of splits = 2
2016-12-12 00:26:29,734 INFO  [Thread-12] impl.JobImpl 
(JobImpl.java:createReduceTasks(1568)) - Number of reduces for job 
job_123456789_0001 = 1
2016-12-12 00:26:29,734 INFO  [Thread-12] impl.JobImpl 
(JobImpl.java:handle(1006)) - job_123456789_0001Job Transitioned from NEW 
to INITED
2016-12-12 00:26:29,736 INFO  [CommitterEvent Processor #0] 
commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - Processing 
the event EventType: JOB_SETUP
2016-12-12 00:26:29,737 INFO  [Thread-12] impl.JobImpl 
(JobImpl.java:handle(1006)) - job_123456789_0001Job Transitioned from 
INITED to SETUP
2016-12-12 00:26:29,738 INFO  [AsyncDispatcher event handler] impl.JobImpl 
(JobImpl.java:handle(1006)) - job_123456789_0001Job Transitioned from SETUP 
to RUNNING
{noformat}

Reproduction: insert {{Thread.sleep(50);}} right after {{job.handle(new 
JobStartEvent(jobId));}} in the test.
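The reproduction points to a race between the test thread and the {{AsyncDispatcher}} thread; the log above shows the dispatcher moving the job from SETUP to RUNNING. The self-contained Java sketch below models that kind of race under heavy simplification. {{DispatcherRaceSketch}}, {{ToyJob}} and its states are hypothetical and are not JobImpl's real internal states. A {{CountDownLatch}} replaces the {{Thread.sleep(50)}} so that both interleavings can be forced deterministically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, heavily simplified model of a job whose state is changed both by
// the test thread and by an asynchronous dispatcher thread. State names are
// illustrative only.
class DispatcherRaceSketch {
    enum State { SETUP, RUNNING, KILLED_IN_SETUP, KILLED_WHILE_RUNNING }

    static final class ToyJob {
        private State state = State.SETUP;

        // Delivered on the dispatcher thread once job setup has finished.
        synchronized void setupCompleted() {
            if (state == State.SETUP) {
                state = State.RUNNING;
            }
        }

        // Sent directly by the test thread.
        synchronized void kill() {
            state = (state == State.SETUP) ? State.KILLED_IN_SETUP : State.KILLED_WHILE_RUNNING;
        }

        synchronized State getState() {
            return state;
        }
    }

    // Forces one of the two possible interleavings. dispatcherFirst == true plays the
    // role of the Thread.sleep(50): the dispatcher finishes its transition first.
    static State run(boolean dispatcherFirst) {
        ToyJob job = new ToyJob();
        CountDownLatch testActed = new CountDownLatch(1);
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            Future<?> setupDone = dispatcher.submit(() -> {
                if (!dispatcherFirst) {
                    testActed.await(); // the dispatcher thread is "slow"
                }
                job.setupCompleted();
                return null;
            });
            if (dispatcherFirst) {
                setupDone.get();
            }
            job.kill();
            testActed.countDown();
            setupDone.get();
            return job.getState();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            dispatcher.shutdownNow();
        }
    }
}
```

The two orderings end in different states, so a test that asserts only one of them passes or fails depending on thread scheduling.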



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org

[jira] [Reopened] (MAPREDUCE-6201) TestNetworkedJob fails on trunk

2017-02-07 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reopened MAPREDUCE-6201:
-
  Assignee: Peter Bacsko  (was: Brahma Reddy Battula)

I'm reopening this because I was able to reproduce the failure.

> TestNetworkedJob fails on trunk
> ---
>
> Key: MAPREDUCE-6201
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Robert Kanter
>Assignee: Peter Bacsko
>
> Currently, {{TestNetworkedJob}} is failing on trunk:
> {noformat}
> Running org.apache.hadoop.mapred.TestNetworkedJob
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob
> testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob)  Time elapsed: 
> 67.363 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MAPREDUCE-6856) TestRecovery.testSpeculative fails if testCrashed fails

2017-03-03 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6856:
---

 Summary: TestRecovery.testSpeculative fails if testCrashed fails
 Key: MAPREDUCE-6856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6856
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The test {{testSpeculative}} in 
{{org.apache.hadoop.mapreduce.v2.app.TestRecovery}} is unstable.

Based on my findings, the test itself is not problematic. It only fails if 
{{testCrashed}} in the same class fails before it.

The reason is not completely clear to me, but whenever I explicitly stop the 
MRAppMaster in {{testCrashed}} in a finally block, the issue disappears. I 
think the reason is that both tests use the same folder for staging.

Solution: wrap the logic in {{testCrashed}} in a try-finally block and stop 
the MRAppMaster in the finally block.
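A minimal, self-contained sketch of the try-finally pattern, assuming (as suspected above) that the two tests collide on a shared staging folder. {{TryFinallySketch}}, {{ToyAppMaster}} and the staging registry are hypothetical stand-ins; in the real test, the object to stop is the MRAppMaster created by {{testCrashed}}.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins: a registry of staging directories in use and a toy AM.
// They only model the "shared staging folder" suspicion from the description.
class TryFinallySketch {
    static Set<String> newStagingRegistry() {
        return new HashSet<>();
    }

    static final class ToyAppMaster {
        private final Set<String> stagingDirsInUse;
        private final String stagingDir;
        private boolean running;

        ToyAppMaster(Set<String> stagingDirsInUse, String stagingDir) {
            this.stagingDirsInUse = stagingDirsInUse;
            this.stagingDir = stagingDir;
        }

        void start() {
            if (!stagingDirsInUse.add(stagingDir)) {
                throw new IllegalStateException("Staging dir already in use: " + stagingDir);
            }
            running = true;
        }

        // Safe to call more than once; releases the staging dir on the first call.
        void stop() {
            if (running) {
                stagingDirsInUse.remove(stagingDir);
                running = false;
            }
        }
    }

    // The proposed pattern: the AM is stopped even if the test body throws
    // (for example, because an assertion in testCrashed fails).
    static void runTest(ToyAppMaster am, Runnable testBody) {
        am.start();
        try {
            testBody.run();
        } finally {
            am.stop();
        }
    }
}
```

Because {{stop()}} runs in the finally block, a failing first test no longer leaves the shared staging directory claimed for the next one.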



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-05-24 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6892:
---

 Summary: Issues with the count of failed/killed tasks in the jhist 
file
 Key: MAPREDUCE-6892
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, jobhistoryserver
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Recently we encountered some issues with the number of failed tasks. After 
parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, even though 
there were actual failures. 

A minor related issue is that the number of killed tasks cannot be retrieved 
directly (although it can be calculated).

The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
successful map/reduce task counts. The number of failed (or killed) tasks is not 
stored.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MAPREDUCE-6898) TestKill.testKillTask is flaky

2017-06-14 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6898:
---

 Summary: TestKill.testKillTask is flaky
 Key: MAPREDUCE-6898
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6898
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Reporter: Peter Bacsko
Assignee: Peter Bacsko


TestKill.testKillTask() can fail if the async dispatcher thread is slower than the 
test's thread.

{noformat}
2017-05-26 11:43:26,532 INFO  [AsyncDispatcher event handler] impl.JobImpl 
(JobImpl.java:handle(1006)) - job_0_Job Transitioned from INITED to SETUP
Job State is : RUNNING
Job State is : RUNNING Waiting for state : SUCCEEDED   map progress : 0.0   
reduce progress : 0.0
2017-05-26 11:43:26,538 INFO  [CommitterEvent Processor #0] 
commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - Processing 
the event EventType: JOB_SETUP
2017-05-26 11:43:26,540 INFO  [AsyncDispatcher event handler] impl.TaskImpl 
(TaskImpl.java:handle(661)) - task_0__m_00 Task Transitioned from NEW 
to KILLED
2017-05-26 11:43:26,540 ERROR [AsyncDispatcher event handler] impl.JobImpl 
(JobImpl.java:handle(998)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
JOB_TASK_COMPLETED at SETUP
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1366)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1362)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
2017-05-26 11:43:26,541 INFO  [AsyncDispatcher event handler] impl.JobImpl 
(JobImpl.java:handle(1006)) - job_0_Job Transitioned from SETUP to ERROR
2017-05-26 11:43:26,542 INFO  [AsyncDispatcher event handler] app.MRAppMaster 
(MRAppMaster.java:serviceStop(978)) - Skipping cleaning up the staging dir. 
assuming AM will be retried.
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MAPREDUCE-6939) Follow-up on MAPREDUCE-6870

2017-08-15 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6939:
---

 Summary: Follow-up on MAPREDUCE-6870
 Key: MAPREDUCE-6939
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6939
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Peter Bacsko
Assignee: Peter Bacsko
Priority: Minor


Some minor changes should be made now that MAPREDUCE-6870 has been committed upstream:

1. Fix the JavaDoc in {{JobImpl.java}}
2. Correct the description of the method: it might not be entirely 
clear what the "improvement" is or what it actually improves
3. Fix a small typo in the name of the new test case



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MAPREDUCE-6953) Skip the testcase testJobWithChangePriority if FairScheduler is used

2017-09-08 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6953:
---

 Summary: Skip the testcase testJobWithChangePriority if 
FairScheduler is used
 Key: MAPREDUCE-6953
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6953
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: client
Reporter: Peter Bacsko
Assignee: Peter Bacsko


We run the unit tests with the Fair Scheduler (FS) downstream. FS does not support 
priorities at the moment, so {{TestMRJobs#testJobWithChangePriority}} fails.

Just add {{Assume.assumeFalse(usingFairScheduler);}} and JUnit will skip the 
test.
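The issue does not say how {{usingFairScheduler}} is computed. One possible sketch, assuming the scheduler is selected through the standard {{yarn.resourcemanager.scheduler.class}} property, is shown below; a plain {{Map}} stands in for Hadoop's {{Configuration}}, and {{SchedulerCheck}} is a hypothetical helper. The resulting flag would then be passed to {{Assume.assumeFalse()}} at the start of the test, so that JUnit reports it as skipped instead of failed.

```java
import java.util.Map;

// Hypothetical helper for deciding whether a test runs against the Fair Scheduler.
// A plain Map stands in for Hadoop's Configuration object.
class SchedulerCheck {
    static final String SCHEDULER_CLASS_KEY = "yarn.resourcemanager.scheduler.class";
    static final String FAIR_SCHEDULER_CLASS =
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler";

    static boolean usingFairScheduler(Map<String, String> conf) {
        // An unset key is treated as "not the Fair Scheduler" in this sketch.
        return FAIR_SCHEDULER_CLASS.equals(conf.get(SCHEDULER_CLASS_KEY));
    }
}
```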



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MAPREDUCE-6954) Disable erasure coding for files that are uploaded to the MR staging area

2017-09-08 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-6954:
---

 Summary: Disable erasure coding for files that are uploaded to the 
MR staging area
 Key: MAPREDUCE-6954
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6954
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Depending on the encoder/decoder used and the type of MR workload, EC might 
negatively affect the performance of an MR job if too many files are localized.

In such a scenario, users might want to disable EC in the staging area to speed 
up the execution.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded

2017-11-27 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7015:
---

 Summary: Possible race condition in JHS if the job is not loaded
 Key: MAPREDUCE-7015
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7015
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Peter Bacsko
Assignee: Peter Bacsko


There could be a race condition inside JHS. In our build environment, 
{{TestMRJobClient.testJobClient()}} failed with this exception:

{noformat}
java.io.FileNotFoundException: File does not exist: 
hdfs://localhost:32836/tmp/hadoop-yarn/staging/history/done_intermediate/jenkins/job_1509975084722_0001_conf.xml
at 
org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2123)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2092)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2068)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:460)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.mapreduce.TestMRJobClient.runTool(TestMRJobClient.java:94)
at 
org.apache.hadoop.mapreduce.TestMRJobClient.testConfig(TestMRJobClient.java:551)
at 
org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:167)
{noformat}

Root cause:
1. MapReduce job completes
2. CLI calls {{cluster.getJob(jobid)}}
3. The job is finished and the client side gets redirected to JHS
4. The job data is missing from CachedHistoryStorage so JHS tries to find the 
job
5. First it scans the intermediate directory and finds the job
6. The call moveToDone() is scheduled for execution on a separate thread inside 
moveToDoneExecutor but does not get the chance to run immediately
7. RPC invocation returns with the path pointing to 
/tmp/hadoop-yarn/staging/history/done_intermediate
8. The call to moveToDone() completes, which moves the contents of 
done_intermediate to done
9. Hadoop CLI tries to download the config file from done_intermediate but it's 
no longer there

Usually the move scheduled in step #6 completes before step #7, but sometimes it 
falls behind, causing this race condition.
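The sequence above can be modeled deterministically in a small, self-contained Java sketch. Everything in it ({{JhsRaceSketch}}, the in-memory map standing in for HDFS, the shortened paths) is hypothetical and is not the real JHS code; a latch replaces real thread scheduling so that both orderings of step #7 and step #8 can be forced.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified model of steps 4-9. A map stands in for HDFS.
class JhsRaceSketch {
    static final String INTERMEDIATE_DIR = "/history/done_intermediate/";
    static final String DONE_DIR = "/history/done/";

    final Map<String, String> fs = new ConcurrentHashMap<>();
    private final ExecutorService moveToDoneExecutor = Executors.newSingleThreadExecutor();
    private volatile String confPath;
    private Future<?> pendingMove;

    // Steps 5-7: the job is found in done_intermediate, the move is scheduled on
    // another thread, and the RPC replies with whatever path is current.
    // moveFirst forces the usual ordering (true) or the racy one (false).
    String getConfPath(String jobId, boolean moveFirst) {
        String fileName = jobId + "_conf.xml";
        confPath = INTERMEDIATE_DIR + fileName;
        CountDownLatch replied = new CountDownLatch(1);
        pendingMove = moveToDoneExecutor.submit(() -> {
            if (!moveFirst) {
                replied.await(); // step 6: the move does not get to run yet
            }
            moveToDone(fileName); // step 8
            return null;
        });
        if (moveFirst) {
            waitFor(pendingMove);
        }
        String reply = confPath; // step 7
        replied.countDown();
        return reply;
    }

    // Step 8: move the file and update the path the server hands out.
    private void moveToDone(String fileName) {
        String contents = fs.remove(INTERMEDIATE_DIR + fileName);
        if (contents != null) {
            fs.put(DONE_DIR + fileName, contents);
        }
        confPath = DONE_DIR + fileName;
    }

    // Step 9: the client "downloads" the file; null means it does not exist.
    String download(String path) {
        return fs.get(path);
    }

    void finishPendingMoves() {
        if (pendingMove != null) {
            waitFor(pendingMove);
        }
        moveToDoneExecutor.shutdown();
    }

    private static void waitFor(Future<?> future) {
        try {
            future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When the move wins, the reply already points to the done directory. When it loses, the client receives a done_intermediate path that no longer exists by the time it downloads the file, which corresponds to the {{FileNotFoundException}} above.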



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MAPREDUCE-7046) Enhance logging related to retrieving Job

2018-02-02 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7046:
---

 Summary: Enhance logging related to retrieving Job
 Key: MAPREDUCE-7046
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7046
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Peter Bacsko
Assignee: Peter Bacsko


We recently encountered an interesting problem. In one case, the Hive Driver was 
unable to retrieve the status of a MapReduce job. The following stack trace was 
printed:

{noformat}
[main] INFO  org.apache.hadoop.hive.ql.exec.Task  - 2018-01-15 00:18:09,324 
Stage-2 map = 0%,  reduce = 0%, Cumulative CPU 1679.31 sec
 [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Ended Job = 
job_1511036412170_1322169 with exception 'java.io.IOException(Could not find 
status of job:job_1511036412170_1322169)'
java.io.IOException: Could not find status of job:job_1511036412170_1322169
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:325)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:302)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
{noformat}

We examined the logs from the JHS and the AM, but did not see anything suspicious. 
For some reason, {{null}} was returned, but it is not obvious why. The MR job was 
still running at this point.

Some ideas:
1. We already have logging in place related to JobClient->AM and JobClient->JHS 
communication, but it is at TRACE level, which could be too low. It might 
make more sense to raise the level to DEBUG.

2. We need new {{LOG.debug()}} calls at some crucial points.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MAPREDUCE-7048) AM can still crash after MAPREDUCE-7020

2018-02-05 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7048:
---

 Summary: AM can still crash after MAPREDUCE-7020
 Key: MAPREDUCE-7048
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7048
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am
Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The testcase TestUberAM#testThreadDumpOnTaskTimeout was supposed to be fixed by 
MAPREDUCE-7020. However, it still fails, see: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7325/testReport/junit/org.apache.hadoop.mapreduce.v2/TestMRJobs/testThreadDumpOnTaskTimeout/
 (note: other tests failed as well, but those look unrelated).

When I tried to reproduce it locally, it failed again, although with a slightly 
different error message (it was actually the same as before):

{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapreduce.v2.TestUberAM
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 128.192 
s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestUberAM
[ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestUberAM)  
Time elapsed: 79.539 s  <<< FAILURE!
java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

*Root cause:* {{System.exit()}} is still invoked in {{Task.statusUpdate()}}:

{noformat}
  public void statusUpdate(TaskUmbilicalProtocol umbilical) 
  throws IOException {
    int retries = MAX_RETRIES;
    while (true) {
      try {
        if (!umbilical.statusUpdate(getTaskID(), taskStatus).getTaskFound()) {
          LOG.warn("Parent died.  Exiting "+taskId);
          System.exit(66);
        }
        taskStatus.clearStatus();
        return;
      ...
{noformat}

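At this point, the task was not found, so {{getTaskFound()}} on the result of 
{{umbilical.statusUpdate()}} returns false. In uber mode the task runs inside the 
AM's JVM, so {{System.exit(66)}} terminates the AM as well. Checking whether we 
run in uber mode seems to solve the problem.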
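A hypothetical sketch of the suggested check, written as a pure decision function so that it can be tested without a cluster. {{StatusUpdateSketch}}, the {{Action}} enum and {{onStatusUpdate()}} are illustrative names, not Hadoop's API. The issue does not specify what the task should do instead of exiting in uber mode, so {{STOP_TASK_ONLY}} is an assumption.

```java
// Hypothetical decision logic for the "task not found" branch of statusUpdate().
class StatusUpdateSketch {
    enum Action { CONTINUE, EXIT_JVM, STOP_TASK_ONLY }

    static Action onStatusUpdate(boolean taskFound, boolean uberMode) {
        if (taskFound) {
            return Action.CONTINUE;
        }
        // In uber mode the task runs inside the MRAppMaster's JVM, so calling
        // System.exit(66) would terminate the AM as well (assumed fix: don't exit).
        return uberMode ? Action.STOP_TASK_ONLY : Action.EXIT_JVM;
    }
}
```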



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MAPREDUCE-7049) Testcase TestMRJobs#testJobClassloaderWithCustomClasses fails

2018-02-06 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7049:
---

 Summary: Testcase TestMRJobs#testJobClassloaderWithCustomClasses 
fails 
 Key: MAPREDUCE-7049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, test
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The testcase TestMRJobs#testJobClassloaderWithCustomClasses fails consistently 
with this error:

{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapreduce.v2.TestMRJobs
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 54.325 
s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs
[ERROR] 
testJobClassloaderWithCustomClasses(org.apache.hadoop.mapreduce.v2.TestMRJobs)  
Time elapsed: 10.531 s  <<< FAILURE!
java.lang.AssertionError: 
Job status: Application application_1517928628935_0001 failed 2 times due to AM 
Container for appattempt_1517928628935_0001_02 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2018-02-06 15:50:38.688]Exception from 
container-launch.
Container id: container_1517928628935_0001_02_01
Exit code: 1

[2018-02-06 15:50:38.693]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.


[2018-02-06 15:50:38.694]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.


For more detailed output, check the application tracking page: 
http://ubuntu:46235/cluster/app/application_1517928628935_0001 Then click on 
links to logs of each attempt.
. Failing the application.
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:529)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloaderWithCustomClasses(TestMRJobs.java:477)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

Today I found the offending commit with {{git bisect}}: this failure is 
caused by {{YARN-2185}}.

The application master fails because of the following error:

{noformat}
2018-02-05 17:15:18,530 DEBUG [main] org.apache.hadoop.util.ExitUtil: Exiting 
with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1694)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:554)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:534)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1802)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:534)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:311)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
{noformat}

[jira] [Created] (MAPREDUCE-7052) TestFixedLengthInputFormat#testFormatCompressedIn is flaky

2018-02-13 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7052:
---

 Summary: TestFixedLengthInputFormat#testFormatCompressedIn is flaky
 Key: MAPREDUCE-7052
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7052
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Sometimes the test case TestFixedLengthInputFormat#testFormatCompressedIn can 
fail with the following error:

{noformat}
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at 
org.apache.hadoop.mapred.TestFixedLengthInputFormat.runRandomTests(TestFixedLengthInputFormat.java:322)
at 
org.apache.hadoop.mapred.TestFixedLengthInputFormat.testFormatCompressedIn(TestFixedLengthInputFormat.java:90)
{noformat}

*Root cause:* under special circumstances, the following line can return a huge 
number:

{noformat}
  // Test a split size that is less than record len
  numSplits = (int)(fileSize/Math.floor(recordLength/2));
{noformat}

For example, let {{seed}} be 2026428718. This causes {{recordLength}} to be 1 
at iteration 19. Because {{recordLength/2}} is an integer division, it evaluates 
to 0, so {{Math.floor()}} returns 0.0, and dividing the positive {{fileSize}} by 
0.0 yields positive infinity. Casting it to {{int}} yields 
{{Integer.MAX_VALUE}}. Eventually we get an OOME because the test tries to 
create a huge {{InputSplit}} array.
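The arithmetic can be checked in isolation. {{originalNumSplits()}} copies the expression from the test, while {{guardedNumSplits()}} is a hypothetical guard that is not part of the issue. Note that with {{recordLength == 1}} no positive split size is smaller than the record, so a real fix might skip that sub-case instead of clamping.

```java
// Reproduces the arithmetic from the test line above.
class SplitCountSketch {
    // Same expression as in the test: integer division happens before Math.floor(),
    // so recordLength == 1 gives Math.floor(0) == 0.0 and fileSize / 0.0 == +Infinity.
    static int originalNumSplits(long fileSize, int recordLength) {
        return (int) (fileSize / Math.floor(recordLength / 2));
    }

    // Hypothetical guard (not from the issue): never divide by less than one byte.
    static int guardedNumSplits(long fileSize, int recordLength) {
        int halfRecord = Math.max(1, recordLength / 2);
        return (int) (fileSize / halfRecord);
    }
}
```

For example, {{originalNumSplits(1000L, 1)}} returns {{Integer.MAX_VALUE}}, whereas {{guardedNumSplits(1000L, 1)}} returns 1000.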



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MAPREDUCE-7056) Ensure that mapreduce.job.reduces is not negative

2018-02-20 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7056:
---

 Summary: Ensure that mapreduce.job.reduces is not negative
 Key: MAPREDUCE-7056
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7056
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Recently we've seen a strange problem that was related to 
{{mapreduce.job.reduces}} being set to -1. If this value is negative, two 
things can happen:

1. If we use the old API, the mappers succeed, but the number of reducers is 
recorded as "-1" when the job is later opened in the JHS. This can confuse 
Hadoop users.

2. If we use the new API, we see a not-so-obvious stack trace:

{noformat}
2018-02-20 06:37:35,493 INFO [main] org.apache.hadoop.mapred.MapTask: Starting 
flush of map output
2018-02-20 06:37:35,507 INFO [main] org.apache.hadoop.mapred.MapTask: Starting 
flush of map output
2018-02-20 06:37:35,507 INFO [main] org.apache.hadoop.mapred.MapTask: kvbuffer 
is null. Skipping flush.
2018-02-20 06:37:35,508 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:51)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1891)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1527)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:735)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:805)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
{noformat}

and the job fails.

To avoid this, we should either fail if this property is negative, or reset it to 0.
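A minimal sketch of the two options, assuming the value has already been read from the job configuration. {{ReduceCountCheck}}, {{validateNumReduces()}} and its {{failOnNegative}} switch are hypothetical, not an existing Hadoop API.

```java
// Hypothetical validation of mapreduce.job.reduces covering both options above.
class ReduceCountCheck {
    static final String REDUCES_KEY = "mapreduce.job.reduces";

    static int validateNumReduces(int configured, boolean failOnNegative) {
        if (configured >= 0) {
            return configured;
        }
        if (failOnNegative) {
            // Option 1: reject the job configuration up front.
            throw new IllegalArgumentException(
                REDUCES_KEY + " must not be negative, but was " + configured);
        }
        // Option 2: fall back to 0 reducers, i.e. a map-only job.
        return 0;
    }
}
```

Failing early gives users a clear error message, while resetting to 0 silently turns the job into a map-only job; which behavior is preferable is left open in the issue.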



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

2018-03-12 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7064:
---

 Summary: Flaky test TestTaskAttempt#testReducerCustomResourceTypes
 Key: MAPREDUCE-7064
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The test {{TestTaskAttempt#testReducerCustomResourceTypes}} can occasionally 
fail with the following error:

{noformat}
org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 
'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: 
COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 
9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, minimum 
allocation: 0, maximum allocation: 9223372036854775807]
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535)
{noformat}

The root cause seems to be interference from earlier tests that start 
instances of {{FailingAttemptsMRApp}} or 
{{FailingAttemptsDuringAssignedMRApp}}: when I disabled these tests, 
{{testReducerCustomResourceTypes}} always passed.




[jira] [Created] (MAPREDUCE-7132) Check erasure coding in JobSplitWriter to avoid warnings

2018-08-29 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7132:
---

 Summary: Check erasure coding in JobSplitWriter to avoid warnings
 Key: MAPREDUCE-7132
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7132
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 3.1.1
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Currently, {{JobSplitWriter}} compares the number of hosts for a certain block 
against a static value that comes from {{mapreduce.job.max.split.locations}}.

However, an EC schema like RS-10-4 requires at least 14 hosts. In this case, 14 
block locations are returned and {{JobSplitWriter}} prints a warning, which 
can confuse users.

A possible solution is to check whether EC is enabled for a block and, if so, 
increase this value dynamically.
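
A minimal sketch of that idea, assuming the writer can learn the number of data 
and parity units of a block's EC policy (both 0 for a replicated block). 
{{SplitLocationLimit}} and its methods are hypothetical and do not reproduce the 
actual {{JobSplitWriter}} code; the point is only that the effective limit is 
never smaller than the width of one EC block group.

```java
/** Hypothetical sketch: raise the split-location limit for erasure-coded blocks. */
final class SplitLocationLimit {

    private SplitLocationLimit() {}

    /**
     * @param configuredMax value of mapreduce.job.max.split.locations
     * @param dataUnits     data units of the EC policy (e.g. 10 for RS-10-4), 0 if not EC
     * @param parityUnits   parity units of the EC policy (e.g. 4 for RS-10-4), 0 if not EC
     */
    static int effectiveMax(int configuredMax, int dataUnits, int parityUnits) {
        // A full EC block group can legitimately report one host per data/parity unit.
        int ecGroupWidth = dataUnits + parityUnits;
        return Math.max(configuredMax, ecGroupWidth);
    }

    /** True if the number of locations exceeds the (possibly raised) limit. */
    static boolean shouldWarn(int locations, int configuredMax, int dataUnits, int parityUnits) {
        return locations > effectiveMax(configuredMax, dataUnits, parityUnits);
    }
}
```

For RS-10-4 and a configured limit of 10 (an example value), 
{{effectiveMax(10, 10, 4)}} yields 14, so 14 returned locations no longer 
trigger a warning, while a replicated block still uses the configured limit.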

 




[jira] [Created] (MAPREDUCE-7144) Speculative execution can cause race condition

2018-09-25 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7144:
---

 Summary: Speculative execution can cause race condition
 Key: MAPREDUCE-7144
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7144
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Reporter: Peter Bacsko


In our internal build environment, we observed that the test case 
{{TestMRIntermediateDataEncryption#testMultipleReducers}} was flaky and failed 
randomly on multiple branches.

After a long investigation, it turned out that the problems were caused by 
speculative execution and timing issues around it.

Detailed explanation:

1. AppMaster speculatively starts two reducers:

{noformat}
2018-09-19 04:09:31,022 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START 
task_1537355349087_0001_r_01
...
2018-09-19 04:09:31,025 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START 
task_1537355349087_0001_r_00
{noformat}

2. Both attempts are scheduled and run in parallel:

{noformat}
2018-09-19 04:09:31,025 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1537355349087_0001_r_00_0 TaskAttempt Transitioned from ASSIGNED to 
RUNNING
...
2018-09-19 04:09:46,036 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1537355349087_0001_r_00_1 TaskAttempt Transitioned from ASSIGNED to 
RUNNING
{noformat}

3. attempt_1537355349087_0001_r_00_1 finishes first and reaches a 
progress of 1.0:
{noformat}
2018-09-19 04:10:05,747 INFO [IPC Server handler 3 on 36796] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1537355349087_0001_r_00_1 is : 1.0
2018-09-19 04:10:05,751 INFO [IPC Server handler 2 on 36796] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from 
attempt_1537355349087_0001_r_00_1
{noformat}

4. There's no need for attempt_1537355349087_0001_r_00_0, so the AppMaster 
decides to kill it:
{noformat}
2018-09-19 04:10:05,755 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Issuing kill to other 
attempt attempt_1537355349087_0001_r_00_0
{noformat}

5. Right after this, the MapReduce job transitions to the COMMITTING phase, 
which involves moving files on HDFS, deleting the temporary directory and 
creating a file named _SUCCESS:
{noformat}
2018-09-19 04:10:05,836 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange 
(FSNamesystem.java:deleteInt(4181)) - DIR* NameSystem.delete: 
/test/output/_temporary
2018-09-19 04:10:05,836 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange 
(FSDirectory.java:delete(1334)) - DIR* FSDirectory.delete: 
/test/output/_temporary
2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange 
(FSDirectory.java:unprotectedDelete(1480)) - DIR* 
FSDirectory.unprotectedDelete: _temporary is removed
2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange 
(FSNamesystem.java:deleteInternal(4251)) - DIR* Namesystem.delete: 
/test/output/_temporary is removed
2018-09-19 04:10:05,837 INFO  [IPC Server handler 3 on 45026] 
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(9826)) - allowed=true 
  ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1   cmd=delete  
src=/test/output/_temporary dst=nullperm=null   proto=rpc
2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] 
metrics.TopMetrics (TopMetrics.java:report(122)) - a metric is reported: cmd: 
delete user: jenkins (auth:SIMPLE)
2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] 
top.TopAuditLogger (TopAuditLogger.java:logAuditEvent(78)) - 
--- logged event for top service: allowed=true   
ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1   cmd=delete  
src=/test/output/_temporary dst=nullperm=null
2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange 
(NameNodeRpcServer.java:create(596)) - *DIR* NameNode.create: file 
/test/output/_SUCCESS for DFSClient_NONMAPREDUCE_-188083900_1 at 127.0.0.1
2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange 
(FSNamesystem.java:startFileInt(2748)) - DIR* NameSystem.startFile: 
src=/test/output/_SUCCESS, holder=DFSClient_NONMAPREDUCE_-188083900_1, 
clientMachine=127.0.0.1, createParent=true, replication=2, createFlag=[CREATE, 
OVERWRITE], blockSize=134217728, 
supportedVersions=[CryptoProtocolVersion{description='Encryption zones', 
version=2, unknownValue=null}]
2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] 
namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: 
_SUCCESS, posixAclInheritanceEnabled: false, modes: { masked: rw-rw-rw-, 
unmasked

[jira] [Created] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-10-15 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7152:
---

 Summary: LD_LIBRARY_PATH is always passed from MR AM to tasks
 Key: MAPREDUCE-7152
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Peter Bacsko


{{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default in 
Hadoop (as part of {{mapreduce.admin.user.env}} and 
{{yarn.app.mapreduce.am.user.env}}), and is passed as an environment variable 
from the AM container to the task containers in the container launch context.

In cases where {{HADOOP_COMMON_HOME}} differs between the AM node and the task 
node, tasks will fail to load the native library. A reliable way to fix this is 
to add {{LD_LIBRARY_PATH}} to {{yarn.nodemanager.admin-env}} instead.

Another approach is to perform a lazy evaluation of {{LD_LIBRARY_PATH}} on the 
NM side.
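
The difference between eager (AM-side) and lazy (NM-side) evaluation can be 
sketched with a small variable expander, assuming the value is kept unexpanded 
as {{$HADOOP_COMMON_HOME/lib/native}} and expanded against the environment of 
the node that launches the container. {{EnvExpansion}} is a hypothetical helper; 
it does not model YARN's actual launch-context handling.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of expanding $VAR references against a given environment. */
final class EnvExpansion {

    // Matches $NAME where NAME is a shell-style variable identifier.
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    private EnvExpansion() {}

    /** Replaces each $NAME with its value from env; unknown names are left untouched. */
    static String expand(String value, Map<String, String> env) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = env.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement keeps literal '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Expanding on the AM side bakes the AM node's path into the value, and the 
result no longer contains a variable that the task node could resolve; 
deferring expansion lets each node substitute its own {{HADOOP_COMMON_HOME}}.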




[jira] [Created] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

2018-10-31 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7156:
---

 Summary: NullPointerException when reaching max shuffle connections
 Key: MAPREDUCE-7156
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Peter Bacsko
Assignee: Peter Bacsko







[jira] [Created] (MAPREDUCE-7159) FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used

2018-11-20 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7159:
---

 Summary: FrameworkUploader: ensure proper permissions of generated 
framework tar.gz if restrictive umask is used
 Key: MAPREDUCE-7159
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7159
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.1.1
Reporter: Peter Bacsko
Assignee: Peter Bacsko







[jira] [Created] (MAPREDUCE-7175) JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily

2019-01-02 Thread Peter Bacsko (JIRA)

Peter Bacsko created MAPREDUCE-7175:
---

 Summary: JobSubmitter: validateFilePath() throws an exception 
because it requests a local FS unnecessarily
 Key: MAPREDUCE-7175
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7175
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.9.2, 3.1.1
Reporter: Peter Bacsko
Assignee: Peter Bacsko


After a security fix, we receive the following exception in Oozie if we want to 
use {{mapreduce.job.log4j-properties-file}}:


{noformat}
org.apache.oozie.action.ActionExecutorException: UnsupportedOperationException: 
Accessing local file system is not allowed
at 
org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1246)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1424)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Accessing local file system 
is not allowed
at 
org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:48)
at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2816)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:358)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.validateFilePath(JobResourceUploader.java:303)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.copyLog4jPropertyFile(JobResourceUploader.java:248)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.addLog4jToDistributedCache(JobResourceUploader.java:223)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:175)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1231)
... 11 more{noformat}
 

Note that this happens even if the scheme is {{hdfs://}}. The solution is the 
one mentioned in MAPREDUCE-6052: move {{FileSystem localFs = FileSystem.getLocal(conf);}}
inside the {{if}} block.
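
The proposed change can be modelled in plain Java so that it runs without Hadoop 
on the classpath. This sketch assumes that the {{if}} block is the branch for 
paths without a URI scheme; {{FsHandle}} and {{PathValidator}} are hypothetical 
stand-ins for Hadoop's {{FileSystem}} handling, not the actual 
{{JobResourceUploader}} code. The local file system is requested through a 
{{Supplier}} only inside that branch, so an {{hdfs://}} path never triggers it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Hadoop FileSystem: only what the sketch needs. */
interface FsHandle {
    String qualify(String path);
}

final class PathValidator {

    private PathValidator() {}

    /**
     * @param file     user-supplied path, e.g. "hdfs://nn:8020/conf/log4j.properties"
     * @param localFs  supplies the local FS; invoked ONLY for scheme-less paths
     * @param fsForUri resolves the FS for paths that carry an explicit scheme
     */
    static String validateFilePath(String file, Supplier<FsHandle> localFs,
                                   Function<URI, FsHandle> fsForUri) {
        if (file == null) {
            return null;
        }
        if (file.isEmpty()) {
            throw new IllegalArgumentException("File name can't be empty string");
        }
        URI uri;
        try {
            uri = new URI(file);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        if (uri.getScheme() == null) {
            // Only this branch needs the local FS, so only here is it requested.
            return localFs.get().qualify(file);
        }
        return fsForUri.apply(uri).qualify(file);
    }
}
```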




[jira] [Resolved] (MAPREDUCE-7175) JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily

2019-01-03 Thread Peter Bacsko (JIRA)



 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved MAPREDUCE-7175.
-
Resolution: Duplicate

> JobSubmitter: validateFilePath() throws an exception because it requests a 
> local FS unnecessarily
> -
>
> Key: MAPREDUCE-7175
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7175
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.1.1, 2.9.2
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>




[jira] [Reopened] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2019-10-18 Thread Peter Bacsko (Jira)



 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reopened MAPREDUCE-6441:
-

Reopening this to attach a patch for branch-3.1 too.

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Haibo Chen
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441-branch-3.1.001.patch, 
> MAPREDUCE-6441.004.patch, MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, 
> MAPREDUCE-6441.008.patch, MAPREDUCE-6441.009.patch, MAPREDUCE-6441.010.patch, 
> MAPREDUCE-6441.011.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the 
> following lines of code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class:
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}
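
The collision is easy to reproduce in isolation: two processes that seed an 
{{AtomicLong}} with the same {{System.currentTimeMillis()}} value hand out 
identical sequences. The sketch below contrasts that with names built from a job 
ID plus a random {{UUID}}, the approach that the attachment name 
{{HADOOP-10924.03.jobid-plus-uuid.patch}} suggests; {{LocalCacheDirNames}}, its 
methods and the exact name format are assumptions, not the committed code.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch contrasting timestamp-seeded and UUID-based directory names. */
final class LocalCacheDirNames {

    private LocalCacheDirNames() {}

    /** Mirrors the quoted code: a counter seeded with the start time (passed in for testing). */
    static AtomicLong timestampSeededGenerator(long startMillis) {
        return new AtomicLong(startMillis);
    }

    /** Next name from a timestamp-seeded generator, as in the quoted code. */
    static String nextTimestampName(AtomicLong generator) {
        return Long.toString(generator.incrementAndGet());
    }

    /** Job ID plus a random UUID: collision-resistant even for processes started together. */
    static String uniqueName(String jobId) {
        return jobId + "_" + UUID.randomUUID();
    }
}
```

With both processes seeded from 1406915233072, each first generates 
"1406915233073", the directory name seen in the error above, while the UUID-based 
names differ.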



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org

[jira] [Reopened] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-27 Thread Peter Bacsko (Jira)



 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reopened MAPREDUCE-7240:
-

Reopening it to attach patches for branch-3.2 and branch-3.1.

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: Reviewed, applicationmaster, mrv2
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.

[jira] [Created] (MAPREDUCE-7250) FrameworkUploader: add option to skip replication check

2019-12-04 Thread Peter Bacsko (Jira)

Peter Bacsko created MAPREDUCE-7250:
---

 Summary: FrameworkUploader: add option to skip replication check
 Key: MAPREDUCE-7250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The framework uploader tool has this piece of code, which makes sure that all 
blocks of the uploaded MapReduce tarball have been replicated:

{noformat}
while (endTime - startTime < timeout * 1000 &&
    currentReplication < acceptableReplication) {
  Thread.sleep(1000);
  endTime = System.currentTimeMillis();
  currentReplication = getSmallestReplicatedBlockCount();
}
{noformat}

There are cases, however, when we don't want to wait for this (e.g., to speed 
up Hadoop installation).

I suggest adding a {{--skiprelicationcheck}} switch that disables this 
replication test.
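
A sketch of how such a switch could short-circuit the loop quoted above, 
assuming the flag has already been parsed into a {{boolean}}. 
{{ReplicationWait}}, {{waitForReplication}} and the injected replication 
counter, clock and {{Sleeper}} are hypothetical names introduced so the logic 
can run without a cluster; they are not the FrameworkUploader API.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the replication wait with an opt-out switch. */
final class ReplicationWait {

    /** Injectable sleep so the loop can be exercised without real delays. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private ReplicationWait() {}

    /** Returns the smallest replication count observed when the method returns. */
    static int waitForReplication(boolean skipReplicationCheck, int timeoutSeconds,
                                  int acceptableReplication, IntSupplier smallestReplication,
                                  LongSupplier clock, Sleeper sleeper) throws InterruptedException {
        if (skipReplicationCheck) {
            // Skip the wait entirely; report the current count once.
            return smallestReplication.getAsInt();
        }
        long startTime = clock.getAsLong();
        long endTime = startTime;
        int currentReplication = smallestReplication.getAsInt();
        // Same structure as the quoted loop: poll once per second until timeout or success.
        while (endTime - startTime < timeoutSeconds * 1000L
               && currentReplication < acceptableReplication) {
            sleeper.sleep(1000);
            endTime = clock.getAsLong();
            currentReplication = smallestReplication.getAsInt();
        }
        return currentReplication;
    }
}
```

Multiplying by {{1000L}} keeps the timeout arithmetic in {{long}}, and passing 
the clock and sleep in as parameters lets the loop be tested without real delays.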




[jira] [Created] (MAPREDUCE-7273) JHS: make sure that Kerberos relogin is performed when KDC becomes offline then online again

2020-04-14 Thread Peter Bacsko (Jira)

Peter Bacsko created MAPREDUCE-7273:
---

 Summary: JHS: make sure that Kerberos relogin is performed when 
KDC becomes offline then online again
 Key: MAPREDUCE-7273
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7273
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Peter Bacsko
Assignee: Peter Bacsko


In JHS, if the KDC goes offline, the IPC layer does try to relogin, but this is 
not always enough: the next retry only happens after 60 seconds. If the KDC 
comes back in the meantime, the following error might occur:

{noformat}
2020-04-09 03:27:52,075 DEBUG ipc.Server (Server.java:processSaslToken(1952)) - 
Have read input token of size 708 for processing by 
saslServer.evaluateResponse()
2020-04-09 03:27:52,077 DEBUG ipc.Server (Server.java:saslProcess(1829)) - 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - 
Cannot find key of appropriate type to decrypt AP REP - AES128 CTS mode with 
HMAC SHA1-96)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
...
{noformat}

When this happens, JHS has to be restarted.




[jira] [Created] (MAPREDUCE-7302) Upgrading to JUnit 4.13 causes tests in TestFetcher.testCorruptedIFile() fail

2020-10-20 Thread Peter Bacsko (Jira)

Peter Bacsko created MAPREDUCE-7302:
---

 Summary: Upgrading to JUnit 4.13 causes tests in 
TestFetcher.testCorruptedIFile() fail
 Key: MAPREDUCE-7302
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7302
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Peter Bacsko
Assignee: Peter Bacsko


See related ticket YARN-10460. JUnit 4.13 causes the same test failure.




[jira] [Created] (MAPREDUCE-7303) Fix TestJobResourceUploader failures after HADOOP-16878

2020-10-21 Thread Peter Bacsko (Jira)

Peter Bacsko created MAPREDUCE-7303:
---

 Summary: Fix TestJobResourceUploader failures after HADOOP-16878
 Key: MAPREDUCE-7303
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7303
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Currently, two test cases fail with a NullPointerException:

{{org.apache.hadoop.mapreduce.TestJobResourceUploader.testOriginalPathIsRoot()}}
{{org.apache.hadoop.mapreduce.TestJobResourceUploader.testOriginalPathEndsInSlash()}}

The root cause is the source/destination qualified path check introduced by HADOOP-16878.
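One plausible way for a qualified-path comparison to throw an NPE in a unit test is a mocked file system whose qualification method was never stubbed: a Mockito mock with default answers returns {{null}} from methods that return an arbitrary object such as a {{Path}}. Whether that is exactly what happens in {{TestJobResourceUploader}} is an assumption. The sketch below models the comparison with plain strings; {{QualifiedPathCheck}}, {{sameQualifiedPath}} and the {{hdfs://nn:8020}} prefix are hypothetical.

```java
import java.util.function.UnaryOperator;

/**
 * Illustrative only, not the HADOOP-16878 code: a "source and destination
 * must not be the same" check that compares qualified paths. The qualifier
 * stands in for a file system's path-qualification call.
 */
class QualifiedPathCheck {
    static boolean sameQualifiedPath(UnaryOperator<String> qualifier,
                                     String src, String dst) {
        // Throws NullPointerException if the qualifier returns null for src.
        return qualifier.apply(src).equals(qualifier.apply(dst));
    }

    public static void main(String[] args) {
        // Behaves like a real file system: prefixes a scheme and authority.
        UnaryOperator<String> realLike = p -> "hdfs://nn:8020" + p;
        // Behaves like an unstubbed mock: every call returns null.
        UnaryOperator<String> nullStub = p -> null;

        System.out.println(sameQualifiedPath(realLike, "/", "/tmp/")); // false
        try {
            sameQualifiedPath(nullStub, "/", "/tmp/");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the null stub");
        }
    }
}
```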




[jira] [Created] (MAPREDUCE-7352) ArithmeticException in some MapReduce tests

2021-06-14 Thread Peter Bacsko (Jira)

Peter Bacsko created MAPREDUCE-7352:
---

 Summary: ArithmeticException in some MapReduce tests
 Key: MAPREDUCE-7352
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7352
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Peter Bacsko
Assignee: Peter Bacsko


There are some ArithmeticException failures in certain MapReduce test cases, 
for example:

{noformat}
2021-06-14 14:14:20,078 INFO  [main] service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STARTED
java.lang.ArithmeticException: / by zero
	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1015)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:301)
	at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:285)
	at org.apache.hadoop.mapreduce.v2.app.TestMRApp.testUpdatedNodes(TestMRApp.java:223)
{noformat}

We have to set {{detailsInterval}} explicitly when the async dispatcher is wrapped in a spy. For some reason, this variable remains zero even though {{serviceInit()}} is called.
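The stack trace points to an integer division or remainder by {{detailsInterval}} inside {{AsyncDispatcher$GenericEventHandler.handle()}}. Below is a minimal sketch of that failure mode, assuming a periodic "log details every N queued events" check built on the {{%}} operator (the actual expression is not shown in this ticket). In Java, an integer remainder by zero throws {{ArithmeticException}} with the message {{/ by zero}}, just like integer division. {{DetailsIntervalCheck}} and its methods are hypothetical.

```java
/**
 * Minimal sketch of the failure mode, not the actual AsyncDispatcher code:
 * a "log queue details every detailsInterval events" check built on the
 * remainder operator.
 */
class DetailsIntervalCheck {
    // Throws ArithmeticException ("/ by zero") if detailsInterval is 0
    // and queueSize is non-zero.
    static boolean shouldLogDetails(int queueSize, int detailsInterval) {
        return queueSize != 0 && queueSize % detailsInterval == 0;
    }

    // Defensive variant: a non-positive interval simply disables the extra logging.
    static boolean shouldLogDetailsGuarded(int queueSize, int detailsInterval) {
        return detailsInterval > 0 && shouldLogDetails(queueSize, detailsInterval);
    }

    public static void main(String[] args) {
        System.out.println(shouldLogDetailsGuarded(5000, 1000)); // true
        System.out.println(shouldLogDetailsGuarded(5000, 0));    // false
        try {
            // Models a dispatcher whose detailsInterval was left at zero.
            shouldLogDetails(5000, 0);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // / by zero
        }
    }
}
```

Setting {{detailsInterval}} on the spied dispatcher, as proposed above, removes the zero divisor at its source; the guarded variant only illustrates a defensive alternative.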




[jira] [Resolved] (MAPREDUCE-7352) ArithmeticException in some MapReduce tests

2021-06-14 Thread Peter Bacsko (Jira)



[ https://issues.apache.org/jira/browse/MAPREDUCE-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko resolved MAPREDUCE-7352.
-
Resolution: Duplicate

> ArithmeticException in some MapReduce tests
> ---
>
> Key: MAPREDUCE-7352
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7352
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>




[jira] [Created] (MAPREDUCE-6831) Flaky test TestJobImpl.testKilledDuringKillAbort

[jira] [Reopened] (MAPREDUCE-6201) TestNetworkedJob fails on trunk

[jira] [Created] (MAPREDUCE-6856) TestRecovery.testSpeculative fails if testCrashed fails

[jira] [Created] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

[jira] [Created] (MAPREDUCE-6898) TestKill.testKillTask is flaky

[jira] [Created] (MAPREDUCE-6939) Follow-up on MAPREDUCE-6870

[jira] [Created] (MAPREDUCE-6953) Skip the testcase testJobWithChangePriority if FairScheduler is used

[jira] [Created] (MAPREDUCE-6954) Disable erasure coding for files that are uploaded to the MR staging area

[jira] [Created] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded

[jira] [Created] (MAPREDUCE-7046) Enhance logging related to retrieving Job

[jira] [Created] (MAPREDUCE-7048) AM can still crash after MAPREDUCE-7020

[jira] [Created] (MAPREDUCE-7049) Testcase TestMRJobs#testJobClassloaderWithCustomClasses fails

[jira] [Created] (MAPREDUCE-7052) TestFixedLengthInputFormat#testFormatCompressedIn is flaky

[jira] [Created] (MAPREDUCE-7056) Ensure that mapreduce.job.reduces is not negative

[jira] [Created] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

[jira] [Created] (MAPREDUCE-7132) Check erasure coding in JobSplitWriter to avoid warnings

[jira] [Created] (MAPREDUCE-7144) Speculative execution can cause race condition

[jira] [Created] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

[jira] [Created] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

[jira] [Created] (MAPREDUCE-7159) FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used

[jira] [Created] (MAPREDUCE-7175) JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily

[jira] [Resolved] (MAPREDUCE-7175) JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily

[jira] [Reopened] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

[jira] [Reopened] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

[jira] [Created] (MAPREDUCE-7250) FrameworkUploader: add option to skip replication check

[jira] [Created] (MAPREDUCE-7273) JHS: make sure that Kerberos relogin is performed when KDC becomes offline then online again

[jira] [Created] (MAPREDUCE-7302) Upgrading to JUnit 4.13 causes tests in TestFetcher.testCorruptedIFile() fail

[jira] [Created] (MAPREDUCE-7303) Fix TestJobResourceUploader failures after HADOOP-16878

[jira] [Created] (MAPREDUCE-7352) ArithmeticException in some MapReduce tests

[jira] [Resolved] (MAPREDUCE-7352) ArithmeticException in some MapReduce tests
