[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2020-08-13 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177385#comment-17177385
 ] 

Kihwal Lee commented on MAPREDUCE-7069:
---

This should have made it to the other release branches.  Cherry-picked to branch-3.1 
and branch-2.10.

> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.0, 2.10.1, 3.1.5
>
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch, MAPREDUCE-7069.007.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> {{mapreduce.reduce.env}}, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g., mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}
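
For illustration only (the CLASSPATH value below is hypothetical, and this is a sketch of the proposed per-variable form rather than committed behavior), a client could then set a comma-containing value without it being split on the commas:

{code}
import org.apache.hadoop.conf.Configuration;

public class PerVariableEnvExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The combined form treats the commas as separators between variables:
    //   conf.set("mapreduce.map.env", "CLASSPATH=a.jar,b.jar,c.jar");
    // The per-variable form takes the value verbatim:
    conf.set("mapreduce.map.env.CLASSPATH", "a.jar,b.jar,c.jar");
    conf.set("mapreduce.reduce.env.CLASSPATH", "a.jar,b.jar,c.jar");
    System.out.println(conf.get("mapreduce.map.env.CLASSPATH"));
  }
}
{code}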



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2020-08-13 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-7069:
--
Fix Version/s: 3.1.5
   2.10.1

> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.0, 2.10.1, 3.1.5
>
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch, MAPREDUCE-7069.007.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> {{mapreduce.reduce.env}}, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g., mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7177) Disable speculative execution in TestDFSIO

2019-01-16 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-7177:
-

 Summary: Disable speculative execution in TestDFSIO
 Key: MAPREDUCE-7177
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7177
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.8.5, 3.2.0
Reporter: Kihwal Lee


When TestDFSIO runs in an environment where a subset of the mappers are slow, 
speculative execution can kick in.  In the write phase, this makes the existing 
mapper fail on its next addBlock() call, since its output file has been 
overwritten by the speculative attempt.

To make the benchmark more predictable and repeatable, speculative execution 
should be disabled within TestDFSIO itself. 
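
As a rough illustration (not the TestDFSIO patch itself; the class name, job name, and helper below are made up), a MapReduce job can switch speculation off in its own configuration like this:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.MRJobConfig;

public class NoSpeculationJob {
  // Hypothetical helper: builds a benchmark job with speculation disabled.
  public static Job createJob(Configuration conf) throws Exception {
    Job job = Job.getInstance(conf, "TestDFSIO-write");
    // A slow mapper can no longer trigger a second attempt that
    // overwrites the same output file.
    job.getConfiguration().setBoolean(MRJobConfig.MAP_SPECULATIVE, false);
    job.getConfiguration().setBoolean(MRJobConfig.REDUCE_SPECULATIVE, false);
    return job;
  }
}
{code}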



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6767) TestSlive fails after a common change

2016-08-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435706#comment-15435706
 ] 

Kihwal Lee commented on MAPREDUCE-6767:
---

E.g.
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/143/testReport/junit/org.apache.hadoop.fs.slive/TestSlive/testSelection/

> TestSlive fails after a common change
> -
>
> Key: MAPREDUCE-6767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kihwal Lee
>
> It looks like this was broken after HADOOP-12726.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6767) TestSlive fails after a common change

2016-08-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6767:
--
Description: It looks like this was broken after HADOOP-12726.

> TestSlive fails after a common change
> -
>
> Key: MAPREDUCE-6767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kihwal Lee
>
> It looks like this was broken after HADOOP-12726.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6767) TestSlive fails after a common change

2016-08-24 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-6767:
-

 Summary: TestSlive fails after a common change
 Key: MAPREDUCE-6767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Kihwal Lee






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing

2016-08-22 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6750:
--
Fix Version/s: (was: 2.9.0)
   2.8.0

> TestHSAdminServer.testRefreshSuperUserGroups is failing
> ---
>
> Key: MAPREDUCE-6750
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6750.patch
>
>
> HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of 
> {{getGroupNames()}}. It should work if the mocks are updated to stub the 
> right method and return the right type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing

2016-08-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned MAPREDUCE-6750:
-

Assignee: Kihwal Lee

> TestHSAdminServer.testRefreshSuperUserGroups is failing
> ---
>
> Key: MAPREDUCE-6750
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Attachments: MAPREDUCE-6750.patch
>
>
> HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of 
> {{getGroupNames()}}. It should work if the mocks are updated to stub the 
> right method and return the right type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing

2016-08-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6750:
--
Status: Patch Available  (was: Open)

> TestHSAdminServer.testRefreshSuperUserGroups is failing
> ---
>
> Key: MAPREDUCE-6750
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Priority: Minor
> Attachments: MAPREDUCE-6750.patch
>
>
> HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of 
> {{getGroupNames()}}. It should work if the mocks are updated to stub the 
> right method and return the right type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing

2016-08-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6750:
--
Attachment: MAPREDUCE-6750.patch

> TestHSAdminServer.testRefreshSuperUserGroups is failing
> ---
>
> Key: MAPREDUCE-6750
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Priority: Minor
> Attachments: MAPREDUCE-6750.patch
>
>
> HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of 
> {{getGroupNames()}}. It should work if the mocks are updated to stub the 
> right method and return the right type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing

2016-08-09 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-6750:
-

 Summary: TestHSAdminServer.testRefreshSuperUserGroups is failing
 Key: MAPREDUCE-6750
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Kihwal Lee
Priority: Minor


HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of 
{{getGroupNames()}}. It should work if the mocks are updated to stub the right 
method and return the right type.
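
A minimal sketch of the kind of change needed (illustrative only; the actual patch may differ, and the group name is made up):

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Arrays;
import org.apache.hadoop.security.UserGroupInformation;

public class MockGroupsSketch {
  static UserGroupInformation superUserMock() {
    UserGroupInformation ugi = mock(UserGroupInformation.class);
    // AccessControlList now calls getGroups(), which returns a List<String>,
    // so the stub targets that method and type instead of getGroupNames()
    // (which returns a String[]).
    when(ugi.getGroups()).thenReturn(Arrays.asList("superGroup"));
    return ugi;
  }
}
{code}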



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6527) Data race on field org.apache.hadoop.mapred.JobConf.credentials

2016-03-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180367#comment-15180367
 ] 

Kihwal Lee commented on MAPREDUCE-6527:
---

LocalJobRunner is for testing MapReduce locally without involving an actual 
cluster, so the impact of the race is minimal. If the input or output path is in 
a secure HDFS, it might cause the local job instance to fail. If the job uses the 
local file system or an HDFS with security disabled, there will be no issue.

> Data race on field org.apache.hadoop.mapred.JobConf.credentials
> ---
>
> Key: MAPREDUCE-6527
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6527
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ali Kheradmand
>Assignee: Haibo Chen
> Attachments: mapreduce6527.001.patch
>
>
> I am running the test suite against a dynamic race detector called 
> RV-Predict. Here is a race report that I got: 
> {noformat}
> Data race on field org.apache.hadoop.mapred.JobConf.credentials: {{{
> Concurrent read in thread T327 (locks held: {})
>  >  at org.apache.hadoop.mapred.JobConf.getCredentials(JobConf.java:505)
> at 
> org.apache.hadoop.mapreduce.task.JobContextImpl.(JobContextImpl.java:70)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:524)
> T327 is created by T22
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:218)
> Concurrent write in thread T22 (locks held: {Monitor@496c673a, 
> Monitor@496319b0})
>  >  at org.apache.hadoop.mapred.JobConf.setCredentials(JobConf.java:510)
> at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:787)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:241)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>  locked Monitor@496319b0 at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:n/a)
>  
> at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:245)
>  locked Monitor@496c673a at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:229)
>  
> T22 is created by T1
> at 
> org.apache.hadoop.mapred.jobcontrol.TestJobControl.doJobControlTest(TestJobControl.java:111)
> }}}
> {noformat}
> In the source code of the org.apache.hadoop.mapreduce.JobStatus.submitJob 
> function, we have the following lines:
> {code}
> Job job = new Job(JobID.downgrade(jobid), jobSubmitDir);
> job.job.setCredentials(credentials);
> {code}
> It looks a bit suspicious: Job extends Thread, and at the end of its 
> constructor it starts a new thread which creates a new instance of 
> JobContextImpl, which reads the credentials. However, the first thread 
> concurrently sets the credentials after creating the Job instance. 
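
A stripped-down model of that pattern (illustrative only, not the Hadoop code): a thread started from a constructor may read a field before the submitting thread assigns it.

{code}
public class RacySubmit {
  static class Job extends Thread {
    String credentials;            // written by the submitter, read by the job thread

    Job() {
      start();                     // the thread starts before credentials is set
    }

    @Override
    public void run() {
      // May observe null if it runs before the submitter's setCredentials() call.
      System.out.println("credentials = " + credentials);
    }

    void setCredentials(String c) {
      this.credentials = c;
    }
  }

  public static void main(String[] args) {
    Job job = new Job();
    job.setCredentials("token");   // races with the read in run()
  }
}
{code}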



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2015-10-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983321#comment-14983321
 ] 

Kihwal Lee commented on MAPREDUCE-6451:
---

bq. Did you forget the DynamicInputChunkContext class when you committed?
It is a Friday. :)  Fixed it. Thanks for reporting.

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.7.2
>
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, 
> MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
>
> DistCp when used with dynamic strategy does not update the chunkFilePath and 
> other static variables any time other than for the first job. This is seen 
> when DistCp::run() is used. 
> A single copy succeeds but multiple jobs finish successfully without any real 
> copying. 
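
A sketch of the reported scenario (the paths are hypothetical, and this is illustrative rather than an exact reproduction): two dynamic-strategy copies issued from the same JVM through DistCp::run().

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;

public class RepeatedDistCp {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistCp distcp = new DistCp(conf, null); // options are parsed from the argument arrays below
    // The first copy works: the dynamic-strategy chunk state is set up for this job.
    distcp.run(new String[] {"-strategy", "dynamic", "hdfs://src/a", "hdfs://dst/a"});
    // Before the fix, a second run in the same JVM reused the stale static
    // chunkFilePath and reported success without copying anything.
    distcp.run(new String[] {"-strategy", "dynamic", "hdfs://src/b", "hdfs://dst/b"});
  }
}
{code}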



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2015-10-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983156#comment-14983156
 ] 

Kihwal Lee commented on MAPREDUCE-6451:
---

+1

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, 
> MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
>
> DistCp when used with dynamic strategy does not update the chunkFilePath and 
> other static variables any time other than for the first job. This is seen 
> when DistCp::run() is used. 
> A single copy succeeds but multiple jobs finish successfully without any real 
> copying. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2015-10-30 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6451:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.2
   3.0.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-2 and branch-2.7. Thanks for working on 
the fix, Kuhu. Thank you gentlemen for the reviews.

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.7.2
>
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, 
> MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
>
> DistCp when used with dynamic strategy does not update the chunkFilePath and 
> other static variables any time other than for the first job. This is seen 
> when DistCp::run() is used. 
> A single copy succeeds but multiple jobs finish successfully without any real 
> copying. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2015-08-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705598#comment-14705598
 ] 

Kihwal Lee commented on MAPREDUCE-6451:
---

Kicked the precommit: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5947/

 DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
 -

 Key: MAPREDUCE-6451
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.6.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
 Attachments: MAPREDUCE-6451-v1.patch


 DistCp when used with dynamic strategy does not update the chunkFilePath and 
 other static variables any time other than for the first job. This is seen 
 when DistCp::run() is used. 
 A single copy succeeds but multiple jobs finish successfully without any real 
 copying. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-11-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201012#comment-14201012
 ] 

Kihwal Lee commented on MAPREDUCE-5958:
---

+1 the patch looks good. Thanks for adding the test case, Jason.

 Wrong reduce task progress if map output is compressed
 --

 Key: MAPREDUCE-5958
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Reporter: Emilio Coppa
Assignee: Emilio Coppa
Priority: Minor
  Labels: progress, reduce
 Attachments: HADOOP-5958-v2.patch, MAPREDUCE-5958v3.patch


 If the map output is compressed (_mapreduce.map.output.compress_ set to 
 _true_) then the reduce task progress may be highly underestimated.
 In the reduce phase (but also in the merge phase), the progress of a reduce 
 task is computed as the ratio between the number of processed bytes and the 
 number of total bytes. But:
 - the number of total bytes is computed by summing up the uncompressed 
 segment sizes (_Merger.Segment.getRawDataLength()_)
 - the number of processed bytes is computed by exploiting the position of the 
 current _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may 
 refer to the position in the underlying on-disk file (which may be compressed).
 Thus, if the map outputs are compressed then the progress may be 
 underestimated (e.g., with only 1 map output on-disk file, compressed to 
 25% of its original size, the reduce task progress during the reduce 
 phase will range between 0 and 0.25 and then artificially jump to 1.0).
 Attached is a patch: the number of processed bytes is now computed by 
 exploiting _IFile.Reader.bytesRead_ (if the reader is in memory, then 
 _getPosition()_ already returns exactly this field).
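
To make the arithmetic concrete, here is a toy calculation with the numbers from the example above (the sizes are made up and the variable names do not come from the Hadoop code):

{code}
public class ProgressExample {
  public static void main(String[] args) {
    // One map output segment, compressed on disk to 25% of its raw size.
    long rawDataLength   = 100L * 1024 * 1024; // what Merger.Segment.getRawDataLength() sums up
    long compressedBytes =  25L * 1024 * 1024; // how far the on-disk position can advance

    // Progress derived from the on-disk position tops out at 0.25 ...
    float positionBased = (float) compressedBytes / rawDataLength;
    // ... while counting uncompressed bytes read can reach 1.0.
    float bytesReadBased = (float) rawDataLength / rawDataLength;

    System.out.println("position-based max progress:   " + positionBased);  // 0.25
    System.out.println("bytes-read-based max progress: " + bytesReadBased); // 1.0
  }
}
{code}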



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-11-06 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-5958:
--
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Wrong reduce task progress if map output is compressed
 --

 Key: MAPREDUCE-5958
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Reporter: Emilio Coppa
Assignee: Emilio Coppa
Priority: Minor
  Labels: progress, reduce
 Fix For: 2.6.0

 Attachments: HADOOP-5958-v2.patch, MAPREDUCE-5958v3.patch


 If the map output is compressed (_mapreduce.map.output.compress_ set to 
 _true_) then the reduce task progress may be highly underestimated.
 In the reduce phase (but also in the merge phase), the progress of a reduce 
 task is computed as the ratio between the number of processed bytes and the 
 number of total bytes. But:
 - the number of total bytes is computed by summing up the uncompressed 
 segment sizes (_Merger.Segment.getRawDataLength()_)
 - the number of processed bytes is computed by exploiting the position of the 
 current _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may 
 refer to the position in the underlying on-disk file (which may be compressed).
 Thus, if the map outputs are compressed then the progress may be 
 underestimated (e.g., with only 1 map output on-disk file, compressed to 
 25% of its original size, the reduce task progress during the reduce 
 phase will range between 0 and 0.25 and then artificially jump to 1.0).
 Attached is a patch: the number of processed bytes is now computed by 
 exploiting _IFile.Reader.bytesRead_ (if the reader is in memory, then 
 _getPosition()_ already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed

2014-11-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201015#comment-14201015
 ] 

Kihwal Lee commented on MAPREDUCE-5958:
---

Committed this to trunk, branch-2 and branch-2.6.

 Wrong reduce task progress if map output is compressed
 --

 Key: MAPREDUCE-5958
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Reporter: Emilio Coppa
Assignee: Emilio Coppa
Priority: Minor
  Labels: progress, reduce
 Fix For: 2.6.0

 Attachments: HADOOP-5958-v2.patch, MAPREDUCE-5958v3.patch


 If the map output is compressed (_mapreduce.map.output.compress_ set to 
 _true_) then the reduce task progress may be highly underestimated.
 In the reduce phase (but also in the merge phase), the progress of a reduce 
 task is computed as the ratio between the number of processed bytes and the 
 number of total bytes. But:
 - the number of total bytes is computed by summing up the uncompressed 
 segment sizes (_Merger.Segment.getRawDataLength()_)
 - the number of processed bytes is computed by exploiting the position of the 
 current _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may 
 refer to the position in the underlying on-disk file (which may be compressed).
 Thus, if the map outputs are compressed then the progress may be 
 underestimated (e.g., with only 1 map output on-disk file, compressed to 
 25% of its original size, the reduce task progress during the reduce 
 phase will range between 0 and 0.25 and then artificially jump to 1.0).
 Attached is a patch: the number of processed bytes is now computed by 
 exploiting _IFile.Reader.bytesRead_ (if the reader is in memory, then 
 _getPosition()_ already returns exactly this field).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6022) map_input_file is missing from streaming job environment

2014-10-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188616#comment-14188616
 ] 

Kihwal Lee commented on MAPREDUCE-6022:
---

+1 looks good to me.

 map_input_file is missing from streaming job environment
 

 Key: MAPREDUCE-6022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6022
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-6022.patch, MAPREDUCE-6022v2.patch


 When running a streaming job the 'map_input_file' environment variable is not 
 being set.  This property is deprecated, but in the past deprecated 
 properties still appeared in a stream job's environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6022) map_input_file is missing from streaming job environment

2014-10-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6022:
--
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this. Thanks for fixing the bug, Jason.

 map_input_file is missing from streaming job environment
 

 Key: MAPREDUCE-6022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6022
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.6.0

 Attachments: MAPREDUCE-6022.patch, MAPREDUCE-6022v2.patch


 When running a streaming job the 'map_input_file' environment variable is not 
 being set.  This property is deprecated, but in the past deprecated 
 properties still appeared in a stream job's environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6022) map_input_file is missing from streaming job environment

2014-10-07 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-6022:
--
Assignee: Jason Lowe

 map_input_file is missing from streaming job environment
 

 Key: MAPREDUCE-6022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6022
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-6022.patch


 When running a streaming job the 'map_input_file' environment variable is not 
 being set.  This property is deprecated, but in the past deprecated 
 properties still appeared in a stream job's environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-5939) StartTime showing up as the epoch time in JHS UI after upgrade

2014-06-23 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-5939:
-

 Summary: StartTime showing up as the epoch time in JHS UI after 
upgrade
 Key: MAPREDUCE-5939
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5939
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Kihwal Lee


After upgrading from 0.23.x to 2.5, the start time of old apps is showing up 
as the epoch time.  It looks like 2.5 expects the start time to be encoded at the 
end of the jhist file name (-[timestamp].jhist). It should have been 
made backward compatible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail

2014-04-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984461#comment-13984461
 ] 

Kihwal Lee commented on MAPREDUCE-5868:
---

The test output contains only this.
{panel}
2014-04-29 14:05:07,398 INFO  \[main\] util.ProcessTree 
(ProcessTree.java:isSetsidSupported(64)) - setsid exited with exit code 0
{panel}

In the test workspace, I see {{cache.sh}} and {{outfile}}. {{outfile}} is 
0 bytes.

 TestPipeApplication causing nightly build to fail
 -

 Key: MAPREDUCE-5868
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5868
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk
Reporter: Jason Lowe

 TestPipeApplication appears to be timing out which causes the nightly build 
 to fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5749) TestRMContainerAllocator#testReportedAppProgress Failed

2014-04-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979753#comment-13979753
 ] 

Kihwal Lee commented on MAPREDUCE-5749:
---

This has been causing failures in the nightly build.  Attaching the full test 
log from last night for reference.

 TestRMContainerAllocator#testReportedAppProgress Failed
 ---

 Key: MAPREDUCE-5749
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5749
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: trunk
Reporter: shenhong
 Attachments: MAPREDUCE-5749.patch


 When executing mvn test 
 -Dtest=TestRMContainerAllocator#testReportedAppProgress, it fails with the 
 message:
 {code}
 Caused by: java.io.FileNotFoundException: File 
 /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001/job.xml
  does not exist
 {code}
 But in fact, the job.xml exists:
 {code}
 -rw-rw-r-- 1 yuling.sh yuling.sh 65791  2月 10 13:13 
 /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/yuling.sh/.staging/job_1392009213299_0001/job.xml
 {code}
 See the following code:
 {code}
 public Job submit(Configuration conf, boolean mapSpeculative,
   boolean reduceSpeculative) throws Exception {
 String user = conf.get(MRJobConfig.USER_NAME, UserGroupInformation
 .getCurrentUser().getShortUserName());
 conf.set(MRJobConfig.USER_NAME, user);
 conf.set(MRJobConfig.MR_AM_STAGING_DIR, testAbsPath.toString());
 conf.setBoolean(MRJobConfig.MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, true);
 // TODO: fix the bug where the speculator gets events with
 // not-fully-constructed objects. For now, disable speculative exec
 conf.setBoolean(MRJobConfig.MAP_SPECULATIVE, mapSpeculative);
 conf.setBoolean(MRJobConfig.REDUCE_SPECULATIVE, reduceSpeculative);
 init(conf);
 start();
 DefaultMetricsSystem.shutdown();
 Job job = getContext().getAllJobs().values().iterator().next();
 if (assignedQueue != null) {
   job.setQueueName(assignedQueue);
 }
 // Write job.xml
 String jobFile = MRApps.getJobFile(conf, user,
 TypeConverter.fromYarn(job.getID()));
 LOG.info("Writing job conf to " + jobFile);
 new File(jobFile).getParentFile().mkdirs();
 conf.writeXml(new FileOutputStream(jobFile));
 return job;
   }
 {code}
 At first, the user is yuling.sh, but the UGI is set to the attemptId in 
 start(); after that, job.xml is written to 
 yuling.sh/.staging/job_1392009213299_0001/job.xml. But when the job is 
 running, MRAppMaster can't find the job.xml at 
 appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5749) TestRMContainerAllocator#testReportedAppProgress Failed

2014-04-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-5749:
--

Attachment: TestRMContainerAllocator_failure.txt

 TestRMContainerAllocator#testReportedAppProgress Failed
 ---

 Key: MAPREDUCE-5749
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5749
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: trunk
Reporter: shenhong
 Attachments: MAPREDUCE-5749.patch, 
 TestRMContainerAllocator_failure.txt


 When executing mvn test 
 -Dtest=TestRMContainerAllocator#testReportedAppProgress, it fails with the 
 message:
 {code}
 Caused by: java.io.FileNotFoundException: File 
 /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001/job.xml
  does not exist
 {code}
 But in fact, the job.xml exists:
 {code}
 -rw-rw-r-- 1 yuling.sh yuling.sh 65791  2月 10 13:13 
 /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/yuling.sh/.staging/job_1392009213299_0001/job.xml
 {code}
 See the following code:
 {code}
 public Job submit(Configuration conf, boolean mapSpeculative,
   boolean reduceSpeculative) throws Exception {
 String user = conf.get(MRJobConfig.USER_NAME, UserGroupInformation
 .getCurrentUser().getShortUserName());
 conf.set(MRJobConfig.USER_NAME, user);
 conf.set(MRJobConfig.MR_AM_STAGING_DIR, testAbsPath.toString());
 conf.setBoolean(MRJobConfig.MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, true);
 // TODO: fix the bug where the speculator gets events with
 // not-fully-constructed objects. For now, disable speculative exec
 conf.setBoolean(MRJobConfig.MAP_SPECULATIVE, mapSpeculative);
 conf.setBoolean(MRJobConfig.REDUCE_SPECULATIVE, reduceSpeculative);
 init(conf);
 start();
 DefaultMetricsSystem.shutdown();
 Job job = getContext().getAllJobs().values().iterator().next();
 if (assignedQueue != null) {
   job.setQueueName(assignedQueue);
 }
 // Write job.xml
 String jobFile = MRApps.getJobFile(conf, user,
 TypeConverter.fromYarn(job.getID()));
 LOG.info("Writing job conf to " + jobFile);
 new File(jobFile).getParentFile().mkdirs();
 conf.writeXml(new FileOutputStream(jobFile));
 return job;
   }
 {code}
 At first, the user is yuling.sh, but the UGI is set to the attemptId in 
 start(); after that, job.xml is written to 
 yuling.sh/.staging/job_1392009213299_0001/job.xml. But when the job is 
 running, MRAppMaster can't find the job.xml at 
 appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5804) TestMRJobsWithProfiler#testProfiler timesout

2014-03-21 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943304#comment-13943304
 ] 

Kihwal Lee commented on MAPREDUCE-5804:
---

+1

 TestMRJobsWithProfiler#testProfiler timesout
 

 Key: MAPREDUCE-5804
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5804
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: LOG.txt, MAPREDUCE-5804.patch


 {noformat}
 testProfiler(org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler)  Time 
 elapsed: 154.972 sec   ERROR!
 java.lang.Exception: test timed out after 12 milliseconds
   at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
   at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
   at java.io.File.exists(File.java:813)
   at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
   at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at org.apache.log4j.spi.LoggingEvent.init(LoggingEvent.java:165)
   at org.apache.log4j.Category.forcedLog(Category.java:391)
   at org.apache.log4j.Category.log(Category.java:856)
   at 
 org.apache.commons.logging.impl.Log4JLogger.warn(Log4JLogger.java:208)
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:338)
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
   at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1570)
   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599)
   at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306)
   at 
 org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler.testProfiler(TestMRJobsWithProfiler.java:138)
 Results :
 Tests in error: 
   TestMRJobsWithProfiler.testProfiler:138 »  test timed out after 12 
 millise...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5804) TestMRJobsWithProfiler#testProfiler timesout

2014-03-21 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-5804:
--

   Resolution: Fixed
Fix Version/s: 2.5.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for working on this. I've committed this to trunk and branch-2.

 TestMRJobsWithProfiler#testProfiler timesout
 

 Key: MAPREDUCE-5804
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5804
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: LOG.txt, MAPREDUCE-5804.patch


 {noformat}
 testProfiler(org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler)  Time 
 elapsed: 154.972 sec   ERROR!
 java.lang.Exception: test timed out after 12 milliseconds
   at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
   at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
   at java.io.File.exists(File.java:813)
   at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
   at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at org.apache.log4j.spi.LoggingEvent.init(LoggingEvent.java:165)
   at org.apache.log4j.Category.forcedLog(Category.java:391)
   at org.apache.log4j.Category.log(Category.java:856)
   at 
 org.apache.commons.logging.impl.Log4JLogger.warn(Log4JLogger.java:208)
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:338)
   at 
 org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419)
   at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1570)
   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599)
   at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306)
   at 
 org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler.testProfiler(TestMRJobsWithProfiler.java:138)
 Results :
 Tests in error: 
   TestMRJobsWithProfiler.testProfiler:138 »  test timed out after 12 
 millise...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-3184) Improve handling of fetch failures when a tasktracker is not responding on HTTP

2014-03-11 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-3184:
--

Assignee: Todd Lipcon  (was: Jordan Zimmerman)

 Improve handling of fetch failures when a tasktracker is not responding on 
 HTTP
 ---

 Key: MAPREDUCE-3184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.205.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 1.0.1

 Attachments: mr-3184.txt


 On a 100 node cluster, we had an issue where one of the TaskTrackers was hit 
 by MAPREDUCE-2386 and stopped responding to fetches. The behavior observed 
 was the following:
 - every reducer would try to fetch the same map task, and fail after ~13 
 minutes.
 - At that point, all reducers would report this failed fetch to the JT for 
 the same task, and the task would be re-run.
 - Meanwhile, the reducers would move on to the next map task that ran on the 
 TT, and hang for another 13 minutes.
 The job essentially made no progress for hours, as each map task that ran on 
 the bad node was serially marked failed.
 To combat this issue, we should introduce a second type of failed fetch 
 notification, used when the TT does not respond at all (ie 
 SocketTimeoutException, etc). These fetch failure notifications should count 
 against the TT at large, rather than a single task. If more than half of the 
 reducers report such an issue for a given TT, then all of the tasks from that 
 TT should be re-run.
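
A rough sketch of the proposed policy (the class and method names are invented; this is not the committed patch): track which reducers have reported a host-level fetch failure per TaskTracker, and trip once a majority has.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HostFetchFailurePolicy {
  private final int totalReducers;
  private final Map<String, Set<Integer>> reportersByTracker =
      new HashMap<String, Set<Integer>>();

  public HostFetchFailurePolicy(int totalReducers) {
    this.totalReducers = totalReducers;
  }

  /**
   * Record that a reducer could not connect to the tracker at all
   * (e.g. SocketTimeoutException). Returns true once more than half of the
   * reducers have reported the tracker, meaning all of its map outputs
   * should be re-run rather than failing the maps one by one.
   */
  public synchronized boolean reportHostFailure(String trackerHost, int reducerId) {
    Set<Integer> reporters = reportersByTracker.get(trackerHost);
    if (reporters == null) {
      reporters = new HashSet<Integer>();
      reportersByTracker.put(trackerHost, reporters);
    }
    reporters.add(reducerId);
    return reporters.size() * 2 > totalReducers;   // strictly more than half
  }
}
{code}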



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900842#comment-13900842
 ] 

Kihwal Lee commented on MAPREDUCE-5757:
---

+1 lgtm

 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5757.patch


 Despite having the fix for MAPREDUCE-5513, we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.
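
As a generic illustration of the failure mode (not the JobControl code; the names are made up), copying a list while another thread mutates it throws ConcurrentModificationException unless both sides hold the same lock:

{code}
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ToListRace {
  private final List<String> jobs = new LinkedList<String>();

  // Unsafe: the copy constructor iterates 'jobs' without holding the lock,
  // so a concurrent addJob() can trigger ConcurrentModificationException.
  public List<String> toListUnsafe() {
    return new ArrayList<String>(jobs);
  }

  // Safe: the writer and the copy synchronize on the same monitor.
  public synchronized void addJob(String job) {
    jobs.add(job);
  }

  public synchronized List<String> toList() {
    return new ArrayList<String>(jobs);
  }
}
{code}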



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.

2013-12-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847742#comment-13847742
 ] 

Kihwal Lee commented on MAPREDUCE-5623:
---

+1 The patch looks good to me.

 TestJobCleanup fails because of RejectedExecutionException and NPE.
 ---

 Key: MAPREDUCE-5623
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch, 
 MAPREDUCE-5623.3.patch


 org.apache.hadoop.mapred.TestJobCleanup can fail because of a 
 RejectedExecutionException from NonAggregatingLogHandler. This problem is 
 described in YARN-1409. TestJobCleanup can still fail after fixing the 
 RejectedExecutionException, because of an NPE caused by Job#getCounters() 
 returning null.
 {code}
 ---
 Test set: org.apache.hadoop.mapred.TestJobCleanup
 ---
 Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec 
  FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup
 testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup)  Time elapsed: 
 31.068 sec   ERROR!
 java.lang.NullPointerException: null
 at 
 org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199)
 at 
 org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-11-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824255#comment-13824255
 ] 

Kihwal Lee commented on MAPREDUCE-5603:
---

The patch looks good. Since the MAPREDUCE build is broken, can you post your 
own test result?

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5373) TestFetchFailure.testFetchFailureMultipleReduces could fail intermittently

2013-11-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824264#comment-13824264
 ] 

Kihwal Lee commented on MAPREDUCE-5373:
---

+1 Looks good to me.

 TestFetchFailure.testFetchFailureMultipleReduces could fail intermittently
 --

 Key: MAPREDUCE-5373
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5373
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chuan Liu
Assignee: Jonathan Eagles
 Attachments: MAPREDUCE-5373.patch


 The unit test case could fail intermittently on both Linux and Windows in my 
 testing. The error message seems to suggest the task status was wrong during 
 the test.
 An example Linux failure:
 {noformat}
 ---
 Test set: org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
 ---
 Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.235 sec 
  FAILURE!
 testFetchFailureMultipleReduces(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure)
   Time elapsed: 1261 sec   FAILURE!
 java.lang.AssertionError: expected:SUCCEEDED but was:SCHEDULED
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:147)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestFetchFailure.testFetchFailureMultipleReduces(TestFetchFailure.java:332)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
   at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
   at 
 org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
 {noformat}
 An example Windows failure:
 {noformat}
 ---
 Test set: org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
 ---
 Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 50.342 sec 
  FAILURE!
 testFetchFailureMultipleReduces(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure)
   Time elapsed: 36175 sec   FAILURE!
 java.lang.AssertionError: expected:SUCCEEDED but was:RUNNING
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:147)
   at 
 

[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions

2013-08-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729589#comment-13729589
 ] 

Kihwal Lee commented on MAPREDUCE-5446:
---

+1 the patch looks good.

 TestJobHistoryEvents and TestJobHistoryParsing have race conditions
 ---

 Key: MAPREDUCE-5446
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5446.patch


 TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for 
 MRApp to finish.  Currently they are polling the service state looking for 
 Service.STATE.STOPPED, but the service can appear to be in that state 
 *before* it is fully stopped.  This causes tests to finish with MRApp threads 
 still in-flight, and those threads can conflict with subsequent tests when 
 they collide in the filesystem.
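
One way to make the wait reliable (a sketch only; the class and field names are invented and this is not necessarily what the patch does) is to have the test block on a signal that fires only after the stop sequence has completed, instead of polling the reported service state:

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class StoppableApp /* would extend MRApp in the real test */ {
  private final CountDownLatch fullyStopped = new CountDownLatch(1);

  protected void serviceStop() throws Exception {
    try {
      // super.serviceStop();   // the real shutdown work happens here
    } finally {
      fullyStopped.countDown(); // only now is the app truly finished
    }
  }

  /** Test helper: returns true if the app finished stopping within the timeout. */
  boolean awaitFullStop(long timeout, TimeUnit unit) throws InterruptedException {
    return fullyStopped.await(timeout, unit);
  }
}
{code}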

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions

2013-08-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-5446:
--

   Resolution: Fixed
Fix Version/s: 2.1.1-beta
   3.0.0
   Status: Resolved  (was: Patch Available)

The patch has been committed to trunk, branch-2 and branch-2.1-beta. Thanks for 
the patch, Jason and for the review, Tsuyoshi.

 TestJobHistoryEvents and TestJobHistoryParsing have race conditions
 ---

 Key: MAPREDUCE-5446
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.1.1-beta

 Attachments: MAPREDUCE-5446.patch


 TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for 
 MRApp to finish.  Currently they are polling the service state looking for 
 Service.STATE.STOPPED, but the service can appear to be in that state 
 *before* it is fully stopped.  This causes tests to finish with MRApp threads 
 still in-flight, and those threads can conflict with subsequent tests when 
 they collide in the filesystem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (MAPREDUCE-3894) 0.23 and trunk MR builds fail intermittently

2013-07-31 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved MAPREDUCE-3894.
---

Resolution: Fixed

 0.23 and trunk MR builds fail intermittently
 

 Key: MAPREDUCE-3894
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3894
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.24.0, 0.23.2
Reporter: Kihwal Lee

 The builds occasionally report ABORTED or FAILURE, which is not caused by the 
 new code changes included in the builds. We are not sure how long they have 
 been broken this way, but Bobby's guess is around Feb 10.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2013-07-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720137#comment-13720137
 ] 

Kihwal Lee commented on MAPREDUCE-1981:
---

+1 The patch for branch-0.23 looks good too.

 Improve getSplits performance by using listFiles, the new FileSystem API
 

 Key: MAPREDUCE-1981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: mapredListFiles1.patch, mapredListFiles2.patch, 
 mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, 
 mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch


 This jira will make FileInputFormat and CombineFileInputFormat use the new 
 API, thus reducing the number of RPCs to the HDFS NameNode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2013-07-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716688#comment-13716688
 ] 

Kihwal Lee commented on MAPREDUCE-1981:
---

+1 The patch looks good. I also ran some tests and they worked successfully. 
Thanks for fixing both mapred and mapreduce. 

 Improve getSplits performance by using listFiles, the new FileSystem API
 

 Key: MAPREDUCE-1981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: mapredListFiles1.patch, mapredListFiles2.patch, 
 mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, 
 mapredListFiles.patch, MAPREDUCE-1981.patch


 This jira will make FileInputFormat and CombineFileInputFormat use the new 
 API, thus reducing the number of RPCs to the HDFS NameNode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-04-16 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633470#comment-13633470
 ] 

Kihwal Lee commented on MAPREDUCE-5065:
---

I've committed this to trunk, branch-2 and branch-0.23. 

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: MAPREDUCE-5065.branch-0.23.patch, 
 MAPREDUCE-5065.branch-2.patch


 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.
 Edit: I've modified the fix to warn the user (instead of skipping the 
 checksum-check). Skipping parity-checks is unsafe. The code now fails the 
 copy, and suggests that the user either use -pb to preserve block-size, or 
 consider -skipCrc (and forgo copy validation entirely).
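
 A minimal sketch of the check described above, for illustration only: it is not 
 the actual DistCp code, and the class, method and parameter names are 
 hypothetical (only the FileStatus accessors are real Hadoop API).
 {code}
 import org.apache.hadoop.fs.FileStatus;

 public class ChecksumGuardSketch {
   static void checkBlockSizes(FileStatus src, FileStatus tgt,
       boolean skipCrc, boolean preserveBlockSize) {
     if (skipCrc || src.getLen() == 0) {
       return;  // user opted out, or there is nothing to compare
     }
     if (!preserveBlockSize && src.getBlockSize() != tgt.getBlockSize()) {
       // Checksums are guaranteed to differ, so fail early with guidance
       // instead of reporting a misleading checksum mismatch later.
       throw new RuntimeException("Block sizes differ between source and target;"
           + " re-run with -pb to preserve block size, or use -skipCrc to forgo"
           + " copy validation entirely.");
     }
   }
 }
 {code}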

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-04-16 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-5065:
--

   Resolution: Fixed
Fix Version/s: 0.23.8
   2.0.5-beta
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 3.0.0, 2.0.5-beta, 0.23.8

 Attachments: MAPREDUCE-5065.branch-0.23.patch, 
 MAPREDUCE-5065.branch-2.patch


 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.
 Edit: I've modified the fix to warn the user (instead of skipping the 
 checksum-check). Skipping parity-checks is unsafe. The code now fails the 
 copy, and suggests that the user either use -pb to preserve block-size, or 
 consider -skipCrc (and forgo copy validation entirely).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-04-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626624#comment-13626624
 ] 

Kihwal Lee commented on MAPREDUCE-5065:
---

The patch looks good to me. [~cutting], are you okay with the change?

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: MAPREDUCE-5065.branch-0.23.patch, 
 MAPREDUCE-5065.branch-2.patch


 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.
 Edit: I've modified the fix to warn the user (instead of skipping the 
 checksum-check). Skipping parity-checks is unsafe. The code now fails the 
 copy, and suggests that the user either use -pb to preserve block-size, or 
 consider -skipCrc (and forgo copy validation entirely).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-03-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605626#comment-13605626
 ] 

Kihwal Lee commented on MAPREDUCE-5065:
---

Review comments:
* Add a reasonable timeout to the test case (see the sketch after this list). 
This is a relatively new rule, and it applies even when you are modifying 
existing test cases. Please take into account that tests may run on slower hardware.
* If we suggest -skipCrc along with -pb, we should probably inform users of the 
risk of skipping validation.
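
A minimal example of the kind of timeout being asked for, using the JUnit 4 
annotation; the class name, method name and timeout value here are illustrative 
placeholders, not part of the attached patch.
{code}
import org.junit.Test;

public class TestDistCpBlockSizeCheck {  // hypothetical test class name
  // A generous timeout so the test still passes on slower hardware.
  @Test(timeout = 120000)
  public void testCopyFailsOnDifferentBlockSizes() throws Exception {
    // ... test body as in the attached patch ...
  }
}
{code}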

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: MAPREDUCE-5065.branch-0.23.patch, 
 MAPREDUCE-5065.branch-2.patch


 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-03-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603493#comment-13603493
 ] 

Kihwal Lee commented on MAPREDUCE-5065:
---

bq. Another option might be to implement a checksum that's 
blocksize-independent...

Reading whole metadata may be too much, especially for huge files. It will be 
better if we make computation happen where the data is. :)
 
Most hashing is incremental, so if DFSClient feeds the last hash state into 
the next datanode and lets it continue updating it, the result will be 
independent of block size. The current way of computing the file checksum allows 
calculating individual block checksums in parallel, but we are not taking 
advantage of that in DFSClient anyway. So I don't think there won't be any 
significant changes in performance or overhead.

We should probably continue this discussion in a separate jira.
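
To make the idea concrete, here is a minimal sketch using the plain JDK 
MessageDigest (illustrative only, not the DFSClient or datanode code): feeding 
the same bytes through a single digest in differently sized chunks yields the 
same final hash, which is exactly the block-size independence described above.
{code}
import java.security.MessageDigest;
import java.util.Arrays;

public class IncrementalHashSketch {
  // The digest state is carried from "block" to "block", so the result does
  // not depend on where the block boundaries fall.
  static byte[] hashInChunks(byte[] data, int chunkSize) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    for (int off = 0; off < data.length; off += chunkSize) {
      md.update(data, off, Math.min(chunkSize, data.length - off));
    }
    return md.digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] data = new byte[1 << 20];
    // Prints true: same digest regardless of the chunk ("block") size used.
    System.out.println(Arrays.equals(
        hashInChunks(data, 64 * 1024), hashInChunks(data, 256 * 1024)));
  }
}
{code}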

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-03-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603494#comment-13603494
 ] 

Kihwal Lee commented on MAPREDUCE-5065:
---

bq. So I don't think there won't be any significant changes in performance or 
overhead.
Sorry, unintended double negation.

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.

2013-03-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603502#comment-13603502
 ] 

Kihwal Lee commented on MAPREDUCE-5065:
---

Filed HDFS-4605 for block-size independent FileChecksum in HDFS.

 DistCp should skip checksum comparisons if block-sizes are different on 
 source/target.
 --

 Key: MAPREDUCE-5065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 When copying files between 2 clusters with different default block-sizes, one 
 sees that the copy fails with a checksum-mismatch, even though the files have 
 identical contents.
 The reason is that on HDFS, a file's checksum is unfortunately a function of 
 the block-size of the file. So you could have 2 different files with 
 identical contents (but different block-sizes) have different checksums. 
 (Thus, it's also possible for DistCp to fail to copy files on the same 
 file-system, if the source-file's block-size differs from HDFS default, and 
 -pb isn't used.)
 I propose that we skip checksum comparisons under the following conditions:
 1. -skipCrc is specified.
 2. File-size is 0 (in which case the call to the checksum-servlet is moot).
 3. source.getBlockSize() != target.getBlockSize(), since the checksums are 
 guaranteed to differ in this case.
 I have a patch for #3.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2013-01-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565758#comment-13565758
 ] 

Kihwal Lee commented on MAPREDUCE-1700:
---

Merged to branch-0.23.

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700-ccl.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2013-01-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547831#comment-13547831
 ] 

Kihwal Lee commented on MAPREDUCE-1700:
---

+1 The patch looks good. I hope people try this with many different use cases.

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700-ccl.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-12-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526775#comment-13526775
 ] 

Kihwal Lee commented on MAPREDUCE-1700:
---

{quote}
bq. Tom, one thing I've forgot to mention in my previous comment, we should see 
how to enable the classloader on the client side as well as it may be required 
(to use different JARs) for the submission code.

I think this is a slightly different problem, since users generally have more 
control over the JVM they submit from than the JVM the task runs in. So, yes, 
another JIRA would be appropriate.
{quote}

I think AM also runs user code, if a custom output format is defined.

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700-ccl.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs

2012-12-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509888#comment-13509888
 ] 

Kihwal Lee commented on MAPREDUCE-1700:
---

Now that we have a much better way of dealing with dependency conflicts, what 
will be the fate of the mapreduce.job.user.classpath.first feature? Is there any 
use case where this feature works but the CCL approach doesn't, or where it is 
somehow preferred over CCL for some reason? If none, shall we deprecate it?

 User supplied dependencies may conflict with MapReduce system JARs
 --

 Key: MAPREDUCE-1700
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700-ccl.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, 
 MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch


 If user code has a dependency on a version of a JAR that is different to the 
 one that happens to be used by Hadoop, then it may not work correctly. This 
 happened with user code using a different version of Avro, as reported 
 [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081].
 The problem is analogous to the one that application servers have with WAR 
 loading. Using a specialized classloader in the Child JVM is probably the way 
 to solve this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4451) fairscheduler fail to init job with kerberos authentication configured

2012-09-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463881#comment-13463881
 ] 

Kihwal Lee commented on MAPREDUCE-4451:
---

Erik, you can run src/test/bin/test-patch.sh manually and post the result.

 fairscheduler fail to init job with kerberos authentication configured
 --

 Key: MAPREDUCE-4451
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4451
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 1.0.3
Reporter: Erik.fang
 Attachments: MAPREDUCE-4451_branch-1.patch, 
 MAPREDUCE-4451_branch-1.patch, MAPREDUCE-4451_branch-1.patch, 
 MAPREDUCE-4451_branch-1.patch, MAPREDUCE-4451_branch-1.patch


 Using FairScheduler in Hadoop 1.0.3 with kerberos authentication configured. 
 Job initialization fails:
 {code}
 2012-07-17 15:15:09,220 ERROR org.apache.hadoop.mapred.JobTracker: Job 
 initialization failed:
 java.io.IOException: Call to /192.168.7.80:8020 failed on local exception: 
 java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed 
 [Caused by GSSException: No valid credentials provided (Mechanism level: 
 Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1129)
 at org.apache.hadoop.ipc.Client.call(Client.java:1097)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
 at $Proxy7.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
 at 
 org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:125)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:329)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:294)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
 at 
 org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1411)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at 
 org.apache.hadoop.security.Credentials.writeTokenStorageFile(Credentials.java:169)
 at 
 org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3558)
 at 
 org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:696)
 at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3911)
 at 
 org.apache.hadoop.mapred.FairScheduler$JobInitializer$InitJob.run(FairScheduler.java:301)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:543)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:488)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:590)
 at 
 org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:187)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1228)
 at org.apache.hadoop.ipc.Client.call(Client.java:1072)
 ... 20 more
 Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]
 at 
 com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
 at 
 org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:134)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:385)
 at 
 org.apache.hadoop.ipc.Client$Connection.access$1200(Client.java:187)
 at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:583)
 at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:580)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 

[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461918#comment-13461918
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---

Hemanth,
As you pointed out, the core thread timeout only takes effect when there is no 
work (i.e. no job completion). I took the heap dump of JT to see the overhead 
of these extra threads. In terms of memory, counting HashMapEntry, Thread, 
Worker and thread local objects that are unique to a thread, it is well under 
1KB per thread. I suspect stack and other supporting system data structures 
take more memory. So, having them around all the time doesn't seem to add much 
resource overhead.  Probably starting and stopping them frequently will create 
more overhead. 

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch, 
 mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 
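
 A small standalone demonstration of the behavior described above (plain JDK, 
 not Hadoop code): with an unbounded work queue, the pool never grows past its 
 core size of 1, no matter how many tasks are queued.
 {code}
 import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.ThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;

 public class PoolGrowthDemo {
   public static void main(String[] args) throws Exception {
     ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 3, 1,
         TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
     for (int i = 0; i < 100; i++) {
       executor.execute(new Runnable() {
         public void run() {
           try { Thread.sleep(100); } catch (InterruptedException ignored) { }
         }
       });
     }
     // Despite 100 queued tasks, only the single core thread is ever created.
     System.out.println("pool size = " + executor.getPoolSize());
     executor.shutdown();
   }
 }
 {code}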

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461978#comment-13461978
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---

I've also verified that all worker threads for the pool are exiting after 
the timeout.

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch, 
 mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459713#comment-13459713
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---

bq. One solution is to specify maximum number of queued requests for 
LinkedBlockingQueue.

That could be it, but this solution needs more changes. When the queue is full 
and the max number of threads are running, new tasks will be rejected. We could 
apply CallerRunsPolicy, but the whole point of having a ThreadPoolExecutor is to 
avoid blocking the JobTracker while it finalizes job completion.

I think the main requirements here are:
* Absorb bursty job completions - queueing with sufficient capacity or fast 
dispatching with a large thread pool.
* Avoid limiting job throughput - a sufficient number of worker threads
* Minimize consumption of extra resources - limit the number of worker threads
* Don't drop anything.

To satisfy the first and second requirements, one can think of the following 
two approaches.

* Have a bounded queue and a sufficiently large thread pool. Since we cannot 
drop any job completion, we want CallerRunsPolicy for rejected ones. 

* Alternatively, use an unbounded queue and a reasonable number of core 
threads. No work will be rejected in this case.

Between the two, the second one has an advantage, considering the third 
requirement and its simplicity. The question is, what is a reasonable number 
of core threads to avoid lagging behind forever? Based on our experience, 3 to 5 
seems reasonable. The moveToDone() throughput varies a lot, but it topped out at 
around 0.8/second in one of the busiest clusters I've seen. If the job completion 
rate stays above this for a long time, the queue will grow and history won't 
show up for most newer jobs.

Here are the two approaches in code:

* The queue is bounded but will absorb bursts of about 100. If the core thread 
cannot keep up, up to 9 additional threads (a max pool size of 10) will be 
created to help drain the queue. If the queue still cannot be drained fast 
enough, the caller will execute the work directly. This will block the 
JobTracker, since JobTracker#finalizeJob() is a synchronized method. So the 
thread pool size and the queue size must be sufficiently large.

{noformat}
 executor = new ThreadPoolExecutor(1, 10, 1, TimeUnit.HOURS,
     new LinkedBlockingQueue<Runnable>(100),
     new ThreadPoolExecutor.CallerRunsPolicy());
{noformat}


* The following will eventually start up 5 threads and keep them running. 
Non-blocking, and the least amount of change.

{noformat}
 executor = new ThreadPoolExecutor(5, 5, 1, TimeUnit.HOURS,
     new LinkedBlockingQueue<Runnable>());
{noformat}

What do you think is better? Or can you think of any better approaches?

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves

 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459892#comment-13459892
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---

In the second approach, we can also add {{executor.allowsCoreThreadTimeOut()}} 
to make core threads expire after the keepalive time. I think this will be very 
close to the original design intention.

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves

 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned MAPREDUCE-4662:
-

Assignee: Kihwal Lee

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4662:
--

Attachment: mapreduce-4662.branch-1.patch

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4662:
--

Status: Patch Available  (was: Open)

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459902#comment-13459902
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---

My bad. It should be {{executor.allowCoreThreadTimeOut(true)}}.
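
Putting that correction together with the second approach above, a minimal 
sketch of the intended configuration (illustrative only, not the attached patch):
{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HistoryPoolSketch {
  static ThreadPoolExecutor newExecutor() {
    // Five core threads drain the unbounded queue; with core-thread timeout
    // enabled, they exit after an hour without work and are recreated on demand.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 5, 1,
        TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
    executor.allowCoreThreadTimeOut(true);
    return executor;
  }
}
{code}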

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch, 
 mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4662:
--

Attachment: mapreduce-4662.branch-1.patch

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch, 
 mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands

2012-09-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459966#comment-13459966
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---

Manually ran test-patch against branch-1.

{noformat}
-1 overall.  

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) 
warnings.
{noformat}

* A test is not included since it is hard to get at the private variables of 
JobHistoryFilesManager and the ThreadPoolExecutor inside it.
* findbugs warnings: there are actually no new warnings. The numbers before and 
after the patch are identical.

 JobHistoryFilesManager thread pool never expands
 

 Key: MAPREDUCE-4662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 1.0.2
Reporter: Thomas Graves
Assignee: Kihwal Lee
 Attachments: mapreduce-4662.branch-1.patch, 
 mapreduce-4662.branch-1.patch


 The job history file manager creates a thread pool with a core size of 1 thread 
 and a max pool size of 3. It never goes beyond 1 thread, though, because it is 
 using a LinkedBlockingQueue, which doesn't have a max size. 
 void start() {
   executor = new ThreadPoolExecutor(1, 3, 1,
   TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 }
 According to the ThreadPoolExecutor javadoc, it only increases the 
 number of threads when the queue is full. Since the queue we are using has no 
 max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4467) IndexCache failures due to missing synchronization

2012-07-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4467:
--

Attachment: mapreduce-4467.patch.txt

The new patch addresses Tom's comment.

 IndexCache failures due to missing synchronization
 --

 Key: MAPREDUCE-4467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.2
Reporter: Andrey Klochkov
Assignee: Kihwal Lee
Priority: Critical
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: mapreduce-4467.patch.txt, mapreduce-4467.patch.txt


 TestMRJobs.testSleepJob fails randomly due to synchronization error in 
 IndexCache:
 {code}
 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] 
 mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle 
 error: 
 java.lang.IllegalMonitorStateException
   at java.lang.Object.wait(Native Method)
   at 
 org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397)
   at 
 org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
   at 
 org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
   at 
 org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 A related issue is MAPREDUCE-4384. The change introduced there removed the 
 synchronized keyword, and hence the info.wait() call fails. This needs to be 
 wrapped in a synchronized block.
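
 The idiom in isolation, as a generic sketch (not the actual IndexCache code; 
 the Info class and its flag are hypothetical stand-ins): Object.wait() must be 
 called while holding the monitor of the object being waited on, otherwise 
 IllegalMonitorStateException is thrown.
 {code}
 class WaitSketch {
   static class Info {
     boolean ready;  // hypothetical stand-in for the cached index state
   }

   static void waitUntilReady(Info info) throws InterruptedException {
     synchronized (info) {          // must hold the monitor before calling wait()
       while (!info.ready) {        // recheck the condition after every wake-up
         info.wait();
       }
     }
   }

   static void markReady(Info info) {
     synchronized (info) {
       info.ready = true;
       info.notifyAll();            // wake any threads blocked in waitUntilReady
     }
   }
 }
 {code}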

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile

2012-07-23 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-4470:
-

 Summary: Fix TestCombineFileInputFormat.testForEmptyFile
 Key: MAPREDUCE-4470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.1.0-alpha, 3.0.0


TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. 

It expects one split on an empty input file, but with HADOOP-8599 it gets zero. 
The new behavior seems correct, but is it breaking anything else?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4427) Enable the RM to work with AM's that are not managed by it

2012-07-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420993#comment-13420993
 ] 

Kihwal Lee commented on MAPREDUCE-4427:
---

TestClientRMService.testGetQueueInfo has been consistently failing since 
MAPREDUCE-4427.
MAPREDUCE-4471 has been filed.

 Enable the RM to work with AM's that are not managed by it
 --

 Key: MAPREDUCE-4427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4427
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha
  Labels: mrv2
 Fix For: 2.1.0-alpha

 Attachments: MAPREDUCE-4427-1.patch, MAPREDUCE-4427-2.patch, 
 MAPREDUCE-4427-3.patch


 Currently, the RM itself manages the AM by allocating a container for it and 
 negotiating the launch on the NodeManager and manages the AM lifecycle. 
 Thereafter, the AM negotiates resources with the RM and launches tasks to do 
 the real work.
 It would be a useful improvement to enhance this model by allowing the AM to 
 be launched independently by the client without requiring the RM. These AM's 
 would be launched on a gateway machine that can talk to the cluster. This 
 would open up new use cases such as the following
 1) Easy debugging of the AM, especially during initial development. Having the AM 
 launched on an arbitrary cluster node makes it hard to look at logs or 
 attach a debugger to the AM. If it can be launched locally, then these tasks 
 would be easier.
 2) Running AM's that need special privileges that may not be available on 
 machines managed by the NodeManager

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4471) TestClientRMService.testGetQueueInfo failing after MR-4427

2012-07-23 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-4471:
-

 Summary: TestClientRMService.testGetQueueInfo failing after MR-4427
 Key: MAPREDUCE-4471
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4471
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.1.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.1.0-alpha, head


TestClientRMService.testGetQueueInfo has been consistently failing since 
MAPREDUCE-4427.

{noformat}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:407)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:393)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetQueueInfo(TestClientRMService.java:138)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4471) TestClientRMService.testGetQueueInfo failing after MR-4427

2012-07-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420995#comment-13420995
 ] 

Kihwal Lee commented on MAPREDUCE-4471:
---

Apparently MAPREDUCE-4440 fixed it.

 TestClientRMService.testGetQueueInfo failing after MR-4427
 --

 Key: MAPREDUCE-4471
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4471
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.1.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.1.0-alpha, head


 TestClientRMService.testGetQueueInfo has been consistently failing since 
 MAPREDUCE-4427.
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:407)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:393)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetQueueInfo(TestClientRMService.java:138)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-4471) TestClientRMService.testGetQueueInfo failing after MR-4427

2012-07-23 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved MAPREDUCE-4471.
---

Resolution: Invalid

 TestClientRMService.testGetQueueInfo failing after MR-4427
 --

 Key: MAPREDUCE-4471
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4471
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.1.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.1.0-alpha, head


 TestClientRMService.testGetQueueInfo has been consistently failing since 
 MAPREDUCE-4427.
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:407)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:393)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetQueueInfo(TestClientRMService.java:138)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4427) Enable the RM to work with AM's that are not managed by it

2012-07-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420996#comment-13420996
 ] 

Kihwal Lee commented on MAPREDUCE-4427:
---

Nevermind. Arun fixed it in MAPREDUCE-4440.

 Enable the RM to work with AM's that are not managed by it
 --

 Key: MAPREDUCE-4427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4427
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha
  Labels: mrv2
 Fix For: 2.1.0-alpha

 Attachments: MAPREDUCE-4427-1.patch, MAPREDUCE-4427-2.patch, 
 MAPREDUCE-4427-3.patch


 Currently, the RM itself manages the AM by allocating a container for it and 
 negotiating the launch on the NodeManager and manages the AM lifecycle. 
 Thereafter, the AM negotiates resources with the RM and launches tasks to do 
 the real work.
 It would be a useful improvement to enhance this model by allowing the AM to 
 be launched independently by the client without requiring the RM. These AM's 
 would be launched on a gateway machine that can talk to the cluster. This 
 would open up new use cases such as the following
 1) Easy debugging of AM, specially during initial development. Having the AM 
 launched on an arbitrary cluster node makes it hard to looks at logs or 
 attach a debugger to the AM. If it can be launched locally then these tasks 
 would be easier.
 2) Running AM's that need special privileges that may not be available on 
 machines managed by the NodeManager

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4467) IndexCache failures due to missing synchronization

2012-07-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4467:
--

Attachment: mapreduce-4467.patch.txt

Sorry about the bug. Patch attached.
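
For reference, the shape of the fix is to take the object's monitor before 
calling {{Object#wait()}}; calling wait() without holding the lock is exactly 
what produces the IllegalMonitorStateException in the trace below. The sketch 
that follows is only a stripped-down illustration of that pattern (the 
{{IndexInformation}} stand-in and method names are simplified assumptions, not 
the attached patch):

{code}
// Minimal wait/notify sketch: Object.wait() may only be called while holding
// the object's monitor, so the wait is wrapped in a synchronized block.
public class IndexWaitSketch {

  static class IndexInformation {
    volatile boolean loaded;   // set once the index file has been read
  }

  static IndexInformation getIndexInformation(IndexInformation info)
      throws InterruptedException {
    synchronized (info) {        // take the monitor first
      while (!info.loaded) {     // loop guards against spurious wakeups
        info.wait();             // legal now that we hold info's lock
      }
    }
    return info;
  }

  static void markLoaded(IndexInformation info) {
    synchronized (info) {
      info.loaded = true;
      info.notifyAll();          // wake any reader blocked in wait()
    }
  }
}
{code}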

 IndexCache failures due to missing synchronization
 --

 Key: MAPREDUCE-4467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.2
Reporter: Andrey Klochkov
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: mapreduce-4467.patch.txt


 TestMRJobs.testSleepJob fails randomly due to synchronization error in 
 IndexCache:
 {code}
 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] 
 mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle 
 error: 
 java.lang.IllegalMonitorStateException
   at java.lang.Object.wait(Native Method)
   at 
 org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397)
   at 
 org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
   at 
 org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
   at 
 org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 A related issue is MAPREDUCE-4384. The change introduced there removed the 
 synchronized keyword and hence the info.wait() call fails. This needs to be 
 wrapped in a synchronized block.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (MAPREDUCE-4467) IndexCache failures due to missing synchronization

2012-07-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned MAPREDUCE-4467:
-

Assignee: Kihwal Lee

 IndexCache failures due to missing synchronization
 --

 Key: MAPREDUCE-4467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.2
Reporter: Andrey Klochkov
Assignee: Kihwal Lee
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: mapreduce-4467.patch.txt


 TestMRJobs.testSleepJob fails randomly due to synchronization error in 
 IndexCache:
 {code}
 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] 
 mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle 
 error: 
 java.lang.IllegalMonitorStateException
   at java.lang.Object.wait(Native Method)
   at 
 org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397)
   at 
 org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
   at 
 org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
   at 
 org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 A related issue is MAPREDUCE-4384. The change introduced there removed the 
 synchronized keyword and hence the info.wait() call fails. This needs to be 
 wrapped in a synchronized block.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4467) IndexCache failures due to missing synchronization

2012-07-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4467:
--

Fix Version/s: 2.2.0-alpha
   3.0.0
   0.23.3
   Status: Patch Available  (was: Open)

 IndexCache failures due to missing synchronization
 --

 Key: MAPREDUCE-4467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.2
Reporter: Andrey Klochkov
Assignee: Kihwal Lee
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: mapreduce-4467.patch.txt


 TestMRJobs.testSleepJob fails randomly due to synchronization error in 
 IndexCache:
 {code}
 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] 
 mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle 
 error: 
 java.lang.IllegalMonitorStateException
   at java.lang.Object.wait(Native Method)
   at 
 org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397)
   at 
 org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
   at 
 org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
   at 
 org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 A related issue is MAPREDUCE-4384. The change introduced there removed the 
 synchronized keyword and hence the info.wait() call fails. This needs to be 
 wrapped in a synchronized block.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4467) IndexCache failures due to missing synchronization

2012-07-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419527#comment-13419527
 ] 

Kihwal Lee commented on MAPREDUCE-4467:
---

Additional test not needed. Existing test case detected the breakage.

 IndexCache failures due to missing synchronization
 --

 Key: MAPREDUCE-4467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.2
Reporter: Andrey Klochkov
Assignee: Kihwal Lee
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: mapreduce-4467.patch.txt


 TestMRJobs.testSleepJob fails randomly due to synchronization error in 
 IndexCache:
 {code}
 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] 
 mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle 
 error: 
 java.lang.IllegalMonitorStateException
   at java.lang.Object.wait(Native Method)
   at 
 org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471)
   at 
 org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397)
   at 
 org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
   at 
 org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
   at 
 org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
   at 
 org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 A related issue is MAPREDUCE-4384. The change introduced there removed the 
 synchronized keyword and hence the info.wait() call fails. This needs to be 
 wrapped in a synchronized block.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-12 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Attachment: mapreduce-4416.patch.txt

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4416.patch.txt


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 I thought MAPREDUCE-4253 had something to do with this, but I can reproduce 
 the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have 
 a problem and it has been reported to the jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-12 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Status: Patch Available  (was: Open)

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4416.patch.txt


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 I thought MAPREDUCE-4253 had something to do with this, but I can reproduce 
 the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have 
 a problem and it has been reported to the jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-12 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Description: 
There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

I thought MAPREDUCE-4253 had something to do with this, but I can reproduce the 
issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a 
problem and it has been reported to the jira.

  was:
There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

I thought MAPREDUCE-4253 had something to do this, but I can reproduce the 
issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a 
problem and it has been reported to the jira.


 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4416.patch.txt


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 I thought MAPREDUCE-4253 had something to do with this, but I can reproduce 
 the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have 
 a problem and it has been reported to the jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-12 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Attachment: mapreduce-4416.patch.txt

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4416.patch.txt, mapreduce-4416.patch.txt


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 I thought MAPREDUCE-4253 had something to do with this, but I can reproduce 
 the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have 
 a problem and it has been reported to the jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS

2012-07-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413294#comment-13413294
 ] 

Kihwal Lee commented on MAPREDUCE-4393:
---

I think use of ZK is fine since it won't be pretty for routers to poll status 
from the RM (to get the list of AMs) and the AMs (to get updates on app instances). 
Multiple AMs can run on the same node, so a predefined port number cannot be 
used, which means there has to be a way to discover the port number. Having ZK 
in the picture certainly helps.

But depending on the requirements on the router, all external dependencies (router 
and zk) could be substituted with another YARN app! A PaaS system app? If we do 
this, the PaaS app can be made to talk to either type of management system.
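
To make the port-discovery point concrete, here is a minimal sketch of how an 
AM could advertise its host:port in ZooKeeper with an ephemeral znode so that 
routers can watch the live set of AMs. The {{/paas/ams}} namespace and method 
names are made up for illustration; this is not code from the attached patch:

{code}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class AmRegistration {

  // Returns the session; it must stay open for as long as the AM is alive,
  // because the ephemeral znode is removed when the session expires.
  public static ZooKeeper registerAm(String zkQuorum, String appId,
      String hostPort) throws Exception {
    ZooKeeper zk = new ZooKeeper(zkQuorum, 30000, event -> { });
    ensureExists(zk, "/paas");
    ensureExists(zk, "/paas/ams");
    // Routers watching /paas/ams always see the live set of AMs and can read
    // each AM's host:port from the znode data.
    zk.create("/paas/ams/" + appId, hostPort.getBytes(StandardCharsets.UTF_8),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    return zk;
  }

  private static void ensureExists(ZooKeeper zk, String path) throws Exception {
    try {
      zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException ignore) {
      // parent already created by another AM
    }
  }
}
{code}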

 PaaS on YARN: an YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: MAPREDUCE-4393
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: examples
Affects Versions: 0.23.1
Reporter: Jaigak Song
Assignee: Jaigak Song
 Fix For: 3.0.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 This application is to demonstrate that YARN can be used for non-mapreduce 
 applications. As Hadoop has already been adopted and deployed widely and its 
 deployment in future will be highly increased, we thought that it's a good 
 potential to be used as PaaS.  
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS

2012-07-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413337#comment-13413337
 ] 

Kihwal Lee commented on MAPREDUCE-4393:
---

I didn't mean that the manager AM is responsible for launching app AMs. I think 
it can be a separate YARN app. There doesn't even have to be any start-up 
dependency among them if we design the communication protocol well. This also 
makes restart easy.

If we can (re)launch the manager AM on one of a predefined set of hosts, most 
of the requirements can be met. By storing system state in HDFS and reading it 
back on restart, it can get back in sync quickly and offer service again. 
Routers can be provisioned similarly, but they will acquire state information 
from the manager AM. Service discovery is simplified by the fact that they 
will be on specific hosts. If a VIP is used to deal with service up/down or 
migration among the given set of hosts, service discovery is simplified further. 
Since they are independent app instances or independent YARN apps, a 
crash/restart of one component won't force termination of the others.

The one thing I am not sure about is the ability to specify a specific set of 
candidate hosts for launching the AM. If that is not supported already, we can 
launch the AM on a random host and then launch containers on a specific set of 
hosts, but that lowers reliability. Or maybe the AM can be anywhere and the 
container launched from it will only be used for service discovery.

I am not insisting on doing this now, but it would be nice if everything were 
contained in YARN so that setup is simpler and it is easily demoable.

 PaaS on YARN: an YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: MAPREDUCE-4393
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: examples
Affects Versions: 0.23.1
Reporter: Jaigak Song
Assignee: Jaigak Song
 Fix For: 3.0.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 This application is to demonstrate that YARN can be used for non-mapreduce 
 applications. As Hadoop has already been adopted and deployed widely and its 
 deployment in future will be highly increased, we thought that it's a good 
 potential to be used as PaaS.  
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4253) Tests for mapreduce-client-core are lying under mapreduce-client-jobclient

2012-07-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411316#comment-13411316
 ] 

Kihwal Lee commented on MAPREDUCE-4253:
---

Harsh,
I noticed that your {{svn mv}} script actually moved only 19 files out of 31. I 
think the moves into non-existent directories failed. 

 Tests for mapreduce-client-core are lying under mapreduce-client-jobclient
 --

 Key: MAPREDUCE-4253
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4253
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
 Fix For: 2.0.1-alpha

 Attachments: MR-4253.1.patch, MR-4253.2.patch, 
 crossing_project_checker.rb, result.txt


 Many of the tests for client libs from mapreduce-client-core are lying under 
 mapreduce-client-jobclient.
 We should investigate if this is the right thing to do and if not, move the 
 tests back into client-core.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411354#comment-13411354
 ] 

Kihwal Lee commented on MAPREDUCE-4416:
---

The failing tests use MiniMRCluster and submit jobs. The jobs fail because the 
containers' classpath does not contain the clover jar. If I make the leaf 
project pick up the non-clovered mr-client-app, at least the AM works. But custom 
mappers, reducers, etc. defined in mr-client-jobclient will be instrumented and 
become part of the code running inside the containers, so the containers must be 
able to locate the clover jar.

We have a record of these tests working with clover at least on June 24. So I 
went back and tried the old revision but it didn't work this time...  I wonder 
how it ever worked. 

Before MAPREDUCE-4082, it seems the classpath in mr-client-app contained the 
clover jar. The jira comments also show clover being in the generated classpath. 
The now-problematic clovered tests might have worked okay back then. Some tests 
were also being ignored.

MAPREDUCE-4141 removed the hard dependency on clover. If these tests 
accidentally worked before, that change might have stopped them from working. 

Maybe running clovered test code in YARN containers does not make sense. They 
are separate processes launched by something other than the test framework, and 
the clover instrumentation doesn't seem to be designed to cover them naturally. 
We could exclude some of the test helper classes from instrumentation.

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 It seems this started happening after MAPREDUCE-4253.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411372#comment-13411372
 ] 

Kihwal Lee commented on MAPREDUCE-4416:
---

The AM as well as the mapper/reducer fails if tests (e.g. {{TestChild}}) are run 
normally with {{-Pclover}}.

{noformat}
[CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have 
Clover in the runtime classpath? (class 
java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
Exception in thread main java.lang.NoClassDefFoundError: 
com_cenqua_clover/CoverageRecorder
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1017)
{noformat}

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 It seems this started happening after MAPREDUCE-4253.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-11 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Description: 
There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

I thought MAPREDUCE-4253 had something to do this, but I can reproduce the 
issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a 
problem and it has been reported to the jira.

  was:
There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

It seems this started happening after MAPREDUCE-4253.


 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 I thought MAPREDUCE-4253 had something to do this, but I can reproduce the 
 issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a 
 problem and it has been reported to the jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4253) Tests for mapreduce-client-core are lying under mapreduce-client-jobclient

2012-07-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411497#comment-13411497
 ] 

Kihwal Lee commented on MAPREDUCE-4253:
---

Maybe I am doing something wrong, but I still see only 19 files moved in both 
branch-2.0 and trunk from the revision history. The postings by Jenkins builds 
above also show only 19 files.

 Tests for mapreduce-client-core are lying under mapreduce-client-jobclient
 --

 Key: MAPREDUCE-4253
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4253
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
 Fix For: 2.0.1-alpha

 Attachments: MR-4253.1.patch, MR-4253.2.patch, 
 crossing_project_checker.rb, result.txt


 Many of the tests for client libs from mapreduce-client-core are lying under 
 mapreduce-client-jobclient.
 We should investigate if this is the right thing to do and if not, move the 
 tests back into client-core.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS

2012-07-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411911#comment-13411911
 ] 

Kihwal Lee commented on MAPREDUCE-4393:
---

You have probably done it already, but the first thing to make sure is that 
everything builds okay for all targets and profiles, e.g. build and run the 
tests with clover (-Pclover). The test-patch process is most useful when existing 
code is modified, so in your case it would be nice if you could report more 
testing results.

People will also like to hear about your experience writing a new YARN app. 
There is ongoing work to make it easier to develop and debug apps, and I am sure 
those efforts will benefit from your input.

 PaaS on YARN: an YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: MAPREDUCE-4393
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: examples
Affects Versions: 0.23.1
Reporter: Jaigak Song
Assignee: Jaigak Song
 Fix For: 3.0.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 This application is to demonstrate that YARN can be used for non-mapreduce 
 applications. As Hadoop has already been adopted and deployed widely and its 
 deployment in future will be highly increased, we thought that it's a good 
 potential to be used as PaaS.  
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412381#comment-13412381
 ] 

Kihwal Lee commented on MAPREDUCE-4416:
---

Patrick, thanks for the comment. That might work, but the fact that it's not 
really an app-level dependency bothers me. So I ended up adding the dependency 
inside the clover profile in hadoop-project/pom.xml. This causes each module to 
have the clover jar as a dependency when -Pclover is specified, and all 
mrapp-generated-classpath files will then include the path to the clover jar. The 
package build will copy and include the clover jar, but we can't really use 
instrumented packages anyway.
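
For context, this is roughly how a build-time generated classpath file such as 
{{mrapp-generated-classpath}} can be consumed when assembling a container's 
CLASSPATH; the resource name is taken from above, but the helper below is only 
an illustrative sketch under that assumption, not the actual MRApps code path:

{code}
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class GeneratedClasspathSketch {

  // Hypothetical helper: append each non-empty line of the generated file
  // (entries may themselves be ':'-separated) to the base CLASSPATH value.
  public static String appendGeneratedClasspath(String baseClasspath)
      throws Exception {
    try (InputStream in = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream("mrapp-generated-classpath")) {
      if (in == null) {
        return baseClasspath;   // nothing generated, e.g. a non-clover build
      }
      StringBuilder cp = new StringBuilder(baseClasspath);
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
      String line;
      while ((line = reader.readLine()) != null) {
        if (!line.trim().isEmpty()) {
          cp.append(':').append(line.trim());
        }
      }
      return cp.toString();
    }
  }
}
{code}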

As an alternative to globally adding the dependency, we can do it per module 
whenever necessary. At least the following two need the dependency specified:
- hadoop-yarn-applications
- hadoop-mapreduce-client-jobclient


 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 I thought MAPREDUCE-4253 had something to do this, but I can reproduce the 
 issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a 
 problem and it has been reported to the jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars

2012-07-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13410390#comment-13410390
 ] 

Kihwal Lee commented on MAPREDUCE-4421:
---

bq. How do we handle the native lib dependencies though? Do we ship that as 
well, or keep a static resource at the NM?

There are two parts to this issue:

# Specifying dependencies: App-level dependencies on native libs should be 
specified within the app, not by YARN. Apps can also allow each job to specify 
additional dependencies. A proper merge of LD_LIBRARY_PATH from the job, the app 
and hadoop must be done (this was -Djava.library.path in mrv1; a rough sketch of 
such a merge is at the end of this comment). Who merges what needs to be made 
clear (a section in the app writer's guide?). 
# Making libs available: The simplest way is to let the app ship them for each 
job. But admins may choose to host app-level dependencies (in a multi-version 
aware manner) and even some popular job-level dependencies. YARN should never 
automatically shove everything to apps. There are pros and cons in both 
approaches. A well-designed app will support both.

YARN should not remove or override legitimate app/job-level dependencies. I 
think YARN already satisfies this, but there might be some areas that need 
improvement. We should also provide a clear guide on how to manage dependencies 
for app writers and admins.

This is quite similar to jar dependency management (this jira) in principle. 
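
As an illustration of the merge in point 1 above, a rough sketch could look like 
the following; the precedence order (job over app over framework) and the class 
and method names are assumptions for the example, not a spec:

{code}
import java.util.LinkedHashSet;
import java.util.Set;

public class LibraryPathMerger {

  // Merge LD_LIBRARY_PATH fragments so that job-level entries take precedence
  // over app-level entries, which in turn take precedence over the framework
  // defaults. Duplicates keep their first (highest-priority) position.
  public static String merge(String jobPath, String appPath, String frameworkPath) {
    Set<String> merged = new LinkedHashSet<>();
    for (String fragment : new String[] { jobPath, appPath, frameworkPath }) {
      if (fragment == null || fragment.isEmpty()) {
        continue;
      }
      for (String entry : fragment.split(":")) {
        if (!entry.isEmpty()) {
          merged.add(entry);
        }
      }
    }
    return String.join(":", merged);
  }
}
{code}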

 Remove dependency on deployed MR jars
 -

 Key: MAPREDUCE-4421
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 Currently MR AM depends on MR jars being deployed on all nodes via implicit 
 dependency on YARN_APPLICATION_CLASSPATH. 
 We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, 
 probably, just rely on adding a shaded MR jar along with job.jar to the 
 dist-cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-10 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Priority: Critical  (was: Major)

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
Priority: Critical
 Fix For: 2.0.1-alpha, 3.0.0


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 It seems this started happening after MAPREDUCE-4253.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4416) Some tests run twice or fail if Clover is enabled

2012-07-09 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-4416:
-

 Summary: Some tests run twice or fail if Clover is enabled
 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


Some tests run twice. E.g. try mvn test -Dtest=TestJobConf. It runs under 
hadoop-mapreduce-client-core and hadoop-mapreduce-client-jobclient.

There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

It seems this started happening after MAPREDUCE-4253.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4416) Some tests run twice or fail if Clover is enabled

2012-07-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409918#comment-13409918
 ] 

Kihwal Lee commented on MAPREDUCE-4416:
---

There are actually two different TestJobConf classes: one in o.a.h.conf and 
another in o.a.h.mapred. It's confusing, but not really a problem.

I had 37 failures/errors in jobclient when Clover is enabled.

{noformat}
Failed tests:   testChild(org.apache.hadoop.mapreduce.TestChild)
  
testDefaultCleanupAndAbort(org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter):
 Job failed!
  
testCustomAbort(org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter): 
Job failed!
  
testCustomCleanup(org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter):
 Job failed!
  testValidProxyUser(org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser)
  testJobSucceed(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs): Job expected 
to succeed failed
  testJobFail(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs)
  testSleepJob(org.apache.hadoop.mapreduce.v2.TestMRJobs)
  testRandomWriter(org.apache.hadoop.mapreduce.v2.TestMRJobs)
  testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs)
  testSleepJob(org.apache.hadoop.mapreduce.v2.TestUberAM)
  testRandomWriter(org.apache.hadoop.mapreduce.v2.TestUberAM)
  testFailingMapper(org.apache.hadoop.mapreduce.v2.TestUberAM): 
expected:<false> but was:<true>
  
testSpeculativeExecution(org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution)
  testLazyOutput(org.apache.hadoop.mapreduce.TestMapReduceLazyOutput)
  testHeapUsageCounter(org.apache.hadoop.mapred.TestJobCounters): Job 
job_1341837408279_0001 failed!
  testDefaultCleanupAndAbort(org.apache.hadoop.mapred.TestJobCleanup): Done 
file 
/home/y/var/builds/thread2/workspace/Cloud-Hadoop-All-2.0-Component/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/test-job-cleanup/output-0/_SUCCESS
 missing for job job_1341837505379_0001
  testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup): Done file 
/home/y/var/builds/thread2/workspace/Cloud-Hadoop-All-2.0-Component/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/test-job-cleanup/output-1/_SUCCESS
 missing for job job_1341837505379_0002
  testCustomCleanup(org.apache.hadoop.mapred.TestJobCleanup): Done file 
/home/y/var/builds/thread2/workspace/Cloud-Hadoop-All-2.0-Component/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/test-job-cleanup/output-2/_custom_cleanup
 missing for job job_1341837505379_0003
  testTaskTempDir(org.apache.hadoop.mapred.TestMiniMRChildTask)
  testTaskEnv(org.apache.hadoop.mapred.TestMiniMRChildTask): The environment 
checker job failed.
  testTaskOldEnv(org.apache.hadoop.mapred.TestMiniMRChildTask): The environment 
checker job failed.
  testJob(org.apache.hadoop.mapred.TestMiniMRClientCluster)

Tests in error: 
  testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs): 0
  testMR(org.apache.hadoop.mapred.TestClusterMRNotification): Job failed!
  testComplexName(org.apache.hadoop.mapred.TestJobName): Job failed!
  testComplexNameWithRegex(org.apache.hadoop.mapred.TestJobName): Job failed!
  
testReduceFromPartialMem(org.apache.hadoop.mapred.TestReduceFetchFromPartialMem):
 Job failed!
  testClassPath(org.apache.hadoop.mapred.TestMiniMRClasspath): Job failed!
  testExternalWritable(org.apache.hadoop.mapred.TestMiniMRClasspath): Job 
failed!
  testWithDFS(org.apache.hadoop.mapred.TestJobSysDirWithDFS): Job failed!
  
testReduceFromPartialMem(org.apache.hadoop.mapred.TestReduceFetchFromPartialMem):
 Job failed!
  testLazyOutput(org.apache.hadoop.mapred.TestLazyOutput): Job failed!
  
testDistinctUsers(org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers): 
Job failed!
  
testMultipleSpills(org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers):
 Job failed!
  testMapReduce(org.apache.hadoop.mapred.TestClusterMapReduceTestCase): Job 
failed!
  
testMapReduceRestarting(org.apache.hadoop.mapred.TestClusterMapReduceTestCase): 
Job failed!

Tests run: 381, Failures: 23, Errors: 14, Skipped: 14
{noformat}


For the failing test cases, the container's stderr files contain the following:

{noformat}
[CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have
Clover in the runtime classpath? (class
java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
{noformat}



 Some tests run twice or fail if Clover is enabled
 -

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled

2012-07-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4416:
--

Description: 
There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

It seems this started happening after MAPREDUCE-4253.

  was:
Some tests run twice. E.g. try mvn test -Dtest=TestJobConf. It runs under 
hadoop-mapreduce-client-core and hadoop-mapreduce-client-jobclient.

There are number of tests running under hadoop-mapreduce-client-jobclient that 
fail if Clover is enabled. Whenever a job is launched, AM doesn't start because 
it can't locate the clover jar file.

It seems this started happening after MAPREDUCE-4253.

Summary: Some tests fail if Clover is enabled  (was: Some tests run 
twice or fail if Clover is enabled)

 Some tests fail if Clover is enabled
 

 Key: MAPREDUCE-4416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


 There are number of tests running under hadoop-mapreduce-client-jobclient 
 that fail if Clover is enabled. Whenever a job is launched, AM doesn't start 
 because it can't locate the clover jar file.
 It seems this started happening after MAPREDUCE-4253.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4384) Race conditions in IndexCache

2012-07-05 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4384:
--

Attachment: mapreduce-4384.patch

Attaching new patch without the test case.

 Race conditions in IndexCache
 -

 Key: MAPREDUCE-4384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.3, 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, 
 mapreduce-4384.patch, mapreduce-4384.patch


 TestIndexCache is intermittently failing due to a race condition. Upon 
 inspection of the IndexCache implementation, more potential issues have been 
 discovered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache

2012-07-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405121#comment-13405121
 ] 

Kihwal Lee commented on MAPREDUCE-4384:
---

MAPREDUCE-4253 moved the test file to a different directory.
I will post an updated patch.

 Race conditions in IndexCache
 -

 Key: MAPREDUCE-4384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.3, 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4384.patch


 TestIndexCache is intermittently failing due to a race condition. Upon 
 inspection of the IndexCache implementation, more potential issues have been 
 discovered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4384) Race conditions in IndexCache

2012-07-02 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4384:
--

Attachment: mapreduce-4384.patch

 Race conditions in IndexCache
 -

 Key: MAPREDUCE-4384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.3, 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4384.patch, mapreduce-4384.patch


 TestIndexCache is intermittently failing due to a race condition. Upon 
 inspection of the IndexCache implementation, more potential issues have been 
 discovered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4384) Race conditions in IndexCache

2012-07-02 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated MAPREDUCE-4384:
--

Attachment: mapreduce-4384.patch

 Race conditions in IndexCache
 -

 Key: MAPREDUCE-4384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.3, 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, 
 mapreduce-4384.patch


 TestIndexCache is intermittently failing due to a race condition. Upon 
 inspection of the IndexCache implementation, more potential issues have been 
 discovered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache

2012-07-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405245#comment-13405245
 ] 

Kihwal Lee commented on MAPREDUCE-4384:
---

I posted an updated patch but the PreCommit-Admin build job hasn't run for 
almost two hours...

 Race conditions in IndexCache
 -

 Key: MAPREDUCE-4384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.3, 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, 
 mapreduce-4384.patch


 TestIndexCache is intermittently failing due to a race condition. Upon 
 inspection of the IndexCache implementation, more potential issues have been 
 discovered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache

2012-07-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405289#comment-13405289
 ] 

Kihwal Lee commented on MAPREDUCE-4384:
---

I ran test-patch manually. There were 2066 (-3) javac warnings with the new 
patch.

 Race conditions in IndexCache
 -

 Key: MAPREDUCE-4384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.3, 2.0.1-alpha, 3.0.0

 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, 
 mapreduce-4384.patch


 TestIndexCache is intermittently failing due to a race condition. Upon 
 inspection of the IndexCache implementation, more potential issues have been 
 discovered.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM

2012-06-29 Thread Kihwal Lee (JIRA)
Kihwal Lee created MAPREDUCE-4387:
-

 Summary: RM gets fatal error and exits during TestRM
 Key: MAPREDUCE-4387
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4387
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


It doesn't happen on my desktop, but it happens frequently during the builds 
with clover enabled. Surefire will report it as a fork failure.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM

2012-06-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404197#comment-13404197
 ] 

Kihwal Lee commented on MAPREDUCE-4387:
---

The test was calling {{ResourceManager#stop()}} after it thought it was done. This 
hit {{GenericEventHandler}} with an interrupt while it was trying to enqueue an 
event. The interrupt bubbled up and hit the RM's {{EventProcessor}} loop, which did 
System.exit(-1). The loop does check whether the JVM is being shut down, but that 
check relies on {{ShutdownHookManager}}, which has not been activated at this point.

It seems {{EventProcessor}} shouldn't call exit(-1) when it gets an exception 
during shutdown.
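
Put differently, the structure looks roughly like the following. This is a 
minimal, self-contained model of the pattern only, not the actual YARN classes: 
the names ({{DispatcherShutdownRace}}, {{dispatch()}}, and so on) are made up 
for illustration, and the real RM additionally consults {{ShutdownHookManager}} 
before exiting.

{code}
// Stripped-down model of the shutdown race described above.
// All names here are illustrative, not the actual YARN classes.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DispatcherShutdownRace {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  // Plays the role of EventProcessor: any Throwable from event handling
  // is treated as fatal and brings down the whole JVM.
  private final Thread eventProcessor = new Thread(() -> {
    while (!stopped && !Thread.currentThread().isInterrupted()) {
      Runnable event;
      try {
        event = eventQueue.take();
      } catch (InterruptedException e) {
        return;                       // interrupted while idle: exits quietly
      }
      try {
        event.run();                  // stands in for scheduler.handle(event)
      } catch (Throwable t) {
        // If the interrupt from stop() lands while the handler is still
        // enqueueing a follow-up event, the InterruptedException surfaces
        // here and, without a "stopped" check, the loop kills the JVM.
        System.exit(-1);
      }
    }
  });

  // Plays the role of GenericEventHandler: enqueueing can be interrupted.
  public void dispatch(Runnable event) {
    try {
      eventQueue.put(event);
    } catch (InterruptedException e) {
      throw new RuntimeException("interrupted while enqueueing", e);
    }
  }

  public void start() { eventProcessor.start(); }

  public void stop() throws InterruptedException {
    stopped = true;
    eventProcessor.interrupt();       // may arrive while a handler is running
    eventProcessor.join();
  }
}
{code}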

 RM gets fatal error and exits during TestRM
 ---

 Key: MAPREDUCE-4387
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4387
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


 It doesn't happen on my desktop, but it happens frequently during the builds 
 with clover enabled. Surefire will report it as a fork failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM

2012-06-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404212#comment-13404212
 ] 

Kihwal Lee commented on MAPREDUCE-4387:
---

{{EventProcessor#stop()}} sets {{stopped}} to true before interrupting the 
thread, so we could just add one more check and let the loop terminate normally.

{code}
while (!stopped && !Thread.currentThread().isInterrupted()) {
  try {
    event = eventQueue.take();
  } catch (InterruptedException e) {
    LOG.error("Returning, interrupted : " + e);
    return; // TODO: Kill RM.
  }

  try {
    scheduler.handle(event);
  } catch (Throwable t) {
+   if (stopped) {
+     LOG.warn("Exception during shutdown: ", t);
+     break;
+   }
    LOG.fatal("Error in handling event type " + event.getType()
        + " to the scheduler", t);
    if (shouldExitOnError
        && !ShutdownHookManager.get().isShutdownInProgress()) {
      LOG.info("Exiting, bbye..");
      System.exit(-1);
    }
  }
}
{code}
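
The reason the added {{if (stopped)}} branch is reliable is the ordering in the 
stop path: the flag is set before the interrupt is delivered, so by the time the 
interrupt surfaces as an exception inside the loop, {{stopped}} is already true. 
A minimal, self-contained sketch of that idiom (the names are made up for 
illustration and the flag is volatile in the sketch so the write is visible to 
the worker thread; this is not the exact RM code):

{code}
// Hypothetical illustration of the stop-before-interrupt ordering;
// not the actual ResourceManager code.
public class StopOrderingSketch {
  private volatile boolean stopped = false;

  private final Thread worker = new Thread(() -> {
    while (!stopped && !Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(100);                  // stands in for eventQueue.take()
      } catch (InterruptedException e) {
        // Because stop() publishes the flag before interrupting, an
        // interrupt caused by shutdown always observes stopped == true.
        System.out.println(stopped ? "clean shutdown" : "unexpected interrupt");
        return;
      }
    }
  });

  public void start() { worker.start(); }

  public void stop() throws InterruptedException {
    stopped = true;              // 1. publish the flag first (volatile write)
    worker.interrupt();          // 2. then interrupt the blocked worker
    worker.join();               // 3. wait for the loop to exit
  }

  public static void main(String[] args) throws InterruptedException {
    StopOrderingSketch s = new StopOrderingSketch();
    s.start();
    Thread.sleep(200);           // let the worker block in sleep()
    s.stop();                    // typically prints "clean shutdown"
  }
}
{code}

If the ordering were reversed (interrupt first, flag second), the check in the 
catch block could race with the flag write and the exception would still look 
like a fatal error.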

 RM gets fatal error and exits during TestRM
 ---

 Key: MAPREDUCE-4387
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4387
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


 It doesn't happen on my desktop, but it happens frequently during the builds 
 with clover enabled. Surefire will report it as a fork failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM

2012-06-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404227#comment-13404227
 ] 

Kihwal Lee commented on MAPREDUCE-4387:
---

I tested this idea and it worked. With {{-Pclover}}, TestRM always fails. With 
the added check, it succeeds.

BEFORE
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.yarn.server.resourcemanager.TestRM

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-server-resourcemanager  FAILURE [5.180s]

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on 
project hadoop-yarn-server-resourcemanager: ExecutionException; nested 
exception is java.util.concurrent.ExecutionException: 
org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in 
starting fork, check output in log - [Help 1]
{noformat}


AFTER
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.yarn.server.resourcemanager.TestRM
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.973 sec

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
{noformat}

 RM gets fatal error and exits during TestRM
 ---

 Key: MAPREDUCE-4387
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4387
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


 It doesn't happen on my desktop, but it happens frequently during the builds 
 with clover enabled. Surefire will report it as a fork failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM

2012-06-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned MAPREDUCE-4387:
-

Assignee: Kihwal Lee

 RM gets fatal error and exits during TestRM
 ---

 Key: MAPREDUCE-4387
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4387
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 2.0.1-alpha, 3.0.0


 It doesn't happen on my desktop, but it happens frequently during the builds 
 with clover enabled. Surefire will report it as a fork failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



