[jira] [Commented] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar

2016-05-11 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281074#comment-15281074
 ] 

Akira AJISAKA commented on MAPREDUCE-4683:
--

Hi [~jianhe], can we target this to trunk? This fix is needed for 
MAPREDUCE-4253.

> We need to fix our build to create/distribute 
> hadoop-mapreduce-client-core-tests.jar
> 
>
> Key: MAPREDUCE-4683
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Arun C Murthy
>Assignee: Akira AJISAKA
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4683.patch
>
>
> We need to fix our build to create/distribute 
> hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6108) ShuffleError OOM while reserving memory by MergeManagerImpl

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281053#comment-15281053
 ] 

Wangda Tan commented on MAPREDUCE-6108:
---

[~kasha], [~vinodkv], is this still an issue in the existing code base? Can we close 
it as not reproducible if it cannot be reproduced?

Thanks,

> ShuffleError OOM while reserving memory by MergeManagerImpl
> ---
>
> Key: MAPREDUCE-6108
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6108
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.5.1
>Reporter: Dongwook Kwon
>Priority: Critical
>
> Shuffle has an OOM issue from time to time, 
> as reported in this email:
> http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201408.mbox/%3ccabwxxjnk-on0xtrmurijd8sdgjjtamsvqw2czpm3oekj3ym...@mail.gmail.com%3E






[jira] [Resolved] (MAPREDUCE-6099) Adding getSplits(JobContext job, List stats) to mapreduce CombineFileInputFormat

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved MAPREDUCE-6099.

Resolution: Won't Fix

Closing, as Jason mentioned.

> Adding  getSplits(JobContext job, List stats) to mapreduce 
> CombineFileInputFormat
> -
>
> Key: MAPREDUCE-6099
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6099
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.4.1
>Reporter: Pankit Thapar
>Priority: Critical
> Attachments: MAPREDUCE-6099.patch
>
>
> Currently we have getSplits(JobContext job) in CombineFileInputFormat. 
> This API does not give the client the freedom to create a list of file statuses 
> itself and then create splits on the resultant List stats.
> The client might be able to perform some filtering on its end on the file 
> sets in the input paths. For the reasons above, it would be a good idea to 
> have getSplits(JobContext, List).
> Please let me know what you think about this.






[jira] [Updated] (MAPREDUCE-4758) jobhistory web ui not showing correct # failed reducers

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4758:
---
Target Version/s: 2.9.0  (was: 2.8.0)
Priority: Major  (was: Critical)

This is an improvement to the UI.
It is unlikely to get done, so moving it out.

> jobhistory web ui not showing correct # failed reducers
> ---
>
> Key: MAPREDUCE-4758
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4758
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, webapps
>Affects Versions: 0.23.4
>Reporter: Thomas Graves
>
> We had a job fail due to a reducer failing 4 times.  Unfortunately the job 
> history UI didn't show this particular failed reducer, which led to 
> confusion as to why the job failed. 
> This reducer failed to launch all 4 task attempts with a Token Expiration 
> error, and the jobhistory file only gets an event when the task attempt 
> transitions to launched.  The webapp JobInfo object only counts the task 
> attempts in the jobhistory file to display under the "Attempt Type" table, so 
> since this task didn't have an attempt with it, it didn't show up on the UI.
> We need to reconcile the task list with the task attempts or also show more 
> stats for the tasks vs task attempts.






[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4683:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

I guess this could break existing scripts, so closing.







[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-05-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280963#comment-15280963
 ] 

Jian He commented on MAPREDUCE-6513:


It looks like TaskAttemptKillEvent will be sent twice for each mapper. 
The first time is at the code below in RMContainerAllocator#handleUpdatedNodes; JobImpl will 
in turn send the TaskAttemptKillEvent for each mapper on the unusable 
node.
{code}
  // send event to the job to act upon completed tasks
  eventHandler.handle(new JobUpdatedNodesEvent(getJob().getID(),
  updatedNodes));
{code}
The second time is at this code in the same method:
{code}
// If map, reschedule next task attempt.
boolean rescheduleNextAttempt = (i == 0) ? true : false;
eventHandler.handle(new TaskAttemptKillEvent(tid,
"TaskAttempt killed because it ran on unusable node"
+ taskAttemptNodeId, rescheduleNextAttempt));
  }
{code}

This is how it has been for a long time; I'm not sure why. With the new change, 
will this cause more container requests to get scheduled?

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the maps on this node changed to 
> the KILLED state. 
> The maps that were running on the unstable node were rescheduled; all of them 
> are in the SCHEDULED state, waiting for the RM to assign containers. Ask requests 
> for maps were seen until the node was good again (all of those failed), and there 
> were no ask requests after that. But the AM keeps preempting the reducers (it's recycling).
> In the end, the reducers are waiting for the mappers to complete, and the mappers 
> never get a container.
> My question is:
> 
> Why did the AM not send map requests after the node recovered?






[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280928#comment-15280928
 ] 

Hadoop QA commented on MAPREDUCE-6657:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 50s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_91 |
| +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 33s | trunk passed |
| +1 | javadoc | 0m 12s | trunk passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 16s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 13s | the patch passed with JDK v1.8.0_91 |
| +1 | javac | 0m 13s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 15s | the patch passed |
| -1 | checkstyle | 0m 13s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs: patch generated 2 new + 16 unchanged - 0 fixed = 18 total (was 16) |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 0m 41s | the patch passed |
| +1 | javadoc | 0m 11s | the patch passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 13s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 6m 3s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_91. |
| +1 | unit | 6m 27s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 25m 46s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800866/mapreduce6657.005.patch |
| JIRA Issue | MAPREDUCE-6657 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 95c33ef8963a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280712#comment-15280712
 ] 

Haibo Chen commented on MAPREDUCE-6657:
---

Sorry for misunderstanding your previous comments. [~djp], do you think we should 
create a subclass of RetriableException for this instead? The message is 
derived from an instance method, this.nn.getRole(), and doing string matching is 
probably not the cleanest way. If so, I can file a follow-up jira in 
HDFS and update isNameNodeStillNotStarted() when we have the new 
'NameNodeNotStartedException' that extends RetriableException.
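
To make the proposal concrete, here is a minimal sketch of what a type-based check could look like. It is only an illustration under assumptions: RetriableException below is a local stand-in for org.apache.hadoop.ipc.RetriableException so the snippet compiles on its own, NameNodeNotStartedException is the hypothetical class named in the comment (it does not exist yet), and the sketch ignores any wrapping that an RPC layer may apply to exceptions.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.ipc.RetriableException (assumption:
// declared here only so the sketch is self-contained).
class RetriableException extends IOException {
    RetriableException(String message) {
        super(message);
    }
}

// Hypothetical subclass proposed in the comment above.
class NameNodeNotStartedException extends RetriableException {
    NameNodeNotStartedException(String role) {
        super(role + " still not started");
    }
}

class TypedCheckSketch {
    // The check depends on the exception type, not on the message text,
    // so rewording the message cannot break it.
    static boolean isNameNodeStillNotStarted(Throwable ex) {
        return ex instanceof NameNodeNotStartedException;
    }

    public static void main(String[] args) {
        System.out.println(isNameNodeStillNotStarted(
            new NameNodeNotStartedException("NameNode")));
        System.out.println(isNameNodeStillNotStarted(
            new RetriableException("some other retriable failure")));
    }
}
```

With this shape, a plain RetriableException that happens to carry the same text is not treated as a startup failure; whether that is desirable is a design question for the follow-up jira.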

> job history server can fail on startup when NameNode is in start phase
> --
>
> Key: MAPREDUCE-6657
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch, 
> mapreduce6657.003.patch, mapreduce6657.004.patch, mapreduce6657.005.patch
>
>
> Job history server will try to create a history directory in HDFS on startup. 
> When NameNode is in safe mode, it will keep retrying for a configurable time 
> period.  However, it should also keep retrying if the NameNode is in its startup 
> phase. Safe mode does not happen until the NN is out of the startup phase. 
> A RetriableException with the text "NameNode still not started" is thrown 
> when the NN is in its internal service startup phase. We should add the check 
> for this specific exception in isBecauseSafeMode() to account for that.






[jira] [Updated] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6657:
--
Attachment: (was: mapreduce6657.006.patch)







[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280564#comment-15280564
 ] 

Junping Du commented on MAPREDUCE-6657:
---

Thanks for updating the patch, [~haibochen].
My comment above was actually trying to say that we should define the static string 
where the exception gets thrown. 
In this case, we should also change NameNodeRpcServer.java:
{noformat}
  private void checkNNStartup() throws IOException {
    if (!this.nn.isStarted()) {
      throw new RetriableException(this.nn.getRole() + " still not started");
    }
  }
{noformat}
If we define a static string in HDFS and use it on both sides (NameNodeRpcServer 
and HistoryFileManager), we can make sure we won't hit this issue again in the 
future if the exception string is updated.
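
A rough sketch of the shared-constant idea, for illustration only. The names NameNodeMessages, STILL_NOT_STARTED, checkStarted and SharedMessageSketch are hypothetical, and RetriableException is a local stand-in for org.apache.hadoop.ipc.RetriableException so the snippet compiles without Hadoop on the classpath.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.ipc.RetriableException (assumption).
class RetriableException extends IOException {
    RetriableException(String message) {
        super(message);
    }
}

// Hypothetical holder for the shared text; the real name and location
// would be decided in HDFS.
final class NameNodeMessages {
    static final String STILL_NOT_STARTED = " still not started";

    private NameNodeMessages() {
    }
}

class SharedMessageSketch {
    // Thrower side: same shape as checkNNStartup() quoted above, but the
    // message suffix comes from the shared constant.
    static void checkStarted(boolean started, String role) throws IOException {
        if (!started) {
            throw new RetriableException(role + NameNodeMessages.STILL_NOT_STARTED);
        }
    }

    // Checker side: matches against the same constant, so both sides
    // change together if the wording is ever updated.
    static boolean isNameNodeStillNotStarted(Exception ex) {
        String message = ex.getMessage();
        return ex instanceof RetriableException
            && message != null
            && message.contains(NameNodeMessages.STILL_NOT_STARTED);
    }

    public static void main(String[] args) {
        try {
            checkStarted(false, "NameNode");
        } catch (IOException e) {
            System.out.println(isNameNodeStillNotStarted(e));
        }
    }
}
```

The check is still a string match, but a change to the wording now happens in one place and cannot silently drift between the two modules.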

> job history server can fail on startup when NameNode is in start phase
> --
>
> Key: MAPREDUCE-6657
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch, 
> mapreduce6657.003.patch, mapreduce6657.004.patch, mapreduce6657.005.patch, 
> mapreduce6657.006.patch
>
>
> Job history server will try to create a history directory in HDFS on startup. 
> When NameNode is in safe mode, it will keep retrying for a configurable time 
> period.  However, it should also keep retrying if the NameNode is in its startup 
> phase. Safe mode does not happen until the NN is out of the startup phase. 
> A RetriableException with the text "NameNode still not started" is thrown 
> when the NN is in its internal service startup phase. We should add the check 
> for this specific exception in isBecauseSafeMode() to account for that.






[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280516#comment-15280516
 ] 

Haibo Chen commented on MAPREDUCE-6657:
---

Thanks very much for your review, [~djp]. I have updated the patch according to 
your comments. 







[jira] [Updated] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6657:
--
Attachment: mapreduce6657.006.patch







[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280426#comment-15280426
 ] 

Junping Du commented on MAPREDUCE-6657:
---

Thanks [~haibochen] for the patch.
Hard-coding the check against the message string is fragile:
{noformat}
+return ex.toString().contains("SafeModeException") ||
+(ex instanceof RetriableException && ex.getMessage().contains(
+"NameNode still not started"));
{noformat}
If HDFS changes the exception message in the future, e.g. to "Namenode 
not start yet.", then the issue will come up again. Instead, we should define 
the message as a static string. 
Everything else looks fine.







[jira] [Commented] (MAPREDUCE-6693) Job history entry missing when JOB name is of mapreduce.jobhistory.jobname.limit length

2016-05-11 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280373#comment-15280373
 ] 

Kousuke Saruta commented on MAPREDUCE-6693:
---

On second thought, only 
{code}
if (encodedString.length() < limitLength)
{code}
should be changed to
{code}
if (encodedString.length() <= limitLength)
{code}

and 

{code}
index + increase > limitLength
{code}
should be kept.

The reason is if we have
{code}
if (encodedString.length() <= limitLength) {
  return encodedString;
}
{code}
the size of strBytes is at least limitLength + 1, which means the maximum index is 
at least limitLength. So even if index + increase is limitLength, it's safe.
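
The bounds argument above can be sketched as a small standalone example. This is a simplified illustration, not the actual FileNameIndexUtils code: trimEncoded and TrimSketch are hypothetical names, and the sketch treats every %XX escape as one 3-character unit instead of grouping multi-byte UTF-8 sequences.

```java
class TrimSketch {
    // Trim a URL-encoded string to at most `limit` characters without
    // cutting a %XX escape in half.
    static String trimEncoded(String encoded, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        // "<=" is the important part: a string of exactly `limit` characters
        // returns here and never enters the loop.
        if (encoded.length() <= limit) {
            return encoded;
        }
        // From here on encoded.length() >= limit + 1, so every index up to
        // and including `limit` is a valid position.
        int index = 0;
        while (true) {
            int increase = (encoded.charAt(index) == '%') ? 3 : 1;
            if (index + increase > limit) {
                break;
            }
            // index + increase <= limit, so the next charAt() stays in range.
            index += increase;
        }
        return encoded.substring(0, index);
    }

    public static void main(String[] args) {
        System.out.println(trimEncoded("ab%2Fcd", 4));
        System.out.println(trimEncoded("ab%2Fcd", 5));
    }
}
```

With "<" in the early return, an input whose length equals the limit would walk index up to the limit and then read one position past the end, which matches the ArrayIndexOutOfBoundsException: 50 in the report.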

> Job history entry missing when JOB name is of 
> mapreduce.jobhistory.jobname.limit length
> ---
>
> Key: MAPREDUCE-6693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6693
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Critical
>
> Job history entry missing when JOB name is of 
> {{mapreduce.jobhistory.jobname.limit}} character
> {noformat}
> 2016-05-10 06:51:00,674 DEBUG [Thread-73] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Interrupting Event Handling thread
> 2016-05-10 06:51:00,674 DEBUG [Thread-73] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Waiting for Event Handling thread to complete
> 2016-05-10 06:51:00,674 ERROR [eventHandlingThread] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[eventHandlingThread,5,main] threw an Exception.
> java.lang.ArrayIndexOutOfBoundsException: 50
>   at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.trimURLEncodedString(FileNameIndexUtils.java:326)
>   at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:86)
>   at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:1147)
>   at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:635)
>   at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:341)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-05-10 06:51:00,675 DEBUG [Thread-73] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down timer for Job MetaInfo for job_1462840033869_0009 history file hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist
> 2016-05-10 06:51:00,675 DEBUG [Thread-73] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down timer Job MetaInfo for job_1462840033869_0009 history file hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist
> 2016-05-10 06:51:00,676 DEBUG [Thread-73] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Closing Writer
> {noformat}
> Looks like the 50-character check is going wrong






[jira] [Commented] (MAPREDUCE-6558) multibyte delimiters with compressed input files generate duplicate records

2016-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280287#comment-15280287
 ] 

Jason Lowe commented on MAPREDUCE-6558:
---

Thanks, [~wilfreds]!  Patch looks good overall.

I think we can significantly reduce the size of the testcase file since the 
problem occurs early in it.  I noticed that if we cut the file down to just 530 
records instead of 20,000 records and compress with bzip2 -1 it still catches 
the failure but is only a 10K binary rather than a 409K binary.


> multibyte delimiters with compressed input files generate duplicate records
> ---
>
> Key: MAPREDUCE-6558
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6558
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: MAPREDUCE-6558.1.patch
>
>
> This is the follow up for MAPREDUCE-6549. Compressed files cause record 
> duplications as shown in different junit tests. The number of duplicated 
> records changes with the splitsize:
> Unexpected number of records in split (splitsize = 10)
> Expected: 41051
> Actual: 45062
> Unexpected number of records in split (splitsize = 10)
> Expected: 41051
> Actual: 41052
> Test passes with splitsize = 147445, which is the compressed file length. The 
> file is a bzip2 file with 100k blocks and a total of 11 blocks






[jira] [Commented] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-05-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280098#comment-15280098
 ] 

Daniel Templeton commented on MAPREDUCE-6657:
-

OK.  Latest patch looks good to me.  [~rkanter]?



