[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-15 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496605#comment-14496605
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

I created JIRA YARN-3490 for the application decorator proposal. 

 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6304.20150410-1.patch, 
 MAPREDUCE-6304.20150411-1.patch


 Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
 node labels when submitting MR jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-13 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492818#comment-14492818
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

I mean Hadoop could provide a new mechanism, such as a decorator for the 
ApplicationSubmissionContext. When the method submitApplication() in class 
ClientRMService is called, Hadoop would decorate the 
ApplicationSubmissionContext (for example, by manipulating the 
amLabelExpression) before it calls the following line:

  rmAppManager.submitApplication(submissionContext,
  System.currentTimeMillis(), user);

Hadoop could provide a default decorator that does nothing, and users could 
override it in yarn-site.xml through a new configuration parameter, for 
example yarn.app.submission.context.decorator.class.

This new mechanism is not directly related to the change you are making, but it 
is more generic: platform providers could update the 
ApplicationSubmissionContext in their own ways. Once such a mechanism is in 
place, you would not need to add anything to your label code for my use case. 
Instead, the custom logic would live in the decorator supplied by the platform 
provider. For example, we could provide a decorator that updates 
amLabelExpression in ApplicationSubmissionContext. Other fields of 
ApplicationSubmissionContext could be changed as well to meet users' needs.
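
The decorator idea can be sketched roughly as follows. This is only an illustration under stated assumptions: SubmissionContext is a stand-in for YARN's ApplicationSubmissionContext, the SubmissionContextDecorator interface and both implementations are hypothetical, and the "on-demand" label is invented for the example. Only the property name comes from the comment above.

```java
import java.util.Map;

// Hypothetical stand-in for YARN's ApplicationSubmissionContext; only the
// field discussed in this thread is modeled.
class SubmissionContext {
    String amLabelExpression;
}

// Hypothetical extension point, as proposed in the comment above.
interface SubmissionContextDecorator {
    SubmissionContext decorate(SubmissionContext context);
}

// The proposed default: leave the context untouched.
class NoOpDecorator implements SubmissionContextDecorator {
    @Override
    public SubmissionContext decorate(SubmissionContext context) {
        return context;
    }
}

// What a platform provider might plug in: pin the application master to a
// label unless the client already chose one. The label name is invented.
class OnDemandAmDecorator implements SubmissionContextDecorator {
    @Override
    public SubmissionContext decorate(SubmissionContext context) {
        if (context.amLabelExpression == null) {
            context.amLabelExpression = "on-demand";
        }
        return context;
    }
}

class DecoratorSketch {
    // Property name suggested in the comment; not an existing YARN setting.
    static final String DECORATOR_KEY = "yarn.app.submission.context.decorator.class";

    // Instantiates the configured decorator, falling back to the no-op default.
    static SubmissionContextDecorator load(Map<String, String> conf) {
        String className = conf.getOrDefault(DECORATOR_KEY, NoOpDecorator.class.getName());
        try {
            return (SubmissionContextDecorator)
                Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load decorator " + className, e);
        }
    }

    public static void main(String[] args) {
        SubmissionContext context = new SubmissionContext();
        Map<String, String> conf = Map.of(DECORATOR_KEY, OnDemandAmDecorator.class.getName());
        // In the proposal this call would happen inside submitApplication(),
        // just before the context is handed to the RMAppManager.
        load(conf).decorate(context);
        System.out.println("AM label: " + context.amLabelExpression);
    }
}
```

Loading the class by name keeps the default behavior unchanged for vanilla Hadoop while letting a provider swap in its own logic through configuration alone.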



 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6304.20150410-1.patch, 
 MAPREDUCE-6304.20150411-1.patch




[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-13 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492608#comment-14492608
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

A more generic way could be to add a decorator for ApplicationSubmissionContext 
in class ClientRMService so that people can change the 
ApplicationSubmissionContext in the method submitApplication(). The default 
decorator shipped with Apache Hadoop would do nothing, but Hadoop would allow 
users to plug in a custom decorator through the Hadoop configuration. 

 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6304.20150410-1.patch, 
 MAPREDUCE-6304.20150411-1.patch




[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-08 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485817#comment-14485817
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

YarnRunner is the right place to set the labels for an MR job. However, there is 
one concern here. If I understand correctly, YarnRunner runs on the job client 
side, right? There should also be a way to hook in the labels on the server 
side, i.e., the resource manager side. The reason is that many Hadoop users do 
not understand the labels or set them by themselves; they simply rely on the 
Hadoop platform provider (or system admins for on-premise clusters) to set up 
the labels for them. I am not sure if this is a general use case, but it is 
definitely a feature that we need.

 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R



[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-08 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485532#comment-14485532
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

mapreduce.job.label may not be enough. There should be at least one more 
parameter, such as mapreduce.job.am.label, for the application master. For 
example, on EC2, we don't want to run an application master on a spot instance, 
but we do allow MR tasks to run on spot instances (otherwise, what is the 
purpose of using spot instances?). Furthermore, the Application Master is a 
special YARN container, and MRAppMaster does not run as a YarnChild, right?
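
A minimal sketch of how the two parameters could be resolved separately. Both property names are the ones floated in this thread and are not confirmed Hadoop settings. The fallback from the AM label to the job-wide label is an assumption of this sketch, and the label values "spot" and "on-demand" are invented for the example.

```java
import java.util.Properties;

class LabelResolver {
    // Property names proposed in this discussion; not confirmed Hadoop settings.
    static final String JOB_LABEL = "mapreduce.job.label";
    static final String AM_LABEL = "mapreduce.job.am.label";

    // Label expression for map and reduce task containers (null when unset).
    static String taskLabel(Properties conf) {
        return conf.getProperty(JOB_LABEL);
    }

    // Label expression for the application master container. Falling back to
    // the job-wide label is an assumption made for this sketch.
    static String amLabel(Properties conf) {
        return conf.getProperty(AM_LABEL, conf.getProperty(JOB_LABEL));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(JOB_LABEL, "spot");      // tasks may run on spot instances
        conf.setProperty(AM_LABEL, "on-demand");  // the AM should not
        System.out.println("tasks: " + taskLabel(conf) + ", AM: " + amLabel(conf));
    }
}
```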

 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R



[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-08 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485884#comment-14485884
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

I understand your point from the perspective of an on-premise cluster. However, 
it is not very practical to manage mapred-site.xml or queue files for users if 
Hadoop is offered as a service in the cloud. As a Hadoop developer, you should 
consider both on-premise Hadoop clusters and Hadoop in the cloud. 

A Hadoop cloud service has a great many users. Usually they launch their own 
Hadoop clusters in the cloud and control their own queue files or 
mapred-site.xml. Some of them even run their Hadoop jobs on their own gateways 
that the Hadoop platform provider does not have access to. But the Hadoop 
service provider may still want a mechanism to set up some global labels for 
all users to improve their experience. For example, the failure of an 
application master caused by the termination of a spot instance will cause more 
trouble than the failure of one MR task. These types of settings most likely 
can only be made by Hadoop cloud service providers, based on their deep 
knowledge of their own cloud services.

Or could Hadoop provide a mechanism for Hadoop providers to extend, so that you 
only need to specify the labels in YarnRunner in vanilla Hadoop?  


 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R



[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-08 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485918#comment-14485918
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

Thanks, Naganarasimha, for your understanding. However, user/group mapping may 
not work for us since, as a Hadoop service provider, we don't have control of 
that. I would prefer a plugin mechanism rather than a fixed solution here so 
that we can extend it for our service. But I think the change to YarnRunner is 
still needed for Hadoop users anyway. 

 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R



[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-01 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391724#comment-14391724
 ] 

Jian Fang commented on MAPREDUCE-6304:
--

Link related JIRAs

 Specifying node labels when submitting MR jobs
 --

 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang



[jira] [Created] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2015-04-01 Thread Jian Fang (JIRA)
Jian Fang created MAPREDUCE-6304:


 Summary: Specifying node labels when submitting MR jobs
 Key: MAPREDUCE-6304
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jian Fang


Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
node labels when submitting MR jobs.





[jira] [Commented] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-02-13 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320647#comment-14320647
 ] 

Jian Fang commented on MAPREDUCE-6258:
--

It is not uncommon; all users who run Hadoop clusters in the cloud should face 
the same issue. For example, we have to write a dedicated process to dump the 
JHS files to local disks continuously and then upload them to multiple places 
such as S3. As we observed, the single process did not scale well for a big and 
busy cluster, and the overhead of synchronizing the local JHS files with the 
files on HDFS is nontrivial. Furthermore, we need the JHS files to be available 
as soon as a job finishes, which rules out distcp. 

As far as I understand from looking at the internal implementation, the JHS 
files are currently stored on HDFS only. I think the reason is that they have to 
be remotely accessible if the job history server runs on another node, now that 
the JHS server has been separated out from the job tracker in Hadoop one and 
the job tracker has been split into multiple distributed components in Hadoop 
two. You cannot really just dump the JHS files somewhere; the location must be 
reliable and accessible by the JHS server. As a result, I think this feature is 
an easy way to achieve our goal. Furthermore, this feature is off by default; 
users turn it on only when they need it.

 add support to back up JHS files from application master
 

 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6258.patch


 In hadoop two, job history files are stored on HDFS with a default retention 
 period of one week. In a cloud environment, these HDFS files are actually 
 stored on the disks of ephemeral instances that could go away once the 
 instances are terminated. Users may want to back up the job history files for 
 issue investigation and performance analysis before and after the cluster is 
 terminated. 
 A centralized backup mechanism could have a scalability issue for big and 
 busy Hadoop clusters, where there are probably tens of thousands of jobs every 
 day. As a result, a distributed way to back up the job history files is 
 preferred in this case. To achieve this goal, we could add a new feature to 
 back up the job history files in the application master. More specifically, 
 we could copy the job history files to a backup path when they are moved from 
 the temporary staging directory to the intermediate_done path in the 
 application master. Since application masters can run on any slave node in a 
 Hadoop cluster, we could achieve better scalability by backing up the job 
 history files in a distributed fashion.
 Please be aware that the backup path should be managed by the Hadoop users 
 based on their needs. For example, some Hadoop users may copy the job history 
 files directly to cloud storage and keep them there forever, while other 
 users may want to store the job history files on local disks and clean them 
 up from time to time.





[jira] [Created] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-02-12 Thread Jian Fang (JIRA)
Jian Fang created MAPREDUCE-6258:


 Summary: add support to back up JHS files from application master
 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Jian Fang


In hadoop two, job history files are stored on HDFS with a default retention 
period of one week. In a cloud environment, these HDFS files are actually 
stored on the disks of ephemeral instances that could go away once the 
instances are terminated. Users may want to back up the job history files for 
issue investigation and performance analysis before and after the cluster is 
terminated. 


A centralized backup mechanism could have a scalability issue for big and busy 
Hadoop clusters, where there are probably tens of thousands of jobs every day. 
As a result, a distributed way to back up the job history files is preferred 
in this case. To achieve this goal, we could add a new feature to back up the 
job history files in the application master. More specifically, we could copy 
the job history files to a backup path when they are moved from the temporary 
staging directory to the intermediate_done path in the application master. 
Since application masters can run on any slave node in a Hadoop cluster, we 
could achieve better scalability by backing up the job history files in a 
distributed fashion.

Please be aware that the backup path should be managed by the Hadoop users based 
on their needs. For example, some Hadoop users may copy the job history files 
directly to cloud storage and keep them there forever, while other users may 
want to store the job history files on local disks and clean them up from time 
to time.
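
The copy-on-move step described above might look roughly like the following. This is a sketch, not the attached patch: it uses java.nio.file on a local filesystem instead of the Hadoop FileSystem API, and the method name, directory names, and file name are all hypothetical. A null backup directory models the feature being switched off.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class HistoryBackupSketch {
    // Copies a finished history file to backupDir (when one is configured),
    // then moves it from staging to doneDir. Returns the file's final location.
    static Path moveToDone(Path stagingFile, Path doneDir, Path backupDir) {
        try {
            if (backupDir != null) { // null means the backup feature is off
                Files.createDirectories(backupDir);
                Files.copy(stagingFile, backupDir.resolve(stagingFile.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            }
            Files.createDirectories(doneDir);
            return Files.move(stagingFile, doneDir.resolve(stagingFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("jhs-sketch");
        // Hypothetical history file name, for illustration only.
        Path staged = Files.write(work.resolve("job_1.jhist"),
            "history".getBytes(StandardCharsets.UTF_8));
        Path done = moveToDone(staged, work.resolve("intermediate_done"), work.resolve("backup"));
        System.out.println("moved to " + done);
    }
}
```

Because each application master runs this step for its own job, the backup work is spread across the cluster rather than funneled through one process.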





[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-02-12 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6258:
-
Attachment: MAPREDUCE-6258.patch

 add support to back up JHS files from application master
 

 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6258.patch




[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-02-12 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6258:
-
Status: Patch Available  (was: Open)

 add support to back up JHS files from application master
 

 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6258.patch




[jira] [Commented] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master

2015-02-05 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308060#comment-14308060
 ] 

Jian Fang commented on MAPREDUCE-6242:
--

Thanks for your quick fix.

 Progress report log is incredibly excessive in application master
 -

 Key: MAPREDUCE-6242
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.4.0
Reporter: Jian Fang
Assignee: Varun Saxena
 Attachments: MAPREDUCE-6242.001.patch


 We saw incredibly excessive logs in the application master for a long-running 
 job with many task attempts. The log write rate is around 1MB/sec in some cases. 
 Most of the log entries were from progress reports such as the following 
 ones.
 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
 attempt_1422985365246_0001_m_00_0 is : 0.15605757
 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
 attempt_1422985365246_0001_m_00_0 is : 0.4108217
 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
 attempt_1422985365246_0001_m_02_0 is : 0.06634143
 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
 attempt_1422985365246_0001_m_00_0 is : 0.6506
 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
 attempt_1422985365246_0001_m_01_0 is : 0.21723115
 It looks like the report interval is controlled by the hard-coded variable 
 PROGRESS_INTERVAL, set to 3 seconds, in class org.apache.hadoop.mapred.Task. We 
 should allow users to set an appropriate progress interval for their 
 applications.
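
 One way the interval could be made configurable, sketched under assumptions: the property name below is invented for illustration and is not a real Hadoop setting, and java.util.Properties stands in for the Hadoop Configuration class. The 3000 ms default mirrors the hard-coded value described above.

```java
import java.util.Properties;

class ProgressIntervalSketch {
    // Mirrors the hard-coded 3-second PROGRESS_INTERVAL described in the report.
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;

    // Invented property name for illustration; not a real Hadoop setting.
    static final String PROGRESS_INTERVAL_KEY = "example.task.progress.interval.ms";

    // Returns the configured interval, or the default when the value is
    // missing, non-numeric, or not positive.
    static long progressIntervalMs(Properties conf) {
        String raw = conf.getProperty(PROGRESS_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_PROGRESS_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PROGRESS_INTERVAL_KEY, "30000"); // report every 30 seconds
        System.out.println(progressIntervalMs(conf) + " ms");
    }
}
```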





[jira] [Commented] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master

2015-02-04 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305738#comment-14305738
 ] 

Jian Fang commented on MAPREDUCE-6242:
--

This seems to be easy to fix. Could you please let me know when the patch will 
be available? We have production issues due to this.


 Progress report log is incredibly excessive in application master
 -

 Key: MAPREDUCE-6242
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.4.0
Reporter: Jian Fang
Assignee: Varun Saxena



[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-10-02 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Attachment: MAPREDUCE-6111.2.patch

Fixed the user parent path permission issue.

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.2.patch, MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-10-02 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Status: Open  (was: Patch Available)

Need to fix the user permission

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.2.patch, MAPREDUCE-6111.patch




[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-10-02 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Status: Patch Available  (was: Open)

Fixed user parent path permission issue.

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.2.patch, MAPREDUCE-6111.patch




[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-10-02 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Attachment: (was: MAPREDUCE-6111.patch)

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.2.patch




[jira] [Created] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)
Jian Fang created MAPREDUCE-6111:


 Summary: Hadoop users' staging directories should be under a user folder
 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang


Right now, Hadoop puts all users' staging directories under 
/tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for user 
hadoop, but the directory /tmp/hadoop-yarn/staging is also used for other 
purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
difficult to track all users' folders without adding extra logic to exclude 
other known folders. 

As a result, we should move all users' folders to a user sub-folder, i.e., 
/tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.
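
To illustrate the difference between the two layouts, here is a small sketch. The paths are the ones given above; the class, method names, and the sample user "alice" are hypothetical, and the exclusion set contains only the one non-user folder named in this report.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StagingLayoutSketch {
    static final String STAGING_ROOT = "/tmp/hadoop-yarn/staging";

    // Non-user entries that share the staging root today; only "history" is
    // named in the report, so a real list could be longer.
    static final Set<String> KNOWN_NON_USER_DIRS = Set.of("history");

    // Proposed layout: every user folder lives under <root>/user/.
    static String proposedUserDir(String user) {
        return STAGING_ROOT + "/user/" + user;
    }

    static String proposedStagingDir(String user) {
        return proposedUserDir(user) + "/.staging";
    }

    // Current layout: user folders can only be found by excluding the known
    // non-user folders from the children of the staging root.
    static List<String> userFoldersInCurrentLayout(List<String> childrenOfStagingRoot) {
        return childrenOfStagingRoot.stream()
            .filter(name -> !KNOWN_NON_USER_DIRS.contains(name))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(proposedStagingDir("hadoop"));
        System.out.println(userFoldersInCurrentLayout(List.of("hadoop", "history", "alice")));
    }
}
```

With the proposed layout, the children of /tmp/hadoop-yarn/staging/user/ are the user folders by construction, so the exclusion list is no longer needed.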






[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Attachment: MAPREDUCE-6111.patch

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Status: Patch Available  (was: Open)

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Attachment: MAPREDUCE-6111.patch

Upload patch with fix for unit test.

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.patch, MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Status: Open  (was: Patch Available)

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Attachment: (was: MAPREDUCE-6111.patch)

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.4.1
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder

2014-09-26 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-6111:
-
Status: Patch Available  (was: Open)

Resubmit patch with updated unit test

 Hadoop users' staging directories should be under a user folder
 ---

 Key: MAPREDUCE-6111
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang
 Attachments: MAPREDUCE-6111.patch


 Right now, Hadoop puts all users' staging directories under 
 /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for 
 user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for 
 other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold 
 finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it 
 difficult to track all users' folders without adding extra logic to exclude 
 other known folders. 
 As a result, we should move all users' folders to a user sub-folder, i.e., 
 /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder 
 becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.





[jira] [Updated] (MAPREDUCE-5703) Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED

2014-08-15 Thread Jian Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated MAPREDUCE-5703:
-

Priority: Critical  (was: Major)

 Job client gets failure though RM side job execution result is FINISHED and 
 SUCCEEDED
 -

 Key: MAPREDUCE-5703
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5703
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Ashutosh Jindal
Priority: Critical

 1) Run an MR job.
 2) After the reduce phase completes and while the JHS file is being written, restart the DN.
 On the RM side, the job is shown as successful.
 The JHS doesn't have info about the job.
 The job client gets an NPE and exits with code 255.
 java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
   at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
   at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
   at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:929)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2080)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2076)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2074)
   at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
   at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382)
   at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
   at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
   at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
   at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
   at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
   at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:407)
   at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:855)
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:835)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5703) Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED

2014-08-15 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099376#comment-14099376
 ] 

Jian Fang commented on MAPREDUCE-5703:
--

We have a cluster with 3 data nodes, but for some reason the job history 
was not persisted successfully, as shown in the AM log.

-LOG--
ERROR [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinishedEvent@1f2cfc93
java.io.IOException: All datanodes 10.253.21.212:9200 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1140)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:936)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:491)


As a result, the job client failed with an NPE. Worse, the job was reported 
as failed even though it had actually finished successfully. 

This NPE is therefore not an edge case: it could happen frequently on a small 
cluster. 

The method getTaskAttemptCompletionEvents() fetches a Job by calling 
verifyAndGetJob(), but never checks whether the job is null, which is the 
root cause of this issue.

public GetTaskAttemptCompletionEventsResponse getTaskAttemptCompletionEvents(
    GetTaskAttemptCompletionEventsRequest request) throws IOException {
  JobId jobId = request.getJobId();
  int fromEventId = request.getFromEventId();
  int maxEvents = request.getMaxEvents();

  Job job = verifyAndGetJob(jobId);
  GetTaskAttemptCompletionEventsResponse response =
      recordFactory.newRecordInstance(GetTaskAttemptCompletionEventsResponse.class);
  response.addAllCompletionEvents(Arrays.asList(
      job.getTaskAttemptCompletionEvents(fromEventId, maxEvents)));
  return response;
}
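
For illustration, a null guard in front of the dereference could look roughly 
like the sketch below. It is self-contained and uses hypothetical stand-in 
types (a plain map instead of the history server's job store, strings instead 
of Job and JobId); it is not the actual HistoryClientService code and not a 
proposed patch. The point is only that a missing job would surface as a 
descriptive IOException instead of an NPE.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the history service; not Hadoop code. */
final class NullSafeHistoryLookup {
    // Stand-in for the job store: job id -> completion events.
    private final Map<String, String[]> jobs = new HashMap<>();

    void addJob(String jobId, String[] completionEvents) {
        jobs.put(jobId, completionEvents);
    }

    // Mirrors the behaviour described above: returns null for an unknown job.
    private String[] verifyAndGetJob(String jobId) {
        return jobs.get(jobId);
    }

    String[] getTaskAttemptCompletionEvents(String jobId) throws IOException {
        String[] events = verifyAndGetJob(jobId);
        if (events == null) {
            // Fail with a clear message instead of dereferencing null.
            throw new IOException("Unknown job " + jobId
                + ": no history available for it");
        }
        return events;
    }
}
```

A guard like this only improves the error; it does not recover the history 
that was never persisted.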

Since people may hit this problem often on a small cluster, what would be the 
best way to fix it? Retry when saving the job history to HDFS? Or 
something else?
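
To make the retry option concrete, here is a rough, self-contained sketch of a 
bounded retry around an I/O action. RetryingWriter, IoAction and the backoff 
parameters are all hypothetical names made up for this example; none of this 
is existing Hadoop code, and whether a retry is the right fix is exactly the 
open question.

```java
import java.io.IOException;

/** Hypothetical retry helper for illustration only; not part of Hadoop. */
final class RetryingWriter {
    /** An I/O action that may fail, e.g. writing a history event. */
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping a little longer
     * after each failure. Returns the number of attempts that were needed,
     * or rethrows the last IOException if every attempt failed.
     */
    static int runWithRetries(IoAction action, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        throw last;
    }
}
```

A retry only helps if the failure is transient, for example a data node that 
comes back after a restart.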



 Job client gets failure though RM side job execution result is FINISHED and 
 SUCCEEDED
 -

 Key: MAPREDUCE-5703
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5703
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Ashutosh Jindal

 1) Run an MR job.
 2) After the reduce phase completes and while the JHS file is being written, restart the DN.
 On the RM side, the job is shown as successful.
 The JHS doesn't have info about the job.
 The job client gets an NPE and exits with code 255.
 java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
   at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
   at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
   at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:929)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2080)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2076)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2074)
   at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
   at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382)
   at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
   at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
   at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
   at