[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-10-29 Thread Wang Yan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668188#comment-16668188
 ] 

Wang Yan commented on MAPREDUCE-7148:
-

[~jlowe] Hi, I assigned the ticket to you. Could you please help review it? 
Thanks in advance!

> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
> Reporter: Wang Yan
> Assignee: Jason Lowe
> Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch
>
>
> We run Hive jobs with a per-job DFS quota of 3 TB. If a job hits the DFS 
> quota, the task that hit it fails, and there are a few task retries before 
> the job actually fails. The retries are not helpful because the job will 
> always fail anyway. In one of the worse cases, a job had a single reduce task 
> that wrote more than 3 TB to HDFS over 20 hours; the reduce task exceeded the 
> quota and retried 4 times before the job finally failed, consuming a lot of 
> resources unnecessarily. This ticket aims to let a job fail fast when it 
> writes too much data to the DFS and exceeds the DFS quota. The fast-fail 
> feature was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
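
The fail-fast idea in the description can be illustrated with a minimal sketch. It is not the attached patch and uses no Hadoop classes: `QuotaExceededError` is a hypothetical stand-in for an HDFS quota exception such as `DSQuotaExceededException`, and `attemptsAllowed` and the attempt limit of 4 are illustrative assumptions. The sketch only shows the decision: a failure caused by a quota error is treated as non-retryable.

```java
import java.io.IOException;

public class FastFailSketch {
    /** Hypothetical stand-in for an HDFS quota error; not a Hadoop class. */
    static class QuotaExceededError extends IOException {
        QuotaExceededError(String message) {
            super(message);
        }
    }

    /** Returns true if the failure or any of its causes is a quota error. */
    static boolean isQuotaFailure(Throwable failure) {
        Throwable t = failure;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof QuotaExceededError) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /** Allows a single attempt for quota failures, maxAttempts otherwise. */
    static int attemptsAllowed(Throwable failure, int maxAttempts) {
        return isQuotaFailure(failure) ? 1 : maxAttempts;
    }

    public static void main(String[] args) {
        Throwable quota = new RuntimeException(new QuotaExceededError("quota exceeded"));
        Throwable other = new IOException("connection reset");
        System.out.println("quota failure attempts: " + attemptsAllowed(quota, 4));
        System.out.println("other failure attempts: " + attemptsAllowed(other, 4));
    }
}
```

With this rule, the 20-hour reduce task from the description would run once instead of being retried after the quota error.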



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-10-29 Thread Wang Yan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Yan reassigned MAPREDUCE-7148:
---

Assignee: Jason Lowe  (was: Wang Yan)







[jira] [Updated] (MAPREDUCE-7027) HadoopArchiveLogs shouldn't delete the original logs if the HAR creation fails

2018-10-29 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated MAPREDUCE-7027:
-
Fix Version/s: 2.9.2
  Component/s: (was: mrv2)
   harchive

Cherry-picked this to branch-2.9.

> HadoopArchiveLogs shouldn't delete the original logs if the HAR creation fails
> --
>
> Key: MAPREDUCE-7027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
> Reporter: Gergely Novák
> Assignee: Gergely Novák
> Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.2
>
> Attachments: MAPREDUCE-7027.001.patch
>
>
> If the hadoop archive command fails for any reason (for example because of an 
> OutOfMemoryError) the HadoopArchiveLogs tool will still delete the original 
> log files, so all the logs will be lost.
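
The behaviour the issue asks for can be sketched as follows. This is not the HadoopArchiveLogs code or the attached patch: `archiveThenDelete` is a hypothetical helper, the archiver is modelled as a `Callable` returning an exit code, and local files stand in for the log directories. The point is only that the delete is reached when, and only when, the archive step reports success.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;

public class ArchiveThenDelete {
    /**
     * Runs the archiver and deletes the original only if it returned exit
     * code 0. Returns true if the original was considered safe to delete.
     */
    static boolean archiveThenDelete(Callable<Integer> archiver, Path original)
            throws IOException {
        int exitCode;
        try {
            exitCode = archiver.call();
        } catch (Exception e) {
            // An exception from the archiver counts as a failed archive.
            exitCode = -1;
        }
        if (exitCode != 0) {
            // Archive failed: keep the original logs untouched.
            return false;
        }
        Files.deleteIfExists(original);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("app-log", ".txt");
        System.out.println("deleted after failure: " + archiveThenDelete(() -> 1, log));
        System.out.println("still exists: " + Files.exists(log));
        System.out.println("deleted after success: " + archiveThenDelete(() -> 0, log));
        System.out.println("still exists: " + Files.exists(log));
    }
}
```

An `Error` such as the `OutOfMemoryError` mentioned above is not caught here; it propagates out of the method, so the delete is skipped in that case too.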






[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-10-29 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667258#comment-16667258
 ] 

Peter Bacsko commented on MAPREDUCE-7152:
-

[~jlowe] Looking at MRJobConfig.java, the default value of LD_LIBRARY_PATH 
comes from DEFAULT_MAPRED_ADMIN_USER_ENV, where it is already cross-platformified:
{noformat}
public static final String DEFAULT_MAPRED_ADMIN_USER_ENV =
    Shell.WINDOWS ?
        "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin" :
        "LD_LIBRARY_PATH=" + Apps.crossPlatformify("HADOOP_COMMON_HOME") +
            "/lib/native";
{noformat}

I mistakenly thought that the expansion occurs inside the AM. Looks like it's 
already working properly. Will chat with other devs and probably close this as 
Won't Fix.
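
For readers unfamiliar with the placeholder mechanism, here is a minimal, Hadoop-free sketch of deferred expansion. The local `crossPlatformify` imitates wrapping a variable name in the {{...}} markers discussed in this thread; `expandForPlatform` is a hypothetical stand-in for the rewrite that happens on the node that launches the container, not the actual NM code.

```java
public class CrossPlatformEnvSketch {
    /** Wraps a variable name in {{...}} so it is not expanded yet. */
    static String crossPlatformify(String var) {
        return "{{" + var + "}}";
    }

    /** Rewrites {{VAR}} into the shell syntax of the launching node. */
    static String expandForPlatform(String value, boolean windows) {
        String placeholder = "\\{\\{(\\w+)\\}\\}";
        return windows
            ? value.replaceAll(placeholder, "%$1%")    // Windows: %VAR%
            : value.replaceAll(placeholder, "\\$$1");  // Unix: $VAR
    }

    public static void main(String[] args) {
        String env = "LD_LIBRARY_PATH=" + crossPlatformify("HADOOP_COMMON_HOME")
            + "/lib/native";
        System.out.println(env);
        System.out.println(expandForPlatform(env, false));
        System.out.println(expandForPlatform(env, true));
    }
}
```

Because the value travels with the placeholder intact, each node resolves HADOOP_COMMON_HOME from its own environment when the container starts.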

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> 
>
> Key: MAPREDUCE-7152
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: MAPREDUCE-7152-NMAdminEnvPOC_POC01.patch, 
> MAPREDUCE-7152-lazyEval_POC01.patch
>
>
> {{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default 
> in Hadoop (as part of {{mapreduce.admin.user.env}} and 
> {{yarn.app.mapreduce.am.user.env}}), and is passed as an environment variable 
> from the AM container to task containers in the container launch context.
> In cases where {{HADOOP_COMMON_HOME}} differs between the AM node and the 
> task node, tasks will fail to load the native library. A reliable way to fix 
> this is to add {{LD_LIBRARY_PATH}} to {{yarn.nodemanager.admin-env}} instead.
> Another approach is to perform a lazy evaluation of {{LD_LIBRARY_PATH}} on 
> the NM side.






[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-10-29 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667178#comment-16667178
 ] 

Jason Lowe commented on MAPREDUCE-7152:
---

Can this be solved by simply changing:
{noformat}
$HADOOP_COMMON_HOME/lib/native
{noformat}
to
{noformat}
{{HADOOP_COMMON_HOME}}/lib/native
{noformat}
so the expansion of HADOOP_COMMON_HOME is not done by the job client but by the 
NM when the container is run?







[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-10-29 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667026#comment-16667026
 ] 

Peter Bacsko commented on MAPREDUCE-7152:
-

Ping [~jlowe] [~haibochen]



