[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation
[ https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668188#comment-16668188 ]

Wang Yan commented on MAPREDUCE-7148:
-------------------------------------

[~jlowe] Hi, I have assigned the ticket to you. Could you please help review? Thanks in advance!

> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>         Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch
>
>
> We are running Hive jobs with a DFS quota limitation of 3 TB per job. If a job hits the DFS quota limitation, the task that hit it will fail, and there will be a few task retries before the job actually fails. The retries are not helpful because the job will always fail anyway. In one particularly bad case, a job with a single reduce task wrote more than 3 TB to HDFS over 20 hours; the reduce task exceeded the quota limitation and retried 4 times until the job finally failed, consuming a lot of unnecessary resources. This ticket aims to provide a feature that lets a job fail fast when it writes too much data to the DFS and exceeds the DFS quota limitation. The fast-fail feature is introduced in MAPREDUCE-7022 and MAPREDUCE-6489.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
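[Editor's note] The retry-vs-fast-fail decision described above comes down to classifying a quota failure as unrecoverable. The sketch below is illustrative only, not the patch's actual code: it assumes the failure surfaces HDFS's real quota exception, DSQuotaExceededException, somewhere in the exception chain or its messages, and matches it by name.

```java
/**
 * Illustrative helper (not the MAPREDUCE-7148 patch code): decide whether a
 * task failure is worth retrying. A DFS space-quota violation can never
 * succeed on retry, so it should fail the job fast.
 */
public class QuotaFastFail {

    // HDFS throws org.apache.hadoop.hdfs.protocol.DSQuotaExceededException
    // when a write exceeds the space quota; we match it by name so this
    // sketch compiles without Hadoop on the classpath.
    private static final String QUOTA_MARKER = "DSQuotaExceededException";

    /**
     * Walks the cause chain; a quota violation anywhere in the chain
     * (either in the class name or in a wrapped message) makes the
     * failure fatal rather than retriable.
     */
    public static boolean isUnrecoverable(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String className = cur.getClass().getName();
            String message = cur.getMessage();
            if (className.contains(QUOTA_MARKER)
                    || (message != null && message.contains(QUOTA_MARKER))) {
                return true;
            }
        }
        return false;
    }
}
```

A task runner using such a check would report the attempt as a fatal failure instead of rescheduling it, avoiding the 4 doomed retries described in the issue.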
[jira] [Assigned] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation
[ https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang Yan reassigned MAPREDUCE-7148:
-----------------------------------

    Assignee: Jason Lowe  (was: Wang Yan)

> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
[jira] [Updated] (MAPREDUCE-7027) HadoopArchiveLogs shouldn't delete the original logs if the HAR creation fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated MAPREDUCE-7027:
-------------------------------------
    Fix Version/s: 2.9.2
     Component/s:     (was: mrv2)
                  harchive

Cherry-picked this to branch-2.9.

> HadoopArchiveLogs shouldn't delete the original logs if the HAR creation fails
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7027
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7027
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: harchive
>            Reporter: Gergely Novák
>            Assignee: Gergely Novák
>            Priority: Critical
>             Fix For: 3.1.0, 2.10.0, 2.9.2
>
>         Attachments: MAPREDUCE-7027.001.patch
>
>
> If the hadoop archive command fails for any reason (for example because of an OutOfMemoryError), the HadoopArchiveLogs tool will still delete the original log files, so all the logs will be lost.
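[Editor's note] The bug above boils down to an unconditional delete; the fix direction is to gate deletion of the originals on the archive step reporting success. A minimal sketch of that guard, where the `runArchive` and `deleteOriginals` callbacks are hypothetical stand-ins for the tool's real archiving and cleanup steps:

```java
import java.util.function.BooleanSupplier;

/**
 * Illustrative guard (not the actual MAPREDUCE-7027 patch): never delete the
 * source logs unless the HAR creation step completed successfully.
 */
public class SafeArchive {

    /**
     * Runs the archive step and deletes the originals only on success.
     * Returns true if the logs were archived (and therefore deleted).
     */
    public static boolean archiveThenDelete(BooleanSupplier runArchive,
                                            Runnable deleteOriginals) {
        boolean archived;
        try {
            // e.g. true when the HAR job's exit code is 0
            archived = runArchive.getAsBoolean();
        } catch (RuntimeException | Error e) {
            // Covers failures like the OutOfMemoryError from the report:
            // treat any abrupt termination of the archive step as failure.
            archived = false;
        }
        if (archived) {
            deleteOriginals.run();  // safe: the HAR now exists
        }
        return archived;
    }
}
```

The key property is that a thrown `OutOfMemoryError` (or any other failure) leaves the original logs untouched, which is exactly the behavior the issue asks for.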
[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667258#comment-16667258 ]

Peter Bacsko commented on MAPREDUCE-7152:
-----------------------------------------

[~jlowe] Looking at MRJobConfig.java, the default value of LD_LIBRARY_PATH comes from DEFAULT_MAPRED_ADMIN_USER_ENV, where it is already cross-platformified:
{noformat}
  public static final String DEFAULT_MAPRED_ADMIN_USER_ENV =
      Shell.WINDOWS ?
          "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin" :
          "LD_LIBRARY_PATH=" + Apps.crossPlatformify("HADOOP_COMMON_HOME") + "/lib/native";
{noformat}
I mistakenly thought that the expansion occurs inside the AM. It looks like it is already working properly. I will chat with the other devs and probably close this as Won't Fix.

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7152
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7152-NMAdminEnvPOC_POC01.patch, MAPREDUCE-7152-lazyEval_POC01.patch
>
>
> {{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default in Hadoop (as part of {{mapreduce.admin.user.env}} and {{yarn.app.mapreduce.am.user.env}}), and passed as an environment variable from the AM container to task containers in the container launch context.
> In cases where {{HADOOP_COMMON_HOME}} is different on the AM node and the task node, tasks will fail to load the native library. A reliable way to fix this is to add {{LD_LIBRARY_PATH}} to {{yarn.nodemanager.admin-env}} instead.
> Another approach is to perform a lazy evaluation of {{LD_LIBRARY_PATH}} on the NM side.
[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667178#comment-16667178 ]

Jason Lowe commented on MAPREDUCE-7152:
---------------------------------------

Can this be solved by simply changing:
{noformat}
$HADOOP_COMMON_HOME/lib/native
{noformat}
to
{noformat}
{{HADOOP_COMMON_HOME}}/lib/native
{noformat}
so the expansion of HADOOP_COMMON_HOME is done not by the job client but by the NM when the container is run?

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7152
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
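[Editor's note] The difference between the two forms is *when* expansion happens: `$HADOOP_COMMON_HOME` is expanded once on the submitting side, while the `{{HADOOP_COMMON_HOME}}` placeholder survives into the container launch context and is substituted with the launching node's own value. The sketch below is a simplified, illustrative version of that NM-side substitution; the real expansion logic lives in YARN's container launch code, not in this class.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative NM-side expansion of {{VAR}} placeholders (simplified;
 * not YARN's actual implementation). Because expansion is deferred to
 * the node running the container, each node substitutes its *own*
 * HADOOP_COMMON_HOME, which is what makes heterogeneous install paths work.
 */
public class LazyEnvExpansion {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    /** Replaces each {{VAR}} with the value from this node's environment map. */
    public static String expand(String value, Map<String, String> nodeEnv) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved = nodeEnv.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the `$VAR` form, the job client bakes in its own path before the value ever reaches the task node; with the placeholder form, a task node whose Hadoop lives under a different prefix still resolves the correct `lib/native` directory.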
[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667026#comment-16667026 ]

Peter Bacsko commented on MAPREDUCE-7152:
-----------------------------------------

Ping [~jlowe] [~haibochen]

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7152
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152