[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180218#comment-14180218 ]

Jian He commented on MAPREDUCE-6126:

Makes sense, +1.

(Rumen) Rumen tool returns error java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type

Key: MAPREDUCE-6126
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Attachments: MAPREDUCE-6126-v2.patch, MAPREDUCE-6126.patch

java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
    at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172)
    at org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305)
    at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180240#comment-14180240 ]

Hudson commented on MAPREDUCE-6126:

FAILURE: Integrated in Hadoop-trunk-Commit #6311 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6311/])
MAPREDUCE-6126. Fixed Rumen JobBuilder to ignore NormalizedResourceEvent. Contributed by Junping Du (jianhe: rev b8f7966c7a0d6aa0c0835fc0c4a4254420ab26a6)
* hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java
* hadoop-mapreduce-project/CHANGES.txt
[jira] [Updated] (MAPREDUCE-6126) (Rumen) Rumen tool returns error java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated MAPREDUCE-6126:

Resolution: Fixed
Fix Version/s: 2.6.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to trunk, branch-2, branch-2.6. Thanks, Junping!
[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180254#comment-14180254 ]

Hudson commented on MAPREDUCE-6126:

FAILURE: Integrated in Hadoop-trunk-Commit #6312 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6312/])
Updated CHANGES.txt to move MAPREDUCE-6126 to 2.6 (jianhe: rev d67214fd69f44391b7b43f7b7d18d8eefc3dd2da)
* hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-5933) Enable MR AM to post history events to the timeline server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180283#comment-14180283 ]

Zhijie Shen commented on MAPREDUCE-5933:

The latest patch looks much better. I have a few minor comments.

1. SUBMIT_TIME is set twice:
{code}
tEvent.addEventInfo(SUBMIT_TIME, jse.getSubmitTime());
tEvent.addEventInfo(QUEUE_NAME, jse.getJobQueueName());
tEvent.addEventInfo(JOB_NAME, jse.getJobName());
tEvent.addEventInfo(USER_NAME, jse.getUserName());
tEvent.addEventInfo(SUBMIT_TIME, jse.getSubmitTime());
{code}

2. Make MAPREDUCE_JOB and MAPREDUCE_TASK constants?

3. Would you please call toString() on the task attempt ID object? Otherwise, it results in a nested structure in the JSON content. The same applies to the other getXXXAttemptId() calls that are not followed by toString().
{code}
tEvent.addEventInfo(SUCCESSFUL_TASK_ATTEMPT_ID, tfe2.getSuccessfulTaskAttemptId());
{code}

4. In addition to setting the related entity, it's better to add the job ID to the primary filter of an MR task entity, so that we can support a common query such as:
{code}
http://localhost:8188/ws/v1/timeline/MAPREDUCE_TASK?primaryFilter=PARENT_JOB:job_1413998833197_0001
{code}

In fact, there could be other optimizations to speed up potential queries. For example, to answer a JHS query such as:
{code}
http://10.22.2.115:19888/jobhistory/attempts/job_1413998833197_0001/m/SUCCESSFUL
{code}
it would also be good to put the task type and the task's final state into the other info field for in-memory filtering, or even into the primary filter field so that the store indexes them (which is much more expensive in store space).

I think we should optimize the store schema for particular queries in a separate Jira, as it does not seem to be a straightforward addition to this patch. Let's focus on posting events in this one.

The patch works properly on an insecure cluster. I will try it on a secure cluster too.

Enable MR AM to post history events to the timeline server

Key: MAPREDUCE-5933
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5933
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: mr-am
Reporter: Zhijie Shen
Assignee: Robert Kanter
Attachments: MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, MAPREDUCE-5933.patch, mr_timelineserver_response.txt

Nowadays, MR AM collects the history events and writes them to HDFS for JHS to source. With the timeline server, MR AM can put these events there.
[jira] [Commented] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180471#comment-14180471 ]

Siqi Li commented on MAPREDUCE-4818:

[~jlowe] Hi Jason, can you take a look at this patch and give me some feedback?

Easier identification of tasks that timeout during localization

Key: MAPREDUCE-4818
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Siqi Li
Labels: usability
Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch

When a task is taking too long to localize and is killed by the AM due to task timeout, the job UI/history is not very helpful. The attempt simply lists a diagnostic stating it was killed due to timeout, but there are no logs for the attempt since it never actually got started. There are log messages on the NM that show the container never made it past localization by the time it was killed, but users often do not have access to those logs.
[jira] [Commented] (MAPREDUCE-5933) Enable MR AM to post history events to the timeline server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180540#comment-14180540 ]

Zhijie Shen commented on MAPREDUCE-5933:

Tried the patch on the cluster with Kerberos HTTP auth and SSL enabled, and it seemed to work fine.
[jira] [Created] (MAPREDUCE-6134) Add entity-level info or primary filters to facilitate job history data query
Zhijie Shen created MAPREDUCE-6134:

Summary: Add entity-level info or primary filters to facilitate job history data query
Key: MAPREDUCE-6134
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6134
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: jobhistoryserver
Reporter: Zhijie Shen

Per the discussion in [MAPREDUCE-5933|https://issues.apache.org/jira/browse/MAPREDUCE-5933?focusedCommentId=14180283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14180283], we may need to add some properties from the events to the otherInfo or primaryFilter of the MR job or task entity to support particular queries. For example, getting all the tasks of one MR job:
{code}
http://localhost:8188/ws/v1/timeline/MAPREDUCE_TASK?primaryFilter=PARENT_JOB:job_1413998833197_0001
{code}
Adding \{PARENT_JOB:job_1413998833197_0001\} to the primary filter of each task entity of job_1413998833197_0001 should significantly shorten the time needed to find the target task entities.
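
The query quoted above can be assembled programmatically. The following is only a sketch: the class and method names (TimelineTaskQuery, tasksOfJob) are hypothetical, and the REST path, the MAPREDUCE_TASK entity type and the PARENT_JOB filter name are taken verbatim from the example URL rather than from any committed schema.

```java
import java.net.URI;

/** Hypothetical helper that rebuilds the example timeline-server query. */
final class TimelineTaskQuery {
    // Both values are copied from the example URL in the issue text;
    // they are assumptions, not constants defined by Hadoop.
    static final String ENTITY_TYPE = "MAPREDUCE_TASK";
    static final String PARENT_JOB_FILTER = "PARENT_JOB";

    /** Returns the URI that asks for all task entities whose primary filter names the given job. */
    static URI tasksOfJob(String timelineBaseUrl, String jobId) {
        return URI.create(timelineBaseUrl + "/ws/v1/timeline/" + ENTITY_TYPE
                + "?primaryFilter=" + PARENT_JOB_FILTER + ":" + jobId);
    }

    public static void main(String[] args) {
        // Base URL and job ID are the ones used in the issue description.
        System.out.println(tasksOfJob("http://localhost:8188", "job_1413998833197_0001"));
    }
}
```

The query can only use the primary filter if each task entity was stored with the PARENT_JOB filter at write time, which is the change this issue proposes.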
[jira] [Commented] (MAPREDUCE-6132) Rumen unable to accept hdfs as scheme
[ https://issues.apache.org/jira/browse/MAPREDUCE-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180746#comment-14180746 ]

Akira AJISAKA commented on MAPREDUCE-6132:

I think this is not a bug. It looks like the hadoop-hdfs jar is missing. It's better to specify the whole Hadoop classpath with {{java -cp `hadoop classpath`}}.

Rumen unable to accept hdfs as scheme

Key: MAPREDUCE-6132
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6132
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tools/rumen
Affects Versions: 2.4.1
Reporter: Mayank Mishra
Priority: Minor

While running:

java -cp hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar:hadoop-2.4.1/share/hadoop/tools/lib/hadoop-rumen-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-cli-1.2.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-configuration-1.6.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-lang-2.6.jar:hadoop-2.4.1/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:hadoop-2.4.1/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:hadoop-2.4.1/share/hadoop/tools/lib/guava-11.0.2.jar:hadoop-2.4.1/share/hadoop/tools/lib/guava-11.0.2.jar:hadoop-2.4.1/share/hadoop/tools/lib/commons-collections-3.2.1.jar:hadoop-2.4.1/share/hadoop/common/lib/hadoop-auth-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/slf4j-api-1.7.5.jar:hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/log4j-1.2.17.jar:hadoop-2.4.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:hadoop-2.4.1/share/hadoop/common/lib/log4j-1.2.17.jar org.apache.hadoop.tools.rumen.TraceBuilder file:///pathto/rumen/jobjars/job-trace.json file:///pathto/rumen/jobjars/topology hdfs://path to jhist file

we are getting:

java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.processInputArgument(TraceBuilder.java:134)
    at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.init(TraceBuilder.java:91)
    at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:206)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186)
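
One way to check the diagnosis in the comment above is to test whether the class behind the hdfs scheme is loadable with the classpath in use. This is a minimal sketch, not part of Rumen: the helper name HdfsClasspathCheck is hypothetical, and it assumes the hdfs scheme is served by org.apache.hadoop.hdfs.DistributedFileSystem from the hadoop-hdfs jar, as in Hadoop 2.x.

```java
/** Hypothetical diagnostic: is the HDFS client class visible on the current classpath? */
final class HdfsClasspathCheck {
    // Assumed implementation class for hdfs:// URIs (shipped in the hadoop-hdfs jar).
    static final String HDFS_IMPL = "org.apache.hadoop.hdfs.DistributedFileSystem";

    /** Returns true if the named class can be found, without running its static initializers. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HdfsClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(HDFS_IMPL)) {
            System.out.println("hadoop-hdfs appears to be on the classpath");
        } else {
            System.out.println("hadoop-hdfs appears to be missing; hdfs:// paths will not resolve");
        }
    }
}
```

Run with the same -cp value as the failing command, this would be expected to report the class as missing, since the listed jars do not include hadoop-hdfs.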
[jira] [Commented] (MAPREDUCE-6052) Support overriding log4j.properties per job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180819#comment-14180819 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-6052:

With your proposal, the user has to (1) set log4j.configuration to the log-file name (a very weird configuration) and then (2) explicitly add the log-file to the distributed cache. I am proposing that we simply have (0) _mapreduce.job.log4j-configuration-file_ set to file:///home/vinodkv/container-log4j.properties#my-short-name, which is then recognized by JobClient, automatically uploaded to HDFS (similar to job.jar) if it is a local file, and also added to the distributed cache.

Support overriding log4j.properties per job

Key: MAPREDUCE-6052
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6052
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Junping Du
Assignee: Junping Du
Attachments: MAPREDUCE-6052-v2.patch, MAPREDUCE-6052.patch

For the current MR application, log4j.configuration is hard-coded to container-log4j.properties on each node. We still need the flexibility to override it per job, as we do in MRv1.
{code}
public static void addLog4jSystemProperties(
    String logLevel, long logSize, int numBackups, List<String> vargs) {
  vargs.add("-Dlog4j.configuration=container-log4j.properties");
{code}
[jira] [Created] (MAPREDUCE-6135) Job staging directory remains if MRAppMaster is OOM
Ming Ma created MAPREDUCE-6135:

Summary: Job staging directory remains if MRAppMaster is OOM
Key: MAPREDUCE-6135
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6135
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ming Ma

If MRAppMaster attempts run out of memory, the job won't go through the normal cleanup process that moves history files to the history server location. When customers try to find out why the job failed, the data won't be available on the history server web UI. The workaround is to extract the container ID and NM ID from the jhist file in the job staging directory, then use the yarn logs command to get the AM logs. It would be great if the platform could take care of this by automatically moving these history files to the history server when AM attempts don't exit properly.

We have discussed some ideas on how to address this and would like to get suggestions from others. We are not sure whether the timeline server design covers this scenario.

1. Define some protocol for YARN to tell the AppMaster "you have exceeded AM max attempts, please clean up". For example, YARN can launch the AppMaster one more time after the AM max attempts, and MRAppMaster uses that as the indication that this is a clean-up-only attempt.
2. Have some program periodically check job statuses and move files from the job staging directory to the history server for finished jobs.
[jira] [Created] (MAPREDUCE-6136) MRAppMaster doesn't shutdown file systems
Noah Watkins created MAPREDUCE-6136:

Summary: MRAppMaster doesn't shutdown file systems
Key: MAPREDUCE-6136
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6136
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Noah Watkins

When MRAppMaster exits, it doesn't call close on its open file system instances. MAPREDUCE-3614 set `conf.setBoolean("fs.automatic.close", false);` in MRAppMaster::main and then called FileSystem.closeAll() in MRAppMasterShutdownHook. However, MAPREDUCE-4205 removed the call to FileSystem.closeAll() from MRAppMasterShutdownHook but left `fs.automatic.close` set to false. Removing `conf.setBoolean("fs.automatic.close", false);` worked for me, but it wasn't clear whether this has other implications.
[jira] [Commented] (MAPREDUCE-6052) Support overriding log4j.properties per job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180935#comment-14180935 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-6052:

My proposal, concretely:
- Add a new config _mapreduce.job.log4j-properties-uri_ or _mapreduce.job.log4j-properties-file_.
- JobClient adds this file to the distributed cache as a class-path file before submission. The 'key' in the distributed cache is the same URI.
-- If _mapreduce.job.log4j-properties-uri_ is a local file-system URI, the file is automatically uploaded to HDFS and then distributed everywhere.
-- If it is an HDFS location, it is simply distributed everywhere via the distributed cache.
- The MR AM reads the config property and, if it is set, appends -Dlog4j.configuration=$(file-name). Here file-name is the path component of the URI, or the URI fragment if present (this is the convention for the distributed cache too).
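
The file-name rule in the proposal above can be sketched with java.net.URI. This is an illustration only, not the patch: the class and method names (Log4jFileName, fileNameFor, log4jVmArg) are hypothetical, and reading "path component" as the last path segment is an assumption based on how distributed-cache link names are usually derived.

```java
import java.net.URI;

/** Hypothetical helper illustrating the proposed file-name rule. */
final class Log4jFileName {
    /** Uses the URI fragment if present, otherwise the last segment of the path (an assumed reading). */
    static String fileNameFor(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("URI has no path: " + uri);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Builds the JVM argument that the proposal says the MR AM would append. */
    static String log4jVmArg(URI uri) {
        return "-Dlog4j.configuration=" + fileNameFor(uri);
    }

    public static void main(String[] args) {
        // Example URI taken from the earlier comment on this issue.
        URI uri = URI.create("file:///home/vinodkv/container-log4j.properties#my-short-name");
        System.out.println(log4jVmArg(uri));
    }
}
```

With the example URI from the earlier comment, the fragment my-short-name becomes the file name; without a fragment, the name falls back to container-log4j.properties.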