[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5403: Labels: BB2015-05-TBR (was: ) MR changes to accommodate yarn.application.classpath being moved to the server-side --- Key: MAPREDUCE-5403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, MAPREDUCE-5403.patch yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94
[ https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3807: Labels: BB2015-05-TBR newbie (was: newbie) JobTracker needs fix similar to HDFS-94 --- Key: MAPREDUCE-3807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.0 Reporter: Harsh J Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-3807.patch The 1.0 JobTracker's jobtracker.jsp page currently shows: {code} <h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2> {code} It could use the same improvement as HDFS-94 to reflect live heap usage more accurately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
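The distinction HDFS-94 draws can be sketched as follows. This is an illustrative example, not the actual patch: Runtime.totalMemory() is only the committed heap, so a live-usage display subtracts freeMemory() from it.

```java
// Sketch of HDFS-94-style heap reporting: show *used* heap, not just the
// committed total, so the summary tracks live usage.
public class HeapSummary {
    // Live heap in bytes; a JSP would format this with StringUtils.byteDesc(...).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory(); // used = committed - free
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Heap: " + usedHeapBytes() + " used / "
                + rt.totalMemory() + " committed / " + rt.maxMemory() + " max");
    }
}
```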
[jira] [Updated] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5188: Labels: BB2015-05-TBR contrib/raid (was: contrib/raid) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java --- Key: MAPREDUCE-5188 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 2.0.2-alpha Reporter: junjin Assignee: junjin Priority: Critical Labels: BB2015-05-TBR, contrib/raid Fix For: 2.0.2-alpha Attachments: MAPREDUCE-5188.patch There is an error when verifying the FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java: xorParityLength on line 379 needs to be changed to rsParityLength, since that check verifies the RS_SOURCE type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5365: Labels: BB2015-05-TBR (was: ) Set mapreduce.job.classloader to true by default Key: MAPREDUCE-5365 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5365.patch MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a custom classloader to separate system classes from user classes. It seems like there are only rare cases when a user would not want this on, and it should be enabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4346: Labels: BB2015-05-TBR (was: ) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient -- Key: MAPREDUCE-4346 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Labels: BB2015-05-TBR Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch The current implementation for JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents an unneeded overhead especially in the case of clients only interested in jobs in specific state(s). It is beneficial to include a refined version where only jobs having specific statuses are returned and retired jobs are optional to include. I'll be uploading an initial patch momentarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
[ https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4330: Labels: BB2015-05-TBR (was: ) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful --- Key: MAPREDUCE-4330 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Labels: BB2015-05-TBR Attachments: MAPREDUCE-4330-20130415.1.patch, MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, MAPREDUCE-4330-21032013.patch The previously completed attempt is removed from successAttemptCompletionEventNoMap and marked OBSOLETE. After that, if the newly completed attempt is successful then it is added to the successAttemptCompletionEventNoMap. This seems wrong because the newly completed attempt could be failed and thus there is no need to invalidate the successful attempt. One error case would be when a speculative attempt completes with killed/failed after the successful version has completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent
[ https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4273: Labels: BB2015-05-TBR (was: ) Make CombineFileInputFormat split result JDK independent Key: MAPREDUCE-4273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Labels: BB2015-05-TBR Attachments: MAPREDUCE-4273-branch1-v2.patch, mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, mapreduce-4273.patch The split result of CombineFileInputFormat depends on the iteration order of the nodeToBlocks and rackToBlocks hash maps, which makes the result dependent on the HashMap implementation and hence on the JDK. This manifests as TestCombineFileInputFormat failures on alternative JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
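The underlying hazard can be shown in a few lines. This is an illustrative sketch, not the actual patch: HashMap's iteration order is an implementation detail that has changed between JDKs, while iterating a sorted view is well-defined everywhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: any split computed by iterating nodeToBlocks/rackToBlocks directly
// can differ across JVMs; iterating a sorted copy pins the traversal order.
public class DeterministicIteration {
    static List<String> iterationOrder(Map<String, Integer> m) {
        // TreeMap iterates keys in natural (sorted) order on every JDK.
        return new ArrayList<>(new TreeMap<>(m).keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeToBlocks = new HashMap<>();
        nodeToBlocks.put("rack2/nodeB", 3);
        nodeToBlocks.put("rack1/nodeA", 5);
        nodeToBlocks.put("rack1/nodeC", 2);
        System.out.println(iterationOrder(nodeToBlocks)); // sorted, JDK-independent
    }
}
```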
[jira] [Updated] (MAPREDUCE-5377) JobID is not displayed truly by hadoop job -history command
[ https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5377: Labels: BB2015-05-TBR newbie (was: newbie) JobID is not displayed truly by hadoop job -history command - Key: MAPREDUCE-5377 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-5377.patch The JobID output by the hadoop job -history command is a wrong string. {quote} [hadoop@hadoop hadoop]$ hadoop job -history terasort Hadoop job: 0001_1374260789919_hadoop = Job tracker host name: job job tracker start time: Tue May 18 15:39:51 PDT 1976 User: hadoop JobName: TeraSort JobConf: hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml Submitted At: 19-7-2013 12:06:29 Launched At: 19-7-2013 12:06:30 (0sec) Finished At: 19-7-2013 12:06:44 (14sec) Status: SUCCESS {quote} In this example, it should show job_201307191206_0001 after Hadoop job:, but it shows 0001_1374260789919_hadoop. In addition, the job tracker host name and job tracker start time are invalid. This problem can be solved by fixing the setting of jobId in HistoryViewer(). In addition, the JobTracker information in HistoryViewer should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5150: Labels: BB2015-05-TBR (was: ) Backport 2009 terasort (MAPREDUCE-639) to branch-1 -- Key: MAPREDUCE-5150 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Affects Versions: 1.2.0 Reporter: Gera Shegalov Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5150-branch-1.patch Users evaluate the performance of Hadoop clusters using benchmarks such as TeraSort. However, the terasort version in branch-1 is outdated. It works on a teragen dataset that cannot exceed 4 billion unique keys, and it does not have the fast non-sampling partitioner SimplePartitioner either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3936: Labels: BB2015-05-TBR (was: ) Clients should not enforce counter limits -- Key: MAPREDUCE-3936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6251: Labels: BB2015-05-TBR (was: ) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Labels: BB2015-05-TBR Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job is communicated from the application master to the job history server via a distributed file system: the history is uploaded by the application master to the DFS and then scanned/loaded by the job history server. While HDFS has strong consistency guarantees, not all Hadoop DFSs do. When used in conjunction with a distributed file system that does not have this guarantee, there will be cases where the history server does not see an uploaded file, resulting in the dreaded no such job and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
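The "higher-level retry" idea can be sketched generically. This is a hedged sketch with hypothetical names, not the real JobClient API: a lookup that may transiently return null on a not-immediately-consistent file system is retried a bounded number of times with a pause between attempts.

```java
import java.util.function.Supplier;

// Sketch: retry a lookup (e.g. a history-file scan) until it becomes visible
// or the attempt budget is exhausted.
public class EventuallyConsistentLookup {
    static <T> T retryUntilVisible(Supplier<T> lookup, int maxAttempts, long sleepMillis) {
        T result = null;
        for (int attempt = 0; attempt < maxAttempts && result == null; attempt++) {
            result = lookup.get();                  // transiently null on a lagging DFS
            if (result == null && attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis);      // back off before re-scanning
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // give up early if interrupted
                    break;
                }
            }
        }
        return result;                              // may still be null after maxAttempts
    }
}
```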
[jira] [Updated] (MAPREDUCE-5819) Binary token merge should be done once in TokenCache#obtainTokensForNamenodesInternal()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5819: Labels: BB2015-05-TBR (was: ) Binary token merge should be done once in TokenCache#obtainTokensForNamenodesInternal() --- Key: MAPREDUCE-5819 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5819 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Labels: BB2015-05-TBR Attachments: mapreduce-5819-v1.txt Currently mergeBinaryTokens() is called by every invocation of obtainTokensForNamenodesInternal(FileSystem, Credentials, Configuration) in the loop of obtainTokensForNamenodesInternal(Credentials, Path[], Configuration). This can be simplified so that mergeBinaryTokens() is called only once in obtainTokensForNamenodesInternal(Credentials, Path[], Configuration). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
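The shape of the change is a classic loop-invariant hoist. A minimal sketch with stand-in names (not the real TokenCache internals): work that does not depend on the loop variable, like mergeBinaryTokens(), runs once before the loop instead of once per path.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: hoist an iteration-invariant call out of the per-path loop.
public class HoistInvariantWork {
    static int mergeCalls = 0;

    static void mergeBinaryTokens() { mergeCalls++; }   // stand-in for the real merge

    static void obtainTokensForPaths(List<String> paths) {
        mergeBinaryTokens();            // hoisted: called exactly once
        for (String p : paths) {
            // per-path token acquisition would go here; no merge needed per path
        }
    }

    public static void main(String[] args) {
        obtainTokensForPaths(Arrays.asList("/a", "/b", "/c"));
        System.out.println("merge calls: " + mergeCalls); // 1, regardless of path count
    }
}
```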
[jira] [Updated] (MAPREDUCE-2340) optimize JobInProgress.initTasks()
[ https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2340: Labels: BB2015-05-TBR critical-0.22.0 (was: critical-0.22.0) optimize JobInProgress.initTasks() -- Key: MAPREDUCE-2340 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.20.1, 0.21.0 Reporter: Kang Xiao Labels: BB2015-05-TBR, critical-0.22.0 Attachments: MAPREDUCE-2340.patch, MAPREDUCE-2340.patch, MAPREDUCE-2340.r1.diff JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() and JobInProgress.createCache() significantly. A test with 1 job of 10 maps on a 2400-node cluster shows nearly 10x and 50x speedups for initTasks() and createCache() respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5258) Memory Leak while using LocalJobRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5258: Labels: BB2015-05-TBR patch (was: patch) Memory Leak while using LocalJobRunner -- Key: MAPREDUCE-5258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5258 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.2 Reporter: Subroto Sanyal Assignee: skrho Labels: BB2015-05-TBR, patch Fix For: 1.1.3 Attachments: mapreduce-5258 _001.txt, mapreduce-5258.txt Every time a LocalJobRunner is launched, it creates a JobTrackerInstrumentation and QueueMetrics. While creating this MetricsSystem, it registers and adds a callback to an ArrayList which keeps growing, since the DefaultMetricsSystem is a singleton. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
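The leak pattern is easy to reproduce in miniature. This is an illustrative sketch with hypothetical names, not the actual DefaultMetricsSystem code: a process-wide singleton that only ever appends callbacks grows without bound when short-lived runners register repeatedly and never deregister.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: add-only registration on singleton state leaks; pairing each
// register with an unregister bounds the list.
public class CallbackLeak {
    static final List<Runnable> CALLBACKS = new ArrayList<>(); // singleton state

    static void register(Runnable cb) { CALLBACKS.add(cb); }       // leaky: add-only

    static void unregister(Runnable cb) { CALLBACKS.remove(cb); }  // the fix

    public static void main(String[] args) {
        for (int run = 0; run < 1000; run++) {      // 1000 LocalJobRunner-style launches
            Runnable cb = () -> { };
            register(cb);
            // ... job runs ...
            unregister(cb); // without this line, CALLBACKS retains 1000 entries
        }
        System.out.println("callbacks retained: " + CALLBACKS.size());
    }
}
```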
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6350: Labels: BB2015-05-TBR (was: ) JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Labels: BB2015-05-TBR Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch The job history server only outputs the first 50 characters of job names in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6284) Add a 'task attempt state' to MapReduce Application Master REST API
[ https://issues.apache.org/jira/browse/MAPREDUCE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6284: Labels: BB2015-05-TBR (was: ) Add a 'task attempt state' to MapReduce Application Master REST API --- Key: MAPREDUCE-6284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6284 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6284.1.patch, MAPREDUCE-6284.1.patch, MAPREDUCE-6284.2.patch, MAPREDUCE-6284.3.patch, MAPREDUCE-6284.3.patch It would be useful to have a 'task attempt state' resource, similar to the existing 'App state' REST API: GET http://<proxy http address:port>/proxy/<application_id>/ws/v1/mapreduce/jobs/job_id/tasks/task_id/attempts/attempt_id/state PUT http://<proxy http address:port>/proxy/<application_id>/ws/v1/mapreduce/jobs/job_id/tasks/task_id/attempts/attempt_id/state -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6338) MR AppMaster does not honor ephemeral port range
[ https://issues.apache.org/jira/browse/MAPREDUCE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6338: Labels: BB2015-05-TBR (was: ) MR AppMaster does not honor ephemeral port range Key: MAPREDUCE-6338 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6338 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.6.0 Reporter: Frank Nguyen Assignee: Frank Nguyen Labels: BB2015-05-TBR Attachments: MAPREDUCE-6338.002.patch The MR AppMaster should only use port ranges defined in the yarn.app.mapreduce.am.job.client.port-range property. On initial startup of the MRAppMaster, it does use the port range defined in the property. However, it also opens up a listener on a random ephemeral port. This is not the Jetty listener. It is another listener opened by the MRAppMaster via another thread and is recognized by the RM. Other nodes will try to communicate with it via that random port. With firewall settings on, the MR job will fail because the random port is not open. This problem has forced others to open all OS ephemeral ports in order to run MR jobs. This is related to MAPREDUCE-4079. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6332) Add more required API's to MergeManager interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6332: Labels: BB2015-05-TBR (was: ) Add more required API's to MergeManager interface -- Key: MAPREDUCE-6332 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6332 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.5.0, 2.6.0, 2.7.0 Reporter: Rohith Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-MAPREDUCE-6332.patch, 0002-MAPREDUCE-6332.patch MR gives the user the ability to plug in a custom ShuffleConsumerPlugin using *mapreduce.job.reduce.shuffle.consumer.plugin.class*. A user who supplies such a plugin is often also interested in implementing their own MergeManagerImpl, but is currently forced to use the MR-provided MergeManagerImpl when using the shuffle consumer plugin class. There should be well-defined APIs in MergeManager that any implementation can use, so that a custom implementation does not require much effort from the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5733: Labels: BB2015-05-TBR (was: ) Define and use a constant for property textinputformat.record.delimiter - Key: MAPREDUCE-5733 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Gelesh Assignee: Gelesh Priority: Trivial Labels: BB2015-05-TBR Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch Original Estimate: 10m Remaining Estimate: 10m (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) is prone to typos. Let's have the property name as a static String constant in some class, to minimise such errors. This would also let IDEs like Eclipse suggest the String. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
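The proposal amounts to a one-line constant. A minimal sketch, with a hypothetical class name (not necessarily what the patch chose): the property key lives in one place, so callers cannot misspell it and IDEs can complete it.

```java
// Sketch: a single definition of the property key for callers to reference.
public class TextInputFormatConstants {
    public static final String RECORD_DELIMITER = "textinputformat.record.delimiter";

    public static void main(String[] args) {
        // Callers would write conf.set(TextInputFormatConstants.RECORD_DELIMITER, myDelimiter)
        // instead of retyping the raw string.
        System.out.println(RECORD_DELIMITER);
    }
}
```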
[jira] [Updated] (MAPREDUCE-5203) Make AM of M/R Use NMClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5203: Labels: BB2015-05-TBR (was: ) Make AM of M/R Use NMClient --- Key: MAPREDUCE-5203 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5203 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: BB2015-05-TBR Attachments: MAPREDUCE-5203.1.patch, MAPREDUCE-5203.2.patch, MAPREDUCE-5203.3.patch, MAPREDUCE-5203.4.patch, MAPREDUCE-5203.5.patch YARN-422 adds NMClient. AM of mapreduce should use it instead of using the raw ContainerManager proxy directly. ContainerLauncherImpl needs to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2632: Labels: BB2015-05-TBR (was: ) Avoid calling the partitioner when the numReduceTasks is 1. --- Key: MAPREDUCE-2632 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.23.0 Reporter: Ravi Teja Ch N V Assignee: Ravi Teja Ch N V Labels: BB2015-05-TBR Attachments: MAPREDUCE-2632-1.patch, MAPREDUCE-2632.patch We can avoid the call to the partitioner when the number of reducers is 1. This will avoid unnecessary computation by the partitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
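The shortcut can be sketched as follows. This is a hypothetical shape, not the actual MapTask code: with a single reducer, every record belongs to partition 0, so the partitioner call can be skipped entirely.

```java
// Sketch: bypass the partitioner when there is only one reduce task.
public class SinglePartitionShortcut {
    interface Partitioner<K> { int getPartition(K key, int numPartitions); }

    static <K> int partitionFor(K key, int numReduceTasks, Partitioner<K> partitioner) {
        if (numReduceTasks == 1) {
            return 0;                                  // only one possible partition
        }
        return partitioner.getPartition(key, numReduceTasks);
    }

    public static void main(String[] args) {
        Partitioner<String> hashPartitioner =
                (k, n) -> (k.hashCode() & Integer.MAX_VALUE) % n;
        System.out.println(partitionFor("word", 1, hashPartitioner)); // partitioner not invoked
        System.out.println(partitionFor("word", 4, hashPartitioner));
    }
}
```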
[jira] [Updated] (MAPREDUCE-5374) CombineFileRecordReader does not set map.input.* configuration parameters for first file read
[ https://issues.apache.org/jira/browse/MAPREDUCE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5374: Labels: BB2015-05-TBR (was: ) CombineFileRecordReader does not set map.input.* configuration parameters for first file read --- Key: MAPREDUCE-5374 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5374 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.0 Reporter: Dave Beech Assignee: Dave Beech Labels: BB2015-05-TBR Attachments: MAPREDUCE-5374.patch, MAPREDUCE-5374.patch The CombineFileRecordReader operates on splits consisting of multiple files. Each time a new record reader is initialised for a chunk, certain parameters are supposed to be set on the configuration object (map.input.file, map.input.start and map.input.length) However, the first reader is initialised in a different way to subsequent ones (i.e. initialize is called by the MapTask directly rather than from inside the record reader class). Because of this, these config parameters are not set properly and are returned as null when you access them from inside a mapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG
[ https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5981: Labels: BB2015-05-TBR (was: ) Log levels of certain MR logs can be changed to DEBUG - Key: MAPREDUCE-5981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Labels: BB2015-05-TBR Attachments: MAPREDUCE-5981.patch The following MapReduce logs can be changed to DEBUG log level. 1. In org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost (Fetcher.java:313), the second log is not required at INFO level. It can be moved to DEBUG, as a WARN log is printed anyway if verifyReply fails. SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey); LOG.info("for url="+msgToEncode+" sent hash and received reply"); 2. Thread-related info need not be printed at INFO level. The two logs below can be moved to DEBUG: a) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost (ShuffleSchedulerImpl.java:381): LOG.info("Assigning " + host + " with " + host.getNumKnownMapOutputs() + " to " + Thread.currentThread().getName()); b) In org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.getMapsForHost (ShuffleSchedulerImpl.java:411): LOG.info("assigned " + includedMaps + " of " + totalSize + " to " + host + " to " + Thread.currentThread().getName()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5362: Labels: BB2015-05-TBR (was: ) clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Labels: BB2015-05-TBR Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6020: Labels: BB2015-05-TBR (was: ) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress - Key: MAPREDUCE-6020 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.10 Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6020.branch1.patch Too many threads block on the global JobTracker lock in getJobCounters. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all threads that want to get counters from a JobInProgress. It is better to release the JobTracker lock before getting the counters from the JobInProgress (job.getCounters(counters)), so that all threads can access their own job's counters in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
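The locking change can be sketched generically. This is a hedged sketch with hypothetical names, not JobTracker's real fields: hold the global lock only long enough to look up the job, then do the expensive per-job work outside it under the job's own lock.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: narrow the global critical section to the lookup; compute under a
// per-job lock so concurrent clients of different jobs do not serialize.
public class NarrowLockScope {
    static final Object GLOBAL_LOCK = new Object();
    static final Map<String, int[]> JOBS = new HashMap<>(); // jobId -> counters

    static int sumCounters(String jobId) {
        int[] counters;
        synchronized (GLOBAL_LOCK) {       // short critical section: lookup only
            counters = JOBS.get(jobId);
        }
        if (counters == null) return 0;    // unknown job
        int sum = 0;
        synchronized (counters) {          // per-job lock, not the global one
            for (int c : counters) sum += c;   // "expensive" work happens here
        }
        return sum;
    }
}
```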
[jira] [Updated] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5889: Labels: BB2015-05-TBR newbie (was: newbie) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String) --- Key: MAPREDUCE-5889 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, MAPREDUCE-5889.patch {{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail to parse commaSeparatedPaths if a comma is included in the file path. (e.g. Path: {{/path/file,with,comma}}) We should deprecate these methods and document to use {{setInputPaths(Job job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
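A quick illustration of why the String overloads fail: a comma-separated list cannot represent a path that itself contains commas, so a String-based API necessarily splits one path into several. The helper name below is hypothetical.

```java
// Sketch: what any comma-separated String API must do to the input,
// and why a path containing commas gets mangled.
public class CommaPathSplit {
    static String[] parseCommaSeparated(String commaSeparatedPaths) {
        return commaSeparatedPaths.split(","); // the only parse available to a String API
    }

    public static void main(String[] args) {
        String path = "/path/file,with,comma";        // one real file
        String[] parsed = parseCommaSeparated(path);
        System.out.println(parsed.length + " paths"); // the single path became several
        // The Path... overloads avoid this entirely: each Path is passed whole.
    }
}
```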
[jira] [Updated] (MAPREDUCE-5929) YARNRunner.java, path for jobJarPath not set correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5929: Labels: BB2015-05-TBR newbie patch (was: newbie patch) YARNRunner.java, path for jobJarPath not set correctly -- Key: MAPREDUCE-5929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5929 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chao Tian Assignee: Rahul Palamuttam Labels: BB2015-05-TBR, newbie, patch Attachments: MAPREDUCE-5929.patch In YARNRunner.java, line 357, Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR)); This causes the job.jar file to miss the scheme, host and port number on distributed file systems other than HDFS. If we compare line 357 with line 344, where job.xml is actually set as Path jobConfPath = new Path(jobSubmitDir,MRJobConfig.JOB_CONF_FILE); it appears jobSubmitDir is missing on line 357, which causes this problem. In HDFS, the additional qualify process will correct this problem, but not on other generic distributed file systems. The proposed change is to replace line 357 with Path jobJarPath = new Path(jobConf.get(jobSubmitDir,MRJobConfig.JAR)); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6038: Labels: BB2015-05-TBR (was: ) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial --- Key: MAPREDUCE-6038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038 Project: Hadoop Map/Reduce Issue Type: Bug Environment: java version 1.8.0_11 hostspot 64-bit Reporter: Pei Ma Assignee: Tsuyoshi Ozawa Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6038.1.patch As a beginner, when I learned the basics of MR, I found that I couldn't run WordCount2 using the command bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output from the Tutorial. The VM threw a NullPointerException at line 47. On line 45, the default value returned by conf.getBoolean is true. That is to say, when wordcount.skip.patterns is not set, WordCount2 still continues to execute getCacheFiles, and patternsURIs gets a null value. When the -skip option doesn't exist, wordcount.skip.patterns is not set, and a NullPointerException comes out. In short, the block after the if-statement on line 45 shouldn't be executed when the -skip option doesn't exist in the command. Maybe line 45 should read if (conf.getBoolean("wordcount.skip.patterns", false)) { — just change the boolean default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
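The effect of the default value can be modeled in miniature. A Properties map stands in for Hadoop's Configuration here (so the names are illustrative, not the tutorial's actual code): with a default of true, the skip-patterns branch runs even when the option was never set, and the follow-up cache lookup then returns null.

```java
import java.util.Properties;

// Sketch: why the default for an opt-in flag must be false.
public class DefaultBooleanGuard {
    static boolean shouldSkip(Properties conf, boolean defaultValue) {
        String v = conf.getProperty("wordcount.skip.patterns");
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();          // -skip was NOT given
        System.out.println(shouldSkip(conf, true));  // branch runs anyway -> NPE later
        System.out.println(shouldSkip(conf, false)); // branch safely skipped
    }
}
```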
[jira] [Updated] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5817: Labels: BB2015-05-TBR (was: ) mappers get rescheduled on node transition even after all reducers are completed Key: MAPREDUCE-5817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: mapreduce-5817.patch We're seeing a behavior where a job keeps running long after all reducers have finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it reschedules all mappers that already ran on the node, in all cases. Therefore, any node transition has the potential to extend the job's runtime. Once this window opens, another node transition can prolong it, and in theory this can happen indefinitely. If there is some instability in the node pool (unhealthy nodes, etc.) for a period, any big job is severely vulnerable to this problem. If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks: the mapper outputs are no longer needed, and rescheduled mappers would produce output that is never consumed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
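The proposed guard can be sketched as a pure predicate (hypothetical names; the real JobImpl.actOnUnusableNode would consult its own reducer-completion state):

```java
public class NodeLossPolicy {
    // Sketch of the guard proposed in the issue: once every reducer has
    // finished, map outputs can no longer be fetched by anyone, so there
    // is no reason to re-run completed mappers from a lost node.
    static boolean shouldRescheduleMappers(int completedReducers, int totalReducers) {
        return completedReducers < totalReducers;
    }
}
```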
[jira] [Updated] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5490: Labels: BB2015-05-TBR (was: ) MapReduce doesn't set the environment variable for children processes - Key: MAPREDUCE-5490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: BB2015-05-TBR Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5499) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists
[ https://issues.apache.org/jira/browse/MAPREDUCE-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5499: Labels: BB2015-05-TBR (was: ) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists --- Key: MAPREDUCE-5499 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5499 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Xuan Gong Labels: BB2015-05-TBR Attachments: MAPREDUCE-5499.1.patch, MAPREDUCE-5499.2.patch Similar to YARN-609. There are the following *PBImpls which need to be fixed: 1. GetDiagnosticsResponsePBImpl 2. GetTaskAttemptCompletionEventsResponsePBImpl 3. GetTaskReportsResponsePBImpl 4. CounterGroupPBImpl 5. JobReportPBImpl 6. TaskReportPBImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5392: Labels: BB2015-05-TBR (was: ) mapred job -history all command throws IndexOutOfBoundsException -- Key: MAPREDUCE-5392 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Labels: BB2015-05-TBR Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch When I use the all option of the mapred job -history command, the following exception is thrown and the command does not work. {code} Exception in thread main java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1875) at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117) at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233) {code} This is because the node name recorded in the history file is not prefixed with tracker_. This change therefore modifies the code so the history file can be read even when the node name has no tracker_ prefix. 
In addition, it fixes the URL of the displayed task log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
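The substring failure above comes from assuming every node name carries the tracker_ prefix. A defensive sketch of the conversion (a hypothetical helper mirroring the role of HostUtil.convertTrackerNameToHostName, not the actual patch):

```java
public class TrackerNames {
    // The original logic assumed names like "tracker_host:port" and
    // blindly stripped the prefix and the ":port" suffix; a name without
    // the "tracker_" prefix makes the substring bounds negative.
    public static String toHostName(String trackerName) {
        String name = trackerName.startsWith("tracker_")
                ? trackerName.substring("tracker_".length())
                : trackerName;                 // tolerate a missing prefix
        int colon = name.indexOf(':');
        return colon >= 0 ? name.substring(0, colon) : name;
    }
}
```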
[jira] [Updated] (MAPREDUCE-4065) Add .proto files to built tarball
[ https://issues.apache.org/jira/browse/MAPREDUCE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4065: Labels: BB2015-05-TBR (was: ) Add .proto files to built tarball - Key: MAPREDUCE-4065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.23.2, 2.4.0 Reporter: Ralph H Castain Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-4065.1.patch Please add the .proto files to the built tarball so that users can build 3rd party tools that use protocol buffers without having to do an svn checkout of the source code. Sorry I don't know more about Maven, or I would provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh
[ https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6030: Labels: BB2015-05-TBR (was: ) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh Key: MAPREDUCE-6030 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Youngjoon Kim Assignee: Youngjoon Kim Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6030.patch In mr-jobhistory-daemon.sh, some env variables are exported before sourcing mapred-env.sh, so these variables don't use values defined in mapred-env.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6040: Labels: BB2015-05-TBR (was: ) distcp should automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Labels: BB2015-05-TBR Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the source and target pathnames is already /.reserved/raw. The -disablereservedraw flag can be used to disable this option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6068) Illegal progress value warnings in map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6068: Labels: BB2015-05-TBR (was: ) Illegal progress value warnings in map tasks Key: MAPREDUCE-6068 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6068 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, task Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6068.002.patch, MAPREDUCE-6068.v1.patch When running a terasort on latest trunk, I see the following in my task logs: {code} 2014-09-02 17:42:28,437 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1 2014-09-02 17:42:42,241 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output {code} We should eliminate these warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6315: Labels: BB2015-05-TBR (was: ) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory Key: MAPREDUCE-6315 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mr-am Affects Versions: 2.7.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Critical Labels: BB2015-05-TBR Attachments: MAPREDUCE-6315.001.patch When all AM attempts crash, there is no record of them in JHS. Thus no easy way to get the logs. This JIRA automates the procedure by utilizing the jhist file in the staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6246: Labels: BB2015-05-TBR DB2 mapreduce (was: DB2 mapreduce) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 - Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Labels: BB2015-05-TBR, DB2, mapreduce Attachments: MAPREDUCE-6246.002.patch, MAPREDUCE-6246.patch Original Estimate: 24h Remaining Estimate: 24h DBOutputFormat is used for writing the output of MapReduce jobs to a database; when used with DB2 JDBC drivers it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) The DBOutputFormat class has a constructQuery method that generates the INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 statement terminator, but this feature is disabled (OFF) by default in IBM DB2, although it can be turned ON with -t (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON would make them error-prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
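A sketch of a constructQuery variant that simply omits the trailing terminator (the signature loosely mirrors DBOutputFormat.constructQuery; it is illustrative, not the attached patch):

```java
public class InsertQueryBuilder {
    // Builds the parameterized INSERT statement without the trailing
    // semicolon, which IBM DB2 rejects unless statement termination is
    // explicitly enabled with -t.
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        query.append(" (").append(String.join(",", fieldNames)).append(")");
        query.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(i == 0 ? "?" : ",?"); // one placeholder per column
        }
        query.append(")"); // no ';' terminator appended here
        return query.toString();
    }
}
```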
[jira] [Updated] (MAPREDUCE-6316) Task Attempt List entries should link to the task overview
[ https://issues.apache.org/jira/browse/MAPREDUCE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6316: Labels: BB2015-05-TBR (was: ) Task Attempt List entries should link to the task overview -- Key: MAPREDUCE-6316 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6316 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: AM attempt page.png, AM task page.png, All Attempts page.png, MAPREDUCE-6316.v1.patch, MAPREDUCE-6316.v2.patch, MAPREDUCE-6316.v3.patch, Task Overview page.png A typical workflow is to click on the list of failed attempts, then look at the counters, or at the list of attempts of just one task in general. If the task-id portion of each task attempt id linked back to the task, we would not have to go through the list of tasks searching for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5465: Labels: BB2015-05-TBR (was: ) Container killed before hprof dumps profile.out --- Key: MAPREDUCE-5465 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Reporter: Radim Kolar Assignee: Ming Ma Labels: BB2015-05-TBR Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, MAPREDUCE-5465.patch If profiling is enabled for a mapper or reducer, hprof dumps profile.out at process exit. It is dumped after the task has signaled to the AM that its work is finished, and the AM kills the container without waiting for hprof to finish its dumps. If hprof is producing larger output (such as with depth=4, while depth=3 works), it cannot finish the dump in time before being killed, making the entire dump unusable because the CPU and heap stats are missing. There needs to be a better delay before the container is killed when profiling is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6305: Labels: BB2015-05-TBR (was: ) AM/Task log page should be able to link back to the job --- Key: MAPREDUCE-6305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: MAPREDUCE-6305.v1.patch, MAPREDUCE-6305.v2.patch, MAPREDUCE-6305.v3.patch, MAPREDUCE-6305.v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6241) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC
[ https://issues.apache.org/jira/browse/MAPREDUCE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6241: Labels: BB2015-05-TBR features (was: features) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC Key: MAPREDUCE-6241 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6241 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.6.0 Environment: Debian/Jessie, kernel 3.18.5, ppc64 GNU/Linux gcc (Debian 4.9.1-19) protobuf 2.6.1 OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-2) OpenJDK Zero VM (build 24.65-b04, interpreted mode) source was cloned (and updated) from Apache-Hadoop's git repository Reporter: Stephan Drescher Assignee: Binglin Chang Priority: Minor Labels: BB2015-05-TBR, features Attachments: MAPREDUCE-6241.001.patch, MAPREDUCE-6241.002.patch Issue when using assembler code for performance optimization on the powerpc platform (compiled for 32bit) mvn compile -Pnative -DskipTests [exec] /usr/bin/c++ -Dnativetask_EXPORTS -m32 -DSIMPLE_MEMCPY -fno-strict-aliasing -Wall -Wno-sign-compare -g -O2 -DNDEBUG -fPIC -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/javah -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test 
-I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native -I/home/hadoop/Java/java7/include -I/home/hadoop/Java/java7/include/linux -isystem /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/gtest/include -o CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o -c /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc [exec] CMakeFiles/nativetask.dir/build.make:744: recipe for target 'CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o' failed [exec] make[2]: Leaving directory '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' [exec] CMakeFiles/Makefile2:95: recipe for target 'CMakeFiles/nativetask.dir/all' failed [exec] make[1]: Leaving directory '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' [exec] Makefile:76: recipe for target 'all' failed [exec] /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc: In function ‘void NativeTask::init_cpu_support_flag()’: /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc:611:14: error: impossible register constraint in ‘asm’ -- popl %%ebx : =a (eax), [ebx] =r(ebx), =c(ecx), =d(edx) : a (eax_in) : cc); -- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6336: Labels: BB2015-05-TBR (was: ) Enable v2 FileOutputCommitter by default Key: MAPREDUCE-6336 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.7.0 Reporter: Gera Shegalov Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: MAPREDUCE-6336.v1.patch This JIRA is to propose making new FileOutputCommitter behavior from MAPREDUCE-4815 enabled by default in trunk, and potentially in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6269) improve JobConf to add option to not share Credentials between jobs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6269: Labels: BB2015-05-TBR (was: ) improve JobConf to add option to not share Credentials between jobs. Key: MAPREDUCE-6269 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6269 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6269.000.patch Improve JobConf by adding a constructor that avoids sharing Credentials between jobs. By default the Credentials will be shared to keep backward compatibility. We can add a new constructor with a new parameter to decide whether to share Credentials. Some issues reported in Cascading are due to corrupted credentials, at https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e If we add this support in JobConf, it will benefit all job clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6298) Job#toString throws an exception when not in state RUNNING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6298: Labels: BB2015-05-TBR (was: ) Job#toString throws an exception when not in state RUNNING -- Key: MAPREDUCE-6298 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6298 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6298.1.patch Job#toString calls {{ensureState(JobState.RUNNING);}} as the very first thing. That method causes an Exception to be thrown which is not nice. One thing this breaks is usage of Job on the Scala (e.g. Spark) REPL as that calls toString after every invocation and that fails every time. I'll attach a patch that checks state and if it's RUNNING prints the original message and if not prints something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
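The patch idea — check the state before printing — can be sketched with a tiny stand-in class (hypothetical; the real Job delegates to ensureState(JobState.RUNNING), which throws):

```java
public class StatefulJob {
    enum State { DEFINE, RUNNING }

    private final State state;
    private final String jobId;

    StatefulJob(State state, String jobId) {
        this.state = state;
        this.jobId = jobId;
    }

    // Instead of ensureState(RUNNING) throwing from toString (which
    // breaks REPLs like the Scala/Spark shell that render every value),
    // fall back to a static description when the job has not started.
    @Override
    public String toString() {
        if (state != State.RUNNING) {
            return "Job: " + jobId + " (state: " + state + ")";
        }
        return "Job: " + jobId + " (running)";
    }
}
```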
[jira] [Updated] (MAPREDUCE-6356) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6356: Labels: BB2015-05-TBR (was: ) Misspelling of threshold in log4j.properties for tests -- Key: MAPREDUCE-6356 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6356 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6356.patch The log4j.properties file for tests contains the misspelling log4j.threshhold. We should use the correct spelling, log4j.threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
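For reference, the corrected key as it would appear in the test log4j.properties (the ALL value is illustrative):

```properties
# misspelled key, silently ignored by log4j:
# log4j.threshhold=ALL
log4j.threshold=ALL
```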
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2094: Labels: BB2015-05-TBR (was: ) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. --- Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Labels: BB2015-05-TBR Attachments: MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety. Thus producing garbage results and taking up a lot more CPU time than needed. It took a while to figure out and what we found is that the default implementation of the isSplittable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply return true;. This is a very unsafe default and is in contradiction with the JavaDoc of the method which states: Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. . The actual implementation effectively does Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec. 
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplittable in our class that does return false; Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable): # Implement something that looks at the used compression of the file (i.e. do migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes. # Force developers to think about it and make this method abstract. # Use a safe default (i.e. return false) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
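Option 1 above — deciding splittability from the file's compression codec — can be sketched with a suffix check standing in for the codec lookup (illustrative only; the real TextInputFormat consults a CompressionCodecFactory and checks for SplittableCompressionCodec):

```java
public class SplitCheck {
    // Stand-in for looking up a CompressionCodec from the file name.
    // The JavaDoc-described behaviour is: splittable, unless the file is
    // stream-compressed with a non-splittable codec.
    static boolean isSplittable(String fileName) {
        if (fileName.endsWith(".gz")) {
            return false;  // gzip streams cannot be split
        }
        if (fileName.endsWith(".bz2")) {
            return true;   // bzip2 is block-compressed and splittable
        }
        return true;       // uncompressed data is splittable
    }
}
```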
[jira] [Updated] (MAPREDUCE-6279) AM should explicitly exit JVM after all services have stopped
[ https://issues.apache.org/jira/browse/MAPREDUCE-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6279: Labels: BB2015-05-TBR (was: ) AM should explicitly exit JVM after all services have stopped Key: MAPREDUCE-6279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6279 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Labels: BB2015-05-TBR Attachments: MAPREDUCE-6279.v1.txt, MAPREDUCE-6279.v2.txt, MAPREDUCE-6279.v3.patch, MAPREDUCE-6279.v4.patch Occasionally the MapReduce AM can get stuck trying to shut down. MAPREDUCE-6049 and MAPREDUCE-5888 were specific instances that have been fixed, but this can also occur with uber jobs if the task code inadvertently leaves non-daemon threads lingering. We should explicitly shut down the JVM after the MapReduce AM has unregistered and all services have been stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6174: Labels: BB2015-05-TBR (was: ) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput. --- Key: MAPREDUCE-6174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Labels: BB2015-05-TBR Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, MAPREDUCE-6174.v1.txt Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing similar things with regards to IFile streams. In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are different from 3rd-party implementations, this JIRA will make them subclass a common class (see https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5747) Potential null pointer dereference in HsTasksBlock#render()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5747: Labels: BB2015-05-TBR newbie patch (was: newbie patch) Potential null pointer dereference in HsTasksBlock#render() - Key: MAPREDUCE-5747 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5747 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu Priority: Minor Labels: BB2015-05-TBR, newbie, patch Attachments: MAPREDUCE-5747-1.patch At line 140: {code} } else { ta = new TaskAttemptInfo(successful, type, false); {code} There is no null check for type. The TaskAttemptInfo ctor dereferences type: {code} public TaskAttemptInfo(TaskAttempt ta, TaskType type, Boolean isRunning) { final TaskAttemptReport report = ta.getReport(); this.type = type.toString(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
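A minimal sketch of a null-safe guard at the call site (hypothetical helper; the alternative is simply checking type for null before constructing TaskAttemptInfo):

```java
public class NullGuard {
    // The TaskAttemptInfo ctor dereferences `type` via type.toString();
    // a null-safe conversion at the call site avoids the NPE described
    // above. "UNKNOWN" is an illustrative fallback, not from the patch.
    static String describeType(Object type) {
        return type == null ? "UNKNOWN" : type.toString();
    }
}
```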
[jira] [Updated] (MAPREDUCE-6337) add a mode to replay MR job history files to the timeline service
[ https://issues.apache.org/jira/browse/MAPREDUCE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6337: Labels: BB2015-05-TBR (was: ) add a mode to replay MR job history files to the timeline service - Key: MAPREDUCE-6337 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6337 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: MAPREDUCE-6337-YARN-2928.001.patch The subtask covers the work on top of YARN-3437 to add a mode to replay MR job history files to the timeline service storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName
[ https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6079: Labels: BB2015-05-TBR (was: ) Renaming JobImpl#username to reporterUserName - Key: MAPREDUCE-6079 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-6079.1.patch On MAPREDUCE-6033, we found the bug because of confusing field names {{userName}} and {{username}}. We should change the names to distinguish them easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6320) Configuration of retrieved Job via Cluster is not properly set-up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6320: Labels: BB2015-05-TBR (was: ) Configuration of retrieved Job via Cluster is not properly set-up - Key: MAPREDUCE-6320 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6320 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jens Rabe Assignee: Jens Rabe Labels: BB2015-05-TBR Attachments: MAPREDUCE-6320.001.patch, MAPREDUCE-6320.002.patch, MAPREDUCE-6320.003.patch When getting a Job via the Cluster API, it is not correctly configured. To reproduce this: # Submit a MR job, and set some arbitrary parameter to its configuration {code:java} job.getConfiguration().set(foo, bar); job.setJobName(foo-bug-demo); {code} # Get the job in a client: {code:java} final Cluster c = new Cluster(conf); final JobStatus[] statuses = c.getAllJobStatuses(); final JobStatus s = ... // get the status for the job named foo-bug-demo final Job j = c.getJob(s.getJobId()); final Configuration conf = job.getConfiguration(); {code} # Get its foo entry {code:java} final String s = conf.get(foo); {code} # Expected: s is bar; But: s is null. The reason is that the job's configuration is stored on HDFS (the Configuration has a resource with a *hdfs://* URL) and in the *loadResource* it is changed to a path on the local file system (hdfs://host.domain:port/tmp/hadoop-yarn/... is changed to /tmp/hadoop-yarn/...), which does not exist, and thus the configuration is not populated. The bug happens in the *Cluster* class, where *JobConfs* are created from *status.getJobFile()*. A quick fix would be to copy this job file to a temporary file in the local file system and populate the JobConf from this file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6128: Labels: BB2015-05-TBR (was: ) Automatic addition of bundled jars to distributed cache Key: MAPREDUCE-6128 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.5.1 Reporter: Gera Shegalov Assignee: Gera Shegalov Labels: BB2015-05-TBR Attachments: MAPREDUCE-6128.v01.patch, MAPREDUCE-6128.v02.patch, MAPREDUCE-6128.v03.patch, MAPREDUCE-6128.v04.patch, MAPREDUCE-6128.v05.patch, MAPREDUCE-6128.v06.patch, MAPREDUCE-6128.v07.patch, MAPREDUCE-6128.v08.patch On the client side, JDK adds Class-Path elements from the job jar manifest on the classpath. In theory there could be many bundled jars in many directories such that adding them manually via libjars or similar means to task classpaths is cumbersome. If this property is enabled, the same jars are added to the task classpaths automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
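The report above relies on the JDK honoring the Class-Path attribute of the job jar's manifest on the client side. A self-contained sketch of how such Class-Path entries can be read (the class and method names here are illustrative, not the actual MAPREDUCE-6128 patch):

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestClassPathDemo {
    // Read the Class-Path manifest attribute, which the JDK interprets as a
    // whitespace-separated list of relative jar locations.
    static String[] bundledClassPath(Manifest mf) {
        String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        return cp == null ? new String[0] : cp.trim().split("\\s+");
    }

    public static void main(String[] args) {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.CLASS_PATH, "lib/a.jar lib/b.jar");
        String[] entries = bundledClassPath(mf);
        System.out.println(entries.length);   // prints 2
        System.out.println(entries[0]);       // prints lib/a.jar
    }
}
```

The proposed improvement would then ship these same entries to the distributed cache so task classpaths match the client's.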
[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar
[ https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4683: Labels: BB2015-05-TBR (was: ) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar Key: MAPREDUCE-4683 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Arun C Murthy Assignee: Akira AJISAKA Priority: Critical Labels: BB2015-05-TBR Attachments: MAPREDUCE-4683.patch We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6310: Labels: BB2015-05-TBR (was: ) Add jdiff support to MapReduce -- Key: MAPREDUCE-6310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Labels: BB2015-05-TBR Attachments: MAPRED-6310-040615.patch Previously we used jdiff for Hadoop common and HDFS. Now we're extending the support of jdiff to YARN. Probably we'd like to do similar things with MapReduce? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6271) org.apache.hadoop.mapreduce.Cluster GetJob() display warn log
[ https://issues.apache.org/jira/browse/MAPREDUCE-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6271: Labels: BB2015-05-TBR (was: ) org.apache.hadoop.mapreduce.Cluster GetJob() display warn log - Key: MAPREDUCE-6271 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6271 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.0 Reporter: Peng Zhang Assignee: Peng Zhang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6271.v2.patch, MR-6271.patch When using getJob() with MapReduce 2.7, a warning caused by the configuration being loaded twice is displayed every time. And when the job has completed, this function will display a warning with java.io.FileNotFoundException. I think this is related to MAPREDUCE-5875; the change in getJob() seems unnecessary, since it was only needed for a test. {noformat} 15/03/04 13:41:23 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:23 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:24 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:24 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
15/03/04 13:41:25 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:25 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:26 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:26 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:27 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:27 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:28 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:28 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
15/03/04 13:41:29 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:29 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:29 INFO exec.Task: 2015-03-04 13:41:29,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.37 sec 15/03/04 13:41:30 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:30 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter:
[jira] [Updated] (MAPREDUCE-6296) A better way to deal with InterruptedException on waitForCompletion
[ https://issues.apache.org/jira/browse/MAPREDUCE-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6296: Labels: BB2015-05-TBR (was: ) A better way to deal with InterruptedException on waitForCompletion --- Key: MAPREDUCE-6296 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6296 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Yang Hao Assignee: Yang Hao Labels: BB2015-05-TBR Attachments: MAPREDUCE-6296.patch Some code in the waitForCompletion method of the Job class is {code:title=Job.java|borderStyle=solid} public boolean waitForCompletion(boolean verbose ) throws IOException, InterruptedException, ClassNotFoundException { if (state == JobState.DEFINE) { submit(); } if (verbose) { monitorAndPrintJob(); } else { // get the completion poll interval from the client. int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf()); while (!isComplete()) { try { Thread.sleep(completionPollIntervalMillis); } catch (InterruptedException ie) { } } } return isSuccessful(); } {code} but a better way to deal with InterruptedException is {code:title=Job.java|borderStyle=solid} public boolean waitForCompletion(boolean verbose ) throws IOException, InterruptedException, ClassNotFoundException { if (state == JobState.DEFINE) { submit(); } if (verbose) { monitorAndPrintJob(); } else { // get the completion poll interval from the client. int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf()); while (!isComplete()) { try { Thread.sleep(completionPollIntervalMillis); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); } } } return isSuccessful(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3517) map.input.path is null at the first split when use CombieFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3517: Labels: BB2015-05-TBR (was: ) map.input.path is null at the first split when use CombieFileInputFormat --- Key: MAPREDUCE-3517 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3517 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.203.0 Reporter: wanbin Labels: BB2015-05-TBR Attachments: CombineFileRecordReader.diff, MAPREDUCE-3517.02.patch map.input.path is null at the first split when using CombineFileInputFormat, because in the runNewMapper function the mapper runs with mapContext instead of taskContext, and map.input.path is only set on taskContext. We therefore need to set map.input.path on mapContext as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5883) Total megabyte-seconds in job counters is slightly misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5883: Labels: BB2015-05-TBR (was: ) Total megabyte-seconds in job counters is slightly misleading --- Key: MAPREDUCE-5883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5883 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5883.patch The following counters are in milliseconds, so "megabyte-seconds" would be better stated as "megabyte-milliseconds": MB_MILLIS_MAPS.name= Total megabyte-seconds taken by all map tasks MB_MILLIS_REDUCES.name=Total megabyte-seconds taken by all reduce tasks VCORES_MILLIS_MAPS.name= Total vcore-seconds taken by all map tasks VCORES_MILLIS_REDUCES.name=Total vcore-seconds taken by all reduce tasks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
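The mislabeling matters because the counter value is container memory (MB) multiplied by task runtime in milliseconds, so reading it as "megabyte-seconds" overstates resource usage by a factor of 1000. A minimal arithmetic sketch (method name hypothetical, not the actual counter code):

```java
public class MbMillisDemo {
    // The counter value is container memory in MB times task runtime in
    // milliseconds, so its accurate unit is megabyte-milliseconds.
    static long mbMillis(long containerMb, long runtimeMillis) {
        return containerMb * runtimeMillis;
    }

    public static void main(String[] args) {
        long v = mbMillis(1024, 60_000);  // a 1 GB container running for one minute
        System.out.println(v);            // prints 61440000 (MB-ms)
        System.out.println(v / 1000);     // prints 61440 (the true MB-s value)
    }
}
```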
[jira] [Updated] (MAPREDUCE-6027) mr jobs with relative paths can fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6027: Labels: BB2015-05-TBR (was: ) mr jobs with relative paths can fail Key: MAPREDUCE-6027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6027 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission Reporter: Wing Yew Poon Assignee: Wing Yew Poon Labels: BB2015-05-TBR Attachments: MAPREDUCE-6027.patch I built hadoop from branch-2 and tried to run terasort as follows: {noformat} wypoon$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-SNAPSHOT.jar terasort sort-input sort-output 14/08/07 08:57:55 INFO terasort.TeraSort: starting 2014-08-07 08:57:56.229 java[36572:1903] Unable to load realm info from SCDynamicStore 14/08/07 08:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/08/07 08:57:57 INFO input.FileInputFormat: Total input paths to process : 2 Spent 156ms computing base-splits. Spent 2ms computing TeraScheduler splits. Computing input splits took 159ms Sampling 2 splits of 2 Making 1 from 10 sampled records Computing parititions took 626ms Spent 789ms computing partitions. 
14/08/07 08:57:57 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032 14/08/07 08:57:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/wypoon/.staging/job_1407426900134_0001 java.lang.IllegalArgumentException: Can not create a Path from an empty URI at org.apache.hadoop.fs.Path.checkPathArg(Path.java:140) at org.apache.hadoop.fs.Path.<init>(Path.java:192) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.checkPermissionOfOther(ClientDistributedCacheManager.java:275) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.ancestorsHaveExecutePermissions(ClientDistributedCacheManager.java:256) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.isPublic(ClientDistributedCacheManager.java:243) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineCacheVisibilities(ClientDistributedCacheManager.java:162) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:58) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) at 
org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:316) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} If I used absolute paths for the input and out directories, the job runs fine. This breakage is due to HADOOP-10876. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
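The failure above stems from a relative path reaching the distributed-cache code unqualified. A self-contained sketch of the qualification step (java.nio is used here as a stand-in for Hadoop's Path.makeQualified, so this is an analogy, not the actual fix):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class QualifyPathDemo {
    // Resolve a possibly-relative path against a working directory, mirroring
    // what qualifying a Hadoop Path against the default FileSystem achieves.
    static Path qualify(Path workingDir, String userPath) {
        Path p = Paths.get(userPath);
        return p.isAbsolute() ? p : workingDir.resolve(p).normalize();
    }

    public static void main(String[] args) {
        Path wd = Paths.get("/user/wypoon");
        System.out.println(qualify(wd, "sort-input"));        // resolved under the working dir
        System.out.println(qualify(wd, "/tmp/sort-output"));  // absolute paths pass through
    }
}
```

With paths qualified up front, the "Can not create a Path from an empty URI" case never arises, which is why absolute input/output directories make the job run fine.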
[jira] [Updated] (MAPREDUCE-5876) SequenceFileRecordReader NPE if close() is called before initialize()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5876: Labels: BB2015-05-TBR (was: ) SequenceFileRecordReader NPE if close() is called before initialize() - Key: MAPREDUCE-5876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5876 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.3.0, 2.4.0 Reporter: Reinis Vicups Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-5876.1.patch org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader extends org.apache.hadoop.mapreduce.RecordReader, which in turn implements java.io.Closeable. According to the Java spec, java.io.Closeable#close() has to be idempotent (http://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html), which it is not here. An NPE is thrown if the close() method is invoked without previously calling the initialize() method. This happens because the SequenceFile.Reader field {{in}} is null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
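A minimal, self-contained sketch of the null-guard that makes close() safe to call before initialize() and idempotent afterwards (a plain java.io.Closeable stands in for SequenceFile.Reader; the class is hypothetical, not the actual patch):

```java
import java.io.Closeable;
import java.io.IOException;

// Stand-in for SequenceFileRecordReader: close() must tolerate being called
// before initialize() and being called twice (java.io.Closeable's contract).
public class GuardedReader implements Closeable {
    private Closeable in;   // null until initialize() runs
    int closed = 0;         // how many times the underlying reader was closed

    void initialize() {
        in = () -> { };     // real code would open the SequenceFile here
    }

    @Override
    public synchronized void close() throws IOException {
        if (in != null) {   // guard: no-op when never initialized or already closed
            in.close();
            in = null;
            closed++;
        }
    }

    public static void main(String[] args) throws IOException {
        GuardedReader r = new GuardedReader();
        r.close();          // no NPE before initialize()
        r.initialize();
        r.close();
        r.close();          // idempotent: second call is a no-op
        System.out.println(r.closed); // prints 1
    }
}
```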
[jira] [Updated] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6003: Labels: BB2015-05-TBR (was: ) Resource Estimator suggests huge map output in some cases - Key: MAPREDUCE-6003 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.2.1 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6003-branch-1.2.patch In some cases, ResourceEstimator can return a far too large map output estimation. This happens when the input size is not correctly calculated. A typical case is joining two Hive tables (one in HDFS and the other in HBase). The maps that process the HBase table finish first, and they report an input length of 0 due to its TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, making it impossible to assign the map task. There are two possible solutions to this problem: (1) Make the input size correct for each case, e.g. HBase, etc. (2) Use another algorithm to estimate the map output, or at least make it closer to reality. I prefer the second way, since the first would require all possibilities to be taken care of, which is not easy for some inputs such as URIs. In my opinion, we could make a second estimation which is independent of the input size: estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10 Here, multiplying by 10 makes the estimation more conservative, so that the task is less likely to be assigned somewhere without enough space. The former estimation goes like this: estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize My suggestion is to take the minimum of the two estimations: estimation = min(estimationA, estimationB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
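The two estimates described above are plain arithmetic and can be sketched directly (method names are illustrative, not the actual ResourceEstimator API):

```java
public class MapOutputEstimator {
    // estimationA: the existing input-size-proportional estimate, which blows up
    // when completedMapInputSize is wrongly tiny (e.g. zero-length HBase splits).
    static long estimationA(long inputSize, long completedMapOutputSize,
                            long completedMapInputSize) {
        return (long) ((inputSize * completedMapOutputSize * 2.0) / completedMapInputSize);
    }

    // estimationB: the proposed input-size-independent estimate, padded by 10x
    // to stay conservative.
    static long estimationB(long completedMapOutputSize, int completedMaps, int totalMaps) {
        return (completedMapOutputSize / completedMaps) * (long) totalMaps * 10L;
    }

    // The suggestion: take the minimum of the two.
    static long estimate(long a, long b) {
        return Math.min(a, b);
    }

    public static void main(String[] args) {
        // A bogus 1-byte completedMapInputSize makes estimationA astronomical;
        // estimationB stays bounded, and the min picks the sane value.
        long a = estimationA(1_000_000_000L, 10_000_000L, 1L);
        long b = estimationB(10_000_000L, 5, 20);
        System.out.println(estimate(a, b)); // prints 400000000
    }
}
```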
[jira] [Updated] (MAPREDUCE-3182) loadgen ignores -m command line when writing random data
[ https://issues.apache.org/jira/browse/MAPREDUCE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3182: Labels: BB2015-05-TBR (was: ) loadgen ignores -m command line when writing random data Key: MAPREDUCE-3182 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3182 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 0.23.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Chen He Labels: BB2015-05-TBR Attachments: MAPREDUCE-3182.patch If no input directories are specified, loadgen goes into a special mode where random data is generated and written. In that mode, setting the number of mappers (-m command line option) is overridden by a calculation. Instead, it should take into consideration the user specified number of mappers and fall back to the calculation. In addition, update the documentation as well to match the new behavior in the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1380: Labels: BB2015-05-TBR (was: ) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5845: Labels: BB2015-05-TBR (was: ) TestShuffleHandler failing intermittently on windows Key: MAPREDUCE-5845 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: apache-mapreduce-5845.0.patch TestShuffleHandler fails intermittently on Windows - specifically, testClientClosesConnection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5225) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5225: Labels: BB2015-05-TBR (was: ) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits --- Key: MAPREDUCE-5225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5225 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: BB2015-05-TBR Attachments: MAPREDUCE-5225.1.patch Now, SplitSampler only samples the first maxSplitsSampled splits, caused by MAPREDUCE-1820. However, sampling across all splits is in general preferable to sampling only the first N splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4216: Labels: BB2015-05-TBR Output (was: Output) Make MultipleOutputs generic to support non-file output formats --- Key: MAPREDUCE-4216 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 1.0.2 Reporter: Robbie Strickland Labels: BB2015-05-TBR, Output Attachments: MAPREDUCE-4216.patch The current MultipleOutputs implementation is tied to FileOutputFormat in such a way that it is not extensible to other types of output. It should be made more generic, such as with an interface that can be implemented for different outputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records
[ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4840: Labels: BB2015-05-TBR (was: ) Delete dead code and deprecate public API related to skipping bad records - Key: MAPREDUCE-4840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4840 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Mostafa Elhemali Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-4840.patch It looks like the decision was made in MAPREDUCE-1932 to remove support for skipping bad records rather than fix it (it doesn't work right now in trunk). If that's the case, then we should probably delete all the dead code related to it and deprecate the public APIs for it, right? Dead code I'm talking about: 1. Task class: skipping, skipRanges, writeSkipRecs 2. MapTask class: SkippingRecordReader inner class 3. ReduceTask class: SkippingReduceValuesIterator inner class 4. Tests: TestBadRecords Public API: 1. SkipBadRecords class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3115) OOM When the value for the property mapred.map.multithreadedrunner.class is set to MultithreadedMapper instance.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3115: Labels: BB2015-05-TBR (was: ) OOM When the value for the property mapred.map.multithreadedrunner.class is set to MultithreadedMapper instance. -- Key: MAPREDUCE-3115 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3115 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 0.23.0, 1.0.0 Environment: NA Reporter: Bhallamudi Venkata Siva Kamesh Labels: BB2015-05-TBR Attachments: MAPREDUCE-3115.2.patch, MAPREDUCE-3115.patch When we set the value for the property *mapred.map.multithreadedrunner.class* to an instance of MultithreadedMapper using MultithreadedMapper.setMapperClass(), it simply throws IllegalArgumentException. But when we set the same property via the job's conf object, using job.getConfiguration().setClass(*mapred.map.multithreadedrunner.class*, MultithreadedMapper.class, Mapper.class), it throws an OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5969: Labels: BB2015-05-TBR (was: ) Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch Private non-Archive Files' sizes are added twice in the Distributed Cache directory size calculation. The Private non-Archive Files list is passed in via the -files command line option. The Distributed Cache directory size is used to check whether the total cache file size exceeds the cache size limitation; the default cache size limitation is 10G. I added logging in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I used the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into the distributed cache: WordCount.java and wordcount.jar. The WordCount.java file size is 2395 bytes and the wordcount.jar file size is 3865 bytes. The total should be 6260. 
The log shows these file sizes added twice: once before download to the local node and a second time after download to the local node, so the total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for a Private non-Archive File, the first time we add the file size is in getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info("getLocalCache: " + localizedPath + " size = " + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add the file size is in setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is to not add the file size for Private non-Archive Files again after download (downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
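The double-count above can be prevented by recording, per cache entry, whether its size has already been counted. A self-contained sketch of that accounting guard (class and field names are hypothetical, not the actual TrackerDistributedCacheManager code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the accounting fix: mark each cache entry once its size has been
// added, so a second addCacheInfoUpdate for the same entry is a no-op.
public class CacheSizeAccounting {
    static class CacheStatus {
        long size;
        boolean counted;            // set once the size has been added to the total
        CacheStatus(long size) { this.size = size; }
    }

    private long baseDirSize;
    private final Map<String, CacheStatus> entries = new HashMap<>();

    void addCacheInfoUpdate(String key, CacheStatus st) {
        entries.putIfAbsent(key, st);
        if (!st.counted) {          // guard against the before/after-download double add
            baseDirSize += st.size;
            st.counted = true;
        }
    }

    long baseDirSize() { return baseDirSize; }

    public static void main(String[] args) {
        CacheSizeAccounting acc = new CacheSizeAccounting();
        CacheStatus wc = new CacheStatus(2395);
        acc.addCacheInfoUpdate("WordCount.java", wc);   // before download
        acc.addCacheInfoUpdate("WordCount.java", wc);   // after download: not re-added
        acc.addCacheInfoUpdate("wordcount.jar", new CacheStatus(3865));
        System.out.println(acc.baseDirSize());           // prints 6260
    }
}
```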
[jira] [Updated] (MAPREDUCE-5227) JobTrackerMetricsSource and QueueMetrics should standardize naming rules
[ https://issues.apache.org/jira/browse/MAPREDUCE-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5227: Labels: BB2015-05-TBR (was: ) JobTrackerMetricsSource and QueueMetrics should standardize naming rules Key: MAPREDUCE-5227 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5227 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Affects Versions: 1.1.3, 1.2.1 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5227-1.1-branch.1.patch, MAPREDUCE-5227-branch-1.1.patch, MAPREDUCE-5227.1.patch JobTrackerMetricsSource and QueueMetrics provide users with some metrics, but their naming rules (jobs_running, running_maps, running_reduces) sometimes confuse users. They should be standardized. One concern is backward compatibility, so one idea is to share a MetricMutableGaugeInt object between the old and new property names, e.g. to share runningMaps between running_maps and maps_running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
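The aliasing idea above amounts to registering one gauge object under both names. A self-contained sketch, with AtomicInteger standing in for MetricMutableGaugeInt (the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Backward-compatible renaming: register the same gauge object under both the
// legacy and the standardized metric name, so updates stay in sync for free.
public class SharedGaugeDemo {
    private final Map<String, AtomicInteger> gauges = new HashMap<>();

    void registerAlias(String oldName, String newName) {
        AtomicInteger gauge = new AtomicInteger();
        gauges.put(oldName, gauge);
        gauges.put(newName, gauge);   // same object, two names
    }

    void increment(String name, int delta) {
        gauges.get(name).addAndGet(delta);
    }

    int value(String name) {
        return gauges.get(name).get();
    }

    public static void main(String[] args) {
        SharedGaugeDemo metrics = new SharedGaugeDemo();
        metrics.registerAlias("running_maps", "maps_running");
        metrics.increment("running_maps", 7);
        System.out.println(metrics.value("maps_running")); // prints 7: both names agree
    }
}
```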
[jira] [Updated] (MAPREDUCE-5700) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5700: Labels: BB2015-05-TBR (was: ) historyServer can't show container's log when aggregation is not enabled Key: MAPREDUCE-5700 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5700 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0 Environment: yarn.log-aggregation-enable=false , HistoryServer will show like this: Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669 Reporter: Hong Shen Assignee: Hong Shen Labels: BB2015-05-TBR Attachments: yarn-647-2.patch, yarn-647.patch When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669 We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should the HistoryServer show the container's log even if yarn.log-aggregation-enable is set to false? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5248) Let NNBenchWithoutMR specify the replication factor for its test
[ https://issues.apache.org/jira/browse/MAPREDUCE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5248: Labels: BB2015-05-TBR (was: ) Let NNBenchWithoutMR specify the replication factor for its test Key: MAPREDUCE-5248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5248 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, test Affects Versions: 3.0.0 Reporter: Erik Paulson Assignee: Erik Paulson Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5248.patch, MAPREDUCE-5248.txt Original Estimate: 1h Remaining Estimate: 1h The NNBenchWithoutMR test creates files with a replicationFactorPerFile hard-coded to 1. It'd be nice to be able to specify that on the commandline. Also, it'd be great if MAPREDUCE-4750 was merged along with this fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5264) FileAlreadyExistsException is assumed to be thrown by FileSystem#mkdirs or FileContext#mkdir in the codebase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5264: Labels: BB2015-05-TBR (was: ) FileAlreadyExistsException is assumed to be thrown by FileSystem#mkdirs or FileContext#mkdir in the codebase Key: MAPREDUCE-5264 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5264 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Rémy SAISSY Labels: BB2015-05-TBR Attachments: MAPREDUCE-5264.20130607.1.patch According to https://issues.apache.org/jira/browse/HADOOP-9438, FileSystem#mkdirs and FileContext#mkdir do not throw FileAlreadyExistsException if the directory already exists. Some places in the mapreduce codebase assume FileSystem#mkdirs or FileContext#mkdir throws FileAlreadyExistsException. At least the following files are concerned: - YarnChild.java - JobHistoryEventHandler.java - HistoryFileManager.java It would be good to re-review and patch this if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
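The HADOOP-9438 semantics can be demonstrated with the JDK's java.nio.file API, used here only as a stand-in for Hadoop's FileSystem/FileContext: like mkdirs, createDirectories succeeds silently when the directory already exists, so callers cannot rely on catching an exception to detect a pre-existing directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MkdirsSemantics {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mkdirs-demo").resolve("sub");

        Files.createDirectories(dir); // first call creates the directory
        Files.createDirectories(dir); // second call is a silent no-op — no exception

        // Code that needs to know whether the directory pre-existed must
        // check explicitly rather than catch FileAlreadyExistsException:
        System.out.println("exists: " + Files.exists(dir)); // exists: true
    }
}
```

Call sites that branched on FileAlreadyExistsException need an explicit existence check like this instead.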
[jira] [Updated] (MAPREDUCE-5708) Duplicate String.format in getSpillFileForWrite
[ https://issues.apache.org/jira/browse/MAPREDUCE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5708: Labels: BB2015-05-TBR (was: ) Duplicate String.format in getSpillFileForWrite --- Key: MAPREDUCE-5708 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5708 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Konstantin Weitz Priority: Minor Labels: BB2015-05-TBR Attachments: 0001-Removed-duplicate-String.format.patch Original Estimate: 10m Remaining Estimate: 10m The code responsible for formatting the spill file name (namely _getSpillFileForWrite_) unnecessarily calls _String.format_ twice. This not only affects performance, but also leads to the weird requirement that task attempt ids cannot contain _%_ characters (because these would be interpreted as format specifiers in the outer _String.format_ call). I assume this was done by mistake, as it could only be useful if task attempt ids contained _%n_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
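The '%' hazard described above is easy to reproduce in isolation; the attempt id and file-name pattern below are made up for illustration:

```java
public class DoubleFormatHazard {
    public static void main(String[] args) {
        String attemptId = "attempt_%n_0001"; // hypothetical id containing '%n'

        // Single format: '%s' is substituted once; the '%n' inside the
        // argument is plain text and survives untouched.
        String once = String.format("output/%s/spill0.out", attemptId);
        System.out.println(once); // output/attempt_%n_0001/spill0.out

        // Formatting the result a second time reinterprets '%n'
        // as the line-separator specifier, corrupting the path.
        String twice = String.format(once);
        System.out.println(twice.contains(System.lineSeparator())); // true
    }
}
```

Dropping the outer call leaves attempt ids free to contain '%' without corrupting file names.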
[jira] [Updated] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5216: Labels: BB2015-05-TBR (was: ) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits. --- Key: MAPREDUCE-5216 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Gelesh Labels: BB2015-05-TBR Attachments: MAPREDUCE-5216.patch Original Estimate: 1h Remaining Estimate: 1h While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same for all splits, i.e. Split 1: Start=A, End=M; Split 2: Start=A, End=P; Split 3: Start=A, End=S — instead of Split 1: Start=A, End=M; Split 2: Start=M, End=P; Split 3: Start=P, End=S. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
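The intended behavior — each split starting where the previous one ended — amounts to pairing adjacent boundary points instead of reusing the global lower bound for every split. A minimal sketch of that logic (not the actual TextSplitter code):

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedSplits {
    // Turn an ordered list of boundary points into [start, end) split pairs,
    // so split i starts exactly where split i-1 ended.
    static List<String[]> toSplits(List<String> boundaries) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            splits.add(new String[] { boundaries.get(i), boundaries.get(i + 1) });
        }
        return splits;
    }

    public static void main(String[] args) {
        for (String[] s : toSplits(List.of("A", "M", "P", "S"))) {
            System.out.println(s[0] + " -> " + s[1]);
        }
        // prints: A -> M, M -> P, P -> S
    }
}
```

The reported bug corresponds to emitting `(boundaries.get(0), boundaries.get(i + 1))` — every split anchored at the first boundary.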
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5871: Labels: BB2015-05-TBR (was: ) Estimate Job Endtime Key: MAPREDUCE-5871 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5871 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Labels: BB2015-05-TBR Attachments: MAPREDUCE-5871.patch YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6023) Fix SuppressWarnings from unchecked to rawtypes in O.A.H.mapreduce.lib.input.TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6023: Labels: BB2015-05-TBR newbie (was: newbie) Fix SuppressWarnings from unchecked to rawtypes in O.A.H.mapreduce.lib.input.TaggedInputSplit - Key: MAPREDUCE-6023 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6023 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Junping Du Assignee: Abhilash Srimat Tirumala Pallerlamudi Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-6023.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4818: Labels: BB2015-05-TBR usability (was: usability) Easier identification of tasks that timeout during localization --- Key: MAPREDUCE-4818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am Affects Versions: 0.23.3, 2.0.3-alpha Reporter: Jason Lowe Assignee: Siqi Li Labels: BB2015-05-TBR, usability Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch When a task is taking too long to localize and is killed by the AM due to task timeout, the job UI/history is not very helpful. The attempt simply lists a diagnostic stating it was killed due to timeout, but there are no logs for the attempt since it never actually got started. There are log messages on the NM that show the container never made it past localization by the time it was killed, but users often do not have access to those logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5577) Allow querying the JobHistoryServer by job arrival time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5577: Labels: BB2015-05-TBR (was: ) Allow querying the JobHistoryServer by job arrival time --- Key: MAPREDUCE-5577 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5577.patch The JobHistoryServer REST APIs currently allow querying by job submit time and finish time. However, jobs don't necessarily arrive in order of their finish time, meaning that a client who wants to stay on top of all completed jobs needs to query large time intervals to make sure they're not missing anything. Exposing functionality to allow querying by the time a job lands at the JobHistoryServer would allow clients to set the start of their query interval to the time of their last query. The arrival time of a job would be defined as the time that it lands in the done directory and can be picked up using the last modified date on history files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4487) Reduce job latency by removing hardcoded sleep statements
[ https://issues.apache.org/jira/browse/MAPREDUCE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4487: Labels: BB2015-05-TBR (was: ) Reduce job latency by removing hardcoded sleep statements - Key: MAPREDUCE-4487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4487 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, mrv2, performance Affects Versions: 1.0.3, 2.0.0-alpha Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-4487-mr2.patch, MAPREDUCE-4487.patch There are a few places in MapReduce where there are hardcoded sleep statements. By replacing them with wait/notify or similar it's possible to reduce latency for short running jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6273) HistoryFileManager should check whether summaryFile exists to avoid FileNotFoundException causing HistoryFileInfo into MOVE_FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6273: Labels: BB2015-05-TBR (was: ) HistoryFileManager should check whether summaryFile exists to avoid FileNotFoundException causing HistoryFileInfo into MOVE_FAILED state Key: MAPREDUCE-6273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6273 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6273.000.patch HistoryFileManager should check whether summaryFile exists to avoid FileNotFoundException causing HistoryFileInfo into MOVE_FAILED state, I saw the following error message: {code} 2015-02-17 19:13:45,198 ERROR org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Error while trying to move a job to done java.io.FileNotFoundException: File does not exist: /user/history/done_intermediate/agd_laci-sluice/job_1423740288390_1884.summary at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1878) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1819) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1771) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1181) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1169) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:230) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1457) at org.apache.hadoop.fs.Hdfs.open(Hdfs.java:318) at org.apache.hadoop.fs.Hdfs.open(Hdfs.java:59) at org.apache.hadoop.fs.AbstractFileSystem.open(AbstractFileSystem.java:621) at 
org.apache.hadoop.fs.FileContext$6.next(FileContext.java:789) at org.apache.hadoop.fs.FileContext$6.next(FileContext.java:785) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.open(FileContext.java:785) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getJobSummary(HistoryFileManager.java:953) at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$400(HistoryFileManager.java:82) at
[jira] [Updated] (MAPREDUCE-4919) All maps hangs when set mapreduce.task.io.sort.factor to 1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4919: Labels: BB2015-05-TBR (was: ) All maps hangs when set mapreduce.task.io.sort.factor to 1 -- Key: MAPREDUCE-4919 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4919 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Jerry Chen Assignee: Jerry Chen Labels: BB2015-05-TBR Attachments: MAPREDUCE-4919.patch Original Estimate: 2h Remaining Estimate: 2h In one of my tests, when I set mapreduce.task.io.sort.factor to 1, all the maps hung and never ended. The CPU usage on each node was very high until the maps were killed by the app master on timeout, and the job failed. I traced the problem and found that all the maps hang in the final merge phase. The while loop in computeBytesInMerges will never end with a factor of 1:
{code}
int f = 1;  // in my case
int n = 16; // in my case
while (n > f || considerFinalMerge) {
  ...
  n -= (f-1);
  f = factor;
}
{code}
Since f-1 equals 0, n stays 16 forever and the while loop never terminates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
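A simplified model of the merge-pass arithmetic (not the actual computeBytesInMerges code) makes the hang visible: each pass replaces f streams with one, so n shrinks by f-1 — which is zero when the factor is 1.

```java
public class MergeFactorLoop {
    // Count merge passes needed to reduce n streams to one; returns -1 if
    // the loop would not terminate within a generous bound (the factor-1 hang).
    static int mergePasses(int n, int factor) {
        int f = factor;
        int passes = 0;
        while (n > 1) {
            n = Math.max(1, n - (f - 1)); // with f == 1 this subtracts nothing
            if (++passes > 1000) {
                return -1; // would loop forever
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(mergePasses(16, 1));  // -1: hangs, as in the report
        System.out.println(mergePasses(16, 10)); // 2
        System.out.println(mergePasses(16, 2));  // 15
    }
}
```

Validating the configuration (rejecting or clamping a sort factor below 2) would make the loop terminate for all inputs.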
[jira] [Updated] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5951: Labels: BB2015-05-TBR (was: ) Add support for the YARN Shared Cache - Key: MAPREDUCE-5951 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Chris Trezzo Assignee: Chris Trezzo Labels: BB2015-05-TBR Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch Implement the necessary changes so that the MapReduce application can leverage the new YARN shared cache (i.e. YARN-1492). Specifically, allow per-job configuration so that MapReduce jobs can specify which set of resources they would like to cache (i.e. jobjar, libjars, archives, files). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1362) Pipes should be ported to the new mapreduce API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1362: Labels: BB2015-05-TBR (was: ) Pipes should be ported to the new mapreduce API --- Key: MAPREDUCE-1362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1362 Project: Hadoop Map/Reduce Issue Type: Improvement Components: pipes Reporter: Bassam Tabbara Labels: BB2015-05-TBR Attachments: MAPREDUCE-1362-trunk.patch, MAPREDUCE-1362.patch, MAPREDUCE-1362.patch Pipes is still currently using the old mapred API. This prevents us from using pipes with HBase's TableInputFormat, HRegionPartitioner, etc. Here is a rough proposal for how to accomplish this: * Add a new package org.apache.hadoop.mapreduce.pipes that uses the new mapreduce API. * The new pipes package will run side by side with the old one; the old one should get deprecated at some point. * The wire protocol used between PipesMapper and PipesReducer and C++ programs must not change. * bin/hadoop should support both pipes (old api) and pipes2 (new api) Does this sound reasonable? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3914) Mismatched free() / delete / delete [] in HadoopPipes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3914: Labels: BB2015-05-TBR (was: ) Mismatched free() / delete / delete [] in HadoopPipes - Key: MAPREDUCE-3914 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3914 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.20.205.0, 0.23.0, 1.0.0 Environment: Based upon map reduce pipes task executed on Ubuntu 11.10 Reporter: Charles Earl Labels: BB2015-05-TBR Attachments: MAPREDUCE-3914-branch-0.23.patch, MAPREDUCE-3914-branch-1.0.patch, MAPREDUCE-3914.patch Original Estimate: 1h Remaining Estimate: 1h When running valgrind on a simple MapReduce pipes job, valgrind identifies a mismatched new / delete: ==20394== Mismatched free() / delete / delete [] ==20394==at 0x4C27FF2: operator delete(void*) (vg_replace_malloc.c:387) ==20394==by 0x4328A5: HadoopPipes::runTask(HadoopPipes::Factory const) (HadoopPipes.cc:1171) ==20394==by 0x424C33: main (ProcessRow.cpp:118) ==20394== Address 0x9c5b540 is 0 bytes inside a block of size 131,072 alloc'd ==20394==at 0x4C2864B: operator new[](unsigned long) (vg_replace_malloc.c:305) ==20394==by 0x431E5D: HadoopPipes::runTask(HadoopPipes::Factory const) (HadoopPipes.cc:1121) ==20394==by 0x424C33: main (ProcessRow.cpp:118) ==20394== ==20394== Mismatched free() / delete / delete [] ==20394==at 0x4C27FF2: operator delete(void*) (vg_replace_malloc.c:387) ==20394==by 0x4328AF: HadoopPipes::runTask(HadoopPipes::Factory const) (HadoopPipes.cc:1172) ==20394==by 0x424C33: main (ProcessRow.cpp:118) ==20394== Address 0x9c7b580 is 0 bytes inside a block of size 131,072 alloc'd ==20394==at 0x4C2864B: operator new[](unsigned long) (vg_replace_malloc.c:305) ==20394==by 0x431E6A: HadoopPipes::runTask(HadoopPipes::Factory const) (HadoopPipes.cc:1122) ==20394==by 0x424C33: main (ProcessRow.cpp:118) The new [] calls in Lines 1121 and 1122 of HadoopPipes.cc: bufin = new 
char[bufsize]; bufout = new char[bufsize]; should have matching delete [] calls but are instead bracketed by delete on lines 1171 and 1172: delete bufin; delete bufout; So these should be replaced by delete[]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5491) DFSIO do not initialize write buffer correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5491: Labels: BB2015-05-TBR (was: ) DFSIO do not initialize write buffer correctly -- Key: MAPREDUCE-5491 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5491 Project: Hadoop Map/Reduce Issue Type: Bug Components: benchmarks, test Reporter: Raymond Liu Assignee: Raymond Liu Labels: BB2015-05-TBR Attachments: MAPREDUCE-5491-v2.patch, MAPREDUCE-5491.patch In the DFSIO test, IOMapperBase sets bufferSize in the configure method, while WriteMapper, AppendMapper, etc. use bufferSize to initialize the buffer in their constructors. This leads to the buffer not being initialized at all. That is fine for the non-compression route, but when compression is used the output data size becomes very small because the buffer is all zeros. Thus, the overridden configure method would be the correct place to initialize the buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
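The ordering problem generalizes to any class whose constructor runs before its configure method: a buffer sized from a field that configure sets cannot be allocated in the constructor. A minimal sketch with hypothetical names, not the DFSIO classes:

```java
public class BufferInit {
    static class Mapper {
        int bufferSize;
        byte[] buffer;

        Mapper() {
            // Wrong place: bufferSize is still 0 here, so any buffer
            // allocated now would be empty (or later left all-zeros).
        }

        void configure(int configuredSize) {
            bufferSize = configuredSize;
            // Right place: allocate and fill once the size is known.
            buffer = new byte[bufferSize];
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = (byte) ('0' + i % 50); // non-zero, compresses realistically
            }
        }
    }

    public static void main(String[] args) {
        Mapper m = new Mapper();
        m.configure(4096);
        System.out.println(m.buffer.length); // 4096
    }
}
```

An all-zero buffer compresses to almost nothing, which is why the bug only distorts the compressed benchmark numbers.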
[jira] [Updated] (MAPREDUCE-5549) distcp app should fail if m/r job fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5549: Labels: BB2015-05-TBR (was: ) distcp app should fail if m/r job fails --- Key: MAPREDUCE-5549 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5549 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp, mrv2 Affects Versions: 3.0.0 Reporter: David Rosenstrauch Labels: BB2015-05-TBR Attachments: MAPREDUCE-5549-001.patch, MAPREDUCE-5549-002.patch I run distcpv2 in a scripted manner. The script checks if the distcp step fails and, if so, aborts the rest of the script. However, I ran into an issue today where the distcp job failed, but my calling script went on its merry way. Digging into the code a bit more (at https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java), I think I see the issue: the distcp app is not returning an error exit code to the shell when the distcp job fails. This is a big problem, IMO, as it prevents distcp from being successfully used in a scripted environment. IMO, the code should change like so:
Before:
{code:title=org.apache.hadoop.tools.DistCp.java}
//...
public int run(String[] argv) {
  //...
  try {
    execute();
  } catch (InvalidInputException e) {
    LOG.error("Invalid input: ", e);
    return DistCpConstants.INVALID_ARGUMENT;
  } catch (DuplicateFileException e) {
    LOG.error("Duplicate files in input path: ", e);
    return DistCpConstants.DUPLICATE_INPUT;
  } catch (Exception e) {
    LOG.error("Exception encountered ", e);
    return DistCpConstants.UNKNOWN_ERROR;
  }
  return DistCpConstants.SUCCESS;
}
//...
{code}
After:
{code:title=org.apache.hadoop.tools.DistCp.java}
//...
public int run(String[] argv) {
  //...
  Job job = null;
  try {
    job = execute();
  } catch (InvalidInputException e) {
    LOG.error("Invalid input: ", e);
    return DistCpConstants.INVALID_ARGUMENT;
  } catch (DuplicateFileException e) {
    LOG.error("Duplicate files in input path: ", e);
    return DistCpConstants.DUPLICATE_INPUT;
  } catch (Exception e) {
    LOG.error("Exception encountered ", e);
    return DistCpConstants.UNKNOWN_ERROR;
  }
  if (job.isSuccessful()) {
    return DistCpConstants.SUCCESS;
  } else {
    return DistCpConstants.UNKNOWN_ERROR;
  }
}
//...
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4443: Labels: BB2015-05-TBR usability (was: usability) MR AM and job history server should be resilient to jobs that exceed counter limits Key: MAPREDUCE-4443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4443 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Rahul Jain Assignee: Mayank Bansal Labels: BB2015-05-TBR, usability Attachments: MAPREDUCE-4443-trunk-1.patch, MAPREDUCE-4443-trunk-2.patch, MAPREDUCE-4443-trunk-3.patch, MAPREDUCE-4443-trunk-draft.patch, am_failed_counter_limits.txt We saw this problem migrating applications to MapReduceV2: Our applications use hadoop counters extensively (1000+ counters for certain jobs). While this may not be one of the recommended best practices in hadoop, the real issue here is the reliability of the framework when applications exceed counter limits. The hadoop servers (yarn, history server) were originally brought up with mapreduce.job.counters.max=1000 under core-site.xml. We then ran a map-reduce job under an application using its own job-specific overrides, with mapreduce.job.counters.max=1 All the tasks for the job finished successfully; however the overall job still failed due to the AM encountering exceptions such as: {code} 2012-07-12 17:31:43,485 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks : 71
2012-07-12 17:31:43,502 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 1001 max=1000 at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:58) at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:65) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:77) at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:94) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:105) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:202) at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:337) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1212) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1198) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1179) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:711) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.checkJobCompleteSuccess(JobImpl.java:737) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.checkJobForCompletion(JobImpl.java:1360) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1340) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1323) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:666) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:113) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:890) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:886) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74) at java.lang.Thread.run(Thread.java:662) 2012-07-12 17:31:43,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..2012-07-12 17:31:43,503 INFO [Thread-1] org.apache.had {code} The
[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6096: Labels: BB2015-05-TBR easyfix patch (was: easyfix patch) SummarizedJob class NPEs with some jhist files -- Key: MAPREDUCE-6096 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6096 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: zhangyubiao Labels: BB2015-05-TBR, easyfix, patch Attachments: MAPREDUCE-6096-v2.patch, MAPREDUCE-6096-v3.patch, MAPREDUCE-6096-v4.patch, MAPREDUCE-6096-v5.patch, MAPREDUCE-6096-v6.patch, MAPREDUCE-6096-v7.patch, MAPREDUCE-6096-v8.patch, MAPREDUCE-6096.patch, job_1410427642147_0124-1411726671220-hadp-word+count-1411726696863-1-1-SUCCEEDED-default.jhist When I parse a job history file, I use the map-reduce-client-core project's org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser class and HistoryViewer$SummarizedJob to parse the history file (e.g. job_1408862281971_489761-1410883171851_XXX.jhist), and it throws an exception like: Exception in thread pool-1-thread-1 java.lang.NullPointerException at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer$SummarizedJob.init(HistoryViewer.java:626) at com.jd.hadoop.log.parse.ParseLogService.getJobDetail(ParseLogService.java:70) Looking at the SummarizedJob class, I found that attempt.getTaskStatus() is null, so I changed the order of attempt.getTaskStatus().equals(TaskStatus.State.FAILED.toString()) to TaskStatus.State.FAILED.toString().equals(attempt.getTaskStatus()) and it works well. So I wonder if we should change all comparisons on attempt.getTaskStatus() to put TaskStatus.State.XXX.toString() first? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
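The constant-first comparison proposed above is the standard null-safe pattern in Java; a minimal illustration with a plain string standing in for attempt.getTaskStatus():

```java
public class NullSafeEquals {
    public static void main(String[] args) {
        String taskStatus = null; // attempt.getTaskStatus() can return null

        // Constant-first comparison tolerates the null and returns false.
        boolean failed = "FAILED".equals(taskStatus);
        System.out.println(failed); // false — no NullPointerException

        // The original order dereferences the null and throws:
        try {
            System.out.println(taskStatus.equals("FAILED"));
        } catch (NullPointerException e) {
            System.out.println("NPE, as in HistoryViewer$SummarizedJob");
        }
    }
}
```

Because the string literal is never null, flipping the receiver and argument removes the NPE without changing the comparison's result for non-null statuses.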
[jira] [Updated] (MAPREDUCE-5917) Be able to retrieve configuration keys by index
[ https://issues.apache.org/jira/browse/MAPREDUCE-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5917: Labels: BB2015-05-TBR (was: ) Be able to retrieve configuration keys by index --- Key: MAPREDUCE-5917 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5917 Project: Hadoop Map/Reduce Issue Type: New Feature Components: pipes Reporter: Joe Mudd Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5917.patch The pipes C++ side does not have a configuration key/value pair iterator. It is useful to be able to iterate through all of the configuration keys without having to expose a C++ map iterator since that is specific to the JobConf internals. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3097) archive does not archive if the content specified is a file
[ https://issues.apache.org/jira/browse/MAPREDUCE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3097: Labels: BB2015-05-TBR (was: ) archive does not archive if the content specified is a file --- Key: MAPREDUCE-3097 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3097 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.203.0, 0.20.205.0 Reporter: Arpit Gupta Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-3097.patch The archive command only archives directories. When the content specified is a file, it proceeds with the archive job but does not archive the content. This can be misleading, as the user might think that the archive was successful. We should change it to either throw an error or archive files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5608) Replace and deprecate mapred.tasktracker.indexcache.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5608: Labels: BB2015-05-TBR configuration newbie (was: configuration newbie) Replace and deprecate mapred.tasktracker.indexcache.mb -- Key: MAPREDUCE-5608 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5608 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Akira AJISAKA Labels: BB2015-05-TBR, configuration, newbie Attachments: MAPREDUCE-5608-002.patch, MAPREDUCE-5608.patch In MR2 mapred.tasktracker.indexcache.mb still works for configuring the size of the shuffle service index cache. As the tasktracker no longer exists, we should replace this with something like mapreduce.shuffle.indexcache.mb. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5854) Move the search box in UI from the right side to the left side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5854: Labels: BB2015-05-TBR (was: ) Move the search box in UI from the right side to the left side -- Key: MAPREDUCE-5854 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5854 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.9 Reporter: Jinhui Liu Labels: BB2015-05-TBR Attachments: MAPREDUCE-5854.patch, MAPREDUCE-5854.patch In the UI for the resource manager, job history, and job configuration (this might not be a complete list), there is a search box at the top-right corner of the listed content. This search box is frequently used, but it is often not visible due to right-alignment; extra scrolling is needed to make it visible, which is inconvenient. It would be good to move it to the left side, next to the Show ... Entries drop-down box. In the same spirit, the First|Previous|...|Next|Last controls at the bottom-right corner of the listed content could also be moved to the left side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3202) Integrating Hadoop Vaidya with Job History UI in Hadoop 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3202: Labels: BB2015-05-TBR (was: ) Integrating Hadoop Vaidya with Job History UI in Hadoop 2.0 Key: MAPREDUCE-3202 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3202 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobhistoryserver Affects Versions: 2.0.0-alpha Reporter: vitthal (Suhas) Gogate Assignee: vitthal (Suhas) Gogate Labels: BB2015-05-TBR Attachments: MAPREDUCE-3202.patch, MAPREDUCE-3202.patch Hadoop Vaidya provides a detailed analysis of the M/R job in terms of various execution inefficiencies and the associated remedies that the user can easily understand and fix. This JIRA patch integrates it with the Job History UI under the Hadoop 2.0 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4594: Labels: BB2015-05-TBR (was: ) Add init/shutdown methods to mapreduce Partitioner -- Key: MAPREDUCE-4594 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4594 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Radim Kolar Assignee: Radim Kolar Labels: BB2015-05-TBR Attachments: partitioner1.txt, partitioner2.txt, partitioner2.txt, partitioner3.txt, partitioner4.txt, partitioner5.txt, partitioner6.txt, partitioner6.txt, partitioner7.txt, partitioner8.txt, partitioner9.txt The Partitioner supports only the Configurable API, which can be used for basic initialization in setConf(). The problem is that there is no shutdown function. I propose using standard setup()/cleanup() functions like in mapper/reducer. The use case is that I need to start and stop a Spring context and datagrid client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
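The lifecycle the issue proposes mirrors Mapper/Reducer's setup()/cleanup() hooks. A self-contained sketch of what such a Partitioner could look like (class and method names are illustrative, not the committed MapReduce API):

```java
// Hypothetical Partitioner base class with lifecycle hooks, as the issue
// proposes: setup() for opening external resources (Spring context,
// datagrid client, ...) and cleanup() for releasing them.
abstract class LifecyclePartitioner<K, V> {
    public void setup() {}        // called once before any getPartition()
    public abstract int getPartition(K key, V value, int numPartitions);
    public void cleanup() {}      // called once after the last getPartition()
}

class HashLifecyclePartitioner<K, V> extends LifecyclePartitioner<K, V> {
    private boolean connected;    // stands in for an external client handle

    @Override public void setup() { connected = true; }

    @Override public int getPartition(K key, V value, int numPartitions) {
        if (!connected) throw new IllegalStateException("setup() not called");
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override public void cleanup() { connected = false; }
}

class PartitionerLifecycleDemo {
    public static void main(String[] args) {
        HashLifecyclePartitioner<String, String> p = new HashLifecyclePartitioner<>();
        p.setup();                                         // framework would call this
        System.out.println(p.getPartition("key", "value", 4));
        p.cleanup();                                       // and this, on shutdown
    }
}
```

With the hooks in place, the framework (not user code in setConf()) owns the open/close ordering, which is what makes the shutdown side possible at all.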
[jira] [Updated] (MAPREDUCE-5860) Hadoop pipes Combiner is closed before all of its reduce calls
[ https://issues.apache.org/jira/browse/MAPREDUCE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5860: Labels: BB2015-05-TBR (was: ) Hadoop pipes Combiner is closed before all of its reduce calls -- Key: MAPREDUCE-5860 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5860 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.23.0 Environment: 0.23.0 on 64 bit linux Reporter: Joe Mudd Labels: BB2015-05-TBR Attachments: MAPREDUCE-5860.patch When a Combiner is specified to runTask() its reduce() method may be called after its close() method has been called due to how the Combiner's containing object, CombineRunner, is closed after the TaskContextImpl's reducer member is closed (see TaskContextImpl::closeAll()). I believe the fix is to delegate the Combiner's ownership to CombineRunner, making it responsible for calling the Combiner's close() method and deleting the Combiner instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
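The proposed ownership fix can be sketched as follows: the runner owns the combiner, flushes all pending reduce() calls in its own close(), and only then closes the combiner. This is a simplified self-contained Java model of the C++ pipes arrangement (names mirror the report but the classes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal combiner: reduce() after close() is the bug being modeled.
class Combiner {
    final List<String> reduced = new ArrayList<>();
    boolean closed;

    void reduce(String key) {
        if (closed) throw new IllegalStateException("reduce() called after close()");
        reduced.add(key);
    }

    void close() { closed = true; }
}

// Sketch of the fix: CombineRunner takes ownership of the Combiner and is the
// only code that calls its close(), after flushing every pending reduce().
class CombineRunner {
    private final Combiner combiner;                       // owned by the runner
    private final List<String> pending = new ArrayList<>();

    CombineRunner(Combiner combiner) { this.combiner = combiner; }

    void add(String key) { pending.add(key); }

    void close() {
        for (String key : pending) combiner.reduce(key);   // flush first
        combiner.close();                                  // then close: safe ordering
    }
}
```

Because no other container holds the combiner, nothing can close it out from under the runner, which is the ordering bug the patch addresses.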
[jira] [Updated] (MAPREDUCE-6117) Hadoop ignores yarn.nodemanager.hostname for RPC listeners
[ https://issues.apache.org/jira/browse/MAPREDUCE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6117: Labels: BB2015-05-TBR (was: ) Hadoop ignores yarn.nodemanager.hostname for RPC listeners -- Key: MAPREDUCE-6117 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6117 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, task Affects Versions: 2.2.1, 2.4.1, 2.5.1 Environment: Any mapreduce example with standard cluster. In our case each node has four networks. It is important that all internode communication be done on a specific network. Reporter: Waldyn Benbenek Assignee: Waldyn Benbenek Labels: BB2015-05-TBR Fix For: 2.5.1 Attachments: MapReduce-534.patch Original Estimate: 48h Time Spent: 384h Remaining Estimate: 0h The RPC listeners for an application use the hostname of the node as the binding address of the listener; they ignore yarn.nodemanager.hostname for this. In our setup we want all communication between nodes to be done via the network addresses we specify in yarn.nodemanager.hostname on each node. TaskAttemptListenerImpl.java and MRClientService.java are two places I have found where the default address is used rather than NM_host. The NodeManager hostname should be used for all communication between nodes, including the RPC listeners. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6204) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6204: Labels: BB2015-05-TBR (was: ) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS --- Key: MAPREDUCE-6204 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6204 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 2.6.0 Reporter: sam liu Assignee: sam liu Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6204-1.patch, MAPREDUCE-6204.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)