[jira] [Updated] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Masatake Iwasaki updated MAPREDUCE-6234:
Summary: TestHighRamJob fails due to the change in MAPREDUCE-5785 (was: MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml)

TestHighRamJob fails due to the change in MAPREDUCE-5785
Key: MAPREDUCE-6234
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, MAPREDUCE-6234.003.patch

TestHighRamJob fails because of this.
{code}
--- T E S T S ---
Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) Time elapsed: 1.102 sec <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
	at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Masatake Iwasaki updated MAPREDUCE-6234:
Attachment: MAPREDUCE-6234.003.patch
Patch 003 fixes the test failure without changing the value of DEFAULT_*_MEMORY_MB.

MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
Key: MAPREDUCE-6234
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, MAPREDUCE-6234.003.patch

TestHighRamJob fails because of this.
{code}
--- T E S T S ---
Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) Time elapsed: 1.102 sec <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
	at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
{code}
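The assertion expects 1024 but reads -1. This suggests the test picked up a non-positive placeholder as a literal memory size.

The sketch below shows a getter that treats a missing or non-positive value as "not set" and falls back to a 1024 MB default. It is hypothetical and is not the content of patch 003:

- The class and method names are invented; this is not MRJobConfig or JobConf.
- It reads from a plain Map rather than a Hadoop Configuration.
- It omits any inference from heap settings that the real code may perform.

```java
import java.util.Map;

// Hypothetical helper for illustration; not Hadoop's MRJobConfig or JobConf API.
final class TaskMemory {
    // The value the failing assertions expect.
    static final int DEFAULT_MEMORY_MB = 1024;

    private TaskMemory() {}

    /**
     * Returns the configured task memory in MB. A missing or non-positive
     * value (such as the -1 placeholder) is treated as "not set" and is
     * replaced by the default instead of being returned as-is.
     */
    static int getMemoryMb(Map<String, String> conf, String key) {
        String raw = conf.get(key);
        int configured = (raw == null) ? -1 : Integer.parseInt(raw.trim());
        return configured > 0 ? configured : DEFAULT_MEMORY_MB;
    }
}
```

For example, with mapreduce.map.memory.mb set to "-1" the sketch returns 1024, while an explicit "2048" is returned unchanged.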
[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313469#comment-14313469 ]
Hadoop QA commented on MAPREDUCE-6174:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697564/MAPREDUCE-6174.v1.txt
against trunk revision af08425.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warning.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//console
This message is automatically generated.

Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
Key: MAPREDUCE-6174
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv2
Affects Versions: 3.0.0, 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
Attachments: MAPREDUCE-6174.v1.txt

Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing similar things with regard to IFile streams.
To make it explicit that InMemoryMapOutput and OnDiskMapOutput differ from third-party implementations, this JIRA will make them subclass a common class (see https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368).
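The proposed refactoring amounts to a template-method parent class. The shared stream handling lives once in the parent, and each subclass only decides where the bytes go.

The sketch below is hypothetical:

- The class names are invented; this is not Hadoop's MapOutput hierarchy.
- A plain CRC32 CheckedInputStream stands in for the IFile checksum streams.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical parent class; the names are illustrative, not Hadoop's API.
abstract class StreamWrappingMapOutput {
    private long lastChecksum;

    // Shared stream code lives here once: wrap the raw stream, let the
    // subclass consume it, then record the checksum of the bytes read.
    public final void shuffle(InputStream raw, long length) throws IOException {
        CheckedInputStream in = new CheckedInputStream(raw, new CRC32());
        doShuffle(in, length);
        lastChecksum = in.getChecksum().getValue();
    }

    // Subclasses only decide where the bytes end up.
    protected abstract void doShuffle(InputStream in, long length) throws IOException;

    public long getLastChecksum() {
        return lastChecksum;
    }
}

// Keeps the map output in a byte array.
class InMemoryOutputSketch extends StreamWrappingMapOutput {
    private byte[] memory = new byte[0];

    @Override
    protected void doShuffle(InputStream in, long length) throws IOException {
        memory = new byte[(int) length];
        int off = 0;
        while (off < memory.length) {
            int n = in.read(memory, off, memory.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    byte[] getMemory() {
        return memory;
    }
}

// Streams the map output to a local file.
class OnDiskOutputSketch extends StreamWrappingMapOutput {
    private final File file;

    OnDiskOutputSketch(File file) {
        this.file = file;
    }

    @Override
    protected void doShuffle(InputStream in, long length) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            byte[] buf = new byte[4096];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("stream ended with " + remaining + " bytes left");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }
}
```

Because shuffle is final, a third-party implementation cannot silently skip the shared wrapping. It has to extend the parent, or implement its own output type separately.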
[jira] [Created] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
Jing Zhao created MAPREDUCE-6248:
Summary: Persist DistCp job id in the staging directory
Key: MAPREDUCE-6248
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao

Currently DistCp acts as a tool, and the corresponding MapReduce job is created and used inside its {{execute}} method. This makes it difficult for external services to query the job's progress and counters. It may be helpful to persist the job id to a file inside the staging directory.
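Below is a minimal sketch of the proposal, assuming the job id is written as plain UTF-8 text to a file in the staging directory. Three things are assumptions for illustration:

- the file name job.id;
- the class name;
- the use of java.nio on a local path instead of Hadoop's FileSystem API on the real staging directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: the file name and the local-path I/O are assumptions;
// a real version would write to the staging directory via Hadoop's FileSystem.
final class JobIdFile {
    static final String FILE_NAME = "job.id";

    private JobIdFile() {}

    // Writes (or overwrites) the job id file inside the staging directory.
    static Path persist(Path stagingDir, String jobId) throws IOException {
        Files.createDirectories(stagingDir);
        Path file = stagingDir.resolve(FILE_NAME);
        Files.write(file, jobId.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // Returns the persisted job id, or empty if none has been written yet.
    static Optional<String> read(Path stagingDir) throws IOException {
        Path file = stagingDir.resolve(FILE_NAME);
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Optional.of(content.trim());
    }
}
```

DistCp would call persist right after submitting its job. An external service that knows the staging directory could then call read to obtain the id and query progress and counters with whatever client it already uses.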
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramtin updated MAPREDUCE-6246:
Description:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
was:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;;)) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.

DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
Key: MAPREDUCE-6246
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 2.4.1
Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x; Platform: xSeries, pSeries; Browser: Firefox, IE; Security Settings: No Security, Flat file, LDAP, PAM; File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
Original Estimate: 24h
Remaining Estimate: 24h

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
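The change described above can be sketched as follows. This is an illustration, not the attached patch:

- The three-argument constructQuery signature is hypothetical.
- The product name is passed in as a string so the sketch runs without a database. In real code it would come from the JDBC call connection.getMetaData().getDatabaseProductName().

```java
import java.util.Locale;

// Hypothetical helper illustrating the MAPREDUCE-6246 idea; not the committed code.
final class InsertQueryBuilder {
    private InsertQueryBuilder() {}

    /**
     * Builds "INSERT INTO table (f1,f2) VALUES (?,?)" and appends the ANSI
     * terminator ';' unless the database product name starts with "DB2".
     */
    static String constructQuery(String table, String[] fieldNames, String dbProductName) {
        if (fieldNames == null) {
            throw new IllegalArgumentException("Field names may not be null");
        }
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            query.append(" (").append(String.join(",", fieldNames)).append(")");
        }
        query.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(i == 0 ? "?" : ",?");
        }
        query.append(")");
        // DB2 servers typically report product names beginning with "DB2"
        // (for example "DB2/LINUXX8664"); DB2's default settings reject the ';'.
        boolean isDb2 = dbProductName != null
                && dbProductName.toUpperCase(Locale.ROOT).startsWith("DB2");
        if (!isDb2) {
            query.append(';');
        }
        return query.toString();
    }
}
```

Take the WORD and COUNT columns that appear in the DB2 error reported on this issue, with a made-up table name wordcount. A DB2 product name yields INSERT INTO wordcount (WORD,COUNT) VALUES (?,?), while any other product keeps the trailing semicolon.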
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramtin updated MAPREDUCE-6246:
Description:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
was:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.

DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
Key: MAPREDUCE-6246
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 2.4.1
Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x; Platform: xSeries, pSeries; Browser: Firefox, IE; Security Settings: No Security, Flat file, LDAP, PAM; File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
Original Estimate: 24h
Remaining Estimate: 24h

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
[jira] [Commented] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312642#comment-14312642 ]
Tsuyoshi OZAWA commented on MAPREDUCE-6244:
Cancelling for the previous comment.

Hadoop examples when run without an argument, gives ERROR instead of just usage info
Key: MAPREDUCE-6244
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.23.0, trunk-win, 2.6.0
Reporter: Robert Justice
Assignee: Abhishek Kapoor
Priority: Minor
Attachments: HADOOP-8834.patch, HADOOP-8834.patch

The Hadoop sort example should not print an ERROR when run with no parameters; it should only display the usage information.
{code}
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort
ERROR: Wrong number of parameters: 0 instead of 2.
sort [-m <maps>] [-r <reduces>] [-inFormat <input format class>] [-outFormat <output format class>] [-outKey <output key class>] [-outValue <output value class>] [-totalOrder <pcnt> <num samples> <max splits>] <input> <output>
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
{code}
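Below is a minimal sketch of the requested behaviour, assuming a program that expects two positional arguments:

- With no arguments it prints only the usage text.
- A wrong, non-zero count is still reported as an error.

This is not Hadoop's Sort or ToolRunner code. The class name and the shortened usage string are hypothetical. Unlike the real example, the sketch counts raw arguments instead of parsing the sort options first.

```java
import java.io.PrintStream;

// Hypothetical argument check for an example program that expects two
// positional arguments; not Hadoop's Sort or ToolRunner code.
final class ExampleArgs {
    static final String USAGE = "Usage: sort [options] <input> <output>";

    private ExampleArgs() {}

    /**
     * Returns true when the program should run. With no arguments only the
     * usage text is printed; any other wrong count is reported as an error.
     */
    static boolean check(String[] args, int expected, PrintStream err) {
        if (args.length == 0) {
            err.println(USAGE);
            return false;
        }
        if (args.length != expected) {
            err.println("ERROR: Wrong number of parameters: " + args.length
                    + " instead of " + expected + ".");
            err.println(USAGE);
            return false;
        }
        return true;
    }
}
```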
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramtin updated MAPREDUCE-6246:
Attachment: MAPREDUCE-6246.patch

DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
Key: MAPREDUCE-6246
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 2.4.1
Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x; Platform: xSeries, pSeries; Browser: Firefox, IE; Security Settings: No Security, Flat file, LDAP, PAM; File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
Labels: DB2, mapreduce
Fix For: 2.4.1
Attachments: MAPREDUCE-6246.patch
Original Estimate: 24h
Remaining Estimate: 24h

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramtin updated MAPREDUCE-6246:
Description:
DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC drivers, it fails with the following error:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
	at com.ibm.db2.jcc.am.fd.a(fd.java:739)
	at com.ibm.db2.jcc.am.fd.a(fd.java:60)
	at com.ibm.db2.jcc.am.fd.a(fd.java:127)
was:
DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC drivers, it fails with the following error:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
	at com.ibm.db2.jcc.am.fd.a(fd.java:739)
	at com.ibm.db2.jcc.am.fd.a(fd.java:60)
	at com.ibm.db2.jcc.am.fd.a(fd.java:127)
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.

DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
Key: MAPREDUCE-6246
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 2.4.1
Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x; Platform: xSeries, pSeries; Browser: Firefox, IE; Security Settings: No Security, Flat file, LDAP, PAM; File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
Labels: DB2, mapreduce
Fix For: 2.4.1
Attachments: MAPREDUCE-6246.patch
Original Estimate: 24h
Remaining Estimate: 24h

DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC drivers, it fails with the following error:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
	at com.ibm.db2.jcc.am.fd.a(fd.java:739)
	at com.ibm.db2.jcc.am.fd.a(fd.java:60)
	at com.ibm.db2.jcc.am.fd.a(fd.java:127)
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312791#comment-14312791 ]
Akira AJISAKA commented on MAPREDUCE-6223:
Thanks [~varun_saxena] for updating the patch. +1 pending [~kasha]'s review. The findbugs warnings look unrelated to the patch.

TestJobConf#testNegativeValueForTaskVmem failures
Key: MAPREDUCE-6223
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Varun Saxena
Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, MAPREDUCE-6223.003.patch

{code}
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< FAILURE! - in org.apache.hadoop.conf.TestJobConf
testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time elapsed: 0.089 sec <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
{code}
[jira] [Created] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
ramtin created MAPREDUCE-6246:
Summary: DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
Key: MAPREDUCE-6246
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 2.4.1
Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x; Platform: xSeries, pSeries; Browser: Firefox, IE; Security Settings: No Security, Flat file, LDAP, PAM; File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312583#comment-14312583 ]
Kannan Rajah commented on MAPREDUCE-6237:
[~ozawa] Is the patch alright? Is there anything else I need to do to get this committed?

DBRecordReader is not thread safe
Key: MAPREDUCE-6237
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.5.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch

DBInputFormat.createDBRecordReader reuses JDBC connections across instances of DBRecordReader. This is not a good idea; we should be creating a separate connection for each instance. If performance is a concern, we should use connection pooling instead.
I looked at DBOutputFormat.getRecordReader. It actually creates a new Connection object for each DBRecordReader. So can we just change DBInputFormat to create a new Connection every time? The connection reuse code was added as part of the fix for the connection leak bug in MAPREDUCE-1443. Is there any reason for caching the connection?
We observed this issue in a customer setup where they were reading data from MySQL using Pig. According to the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the same database connection. The first DBRecordReader extracts the first record from MySQL just fine, but then closes the shared connection. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection, which fails. If we set mapred.map.tasks to 1, the query succeeds.
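The failure described here follows from shared ownership. Each reader closes its connection when it finishes, so a connection shared between two readers is closed out from under the second one.

Below is a minimal sketch of the proposed fix: every reader gets its own connection from a factory. A connection pool could sit behind the same Supplier. All names are hypothetical. FakeConnection stands in for java.sql.Connection so that the example runs without a JDBC driver; a real version would obtain connections from DriverManager or a pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for java.sql.Connection so the sketch needs no driver.
final class FakeConnection implements AutoCloseable {
    private boolean closed;

    String query(String sql) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return "row for " + sql;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical reader that, like the DBRecordReader described above,
// closes its connection when it is closed.
final class ReaderSketch implements AutoCloseable {
    private final FakeConnection connection;

    ReaderSketch(FakeConnection connection) {
        this.connection = connection;
    }

    String readOne(String sql) {
        return connection.query(sql);
    }

    @Override
    public void close() {
        connection.close();
    }
}

final class Readers {
    private Readers() {}

    // One fresh connection per reader, so closing one reader cannot break
    // another. A pool could be plugged in as the Supplier.
    static List<ReaderSketch> openPerSplit(int splits, Supplier<FakeConnection> connections) {
        List<ReaderSketch> readers = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            readers.add(new ReaderSketch(connections.get()));
        }
        return readers;
    }
}
```

Passing the same FakeConnection to two readers reproduces the reported failure. Once the first reader is closed, the second reader's readOne throws IllegalStateException. With openPerSplit, both readers succeed.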
[jira] [Resolved] (MAPREDUCE-5381) Support graceful decommission of tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli resolved MAPREDUCE-5381.
Resolution: Won't Fix
Hardly any development is happening in 1.x now. I am closing this in favor of YARN's YARN-914. Please reopen if need be.

Support graceful decommission of tasktracker
Key: MAPREDUCE-5381
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5381
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1
Affects Versions: 1.2.0
Reporter: Luke Lu
Assignee: Binglin Chang
Attachments: MAPREDUCE-5381-graceful-decomm.v1.patch

When TTs are decommissioned for non-fault reasons (capacity change, etc.), it is desirable to minimize the impact on running jobs. Currently, if a TT is decommissioned, all running tasks on the TT need to be rescheduled on other TTs. Furthermore, finished map tasks whose map output has not yet been fetched by the job's reducers will need to be rerun as well. We propose introducing a mechanism to optionally decommission a tasktracker gracefully.
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramtin updated MAPREDUCE-6246:
Description:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;;)) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
was:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.

DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
Key: MAPREDUCE-6246
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 2.4.1
Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x; Platform: xSeries, pSeries; Browser: Firefox, IE; Security Settings: No Security, Flat file, LDAP, PAM; File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
Original Estimate: 24h
Remaining Estimate: 24h

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;;)) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 with the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default (OFF) setting, so turning this feature ON makes them error-prone.
I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO statement without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsuyoshi OZAWA updated MAPREDUCE-6237:
Fix Version/s: 2.6.1

Multiple mappers with DBInputFormat don't work because of reusing conections
Key: MAPREDUCE-6237
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
Fix For: 2.6.1
Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch

DBInputFormat.createDBRecordReader reuses JDBC connections across instances of DBRecordReader. This is not a good idea; we should be creating a separate connection for each instance. If performance is a concern, we should use connection pooling instead.
I looked at DBOutputFormat.getRecordReader. It actually creates a new Connection object for each DBRecordReader. So can we just change DBInputFormat to create a new Connection every time? The connection reuse code was added as part of the fix for the connection leak bug in MAPREDUCE-1443. Is there any reason for caching the connection?
We observed this issue in a customer setup where they were reading data from MySQL using Pig. According to the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the same database connection. The first DBRecordReader extracts the first record from MySQL just fine, but then closes the shared connection. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection, which fails. If we set mapred.map.tasks to 1, the query succeeds.
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsuyoshi OZAWA updated MAPREDUCE-6237:
Resolution: Fixed
Status: Resolved (was: Patch Available)

Multiple mappers with DBInputFormat don't work because of reusing conections
Key: MAPREDUCE-6237
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
Fix For: 2.6.1
Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch

DBInputFormat.createDBRecordReader reuses JDBC connections across instances of DBRecordReader. This is not a good idea; we should be creating a separate connection for each instance. If performance is a concern, we should use connection pooling instead.
I looked at DBOutputFormat.getRecordReader. It actually creates a new Connection object for each DBRecordReader. So can we just change DBInputFormat to create a new Connection every time? The connection reuse code was added as part of the fix for the connection leak bug in MAPREDUCE-1443. Is there any reason for caching the connection?
We observed this issue in a customer setup where they were reading data from MySQL using Pig. According to the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the same database connection. The first DBRecordReader extracts the first record from MySQL just fine, but then closes the shared connection. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection, which fails. If we set mapred.map.tasks to 1, the query succeeds.
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: -- Description: In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone. was: In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone. I changed the DBOutputFormat class to check the database product name from the connection object and, if it is DB2, generate the INSERT INTO command without the semicolon (;). This technique is already used in the DBInputFormat class to generate different SELECT statements for Oracle and MySQL databases.
DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 - Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Original Estimate: 24h Remaining Estimate: 24h -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312636#comment-14312636 ] Tsuyoshi OZAWA commented on MAPREDUCE-6237: --- Committed this to trunk, branch-2, and branch-2.6. Thanks [~rkannan82] for your contribution and thanks [~jira.shegalov] for your review. [~rkannan82], by the way, would you mind creating a follow-up JIRA to use a thread pool, based on Gera's suggestion? Multiple mappers with DBInputFormat don't work because of reusing conections Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.6.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Fix For: 2.6.1 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
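The issue description names connection pooling as the alternative if opening one connection per reader proves too expensive. The sketch below is a hypothetical, deliberately minimal pool (SimplePool is not a Hadoop or JDBC class): readers borrow an object and hand it back instead of closing it, so two concurrent borrowers never share one instance. It omits everything a production JDBC pool needs (validation, size limits, eviction, timeouts), and it is not necessarily what the requested follow-up JIRA would implement.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal pool: borrowers get exclusive use of an object until they
// release it, so one reader can never close or reuse another reader's connection.
class SimplePool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;
  private int created = 0;

  SimplePool(Supplier<T> factory) { this.factory = factory; }

  // Hands out an idle object if one exists, otherwise creates a new one.
  synchronized T borrow() {
    T obj = idle.pollFirst();
    if (obj == null) {
      obj = factory.get();
      created++;
    }
    return obj;
  }

  // Returns an object to the pool so the next borrower can reuse it.
  synchronized void release(T obj) { idle.addFirst(obj); }

  // Number of objects the factory has created so far.
  synchronized int createdCount() { return created; }
}
```

In a reader, `release` would take the place of closing the connection, which keeps the reuse benefit of the original caching without the shared-state bug.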
[jira] [Updated] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6244: -- Status: Open (was: Patch Available) Hadoop examples when run without an argument, gives ERROR instead of just usage info Key: MAPREDUCE-6244 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0, 0.23.0, trunk-win Reporter: Robert Justice Assignee: Abhishek Kapoor Priority: Minor Attachments: HADOOP-8834.patch, HADOOP-8834.patch The Hadoop sort example should not print an ERROR when run with no parameters; it should only display the usage information. {code}
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort
ERROR: Wrong number of parameters: 0 instead of 2.
sort [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-totalOrder pcnt num samples max splits] input output
Generic options supported are
-conf configuration file     specify an application configuration file
-D property=value            use value for given property
-fs local|namenode:port      specify a namenode
-jt local|jobtracker:port    specify a job tracker
-files comma separated list of files    specify comma separated files to be copied to the map reduce cluster
-libjars comma separated list of jars    specify comma separated jar files to include in the classpath.
-archives comma separated list of archives    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
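A hypothetical sketch of the behavior this report asks for, written without Hadoop dependencies: SortUsageDemo and checkArgs are illustrative names, `args` stands for the positional arguments left after option parsing, and the exit code 2 is an assumption. With no arguments only the usage line is printed; the ERROR message is kept for a genuinely wrong argument count.

```java
import java.io.PrintStream;

// Hypothetical sketch: print plain usage for an empty command line, and keep the
// "ERROR:" prefix only for a wrong, non-zero number of positional arguments.
class SortUsageDemo {
  static final String USAGE =
      "sort [-m maps] [-r reduces] [-inFormat input format class] "
      + "[-outFormat output format class] [-outKey output key class] "
      + "[-outValue output value class] [-totalOrder pcnt num samples max splits] "
      + "input output";

  // Returns a process exit code (assumed: 0 = ok, 2 = usage problem).
  static int checkArgs(String[] args, PrintStream err) {
    if (args.length == 0) {
      err.println(USAGE); // no arguments: the user just wants to know how to call it
      return 2;
    }
    if (args.length != 2) {
      err.println("ERROR: Wrong number of parameters: " + args.length + " instead of 2.");
      err.println(USAGE);
      return 2;
    }
    return 0;
  }
}
```

Keeping a non-zero exit code for the no-argument case is a design choice: the output becomes friendlier, while scripts can still tell that no job was run.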
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312590#comment-14312590 ] Tsuyoshi OZAWA commented on MAPREDUCE-6237: --- +1; the findbugs warnings look unrelated to your patch. I'll commit this to branch-2 and trunk shortly. DBRecordReader is not thread safe - Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Affects Version/s: 2.6.0 Hadoop Flags: Reviewed DBRecordReader is not thread safe - Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.6.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-6237: -- Summary: Multiple mappers with DBInputFormat don't work because of reusing conections (was: DBRecordReader is not thread safe) Multiple mappers with DBInputFormat don't work because of reusing conections Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.6.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313278#comment-14313278 ] Hadoop QA commented on MAPREDUCE-6246: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697540/MAPREDUCE-6246.patch against trunk revision af08425. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//console This message is automatically generated.
DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 - Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Labels: DB2, mapreduce Fix For: 2.4.1 Attachments: MAPREDUCE-6246.patch Original Estimate: 24h Remaining Estimate: 24h DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC drivers, it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312817#comment-14312817 ] Hudson commented on MAPREDUCE-6237: --- FAILURE: Integrated in Hadoop-Yarn-trunk #833 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/833/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt Multiple mappers with DBInputFormat don't work because of reusing conections Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.6.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Fix For: 2.6.1 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6174: -- Status: Patch Available (was: Open) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput. --- Key: MAPREDUCE-6174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.6.0, 3.0.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: MAPREDUCE-6174.v1.txt Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing similar things with regard to IFile streams. To make it explicit that InMemoryMapOutput and OnDiskMapOutput are different from 3rd-party implementations, this JIRA will make them subclasses of a common class (see https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6174: -- Attachment: MAPREDUCE-6174.v1.txt [~jira.shegalov], I have uploaded a patch for this issue. Would you please have a look? Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput. --- Key: MAPREDUCE-6174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: MAPREDUCE-6174.v1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
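The refactoring this issue proposes follows the template-method pattern: a common parent owns the shared stream handling, and each subclass implements only the part that differs. In this hypothetical Java sketch, StreamWrappedOutput, InMemoryOutput and OnDiskOutput are illustrative names, not the classes in the attached patch, and a CRC32 CheckedInputStream stands in for the IFile stream handling that both map outputs share.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical parent: owns the stream wrapping both outputs need and delegates
// only "where do the bytes go" to subclasses.
abstract class StreamWrappedOutput {
  private long checksum;

  // Final, so the shared wrapping cannot drift apart between subclasses.
  final void shuffle(InputStream raw) throws IOException {
    CheckedInputStream in = new CheckedInputStream(raw, new CRC32());
    doShuffle(in); // subclass-specific copy
    checksum = in.getChecksum().getValue();
  }

  long checksum() { return checksum; }

  protected abstract void doShuffle(InputStream in) throws IOException;

  // Shared copy loop used by both subclasses.
  protected static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
  }
}

// Keeps the data in memory (stand-in for InMemoryMapOutput).
class InMemoryOutput extends StreamWrappedOutput {
  final ByteArrayOutputStream memory = new ByteArrayOutputStream();

  @Override
  protected void doShuffle(InputStream in) throws IOException {
    copy(in, memory);
  }
}

// Writes the data to a caller-supplied stream such as a FileOutputStream
// (stand-in for OnDiskMapOutput).
class OnDiskOutput extends StreamWrappedOutput {
  private final OutputStream disk;

  OnDiskOutput(OutputStream disk) { this.disk = disk; }

  @Override
  protected void doShuffle(InputStream in) throws IOException {
    copy(in, disk);
    disk.flush();
  }
}
```

Both subclasses end up with the same checksum for the same input, because the checksumming lives only in the parent.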
[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312122#comment-14312122 ] kumar ranganathan commented on MAPREDUCE-5903: -- I am also facing the same exception when enabling LDAP with Windows Active Directory in Hadoop 2.6.0. If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase Key: MAPREDUCE-5903 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.4.0 Environment: hadoop: 2.4.0.2.1.2.0 Reporter: Victor Kim Priority: Critical Labels: shuffle I have a 3-node cluster: 1 ResourceManager and 3 NodeManagers, with Kerberos enabled and hdfs, yarn and mapred principals/keytabs. The ResourceManager and NodeManagers run as the yarn user, using the yarn Kerberos principal. Use case 1: WordCount, job submitted using the yarn UGI (i.e. the superuser, the one that has a Kerberos principal on all boxes). Result: the job completes successfully. Use case 2: WordCount, job submitted using LDAP user impersonation via the yarn UGI. Result: the map tasks complete successfully, but the reduce task fails with a ShuffleError caused by java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below). The impersonation use case used to work on earlier versions without YARN (with JT/TT). I found a similar issue involving Kerberos authentication here (https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ). In addition, https://issues.apache.org/jira/browse/MAPREDUCE-4030 is marked as resolved, which is not the case when Kerberos authentication is enabled. The exception trace from the YarnChild JVM: 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-207: --- Status: Open (was: Patch Available) Cancelling patch, as it no longer applies. Computing Input Splits on the MR Cluster Key: MAPREDUCE-207 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207 Project: Hadoop Map/Reduce Issue Type: New Feature Components: applicationmaster, mrv2 Reporter: Philip Zeyliger Assignee: Gera Shegalov Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, MAPREDUCE-207.v07.patch Instead of computing the input splits as part of job submission, Hadoop could have a separate job task type that computes the input splits, therefore allowing that computation to happen on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312951#comment-14312951 ] Hudson commented on MAPREDUCE-6237: --- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #99 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/99/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt Multiple mappers with DBInputFormat don't work because of reusing conections Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.6.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Fix For: 2.6.1 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4413: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1, committed to trunk. Thanks! MR lib dir contains jdiff (which is gpl) Key: MAPREDUCE-4413 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Nemon Lou Priority: Critical Fix For: 3.0.0 Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch A tarball built from trunk contains the following: ./share/hadoop/mapreduce/lib/jdiff-1.0.9.jar jdiff is licensed under GPLv2, so we need to exclude it from the build artifact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313047#comment-14313047 ] Hudson commented on MAPREDUCE-6237: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java Multiple mappers with DBInputFormat don't work because of reusing conections Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.6.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Fix For: 2.6.1 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313044#comment-14313044 ] Hudson commented on MAPREDUCE-4413: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/]) MAPREDUCE-4413. MR lib dir contains jdiff (which is gpl) (Nemon Lou via aw) (aw: rev aab459c904bf2007c5b230af8c058793935faf89) * hadoop-mapreduce-project/CHANGES.txt * hadoop-assemblies/src/main/resources/assemblies/hadoop-mapreduce-dist.xml MR lib dir contains jdiff (which is gpl) Key: MAPREDUCE-4413 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Nemon Lou Priority: Critical Fix For: 3.0.0 Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: Fix Version/s: 2.4.1 Labels: DB2 mapreduce (was: ) Status: Patch Available (was: Open) I changed the DBOutputFormat class to check the database product name from the connection object. If the database is DB2, it now generates the INSERT INTO query without the trailing semicolon (;). The DBInputFormat class already uses this technique to generate different SELECT statements for Oracle and MySQL databases. DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Labels: mapreduce, DB2 Fix For: 2.4.1 Original Estimate: 24h Remaining Estimate: 24h In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone.
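ramtin's approach is to look up the database product name and then decide whether to append the terminator. It can be sketched as follows. This is an illustrative sketch, not the attached MAPREDUCE-6246.patch. `InsertQuerySketch` and its three-argument `constructQuery` are hypothetical. The assumption that DB2 product names start with `DB2` should be checked against the actual driver. In real code, the name would come from the standard JDBC call `connection.getMetaData().getDatabaseProductName()`. Here it is passed in as a string so the sketch runs without a database.

```java
import java.util.Locale;

/**
 * Sketch of DB2-aware INSERT generation. Class, method, and parameter names are
 * hypothetical; the query shape follows the text visible in the DB2 error message.
 */
class InsertQuerySketch {

    /** Assumption: DB2 reports a JDBC product name starting with "DB2" (e.g. "DB2/LINUXX8664"). */
    static boolean isDb2(String databaseProductName) {
        return databaseProductName != null
                && databaseProductName.trim().toUpperCase(Locale.ROOT).startsWith("DB2");
    }

    /** Builds "INSERT INTO table (a,b) VALUES (?,?)", appending ';' only for non-DB2 databases. */
    static String constructQuery(String table, String[] fieldNames, String databaseProductName) {
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            query.append(" (").append(String.join(",", fieldNames)).append(")");
        }
        query.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(i == 0 ? "?" : ",?");
        }
        query.append(")");
        if (!isDb2(databaseProductName)) {
            query.append(";"); // keep the existing terminator for other databases
        }
        return query.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"word", "count"};
        System.out.println(constructQuery("wordcount", fields, "MySQL"));          // ...VALUES (?,?);
        System.out.println(constructQuery("wordcount", fields, "DB2/LINUXX8664")); // ...VALUES (?,?)
    }
}
```

Keeping the semicolon for every other product preserves the existing behaviour outside DB2.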
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: Description: DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC driver, it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone. was: In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone.
DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Labels: DB2, mapreduce Fix For: 2.4.1 Attachments: MAPREDUCE-6246.patch Original Estimate: 24h Remaining Estimate: 24h DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC driver, it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone.
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramtin updated MAPREDUCE-6246: Description: DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC driver, it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) In the DBOutputFormat class, the constructQuery method generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 standard statement terminator, but this feature is disabled (OFF) by default in IBM DB2. It can be turned ON for DB2 by using the -t option (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default setting (OFF), so turning this feature ON makes them error-prone.
was: DBOutputFormat is used to write the output of MapReduce jobs to a database. When it is used with the DB2 JDBC driver, it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)
[jira] [Created] (MAPREDUCE-6247) Use DBCP connection pooling in DBInputFormat
Kannan Rajah created MAPREDUCE-6247: Summary: Use DBCP connection pooling in DBInputFormat Key: MAPREDUCE-6247 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6247 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.6.0, 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Priority: Minor As part of MAPREDUCE-6237, we removed the caching of DB connections. [~jira.shegalov] and [~ozawa] suggested that we use DBCP connection pooling instead.
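Pooling addresses the trade-off raised in MAPREDUCE-6237. Each reader holds its own connection while it works, but connections are reused rather than re-created. The generic `SimplePool` below is a hypothetical, single-threaded illustration of borrow/return semantics only. It is not the Apache Commons DBCP API. In pooling libraries such as DBCP, calling `close()` on a pooled connection typically returns it to the pool instead of closing the physical connection; `giveBack` plays that role here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Minimal, single-threaded object pool showing borrow/return semantics.
 * This is NOT Apache Commons DBCP; a real pool also needs thread safety,
 * validation of idle connections, size limits, and eviction.
 */
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Hands out an idle object if one exists, otherwise creates a new one. */
    T borrow() {
        T obj = idle.pollFirst();
        if (obj == null) {
            obj = factory.get();
            created++;
        }
        return obj;
    }

    /** Puts the object back for reuse instead of destroying it. */
    void giveBack(T obj) {
        idle.addFirst(obj);
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        SimplePool<Object> pool = new SimplePool<>(Object::new);
        // Two readers active at the same time each get their own object...
        Object first = pool.borrow();
        Object second = pool.borrow();
        System.out.println(first != second); // true
        pool.giveBack(first);
        pool.giveBack(second);
        // ...and a later reader reuses an idle one instead of creating a third.
        pool.borrow();
        System.out.println(pool.createdCount()); // 2
    }
}
```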
[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated MAPREDUCE-6242: Status: Patch Available (was: Open) Progress report log is incredibly excessive in application master Key: MAPREDUCE-6242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.4.0 Reporter: Jian Fang Assignee: Varun Saxena Attachments: MAPREDUCE-6242.001.patch We saw extremely excessive logging in the application master of a long-running job with many task attempts. In some cases the log write rate was around 1 MB/sec. Most of the log entries came from progress reports such as the following:
2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.15605757
2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.4108217
2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_02_0 is : 0.06634143
2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.6506
2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_01_0 is : 0.21723115
It looks like the report interval is controlled by PROGRESS_INTERVAL, a variable hard-coded to 3 seconds in the class org.apache.hadoop.mapred.Task. We should allow users to set an appropriate progress interval for their applications.
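A configurable interval, replacing the hard-coded constant, could look roughly like the sketch below. `ProgressThrottle` and the key `example.task.progress-report.interval-ms` are hypothetical names chosen for illustration, not Hadoop configuration properties. Only the 3000 ms default comes from the issue. Sending fewer progress reports would also mean fewer `Progress of TaskAttempt ...` lines in the application master log.

```java
import java.util.Map;

/**
 * Sketch of a configurable progress-report interval. The configuration key is
 * hypothetical; the 3000 ms default mirrors the hard-coded 3-second
 * PROGRESS_INTERVAL described in the issue.
 */
class ProgressThrottle {
    static final String INTERVAL_KEY = "example.task.progress-report.interval-ms"; // hypothetical key
    static final long DEFAULT_INTERVAL_MS = 3000L;

    private final long intervalMs;
    private boolean reportedBefore = false;
    private long lastReportMs;

    ProgressThrottle(Map<String, String> conf) {
        String raw = conf.get(INTERVAL_KEY);
        long parsed = (raw == null) ? DEFAULT_INTERVAL_MS : Long.parseLong(raw.trim());
        // Non-positive values fall back to the default instead of reporting constantly.
        this.intervalMs = parsed > 0 ? parsed : DEFAULT_INTERVAL_MS;
    }

    long intervalMs() {
        return intervalMs;
    }

    /** Returns true (and records the time) if a progress report is due at nowMs. */
    boolean shouldReport(long nowMs) {
        if (!reportedBefore || nowMs - lastReportMs >= intervalMs) {
            reportedBefore = true;
            lastReportMs = nowMs;
            return true;
        }
        return false;
    }
}
```

A long-running job could then raise the interval (for example to 10000 ms) without affecting other applications.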
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313167#comment-14313167 ] Karthik Kambatla commented on MAPREDUCE-6223: Patch looks mostly good to me. Nit: I would leave the test for negative values, but update the asserts to reflect the expected behavior. TestJobConf#testNegativeValueForTaskVmem failures Key: MAPREDUCE-6223 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Gera Shegalov Assignee: Varun Saxena Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, MAPREDUCE-6223.003.patch {code} Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec FAILURE! - in org.apache.hadoop.conf.TestJobConf testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf) Time elapsed: 0.089 sec FAILURE! java.lang.AssertionError: expected:1024 but was:-1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111) {code}
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313096#comment-14313096 ] Hudson commented on MAPREDUCE-6237: FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/100/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated MAPREDUCE-6242: Status: Open (was: Patch Available)
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313139#comment-14313139 ] Hudson commented on MAPREDUCE-6237: FAILURE: Integrated in Hadoop-Hdfs-trunk #2031 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2031/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313146#comment-14313146 ] Hudson commented on MAPREDUCE-6237: FAILURE: Integrated in Hadoop-Mapreduce-trunk #2050 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2050/]) MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313148#comment-14313148 ] Karthik Kambatla commented on MAPREDUCE-6234: Thanks for working on this, folks. As the description of the config shows, it is hard to pick a single value for DEFAULT_MAP_MEMORY_MB. 1024 seemed the most appropriate value because that is the value we fall back to. I like Gera's proposal of adding a helper method to get the default value. However, I wonder if that would just translate to calling {{JobConf#getMemoryRequired}} on the default conf. MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml Key: MAPREDUCE-6234 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix, mrv2 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch TestHighRamJob fails because of this. {code} --- T E S T S --- Running org.apache.hadoop.mapred.gridmix.TestHighRamJob Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) Time elapsed: 1.102 sec FAILURE! java.lang.AssertionError: expected:1024 but was:-1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) {code}
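The failing assertion (`expected:1024 but was:-1`) is consistent with a lookup that returns -1 when the memory setting is left unset, in a test that expects the 1024 MB fallback. The hypothetical `MemoryDefaults` helper below only illustrates the distinction between a raw lookup and a resolved lookup. It is neither Gera's proposed helper nor `JobConf#getMemoryRequired`, whose real logic may take other settings (such as the task heap size) into account. `mapreduce.map.memory.mb` is the standard property name. The "non-positive means unset" rule is an assumption made for this sketch.

```java
import java.util.Map;

/**
 * Hypothetical helper contrasting a raw lookup (where -1 means "not set") with a
 * resolved lookup that falls back to 1024 MB. Hadoop's real logic may also
 * consider other settings; this sketch deliberately omits them.
 */
class MemoryDefaults {
    static final String MAP_MEMORY_KEY = "mapreduce.map.memory.mb";
    static final int UNSET = -1;
    static final int FALLBACK_MEMORY_MB = 1024;

    /** Returns the configured value, or -1 when the key is absent. */
    static int rawMemoryMb(Map<String, String> conf, String key) {
        String raw = conf.get(key);
        return (raw == null) ? UNSET : Integer.parseInt(raw.trim());
    }

    /** Treats absent or non-positive values as unset and substitutes the fallback. */
    static int resolvedMemoryMb(Map<String, String> conf, String key) {
        int value = rawMemoryMb(conf, key);
        return value > 0 ? value : FALLBACK_MEMORY_MB;
    }
}
```

A test written against the raw lookup would see -1 for an unset key, while a test written against the resolved lookup would see 1024.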
[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313414#comment-14313414 ] Jing Zhao commented on MAPREDUCE-6248: Yes, that would actually be even better! I will upload a patch for this later. Persist DistCp job id in the staging directory Key: MAPREDUCE-6248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Jing Zhao Assignee: Jing Zhao Currently DistCp acts as a tool, and the corresponding MapReduce job is created and used inside its {{execute}} method. This makes it difficult for external services to query the job's progress and counters. It may be helpful to persist the job ID to a file inside the staging directory.
[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313411#comment-14313411 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-6248: Why not have a public API in DistCp and use that programmatically instead of persisting IDs into files and then reading them?
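The file-based idea from the issue description can be sketched with `java.nio` on a local directory. `JobIdFile` and the file name `job.id` are hypothetical. An actual DistCp change would presumably use Hadoop's `FileSystem` API and the real staging path. Vinod's comment suggests a public DistCp API instead, which would avoid the file round-trip altogether. This sketch only shows the mechanics of persisting the ID and reading it back.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/**
 * Sketch of persisting a job ID into a file under a staging directory so that an
 * external process can look it up later. Uses java.nio on the local filesystem;
 * the file name is hypothetical.
 */
class JobIdFile {
    static final String FILE_NAME = "job.id"; // hypothetical name

    /** Writes to a temporary file first, then renames it into place. */
    static Path write(Path stagingDir, String jobId) throws IOException {
        Files.createDirectories(stagingDir);
        Path tmp = stagingDir.resolve(FILE_NAME + ".tmp");
        Files.write(tmp, jobId.getBytes(StandardCharsets.UTF_8));
        // An atomic rename means readers never see a half-written file.
        return Files.move(tmp, stagingDir.resolve(FILE_NAME), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Returns the persisted job ID, or empty if none has been written yet. */
    static Optional<String> read(Path stagingDir) throws IOException {
        Path file = stagingDir.resolve(FILE_NAME);
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
    }
}
```

Writing to a temporary file and then renaming it means that a concurrent reader sees either no file or the complete ID, never a partial write.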
[jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives
Liu Xiao created MAPREDUCE-6249: Summary: Streaming task will not untar tgz uploaded with -archives Key: MAPREDUCE-6249 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 2.5.2 Environment: hadoop-2.5.2 hadoop-streaming-2.5.2.jar Reporter: Liu Xiao When writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine to the task working directory, but it was not untarred as the documentation says. I've searched a lot without any luck. Here is the command I used to start the streaming task with hadoop-2.5.2:
    hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
    -files mapper.sh -archives /home/hadoop/tmp/test.tgz#test \
    -D mapreduce.job.maps=1 \
    -D mapreduce.job.reduces=1 \
    -input /test/test.txt \
    -output /res/ \
    -mapper "sh mapper.sh" \
    -reducer cat
and mapper.sh:
    cat > /dev/null
    ls -l test
    exit 0
test.tgz contains two files, test.1.txt and test.2.txt, created with:
    echo abcd > test.1.txt
    echo efgh > test.2.txt
    tar zcvf test.tgz test.1.txt test.2.txt
The output from the above task is:
    lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz
but the desired output would be something like this:
    -rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
    -rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt
So why has test.tgz not been untarred automatically, as the documentation says? Or is there another way to get the tgz untarred?
[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312745#comment-14312745 ] Kannan Rajah commented on MAPREDUCE-6237: Created MAPREDUCE-6247 to track connection pooling.
[jira] [Commented] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313534#comment-14313534 ] Hadoop QA commented on MAPREDUCE-6234: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697632/MAPREDUCE-6234.003.patch against trunk revision 260b5e3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-gridmix. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//console This message is automatically generated.