[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103096#comment-14103096 ]

Hudson commented on MAPREDUCE-6012:
-----------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java

> DBInputSplit creates invalid ranges on Oracle
> ---------------------------------------------
>
>                 Key: MAPREDUCE-6012
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6012
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>   Affects Versions: 1.2.1, 2.4.1
>            Reporter: Julien Serdaru
>            Assignee: Wei Yan
>             Fix For: 1.3.0, 2.6.0
>
>         Attachments: HADOOP-9530.patch, MAPREDUCE-6012-2-branch2.patch, MAPREDUCE-6012-branch-1.patch
>
>
> The DBInputFormat on Oracle does not create valid ranges.
> The method getSplit line 263 is as follows:
>           split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);
> So the first split will have a start value of 0 (0*chunkSize).
> However, the OracleDBRecordReader, line 84 is as follows:
>   if (split.getLength() > 0 && split.getStart() > 0){
> Since the start value of the first range is equal to 0, we will skip the
> block that partitions the input set. As a result, one of the map tasks will
> process the entire data set, rather than the partition.
> I'm assuming the fix is trivial and would involve removing the second check
> in the if block.
> Also, I believe the OracleDBRecordReader paging query is incorrect.
> Line 92 should read:
>   query.append(" ) WHERE dbif_rno > ").append(split.getStart());
> instead of (note > instead of >=)
>   query.append(" ) WHERE dbif_rno >= ").append(split.getStart());
> Otherwise some rows will be ignored and some counted more than once.
> A map/reduce job that counts the number of rows based on a predicate will
> highlight the incorrect behavior.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
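
The effect of the two conditions quoted in the report can be checked with a small simulation. The sketch below is not the Hadoop code: `Range`, `makeSplits`, `rowsRead` and `totalRead` are hypothetical names, rows are modelled as the numbers 1..N (Oracle's ROWNUM is 1-based), and the upper bound of each page is assumed to be the split's end offset. It only counts how many rows each simulated reader would return under the reported and the proposed predicates.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation only; Range and the methods below are hypothetical
// and are not the Hadoop DBInputSplit / OracleDBRecordReader classes.
class SplitRangeDemo {

    /** Hypothetical stand-in for a split with a start and an end row offset. */
    static final class Range {
        final long start;
        final long end;
        Range(long start, long end) { this.start = start; this.end = end; }
        long length() { return end - start; }
    }

    /** Builds splits with the formula quoted in the report. */
    static List<Range> makeSplits(long chunkSize, int numSplits) {
        List<Range> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new Range(i * chunkSize, (i * chunkSize) + chunkSize));
        }
        return splits;
    }

    /**
     * Counts how many of the rows numbered 1..totalRows one reader returns.
     * requireStartPositive models the "getStart() > 0" check;
     * inclusiveLower models ">=" (true) versus ">" (false) on dbif_rno.
     */
    static long rowsRead(Range r, long totalRows,
                         boolean requireStartPositive, boolean inclusiveLower) {
        boolean paged = r.length() > 0 && (!requireStartPositive || r.start > 0);
        if (!paged) {
            return totalRows; // no paging predicate: the reader sees every row
        }
        long count = 0;
        for (long rno = 1; rno <= totalRows; rno++) {
            boolean aboveStart = inclusiveLower ? rno >= r.start : rno > r.start;
            if (rno <= r.end && aboveStart) {
                count++;
            }
        }
        return count;
    }

    /** Sums the rows returned by all readers for chunkSize * numSplits rows. */
    static long totalRead(long chunkSize, int numSplits,
                          boolean requireStartPositive, boolean inclusiveLower) {
        long totalRows = chunkSize * numSplits;
        long sum = 0;
        for (Range r : makeSplits(chunkSize, numSplits)) {
            sum += rowsRead(r, totalRows, requireStartPositive, inclusiveLower);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("reported conditions: " + totalRead(10, 3, true, true));
        System.out.println("proposed conditions: " + totalRead(10, 3, false, false));
    }
}
```

With 30 rows in three splits of 10, the reported conditions return 52 rows in total (the first reader sees all 30, and each later reader re-reads its predecessor's last row), while the proposed conditions return exactly 30.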
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102652#comment-14102652 ]

Hudson commented on MAPREDUCE-6012:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102403#comment-14102403 ]

Hudson commented on MAPREDUCE-6012:
-----------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #651 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/651/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101588#comment-14101588 ]

Hudson commented on MAPREDUCE-6012:
-----------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6086 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6086/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101167#comment-14101167 ]

Ray Chiang commented on MAPREDUCE-6012:
---------------------------------------

Thanks Wei. Glad to see this fixed.
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092202#comment-14092202 ]

Karthik Kambatla commented on MAPREDUCE-6012:
---------------------------------------------

+1

Spoke to Wei offline to understand the issue better, and his fix makes sense to me.
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078756#comment-14078756 ]

Hadoop QA commented on MAPREDUCE-6012:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12658550/MAPREDUCE-6012-2-branch2.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078711#comment-14078711 ]

Hadoop QA commented on MAPREDUCE-6012:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12658549/MAPREDUCE-6012-branch-1.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4776//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078706#comment-14078706 ]

Hadoop QA commented on MAPREDUCE-6012:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12658539/HADOOP-9530.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

                  org.apache.hadoop.mapreduce.lib.db.TestDbClasses

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078702#comment-14078702 ]

zhihai xu commented on MAPREDUCE-6012:
--------------------------------------

[~ywskycn]'s patch looks good to me. His patch uses getEnd() instead of getStart() + getLength() in the SQL query, which simplifies the old code.
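
The getEnd() simplification mentioned in the review comment above might look roughly like the following sketch. It is not the committed patch: `PagingQuerySketch` and `page` are hypothetical names, and the shape of the outer SQL is an assumption made for illustration. Only the `dbif_rno` predicate and the use of an end offset in place of start + length come from this thread.

```java
// Minimal sketch, assuming a ROWNUM-based wrapper around an inner query.
// Class and method names are hypothetical; this is not OracleDBRecordReader.
class PagingQuerySketch {

    /**
     * Wraps innerQuery so that only rows with row number in (start, end]
     * are returned. An empty range leaves the query unchanged.
     */
    static String page(String innerQuery, long start, long end) {
        if (end - start <= 0) {
            return innerQuery; // nothing to page
        }
        StringBuilder query = new StringBuilder();
        // Assumed wrapper: number the rows of the inner query as dbif_rno.
        query.append("SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( ");
        query.append(innerQuery);
        // Upper bound uses the end offset directly instead of start + length.
        query.append(" ) a WHERE rownum <= ").append(end);
        // Exclusive lower bound, as proposed in the report.
        query.append(" ) WHERE dbif_rno > ").append(start);
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(page("SELECT id FROM t ORDER BY id", 0, 10));
    }
}
```

Because the lower bound is exclusive and the upper bound inclusive, a first split starting at 0 needs no special case: it selects row numbers 1 through its end offset.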