[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103096#comment-14103096
 ] 

Hudson commented on MAPREDUCE-6012:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java


> DBInputSplit creates invalid ranges on Oracle
> ---------------------------------------------
>
>              Key: MAPREDUCE-6012
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-6012
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
> Affects Versions: 1.2.1, 2.4.1
>         Reporter: Julien Serdaru
>         Assignee: Wei Yan
>          Fix For: 1.3.0, 2.6.0
>
>      Attachments: HADOOP-9530.patch, MAPREDUCE-6012-2-branch2.patch,
>                   MAPREDUCE-6012-branch-1.patch
>
>
> The DBInputFormat on Oracle does not create valid ranges.
> The method getSplit, line 263, is as follows:
>   split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);
> So the first split will have a start value of 0 (0*chunkSize).
> However, the OracleDBRecordReader, line 84, is as follows:
>   if (split.getLength() > 0 && split.getStart() > 0){
> Since the start value of the first range is equal to 0, we will skip the
> block that partitions the input set. As a result, one of the map tasks will
> process the entire data set, rather than its partition.
> I'm assuming the fix is trivial and would involve removing the second check
> in the if block.
> Also, I believe the OracleDBRecordReader paging query is incorrect.
> Line 92 should read:
>   query.append(" ) WHERE dbif_rno > ").append(split.getStart());
> instead of (note > instead of >=)
>   query.append(" ) WHERE dbif_rno >= ").append(split.getStart());
> Otherwise some rows will be ignored and some counted more than once.
> A map/reduce job that counts the number of rows based on a predicate will
> highlight the incorrect behavior.
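
The effect described in the report can be checked without a database. The following is a minimal sketch, not the Hadoop code: it simulates Oracle's 1-based ROWNUM over a table of `totalRows` rows and lists which row numbers each split would read. The class and method names are hypothetical, and the upper bound `rno <= start + length` is an assumption, since the report only quotes the guard and the lower-bound predicate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of ROWNUM paging; ROWNUM starts at 1. */
final class SplitRangeSketch {

    /** Rows read by one split under the logic quoted in the report. */
    static List<Integer> reportedRows(long start, long length, int totalRows) {
        // Reported guard: paging is applied only when start > 0.
        boolean paged = length > 0 && start > 0;
        List<Integer> rows = new ArrayList<>();
        for (int rno = 1; rno <= totalRows; rno++) {
            // Assumed upper bound (start + length); reported lower bound is ">=".
            if (!paged || (rno <= start + length && rno >= start)) {
                rows.add(rno);
            }
        }
        return rows;
    }

    /** Rows read when the start > 0 check is dropped and the lower bound is ">". */
    static List<Integer> proposedRows(long start, long length, int totalRows) {
        boolean paged = length > 0;
        List<Integer> rows = new ArrayList<>();
        for (int rno = 1; rno <= totalRows; rno++) {
            if (!paged || (rno <= start + length && rno > start)) {
                rows.add(rno);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int totalRows = 10;
        long chunkSize = 5;
        for (int i = 0; i < 2; i++) {
            // Same arithmetic as the quoted DBInputSplit construction.
            long start = i * chunkSize;
            System.out.println("split " + i
                + " reported=" + reportedRows(start, chunkSize, totalRows)
                + " proposed=" + proposedRows(start, chunkSize, totalRows));
        }
    }
}
```

With 10 rows and a chunk size of 5, the reported logic lets the first split read all 10 rows and the second read rows 5 through 10, so rows are counted more than once. With the two changes suggested in the report, the splits read rows 1 through 5 and 6 through 10.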



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102652#comment-14102652
 ] 

Hudson commented on MAPREDUCE-6012:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java



[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102403#comment-14102403
 ] 

Hudson commented on MAPREDUCE-6012:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #651 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/651/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java



[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-08-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101588#comment-14101588
 ] 

Hudson commented on MAPREDUCE-6012:
---

FAILURE: Integrated in Hadoop-trunk-Commit #6086 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6086/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java



[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-08-18 Thread Ray Chiang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101167#comment-14101167
 ] 

Ray Chiang commented on MAPREDUCE-6012:
---

Thanks Wei.  Glad to see this fixed.


[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-08-10 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092202#comment-14092202
 ] 

Karthik Kambatla commented on MAPREDUCE-6012:
-

+1

Spoke to Wei offline to understand the issue better, and his fix makes sense to 
me. 


[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-07-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078756#comment-14078756
 ] 

Hadoop QA commented on MAPREDUCE-6012:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658550/MAPREDUCE-6012-2-branch2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-07-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078711#comment-14078711
 ] 

Hadoop QA commented on MAPREDUCE-6012:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658549/MAPREDUCE-6012-branch-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4776//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-07-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078706#comment-14078706
 ] 

Hadoop QA commented on MAPREDUCE-6012:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658539/HADOOP-9530.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

  org.apache.hadoop.mapreduce.lib.db.TestDbClasses

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-6012) DBInputSplit creates invalid ranges on Oracle

2014-07-29 Thread zhihai xu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078702#comment-14078702
 ] 

zhihai xu commented on MAPREDUCE-6012:
--

[~ywskycn]'s patch looks good to me. It uses getEnd() instead of 
getStart() + getLength() in the SQL query, which simplifies the old code.
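
To make the shape of such a paging query concrete, here is a minimal sketch of building one from a split's start and end. It is not the patch itself: the class name, the method name and the outer SELECT wrapper are assumptions following the usual Oracle ROWNUM paging idiom. Only the `dbif_rno` predicate and the use of an end value in place of start plus length come from this thread.

```java
/** Hypothetical builder for a ROWNUM-paged query; not the actual Hadoop class. */
final class OraclePagingQuerySketch {

    /**
     * Wraps innerQuery so that only rows with start < ROWNUM <= end are returned.
     * Returns innerQuery unchanged when the range is empty.
     */
    static String pagedQuery(String innerQuery, long start, long end) {
        if (end - start <= 0) {
            return innerQuery;
        }
        StringBuilder query = new StringBuilder();
        // Assumed wrapper: expose ROWNUM under the alias dbif_rno.
        query.append("SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( ");
        query.append(innerQuery);
        // The upper bound uses the end value directly, not start + length.
        query.append(" ) a WHERE rownum <= ").append(end);
        // Exclusive lower bound, as proposed in the issue description.
        query.append(" ) WHERE dbif_rno > ").append(start);
        return query.toString();
    }

    public static void main(String[] args) {
        // Table and column names are placeholders.
        System.out.println(pagedQuery("SELECT id FROM t ORDER BY id", 0, 5));
        System.out.println(pagedQuery("SELECT id FROM t ORDER BY id", 5, 10));
    }
}
```

Passing the end value directly keeps the upper bound in one place, and a start of 0 needs no special handling because `dbif_rno > 0` matches every row up to the bound.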

