[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014169#comment-15014169 ]

Lars Hofhansl commented on HBASE-14791:
---------------------------------------

Yeah! :)

> Batch Deletes in MapReduce jobs (0.98)
> --------------------------------------
>
>                 Key: HBASE-14791
>                 URL: https://issues.apache.org/jira/browse/HBASE-14791
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.16
>            Reporter: Lars Hofhansl
>            Assignee: Alex Araujo
>              Labels: mapreduce
>             Fix For: 0.98.17
>
>         Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, HBASE-14791-0.98.patch
>
>
> We found that some of our copy table jobs run for many hours, even when there
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike
> puts) are not batched and hence sent to the other side one by one, causing a
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1,
> 1.2?) issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
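
The issue description above attributes the slow jobs to one network round trip per delete marker. The following is a minimal, self-contained sketch of why buffering helps. It is not the BufferedHTable class from the patch and does not use the HBase client API: `CountingSink`, `MutationBuffer`, `BatchingSketch` and the count-based batch threshold are all hypothetical names and simplifications introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the remote table: each send() models one network round trip. */
class CountingSink {
    int roundTrips = 0;
    int mutationsReceived = 0;

    void send(List<String> batch) {
        roundTrips++;
        mutationsReceived += batch.size();
    }
}

/** Hypothetical write buffer: collects mutations and ships them to the sink in batches. */
class MutationBuffer {
    private final CountingSink sink;
    private final int maxBatchSize;
    private final List<String> pending = new ArrayList<>();

    MutationBuffer(CountingSink sink, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        this.sink = sink;
        this.maxBatchSize = maxBatchSize;
    }

    /** Buffers one mutation and flushes once the (assumed, count-based) threshold is reached. */
    void add(String mutation) {
        pending.add(mutation);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as a single batch; does nothing when the buffer is empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.send(new ArrayList<>(pending));
        pending.clear();
    }
}

class BatchingSketch {
    /** Returns the number of simulated round trips needed to ship the given delete markers. */
    static int roundTripsFor(int deleteMarkers, int batchSize) {
        CountingSink sink = new CountingSink();
        MutationBuffer buffer = new MutationBuffer(sink, batchSize);
        for (int i = 0; i < deleteMarkers; i++) {
            buffer.add("delete-row-" + i);
        }
        buffer.flush(); // ship the final partial batch, as a writer would on close
        return sink.roundTrips;
    }

    public static void main(String[] args) {
        // A batch size of 1 models the unbatched behaviour described in the issue.
        System.out.println("unbatched round trips: " + roundTripsFor(10_000, 1));
        System.out.println("batched round trips:   " + roundTripsFor(10_000, 500));
    }
}
```

With a batch size of 1 the sketch makes one simulated round trip per delete marker; with a batch size of 500 the same 10,000 markers need 20. The explicit final `flush()` matters: without it, any markers left in a partial batch would never be sent.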
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014141#comment-15014141 ]

Alex Araujo commented on HBASE-14791:
-------------------------------------

[~apurtell], [~vik.karma] ran a CopyTable with this patch and the job finished significantly faster: 7.5 minutes vs 8.5 hours without the patch.
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011123#comment-15011123 ]

Alex Araujo commented on HBASE-14791:
-------------------------------------

Those build failures look unrelated.

HBase-0.98-on-Hadoop-1.1 has been failing with a compilation error since #[1132|https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1132/]:
{noformat}
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/hbase-common/src/main/java/org/apache/hadoop/hbase/security/UserProvider.java:[70,49] error: cannot find symbol
{noformat}

HBase-0.98-matrix appears to have unrelated test failures. Most of the failures have been around since #[258|https://builds.apache.org/job/HBase-0.98-matrix/258/]. These are new, but unrelated:

*jdk=latest1.7,label=Hadoop*

java.lang.ExceptionInInitializerError: null
	at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)

hbase-default.xml file seems to be for and old version of HBase (0.98.16), this version is 0.98.17-SNAPSHOT
java.lang.ExceptionInInitializerError: null
	at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
	at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
	at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:120)
	at org.apache.hadoop.hbase.HBaseCommonTestingUtility.<init>(HBaseCommonTestingUtility.java:46)
	at org.apache.hadoop.hbase.TestClassFinder.<clinit>(TestClassFinder.java:57)

Could not initialize class org.apache.hadoop.hbase.TestClassFinder
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.TestClassFinder

Could not initialize class org.apache.hadoop.hbase.util.TestCoprocessorClassLoader
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.util.TestCoprocessorClassLoader

Could not initialize class org.apache.hadoop.hbase.util.TestDynamicClassLoader
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.util.TestDynamicClassLoader

*jdk=latest1.6,label=Hadoop*

java.lang.NoSuchFieldError: in
	at org.apache.hadoop.hbase.codec.CellCodecWithTags$CellDecoder.parseCell(CellCodecWithTags.java:86)
	at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:67)
	at org.apache.hadoop.hbase.codec.TestCellCodecWithTags.testCellWithTag(TestCellCodecWithTags.java:76)

java.lang.NoSuchFieldError: in
	at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:70)
	at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:67)
	at org.apache.hadoop.hbase.codec.TestKeyValueCodec.testOne(TestKeyValueCodec.java:80)
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010921#comment-15010921 ]

Hudson commented on HBASE-14791:
--------------------------------

FAILURE: Integrated in HBase-0.98-matrix #261 (See [https://builds.apache.org/job/HBase-0.98-matrix/261/])
HBASE-14791 Batch Deletes in MapReduce jobs (larsh: rev d25d74e121901d4d9705ae2c94256d947ad0a708)
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/BufferedHTable.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestBufferedHTable.java
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010881#comment-15010881 ]

Hudson commented on HBASE-14791:
--------------------------------

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1134 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1134/])
HBASE-14791 Batch Deletes in MapReduce jobs (larsh: rev d25d74e121901d4d9705ae2c94256d947ad0a708)
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestBufferedHTable.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/BufferedHTable.java
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010034#comment-15010034 ]

Alex Araujo commented on HBASE-14791:
-------------------------------------

Thanks for reviewing and committing!
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009998#comment-15009998 ]

Andrew Purtell commented on HBASE-14791:
----------------------------------------

Don't worry about it. The javadoc check for 0.98 is incorrect/untuned.
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009987#comment-15009987 ]

Lars Hofhansl commented on HBASE-14791:
---------------------------------------

I do not see any java doc warnings pertaining to the files changed in this patch. Is that a dud?
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009962#comment-15009962 ]

Hadoop QA commented on HBASE-14791:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12772837/HBASE-14791-0.98.patch
  against 0.98 branch at commit dadfe7da0484be81ae09ad61f976967b9893c38d.
  ATTACHMENT ID: 12772837

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 28 warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16556//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16556//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16556//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16556//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16556//console

This message is automatically generated.
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009826#comment-15009826 ]

Lars Hofhansl commented on HBASE-14791:
---------------------------------------

+1, thanks [~alexaraujo]
Going to commit now.
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009765#comment-15009765 ]

Alex Araujo commented on HBASE-14791:
-------------------------------------

Based on [~vik.karma]'s perf analysis, this should fix the reported issue. We'll deploy the patch and verify.
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009606#comment-15009606 ]

Andrew Purtell commented on HBASE-14791:
----------------------------------------

Test results look fine. I'll commit this tomorrow unless objection or [~lhofhansl] has additional comment
[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)
[ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009579#comment-15009579 ]

Andrew Purtell commented on HBASE-14791:
----------------------------------------

I think HadoopQA may be MIA at the moment. I'll apply this patch now and run the mapreduce tests locally.

I assume you've determined the patch addresses the reported perf issues satisfactorily [~alexaraujo]?