[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014169#comment-15014169
 ] 

Lars Hofhansl commented on HBASE-14791:
---

Yeah! :)

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-19 Thread Alex Araujo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014141#comment-15014141
 ] 

Alex Araujo commented on HBASE-14791:
-

[~apurtell], [~vik.karma] ran a CopyTable with this patch and the job finished 
significantly faster: 7.5 minutes vs 8.5 hours without the patch.

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-18 Thread Alex Araujo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011123#comment-15011123
 ] 

Alex Araujo commented on HBASE-14791:
-

Those build failures look unrelated. HBase-0.98-on-Hadoop-1.1 has been failing 
with a compilation error since 
#[1132|https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1132/]:

{noformat}
[ERROR] 
/home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/hbase-common/src/main/java/org/apache/hadoop/hbase/security/UserProvider.java:[70,49]
 error: cannot find symbol
{noformat}

HBase-0.98-matrix appears to have unrelated test failures. Most of the failures 
have been around since 
#[258|https://builds.apache.org/job/HBase-0.98-matrix/258/]. These are new, but 
unrelated:

*jdk=latest1.7,label=Hadoop*

java.lang.ExceptionInInitializerError: null
at 
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)

hbase-default.xml file seems to be for and old version of HBase (0.98.16), this 
version is 0.98.17-SNAPSHOT

java.lang.ExceptionInInitializerError: null
at 
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
at 
org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
at 
org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:120)
at 
org.apache.hadoop.hbase.HBaseCommonTestingUtility.(HBaseCommonTestingUtility.java:46)
at 
org.apache.hadoop.hbase.TestClassFinder.(TestClassFinder.java:57)

Could not initialize class org.apache.hadoop.hbase.TestClassFinder
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.hadoop.hbase.TestClassFinder

Could not initialize class 
org.apache.hadoop.hbase.util.TestCoprocessorClassLoader
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.hadoop.hbase.util.TestCoprocessorClassLoader

Could not initialize class org.apache.hadoop.hbase.util.TestDynamicClassLoader
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.hadoop.hbase.util.TestDynamicClassLoader

*jdk=latest1.6,label=Hadoop*

java.lang.NoSuchFieldError: in
at 
org.apache.hadoop.hbase.codec.CellCodecWithTags$CellDecoder.parseCell(CellCodecWithTags.java:86)
at 
org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:67)
at 
org.apache.hadoop.hbase.codec.TestCellCodecWithTags.testCellWithTag(TestCellCodecWithTags.java:76)

java.lang.NoSuchFieldError: in
at 
org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:70)
at 
org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:67)
at 
org.apache.hadoop.hbase.codec.TestKeyValueCodec.testOne(TestKeyValueCodec.java:80)



> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010921#comment-15010921
 ] 

Hudson commented on HBASE-14791:


FAILURE: Integrated in HBase-0.98-matrix #261 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/261/])
HBASE-14791 Batch Deletes in MapReduce jobs (larsh: rev 
d25d74e121901d4d9705ae2c94256d947ad0a708)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/BufferedHTable.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestBufferedHTable.java


> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010881#comment-15010881
 ] 

Hudson commented on HBASE-14791:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1134 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1134/])
HBASE-14791 Batch Deletes in MapReduce jobs (larsh: rev 
d25d74e121901d4d9705ae2c94256d947ad0a708)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestBufferedHTable.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/BufferedHTable.java


> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Alex Araujo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010034#comment-15010034
 ] 

Alex Araujo commented on HBASE-14791:
-

Thanks for reviewing and committing!

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009998#comment-15009998
 ] 

Andrew Purtell commented on HBASE-14791:


Don't worry about it. The javadoc check for 0.98 is incorrect/untuned. 

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009987#comment-15009987
 ] 

Lars Hofhansl commented on HBASE-14791:
---

I do not see any java doc warnings pertaining to the files changed in this 
patch. Is that a dud?

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009962#comment-15009962
 ] 

Hadoop QA commented on HBASE-14791:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12772837/HBASE-14791-0.98.patch
  against 0.98 branch at commit dadfe7da0484be81ae09ad61f976967b9893c38d.
  ATTACHMENT ID: 12772837

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
28 warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16556//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16556//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16556//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16556//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16556//console

This message is automatically generated.

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009826#comment-15009826
 ] 

Lars Hofhansl commented on HBASE-14791:
---

+1, thanks [~alexaraujo]

Going to commit now.

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Alex Araujo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009765#comment-15009765
 ] 

Alex Araujo commented on HBASE-14791:
-

Based on [~vik.karma]'s perf analysis, this should fix the reported issue. 
We'll deploy the patch and verify.

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009606#comment-15009606
 ] 

Andrew Purtell commented on HBASE-14791:


Test results look fine. I'll commit this tomorrow unless objection or 
[~lhofhansl] has additional comment

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009579#comment-15009579
 ] 

Andrew Purtell commented on HBASE-14791:


I think HadoopQA may be MIA at the moment. I'll apply this patch now and run 
the mapreduce tests locally. I assume you've determined the patch addresses the 
reported perf issues satisfactorily [~alexaraujo]? 

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)