[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553312#comment-14553312
 ] 

Hudson commented on HBASE-13071:


FAILURE: Integrated in HBase-TRUNK #6499 (See 
[https://builds.apache.org/job/HBase-TRUNK/6499/])
HBASE-13071 synchronous scanner -- cache size-in-bytes bug fix (stack: rev 
7f2b33dbbf90474a8f73e4d38ea8f6817ee3dcdb)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientAsyncPrefetchScanner.java


 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel
 Fix For: 2.0.0

 Attachments: 99.eshcar.png, HBASE-13071-0_98.patch, 
 HBASE-13071-BRANCH-1.patch, HBASE-13071-trunk-bug-fix.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, 
 HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, Releasenote-13071.txt, 
 gc.delay.png, gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, 
 hits.png, latency.delay.png, latency.png, network.png


 A scan operation iterates over all rows of a table or over a subrange of the
 table. The synchronous way in which data is served at the client side limits
 the speed at which the application traverses the data: it increases the
 overall processing time, and may cause high variance in the time the
 application waits for the next piece of data.
 The scanner next() method at the client side invokes an RPC to the
 regionserver and then stores the results in a cache. The application can
 specify how many rows are transmitted per RPC; by default this is set to
 100 rows.
 The cache can be viewed as a producer-consumer queue, where the hbase
 client pushes data to the queue and the application consumes it.
 Currently this queue is synchronous, i.e., blocking. More specifically, when
 the application has consumed all the data in the cache, so that the cache is
 empty, the hbase client retrieves additional data from the server and
 refills the cache. During this time the application is blocked.
 Under the assumption that the application's processing time can be balanced
 against the time it takes to retrieve the data, an asynchronous approach can
 reduce the time the application spends waiting for data.
 We attach a design document.
 We also have a patch that is based on a private branch, and some evaluation
 results for this code.
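
 As an illustration of the description above, here is a minimal Java sketch of
 the asynchronous producer-consumer idea. It is not the code from the attached
 patches: the class PrefetchingScannerSketch, its methods, the END marker, and
 the in-memory iterator of row batches that stands in for regionserver RPCs are
 all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: not HBase's ClientAsyncPrefetchScanner.
final class PrefetchingScannerSketch {
  // Marker the producer enqueues once the source has no more batches.
  private static final Object END = new Object();

  private final BlockingQueue<Object> cache;
  private boolean exhausted;

  // Each element of `batches` stands in for the rows returned by one RPC.
  // cacheCapacity must be positive.
  PrefetchingScannerSketch(Iterator<List<String>> batches, int cacheCapacity) {
    cache = new LinkedBlockingQueue<>(cacheCapacity);
    Thread prefetcher = new Thread(() -> {
      try {
        while (batches.hasNext()) {
          for (String row : batches.next()) {
            cache.put(row); // producer blocks only while the cache is full
          }
        }
        cache.put(END);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    prefetcher.setDaemon(true);
    prefetcher.start();
  }

  // Returns the next row, or null once the scan is exhausted.
  String next() {
    if (exhausted) {
      return null;
    }
    try {
      Object item = cache.take(); // consumer blocks only while the cache is empty
      if (item == END) {
        exhausted = true;
        return null;
      }
      return (String) item;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for rows", e);
    }
  }

  // Convenience helper: scans every batch and collects the rows in order.
  static List<String> scanAll(List<List<String>> batches, int cacheCapacity) {
    PrefetchingScannerSketch scanner =
        new PrefetchingScannerSketch(batches.iterator(), cacheCapacity);
    List<String> rows = new ArrayList<>();
    for (String row = scanner.next(); row != null; row = scanner.next()) {
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<List<String>> batches = List.of(List.of("row1", "row2"), List.of("row3"));
    System.out.println(scanAll(batches, 2));
  }
}
```

 In this sketch the application thread blocks in next() only when the
 background thread has not yet delivered a row, which is the situation the
 asynchronous approach tries to make rare.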



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-18 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548148#comment-14548148
]

stack commented on HBASE-13071:
---

[~eshcar] Thanks for finding the issue. Please open a new issue. This one is
dense enough already. Thank you (FYI, you do not need to clean up old patches
-- thanks).


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-18 Thread Eshcar Hillel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547627#comment-14547627
 ] 

Eshcar Hillel commented on HBASE-13071:
---

Hi [~stack],

Attached 2 new patches for branch-1 and 0.98.
While preparing these patches I discovered that in the asynchronous scanner the
cache byte-size variable is not updated in one of the places where an item is
polled from the cache. I am therefore also attaching a patch that fixes this
bug in trunk; it is a small local fix in ClientAsyncPrefetchScanner.java (this
is already fixed in the patches for branch-1 and 0.98).

Will you be able to apply the patches?

Also, do we need to open a new Jira for the refguide patch, or is it OK to post
it here?

Thanks,
Eshcar
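
To illustrate the kind of bookkeeping the comment above refers to, here is a
minimal Java sketch of a cache that keeps a byte-size counter in step with
both additions and polls. It is not the actual ClientAsyncPrefetchScanner code
or the attached fix: the class SizeTrackedCacheSketch and its methods are
hypothetical, and results are modelled as plain byte arrays. The assumption is
that a counter which is only ever increased would make a size-based check such
as hasRoom() report the cache as full even after items were consumed.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not the actual ClientAsyncPrefetchScanner code.
final class SizeTrackedCacheSketch {
  private final Queue<byte[]> cache = new ConcurrentLinkedQueue<>();
  private final AtomicLong cacheSizeInBytes = new AtomicLong();

  // Producer side: enqueue a result and account for its size.
  void add(byte[] result) {
    cache.add(result);
    cacheSizeInBytes.addAndGet(result.length);
  }

  // Consumer side: every path that removes an item must also shrink the counter.
  byte[] poll() {
    byte[] result = cache.poll();
    if (result != null) {
      cacheSizeInBytes.addAndGet(-result.length);
    }
    return result;
  }

  long sizeInBytes() {
    return cacheSizeInBytes.get();
  }

  // A prefetcher could use a check like this to decide whether to fetch more.
  boolean hasRoom(long maxSizeInBytes) {
    return sizeInBytes() < maxSizeInBytes;
  }

  public static void main(String[] args) {
    SizeTrackedCacheSketch cache = new SizeTrackedCacheSketch();
    cache.add(new byte[3]);
    cache.add(new byte[5]);
    cache.poll();
    System.out.println(cache.sizeInBytes());
  }
}
```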


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-14 Thread Edward Bortnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543715#comment-14543715
 ] 

Edward Bortnikov commented on HBASE-13071:
--

We'll be happy to get guidance as per how to contribute to refguide. 

 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel
 Fix For: 2.0.0

 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, 
 HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, Releasenote-13071.txt, 
 gc.delay.png, gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, 
 hits.png, latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-14 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544043#comment-14544043
]

stack commented on HBASE-13071:
---

bq. We'll be happy to get guidance as per how to contribute to refguide.

Make a patch for the refguide -- it is at src/main/asciidoc/ -- in a new issue?
You'll have to figure out where you think it sits best (perf, scan?).
Copy/pasting your release note would make good candidate text.


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-13 Thread Edward Bortnikov (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541789#comment-14541789
]

Edward Bortnikov commented on HBASE-13071:
--

Release note attached; please advise if a different format is expected.
We are working on the blog post and will complete it next week; hopefully that
should not preclude the commit.

Thanks [~stack] for volunteering to commit. Which release will this feature
become a candidate for - 1.1, 2.0, or both?

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch,
HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, Releasenote-13071.txt,
gc.delay.png, gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png,
hits.png, latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-13 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541822#comment-14541822
]

Hadoop QA commented on HBASE-13071:
---

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12732546/Releasenote-13071.txt
against master branch at commit 220ac141bfcea7798faa5f73295ec61d8b173af9.
ATTACHMENT ID: 12732546

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+0 tests included{color}. The patch appears to be a
documentation, build,
or dev-support patch that doesn't require tests.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14035//console

This message is automatically generated.


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542396#comment-14542396
 ] 

Hudson commented on HBASE-13071:


SUCCESS: Integrated in HBase-TRUNK #6481 (See 
[https://builds.apache.org/job/HBase-TRUNK/6481/])
HBASE-13071 Hbase Streaming Scan Feature (stack: rev 
86b91997d0590fcf00634e9e90216e77da607fd2)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientSimpleScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientAsyncPrefetchScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientSmallScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ReversedClientScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientSmallScanner.java
* hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientScanner.java
* hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientSmallReversedScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableConfiguration.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-13 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542120#comment-14542120
 ] 

stack commented on HBASE-13071:
---

Hopefully this will go into refguide when HBASE-13681 gets attention.


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-13 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542093#comment-14542093
]

stack commented on HBASE-13071:
---

Very nice release note. I took the contents and inserted them in the release
note section in this JIRA (for the future: when you hit 'edit' and scroll
down, there is a 'release note' textbox). I added a sentence at the end about
more load on the server and YMMV.

Is there a place in the refguide where we should shove your release note? Just
say and I will take care of it.

I committed to master, so 2.0. I tried the branch-1 patch but it failed to
apply. If you update it, I'll apply it to branch-1.

Thank you for the persistence.

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch,
HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, Releasenote-13071.txt,
gc.delay.png, gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png,
hits.png, latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-11 Thread Eshcar Hillel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538419#comment-14538419
 ] 

Eshcar Hillel commented on HBASE-13071:
---

This patch added 2 checkstyle errors: (1) I forgot to remove a redundant import
in ClientSimpleScanner, and (2) I added a line to the method loadCache() in
ClientScanner, which caused it to exceed the method length limit (151 lines).

 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, 
 HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
 gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
 latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-11 Thread Edward Bortnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538227#comment-14538227
 ] 

Edward Bortnikov commented on HBASE-13071:
--

Thanks [~stack]. 

We'll post release notes to the jira tomorrow (is this the right destination?), 
and a blog post a tad later (probably, early next week), including the perf 
results. 


 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, 
 HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
 gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
 latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-10 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537523#comment-14537523
]

stack commented on HBASE-13071:
---

+1 on this last patch. At a minimum, it's a nice illustration of what is
possible. I'll commit in a day or so. Anyone else want to have a look?

A few questions [~eshcar].

Do the changes in table-scoped configuration -- the changes in
TableConfiguration -- make sense? Having Scan defaults -- a client-side op --
in the Configuration seems a little overbroad. I see no harm done since it is
off by default.

Is the checkstyle error from your report? No harm, I can check on commit so
don't worry about it.

I suggest you write up a fat release note. Release note is probably how folks
will learn of this feature (unless you do a blog post or something -- which
might make sense since you have those nice perf findings -- have you redone
them for this patch that is now size-based?). If you have done the size-based
perf analysis, suggest you link to that in the release notes too.

Nice work.

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch,
HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537161#comment-14537161
 ] 

Hadoop QA commented on HBASE-13071:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12731786/HBASE-13071_trunk_rebase_2.0.patch
  against master branch at commit 5a2ca43fa16a95d8db67e5a3d8b48e4d3f3a9aeb.
  ATTACHMENT ID: 12731786

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1898 checkstyle errors (more than the master's current 1896 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13996//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13996//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13996//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13996//console

This message is automatically generated.

 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, 
 HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
 gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
 latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-10 Thread Edward Bortnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537322#comment-14537322
 ] 

Edward Bortnikov commented on HBASE-13071:
--

Community - please review the last patch (covers all the previous requests).

 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, 
 HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
 gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
 latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-06 Thread Eshcar Hillel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531326#comment-14531326
 ] 

Eshcar Hillel commented on HBASE-13071:
---

Aligning with the size-in-bytes basis for scan requests -
here is a snippet of the code to set the cache capacity and to determine 
whether or not to invoke prefetch when next() is called in 
ClientAsyncPrefetchScanner

{code}
  // double buffer - double cache size
  private int calcCacheCapacity() {
    int capacity = Integer.MAX_VALUE;
    if (caching >= 0 && caching < (Integer.MAX_VALUE / 2)) {
      capacity = caching * 2 + 1;
    }
    if (capacity == Integer.MAX_VALUE) {
      capacity = (int) (maxScannerResultSize / ESTIMATED_SINGLE_RESULT_SIZE);
    }
    return capacity;
  }

  // prefetch only while the cache is below half of both the row and byte limits
  private boolean prefetchCondition() {
    return (getCacheCount() < getCountThreshold())
        && (getCacheSizeInBytes() < getSizeThreshold());
  }

  private int getCountThreshold() {
    return cacheCapacity / 2;
  }

  private long getSizeThreshold() {
    return maxScannerResultSize / 2;
  }
{code}

where cacheSizeInBytes is an AtomicInteger that is updated whenever the cache 
changes (increased when adding results to the cache, decreased when removing them).
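
To make the thresholds concrete, here is a minimal standalone sketch of the same sizing rules with sample numbers plugged in. It is not the HBase class: the fields are passed in as parameters, and the value of ESTIMATED_SINGLE_RESULT_SIZE (1024 bytes) is a placeholder assumed only for illustration.

```java
// Standalone sketch of the sizing rules in the snippet above; not the HBase class.
final class PrefetchThresholdSketch {
    // Assumed placeholder value, chosen only so the arithmetic can be run.
    static final long ESTIMATED_SINGLE_RESULT_SIZE = 1024L;

    // Double buffer: room for two batches of `caching` rows, plus one.
    // Falls back to a size-derived capacity when no usable row count is set.
    static int calcCacheCapacity(int caching, long maxScannerResultSize) {
        int capacity = Integer.MAX_VALUE;
        if (caching >= 0 && caching < (Integer.MAX_VALUE / 2)) {
            capacity = caching * 2 + 1;
        }
        if (capacity == Integer.MAX_VALUE) {
            capacity = (int) (maxScannerResultSize / ESTIMATED_SINGLE_RESULT_SIZE);
        }
        return capacity;
    }

    // Prefetch only while the cache is below half of both limits.
    static boolean prefetchCondition(int cacheCount, long cacheSizeInBytes,
                                     int cacheCapacity, long maxScannerResultSize) {
        return cacheCount < cacheCapacity / 2
            && cacheSizeInBytes < maxScannerResultSize / 2;
    }

    public static void main(String[] args) {
        long maxSize = 2L * 1024 * 1024; // a 2 MB byte bound, chosen for illustration
        int capacity = calcCacheCapacity(100, maxSize);
        System.out.println("capacity = " + capacity);
        // Few rows, few bytes: below both thresholds.
        System.out.println(prefetchCondition(50, 100_000L, capacity, maxSize));
        // Row count above half the capacity.
        System.out.println(prefetchCondition(150, 100_000L, capacity, maxSize));
        // Byte size above half the bound.
        System.out.println(prefetchCondition(50, 1_500_000L, capacity, maxSize));
    }
}
```

With caching set to 100 the capacity is 201 and the count threshold is 100 rows, so the prefetch is triggered only when fewer than 100 rows and less than 1 MB are cached.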

 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf, 
 HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
 gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
 latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-03 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526155#comment-14526155
]

stack commented on HBASE-13071:
---

bq. 1. No problem having a per-scan parameter. The assumption is that scans
should be big in order for the feature to be efficient.

Good. Can say in javadoc that scan needs to be big to get the benefit.

bq. 2. No problem moving to the size-in-bytes parameter. The API should be
identical for synchronous and asynchronous clients.

Good.

Bytes is what we do now rather than rows, since this work
https://blogs.apache.org/hbase/

bq. In the optimistic interpretation, the client would directly relay the API
parameter to the server.

What parameter and why go to the server?

What's wrong w/ optimistic other than the client carrying extra data? I'd say go
for optimistic.

I'd be fine that it 'costs' more on the server as long as there is tangible
benefit.

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf,
HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-03 Thread Edward Bortnikov (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525825#comment-14525825
]

Edward Bortnikov commented on HBASE-13071:
--

1. No problem having a per-scan parameter. The assumption is that scans should
be big in order for the feature to be efficient.
2. No problem moving to the size-in-bytes parameter. The API should be
identical for synchronous and asynchronous clients.

Let's agree on the upper-bound parameter semantics (whether rows or bytes).
Should it be conservative or optimistic? In the optimistic interpretation, the
client would directly relay the API parameter to the server. A new prefetch
request is issued when 50% of the old buffer has been consumed, so when the new buffer
arrives the old one might not be released yet. This overlap should be short but
the bound semantics are soft (best-effort). In the conservative interpretation,
the client would adapt the API parameters, and issue requests for less data, to
prevent any overflow. For legacy scans, there was no difference because the
prefetch and computation parts did not overlap.

Which approach would be better?
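
The trade-off can be put into numbers with a small worked example. This is only a sketch under stated assumptions, not code from the patch: it takes the 50% trigger at face value, assumes the worst case in which none of the remaining half is consumed before the new batch arrives, and uses hypothetical method names.

```java
// Worked arithmetic for the two readings of the bound; not taken from the patch.
// Assumption: the prefetch fires when half of the previous batch is still unread,
// and in the worst case none of that half is consumed before the new batch lands.
final class BoundSemanticsSketch {
    // Bytes resident at the client when a batch of `requestBytes` arrives on top
    // of the unread half of the previous, equally sized batch.
    static long worstCaseResident(long requestBytes) {
        return requestBytes / 2 + requestBytes;
    }

    // Optimistic: relay the user's bound to the server unchanged.
    static long optimisticRequest(long boundBytes) {
        return boundBytes;
    }

    // Conservative: shrink the request to about 2/3 of the bound,
    // so that q/2 + q stays within the bound.
    static long conservativeRequest(long boundBytes) {
        return boundBytes * 2 / 3;
    }

    public static void main(String[] args) {
        long bound = 2L * 1024 * 1024; // a 2 MB bound, chosen only for illustration
        System.out.println("optimistic worst case:   "
            + worstCaseResident(optimisticRequest(bound)) + " bytes");
        System.out.println("conservative worst case: "
            + worstCaseResident(conservativeRequest(bound)) + " bytes");
    }
}
```

Under these assumptions the optimistic reading can briefly hold up to 1.5 times the bound (3 MB for a 2 MB bound), while the conservative reading stays within the bound at the price of smaller requests and therefore more RPCs.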

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf,
HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-05-01 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523816#comment-14523816
]

stack commented on HBASE-13071:
---

bq. I'd suggest leaving the use of this feature manual rather than expecting
the system to auto-tune.

Ok. But you would have to turn it on globally for the client, right? You can't
do it on a per-scan basis. How hard would it be to enable this facility on a
per-scan basis? It would be easier to commit this feature if it were not a
choice between being globally on or off.

This patch also does sizing using the (row) caching count. Caching is going to
go away as a first-class attribute of Scan in hbase 1.1+ as we have moved to a
size-in-bytes basis for our scan requests; size-in-bytes would make more sense
for sizing the client-side cache too, I'd say. Any plans for moving off the row
caching basis?

Thanks.

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf,
HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-27 Thread Edward Bortnikov (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513647#comment-14513647
]

Edward Bortnikov commented on HBASE-13071:
--

I'd suggest leaving the use of this feature manual rather than expecting the
system to auto-tune. It is often hard to know whether the application requires
aggressive caching at the client side. For example, consider an application
that does some tricky aggregation of the scanned data, in which the compute
part is considerable. There is no way for HBase to know that in advance. The
optimization does not come for free (up to 2x caching at the client side), so
IMHO it's up to the application to decide whether to use it.

Dear community - could you please review and vote on the last patch before it
becomes obsolete again? The JIRA is still not assigned to any committer.

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf,
HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-20 Thread Eshcar Hillel (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503549#comment-14503549
]

Eshcar Hillel commented on HBASE-13071:
---

Rebase done.
Thanks to HBASE-13090, the next and loadCache methods are now separate, so this
rebase wasn't too painful (thanks [~jonathan.lawlor]).
I also changed some new scanner tests to account for the change in the scanner
cache interface (it is now a Queue).

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf,
HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503897#comment-14503897
 ] 

Hadoop QA commented on HBASE-13071:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12726649/HBASE-13071_trunk_rebase_1.0.patch
  against master branch at commit 702aea5b38ed6ad0942b0c59c3accca476b46873.
  ATTACHMENT ID: 12726649

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1902 checkstyle errors (more than the master's current 1898 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13744//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13744//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13744//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13744//console

This message is automatically generated.

 Hbase Streaming Scan Feature
 

 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, 
 HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch, 
 HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, 
 HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, 
 HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, 
 HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, 
 HBASE-13071_trunk_rebase_1.0.patch, HBaseStreamingScanDesign.pdf, 
 HbaseStreamingScanEvaluation.pdf, 
 HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png, 
 gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, 
 latency.delay.png, latency.png, network.png



[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-16 Thread Eshcar Hillel (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497714#comment-14497714
]

Eshcar Hillel commented on HBASE-13071:
---

ClientScanner is an abstract class that holds the code shared by the sync and
async scanner classes, such as the prefetch method.
#prefetch does not replace #next; it is invoked from #next in
ClientSimpleScanner (the sync scanner), thereby preserving the same sync
behavior as before. In ClientAsyncPrefetchScanner the prefetch method is
invoked in the run method of a background thread when the buffer at the client
side is half full.
I hope this makes sense.
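
A much simplified sketch of that structure follows. It does not reproduce the HBase classes: the names ScannerSketch, SimpleScannerSketch and AsyncPrefetchScannerSketch are hypothetical, an in-memory iterator stands in for the regionserver RPC, and rows are plain integers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Shared code: the cache plus the batch fetch that stands in for the scan RPC.
abstract class ScannerSketch {
    protected final Queue<Integer> cache = new ArrayDeque<>();
    protected final int batch; // rows per fetch, assumed >= 1
    private final Iterator<Integer> source; // hypothetical stand-in for the regionserver

    ScannerSketch(Iterator<Integer> source, int batch) {
        this.source = source;
        this.batch = batch;
    }

    // Fetch up to `batch` rows; a short or empty batch means the scan is exhausted.
    protected List<Integer> prefetch() {
        List<Integer> rows = new ArrayList<>();
        while (rows.size() < batch && source.hasNext()) {
            rows.add(source.next());
        }
        return rows;
    }

    abstract Integer next(); // next row, or null at the end of the scan
}

// Synchronous flavor: next() itself refills an empty cache, so the caller waits.
final class SimpleScannerSketch extends ScannerSketch {
    SimpleScannerSketch(Iterator<Integer> source, int batch) { super(source, batch); }

    @Override
    Integer next() {
        if (cache.isEmpty()) {
            cache.addAll(prefetch());
        }
        return cache.poll();
    }
}

// Asynchronous flavor: a background thread refills once at most one batch is left.
final class AsyncPrefetchScannerSketch extends ScannerSketch implements Runnable {
    private final Object lock = new Object();
    private boolean done = false; // guarded by lock

    AsyncPrefetchScannerSketch(Iterator<Integer> source, int batch) { super(source, batch); }

    AsyncPrefetchScannerSketch start() {
        Thread t = new Thread(this, "prefetch-sketch");
        t.setDaemon(true);
        t.start();
        return this;
    }

    @Override
    public void run() {
        try {
            while (true) {
                synchronized (lock) {
                    while (cache.size() > batch) {
                        lock.wait(); // more than one batch cached: nothing to do yet
                    }
                }
                List<Integer> rows = prefetch(); // the slow call runs outside the lock
                synchronized (lock) {
                    cache.addAll(rows);
                    done = rows.size() < batch;
                    lock.notifyAll();
                    if (done) {
                        return;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            synchronized (lock) { done = true; lock.notifyAll(); }
        }
    }

    @Override
    Integer next() {
        synchronized (lock) {
            try {
                while (cache.isEmpty() && !done) {
                    lock.wait(); // blocks only if the prefetcher has fallen behind
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            Integer row = cache.poll();
            lock.notifyAll(); // the cache shrank; the prefetcher may proceed
            return row;
        }
    }
}

final class ScanSketchDemo {
    static List<Integer> drain(ScannerSketch scanner) {
        List<Integer> out = new ArrayList<>();
        for (Integer row = scanner.next(); row != null; row = scanner.next()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(drain(new SimpleScannerSketch(rows.iterator(), 3)));
        System.out.println(drain(new AsyncPrefetchScannerSketch(rows.iterator(), 3).start()));
    }
}
```

Both flavors return the same rows in the same order; the difference is only where the fetch runs. In the async flavor the cache can hold up to two batches at once, which is the extra client-side memory the feature trades for shorter waits.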

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-16 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498201#comment-14498201
]

stack commented on HBASE-13071:
---

[~eshcar] I was talking about the patch. Your patch no longer applies. Trunk has
changed (the bit that does not apply is the overwrite of next by prefetch...).
Sorry I was not clear. Would you mind rebasing your patch? Thank you. Pardon my
letting it rot.

Hbase Streaming Scan Feature

Key: HBASE-13071
URL: https://issues.apache.org/jira/browse/HBASE-13071
Project: HBase
Issue Type: New Feature
Reporter: Eshcar Hillel
Attachments: 99.eshcar.png, HBASE-13071_98_1.patch,
HBASE-13071_trunk_1.patch, HBASE-13071_trunk_10.patch,
HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch,
HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch,
HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch,
HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch,
HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf,
HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.delay.png,
gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png,
latency.delay.png, latency.png, network.png


[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-15 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496916#comment-14496916
]

stack commented on HBASE-13071:
---

Pardon me [~eshcar] but the patch has rotted. I can't make sense of what is
supposed to be happening in ClientScanner where we remove #next and replace it
with #prefetch. Help me out. Thanks.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-14 Thread Eshcar Hillel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494748#comment-14494748
 ] 

Eshcar Hillel commented on HBASE-13071:
---

I looked into the PerformanceEvaluation tool; the code is easy to read and
maintain.
I believe the following changes are required in the implementation of testRow()
in ScanTest:
  * set caching to 100 (or even to DEFAULT_HBASE_CLIENT_SCANNER_CACHING)
    instead of 30
  * add timeout before calling testScanner.next() [I think you already added
    this one]
  * make sure setFilter(FilterAllFilter) is not invoked
and optionally, add a scanRange10 class to do really big scans.

[~stack], do you by any chance have the results of the client latency
distribution collected by the tool in your previous experiments?

BTW, 30 is not the default value for prefetch size.
DEFAULT_HBASE_CLIENT_SCANNER_CACHING is set to 100 in 0.98 and to
Integer.MAX_VALUE in master.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-12 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491898#comment-14491898
]

stack commented on HBASE-13071:
---

bq. the scans should be meaty, with large prefetches (we used 100-1000
records), and the per-record processing at the client side should be
non-negligible

What would you suggest then [~ebortnik] and [~eshcar]? Defaults in hbase are
30 rows at a time, not 1000. Would it make sense if this facility could be
turned on by enabling a property on a Scan object?

bq. We are not familiar with the PerformanceEvaluation tool

Np. It is a coarse tool we've been using since the early days to run loads
against hbase. See bin/hbase pe

bq. Re/ auto-tuning, I believe this is a bit premature. Let's keep the code
simple, and let the client control. The optimization does not necessarily need
to be a default.

I suggest auto-tune so the feature is useful more often than not. As for it not
needing to be the default, it would be cool if the user didn't have to go figure
out an opaque option to get this benefit.

Let me try and repro the benefit seen in posted graphs.

Thanks.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-09 Thread Edward Bortnikov (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487621#comment-14487621
]

Edward Bortnikov commented on HBASE-13071:
--

Chiming in ... The discussion is becoming loaded, so let me summarize up to this
point so that we can figure out what's missing. Apologies for possibly
duplicating what was said before; some of it might sound obvious.

The feature is 100% client-side. The metrics we've been measuring are
client-side as well. Ycsb is the workload generator; [~eshcar] provided the
source. The network and server hardware are pretty much standard. In order for
the optimization results to be observable, the scans should be meaty, with
large prefetches (we used 100-1000 records), and the per-record processing at
the client side should be non-negligible. In this context, it makes sense to
mask the network delay by prefetching in the background.
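
The idea of masking the network delay by prefetching in the background can be
sketched as a bounded producer-consumer queue. This is a minimal, self-contained
illustration and not the code in the attached patches: PrefetchingIterator, the
fetchBatch supplier (standing in for one scan RPC) and the maxBatches queue
capacity are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/**
 * Hypothetical illustration: a background thread keeps fetching batches
 * while the application consumes rows, so fetch time overlaps processing time.
 */
class PrefetchingIterator<T> implements Iterator<T> {
  // Sentinel batch that marks the end of the stream (compared by identity).
  private final List<T> endMarker = new ArrayList<>();
  private final BlockingQueue<List<T>> queue;
  private volatile RuntimeException failure;
  private Iterator<T> current;
  private boolean done;

  /**
   * @param fetchBatch stands in for one scan RPC; returns an empty list when exhausted
   * @param maxBatches how many fetched batches may wait in the queue
   */
  PrefetchingIterator(Supplier<List<T>> fetchBatch, int maxBatches) {
    this.queue = new ArrayBlockingQueue<>(maxBatches);
    Thread producer = new Thread(() -> {
      try {
        try {
          List<T> batch;
          while (!(batch = fetchBatch.get()).isEmpty()) {
            queue.put(batch); // blocks when the consumer falls behind
          }
        } catch (RuntimeException e) {
          failure = e; // surfaced to the consumer once the queue drains
        }
        queue.put(endMarker);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.setDaemon(true);
    producer.start();
  }

  @Override
  public boolean hasNext() {
    while (!done && (current == null || !current.hasNext())) {
      try {
        List<T> batch = queue.take(); // waits only if no batch is ready yet
        if (batch == endMarker) {
          done = true;
          if (failure != null) {
            throw failure;
          }
        } else {
          current = batch.iterator();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        done = true;
      }
    }
    return !done;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

With a synchronous cache the application would block for a full fetch every
time a batch runs out; here it blocks only when the producer has not managed to
stay ahead, which is the case the per-record processing time is meant to cover.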

We are not familiar with the PerformanceEvaluation tool. Does it measure
server-side metrics? If so, it can definitely happen that the server side is
more congested (and consequently, a bit slower) because many clients move
faster. Still, the elimination of the stop-and-wait pattern is significant to
boost the client throughput metrics, as our results suggest. We did not measure
network congestion, but it's hard to believe that the 1G backbone gets
congested in this context.

Re/ auto-tuning, I believe this is a bit premature. Let's keep the code simple,
and let the client control. The optimization does not necessarily need to be a
default.

Thanks.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-07 Thread Eshcar Hillel (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482738#comment-14482738
]

Eshcar Hillel commented on HBASE-13071:
---

Thanks [~stack] for running these rig tests.
I believe the right way to see the benefit of this feature is to measure the
scan.next() latency at the client side; there you should see the latency going
down as you increase the delays.
Obviously, an async scanner puts more pressure on the server since the rate at
which it asks for records is higher. Since you are already stress testing the
server with 50 heavy-scanner clients, it could be that the extra pressure
the async clients put on the server pushes it beyond its peak point.
Other than that, what is the prefetch size you are using? I assume it is less
than 100. The scenario in which the async scanner has maximum gain is when
the client-side processing time (i.e., the delays) equals the server-side I/O
time + network delays. If the prefetch size is too small the network delays are
more pronounced, and therefore the delays should be longer.

Finally, [~stack], could you please share the client code you use for your
tests, either via this Jira or by sending it directly to me, so I can take a
closer look and try it out myself?
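
The point about maximum gain can be illustrated with a back-of-the-envelope
model. It is a simplification for illustration only and is not taken from the
evaluation documents: assume every batch costs a fixed fetch time (server I/O +
network) and a fixed client processing time. A synchronous scanner pays the sum
of the two per batch, while an ideal prefetching scanner pays roughly the larger
of the two once the first fetch is done. ScanCostModel and its method names are
hypothetical.

```java
/** Hypothetical cost model; all names and numbers are illustrative. */
class ScanCostModel {
  /** Synchronous scan: each batch is fetched, then processed, with no overlap. */
  static long syncMillis(int batches, long fetchMs, long processMs) {
    return batches * (fetchMs + processMs);
  }

  /**
   * Ideal prefetching scan: the first fetch and the last processing step are
   * exposed; in between, fetching and processing overlap.
   */
  static long asyncMillis(int batches, long fetchMs, long processMs) {
    if (batches == 0) {
      return 0;
    }
    return fetchMs + (batches - 1) * Math.max(fetchMs, processMs) + processMs;
  }

  /** Ratio of synchronous to prefetching time; higher means more gain. */
  static double speedup(int batches, long fetchMs, long processMs) {
    return (double) syncMillis(batches, fetchMs, processMs)
        / asyncMillis(batches, fetchMs, processMs);
  }
}
```

For 100 batches with a 10 ms fetch, this model gives 2000 ms versus 1010 ms
when processing also takes 10 ms (close to 2x), but only 1100 ms versus 1001 ms
when processing takes 1 ms. Under these assumptions the gain peaks when the two
times are balanced and shrinks as either one dominates.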

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-04-07 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483600#comment-14483600
]

stack commented on HBASE-13071:
---

bq. Thanks stack for running these rig tests.

It is my pleasure.

bq. I believe the right way to see the benefit of this feature is to measure
the scan.next() latency at the client side, there you should see the latency
going down as you increase the delays.

Let me do this. I will do it with a single process of ten clients only so
server is not near capacity.
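
Measuring next() latency at the client side can be done by timing each call in
a thin wrapper around the scanner's iterator. The following is a rough sketch
that assumes nothing about PerformanceEvaluation internals; TimedIterator and
percentile are hypothetical helpers. Time spent in hasNext() is charged to the
following next(), since some iterators do their blocking fetch there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: records how long the caller waits for each element. */
class TimedIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final List<Long> latenciesNanos = new ArrayList<>();
  private long pendingNanos; // time spent in hasNext() since the last next()

  TimedIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    long start = System.nanoTime();
    boolean result = delegate.hasNext();
    pendingNanos += System.nanoTime() - start;
    return result;
  }

  @Override
  public T next() {
    long start = System.nanoTime();
    T value = delegate.next();
    latenciesNanos.add(pendingNanos + (System.nanoTime() - start));
    pendingNanos = 0;
    return value;
  }

  /** One sample per element returned so far, in nanoseconds. */
  List<Long> latenciesNanos() {
    return latenciesNanos;
  }

  /** Nearest-rank percentile (p in (0, 100]) over the given samples. */
  static long percentile(List<Long> samples, double p) {
    if (samples.isEmpty()) {
      throw new IllegalArgumentException("no samples");
    }
    List<Long> sorted = new ArrayList<>(samples);
    Collections.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.size());
    return sorted.get(Math.max(rank, 1) - 1);
  }
}
```

Comparing a high percentile (for example the 99th) between the synchronous and
the async scanner shows the stop-and-wait stalls more directly than overall
throughput does.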

I am using default of 30. I will up it.

I am using PerformanceEvaluation tool with the scan1000 option. Above I
describe the dataset I am scanning.

So, [~eshcar], it would seem that this feature would need to be self-tuning to
add general benefit, given that the size of prefetch, client processing time,
and other factors can all hinder its ability to shine?

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-26 Thread Edward Bortnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381587#comment-14381587
 ] 

Edward Bortnikov commented on HBASE-13071:
--

+1 on this feature

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-19 Thread Edward Bortnikov (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368689#comment-14368689
]

Edward Bortnikov commented on HBASE-13071:
--

I second [~eshcar]. This is not a huge feature, and everybody seems to benefit.
If there is anything else we should do about the code review - let's do it, and
race to commit :)

Thanks,
Edward

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-18 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367141#comment-14367141
]

Hadoop QA commented on HBASE-13071:
---

{color:green}+1 overall{color}. Here are the results of testing the latest
attachment

http://issues.apache.org/jira/secure/attachment/12705332/HBASE-13071_trunk_10.patch
against master branch at commit f9a17edc252a88c5a1a2c7764e3f9f65623e0ced.
ATTACHMENT ID: 12705332

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 4 new
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//artifact/patchprocess/checkstyle-aggregate.html

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/13294//console

This message is automatically generated.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-18 Thread Eshcar Hillel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367166#comment-14367166
 ] 

Eshcar Hillel commented on HBASE-13071:
---

Hi everyone,

What would be the next thing to do to get this patch in (now that all the 
lights are green ;) )?

Thanks,
Eshcar

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-18 Thread Eshcar Hillel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366767#comment-14366767
 ] 

Eshcar Hillel commented on HBASE-13071:
---

Yes, it's all about setting the delays, but I don't want to change them to make
the results look better. They are there just to make the point.

From: Edward Bortnikov (JIRA) j...@apache.org
To: esh...@yahoo-inc.com
Sent: Monday, March 16, 2015 7:52 AM
Subject: [jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362777#comment-14362777
]

Edward Bortnikov commented on HBASE-13071:
--

Eshcar,
Do you have an idea why there are still steps in the async graph? This probably
means that our delays are not long enough.
Eddie

On Monday, March 16, 2015 1:14 AM, Eshcar Hillel (JIRA) j...@apache.org
wrote:

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Eshcar Hillel updated HBASE-13071:
--
    Attachment: HBASE-13071_trunk_10.patch

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-15 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362424#comment-14362424
]

Hadoop QA commented on HBASE-13071:
---

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment

http://issues.apache.org/jira/secure/attachment/12704663/HBASE-13071_trunk_9.patch
against master branch at commit 01bc979ea29e9282786de13c1cb8cbc107e92e9f.
ATTACHMENT ID: 12704663

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 4 new
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:red}-1 checkstyle{color}. The applied patch generated
1918 checkstyle errors (more than the master's current 1917 errors).

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//artifact/patchprocess/checkstyle-aggregate.html

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/13254//console

This message is automatically generated.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-15 Thread Eshcar Hillel (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362606#comment-14362606
]

Eshcar Hillel commented on HBASE-13071:
---

A new patch is attached.

I also attached the evaluation results for multiple parallel scanners.
Bottom line: on the client side, the results show similar latency improvement
trends for multiple async scanners as for a single scanner thread.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-15 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362659#comment-14362659
]

Ted Yu commented on HBASE-13071:

Results shown in the pdf are impressive.

[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature

2015-03-15 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362649#comment-14362649
]

Hadoop QA commented on HBASE-13071:
---

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment

http://issues.apache.org/jira/secure/attachment/12704698/HBASE-13071_trunk_10.patch
against master branch at commit 0505b7941e175d86004daf9a31ef5ce240d4570f.
ATTACHMENT ID: 12704698