[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME

2017-05-28 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027874#comment-16027874
 ] 

Jonathan Lawlor commented on HBASE-11544:
-

[~karanmehta93] Only the last Result should be a partial. Repeated partials are 
likely a bug. Please file an issue

> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return 
> batch even if it means OOME
> --
>
> Key: HBASE-11544
> URL: https://issues.apache.org/jira/browse/HBASE-11544
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Jonathan Lawlor
>Priority: Critical
> Fix For: 2.0.0, 1.1.0
>
> Attachments: Allocation_Hot_Spots.html, gc.j.png, 
> HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, 
> HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, 
> HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, 
> HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, 
> HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, 
> HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, hits.j.png, h.png, 
> mean.png, m.png, net.j.png, q (2).png
>
>
> Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has 
> large cells.  I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the 
> client whatever we've gathered once we pass out a certain size threshold 
> rather than keep accumulating till we OOME.
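
The size-threshold idea can be sketched as follows. This is a toy illustration with made-up names (`gather`, `rowSizes`), not the actual region-server scan loop: accumulate rows for one RPC and return early once a byte-size threshold is crossed, rather than doggedly filling the caching (row-count) quota.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: stop a scan batch at either the row-count limit (caching)
// or a byte-size threshold, whichever comes first. Names are hypothetical.
public class ScanAccumulator {
    public static List<Integer> gather(int[] rowSizes, int caching, long maxResultSize) {
        List<Integer> batch = new ArrayList<>();
        long accumulated = 0;
        for (int size : rowSizes) {
            if (batch.size() >= caching) break;       // row-count limit reached
            batch.add(size);
            accumulated += size;
            if (accumulated >= maxResultSize) break;  // size threshold: return early, avoid OOME
        }
        return batch;
    }

    public static void main(String[] args) {
        // Large cells: the size threshold kicks in long before caching=1000.
        int[] rows = {400, 400, 400, 400};
        System.out.println(ScanAccumulator.gather(rows, 1000, 1000).size());
    }
}
```

With large cells and caching=1000, the batch closes after a few rows instead of accumulating until memory runs out.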



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-13541) Deprecate Scan caching in 2.0.0

2015-05-01 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13541:

Attachment: HBASE-13541-WIP.patch

Here's an early WIP patch before it gets much uglier. Since caching has been a 
core concept of Scans for so long, it has quite a broad range of usages 
throughout the codebase. 

The intention, as stated in the description, was to completely strip out all 
the usages of caching and deprecate the API. However, it looks like this may 
not be the way to go. In certain instances it can be useful to have control 
over how many Results get transferred per RPC. In particular, such control is 
useful when:
- The user knows ahead of time they will only require X rows
- The user intends to use caching as a paging mechanism. They want X rows now, 
they will do some work, and come back for another X rows.

If both of these workflows could be replicated without caching, it wouldn't be 
a problem. However, paging filters cannot accurately reproduce this exact 
behavior: filters do not carry state when scanning multiple regions, and they 
have no way of forcing a response back to the client other than declaring that 
all remaining rows will be filtered out (which is not what we want). 

Thus, it seemed better to repurpose caching as a row-limit concept, as we 
initially wanted to in HBASE-13442 (we have come full circle...). Of course, 
alternative naming is up for debate; we want it to be as clear and true to what 
is occurring as possible.

What still needs to be done? 
More grooming of the usages of the caching API, as well as references to 
caching in general (in variable names, method names, javadoc, etc.). Also, 
auto-generated models, such as the protobuf models of Scan and ScanMessage as 
well as the Thrift model TScan, need to be updated to use the new terminology.
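
The paging workflow (take X rows now, do some work, come back for the next X) can be sketched with a toy in-memory table. Everything below is illustrative; a TreeMap stands in for a table and `page` for the row-limited scan, none of which are the real client API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy sketch of paging with a row limit: fetch up to `limit` rows,
// resuming strictly after the last row key the client saw.
public class PagingSketch {
    public static List<String> page(NavigableMap<String, String> table,
                                    String startAfter, int limit) {
        List<String> rows = new ArrayList<>();
        NavigableMap<String, String> tail =
            startAfter == null ? table : table.tailMap(startAfter, false);
        for (String rowKey : tail.keySet()) {
            if (rows.size() >= limit) break;  // a true row limit, not a transport hint
            rows.add(rowKey);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"r1", "r2", "r3", "r4", "r5"}) table.put(k, "v");
        List<String> first = page(table, null, 2);
        List<String> second = page(table, first.get(first.size() - 1), 2);
        System.out.println(first + " " + second);
    }
}
```

Unlike a paging filter, the resume point carries across "regions" for free because the client holds it.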

 Deprecate Scan caching in 2.0.0
 ---

 Key: HBASE-13541
 URL: https://issues.apache.org/jira/browse/HBASE-13541
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
 Attachments: HBASE-13541-WIP.patch


 The public Scan API exposes caching to the application. Caching deals with 
 the number of rows that are transferred per scan RPC request issued to the 
 server. It does not seem like a detail that users of a scan should control 
 and introduces some unneeded complication. Seems more like a detail that 
 should be controlled from the server based on the current scan request RPC 
 load. This issue proposes that we deprecate the caching API in 2.0.0 so that 
 it can be removed later. Of course, if there are any concerns please raise 
 them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13333) Renew Scanner Lease without advancing the RegionScanner

2015-05-01 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523776#comment-14523776
 ] 

Jonathan Lawlor commented on HBASE-13333:
-

Change LGTM, don't see any reason why it would conflict with any of the recent 
scanner fixes. Sounds like a nice feature!

 Renew Scanner Lease without advancing the RegionScanner
 ---

 Key: HBASE-13333
 URL: https://issues.apache.org/jira/browse/HBASE-13333
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.13, 1.0.2, 1.1.1

 Attachments: 1-0.98.txt


 We have a usecase (for Phoenix) where we want to let the server know that the 
 client is still around. Like a client-side heartbeat.
 Doing a full heartbeat is complicated, but we could add the ability to make 
 scanner call with caching set to 0. The server already does the right thing 
 (it renews the lease, but does not advance the scanner).
 It looks like the client (ScannerCallable) also does the right thing. We 
 cannot break ResultScanner before HBase 2.0, but we can add a renewLease() 
 method to AbstractClientScanner. Phoenix (or any other caller) can then cast 
 to ClientScanner and call that method to ensure we renew the lease on the 
 server.
 It would be a simple and fully backwards compatible change. [~giacomotaylor]
 Comments?
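
The caching=0 heartbeat idea can be modeled with a toy scanner. Everything below is a sketch with hypothetical names, not the actual ScannerCallable or lease code: a "next" call asking for zero rows refreshes the lease timestamp but does not advance the scan position.

```java
// Toy model of scanner lease renewal: every call renews the lease;
// a call for zero rows is a pure heartbeat that leaves the position alone.
public class LeaseSketch {
    public int position = 0;
    public long leaseExpiry;
    public final long leaseTimeoutMs;

    public LeaseSketch(long now, long leaseTimeoutMs) {
        this.leaseTimeoutMs = leaseTimeoutMs;
        this.leaseExpiry = now + leaseTimeoutMs;
    }

    // nbRows == 0 acts as a heartbeat: the lease is renewed, nothing advances.
    public int next(int nbRows, long now) {
        leaseExpiry = now + leaseTimeoutMs;  // server renews the lease on every call
        position += nbRows;                  // zero rows => position unchanged
        return nbRows;
    }

    public static void main(String[] args) {
        LeaseSketch s = new LeaseSketch(0, 60_000);
        s.next(0, 50_000);  // heartbeat shortly before the lease would expire
        System.out.println(s.position + " " + s.leaseExpiry);
    }
}
```

A long-running client (e.g. Phoenix) can issue such heartbeats between real fetches to keep the server-side scanner alive.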





[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-29 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519680#comment-14519680
 ] 

Jonathan Lawlor commented on HBASE-5980:


[~anoop.hbase] Good idea, let me add some more tests to see if we are indeed 
missing some counts. Thanks for taking a look [~anoop.hbase]!

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, 
 HBASE-5980-v2.patch, HBASE-5980-v2.patch


 Currently it's difficult to know, when issuing a filter, what percentage of 
 rows were skipped by that filter. We should expose some basic counters back 
 to the client scanner object. For example:
 - number of rows filtered by row key alone (filterRowKey())
 - number of times each filter response was returned by filterKeyValue() - 
 corresponding to Filter.ReturnCode
 What would be slickest is if this could actually return a tree of counters 
 for cases where FilterList or other combining filters are used. But a 
 top-level is a good start.
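
A top-level counter per filter return code could look roughly like this. The enum below is only a stand-in for a subset of Filter.ReturnCode, and the whole block is an illustration rather than the actual metrics implementation:

```java
import java.util.EnumMap;
import java.util.Map;

// Toy sketch: count how often each filter return code was seen during a scan,
// giving a basic "how much did my filter skip" breakdown.
public class FilterMetricsSketch {
    // Stand-in for (part of) org.apache.hadoop.hbase.filter.Filter.ReturnCode.
    public enum ReturnCode { INCLUDE, SKIP, NEXT_ROW, SEEK_NEXT_USING_HINT }

    public static Map<ReturnCode, Long> count(ReturnCode[] responses) {
        Map<ReturnCode, Long> counters = new EnumMap<>(ReturnCode.class);
        for (ReturnCode rc : responses) {
            counters.merge(rc, 1L, Long::sum);  // one counter per return code
        }
        return counters;
    }

    public static void main(String[] args) {
        ReturnCode[] rs = { ReturnCode.INCLUDE, ReturnCode.SKIP, ReturnCode.SKIP };
        System.out.println(count(rs));
    }
}
```

Such counters, shipped back in the scan response, would let the client compute the percentage of cells a filter rejected.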





[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-29 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Attachment: HBASE-5980-v3.patch

Here's an updated patch with some new tests. The tests now stress more filter 
scenarios as well as different scan configurations (varying values of caching 
and max result size).

Moved around the increment to the number of rows scanned to put it in a better 
spot (I agree with [~anoop.hbase] that it doesn't belong in nextRow). Now it is 
incremented within populateResult once we have recognized that there are no 
more cells in the row.

Let's see what QA has to say about this one; the patch that was causing test 
failures has been reverted, so this should be a cleaner run.

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, 
 HBASE-5980-v2.patch, HBASE-5980-v2.patch, HBASE-5980-v3.patch







[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-29 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Attachment: HBASE-5980-v4.patch

Patch to address line-length issues in non-generated code. The hanging test 
(TestRowCounter) passes when run locally, so retrying.

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, 
 HBASE-5980-v2.patch, HBASE-5980-v2.patch, HBASE-5980-v3.patch, 
 HBASE-5980-v4.patch







[jira] [Created] (HBASE-13597) Add ability for Filters to force response back to client during scans

2015-04-29 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13597:
---

 Summary: Add ability for Filters to force response back to client 
during scans
 Key: HBASE-13597
 URL: https://issues.apache.org/jira/browse/HBASE-13597
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Lawlor


Currently, the only way for a filter to force a response back to the client 
during the execution of a scan is via the use of filter#filterAllRemaining(). 
When this method call returns true, the region server interprets it as meaning 
that all remaining rows should be filtered out. This also signals to the client 
that the scanner should close (it's finished...).

It would be nice if there was a mechanism that allowed the filter to force a 
response back to the client without actually terminating the scan. The client 
would receive the response from the server and could continue the scan from 
where it left off. 

I would imagine that such a feature would be used primarily in instances where 
real-time behavior was a concern. In a sense it would allow filters to 
implement their own restrictions on the client-server scan protocol. I think 
this feature can now be supported since we started to send back the 
moreResultsOnServer flag in the ScanResponse (HBASE-13262) to tell the client 
that the current region is not exhausted.
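
The difference between filterAllRemaining() (which closes the scanner) and the proposed force-response signal (which flushes a partial batch while leaving the scanner open) can be modeled with a toy scan loop. The Signal enum and all names here are hypothetical, not a proposed API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a scan loop where a per-row filter verdict can either terminate
// the scan (like filterAllRemaining()) or merely force the current batch back
// to the client while the scan continues (the proposed behavior).
public class ForceResponseSketch {
    public enum Signal { CONTINUE, FORCE_RESPONSE, TERMINATE }

    // Returns the batches the client would receive, in order.
    public static List<List<Integer>> scan(int[] rows, Signal[] signals) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            current.add(rows[i]);
            if (signals[i] == Signal.TERMINATE) break;       // scanner closes here
            if (signals[i] == Signal.FORCE_RESPONSE) {       // flush, keep scanning
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        int[] rows = {1, 2, 3, 4};
        Signal[] sig = {Signal.CONTINUE, Signal.FORCE_RESPONSE,
                        Signal.CONTINUE, Signal.CONTINUE};
        System.out.println(scan(rows, sig));
    }
}
```

The moreResultsOnServer flag from HBASE-13262 is what would let the client tell such an early response apart from an exhausted region.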





[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-28 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Attachment: HBASE-5980-branch-1.patch

Test failure looks unrelated. That test seems to be causing issues in other 
precommit builds as well as trunk: e.g. 
https://builds.apache.org/job/HBase-TRUNK/6429/testReport/ and 
https://builds.apache.org/job/HBase-TRUNK/6428/testReport/

The checkstyle warning is because ServerSideScanMetrics members are public 
(this is done to keep it consistent with ScanMetrics). 

Attaching a branch-1 patch

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, 
 HBASE-5980-v2.patch, HBASE-5980-v2.patch







[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-28 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Attachment: HBASE-5980-v2.patch

Reattaching the patch because it looks like the precommit build still didn't kick off. 

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch, 
 HBASE-5980-v2.patch







[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-28 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Status: Open  (was: Patch Available)

Resubmitting the patch since the precommit build didn't kick off last time

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch







[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-28 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Status: Patch Available  (was: Open)

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch







[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-27 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Attachment: HBASE-5980-v2.patch

Attaching updated patch

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch







[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-27 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515056#comment-14515056
 ] 

Jonathan Lawlor commented on HBASE-5980:


bq. you need to add this option to the Scan help in the shell else no one will 
find it.

Good point. Added explanation and examples to the scan help in next patch.

bq. I'd say drop the 'GET_'... because redundant

I like this proposal. Changed to ALL_METRICS and METRICS in latest patch

bq. Purge the abstract class? Only used once?

Makes sense to me. Purged the abstract class and moved all of its 
functionality into ServerSideScanMetrics in the latest patch.

bq. Should we have ClientScanMetrics and ServerScanMetrics?

Hmm, interesting idea. So something like: add new class ClientScanMetrics and 
have instances of both ClientScanMetrics and ServerScanMetrics as members 
of a ScanMetrics instance? Since ScanMetrics is a public-evolving API, I think 
this change would be okay. The change would be beneficial in a clean-code 
sense. As a user, I don't think the distinction between client-side vs. 
server-side would be too important when looking at the metrics (a user probably 
just cares that a metric exists, not whether it is classified as server-side or 
client-side). How does that sound, worth doing?

Note: In the latest patch I have also changed the interface audience of 
ServerSideScanMetrics to public-evolving to match the existing interface 
annotation of ScanMetrics. I did this because it seemed wrong that a 
public-evolving class inherited from a private class.

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-v1.patch







[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-27 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Status: Patch Available  (was: Reopened)

 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Assignee: Jonathan Lawlor
Priority: Minor
 Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch







[jira] [Commented] (HBASE-13575) TestChoreService has to make sure that the opened ChoreService is closed for each unit test

2015-04-27 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514597#comment-14514597
 ] 

Jonathan Lawlor commented on HBASE-13575:
-

Good idea [~syuanjiang]

 TestChoreService has to make sure that the opened ChoreService is closed for 
 each unit test
 ---

 Key: HBASE-13575
 URL: https://issues.apache.org/jira/browse/HBASE-13575
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Trivial

 The TestChoreService shuts down the opened ChoreService after each individual 
 unit test. This is to avoid test failures caused by an enormous number of 
 active threads at the end of a test on slow virtual hosts (see HBASE-12992). 
 However, the service shutdown was not wrapped in a 'finally' block to 
 guarantee execution when an exception is thrown. The fix is trivial.
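
The fix amounts to the standard try/finally pattern. Below is a minimal sketch with a stand-in service class, not the actual TestChoreService code:

```java
// Sketch of the fix described above: guarantee the service is shut down
// even when the test body throws. FakeChoreService is a stand-in.
public class ShutdownInFinallySketch {
    public static class FakeChoreService {
        public boolean wasShutdown = false;
        public void shutdown() { wasShutdown = true; }
    }

    public static FakeChoreService runTest(boolean throwInBody) {
        FakeChoreService service = new FakeChoreService();
        try {
            if (throwInBody) throw new RuntimeException("simulated test failure");
        } catch (RuntimeException expected) {
            // swallowed for the sketch; a real test would let JUnit report it
        } finally {
            service.shutdown();  // runs on success and on failure alike
        }
        return service;
    }

    public static void main(String[] args) {
        System.out.println(runTest(true).wasShutdown + " " + runTest(false).wasShutdown);
    }
}
```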





[jira] [Updated] (HBASE-13552) ChoreService shutdown message could be more informative

2015-04-24 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13552:

Attachment: HBASE-13552.patch

Hope it's okay if I take this one; it caught my eye because of my earlier work 
in this area. Here's a simple patch that adds a toString to ScheduledChore and 
logs the scheduled chores rather than the Runnables returned from shutdown. 

Example of new output format:

{noformat}
Chore service for: testShutdownCancelsScheduledChores had [[ScheduledChore: 
Name: sc2 Period: 100 Unit: MILLISECONDS], [ScheduledChore: Name: sc3 Period: 
100 Unit: MILLISECONDS], [ScheduledChore: Name: sc1 Period: 100 Unit: 
MILLISECONDS]] on shutdown
{noformat}
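
A minimal sketch of a toString producing that format (field names here are illustrative, not the actual ScheduledChore members):

```java
import java.util.concurrent.TimeUnit;

// Sketch of a ScheduledChore-style toString matching the log format above.
public class ChoreToStringSketch {
    private final String name;
    private final int period;
    private final TimeUnit unit;

    public ChoreToStringSketch(String name, int period, TimeUnit unit) {
        this.name = name;
        this.period = period;
        this.unit = unit;
    }

    @Override
    public String toString() {
        // Mirrors "[ScheduledChore: Name: sc1 Period: 100 Unit: MILLISECONDS]"
        return "[ScheduledChore: Name: " + name + " Period: " + period + " Unit: " + unit + "]";
    }

    public static void main(String[] args) {
        System.out.println(new ChoreToStringSketch("sc1", 100, TimeUnit.MILLISECONDS));
    }
}
```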

 ChoreService shutdown message could be more informative
 ---

 Key: HBASE-13552
 URL: https://issues.apache.org/jira/browse/HBASE-13552
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Andrew Purtell
Priority: Trivial
 Attachments: HBASE-13552.patch


 {noformat}
 2015-04-23 18:34:38,163 INFO  [M:0;localhost:43244] hbase.ChoreService: Chore 
 service for: localhost,43244,1429833734975 had 
 [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778]
  on shutdown
 {noformat}
 Let's give those tasks human meaningful names. 





[jira] [Updated] (HBASE-13552) ChoreService shutdown message could be more informative

2015-04-24 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13552:

Status: Patch Available  (was: Open)

 ChoreService shutdown message could be more informative
 ---

 Key: HBASE-13552
 URL: https://issues.apache.org/jira/browse/HBASE-13552
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Andrew Purtell
Assignee: Jonathan Lawlor
Priority: Trivial
 Attachments: HBASE-13552.patch







[jira] [Assigned] (HBASE-13552) ChoreService shutdown message could be more informative

2015-04-24 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor reassigned HBASE-13552:
---

Assignee: Jonathan Lawlor

 ChoreService shutdown message could be more informative
 ---

 Key: HBASE-13552
 URL: https://issues.apache.org/jira/browse/HBASE-13552
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Andrew Purtell
Assignee: Jonathan Lawlor
Priority: Trivial
 Attachments: HBASE-13552.patch







[jira] [Commented] (HBASE-13552) ChoreService shutdown message could be more informative

2015-04-24 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511677#comment-14511677
 ] 

Jonathan Lawlor commented on HBASE-13552:
-

ChoreService was introduced by HBASE-6778 into what was branch-1+ at the time. 
As such, this change would currently affect branch-1.1, branch-1, and master. 
The attached patch applied cleanly for me to all branches.

 ChoreService shutdown message could be more informative
 ---

 Key: HBASE-13552
 URL: https://issues.apache.org/jira/browse/HBASE-13552
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Andrew Purtell
Assignee: Jonathan Lawlor
Priority: Trivial
 Attachments: HBASE-13552.patch


 {noformat}
 2015-04-23 18:34:38,163 INFO  [M:0;localhost:43244] hbase.ChoreService: Chore 
 service for: localhost,43244,1429833734975 had 
 [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778]
  on shutdown
 {noformat}
 Let's give those tasks human-meaningful names. 





[jira] [Updated] (HBASE-13552) ChoreService shutdown message could be more informative

2015-04-24 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13552:

Affects Version/s: 1.2.0
   1.1.0

 ChoreService shutdown message could be more informative
 ---

 Key: HBASE-13552
 URL: https://issues.apache.org/jira/browse/HBASE-13552
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Andrew Purtell
Assignee: Jonathan Lawlor
Priority: Trivial
 Attachments: HBASE-13552.patch


 {noformat}
 2015-04-23 18:34:38,163 INFO  [M:0;localhost:43244] hbase.ChoreService: Chore 
 service for: localhost,43244,1429833734975 had 
 [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859,
  
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778]
  on shutdown
 {noformat}
 Let's give those tasks human-meaningful names. 





[jira] [Commented] (HBASE-13441) Scan API improvements

2015-04-23 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509324#comment-14509324
 ] 

Jonathan Lawlor commented on HBASE-13441:
-

Thanks [~stack], I think those would be some good methods that could use some 
reworking/cleanup. I'll file a sub issue here for each one and also take a look 
to see if any other parts of the Scan API could be refined.

 Scan API improvements
 -

 Key: HBASE-13441
 URL: https://issues.apache.org/jira/browse/HBASE-13441
 Project: HBase
  Issue Type: Umbrella
Reporter: Jonathan Lawlor

 Umbrella task for improvements that could be made to the Scan API





[jira] [Created] (HBASE-13543) Deprecate Scan maxResultSize in 2.0.0

2015-04-23 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13543:
---

 Summary: Deprecate Scan maxResultSize in 2.0.0
 Key: HBASE-13543
 URL: https://issues.apache.org/jira/browse/HBASE-13543
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor


The Scan API exposes a maxResultSize to the application. The max result size is 
used to determine how large the chunks sent back per RPC request should be. 
This seems like a configuration that should not be present in the public API 
used by the application but rather a detail that the server should control 
instead. In a situation where there are multiple concurrent scans being issued 
against a single region server, it would seem more appropriate to give the 
server control over this parameter so that it could be optimized against the 
current load. This issue proposes that the max result size be deprecated in 
2.0.0 so that future optimizations could be made to the way that Scan RPC 
requests are handled by the server. Of course if there are any concerns please 
raise them here.
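A toy model (not HBase code; sizes and threshold are illustrative) of what "max result size determines the chunk per RPC" means: the server accumulates rows until the size threshold is reached, then ships what it has.

```java
public class ChunkBySize {
    // rowSizes: serialized size in bytes of each remaining row;
    // returns how many rows fit into one RPC response bounded by maxResultSize.
    static int rowsInNextRpc(long[] rowSizes, long maxResultSize) {
        long accumulated = 0;
        int rows = 0;
        for (long size : rowSizes) {
            accumulated += size;
            rows++;
            // At least one row is always returned, even if it alone exceeds
            // the limit (otherwise a very large row could never be sent).
            if (accumulated >= maxResultSize) break;
        }
        return rows;
    }

    public static void main(String[] args) {
        // With a 2 MB limit and 1 MB rows, each RPC carries two rows.
        long MB = 1024 * 1024;
        System.out.println(rowsInNextRpc(new long[]{MB, MB, MB, MB, MB}, 2 * MB));  // 2
    }
}
```

Moving this knob server-side would mean the threshold is picked per the current load rather than per client.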





[jira] [Created] (HBASE-13542) Deprecate Scan batch in 2.0.0

2015-04-23 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13542:
---

 Summary: Deprecate Scan batch in 2.0.0
 Key: HBASE-13542
 URL: https://issues.apache.org/jira/browse/HBASE-13542
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor


The public Scan API exposes a batch API to the client. The batch API allows the 
application to specify a maximum number of cells to be contained per 
{{Result}}. It seems as though this API was introduced to allow the server to 
deal with large rows. However, now that RPC chunking has been addressed by 
HBASE-11544, it seems that this API may no longer be necessary since large rows 
will now be returned to the client as partials. This issue proposes that we 
deprecate Scan batch in 2.0.0 since it introduces some unneeded complication 
into the public API and doesn't seem all that useful any more. If there are any 
concerns, please raise them here.
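A model of the batch semantics being deprecated (assumed from the description above, not actual HBase code): a row with C cells comes back as ceil(C / batch) Results of at most `batch` cells each, whereas RPC chunking splits large rows by size into partial Results instead.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchModel {
    // Returns the number of cells in each Result produced for one row
    // when Scan batch is set to `batch`.
    static List<Integer> resultSizes(int cellsInRow, int batch) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = cellsInRow; remaining > 0; remaining -= batch) {
            sizes.add(Math.min(batch, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 7-cell row with batch=3 yields Results of 3, 3, and 1 cells.
        System.out.println(resultSizes(7, 3));  // [3, 3, 1]
    }
}
```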





[jira] [Created] (HBASE-13541) Deprecate Scan caching in 2.0.0

2015-04-23 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13541:
---

 Summary: Deprecate Scan caching in 2.0.0
 Key: HBASE-13541
 URL: https://issues.apache.org/jira/browse/HBASE-13541
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor


The public Scan API exposes caching to the application. Caching deals with the 
number of rows that are transferred per scan RPC request issued to the server. 
It does not seem like a detail that users of a scan should control and 
introduces some unneeded complication. Seems more like a detail that should be 
controlled from the server based on the current scan request RPC load. This 
issue proposes that we deprecate the caching API in 2.0.0 so that it can be 
removed later. Of course, if there are any concerns please raise them here.
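A model of how caching interacts with the size limit in branch-1+ (illustrative only, not server code): a scan RPC returns once either `caching` rows have been gathered or the accumulated size reaches maxResultSize, so with caching=Integer.MAX_VALUE the size limit governs.

```java
public class RpcStopModel {
    static int rowsReturned(long[] rowSizes, int caching, long maxResultSize) {
        long accumulated = 0;
        int rows = 0;
        for (long size : rowSizes) {
            accumulated += size;
            rows++;
            // Stop on whichever limit trips first.
            if (rows >= caching || accumulated >= maxResultSize) break;
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] smallRows = {100, 100, 100, 100, 100};
        // Old-style caching=2 stops after 2 rows regardless of size...
        System.out.println(rowsReturned(smallRows, 2, 2L * 1024 * 1024));  // 2
        // ...while the default (caching=MAX_VALUE, 2 MB) keeps going by size.
        System.out.println(rowsReturned(smallRows, Integer.MAX_VALUE, 2L * 1024 * 1024));  // 5
    }
}
```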





[jira] [Created] (HBASE-13545) Provide better documentation for the public API Scan#setRowOffsetPerColumnFamily

2015-04-23 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13545:
---

 Summary: Provide better documentation for the public API 
Scan#setRowOffsetPerColumnFamily
 Key: HBASE-13545
 URL: https://issues.apache.org/jira/browse/HBASE-13545
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor


Currently, the public API Scan#setRowOffsetPerColumnFamily seems a little odd 
and misplaced. The API was introduced in HBASE-5104 to handle behavior that 
could not be sufficiently created through the use of filters. This issue 
proposes that the documentation around this API be improved to make it clear 
how and when it should be used.





[jira] [Created] (HBASE-13544) Provide better documentation for the public API Scan#setMaxResultsPerColumnFamily

2015-04-23 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13544:
---

 Summary: Provide better documentation for the public API 
Scan#setMaxResultsPerColumnFamily
 Key: HBASE-13544
 URL: https://issues.apache.org/jira/browse/HBASE-13544
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor


Currently, the public API Scan#setMaxResultsPerColumnFamily seems a little odd. 
The API was introduced in HBASE-5104 to handle behavior that could not be 
sufficiently created through the use of filters. This issue proposes that the 
documentation around this API be improved to make it clear how and when it 
should be used.
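The two HBASE-5104 knobs (this one and setRowOffsetPerColumnFamily from HBASE-13545) pair up for intra-row paging. A model of their combined effect, assumed from the descriptions above rather than taken from HBase code: within each row, the first `offset` cells of a family are skipped and at most `limit` of the rest are returned.

```java
import java.util.Arrays;
import java.util.List;

public class CfPagingModel {
    // familyCells: the qualifiers of one column family within one row.
    static List<String> page(List<String> familyCells, int offset, int limit) {
        int from = Math.min(offset, familyCells.size());
        int to = Math.min(from + limit, familyCells.size());
        return familyCells.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("q1", "q2", "q3", "q4", "q5");
        // Page 2 of size 2: setRowOffsetPerColumnFamily(2) plus
        // setMaxResultsPerColumnFamily(2).
        System.out.println(page(cells, 2, 2));  // [q3, q4]
    }
}
```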





[jira] [Resolved] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit

2015-04-23 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor resolved HBASE-13442.
-
Resolution: Duplicate

Closing this one in favor of HBASE-13541. Please reopen if there are any 
outstanding concerns.

 Rename scanner caching to a more semantically correct term such as row limit
 

 Key: HBASE-13442
 URL: https://issues.apache.org/jira/browse/HBASE-13442
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
 Attachments: HBASE-13442-proposal.diff


 Caching acts more as a row limit now. By default in branch-1+, a Scan is 
 configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we 
 service scans on the basis of buffer size rather than number of rows. As a 
 result, caching should now only be configured in instances where the user 
 knows that they will only need X rows. Thus, caching should be renamed to 
 something that is more semantically correct such as rowLimit.





[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans

2015-04-22 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13527:

Attachment: HBASE-13527-branch-1.0-addendum.patch

Attaching an addendum that addresses a compilation error in branch-1.0. 

 The default value for hbase.client.scanner.max.result.size is never actually 
 set on Scans
 -

 Key: HBASE-13527
 URL: https://issues.apache.org/jira/browse/HBASE-13527
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12, 1.2.0
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0, 0.98.12, 1.0.2

 Attachments: HBASE-13527-0.98.patch, 
 HBASE-13527-branch-1.0-addendum.patch, HBASE-13527-v1.patch


 Now that max result size is driven from the client side like caching 
 (HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
 hbase.client.scanner.max.result.size which is never performed. I think this 
 has gone unnoticed because the server used to read the configuration 
 hbase.client.scanner.max.result.size for itself, but now we expect the 
 serialized Scan sent from the client side to contain this information. 
 Realistically this should have been set on the Scans even before HBASE-13362; 
 it's surprising that it's not, as the scanner code seems to indicate otherwise.
 Ultimately, the end result is that, by default, scan RPC's are limited by 
 hbase.server.scanner.max.result.size (note this is the new server side config 
 not the client side config) which has a default value of 100 MB. The scan 
 RPC's should instead be limited by hbase.client.scanner.max.result.size which 
 has a default value of 2 MB.
 The reason why this issue occurs is because, by default, a new Scan() 
 initializes Scan.maxResultSize to -1. This initial value of -1 will never be 
 changed unless Scan#setMaxResultSize() is called. In the event that this 
 value is not changed, the Scan that is serialized and sent to the server will 
 also have Scan.maxResultSize = -1. Then, when the server is deciding what 
 size limit should be enforced, it sees that Scan.maxResultSize = -1 so it 
 uses the most relaxed size restriction possible, which is 
 hbase.server.scanner.max.result.size (default value 100 MB).





[jira] [Updated] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit

2015-04-21 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13442:

Attachment: HBASE-13442-proposal.diff

Here's a look at what the proposed API change would look like (before it is 
integrated into the rest of the codebase). In summary:
* The current APIs setCaching/getCaching are deprecated in 2.0.0
* The new APIs setRPCRowLimit/getRPCRowLimit take the place of caching
* There is no change in the actual behavior (yet... see below), just a change 
in name. The name change makes it clear what using the API will actually do.

Discussion:
* Thoughts on new API names? Other recommendations?
* Should behavior stay the same? Alternatively, as [~davelatham] suggested, 
should we instead make it actually limit the number of rows the client returns 
to the app (once the row limit is reached the scanner would be closed and 
return null on future calls to scanner.next())? If the alternative is pursued, 
it would be more appropriate to call it rowLimit rather than rpcRowLimit. 
* Any other suggestions?
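The proposed deprecation path could be sketched as follows (setRPCRowLimit is the name from this proposal, not a shipped HBase API; the class is a stand-in for Scan): the old setter survives but delegates, so existing callers keep working while new code uses the clearer name.

```java
public class ScanSketch {
    private int rpcRowLimit = Integer.MAX_VALUE;

    /** @deprecated since 2.0.0, use {@link #setRPCRowLimit(int)} instead. */
    @Deprecated
    public ScanSketch setCaching(int caching) {
        return setRPCRowLimit(caching);
    }

    public ScanSketch setRPCRowLimit(int rowLimit) {
        this.rpcRowLimit = rowLimit;
        return this;  // fluent style, as in the real Scan setters
    }

    public int getRPCRowLimit() { return rpcRowLimit; }

    public static void main(String[] args) {
        ScanSketch s = new ScanSketch().setCaching(500);  // old API still works
        System.out.println(s.getRPCRowLimit());  // 500
    }
}
```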

 Rename scanner caching to a more semantically correct term such as row limit
 

 Key: HBASE-13442
 URL: https://issues.apache.org/jira/browse/HBASE-13442
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
 Attachments: HBASE-13442-proposal.diff


 Caching acts more as a row limit now. By default in branch-1+, a Scan is 
 configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we 
 service scans on the basis of buffer size rather than number of rows. As a 
 result, caching should now only be configured in instances where the user 
 knows that they will only need X rows. Thus, caching should be renamed to 
 something that is more semantically correct such as rowLimit.





[jira] [Commented] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit

2015-04-21 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505836#comment-14505836
 ] 

Jonathan Lawlor commented on HBASE-13442:
-

bq. I don't see why an app should care specifically about how many rows the 
client transfers from the server in each RPC - bytes seem the more relevant 
currency to tune for performance.

Really good point, I can't think of such a scenario either. Certainly we want 
to return results from the server on the basis of size rather than some 
arbitrary number of rows (since row size can vary table to table, there isn't a 
universally good row limit). This is supported by the move to the default 
configurations of (caching = Integer.MAX_VALUE, maxResultSize = 2 MB). So the 
best course of action here wouldn't be to rename caching, but to deprecate it 
so that it can eventually be removed completely in favor of rowLimit.

The feature in the protocol that allows the client to ask for a certain number 
of rows would remain, but only be used for backwards compatibility and for the 
scenario that the client wants to limit itself to only a certain number of 
rows. Makes sense to me.

With such a change, we would also want to remove any associated configurations 
for caching/rowlimit in hbase-site.xml and hbase-default.xml. There isn't a 
scenario (at least that I can think of) where it would be appropriate to limit 
all scans to a particular number of rows and then close them. The row limit 
would be like the startRow or stopRow settings on scans, configured on a per 
scan basis with no means to set a global default for all scans.
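The alternative semantics discussed here (a client-side hard limit; hypothetical, since no such API existed at the time) could look like this: once `rowLimit` rows have been handed to the application, the scanner closes and next() returns null.

```java
import java.util.Arrays;
import java.util.Iterator;

public class LimitedScanner {
    private final Iterator<String> rows;   // stand-in for a ResultScanner
    private final int rowLimit;
    private int returned = 0;
    private boolean closed = false;

    LimitedScanner(Iterator<String> rows, int rowLimit) {
        this.rows = rows;
        this.rowLimit = rowLimit;
    }

    String next() {
        if (closed || returned >= rowLimit || !rows.hasNext()) {
            closed = true;   // a real client would release server-side resources here
            return null;
        }
        returned++;
        return rows.next();
    }

    public static void main(String[] args) {
        LimitedScanner s = new LimitedScanner(
                Arrays.asList("r1", "r2", "r3", "r4").iterator(), 2);
        System.out.println(s.next());  // r1
        System.out.println(s.next());  // r2
        System.out.println(s.next());  // null: limit reached, scanner closed
    }
}
```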

 Rename scanner caching to a more semantically correct term such as row limit
 

 Key: HBASE-13442
 URL: https://issues.apache.org/jira/browse/HBASE-13442
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
 Attachments: HBASE-13442-proposal.diff


 Caching acts more as a row limit now. By default in branch-1+, a Scan is 
 configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we 
 service scans on the basis of buffer size rather than number of rows. As a 
 result, caching should now only be configured in instances where the user 
 knows that they will only need X rows. Thus, caching should be renamed to 
 something that is more semantically correct such as rowLimit.





[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans

2015-04-21 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13527:

Description: 
Now that max result size is driven from the client side like caching 
(HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
hbase.client.scanner.max.result.size which is never performed. I think this has 
gone unnoticed because the server used to read the configuration 
hbase.client.scanner.max.result.size for itself, but now we expect the 
serialized Scan sent from the client side to contain this information. 
Realistically this should have been set on the Scans even before HBASE-13362; 
it's surprising that it's not, as the scanner code seems to indicate otherwise.

Ultimately, the end result is that, by default, scan RPC's are limited by 
hbase.server.scanner.max.result.size (note this is the new server side config 
not the client side config) which has a default value of 100 MB. The scan RPC's 
should instead be limited by hbase.client.scanner.max.result.size which has a 
default value of 2 MB.

The reason why this issue occurs is because, by default, a new Scan() 
initializes Scan.maxResultSize to -1. This initial value of -1 will never be 
changed unless Scan#setMaxResultSize() is called. In the event that this value 
is not changed, the Scan that is serialized and sent to the server will also 
have Scan.maxResultSize = -1. Then, when the server is deciding what size limit 
should be enforced, it sees that Scan.maxResultSize = -1 so it uses the most 
relaxed size restriction possible, which is 
hbase.server.scanner.max.result.size (default value 100 MB).

  was:
Now that max result size is driven from the client side like caching 
(HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
hbase.client.scanner.max.result.size which is never performed. I think this has 
gone unnoticed because the server used to read the configuration 
hbase.client.scanner.max.result.size for itself, but now we expect the 
serialized Scan sent from the client side to contain this information. 
Realistically this should have been set on the Scans even before HBASE-13362; 
it's surprising that it's not, as the scanner code seems to indicate otherwise.

Ultimately, the end result is that, by default, scan RPC's are limited by 
hbase.server.scanner.max.result.size (note this is the new server side config 
not the client side config) which has a default value of 100 MB. The scan RPC's 
should instead be limited by hbase.client.scanner.max.result.size which has a 
default value of 2 MB.


 The default value for hbase.client.scanner.max.result.size is never actually 
 set on Scans
 -

 Key: HBASE-13527
 URL: https://issues.apache.org/jira/browse/HBASE-13527
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12, 1.2.0
Reporter: Jonathan Lawlor
 Attachments: HBASE-13527-v1.patch


 Now that max result size is driven from the client side like caching 
 (HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
 hbase.client.scanner.max.result.size which is never performed. I think this 
 has gone unnoticed because the server used to read the configuration 
 hbase.client.scanner.max.result.size for itself, but now we expect the 
 serialized Scan sent from the client side to contain this information. 
 Realistically this should have been set on the Scans even before HBASE-13362; 
 it's surprising that it's not, as the scanner code seems to indicate otherwise.
 Ultimately, the end result is that, by default, scan RPC's are limited by 
 hbase.server.scanner.max.result.size (note this is the new server side config 
 not the client side config) which has a default value of 100 MB. The scan 
 RPC's should instead be limited by hbase.client.scanner.max.result.size which 
 has a default value of 2 MB.
 The reason why this issue occurs is because, by default, a new Scan() 
 initializes Scan.maxResultSize to -1. This initial value of -1 will never be 
 changed unless Scan#setMaxResultSize() is called. In the event that this 
 value is not changed, the Scan that is serialized and sent to the server will 
 also have Scan.maxResultSize = -1. Then, when the server is deciding what 
 size limit should be enforced, it sees that Scan.maxResultSize = -1 so it 
 uses the most relaxed size restriction possible, which is 
 hbase.server.scanner.max.result.size (default value 100 MB).





[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans

2015-04-21 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13527:

Status: Patch Available  (was: Open)

 The default value for hbase.client.scanner.max.result.size is never actually 
 set on Scans
 -

 Key: HBASE-13527
 URL: https://issues.apache.org/jira/browse/HBASE-13527
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.12, 1.0.0, 2.0.0, 1.1.0, 1.2.0
Reporter: Jonathan Lawlor
 Attachments: HBASE-13527-v1.patch


 Now that max result size is driven from the client side like caching 
 (HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
 hbase.client.scanner.max.result.size which is never performed. I think this 
 has gone unnoticed because the server used to read the configuration 
 hbase.client.scanner.max.result.size for itself, but now we expect the 
 serialized Scan sent from the client side to contain this information. 
 Realistically this should have been set on the Scans even before HBASE-13362; 
 it's surprising that it's not, as the scanner code seems to indicate otherwise.
 Ultimately, the end result is that, by default, scan RPC's are limited by 
 hbase.server.scanner.max.result.size (note this is the new server side config 
 not the client side config) which has a default value of 100 MB. The scan 
 RPC's should instead be limited by hbase.client.scanner.max.result.size which 
 has a default value of 2 MB.





[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans

2015-04-21 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13527:

Attachment: HBASE-13527-v1.patch

Attaching a patch that sets max result size in a manner similar to how caching 
is set inside HTable.getScanner().

 The default value for hbase.client.scanner.max.result.size is never actually 
 set on Scans
 -

 Key: HBASE-13527
 URL: https://issues.apache.org/jira/browse/HBASE-13527
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12, 1.2.0
Reporter: Jonathan Lawlor
 Attachments: HBASE-13527-v1.patch


 Now that max result size is driven from the client side like caching 
 (HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
 hbase.client.scanner.max.result.size which is never performed. I think this 
 has gone unnoticed because the server used to read the configuration 
 hbase.client.scanner.max.result.size for itself, but now we expect the 
 serialized Scan sent from the client side to contain this information. 
 Realistically this should have been set on the Scans even before HBASE-13362; 
 it's surprising that it's not, as the scanner code seems to indicate otherwise.
 Ultimately, the end result is that, by default, scan RPC's are limited by 
 hbase.server.scanner.max.result.size (note this is the new server side config 
 not the client side config) which has a default value of 100 MB. The scan 
 RPC's should instead be limited by hbase.client.scanner.max.result.size which 
 has a default value of 2 MB.





[jira] [Created] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans

2015-04-21 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13527:
---

 Summary: The default value for 
hbase.client.scanner.max.result.size is never actually set on Scans
 Key: HBASE-13527
 URL: https://issues.apache.org/jira/browse/HBASE-13527
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.12, 1.0.0, 2.0.0, 1.1.0, 1.2.0
Reporter: Jonathan Lawlor


Now that max result size is driven from the client side like caching 
(HBASE-13362), we also need to set Scan.maxResultSize to the default value of 
hbase.client.scanner.max.result.size which is never performed. I think this has 
gone unnoticed because the server used to read the configuration 
hbase.client.scanner.max.result.size for itself, but now we expect the 
serialized Scan sent from the client side to contain this information. 
Realistically this should have been set on the Scans even before HBASE-13362; 
it's surprising that it's not, as the scanner code seems to indicate otherwise.

Ultimately, the end result is that, by default, scan RPC's are limited by 
hbase.server.scanner.max.result.size (note this is the new server side config 
not the client side config) which has a default value of 100 MB. The scan RPC's 
should instead be limited by hbase.client.scanner.max.result.size which has a 
default value of 2 MB.
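The failure mode described here can be modeled as follows (illustrative only; the real negotiation lives in the HBase server code, and the min() capping is an assumption): a positive maxResultSize from the serialized Scan wins, while -1 ("not set") falls through to the relaxed server default.

```java
public class SizeLimitModel {
    static final long CLIENT_DEFAULT = 2L * 1024 * 1024;    // 2 MB
    static final long SERVER_DEFAULT = 100L * 1024 * 1024;  // 100 MB

    static long effectiveLimit(long scanMaxResultSize) {
        // A positive value from the client wins (capped by the server maximum);
        // -1 means "never set", so the server default applies.
        return scanMaxResultSize > 0
                ? Math.min(scanMaxResultSize, SERVER_DEFAULT)
                : SERVER_DEFAULT;
    }

    public static void main(String[] args) {
        // Bug: the client never sets the field, so the 100 MB cap applies.
        System.out.println(effectiveLimit(-1) == SERVER_DEFAULT);              // true
        // Fix: default the Scan to the client config before serializing it.
        System.out.println(effectiveLimit(CLIENT_DEFAULT) == CLIENT_DEFAULT);  // true
    }
}
```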





[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting for

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13514:

Summary: Fix test failures in TestScannerHeartbeatMessages caused by a too 
restrictive setting for   (was: Fix test failures in 
TestScannerHeartbeatMessages in branch-1.1 and branch-1)

 Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive 
 setting for 
 --

 Key: HBASE-13514
 URL: https://issues.apache.org/jira/browse/HBASE-13514
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 1.1.0, 1.2.0
Reporter: Jonathan Lawlor
Priority: Minor
 Fix For: 2.0.0, 1.1.0, 1.2.0


 The test inside TestScannerHeartbeatMessages is failing because the 
 configured value of hbase.rpc.timeout cannot be less than 2 seconds in 
 branch-1 and branch-1.1 but the test expects that it can be set to 0.5 
 seconds.





[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13514:

Description: The test inside TestScannerHeartbeatMessages is failing 
because the configured value of hbase.rpc.timeout cannot be less than 2 
seconds in branch-1 and branch-1.1 but the test expects that it can be set to 
0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in 
{{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer 
in master.  (was: The test inside TestScannerHeartbeatMessages is failing 
because the configured value of hbase.rpc.timeout cannot be less than 2 
seconds in branch-1 and branch-1.1 but the test expects that it can be set to 
0.5 seconds.)
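The failure can be modeled as a simple clamp (the 2000 ms floor follows the description above; the real field is MIN_RPC_TIMEOUT in RpcRetryingCaller): a test that configures hbase.rpc.timeout to 500 ms actually runs with 2000 ms on branch-1/branch-1.1.

```java
public class RpcTimeoutClamp {
    static final int MIN_RPC_TIMEOUT = 2000;  // floor in ms, per the description

    static int effectiveTimeout(int configuredMillis) {
        // branch-1/branch-1.1 clamp the configured timeout to the floor.
        return Math.max(configuredMillis, MIN_RPC_TIMEOUT);
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeout(500));   // 2000, not the 500 the test expects
        System.out.println(effectiveTimeout(5000));  // 5000
    }
}
```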

 Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting 
 of hbase.rpc.timeout
 --

 Key: HBASE-13514
 URL: https://issues.apache.org/jira/browse/HBASE-13514
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 1.1.0, 1.2.0
Reporter: Jonathan Lawlor
Priority: Minor
 Fix For: 2.0.0, 1.1.0, 1.2.0


 The test inside TestScannerHeartbeatMessages is failing because the 
 configured value of hbase.rpc.timeout cannot be less than 2 seconds in 
 branch-1 and branch-1.1 but the test expects that it can be set to 0.5 
 seconds. This is because of the field MIN_RPC_TIMEOUT in 
 {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no 
 longer in master.





[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-04-20 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503889#comment-14503889
 ] 

Jonathan Lawlor commented on HBASE-13082:
-

I'm a little late to the party but this versioned data structure sounds neat. 
If I'm understanding correctly, it sounds like this versioned data structure 
would also allow us to remove the lingering lock in updateReaders (and 
potentially remove updateReaders completely?). Instead of having to update the 
readers, the compaction/flush would occur in the background and be made visible 
to new readers via a new latest version in the data structure, is that 
correct? In other words, would the introduction of this new versioned data 
structure make StoreScanner single threaded (and thus remove any need for 
synchronization)?
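A sketch of the versioned-data-structure idea as read from this comment (hypothetical, not HBase code): readers snapshot an immutable file list through a single volatile reference, and a flush/compaction publishes a new version instead of locking every open StoreScanner to update its readers.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class VersionedFiles {
    // Each published version is immutable; scanners keep the version they saw.
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(List.of("hfile-1", "hfile-2"));

    List<String> openScannerSnapshot() {
        return current.get();  // no lock: a single volatile read
    }

    void publishCompactionResult(List<String> newFiles) {
        // In-flight scanners continue against the old version; new scanners
        // see the new one. No updateReaders() call is needed.
        current.set(List.copyOf(newFiles));
    }

    public static void main(String[] args) {
        VersionedFiles store = new VersionedFiles();
        List<String> scannerView = store.openScannerSnapshot();
        store.publishCompactionResult(List.of("hfile-3"));  // background compaction
        System.out.println(scannerView);                 // [hfile-1, hfile-2]
        System.out.println(store.openScannerSnapshot()); // [hfile-3]
    }
}
```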

 Coarsen StoreScanner locks to RegionScanner
 ---

 Key: HBASE-13082
 URL: https://issues.apache.org/jira/browse/HBASE-13082
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
 13082-v4.txt, 13082.txt, 13082.txt, gc.png, gc.png, gc.png, hits.png, 
 next.png, next.png


 Continuing where HBASE-10015 left of.
 We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
 the lock already held by the RegionScanner.
 In tests this shows quite a scan improvement and reduced CPU (the fences make 
 the cores wait for memory fetches).
 There are some drawbacks too:
 * All calls to RegionScanner need to be remain synchronized
 * Implementors of coprocessors need to be diligent in following the locking 
 contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
 required in the documentation (not picking on Phoenix, this one is my fault 
 as I told them it's OK)
 * possible starving of flushes and compaction with heavy read load. 
 RegionScanner operations would keep getting the locks and the 
 flushes/compactions would not be able to finalize the set of files.
 I'll have a patch soon.





[jira] [Assigned] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor reassigned HBASE-13514:
---

Assignee: Jonathan Lawlor

 Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting 
 of hbase.rpc.timeout
 --

 Key: HBASE-13514
 URL: https://issues.apache.org/jira/browse/HBASE-13514
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 1.1.0, 1.2.0
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Minor
 Fix For: 2.0.0, 1.1.0, 1.2.0

 Attachments: HBASE-13514-branch-1.1.patch, 
 HBASE-13514-branch-1.patch, HBASE-13514.patch


 The test inside TestScannerHeartbeatMessages is failing because the 
 configured value of hbase.rpc.timeout cannot be less than 2 seconds in 
 branch-1 and branch-1.1 but the test expects that it can be set to 0.5 
 seconds. This is because of the field MIN_RPC_TIMEOUT in 
 {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no 
 longer in master.





[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13514:

Status: Patch Available  (was: Open)






[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13514:

Attachment: HBASE-13514-branch-1.patch
HBASE-13514-branch-1.1.patch
HBASE-13514.patch

Attaching a patch for each branch to get a QA run on each. The patch addresses 
the test failure and also adds a deleteTable in test cleanup. [~tedyu] got some 
time to take a quick looksee?






[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners

2015-04-20 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503574#comment-14503574
 ] 

Jonathan Lawlor commented on HBASE-13090:
-

Filed HBASE-13514 to address the test failures in branch-1 and branch-1.1

 Progress heartbeats for long running scanners
 -

 Key: HBASE-13090
 URL: https://issues.apache.org/jira/browse/HBASE-13090
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Purtell
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0, 1.2.0

 Attachments: 13090-branch-1.addendum, HBASE-13090-v1.patch, 
 HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, 
 HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch


 It can be necessary to set very long timeouts for clients that issue scans 
 over large regions when all data in the region might be filtered out 
 depending on scan criteria. This is a usability concern because it can be 
 hard to identify what worst case timeout to use until scans are 
 occasionally/intermittently failing in production, depending on variable scan 
 criteria. It would be better if the client-server scan protocol can send back 
 periodic progress heartbeats to clients as long as server scanners are alive 
 and making progress.
 This is related but orthogonal to streaming scan (HBASE-13071). 





[jira] [Created] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages in branch-1.1 and branch-1

2015-04-20 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13514:
---

 Summary: Fix test failures in TestScannerHeartbeatMessages in 
branch-1.1 and branch-1
 Key: HBASE-13514
 URL: https://issues.apache.org/jira/browse/HBASE-13514
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 1.1.0, 1.2.0
Reporter: Jonathan Lawlor
Priority: Minor


The test inside TestScannerHeartbeatMessages is failing because the configured 
value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and 
branch-1.1 but the test expects that it can be set to 0.5 seconds.





[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting of hbase.rpc.timeout

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13514:

Summary: Fix test failures in TestScannerHeartbeatMessages caused by a too 
restrictive setting of hbase.rpc.timeout  (was: Fix test failures in 
TestScannerHeartbeatMessages caused by a too restrictive setting for )






[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout

2015-04-20 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13514:

Summary: Fix test failures in TestScannerHeartbeatMessages caused by 
incorrect setting of hbase.rpc.timeout  (was: Fix test failures in 
TestScannerHeartbeatMessages caused by a too restrictive setting of 
hbase.rpc.timeout)






[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners

2015-04-20 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503512#comment-14503512
 ] 

Jonathan Lawlor commented on HBASE-13090:
-

[~tedyu] thanks for digging in here. I have done some investigation into the 
root cause of this issue and it seems to be coming from the field 
{{MIN_RPC_TIMEOUT}} inside {{RpcRetryingCaller}} in branch-1. This 
{{MIN_RPC_TIMEOUT}} field in branch-1 prevents setting the RPC timeout value to 
anything less than 2 seconds. In master this field no longer exists and the 
timeout value can be specified to be as small as we wish. In the case of 
TestScannerHeartbeatMessages, the RPC timeout was specified to be 0.5 seconds 
which is why it fails when it is 2 seconds instead. I will attach a patch 
shortly to address this issue, thanks!
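The behavior described above amounts to a floor on the configured timeout: in branch-1, a 0.5-second setting is silently raised to 2 seconds. A minimal sketch of that clamping (the constant and method names here are illustrative, not the actual RpcRetryingCaller code):

```java
public class RpcTimeoutClampSketch {
    // branch-1's RpcRetryingCaller enforces a minimum RPC timeout; 2000 ms
    // matches the behavior described above (illustrative constant).
    static final int MIN_RPC_TIMEOUT_MS = 2000;

    // Effective timeout: the configured value, but never below the floor.
    static int effectiveTimeout(int configuredMs) {
        return Math.max(configuredMs, MIN_RPC_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        // The test configured 500 ms, but branch-1 effectively used 2000 ms.
        System.out.println(effectiveTimeout(500));
        System.out.println(effectiveTimeout(5000));
    }
}
```

This is why the test's 0.5-second expectation holds on master (no floor) but fails on branch-1 and branch-1.1.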






[jira] [Updated] (HBASE-13090) Progress heartbeats for long running scanners

2015-04-17 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13090:

Release Note: 
Previously, there was no way to enforce a time limit on scan RPC requests. The 
server would receive a scan RPC request and take as much time as it needed to 
accumulate enough results to reach a limit or exhaust the region. The problem 
with this approach was that, in the case of a very selective scan, the 
processing of the scan could take too long and cause timeouts client side.

With this fix, the server will now enforce a time limit on the execution of 
scan RPC requests. When a scan RPC request arrives to the server, a time limit 
is calculated to be half of whichever timeout value is more restrictive between 
the configurations (hbase.client.scanner.timeout.period and 
hbase.rpc.timeout). When the time limit is reached, the server will return 
whatever results it has accumulated up to that point. The results may be empty.

To ensure that timeout checks do not occur too often (which would hurt the 
performance of scans), the configuration 
hbase.cells.scanned.per.heartbeat.check has been introduced. This 
configuration controls how often System.currentTimeMillis() is called to update 
the progress towards the time limit. Currently, the default value of this 
configuration is 10,000. Specifying a smaller value will provide a tighter 
bound on the time limit, but may hurt scan performance due to the higher 
frequency of calls to System.currentTimeMillis().

Protobuf models for ScanRequest and ScanResponse have been updated so that 
heartbeat support can be communicated. Support for heartbeat messages is 
specified in the request sent to the server via 
ScanRequest.Builder#setClientHandlesHeartbeats. Only when the server sees that 
ScanRequest#getClientHandlesHeartbeats() is true will it send heartbeat 
messages back to the client. A response is marked as a heartbeat message via 
the boolean flag ScanResponse#getHeartbeatMessage.
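The time-limit calculation the release note describes can be sketched as follows (an illustrative helper, not the server code; the example timeout values are assumptions):

```java
public class ScanTimeLimitSketch {
    // Per the release note: the server-side time limit is half of the more
    // restrictive of the two configured timeouts.
    static long timeLimitMs(long scannerTimeoutPeriodMs, long rpcTimeoutMs) {
        return Math.min(scannerTimeoutPeriodMs, rpcTimeoutMs) / 2;
    }

    public static void main(String[] args) {
        // e.g. hbase.client.scanner.timeout.period=60000, hbase.rpc.timeout=30000
        // yields a 15000 ms limit on each scan RPC.
        System.out.println(timeLimitMs(60000, 30000));
    }
}
```

Halving leaves headroom for the response to reach the client before the client-side timeout fires.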






[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners

2015-04-17 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500349#comment-14500349
 ] 

Jonathan Lawlor commented on HBASE-13090:
-

[~ndimiduk] I believe the change is solid. Just figured that with the 
branch-1.1 release so close, it may be a bit 'risky' to stick such a large 
change in right before release. While the unit tests added do stress the 
relevant code paths, it would be nice to run it against a workload that was 
having timeout problems before, to prove its worth






[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners

2015-04-17 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500924#comment-14500924
 ] 

Jonathan Lawlor commented on HBASE-13090:
-

[~tedyu] Thanks for catching that. Seems HRegionServer no longer throws 
InterruptedException in master. Addendum lgtm.






[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-16 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-5980:
---
Attachment: HBASE-5980-v1.patch

Attaching a patch that exposes the following server side metrics to the client 
side ScanMetrics:
* Number of rows scanned (metric name is ROWS_SCANNED)
* Number of rows filtered (metric name is ROWS_FILTERED)

Important notes:
* ScanMetrics now contains a mix of both client side and server side metrics
* AbstractScanMetrics and ServerSideScanMetrics were added to try and keep the 
ScanMetrics stuff clean
* The following new arguments are now supported in scans from the shell:
** GET_ALL_METRICS: boolean indicating whether or not all scan metrics should 
be output
** GET_METRICS: array of metric keys the user wants to see (this argument 
trumps GET_ALL_METRICS)
** Example usages:
*** scan 'table', {GET_ALL_METRICS => true}
*** scan 'table', {GET_METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
* Metrics are always output in alphabetical order

Discussion points:
* I think the name of the metrics and shell arguments could be improved, just 
chose some easy names to show their usage. Thoughts?
* The other metric mentioned above still needs to be added (number of times 
each filter response was returned by filterKeyValue() - corresponding to 
Filter.ReturnCode). Adding new metrics is easy: just specify the new field in 
ServerSideScanMetrics and add the appropriate tracking calls. I wanted to get 
some feedback on how these metrics looked first rather than add a bunch of 
metrics all at once.
* All of the metrics [~lhofhansl] mentioned sound great. In terms of 
coprocessors, what kind of metrics would be valuable to expose to the client?


 Scanner responses from RS should include metrics on rows/KVs filtered
 -

 Key: HBASE-5980
 URL: https://issues.apache.org/jira/browse/HBASE-5980
 Project: HBase
  Issue Type: Improvement
  Components: Client, metrics, regionserver
Affects Versions: 0.95.2
Reporter: Todd Lipcon
Priority: Minor
 Attachments: HBASE-5980-v1.patch


 Currently it's difficult to know, when issuing a filter, what percentage of 
 rows were skipped by that filter. We should expose some basic counters back 
 to the client scanner object. For example:
 - number of rows filtered by row key alone (filterRowKey())
 - number of times each filter response was returned by filterKeyValue() - 
 corresponding to Filter.ReturnCode
 What would be slickest is if this could actually return a tree of counters 
 for cases where FilterList or other combining filters are used. But a 
 top-level is a good start.





[jira] [Updated] (HBASE-13090) Progress heartbeats for long running scanners

2015-04-16 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13090:

Attachment: HBASE-13090-v7.patch

Updated patch incorporating feedback from reviewboard






[jira] [Assigned] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-16 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor reassigned HBASE-5980:
--

Assignee: Jonathan Lawlor






[jira] [Updated] (HBASE-9444) EncodedScannerV2#isSeeked does not behave as described in javadoc

2015-04-14 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-9444:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving as fixed as HBASE-9915 has addressed this issue. Please reopen if 
there are additional concerns not covered by the solution in HBASE-9915.

 EncodedScannerV2#isSeeked does not behave as described in javadoc
 -

 Key: HBASE-9444
 URL: https://issues.apache.org/jira/browse/HBASE-9444
 Project: HBase
  Issue Type: Bug
  Components: HFile
Reporter: Chao Shi
Priority: Minor
 Attachments: hbase-9444.patch


 I hit this when my tool is scanning HFiles using the scanner. I found 
 isSeeked behaves different whether the HFiles are prefix-encoded or not.
 There is a test case in my patch that demonstrates the bug.





[jira] [Reopened] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

2015-04-14 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor reopened HBASE-5980:


This one was recently closed due to inactivity but caught my eye because it 
sounds like a nice one to have. Currently we track some client side metrics 
during scans such as count of regions scanned, count of RPCs, etc... (full list 
available in ScanMetrics class). However, these client side metrics do not 
include information regarding events that have occurred server side (like how 
many kv's have been filtered). 

If we wanted to have these metrics available client side, I believe it could be 
achieved in the following manner:
1. Define a new class to encapsulate the server side metrics that we wish to 
access/track client side
2. Define a new protobuf message type for this new metrics class
3. Add the metrics as another field in the ScanResponse
4. Add new fields to ScanMetrics (the class that already exists client side) 
corresponding to the server side metrics and update these metrics after each 
RPC response in ScannerCallable

In terms of how to actually track these metrics during Scan RPC's, we can add 
an instance of this new server side metrics class to the ScannerContext class 
that was added in HBASE-13421. Then all metric tracking could be performed via 
ScannerContext#getMetrics()#update...

Any thoughts/comments?
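Steps 1 and 4 above could look roughly like this (a hypothetical shape with illustrative names; not the actual HBASE-13421 ScannerContext API):

```java
import java.util.concurrent.atomic.AtomicLong;

public class ServerScanMetricsSketch {
    // Step 1: a holder for the server-side counters we want to surface
    // client side (field names are illustrative).
    static class ServerSideScanMetrics {
        final AtomicLong rowsScanned = new AtomicLong();
        final AtomicLong rowsFiltered = new AtomicLong();
    }

    // Hypothetical ScannerContext carrying the metrics, as suggested above,
    // so hot-path code can update them via getMetrics().
    static class ScannerContext {
        private final ServerSideScanMetrics metrics = new ServerSideScanMetrics();
        ServerSideScanMetrics getMetrics() { return metrics; }
    }

    public static void main(String[] args) {
        ScannerContext ctx = new ScannerContext();
        // Tracking calls in the scan path would look like:
        ctx.getMetrics().rowsScanned.incrementAndGet();
        ctx.getMetrics().rowsScanned.incrementAndGet();
        ctx.getMetrics().rowsFiltered.incrementAndGet();
        System.out.println(ctx.getMetrics().rowsScanned.get() + " scanned, "
            + ctx.getMetrics().rowsFiltered.get() + " filtered");
    }
}
```

Steps 2 and 3 (the protobuf message and the ScanResponse field) would then just serialize this holder back to the client.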






[jira] [Commented] (HBASE-12930) Check single row size not exceed configured max row size across families for Get/Scan

2015-04-13 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492605#comment-14492605
 ] 

Jonathan Lawlor commented on HBASE-12930:
-

[~cuijianwei] Recently there was a change made (HBASE-13421) to the solution 
initially conceived in HBASE-11544. The remaining result size limit still 
exists, but now it is carried within the new class, ScannerContext (see 
ScannerContext#incrementSizeProgress and ScannerContext#checkSizeLimit(...)). 
ScannerContext was introduced to allow us to reduce the number of object 
creations in the scanner hot code paths and also provides a nice encapsulation 
of limits and limit progress. Please let me know if you have any questions :)
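The increment/check pattern mentioned above can be sketched with a standalone stand-in (illustrative names; not HBase's ScannerContext): progress accumulates per cell, and the scan returns a partial batch once the limit trips.

```java
public class SizeLimitSketch {
    // Illustrative stand-in for ScannerContext size-limit bookkeeping.
    static class Context {
        final long sizeLimitBytes;
        long sizeProgressBytes;
        Context(long limit) { this.sizeLimitBytes = limit; }
        void incrementSizeProgress(long cellSize) { sizeProgressBytes += cellSize; }
        boolean checkSizeLimit() { return sizeProgressBytes >= sizeLimitBytes; }
    }

    // Accumulate cells until the size limit trips, then return early.
    static int cellsReturned(long[] cellSizes, long limitBytes) {
        Context ctx = new Context(limitBytes);
        int count = 0;
        for (long size : cellSizes) {
            ctx.incrementSizeProgress(size);
            count++;
            if (ctx.checkSizeLimit()) break; // return partial batch to client
        }
        return count;
    }

    public static void main(String[] args) {
        // Three 1 KB cells against a 2 KB limit: stop after the second cell.
        System.out.println(cellsReturned(new long[] {1024, 1024, 1024}, 2048));
    }
}
```

One context object per RPC is what keeps object creation out of the per-cell hot path.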

 Check single row size not exceed configured max row size across families for 
 Get/Scan
 -

 Key: HBASE-12930
 URL: https://issues.apache.org/jira/browse/HBASE-12930
 Project: HBase
  Issue Type: Improvement
  Components: Scanners
Reporter: cuijianwei
Priority: Minor
 Fix For: 2.0.0


 StoreScanner#next will check that 'totalBytesRead' does not exceed the 
 configured 'hbase.table.max.rowsize' for each family. However, if there are 
 several families, a single row can still reach an unexpectedly big size even 
 if the 'totalBytesRead' of each family does not exceed 
 'hbase.table.max.rowsize'. This may cause the region server to fail because 
 of OOM. What about checking the single row size across families in 
 StoreScanner#next(List<Cell>, int)?
 {code}
 long totalBytesRead = 0;
 // compute the size of the cells that have already been read
 for (Cell cell : outResult) {
   totalBytesRead += CellUtil.estimatedSerializedSizeOf(cell);
 }
 LOOP: while ((cell = this.heap.peek()) != null) {
 ...
 {code}
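The cross-family check proposed in the description can be sketched standalone (illustrative names; not the StoreScanner implementation): sum the bytes read for a row across all families and compare against a single cap.

```java
public class RowSizeAcrossFamiliesSketch {
    // Sum a row's bytes read across all column families and compare against
    // one hbase.table.max.rowsize-style cap, instead of per-family checks.
    static boolean exceedsMaxRowSize(long[] bytesReadPerFamily, long maxRowSizeBytes) {
        long totalBytesRead = 0;
        for (long familyBytes : bytesReadPerFamily) {
            totalBytesRead += familyBytes;
        }
        return totalBytesRead > maxRowSizeBytes;
    }

    public static void main(String[] args) {
        // Each family is under a 1 MB cap, but the whole row is not.
        long[] families = {600_000, 600_000};
        System.out.println(exceedsMaxRowSize(families, 1_000_000));
    }
}
```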





[jira] [Created] (HBASE-13441) Scan API improvements

2015-04-09 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13441:
---

 Summary: Scan API improvements
 Key: HBASE-13441
 URL: https://issues.apache.org/jira/browse/HBASE-13441
 Project: HBase
  Issue Type: Umbrella
Reporter: Jonathan Lawlor


Umbrella task for improvements that could be made to the Scan API





[jira] [Created] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit

2015-04-09 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13442:
---

 Summary: Rename scanner caching to a more semantically correct 
term such as row limit
 Key: HBASE-13442
 URL: https://issues.apache.org/jira/browse/HBASE-13442
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor


Caching acts more as a limit now. By default in branch-1+, a Scan is configured 
with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on 
the basis of buffer size rather than number of rows. As a result, caching 
should now only be configured in instances where the user knows that they will 
only need X rows. Thus, caching should be renamed to something that is more 
semantically correct such as rowLimit.
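The interplay of the two limits described above can be modeled simply (an illustrative sketch of the batching rule, not the client code): a batch closes when either the row limit ("caching") or the byte budget (maxResultSize) is reached, so with the branch-1+ defaults the byte budget is the effective limit.

```java
public class ScanBatchingSketch {
    // Rows returned in one batch: stop at whichever limit is hit first.
    static int rowsInFirstBatch(long[] rowSizes, int caching, long maxResultSize) {
        long accumulated = 0;
        int rows = 0;
        for (long size : rowSizes) {
            if (rows >= caching || accumulated >= maxResultSize) break;
            accumulated += size;
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] oneKbRows = new long[5000];
        java.util.Arrays.fill(oneKbRows, 1024L);
        // Defaults: caching = Integer.MAX_VALUE, maxResultSize = 2 MB
        System.out.println(rowsInFirstBatch(oneKbRows, Integer.MAX_VALUE, 2 * 1024 * 1024));
        // A user who knows they need only 10 rows uses caching as a row limit:
        System.out.println(rowsInFirstBatch(oneKbRows, 10, 2 * 1024 * 1024));
    }
}
```

This is why "caching" now behaves as a row limit, motivating the proposed rename to rowLimit.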





[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME

2015-04-09 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488140#comment-14488140
 ] 

Jonathan Lawlor commented on HBASE-11544:
-

[~lhofhansl] good point. I have filed HBASE-13441 as an umbrella issue for 
discussion regarding potential improvements to the Scan API. HBASE-13442 deals 
specifically with the rename to rowLimit.

 [Ergonomics] hbase.client.scanner.caching is dogged and will try to return 
 batch even if it means OOME
 --

 Key: HBASE-11544
 URL: https://issues.apache.org/jira/browse/HBASE-11544
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Jonathan Lawlor
Priority: Critical
 Fix For: 2.0.0, 1.1.0

 Attachments: Allocation_Hot_Spots.html, 
 HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, 
 HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, 
 HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, 
 HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, 
 HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, 
 HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, gc.j.png, h.png, 
 hits.j.png, m.png, mean.png, net.j.png, q (2).png


 Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has 
 large cells.  I kept OOME'ing.
 Serverside, we should measure how much we've accumulated and return to the 
 client whatever we've gathered once we pass out a certain size threshold 
 rather than keep accumulating till we OOME.





[jira] [Updated] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit

2015-04-09 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13442:

Description: Caching acts more as a row limit now. By default in branch-1+, 
a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so 
that we service scans on the basis of buffer size rather than number of rows. 
As a result, caching should now only be configured in instances where the user 
knows that they will only need X rows. Thus, caching should be renamed to 
something that is more semantically correct such as rowLimit.  (was: Caching 
acts more as a limit now. By default in branch-1+, a Scan is configured with 
(caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the 
basis of buffer size rather than number of rows. As a result, caching should 
now only be configured in instances where the user knows that they will only 
need X rows. Thus, caching should be renamed to something that is more 
semantically correct such as rowLimit.)

 Rename scanner caching to a more semantically correct term such as row limit
 

 Key: HBASE-13442
 URL: https://issues.apache.org/jira/browse/HBASE-13442
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor

 Caching acts more as a row limit now. By default in branch-1+, a Scan is 
 configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we 
 service scans on the basis of buffer size rather than number of rows. As a 
 result, caching should now only be configured in instances where the user 
 knows that they will only need X rows. Thus, caching should be renamed to 
 something that is more semantically correct such as rowLimit.





[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-08 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Attachment: HBASE-13421-branch-1.patch

Attaching the branch-1 patch

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-branch-1.patch, HBASE-13421-v1.patch, 
 HBASE-13421-v2.patch, HBASE-13421-v3.patch


 HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and 
 InternalScanner#next to allow state information to be passed back from a 
 scanner (it was formerly a boolean indicating whether or not more values 
 existed). The change in return type led to an increased number of objects 
 being created: in the case of a scan spanning millions of rows, there was 
 the potential for millions of objects to be created.
 This issue looks to reduce the number of object creations from potentially 
 many to at most one per RPC request.
 Please see the tail of the parent issue for relevant discussion of the design 
 decisions behind this solution. This sub-task has been filed because it seems 
 more appropriate to address the fix here rather than as an addendum to the 
 parent.
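The fix can be pictured with a small stand-alone model (names like Context are illustrative, not the real HBase classes): allocate one mutable state object per RPC and reuse it across every internal next() call, instead of returning a fresh state object per row.

```java
public class ScannerContextSketch {
    /** One mutable state holder, reused across the next() calls of an RPC. */
    static final class Context {
        boolean moreValues;
        long progress;
    }

    /** Shared instance for no-limit callers, mirroring the getInstance() idea. */
    static final Context NO_LIMIT = new Context();

    /** Model of the per-RPC scan loop: a single allocation, not one per row. */
    public static long scan(int rows) {
        Context ctx = new Context();       // allocated once per RPC ...
        for (int i = 0; i < rows; i++) {   // ... reused for every row
            ctx.moreValues = i + 1 < rows;
            ctx.progress++;
        }
        return ctx.progress;
    }
}
```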





[jira] [Commented] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485613#comment-14485613
 ] 

Jonathan Lawlor commented on HBASE-13421:
-

Whoops, already had reviewboard link... please ignore noise

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-branch-1.patch, HBASE-13421-v1.patch, 
 HBASE-13421-v2.patch, HBASE-13421-v3.patch







[jira] [Commented] (HBASE-13215) A limit on the raw key values is needed for each next call of a scanner

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485795#comment-14485795
 ] 

Jonathan Lawlor commented on HBASE-13215:
-

Hey [~heliangliang], have you had any time to work on this lately; any updates?

 A limit on the raw key values is needed for each next call of a scanner
 ---

 Key: HBASE-13215
 URL: https://issues.apache.org/jira/browse/HBASE-13215
 Project: HBase
  Issue Type: Improvement
  Components: Scanners
Reporter: He Liangliang
Assignee: He Liangliang

 In the current scanner next, there are several limits: caching, batch, and 
 size. But there is no limit on the raw data scanned, so the time consumed by 
 a next call is unbounded. For example, many consecutive deleted or 
 filtered-out cells can lead to a socket timeout, which can leave user code 
 stuck.
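The proposed bound can be modeled in isolation (cellsExamined and rawLimit are hypothetical names, not the actual patch): even if every cell is deleted or filtered out, a single next() call inspects at most rawLimit raw cells before returning.

```java
public class RawCellLimitSketch {
    /**
     * Returns how many raw cells one next() call examines: at most rawLimit,
     * even when every cell is dropped (deleted or filtered out), so the
     * call's duration stays bounded.
     */
    public static int cellsExamined(boolean[] dropped, int rawLimit) {
        int examined = 0;
        for (boolean drop : dropped) {
            if (examined >= rawLimit) {
                break; // bail out before the client's socket times out
            }
            examined++;
            if (!drop) {
                // a real scanner would add the cell to the result batch here
            }
        }
        return examined;
    }
}
```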





[jira] [Commented] (HBASE-12266) Slow Scan can cause dead loop in ClientScanner

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485825#comment-14485825
 ] 

Jonathan Lawlor commented on HBASE-12266:
-

Sounds like this issue is related to the heartbeat/keepalive idea in 
HBASE-13090. 

The idea over there is to track how long a scan has been executing server side 
and to return periodic heartbeat/keepalive messages when the scan is taking a 
long time. The frequency of these heartbeat messages would depend on the 
configured scanner timeout (a more restrictive timeout would lead to more 
frequent heartbeat messages). This solution would address the issue of a slow 
scan and would also remove the possibility of this dead loop.

Thoughts? Think we could close this one out?
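A stand-alone sketch of the timing check such a heartbeat could use (the half-timeout trigger is an assumption for illustration, not the committed behavior):

```java
public class HeartbeatSketch {
    /**
     * Decide whether to cut the current batch short and send a heartbeat,
     * so the client can reset its timeout clock. Firing once half the
     * scanner timeout has elapsed is an assumed policy.
     */
    public static boolean shouldHeartbeat(long elapsedMs, long scannerTimeoutMs) {
        return elapsedMs >= scannerTimeoutMs / 2;
    }
}
```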

 Slow Scan can cause dead loop in ClientScanner 
 ---

 Key: HBASE-12266
 URL: https://issues.apache.org/jira/browse/HBASE-12266
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.96.0
Reporter: Qiang Tian
Priority: Minor
 Attachments: 12266-v2.txt, HBASE-12266-master.patch


 see http://search-hadoop.com/m/DHED45SVsC1.





[jira] [Updated] (HBASE-6491) add limit function at ClientScanner

2015-04-08 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-6491:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

This issue is quite old. Resolving as won't fix for now but please feel free to 
reopen if the need for the feature is still present. 

As [~stack] pointed out, this may be unsafe for the client in the case that we 
are dealing with very large rows. It may be more appropriate to implement such 
a behavior in the form of a coprocessor rather than making RPC calls for the 
sake of skipping results client side.

 add limit function at ClientScanner
 ---

 Key: HBASE-6491
 URL: https://issues.apache.org/jira/browse/HBASE-6491
 Project: HBase
  Issue Type: New Feature
  Components: Client
Affects Versions: 0.95.2
Reporter: ronghai.ma
Assignee: ronghai.ma
  Labels: patch
 Attachments: ClientScanner.java, HBASE-6491.patch


 Add a new method in ClientScanner to implement a function like LIMIT in MySQL.





[jira] [Resolved] (HBASE-5978) Scanner next() calls should return after a configurable time threshold regardless of number of accumulated rows

2015-04-08 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor resolved HBASE-5978.

Resolution: Duplicate

Resolving as duplicate as this issue seems to be the same as HBASE-13090.

 Scanner next() calls should return after a configurable time threshold 
 regardless of number of accumulated rows
 ---

 Key: HBASE-5978
 URL: https://issues.apache.org/jira/browse/HBASE-5978
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Affects Versions: 0.90.7, 0.92.1
Reporter: Todd Lipcon

 Currently if you pass a very restrictive filter to a scanner, along with a 
 high caching value, you will end up causing RPC timeouts, lease exceptions, 
 etc. Although this is a poor configuration and easy to work around by 
 lowering caching, HBase should be resilient to a badly chosen caching value. 
 As such, the scanner next() call should record the elapsed time, and after 
 some number of seconds have passed, return any accumulated rows regardless of 
 the caching value. This prevents the calls from starving out other threads or 
 region operations.





[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-08 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Attachment: HBASE-13421-v3.patch

Attaching a patch to address the issues in the commit message as well as the 
checkstyle error (added a getInstance() method to NoLimitScannerContext).

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch, HBASE-13421-v2.patch, 
 HBASE-13421-v3.patch







[jira] [Commented] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485517#comment-14485517
 ] 

Jonathan Lawlor commented on HBASE-13421:
-

Making the branch-1 patch now and will attach once conflicts are resolved

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch, HBASE-13421-v2.patch, 
 HBASE-13421-v3.patch







[jira] [Commented] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485919#comment-14485919
 ] 

Jonathan Lawlor commented on HBASE-13421:
-

Looks like the build was green but the comment couldn't be made because of a 
login error with hadoopQA: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13621/consoleFull

Build Result:
{quote}
{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723952/HBASE-13421-v3.patch
  against master branch at commit 8cd3001f817915df20a4d209c450ac9b69b915d7.
  ATTACHMENT ID: 12723952

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 148 
new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13621//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13621//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13621//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13621//console
{quote}

Error:
{quote}
==
==
Adding comment to Jira.
==
==


Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 Cause: com.atlassian.jira.rpc.exception.RemoteAuthenticationException: Attempt 
to log in user 'hadoopqa' failed. The maximum number of failed login attempts 
has been reached. Please log into the application through the web interface to 
reset the number of failed login attempts.
{quote}

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-branch-1.patch, HBASE-13421-v1.patch, 
 HBASE-13421-v2.patch, HBASE-13421-v3.patch







[jira] [Resolved] (HBASE-7026) Make metrics collection in StoreScanner.java more efficient

2015-04-08 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor resolved HBASE-7026.

Resolution: Fixed

Marking this old one as fixed. I do not see these metrics being recorded inside 
StoreScanner anymore and thus potential performance regressions seem to have 
been addressed.

 Make metrics collection in StoreScanner.java more efficient
 ---

 Key: HBASE-7026
 URL: https://issues.apache.org/jira/browse/HBASE-7026
 Project: HBase
  Issue Type: Sub-task
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan

 Per the benchmarks I ran, the following block of code seems to be inefficient:
 StoreScanner.java:
 public synchronized boolean next(List<KeyValue> outResult, int limit,
     String metric) throws IOException {
   // ...
   // update the counter
   if (addedResultsSize > 0 && metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
         addedResultsSize);
   }
   // ...
 Removing this block increased throughput by 10%. We should move this to the 
 outer layer.
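The "move this to the outer layer" suggestion can be sketched as a self-contained model (not the actual StoreScanner code): accumulate into a local variable inside the hot loop and touch the shared metric once per call.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MetricHoisting {
    /** Shared counter, analogous to the per-region numeric metric. */
    public static final AtomicLong METRIC = new AtomicLong();

    /** Hot loop accumulates locally; the shared counter is touched once. */
    public static long scan(long[] resultSizes) {
        long local = 0;
        for (long size : resultSizes) {
            local += size;          // no shared-state update per row
        }
        METRIC.addAndGet(local);    // single update at the outer layer
        return local;
    }
}
```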





[jira] [Commented] (HBASE-13262) ResultScanner doesn't return all rows in Scan

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486431#comment-14486431
 ] 

Jonathan Lawlor commented on HBASE-13262:
-

[~davelatham] so with the workaround of setting 
hbase.client.scanner.max.result.size=1000, you receive a batch of 30 
rows back from the server and then the scan terminates without scanning the 
rest of the region, is that right? Also, do you have HFileV3 turned on (i.e. 
what is the configured value for hfile.format.version)? When tests were run in 
 0.98 (details above), this issue wasn't readily reproducible (0.98 uses HFileV2 
by default, and this issue was discovered to be a result of using HFileV3). If 
you do receive the full 30 rows back from the server, and you are *not* using 
HFileV3, I would be inclined to agree with you and say that this is in fact a 
different issue. Would you be able to provide any more details about the 
particular scan configuration?



 ResultScanner doesn't return all rows in Scan
 -

 Key: HBASE-13262
 URL: https://issues.apache.org/jira/browse/HBASE-13262
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.0.0, 1.1.0
 Environment: Single node, pseduo-distributed 1.1.0-SNAPSHOT
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12

 Attachments: 13262-0.98-testpatch.txt, HBASE-13262-0.98-v7.patch, 
 HBASE-13262-branch-1-v2.patch, HBASE-13262-branch-1-v3.patch, 
 HBASE-13262-branch-1.0-v7.patch, HBASE-13262-branch-1.patch, 
 HBASE-13262-v1.patch, HBASE-13262-v2.patch, HBASE-13262-v3.patch, 
 HBASE-13262-v4.patch, HBASE-13262-v5.patch, HBASE-13262-v6.patch, 
 HBASE-13262-v7.patch, HBASE-13262-v7.patch, HBASE-13262.patch, 
 regionserver-logging.diff, testrun_0.98.txt, testrun_branch1.0.txt


 Tried to write a simple Java client against 1.1.0-SNAPSHOT.
 * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), 
 for a total of 10M cells written
 * Read back the data from the table, ensure I saw 10M cells
 Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of 
 the actual rows. Running against 1.0.0, returns all 10M records as expected.
 [Code I was 
 running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java]
  for the curious.





[jira] [Commented] (HBASE-10060) Unsynchronized scanning

2015-04-08 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485997#comment-14485997
 ] 

Jonathan Lawlor commented on HBASE-10060:
-

Hey [~lhofhansl] is this one the same as HBASE-13082? Should we mark this one 
as a duplicate of HBASE-13082?

 Unsynchronized scanning
 ---

 Key: HBASE-10060
 URL: https://issues.apache.org/jira/browse/HBASE-10060
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 10060-trunk-v2.txt, 10060-trunk.txt


 HBASE-10015 has some lengthy discussion. The solution there ended up 
 replacing synchronized with ReentrantLock, which - somewhat surprisingly - 
 yielded a non-trivial improvement for tall tables.
 The goal should be to avoid locking in StoreScanner at all. StoreScanner is 
 only accessed by a single thread *except* when we have a concurrent flush or 
 a compaction, which is rare (we'd acquire and release the lock millions of 
  times per second, and compact/flush a few times an hour at the most).





[jira] [Resolved] (HBASE-5607) Implement scanner caching throttling to prevent too big responses

2015-04-08 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor resolved HBASE-5607.

Resolution: Fixed

Resolving this old one as fixed. With HBASE-2214 the throttling mechanism is 
available to the client to achieve the described behavior via 
Scan.setMaxResultSize. Furthermore, with HBASE-12976 in branch-1+, the default 
value of max result size is set to a reasonable value that will prevent a 
client from unintentionally causing issues on the region server. Feel free to 
reopen if you feel there are additional concerns that I have missed.

 Implement scanner caching throttling to prevent too big responses 
 --

 Key: HBASE-5607
 URL: https://issues.apache.org/jira/browse/HBASE-5607
 Project: HBase
  Issue Type: Improvement
Reporter: Ferdy Galema

 When a misconfigured client retrieves fat rows with a scanner caching value 
 set too high, there is a good chance the regionserver cannot handle the 
 response buffers. (See log example below). Also see the mailing list thread: 
 http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24819
 This issue is for tracking a solution that throttles the scanner caching 
 value in the case the response buffers are too big.
 A few possible solutions:
 a) If a response is (repeatedly) over 100MB (configurable), reduce the 
 scanner caching by half. (In either the server or the client.)
 b) Introduce a property that defines a fixed (target) response size, instead 
 of defining the number of rows to cache.
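Proposal (a) can be sketched as a pure function (nextCaching and the threshold parameter are illustrative names, not a committed API):

```java
public class CachingThrottle {
    /**
     * Proposal (a) as a pure function: when a response exceeds the
     * threshold, halve the caching for subsequent requests (floor of 1).
     */
    public static int nextCaching(int caching, long responseBytes, long thresholdBytes) {
        if (responseBytes > thresholdBytes) {
            return Math.max(1, caching / 2);
        }
        return caching;
    }
}
```

Applied to the responseTooLarge log entries below, a 1000-row caching value that produced a 210MB response against a 100MB threshold would drop to 500 on the next request.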
 2012-03-20 07:57:40,092 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 5 on 60020, responseTooLarge for: next(4438820558358059204, 1000) 
 from 172.23.122.15:50218: Size: 105.0m
 2012-03-20 07:57:53,226 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 3 on 60020, responseTooLarge for: next(-7429189123174849941, 1000) 
 from 172.23.122.15:50218: Size: 214.4m
 2012-03-20 07:57:57,839 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 5 on 60020, responseTooLarge for: next(-7429189123174849941, 1000) 
 from 172.23.122.15:50218: Size: 103.2m
 2012-03-20 07:57:59,442 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 2 on 60020, responseTooLarge for: next(-7429189123174849941, 1000) 
 from 172.23.122.15:50218: Size: 101.8m
 2012-03-20 07:58:20,025 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 6 on 60020, responseTooLarge for: next(9033159548564260857, 1000) 
 from 172.23.122.15:50218: Size: 107.2m
 2012-03-20 07:58:27,273 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 3 on 60020, responseTooLarge for: next(9033159548564260857, 1000) 
 from 172.23.122.15:50218: Size: 100.1m
 2012-03-20 07:58:52,783 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 1 on 60020, responseTooLarge for: next(-8611621895979000997, 1000) 
 from 172.23.122.15:50218: Size: 101.7m
 2012-03-20 07:59:02,541 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 0 on 60020, responseTooLarge for: next(-511305750191148153, 1000) 
 from 172.23.122.15:50218: Size: 120.9m
 2012-03-20 07:59:25,346 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 6 on 60020, responseTooLarge for: next(1570572538285935733, 1000) 
 from 172.23.122.15:50218: Size: 107.8m
 2012-03-20 07:59:46,805 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 3 on 60020, responseTooLarge for: next(-727080724379055435, 1000) 
 from 172.23.122.15:50218: Size: 102.7m
 2012-03-20 08:00:00,138 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 3 on 60020, responseTooLarge for: next(-3701270248575643714, 1000) 
 from 172.23.122.15:50218: Size: 122.1m
 2012-03-20 08:00:21,232 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 6 on 60020, responseTooLarge for: next(5831907615409186602, 1000) 
 from 172.23.122.15:50218: Size: 157.5m
 2012-03-20 08:00:23,199 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 9 on 60020, responseTooLarge for: next(5831907615409186602, 1000) 
 from 172.23.122.15:50218: Size: 160.7m
 2012-03-20 08:00:28,174 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 2 on 60020, responseTooLarge for: next(5831907615409186602, 1000) 
 from 172.23.122.15:50218: Size: 160.8m
 2012-03-20 08:00:32,643 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 7 on 60020, responseTooLarge for: next(5831907615409186602, 1000) 
 from 172.23.122.15:50218: Size: 182.4m
 2012-03-20 08:00:36,826 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 9 on 60020, responseTooLarge for: next(5831907615409186602, 1000) 
 from 172.23.122.15:50218: Size: 237.2m
 2012-03-20 08:00:40,850 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 3 on 60020, responseTooLarge for: next(5831907615409186602, 1000) 
 from 172.23.122.15:50218: 

[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Status: Patch Available  (was: Open)

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch







[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Status: Open  (was: Patch Available)

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch







[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Attachment: HBASE-13421-v1.patch

Reattaching with appropriate name to avoid confusion

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch







[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Attachment: (was: HBASE-11544-addendum-v3.patch)

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch







[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Status: Patch Available  (was: Open)

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-11544-addendum-v3.patch







[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME

2015-04-07 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484343#comment-14484343
 ] 

Jonathan Lawlor commented on HBASE-11544:
-

Filed sub-task HBASE-13421 to reduce the number of objects being created.

 [Ergonomics] hbase.client.scanner.caching is dogged and will try to return 
 batch even if it means OOME
 --

 Key: HBASE-11544
 URL: https://issues.apache.org/jira/browse/HBASE-11544
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Jonathan Lawlor
Priority: Critical
 Fix For: 2.0.0, 1.1.0

 Attachments: Allocation_Hot_Spots.html, 
 HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, 
 HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, 
 HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, 
 HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, 
 HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, 
 HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, gc.j.png, h.png, 
 hits.j.png, m.png, mean.png, net.j.png, q (2).png


 Running some tests, I set hbase.client.scanner.caching=1000.  The dataset has 
 large cells.  I kept OOME'ing.
 Server-side, we should measure how much we've accumulated and return to the 
 client whatever we've gathered once we pass a certain size threshold 
 rather than keep accumulating until we OOME.





[jira] [Created] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13421:
---

 Summary: Reduce the number of object creations introduced by 
HBASE-11544 in scan RPC hot code paths
 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0


HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and 
InternalScanner#next to allow state information to be passed back from a 
scanner (it was formerly a boolean indicating whether or not more values 
existed). The change in this return type led to an increased number of objects 
being created; in the case of a scan spanning millions of rows, there was the 
potential for millions of objects to be created.

This issue aims to reduce the number of object creations from 
potentially many to at most one per RPC request. 

Please see the tail of the parent issue for relevant discussion on the design 
decisions related to this solution. This sub-task has been filed as it seems 
more appropriate to address the fix here rather than as an addendum to the 
parent.





[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Attachment: HBASE-11544-addendum-v3.patch

Attaching the latest patch, which incorporates the latest feedback from 
ReviewBoard. Let's see what QA has to say.

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-11544-addendum-v3.patch







[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths

2015-04-07 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13421:

Attachment: HBASE-13421-v2.patch

Updated patch to address checkstyle and javadoc warnings.

The failing test (TestFastFail) passed locally, so it may have just been 
flaky; retrying.

 Reduce the number of object creations introduced by HBASE-11544 in scan RPC 
 hot code paths
 --

 Key: HBASE-13421
 URL: https://issues.apache.org/jira/browse/HBASE-13421
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13421-v1.patch, HBASE-13421-v2.patch







[jira] [Commented] (HBASE-13362) set max result size from client only (like caching)?

2015-04-06 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482070#comment-14482070
 ] 

Jonathan Lawlor commented on HBASE-13362:
-

+1, looks good to me.

Question: should we also add an entry for this new configuration to 
hbase-default.xml? I'm just thinking: as a user, how would I know about this 
new configuration value and the semantics behind it?
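For reference, such an entry would presumably follow the shape of the existing 
ones in hbase-default.xml. The property name comes from the discussion above; 
the 2 MB value (2097152 bytes) is the client default proposed in this issue, so 
treat this as an illustrative sketch rather than the shipped entry:

```xml
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>2097152</value>
  <description>Maximum number of bytes returned to the client in one
    scan RPC. Enforced on the client side only, as the name implies.
    2097152 bytes = 2 MB.</description>
</property>
```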

 set max result size from client only (like caching)?
 

 Key: HBASE-13362
 URL: https://issues.apache.org/jira/browse/HBASE-13362
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: 13362-0.98.txt, 13362-master.txt


 With the recent problems we've been seeing with client/server result size 
 mismatches, I was thinking: why was this not a problem with scanner caching?
 There are two reasons:
 # number of rows is easy to calculate (and we did it correctly)
 # caching is only controlled from the client, never set on the server alone
 We did fix both #1 and #2 in HBASE-13262.
 Still, I'd like to discuss the following:
 * default the client sent max result size to 2mb
 * remove any server only result sizing
 * continue to use hbase.client.scanner.max.result.size but enforce it via the 
 client only (as the name implies anyway).
 Comments? Concerns?





[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-03 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Attachment: HBASE-13374-v1.patch

Reattaching again now that Apache infra is stable; let's get a QA run in.

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple of data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+.
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate assumptions made in the small scanner logic. The inappropriate 
 assumptions are:
 1. Integer overflow will not occur when incrementing caching
 2. At least 2 rows will be returned from the server unless the region has 
 been exhausted
 I am attaching a patch that contains tests to display these issues. If these 
 issues should be split into separate JIRAs please let me know.





[jira] [Commented] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-02 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392994#comment-14392994
 ] 

Jonathan Lawlor commented on HBASE-13374:
-

Ohh I see, that makes sense then, thanks!

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch







[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-02 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Attachment: HBASE-13374-v1.patch

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch







[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-02 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Status: Open  (was: Patch Available)

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch







[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-02 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Status: Patch Available  (was: Open)

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch







[jira] [Commented] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-02 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392965#comment-14392965
 ] 

Jonathan Lawlor commented on HBASE-13374:
-

Looks like there is an issue fetching from git. I see this in the console 
output of the precommit build:

{quote}
FATAL: Failed to fetch from https://git-wip-us.apache.org/repos/asf/hbase.git
{quote}

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch







[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-01 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Attachment: HBASE-13374-v1.patch

Re-attaching the patch as the precommit build didn't start. Let's see what QA 
thinks about this.

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, 
 small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch







[jira] [Commented] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-01 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391753#comment-14391753
 ] 

Jonathan Lawlor commented on HBASE-13374:
-

bq. If lastResult is from previous region and I am in new region, how can I get 
a row from before?
Prior to this change, this else if block would set the start key to 
lastResult.getRow() and the first result returned from the server would be 
skipped. Rather than skipping the first Result returned from the server, it 
would be better if we could set the start key such that the first Result 
returned from the server would be the Result that follows lastResult. In this 
case, since we are performing a reversed scan, the start key that should be 
used is the key of the closest row that could occur before lastResult.getRow().
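The "closest row before" computation can be sketched as follows, assuming plain byte-lexicographic row ordering. This is an illustration only, not the HBase client's actual helper; the pad length here is arbitrary (a real implementation must append enough 0xFF bytes that no practical row key can fall between the computed key and the original).

```java
import java.util.Arrays;

public class ClosestRow {
    // Illustrative pad length; the real helper uses a much longer fixed pad.
    private static final int PAD = 8;

    /** Smallest row strictly greater than 'row' (forward scans): append 0x00. */
    static byte[] closestRowAfter(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // copyOf zero-fills the new byte
    }

    /** Approximate largest row strictly smaller than 'row' (reversed scans). */
    static byte[] closestRowBefore(byte[] row) {
        if (row.length == 0) {
            throw new IllegalArgumentException("empty row has no predecessor");
        }
        if (row[row.length - 1] == 0) {
            // ...0x00: dropping the trailing zero yields the immediate predecessor.
            return Arrays.copyOf(row, row.length - 1);
        }
        // Otherwise decrement the last byte and pad with 0xFF bytes. Any row key
        // strictly between the result and 'row' would have to be longer than the
        // pad, which is why a generous fixed pad is used in practice.
        byte[] before = Arrays.copyOf(row, row.length + PAD);
        before[row.length - 1]--;
        Arrays.fill(before, row.length, before.length, (byte) 0xFF);
        return before;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(closestRowAfter(new byte[]{'b'})));
        System.out.println(Arrays.toString(closestRowBefore(new byte[]{'b'})));
    }
}
```

With a start key computed this way, the first Result returned by the server for a reversed scan is already the row that follows lastResult, so no client-side skipping is needed.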

bq. Where was the int overflow? In the int i count?
Integer overflow was occurring on the following line
{code}
cacheNum++;
{code}
Looks like this increment to caching was performed as part of the 
skipRowOfFirstResult hack (since the first row is skipped, increment caching by 
one)

bq. How did you find the issues? Especially int overflow one?
For issue #1 (integer overflow), I tracked it down after noticing that not all 
rows were being retrieved with the following scan configuration 
(caching=Int.MAX, maxResultSize=1, small=true). What ends up happening is that 
{{cacheNum++}} overflows and we end up sending a negative caching value to the 
server (this is equivalent to telling this server that no rows should be 
retrieved). The result is that all RPC's after the overflow will return empty 
results and the client will think that all regions have been exhausted.
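The wrap-around itself is plain Java two's-complement arithmetic and is easy to demonstrate in isolation (this is a standalone illustration, not HBase code):

```java
public class CachingOverflow {
    public static void main(String[] args) {
        int cacheNum = Integer.MAX_VALUE; // scan configured with caching=Integer.MAX_VALUE
        cacheNum++;                       // the skipRowOfFirstResult increment
        // Two's-complement wrap-around: the caching value sent to the server is
        // now negative, which the server interprets as "return no rows".
        System.out.println(cacheNum);     // prints -2147483648
    }
}
```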

For issue #2, I tracked it down after noticing that there were still rows 
missing if I used the following scan configuration (caching=100, 
maxResultSize=1, small=true). In this case, what happens is that a single row 
does not fit into the defined max result size (i.e. we reach the size limit 
after retrieving only a single row). Thus, we will receive only one row back 
from the first RPC. Then, in the next RPC, the only row returned will be that 
same row. This is because the small scanner expects that by increasing the 
caching limit it will be able to skip this row... it doesn't account for the 
fact that the size limit may be reached before the caching limit. Thus, since 
only one row is returned, and that row is skipped, the scanner interprets this 
as meaning that the region is exhausted and will skip all of the remaining rows 
in that region.
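The failure mode for issue #2 can be simulated with a toy model of the flawed client loop. This is a hypothetical simulation, not the actual ClientSmallScanner code: "rpc" stands in for a server whose size limit is reached after a single row, and "faultyScan" mimics the restart-at-last-row-and-skip-first behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmallScanSkipBug {
    // Hypothetical region with five rows; the server's maxResultSize is reached
    // after a single row, so every RPC returns at most one row.
    static final List<String> REGION = Arrays.asList("r1", "r2", "r3", "r4", "r5");

    static List<String> rpc(String startRow) {
        int i = REGION.indexOf(startRow);
        return REGION.subList(i, Math.min(i + 1, REGION.size()));
    }

    /** The flawed skip-first-row loop sketched above. */
    static List<String> faultyScan() {
        List<String> seen = new ArrayList<>(rpc("r1"));
        while (true) {
            String last = seen.get(seen.size() - 1);
            List<String> batch = rpc(last);                      // restart at last row seen
            List<String> fresh = batch.subList(1, batch.size()); // skip the first result
            if (fresh.isEmpty()) break;                          // misread as "region exhausted"
            seen.addAll(fresh);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(faultyScan()); // only [r1]; r2..r5 are silently lost
    }
}
```

Because each RPC returns exactly the one row that gets skipped, the very first iteration produces an empty batch and the remaining rows in the region are never seen.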

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, 
 small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate 

[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-01 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Attachment: HBASE-13374-v1.patch

Attaching a patch that fixes both issues. If the start key is set correctly on 
the small scanner callable then the caching does not need to be incremented 
(thus avoiding the integer overflow) and skipRowOfFirstResult can be removed.

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, 
 small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate assumptions made in the small scanner logic. The inappropriate 
 assumptions are:
 1. Integer overflow will not occur when incrementing caching
 2. At least 2 rows will be returned from the server unless the region has 
 been exhausted
 I am attaching a patch that contains tests to display these issues. If these 
 issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-01 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Status: Patch Available  (was: Open)

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13374-v1.patch, 
 small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate assumptions made in the small scanner logic. The inappropriate 
 assumptions are:
 1. Integer overflow will not occur when incrementing caching
 2. At least 2 rows will be returned from the server unless the region has 
 been exhausted
 I am attaching a patch that contains tests to display these issues. If these 
 issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-04-01 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor reassigned HBASE-13374:
---

Assignee: Jonathan Lawlor

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
Assignee: Jonathan Lawlor
Priority: Blocker
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate assumptions made in the small scanner logic. The inappropriate 
 assumptions are:
 1. Integer overflow will not occur when incrementing caching
 2. At least 2 rows will be returned from the server unless the region has 
 been exhausted
 I am attaching a patch that contains tests to display these issues. If these 
 issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-03-31 Thread Jonathan Lawlor (JIRA)
Jonathan Lawlor created HBASE-13374:
---

 Summary: Small scanners (with particular configurations) do not 
return all rows
 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor


I recently ran into a couple data loss issues with small scans. Similar to 
HBASE-13262, these issues only appear when scans are configured in such a way 
that the max result size limit is reached before the caching limit is reached. 
As far as I can tell, this issue affects branches 0.98+

I should note that after investigation it looks like the root cause of these 
issues is not the same as HBASE-13262. Rather, these issues are caused by errors 
in the small scanner logic (I will explain in more depth below). 

Furthermore, I do know that the solution from HBASE-13262 has not made its way 
into small scanners (it is being addressed in HBASE-13335). As a result I made 
sure to test these issues with the patch from HBASE-13335 applied and I saw 
that they were still present.

The following two issues have been observed (both lead to data loss):

1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
and a maxResultSize limit that is reached before the region is exhausted, 
integer overflow will occur. This eventually leads to a preemptive skip of the 
regions.

2. When a small scan is configured with a maxResultSize that is smaller than 
the size of a single row, the small scanner will jump between regions 
preemptively. This issue seems to be because small scanners assume that, unless 
a region is exhausted, at least 2 rows will be returned from the server. This 
assumption isn't clearly stated in the small scanners but is implied through the 
use of {{skipRowOfFirstResult}}.

Again, I would like to stress that the root cause of these issues is *NOT* 
related to the cause of HBASE-13262. These issues occur because of 
inappropriate assumptions made in the small scanner logic. The inappropriate 
assumptions are:
1. Integer overflow will not occur when incrementing caching
2. At least 2 rows will be returned from the server unless the region has been 
exhausted

I am attaching a patch that contains tests to display these issues. If these 
issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-03-31 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Attachment: small-scanner-data-loss-tests-branch-1.0+.patch

Attaching patch that can be applied to branch-1.0+. This patch does not contain 
a fix. It contains the test cases that allow us to see the failure modes.

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
 Attachments: small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate assumptions made in the small scanner logic. The inappropriate 
 assumptions are:
 1. Integer overflow will not occur when incrementing caching
 2. At least 2 rows will be returned from the server unless the region has 
 been exhausted
 I am attaching a patch that contains tests to display these issues. If these 
 issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows

2015-03-31 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13374:

Attachment: small-scanner-data-loss-tests-0.98.patch

Corresponding patch for 0.98

 Small scanners (with particular configurations) do not return all rows
 --

 Key: HBASE-13374
 URL: https://issues.apache.org/jira/browse/HBASE-13374
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
Reporter: Jonathan Lawlor
 Attachments: small-scanner-data-loss-tests-0.98.patch, 
 small-scanner-data-loss-tests-branch-1.0+.patch


 I recently ran into a couple data loss issues with small scans. Similar to 
 HBASE-13262, these issues only appear when scans are configured in such a way 
 that the max result size limit is reached before the caching limit is 
 reached. As far as I can tell, this issue affects branches 0.98+
 I should note that after investigation it looks like the root cause of these 
 issues is not the same as HBASE-13262. Rather, these issues are caused by 
 errors in the small scanner logic (I will explain in more depth below). 
 Furthermore, I do know that the solution from HBASE-13262 has not made its 
 way into small scanners (it is being addressed in HBASE-13335). As a result I 
 made sure to test these issues with the patch from HBASE-13335 applied and I 
 saw that they were still present.
 The following two issues have been observed (both lead to data loss):
 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, 
 and a maxResultSize limit that is reached before the region is exhausted, 
 integer overflow will occur. This eventually leads to a preemptive skip of 
 the regions.
 2. When a small scan is configured with a maxResultSize that is smaller than 
 the size of a single row, the small scanner will jump between regions 
 preemptively. This issue seems to be because small scanners assume that, 
 unless a region is exhausted, at least 2 rows will be returned from the 
 server. This assumption isn't clearly stated in the small scanners but is 
 implied through the use of {{skipRowOfFirstResult}}.
 Again, I would like to stress that the root cause of these issues is *NOT* 
 related to the cause of HBASE-13262. These issues occur because of 
 inappropriate assumptions made in the small scanner logic. The inappropriate 
 assumptions are:
 1. Integer overflow will not occur when incrementing caching
 2. At least 2 rows will be returned from the server unless the region has 
 been exhausted
 I am attaching a patch that contains tests to display these issues. If these 
 issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13335) Update ClientSmallScanner and ClientSmallReversedScanner

2015-03-31 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389395#comment-14389395
 ] 

Jonathan Lawlor commented on HBASE-13335:
-

+1, these changes look good to me. Those tests look great.

It may also be a good idea to extend TestSizeFailures so that it also tests to 
ensure that all data is seen when the scan is small (e.g. perform that same 
scan near the end but configure it with Scan.setSmall(true)). Even though 
that wouldn't be a small scan, it would test to make sure the fix behaves as 
expected.

 Update ClientSmallScanner and ClientSmallReversedScanner
 

 Key: HBASE-13335
 URL: https://issues.apache.org/jira/browse/HBASE-13335
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13335-0.98-v1.patch, 
 HBASE-13335-branch-1-v1.patch, HBASE-13335-v1.patch, HBASE-13335.patch


 Some follow-on work for HBASE-13262:
 it's unlikely that clients using the small scanners would get enough data to 
 run into the initial bug, but the scanner implementations should still adhere 
 to the moreResultsInRegion flag when the server sends it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13335) Update ClientSmallScanner and ClientSmallReversedScanner

2015-03-31 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389419#comment-14389419
 ] 

Jonathan Lawlor commented on HBASE-13335:
-

Sounds good to me

 Update ClientSmallScanner and ClientSmallReversedScanner
 

 Key: HBASE-13335
 URL: https://issues.apache.org/jira/browse/HBASE-13335
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13

 Attachments: HBASE-13335-0.98-v1.patch, 
 HBASE-13335-branch-1-v1.patch, HBASE-13335-v1.patch, HBASE-13335.patch


 Some follow-on work for HBASE-13262:
 it's unlikely that clients using the small scanners would get enough data to 
 run into the initial bug, but the scanner implementations should still adhere 
 to the moreResultsInRegion flag when the server sends it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13362) set max result size from client only (like caching)?

2015-03-30 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386983#comment-14386983
 ] 

Jonathan Lawlor commented on HBASE-13362:
-

This sounds like a great idea. As [~lhofhansl] pointed out with the test in 
HBASE-13297, it is easy for the client and server to have different configurations 
for the default max result size. Prior to HBASE-13262 this would have meant 
data loss. With HBASE-13262 we no longer have data loss, it's just ugly. 

If instead it was dealt with in the same manner as caching (e.g. the caching 
value must be carried in the ScanRequest to the server) it would be much 
cleaner. The 2mb default sounds good. 

Probably obvious, but just to be clear, we would still need to support 
instances where the client uses a negative maxResultSize to indicate that the 
response should not be limited by the result size (i.e. negative maxResultSize 
is equivalent to maxResultSize = Long.MAX_VALUE).
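That normalization amounts to a one-line check on the server side. A minimal sketch (names are illustrative, not actual HBase identifiers):

```java
public class MaxResultSize {
    // Illustrative default matching the 2mb suggestion above; not an actual constant name.
    static final long DEFAULT_MAX_RESULT_SIZE = 2L * 1024 * 1024;

    /** A negative client-sent maxResultSize means "no size limit". */
    static long effective(long requested) {
        return requested < 0 ? Long.MAX_VALUE : requested;
    }

    public static void main(String[] args) {
        System.out.println(effective(-1));                      // prints 9223372036854775807
        System.out.println(effective(DEFAULT_MAX_RESULT_SIZE)); // prints 2097152
    }
}
```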

If backported prior to branch-1, it would be nice to accompany this change with 
a change in the default caching value (from the current default of 100 to 
Integer.MAX_VALUE) so that the size limit is reached by default, rather than 
the caching/row limit (I say prior to branch-1 because the defaults of 
caching/maxResultSize in branch-1+ will already produce this behavior). 
Granted, this accompanying change would probably be dealt with best in a 
separate JIRA.

 set max result size from client only (like caching)?
 

 Key: HBASE-13362
 URL: https://issues.apache.org/jira/browse/HBASE-13362
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl

 With the recent problems we've been seeing client/server result size 
 mismatch, I was thinking: Why was this not a problem with scanner caching?
 There are two reasons:
 # number of rows is easy to calculate (and we did it correctly)
 # caching is only controlled from the client, never set on the server alone
 We did fix both #1 and #2 in HBASE-13262.
 Still, I'd like to discuss the following:
 * default the client sent max result size to 2mb
 * remove any server only result sizing
 * continue to use hbase.client.scanner.max.result.size but enforce it via the 
 client only (as the name implies anyway).
 Comments? Concerns?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME

2015-03-30 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-11544:

Attachment: HBASE-11544-addendum-v2.patch

Attaching a rebased version of the patch since recent changes on master 
prevented a clean apply. Anyone have any thoughts on how ScannerContext fits 
into the scanner RPC workflow? Questions, ideas for improvement, alternative 
approaches?

 [Ergonomics] hbase.client.scanner.caching is dogged and will try to return 
 batch even if it means OOME
 --

 Key: HBASE-11544
 URL: https://issues.apache.org/jira/browse/HBASE-11544
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Jonathan Lawlor
Priority: Critical
 Fix For: 2.0.0, 1.1.0

 Attachments: Allocation_Hot_Spots.html, 
 HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, 
 HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, 
 HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, 
 HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, 
 HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, 
 HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, gc.j.png, h.png, 
 hits.j.png, m.png, mean.png, net.j.png, q (2).png


 Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has 
 large cells.  I kept OOME'ing.
 Serverside, we should measure how much we've accumulated and return to the 
 client whatever we've gathered once we pass out a certain size threshold 
 rather than keep accumulating till we OOME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME

2015-03-27 Thread Jonathan Lawlor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-11544:

Attachment: HBASE-11544-addendum-v1.patch

Work in progress update:

I've been working on an addendum that includes the ScannerContext changes 
described above and would like to get some feedback. I am attaching the patch 
but I would like to highlight the following point given the discussion above:
* The intention was to modify the RegionScanner/InternalScanner interfaces as 
[~stack] and I described above. Specifically, I wanted to have the following 
signatures in RegionScanner (and equivalent ones in InternalScanner):
{code}
ScannerContext nextRaw(List<Cell> result) throws IOException;
ScannerContext nextRaw(List<Cell> result, ScannerContext scannerContext) throws IOException;
{code}
** As far as I can tell, the proposed interface change has two problems: In the 
event that the first method is called, a ScannerContext object would need to be 
created (object creations are what we want to avoid). Also, if the second 
method is called, we are simply returning the same object that the caller 
passed in, so the return value is redundant
** Instead I made NextState an enum and I return that. A NextState enum was 
used instead of the previous boolean return type because it allows the caller 
to determine when a partial result has been formed. An argument could be made 
that the return type should be boolean and we should just put NextState inside 
the context, but I didn't do that because it would make the code messier (would 
have to call scannerContext.setState() before every return statement and opens 
up the potential to miss setting the state when really we just want to return 
it). 
* This way ScannerContext simply holds the limits and tracks the progress 
towards those limits

So with this patch what we get is:
The good:
* One object creation per session/rpc instead of potentially millions in the 
case of large batch scans
* Much more explicit state information is returned from 
RegionScanner/InternalScanner

The bad:
* An object is being passed around between scanners whereas we had a primitive 
per limit before.
** However, note that the drawback of having a primitive per limit is that it 
does not tell us about the progress that has been made towards those limits and 
thus any progress must be recalculated by the caller
* The RegionScanner interface is changed from Stable to Evolving due to the 
changes necessary in the interface (this change was noted over in HBASE-13306 
but given that we are filing an addendum it makes more sense to address it 
here).

As this is a work in progress the docs could still use a little love, but at 
the very least this patch lets us see the way that Scan RPC's would look server 
side in the event that ScannerContext is introduced.
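The shape being described can be condensed into a few dozen lines. The following is a stripped-down sketch of the idea, not the actual HBase API: one context object per RPC holds the limits and accumulates progress across next() calls, and the scanner returns a NextState telling the caller why it stopped. All names and fields here are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ScannerContextSketch {
    /** Why the scanner stopped; illustrative, not the actual HBase enum. */
    enum NextState { BATCH_LIMIT_REACHED, SIZE_LIMIT_REACHED, NO_MORE_VALUES }

    /** One instance per RPC: holds the limits and tracks progress toward them. */
    static final class ScannerContext {
        final int batchLimit;   // max cells accumulated
        final long sizeLimit;   // max accumulated bytes per RPC
        int batchProgress;
        long sizeProgress;

        ScannerContext(int batchLimit, long sizeLimit) {
            this.batchLimit = batchLimit;
            this.sizeLimit = sizeLimit;
        }

        void increment(int cells, long bytes) {
            batchProgress += cells;
            sizeProgress += bytes;
        }

        boolean batchLimitReached() { return batchProgress >= batchLimit; }
        boolean sizeLimitReached()  { return sizeProgress >= sizeLimit; }
    }

    /** A scanner consuming cell sizes until a limit trips or the data runs out. */
    static NextState next(Deque<Long> cellSizes, ScannerContext ctx) {
        while (!cellSizes.isEmpty()) {
            ctx.increment(1, cellSizes.poll());
            if (ctx.batchLimitReached()) return NextState.BATCH_LIMIT_REACHED;
            if (ctx.sizeLimitReached())  return NextState.SIZE_LIMIT_REACHED;
        }
        return NextState.NO_MORE_VALUES;
    }

    public static void main(String[] args) {
        // One context reused across the whole RPC, instead of one object per call.
        ScannerContext ctx = new ScannerContext(10, 5);
        Deque<Long> cells = new ArrayDeque<>(Arrays.asList(3L, 3L, 3L));
        System.out.println(next(cells, ctx)); // size limit (5 bytes) trips first
    }
}
```

Because the same context carries the accumulated progress, the caller never has to recompute how close it is to a limit, which is exactly the drawback of the primitive-per-limit approach noted above.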

 [Ergonomics] hbase.client.scanner.caching is dogged and will try to return 
 batch even if it means OOME
 --

 Key: HBASE-11544
 URL: https://issues.apache.org/jira/browse/HBASE-11544
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Jonathan Lawlor
Priority: Critical
 Fix For: 2.0.0, 1.1.0

 Attachments: Allocation_Hot_Spots.html, 
 HBASE-11544-addendum-v1.patch, HBASE-11544-branch_1_0-v1.patch, 
 HBASE-11544-branch_1_0-v2.patch, HBASE-11544-v1.patch, HBASE-11544-v2.patch, 
 HBASE-11544-v3.patch, HBASE-11544-v4.patch, HBASE-11544-v5.patch, 
 HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, 
 HBASE-11544-v7.patch, HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, 
 gc.j.png, h.png, hits.j.png, m.png, mean.png, net.j.png, q (2).png


 Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has 
 large cells.  I kept OOME'ing.
 Serverside, we should measure how much we've accumulated and return to the 
 client whatever we've gathered once we pass out a certain size threshold 
 rather than keep accumulating till we OOME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   >