[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027874#comment-16027874 ] Jonathan Lawlor commented on HBASE-11544: - [~karanmehta93] Only the last result should be a partial. Repeated partials are likely a bug. Please file an issue. > [Ergonomics] hbase.client.scanner.caching is dogged and will try to return > batch even if it means OOME > -- > > Key: HBASE-11544 > URL: https://issues.apache.org/jira/browse/HBASE-11544 > Project: HBase > Issue Type: Bug > Reporter: stack > Assignee: Jonathan Lawlor > Priority: Critical > Fix For: 2.0.0, 1.1.0 > > Attachments: Allocation_Hot_Spots.html, gc.j.png, > HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, > HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, > HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, > HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, > HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, > HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, hits.j.png, h.png, > mean.png, m.png, net.j.png, q (2).png > > > Running some tests, I set hbase.client.scanner.caching=1000. Dataset has > large cells. I kept OOME'ing. > Serverside, we should measure how much we've accumulated and return to the > client whatever we've gathered once we pass out a certain size threshold > rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
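For context, a minimal sketch of how a client might consume the chunked (partial) Results that HBASE-11544 introduced, assuming the 1.1-era public client API; the Connection, table name, and printing are illustrative only:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialResultSketch {
  // With allowPartialResults, a single large row may arrive as several
  // Results; per the comment above, only the last piece of a given row
  // should report isPartial() == false.
  static void scanWithPartials(Connection conn, TableName name) throws IOException {
    Scan scan = new Scan();
    scan.setAllowPartialResults(true);
    try (Table table = conn.getTable(name);
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()) + " partial=" + r.isPartial());
      }
    }
  }
}
{code}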
[jira] [Updated] (HBASE-13541) Deprecate Scan caching in 2.0.0
[ https://issues.apache.org/jira/browse/HBASE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13541: Attachment: HBASE-13541-WIP.patch Here's an early WIP patch before it gets much uglier. Since caching has been a core concept of Scans for so long, it has quite a broad range of usages throughout the codebase. The intention, as stated in the description, was to completely strip out all the usages of caching and deprecate the API. However, it looks like this may not be the way to go. It certainly seems like in particular instances it can be useful to have control over how many Results get transferred per RPC. In particular, such control is useful when: - The user knows ahead of time they will only require X rows - The user intends to use caching as a paging mechanism. They want X rows now, they will do some work, and come back for another X rows. If both of these workflows could be replicated without caching, it wouldn't be a problem. However, paging filters cannot accurately reproduce this exact behavior. This is because filters do not carry state when scanning multiple regions, and because filters have no way of forcing a response back to the client other than saying that all other rows will be filtered out (which is not what we want). Thus, it seemed better to repurpose caching as a row limit concept, as we initially wanted to in HBASE-13442 (we have come full circle...). Of course, alternative naming is up for debate; we want it to be as clear and true to what is occurring as possible. What still needs to be done? More grooming through the usages of the caching API as well as references to caching in general (in variable names, method names, javadoc, etc.). Also, auto-generated models such as the protobuf models of Scan and ScanMessage, as well as the Thrift model TScan, need to be repurposed to use the new terminology. Deprecate Scan caching in 2.0.0 --- Key: HBASE-13541 URL: https://issues.apache.org/jira/browse/HBASE-13541 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Attachments: HBASE-13541-WIP.patch The public Scan API exposes caching to the application. Caching deals with the number of rows that are transferred per scan RPC request issued to the server. It does not seem like a detail that users of a scan should control and introduces some unneeded complication. Seems more like a detail that should be controlled from the server based on the current scan request RPC load. This issue proposes that we deprecate the caching API in 2.0.0 so that it can be removed later. Of course, if there are any concerns please raise them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
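To make the paging workflow above concrete, a minimal sketch using the existing (pre-deprecation) setCaching API; the Connection, table name, and handlePage callback are illustrative assumptions:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class PagingSketch {
  // Fetch pageSize rows, do some work, come back for the next page.
  static void pageThrough(Connection conn, TableName name, int pageSize)
      throws IOException {
    Scan scan = new Scan();
    scan.setCaching(pageSize); // ask the server for ~pageSize rows per scan RPC
    try (Table table = conn.getTable(name);
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result[] page = scanner.next(pageSize); page.length > 0;
           page = scanner.next(pageSize)) {
        handlePage(page); // hypothetical application work between pages
      }
    }
  }

  static void handlePage(Result[] page) { /* application-specific */ }
}
{code}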
[jira] [Commented] (HBASE-13333) Renew Scanner Lease without advancing the RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523776#comment-14523776 ] Jonathan Lawlor commented on HBASE-13333: - Change LGTM, don't see any reason why it would conflict with any of the recent scanner fixes. Sounds like a nice feature! Renew Scanner Lease without advancing the RegionScanner --- Key: HBASE-13333 URL: https://issues.apache.org/jira/browse/HBASE-13333 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 0.98.13, 1.0.2, 1.1.1 Attachments: 13333-0.98.txt We have a use case (for Phoenix) where we want to let the server know that the client is still around. Like a client-side heartbeat. Doing a full heartbeat is complicated, but we could add the ability to make a scanner call with caching set to 0. The server already does the right thing (it renews the lease, but does not advance the scanner). It looks like the client (ScannerCallable) also does the right thing. We cannot break ResultScanner before HBase 2.0, but we can add a renewLease() method to AbstractClientScanner. Phoenix (or any other caller) can then cast to ClientScanner and call that method to ensure we renew the lease on the server. It would be a simple and fully backwards compatible change. [~giacomotaylor] Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
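A minimal sketch of the proposed usage, assuming renewLease() lands on AbstractClientScanner as described; the keepAlive wrapper is hypothetical:

{code:java}
import org.apache.hadoop.hbase.client.AbstractClientScanner;
import org.apache.hadoop.hbase.client.ResultScanner;

public class LeaseKeepAlive {
  // Called periodically while the client is busy elsewhere, so the region
  // server does not expire the scanner's lease.
  static boolean keepAlive(ResultScanner scanner) {
    if (scanner instanceof AbstractClientScanner) {
      // Renews the lease without advancing the RegionScanner.
      return ((AbstractClientScanner) scanner).renewLease();
    }
    return false;
  }
}
{code}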
[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519680#comment-14519680 ] Jonathan Lawlor commented on HBASE-5980: [~anoop.hbase] Good idea, let me add some more tests to see if we are indeed missing some counts. Thanks for taking a look [~anoop.hbase]! Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, HBASE-5980-v2.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Attachment: HBASE-5980-v3.patch Here's an updated patch with some new tests. The tests now stress more filter scenarios as well as different scan configurations (varying values of caching and max result size). Moved around the increment to the number of rows scanned to put it in a better spot (I agree with [~anoop.hbase] that it doesn't belong in nextRow). Now it is incremented within populateResult once we have recognized that there are no more cells in the row. Let's see what QA has to say about this one, looks like the patch causing issues with test failures has been reverted so should be a cleaner run. Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, HBASE-5980-v2.patch, HBASE-5980-v2.patch, HBASE-5980-v3.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Attachment: HBASE-5980-v4.patch Patch to address line length issues from non-generated code. The hanging test (TestRowCounter) passes when run locally, so retry. Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, HBASE-5980-v2.patch, HBASE-5980-v2.patch, HBASE-5980-v3.patch, HBASE-5980-v4.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13597) Add ability for Filters to force response back to client during scans
Jonathan Lawlor created HBASE-13597: --- Summary: Add ability for Filters to force response back to client during scans Key: HBASE-13597 URL: https://issues.apache.org/jira/browse/HBASE-13597 Project: HBase Issue Type: New Feature Reporter: Jonathan Lawlor Currently, the only way for a filter to force a response back to the client during the execution of a scan is via the use of filter#filterAllRemaining(). When this method call returns true, the region server interprets it as meaning that all remaining rows should be filtered out. This also signals to the client that the scanner should close (it's finished...). It would be nice if there was a mechanism that allowed the filter to force a response back to the client without actually terminating the scan. The client would receive the response from the server and could continue the scan from where it left off. I would imagine that such a feature would be used primarily in instances where real-time behavior was a concern. In a sense it would allow filters to implement their own restrictions on the client-server scan protocol. I think this feature can now be supported since we started to send back the moreResultsOnServer flag in the ScanResponse (HBASE-13262) to tell the client that the current region is not exhausted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
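For contrast, a sketch of the only mechanism available today, where forcing a response necessarily terminates the scan; the filter class and its row-counting logic are hypothetical:

{code:java}
import org.apache.hadoop.hbase.filter.FilterBase;

// Hypothetical filter: after 'limit' rows, filterAllRemaining() returns true,
// the server filters out everything else, and the scanner closes. There is
// no way to force a response *and* keep the scan open, which is the gap this
// issue proposes to fill.
public class StopAfterNRowsFilter extends FilterBase {
  private final int limit;
  private int rowsSeen = 0;

  public StopAfterNRowsFilter(int limit) {
    this.limit = limit;
  }

  @Override
  public boolean filterRowKey(byte[] buffer, int offset, int length) {
    rowsSeen++;
    return false; // keep every row until the limit is reached
  }

  @Override
  public boolean filterAllRemaining() {
    return rowsSeen >= limit; // true == terminate the entire scan
  }
}
{code}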
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Attachment: HBASE-5980-branch-1.patch Test failure looks unrelated. That test seems to be causing issues in other precommit builds as well as trunk: e.g. https://builds.apache.org/job/HBase-TRUNK/6429/testReport/ and https://builds.apache.org/job/HBase-TRUNK/6428/testReport/ The checkstyle warning is due to the fact that ServerSideScanMetrics members are public (but this is done to keep it consistent with ScanMetrics). Attaching a branch-1 patch. Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-branch-1.patch, HBASE-5980-v1.patch, HBASE-5980-v2.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Attachment: HBASE-5980-v2.patch Reattaching patch because it looks like precommit build still didn't kick off. Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Status: Open (was: Patch Available) Resubmitting patch since last time precommit build didn't kick off Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Status: Patch Available (was: Open) Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Attachment: HBASE-5980-v2.patch Attaching updated patch Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515056#comment-14515056 ] Jonathan Lawlor commented on HBASE-5980: bq. you need to add this option to the Scan help in the shell else no one will find it. Good point. Added explanation and examples to the scan help in the next patch. bq. I'd say drop the 'GET_'... because redundant I like this proposal. Changed to ALL_METRICS and METRICS in the latest patch. bq. Purge the abstract class? Only used once? Makes sense to me. Purged the abstract class in the latest patch and moved all of its functionality into ServerSideScanMetrics. bq. Should we have ClientScanMetrics and ServerScanMetrics? Hmm, interesting idea. So something like: add a new class ClientScanMetrics and have instances of both ClientScanMetrics and ServerScanMetrics as members of a ScanMetrics instance? Since ScanMetrics is a public-evolving API, I think this change would be okay. The change would be beneficial in a clean-code sense. As a user, I don't think the distinction between client-side vs. server-side would be too important when looking at the metrics (a user probably just cares that a metric exists, not whether it is classified as server-side or client-side). How does that sound, worth doing? Note: In the latest patch I have also changed the interface audience of ServerSideScanMetrics to public-evolving to match the existing interface annotation of ScanMetrics. I did this because it seemed wrong that a public-evolving class inherited from a private class. Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
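For reference, a sketch of how an application might read these counters once the patch lands, assuming the 1.1-era Scan metrics API and the countOfRowsScanned/countOfRowsFiltered fields this issue adds:

{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.metrics.ScanMetrics;

public class ScanMetricsSketch {
  // Enable metrics on the Scan before running it; after the scan has been
  // driven to completion with a ResultScanner, read the counters back.
  static void report(Scan scan) {
    scan.setScanMetricsEnabled(true);
    // ... run the scan with a ResultScanner until it is exhausted ...
    ScanMetrics metrics = scan.getScanMetrics();
    if (metrics != null) {
      long scanned = metrics.countOfRowsScanned.get();   // rows the server visited
      long filtered = metrics.countOfRowsFiltered.get(); // rows filtered out
      System.out.println("scanned=" + scanned + " filtered=" + filtered);
    }
  }
}
{code}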
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Status: Patch Available (was: Reopened) Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch, HBASE-5980-v2.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13575) TestChoreService has to make sure that the opened ChoreService is closed for each unit test
[ https://issues.apache.org/jira/browse/HBASE-13575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514597#comment-14514597 ] Jonathan Lawlor commented on HBASE-13575: - Good idea [~syuanjiang] TestChoreService has to make sure that the opened ChoreService is closed for each unit test --- Key: HBASE-13575 URL: https://issues.apache.org/jira/browse/HBASE-13575 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Priority: Trivial TestChoreService shuts down the opened ChoreService after each individual unit test. This is to avoid test failures caused by an enormous number of active threads at the end of a test on slow virtual hosts (see HBASE-12992). However, the service shutdown was not wrapped in a 'finally' block to guarantee that it executes when an exception is thrown. The fix is trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13552) ChoreService shutdown message could be more informative
[ https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13552: Attachment: HBASE-13552.patch Hope it's okay if I take this one; it caught my eye because of my earlier work in this area. Here's a simple patch that adds a toString to ScheduledChore and uses scheduled chores in the log message rather than the runnables returned from shutdown. Example of the new output format: {noformat} Chore service for: testShutdownCancelsScheduledChores had [[ScheduledChore: Name: sc2 Period: 100 Unit: MILLISECONDS], [ScheduledChore: Name: sc3 Period: 100 Unit: MILLISECONDS], [ScheduledChore: Name: sc1 Period: 100 Unit: MILLISECONDS]] on shutdown {noformat} ChoreService shutdown message could be more informative --- Key: HBASE-13552 URL: https://issues.apache.org/jira/browse/HBASE-13552 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Andrew Purtell Priority: Trivial Attachments: HBASE-13552.patch {noformat} 2015-04-23 18:34:38,163 INFO [M:0;localhost:43244] hbase.ChoreService: Chore service for: localhost,43244,1429833734975 had [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778] on shutdown {noformat} Let's give those tasks human meaningful names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
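An illustrative reconstruction of the kind of toString the patch adds to ScheduledChore, matching the example output above (the accessor names shown are assumptions, not necessarily the exact patch):

{code:java}
// Inside ScheduledChore:
@Override
public String toString() {
  return "[ScheduledChore: Name: " + getName()
      + " Period: " + getPeriod()
      + " Unit: " + getTimeUnit() + "]";
}
{code}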
[jira] [Updated] (HBASE-13552) ChoreService shutdown message could be more informative
[ https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13552: Status: Patch Available (was: Open) ChoreService shutdown message could be more informative --- Key: HBASE-13552 URL: https://issues.apache.org/jira/browse/HBASE-13552 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Andrew Purtell Assignee: Jonathan Lawlor Priority: Trivial Attachments: HBASE-13552.patch {noformat} 2015-04-23 18:34:38,163 INFO [M:0;localhost:43244] hbase.ChoreService: Chore service for: localhost,43244,1429833734975 had [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778] on shutdown {noformat} Let's give those tasks human meaningful names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-13552) ChoreService shutdown message could be more informative
[ https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor reassigned HBASE-13552: --- Assignee: Jonathan Lawlor ChoreService shutdown message could be more informative --- Key: HBASE-13552 URL: https://issues.apache.org/jira/browse/HBASE-13552 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Andrew Purtell Assignee: Jonathan Lawlor Priority: Trivial Attachments: HBASE-13552.patch {noformat} 2015-04-23 18:34:38,163 INFO [M:0;localhost:43244] hbase.ChoreService: Chore service for: localhost,43244,1429833734975 had [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778] on shutdown {noformat} Let's give those tasks human meaningful names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13552) ChoreService shutdown message could be more informative
[ https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511677#comment-14511677 ] Jonathan Lawlor commented on HBASE-13552: - ChoreService was introduced by HBASE-6778 into what was branch-1+ at the time. As such, this change would currently affect branch-1.1, branch-1, and master. The attached patch applied cleanly for me to all branches. ChoreService shutdown message could be more informative --- Key: HBASE-13552 URL: https://issues.apache.org/jira/browse/HBASE-13552 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Andrew Purtell Assignee: Jonathan Lawlor Priority: Trivial Attachments: HBASE-13552.patch {noformat} 2015-04-23 18:34:38,163 INFO [M:0;localhost:43244] hbase.ChoreService: Chore service for: localhost,43244,1429833734975 had [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778] on shutdown {noformat} Let's give those tasks human meaningful names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13552) ChoreService shutdown message could be more informative
[ https://issues.apache.org/jira/browse/HBASE-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13552: Affects Version/s: 1.2.0 1.1.0 ChoreService shutdown message could be more informative --- Key: HBASE-13552 URL: https://issues.apache.org/jira/browse/HBASE-13552 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Andrew Purtell Assignee: Jonathan Lawlor Priority: Trivial Attachments: HBASE-13552.patch {noformat} 2015-04-23 18:34:38,163 INFO [M:0;localhost:43244] hbase.ChoreService: Chore service for: localhost,43244,1429833734975 had [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@420579b4, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@75793a48, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@69e18938, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@55f7f1d6, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@92644b2, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2f6806cf, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@56971859, java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59bfa778] on shutdown {noformat} Let's give those tasks human meaningful names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13441) Scan API improvements
[ https://issues.apache.org/jira/browse/HBASE-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509324#comment-14509324 ] Jonathan Lawlor commented on HBASE-13441: - Thanks [~stack], I think those would be some good methods that could use some reworking/cleanup. I'll file a sub-issue here for each one and also take a look to see if any other parts of the Scan API could be refined. Scan API improvements - Key: HBASE-13441 URL: https://issues.apache.org/jira/browse/HBASE-13441 Project: HBase Issue Type: Umbrella Reporter: Jonathan Lawlor Umbrella task for improvements that could be made to the Scan API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13543) Deprecate Scan maxResultSize in 2.0.0
Jonathan Lawlor created HBASE-13543: --- Summary: Deprecate Scan maxResultSize in 2.0.0 Key: HBASE-13543 URL: https://issues.apache.org/jira/browse/HBASE-13543 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor The Scan API exposes a maxResultSize to the application. The max result size is used to determine how large the chunks sent back per RPC request should be. This seems like a configuration that should not be present in the public API used by the application, but rather a detail that the server should control. In a situation where there are multiple concurrent scans being issued against a single region server, it would seem more appropriate to give the server control over this parameter so that it could be optimized against the current load. This issue proposes that the max result size be deprecated in 2.0.0 so that future optimizations can be made to the way that Scan RPC requests are handled by the server. Of course, if there are any concerns please raise them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13542) Deprecate Scan batch in 2.0.0
Jonathan Lawlor created HBASE-13542: --- Summary: Deprecate Scan batch in 2.0.0 Key: HBASE-13542 URL: https://issues.apache.org/jira/browse/HBASE-13542 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor The public Scan API exposes a batch API to the client. The batch API allows the application to specify a maximum number of cells to be contained per {{Result}}. It seems as though this API was introduced to allow the server to deal with large rows. However, now that RPC chunking has been addressed by HBASE-11544, it seems that this API may no longer be necessary since large rows will now be returned to the client as partials. This issue proposes that we deprecate Scan batch in 2.0.0 since it introduces some unneeded complication into the public API and doesn't seem all that useful any more. If there are any concerns, please raise them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
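For reference, a sketch of the batch API under discussion; with a batch of 5, a row containing 12 cells comes back to the client as three Results (5, 5, and 2 cells) rather than one:

{code:java}
import org.apache.hadoop.hbase.client.Scan;

public class BatchSketch {
  static Scan scanWideRowsInChunks() {
    Scan scan = new Scan();
    scan.setBatch(5); // at most 5 cells per Result
    return scan;
  }
}
{code}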
[jira] [Created] (HBASE-13541) Deprecate Scan caching in 2.0.0
Jonathan Lawlor created HBASE-13541: --- Summary: Deprecate Scan caching in 2.0.0 Key: HBASE-13541 URL: https://issues.apache.org/jira/browse/HBASE-13541 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor The public Scan API exposes caching to the application. Caching deals with the number of rows that are transferred per scan RPC request issued to the server. It does not seem like a detail that users of a scan should control and introduces some unneeded complication. Seems more like a detail that should be controlled from the server based on the current scan request RPC load. This issue proposes that we deprecate the caching API in 2.0.0 so that it can be removed later. Of course, if there are any concerns please raise them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13545) Provide better documentation for the public API Scan#setRowOffsetPerColumnFamily
Jonathan Lawlor created HBASE-13545: --- Summary: Provide better documentation for the public API Scan#setRowOffsetPerColumnFamily Key: HBASE-13545 URL: https://issues.apache.org/jira/browse/HBASE-13545 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Currently, the public API Scan#setRowOffsetPerColumnFamily seems a little odd and misplaced. The API was introduced in HBASE-5104 to handle behavior that could not be sufficiently created through the use of filters. This issue proposes that the documentation around this API be improved to make it clear how and when it should be used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13544) Provide better documentation for the public API Scan#setMaxResultsPerColumnFamily
Jonathan Lawlor created HBASE-13544: --- Summary: Provide better documentation for the public API Scan#setMaxResultsPerColumnFamily Key: HBASE-13544 URL: https://issues.apache.org/jira/browse/HBASE-13544 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Currently, the public API Scan#setMaxResultsPerColumnFamily seems a little odd. The API was introduced in HBASE-5104 to handle behavior that could not be sufficiently created through the use of filters. This issue proposes that the documentation around this API be improved to make it clear how and when it should be used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit
[ https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor resolved HBASE-13442. - Resolution: Duplicate Closing this one in favor of HBASE-13541. Please reopen if there are any outstanding concerns. Rename scanner caching to a more semantically correct term such as row limit Key: HBASE-13442 URL: https://issues.apache.org/jira/browse/HBASE-13442 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Attachments: HBASE-13442-proposal.diff Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans
[ https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13527: Attachment: HBASE-13527-branch-1.0-addendum.patch Attaching an addendum that addresses compilation error in branch-1.0. The default value for hbase.client.scanner.max.result.size is never actually set on Scans - Key: HBASE-13527 URL: https://issues.apache.org/jira/browse/HBASE-13527 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12, 1.2.0 Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0, 0.98.12, 1.0.2 Attachments: HBASE-13527-0.98.patch, HBASE-13527-branch-1.0-addendum.patch, HBASE-13527-v1.patch Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. The reason why this issue occurs is because, by default, a new Scan() initializes Scan.maxResultSize to -1. This initial value of -1 will never be changed unless Scan#setMaxResultSize() is called. In the event that this value is not changed, the Scan that is serialized and sent to the server will also have Scan.maxResultSize = -1. Then, when the server is deciding what size limit should be enforced, it sees that Scan.maxResultSize = -1 so it uses the most relaxed size restriction possible, which is hbase.server.scanner.max.result.size (default value 100 MB). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit
[ https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13442: Attachment: HBASE-13442-proposal.diff Here's a look at what the proposed API change would look like (before it is integrated into the rest of the codebase). In summary: * The current APIs setCaching/getCaching are deprecated in 2.0.0 * The new APIs setRPCRowLimit/getRPCRowLimit take the place of caching * There is no change in the actual behavior (yet... see below), just a change in name. The name change makes it clear what using the API will actually do. Discussion: * Thoughts on new API names? Other recommendations? * Should behavior stay the same? Alternatively, as [~davelatham] suggested, should we instead make it actually limit the number of rows the client returns to the app (once the row limit is reached the scanner would be closed and return null on future calls to scanner.next())? If the alternative is pursued, it would be more appropriate to call it rowLimit rather than rpcRowLimit. * Any other suggestions? Rename scanner caching to a more semantically correct term such as row limit Key: HBASE-13442 URL: https://issues.apache.org/jira/browse/HBASE-13442 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Attachments: HBASE-13442-proposal.diff Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
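An illustrative shape of the proposal only; setRPCRowLimit/getRPCRowLimit are the names floated in this comment and may not match any committed patch:

{code:java}
// Sketch inside Scan: the deprecated old name delegates to the new one.
@Deprecated
public Scan setCaching(int caching) {
  return setRPCRowLimit(caching);
}

public Scan setRPCRowLimit(int rpcRowLimit) {
  this.caching = rpcRowLimit; // same underlying field, clearer name
  return this;
}
{code}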
[jira] [Commented] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit
[ https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505836#comment-14505836 ] Jonathan Lawlor commented on HBASE-13442: - bq. I don't see why an app should care specifically about how many rows the client transfers from the server in each RPC - bytes seem the more relevant currency to tune for performance. Really good point, I can't think of such a scenario either. Certainly we want to return results from the server on the basis of size rather than some arbitrary number of rows (since row size can vary table to table, there isn't a universally good row limit). This is supported by the move to the default configurations of (caching = Integer.MAX_VALUE, maxResultSize = 2 MB). So the best course of action here wouldn't be to rename caching... but to deprecate it so that eventually it can be removed completely in favor of rowLimit. The feature in the protocol that allows the client to ask for a certain number of rows would remain, but only be used for backwards compatibility and for the scenario that the client wants to limit itself to only a certain number of rows. Makes sense to me. With such a change, we would also want to remove any associated configurations for caching/rowlimit in hbase-site.xml and hbase-default.xml. There isn't a scenario (at least that I can think of) where it would be appropriate to limit all scans to a particular number of rows and then close them. The row limit would be like the startRow or stopRow settings on scans, configured on a per-scan basis with no means to set a global default for all scans. Rename scanner caching to a more semantically correct term such as row limit Key: HBASE-13442 URL: https://issues.apache.org/jira/browse/HBASE-13442 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Attachments: HBASE-13442-proposal.diff Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans
[ https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13527: Description: Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. The reason why this issue occurs is because, by default, a new Scan() initializes Scan.maxResultSize to -1. This initial value of -1 will never be changed unless Scan#setMaxResultSize() is called. In the event that this value is not changed, the Scan that is serialized and sent to the server will also have Scan.maxResultSize = -1. Then, when the server is deciding what size limit should be enforced, it sees that Scan.maxResultSize = -1 so it uses the most relaxed size restriction possible, which is hbase.server.scanner.max.result.size (default value 100 MB). was: Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. The default value for hbase.client.scanner.max.result.size is never actually set on Scans - Key: HBASE-13527 URL: https://issues.apache.org/jira/browse/HBASE-13527 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12, 1.2.0 Reporter: Jonathan Lawlor Attachments: HBASE-13527-v1.patch Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. 
Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. The reason why this issue occurs is because, by default, a new Scan() initializes Scan.maxResultSize to -1. This initial value of -1 will never be changed unless Scan#setMaxResultSize() is called. In the event that this value is not changed, the Scan that is serialized and sent to the server will also have Scan.maxResultSize = -1. Then, when the server is deciding what size limit should be enforced, it sees that Scan.maxResultSize = -1 so it uses the most relaxed size restriction possible, which is hbase.server.scanner.max.result.size (default value 100 MB). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans
[ https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13527: Status: Patch Available (was: Open) The default value for hbase.client.scanner.max.result.size is never actually set on Scans - Key: HBASE-13527 URL: https://issues.apache.org/jira/browse/HBASE-13527 Project: HBase Issue Type: Bug Affects Versions: 0.98.12, 1.0.0, 2.0.0, 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Attachments: HBASE-13527-v1.patch Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans
[ https://issues.apache.org/jira/browse/HBASE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13527: Attachment: HBASE-13527-v1.patch Attaching a patch that sets max result size in a manner similar to how caching is set inside HTable.getScanner(). The default value for hbase.client.scanner.max.result.size is never actually set on Scans - Key: HBASE-13527 URL: https://issues.apache.org/jira/browse/HBASE-13527 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12, 1.2.0 Reporter: Jonathan Lawlor Attachments: HBASE-13527-v1.patch Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
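A minimal sketch of the fix described here, by analogy with how caching is defaulted in HTable.getScanner(); the helper method is an assumption, but the HConstants keys are the real configuration names:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Scan;

public class MaxResultSizeDefault {
  // If the application never called setMaxResultSize, fill in the
  // hbase.client.scanner.max.result.size default (2 MB) before the scan
  // is serialized and sent to the server.
  static void applyDefault(Scan scan, Configuration conf) {
    if (scan.getMaxResultSize() <= 0) {
      scan.setMaxResultSize(conf.getLong(
          HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
          HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE));
    }
  }
}
{code}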
[jira] [Created] (HBASE-13527) The default value for hbase.client.scanner.max.result.size is never actually set on Scans
Jonathan Lawlor created HBASE-13527: --- Summary: The default value for hbase.client.scanner.max.result.size is never actually set on Scans Key: HBASE-13527 URL: https://issues.apache.org/jira/browse/HBASE-13527 Project: HBase Issue Type: Bug Affects Versions: 0.98.12, 1.0.0, 2.0.0, 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Now that max result size is driven from the client side like caching (HBASE-13362), we also need to set Scan.maxResultSize to the default value of hbase.client.scanner.max.result.size which is never performed. I think this has gone unnoticed because the server used to read the configuration hbase.client.scanner.max.result.size for itself, but now we expect the serialized Scan sent from the client side to contain this information. Realistically this should have been set on the Scans even before HBASE-13362, it's surprising that it's not as the scanner code seems to indicate otherwise. Ultimately, the end result is that, by default, scan RPC's are limited by hbase.server.scanner.max.result.size (note this is the new server side config not the client side config) which has a default value of 100 MB. The scan RPC's should instead be limited by hbase.client.scanner.max.result.size which has a default value of 2 MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting for
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13514: Summary: Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting for (was: Fix test failures in TestScannerHeartbeatMessages in branch-1.1 and branch-1) Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting for -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13514: Description: The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer in master. (was: The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds.) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer in master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503889#comment-14503889 ] Jonathan Lawlor commented on HBASE-13082: - I'm a little late to the party but this versioned data structure sounds neat. If I'm understanding correctly, it sounds like this versioned data structure would also allow us to remove the lingering lock in updateReaders (and potentially remove updateReaders completely?). Instead of having to update the readers, the compaction/flush would occur in the background and be made visible to new readers via a new latest version in the data structure, is that correct? In other words, would the introduction of this new versioned data structure make StoreScanner single threaded (and thus remove any need for synchronization)? Coarsen StoreScanner locks to RegionScanner --- Key: HBASE-13082 URL: https://issues.apache.org/jira/browse/HBASE-13082 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 13082-v4.txt, 13082.txt, 13082.txt, gc.png, gc.png, gc.png, hits.png, next.png, next.png Continuing where HBASE-10015 left off. We can avoid locking (and memory fencing) inside StoreScanner by deferring to the lock already held by the RegionScanner. In tests this shows quite a scan improvement and reduced CPU (the fences make the cores wait for memory fetches). There are some drawbacks too: * All calls to RegionScanner need to remain synchronized * Implementors of coprocessors need to be diligent in following the locking contract. For example, Phoenix does not lock RegionScanner.nextRaw() as required in the documentation (not picking on Phoenix, this one is my fault as I told them it's OK) * Possible starving of flushes and compactions under heavy read load. RegionScanner operations would keep getting the locks and the flushes/compactions would not be able to finalize the set of files. I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
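A hedged sketch of the locking contract flagged above: with StoreScanner locks coarsened to the RegionScanner, a caller of nextRaw() (a coprocessor, for example) must hold the scanner's lock itself. The scanner acquisition and loop below are illustrative, not taken from the patch:
{code}
RegionScanner scanner = region.getScanner(scan);
List<Cell> results = new ArrayList<>();
synchronized (scanner) {            // the lock nextRaw() defers to
  boolean moreRows;
  do {
    moreRows = scanner.nextRaw(results);
    // ... process the cells in results ...
    results.clear();
  } while (moreRows);
}
scanner.close();
{code}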
[jira] [Assigned] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor reassigned HBASE-13514: --- Assignee: Jonathan Lawlor Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 Attachments: HBASE-13514-branch-1.1.patch, HBASE-13514-branch-1.patch, HBASE-13514.patch The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer in master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13514: Status: Patch Available (was: Open) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 Attachments: HBASE-13514-branch-1.1.patch, HBASE-13514-branch-1.patch, HBASE-13514.patch The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer in master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13514: Attachment: HBASE-13514-branch-1.patch HBASE-13514-branch-1.1.patch HBASE-13514.patch Attaching a patch for each branch to get a QA run on each. The patch addresses the test failure and also adds a deleteTable in test cleanup. [~tedyu] got some time to take a quick looksee? Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 Attachments: HBASE-13514-branch-1.1.patch, HBASE-13514-branch-1.patch, HBASE-13514.patch The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer in master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503574#comment-14503574 ] Jonathan Lawlor commented on HBASE-13090: - Filed HBASE-13514 to address the test failures in branch-1 and branch-1.1 Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0, 1.2.0 Attachments: 13090-branch-1.addendum, HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages in branch-1.1 and branch-1
Jonathan Lawlor created HBASE-13514: --- Summary: Fix test failures in TestScannerHeartbeatMessages in branch-1.1 and branch-1 Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Priority: Minor The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13514: Summary: Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting of hbase.rpc.timeout (was: Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting for ) Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13514: Summary: Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout (was: Fix test failures in TestScannerHeartbeatMessages caused by a too restrictive setting of hbase.rpc.timeout) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0, 1.2.0 The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503512#comment-14503512 ] Jonathan Lawlor commented on HBASE-13090: - [~tedyu] thanks for digging in here. I have done some investigation into the root cause of this issue and it seems to be coming from the field {{MIN_RPC_TIMEOUT}} inside {{RpcRetryingCaller}} in branch-1. This {{MIN_RPC_TIMEOUT}} field in branch-1 prevents setting the RPC timeout value to anything less than 2 seconds. In master this field no longer exists and the timeout value can be specified to be as small as we wish. In the case of TestScannerHeartbeatMessages, the RPC timeout was specified to be 0.5 seconds which is why it fails when it is 2 seconds instead. I will attach a patch shortly to address this issue, thanks! Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0, 1.2.0 Attachments: 13090-branch-1.addendum, HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
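For illustration, a minimal sketch of the clamping behavior described above; only the field name MIN_RPC_TIMEOUT comes from the comment, and the surrounding code is illustrative rather than the actual RpcRetryingCaller source:
{code}
// branch-1 enforces a floor on the effective RPC timeout, so a test that
// configures hbase.rpc.timeout=500 effectively runs with 2000.
private static final int MIN_RPC_TIMEOUT = 2000; // ms

int effectiveTimeout(int configuredRpcTimeout) {
  // values below the floor are silently raised, which is what broke the test
  return Math.max(configuredRpcTimeout, MIN_RPC_TIMEOUT);
}
{code}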
[jira] [Updated] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13090: Release Note: Previously, there was no way to enforce a time limit on scan RPC requests. The server would receive a scan RPC request and take as much time as it needed to accumulate enough results to reach a limit or exhaust the region. The problem with this approach was that, in the case of a very selective scan, the processing of the scan could take too long and cause timeouts client side. With this fix, the server will now enforce a time limit on the execution of scan RPC requests. When a scan RPC request arrives at the server, a time limit is calculated to be half of whichever timeout value is more restrictive between the configurations (hbase.client.scanner.timeout.period and hbase.rpc.timeout). When the time limit is reached, the server will return whatever results it has accumulated up to that point. The results may be empty. To ensure that timeout checks do not occur too often (which would hurt the performance of scans), the configuration hbase.cells.scanned.per.heartbeat.check has been introduced. This configuration controls how often System.currentTimeMillis() is called to update the progress towards the time limit. Currently, the default value of this configuration is 1. Specifying a smaller value will provide a tighter bound on the time limit, but may hurt scan performance due to the higher frequency of calls to System.currentTimeMillis(). Protobuf models for ScanRequest and ScanResponse have been updated so that heartbeat support can be communicated. Support for heartbeat messages is specified in the request sent to the server via ScanRequest.Builder#setClientHandlesHeartbeats. Only when the server sees that ScanRequest#getClientHandlesHeartbeats() is true will it send heartbeat messages back to the client. A response is marked as a heartbeat message via the boolean flag ScanResponse#getHeartbeatMessage. Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
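A hedged sketch of the protobuf-level handshake described in the release note above. The setClientHandlesHeartbeats and getHeartbeatMessage names come from the note itself; the stub wiring and the other request fields are illustrative:
{code}
ScanRequest request = ScanRequest.newBuilder()
    .setScannerId(scannerId)            // an already-open scanner (assumed)
    .setNumberOfRows(caching)
    .setClientHandlesHeartbeats(true)   // advertise heartbeat support
    .build();
ScanResponse response = stub.scan(controller, request);
if (response.getHeartbeatMessage()) {
  // A heartbeat only signals progress: results may be empty, and the client
  // should renew its timeout rather than treat the scan as complete.
}
{code}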
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500349#comment-14500349 ] Jonathan Lawlor commented on HBASE-13090: - [~ndimiduk] I believe the change is solid. Just figured that with the branch-1.1 release so close, it may be a bit 'risky' to stick such a large change in right before release. While the unit tests added do stress the relevant code paths, it would be nice to run it against a workload that was having timeout problems before to prove its worth Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500924#comment-14500924 ] Jonathan Lawlor commented on HBASE-13090: - [~tedyu] Thanks for catching that. Seems HRegionServer no longer throws InterruptedException in master. Addendum lgtm. Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: 13090-branch-1.addendum, HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-5980: --- Attachment: HBASE-5980-v1.patch Attaching a patch that exposes the following server side metrics to the client side ScanMetrics: * Number of rows scanned (metric name is ROWS_SCANNED) * Number of rows filtered (metric name is ROWS_FILTERED) Important notes: * ScanMetrics now contains a mix of both client side and server side metrics * AbstractScanMetrics and ServerSideScanMetrics were added to try and keep the ScanMetrics stuff clean * The following new arguments are now supported in scans from the shell: ** GET_ALL_METRICS: boolean indicating whether or not all scan metrics should be output ** GET_METRICS: array of metric keys the user wants to see (this argument trumps GET_ALL_METRICS) ** Example usages: *** scan 'table', {GET_ALL_METRICS => true} *** scan 'table', {GET_METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']} * Metrics are always output in alphabetical order Discussion points: * I think the name of the metrics and shell arguments could be improved, just chose some easy names to show their usage. Thoughts? * The other metric mentioned above still needs to be added (number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode). Adding new metrics is easy: just specify the new field in ServerSideScanMetrics and add the appropriate tracking calls. I wanted to get some feedback on how these metrics looked first rather than add a bunch of metrics all at once. * All of the metrics [~lhofhansl] mentioned sound great. In terms of coprocessors, what kind of metrics would be valuable to expose to the client? Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Priority: Minor Attachments: HBASE-5980-v1.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
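A hedged sketch of consuming these metrics from Java rather than the shell; this assumes a client version where Scan#setScanMetricsEnabled and Scan#getScanMetrics are available, and the metric names referenced in the comment are the ones the patch introduces:
{code}
Scan scan = new Scan();
scan.setScanMetricsEnabled(true);
try (ResultScanner rs = table.getScanner(scan)) {  // 'table' assumed in scope
  for (Result r : rs) {
    // consume rows; metrics accumulate as scan RPCs complete
  }
}
ScanMetrics metrics = scan.getScanMetrics();
// e.g. the ROWS_SCANNED / ROWS_FILTERED counters exposed by this patch
{code}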
[jira] [Updated] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13090: Attachment: HBASE-13090-v7.patch Updated patch incorporating feedback from reviewboard Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Attachments: HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor reassigned HBASE-5980: -- Assignee: Jonathan Lawlor Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Assignee: Jonathan Lawlor Priority: Minor Attachments: HBASE-5980-v1.patch Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9444) EncodedScannerV2#isSeeked does not behave as described in javadoc
[ https://issues.apache.org/jira/browse/HBASE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-9444: --- Resolution: Fixed Status: Resolved (was: Patch Available) Resolving as fixed as HBASE-9915 has addressed this issue. Please reopen if there are additional concerns not covered by the solution in HBASE-9915. EncodedScannerV2#isSeeked does not behave as described in javadoc - Key: HBASE-9444 URL: https://issues.apache.org/jira/browse/HBASE-9444 Project: HBase Issue Type: Bug Components: HFile Reporter: Chao Shi Priority: Minor Attachments: hbase-9444.patch I hit this when my tool is scanning HFiles using the scanner. I found isSeeked behaves differently depending on whether the HFiles are prefix-encoded or not. There is a test case in my patch that demonstrates the bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered
[ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor reopened HBASE-5980: This one was recently closed due to inactivity but caught my eye because it sounds like a nice one to have. Currently we track some client side metrics during scans such as count of regions scanned, count of RPCs, etc... (full list available in ScanMetrics class). However, these client side metrics do not include information regarding events that have occurred server side (like how many kv's have been filtered). If we wanted to have these metrics available client side, I believe it could be achieved in the following manner: 1. Define a new class to encapsulate the server side metrics that we wish to access/track client side 2. Define a new protobuf message type for this new metrics class 3. Add the metrics as another field in the ScanResponse 4. Add new fields to ScanMetrics (the class that already exists client side) corresponding to the server side metrics and update these metrics after each RPC response in ScannerCallable (see the sketch after this message). In terms of how to actually track these metrics during Scan RPCs, we can add an instance of this new server side metrics class to the ScannerContext class that was added in HBASE-13421. Then all metric tracking could be performed via ScannerContext#getMetrics()#update... Any thoughts/comments? Scanner responses from RS should include metrics on rows/KVs filtered - Key: HBASE-5980 URL: https://issues.apache.org/jira/browse/HBASE-5980 Project: HBase Issue Type: Improvement Components: Client, metrics, regionserver Affects Versions: 0.95.2 Reporter: Todd Lipcon Priority: Minor Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example: - number of rows filtered by row key alone (filterRowKey()) - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
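A hedged sketch of step 4 of the plan above, as it might look inside ScannerCallable. The hasScanMetrics()/getScanMetrics() accessors on ScanResponse are hypothetical stand-ins for the new protobuf field proposed in step 3, and the counters on ScanMetrics are illustrative new fields, not a committed API:
{code}
void updateServerSideMetrics(ScanResponse response, ScanMetrics scanMetrics) {
  if (response == null || !response.hasScanMetrics()) {
    return; // older server, or server side metrics not requested
  }
  // fold the server side deltas into the existing client side counters
  scanMetrics.countOfRowsScanned.addAndGet(
      response.getScanMetrics().getRowsScanned());
  scanMetrics.countOfRowsFiltered.addAndGet(
      response.getScanMetrics().getRowsFiltered());
}
{code}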
[jira] [Commented] (HBASE-12930) Check single row size not exceed configured max row size across families for Get/Scan
[ https://issues.apache.org/jira/browse/HBASE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492605#comment-14492605 ] Jonathan Lawlor commented on HBASE-12930: - [~cuijianwei] Recently there was a change made (HBASE-13421) to the solution initially conceived in HBASE-11544. The remaining result size limit still exists, but now it is carried within the new class, ScannerContext (see ScannerContext#incrementSizeProgress and ScannerContext#checkSizeLimit(...)). ScannerContext was introduced to allow us to reduce the number of object creations in the scanner hot code paths and also provides a nice encapsulation of limits and limit progress. Please let me know if you have any questions :) Check single row size not exceed configured max row size across families for Get/Scan - Key: HBASE-12930 URL: https://issues.apache.org/jira/browse/HBASE-12930 Project: HBase Issue Type: Improvement Components: Scanners Reporter: cuijianwei Priority: Minor Fix For: 2.0.0 StoreScanner#next will check that 'totalBytesRead' does not exceed the configured 'hbase.table.max.rowsize' for each family. However, if there are several families, a single row can still reach an unexpectedly big size even if the 'totalBytesRead' of each family does not exceed 'hbase.table.max.rowsize'. This may cause the region server to fail because of OOM. What about checking single row size across families in StoreScanner#next(List<Cell>, int)? {code}
long totalBytesRead = 0;
// compute the size of cells that have been read
for (Cell cell : outResult) {
  totalBytesRead += CellUtil.estimatedSerializedSizeOf(cell);
}
LOOP: while ((cell = this.heap.peek()) != null) {
  ...
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
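A hedged sketch of how a server-side scanner loop interacts with the ScannerContext limits mentioned in the comment above; the incrementSizeProgress/checkSizeLimit names come from the comment, while the loop and the LimitScope value are illustrative:
{code}
while (kvHeap.peek() != null) {
  Cell cell = kvHeap.next();
  outResult.add(cell);
  scannerContext.incrementSizeProgress(CellUtil.estimatedSerializedSizeOf(cell));
  if (scannerContext.checkSizeLimit(LimitScope.BETWEEN_CELLS)) {
    return true; // hand back what has accumulated instead of growing unbounded
  }
}
{code}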
[jira] [Created] (HBASE-13441) Scan API improvements
Jonathan Lawlor created HBASE-13441: --- Summary: Scan API improvements Key: HBASE-13441 URL: https://issues.apache.org/jira/browse/HBASE-13441 Project: HBase Issue Type: Umbrella Reporter: Jonathan Lawlor Umbrella task for improvements that could be made to the Scan API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit
Jonathan Lawlor created HBASE-13442: --- Summary: Rename scanner caching to a more semantically correct term such as row limit Key: HBASE-13442 URL: https://issues.apache.org/jira/browse/HBASE-13442 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Caching acts more as a limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
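A hedged illustration of the defaults described above, using the existing Scan API; the values in the comments are the branch-1+ defaults quoted in the description:
{code}
// With branch-1+ defaults, RPC chunking is governed by maxResultSize,
// so caching now only matters as an effective row limit.
Scan scan = new Scan();
// implicit defaults per the description:
//   caching = Integer.MAX_VALUE, maxResultSize = 2 MB
scan.setCaching(100); // only worth setting when ~100 rows is all you need
{code}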
[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488140#comment-14488140 ] Jonathan Lawlor commented on HBASE-11544: - [~lhofhansl] good point. I have filed HBASE-13441 as an umbrella issue for discussion regarding potential improvements to the Scan API. HBASE-13442 deals specifically with the rename to rowLimit. [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME -- Key: HBASE-11544 URL: https://issues.apache.org/jira/browse/HBASE-11544 Project: HBase Issue Type: Bug Reporter: stack Assignee: Jonathan Lawlor Priority: Critical Fix For: 2.0.0, 1.1.0 Attachments: Allocation_Hot_Spots.html, HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, gc.j.png, h.png, hits.j.png, m.png, mean.png, net.j.png, q (2).png Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing. Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass out a certain size threshold rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit
[ https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13442: Description: Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit. (was: Caching acts more as a limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit.) Rename scanner caching to a more semantically correct term such as row limit Key: HBASE-13442 URL: https://issues.apache.org/jira/browse/HBASE-13442 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Attachment: HBASE-13421-branch-1.patch Attaching the branch-1 patch Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths -- Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13421-branch-1.patch, HBASE-13421-v1.patch, HBASE-13421-v2.patch, HBASE-13421-v3.patch HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and InternalScanner#next to allow state information to be passed back from a scanner (it was formerly a boolean indicating whether or not more values existed). The change in this return type led to an increased amount of objects being created... In the case that a scan spanned millions of rows, there was the potential for millions of objects to be created. This issue looks to reduce the large amount of object creations from potentially many to at most one per RPC request. Please see the tail of the parent issue for relevant discussion on the design decisions related to this solution. This sub-task has been filed as it seems more appropriate to address the fix here rather than as an addendum to the parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485613#comment-14485613 ] Jonathan Lawlor commented on HBASE-13421: - Whoops, already had reviewboard link... please ignore noise Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths -- Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13421-branch-1.patch, HBASE-13421-v1.patch, HBASE-13421-v2.patch, HBASE-13421-v3.patch HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and InternalScanner#next to allow state information to be passed back from a scanner (it was formerly a boolean indicating whether or not more values existed). The change in this return type led to an increased amount of objects being created... In the case that a scan spanned millions of rows, there was the potential for millions of objects to be created. This issue looks to reduce the large amount of object creations from potentially many to at most one per RPC request. Please see the tail of the parent issue for relevant discussion on the design decisions related to this solution. This sub-task has been filed as it seems more appropriate to address the fix here rather than as an addendum to the parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13215) A limit on the raw key values is needed for each next call of a scanner
[ https://issues.apache.org/jira/browse/HBASE-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485795#comment-14485795 ] Jonathan Lawlor commented on HBASE-13215: - Hey [~heliangliang], have you had any time to work on this lately; any updates? A limit on the raw key values is needed for each next call of a scanner --- Key: HBASE-13215 URL: https://issues.apache.org/jira/browse/HBASE-13215 Project: HBase Issue Type: Improvement Components: Scanners Reporter: He Liangliang Assignee: He Liangliang In the current scanner next, there are several limits: caching, batch and size. But there is no limit on raw data scanned, so the time consumed by a next call is unbounded. For example, many consecutive deleted or filtered out cells will lead to a socket timeout. This can make user code get stuck. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12266) Slow Scan can cause dead loop in ClientScanner
[ https://issues.apache.org/jira/browse/HBASE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485825#comment-14485825 ] Jonathan Lawlor commented on HBASE-12266: - Sounds like this issue is related to the heartbeat/keepalive idea in HBASE-13090. The idea over there is to track how long a scan has been executing server side and to return periodic heartbeat/keepalive messages in the event that the scan is taking a long time. The frequency of these heartbeat messages would be dependent upon the configured scanner timeout (a more restrictive timeout would lead to more frequent heartbeat messages). This solution would address the issue of a slow scan and would also remove the possibility of this dead loop. Thoughts? Think we could close this one out? Slow Scan can cause dead loop in ClientScanner --- Key: HBASE-12266 URL: https://issues.apache.org/jira/browse/HBASE-12266 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.96.0 Reporter: Qiang Tian Priority: Minor Attachments: 12266-v2.txt, HBASE-12266-master.patch see http://search-hadoop.com/m/DHED45SVsC1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-6491) add limit function at ClientScanner
[ https://issues.apache.org/jira/browse/HBASE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-6491: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) This issue is quite old. Resolving as won't fix for now but please feel free to reopen if the need for the feature is still present. As [~stack] pointed out, this may be unsafe for the client in the case that we are dealing with very large rows. It may be more appropriate to implement such a behavior in the form of a coprocessor rather than making RPC calls for the sake of skipping results client side. add limit function at ClientScanner --- Key: HBASE-6491 URL: https://issues.apache.org/jira/browse/HBASE-6491 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.95.2 Reporter: ronghai.ma Assignee: ronghai.ma Labels: patch Attachments: ClientScanner.java, HBASE-6491.patch Add a new method in ClientScanner to implement a function like LIMIT in MySQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-5978) Scanner next() calls should return after a configurable time threshold regardless of number of accumulated rows
[ https://issues.apache.org/jira/browse/HBASE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor resolved HBASE-5978. Resolution: Duplicate Resolving as duplicate as this issue seems to be the same as HBASE-13090. Scanner next() calls should return after a configurable time threshold regardless of number of accumulated rows --- Key: HBASE-5978 URL: https://issues.apache.org/jira/browse/HBASE-5978 Project: HBase Issue Type: Improvement Components: Client, regionserver Affects Versions: 0.90.7, 0.92.1 Reporter: Todd Lipcon Currently if you pass a very restrictive filter to a scanner, along with a high caching value, you will end up causing RPC timeouts, lease exceptions, etc. Although this is a poor configuration and easy to work around by lowering caching, HBase should be resilient to a badly chosen caching value. As such, the scanner next() call should record the elapsed time, and after some number of seconds have passed, return any accumulated rows regardless of the caching value. This prevents the calls from starving out other threads or region operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Attachment: HBASE-13421-v3.patch Attaching patch to address issues in commit message as well as the checkstyle error (added a getInstance() method to NoLimitScannerContext) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths -- Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13421-v1.patch, HBASE-13421-v2.patch, HBASE-13421-v3.patch HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and InternalScanner#next to allow state information to be passed back from a scanner (it was formerly a boolean indicating whether or not more values existed). The change in this return type led to an increased amount of objects being created... In the case that a scan spanned millions of rows, there was the potential for millions of objects to be created. This issue looks to reduce the large amount of object creations from potentially many to at most one per RPC request. Please see the tail of the parent issue for relevant discussion on the design decisions related to this solution. This sub-task has been filed as it seems more appropriate to address the fix here rather than as an addendum to the parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485517#comment-14485517 ] Jonathan Lawlor commented on HBASE-13421: - Making the branch-1 patch now and will attach once conflicts are resolved Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths -- Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13421-v1.patch, HBASE-13421-v2.patch, HBASE-13421-v3.patch HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and InternalScanner#next to allow state information to be passed back from a scanner (it was formerly a boolean indicating whether or not more values existed). The change in this return type led to an increased amount of objects being created... In the case that a scan spanned millions of rows, there was the potential for millions of objects to be created. This issue looks to reduce the large amount of object creations from potentially many to at most one per RPC request. Please see the tail of the parent issue for relevant discussion on the design decisions related to this solution. This sub-task has been filed as it seems more appropriate to address the fix here rather than as an addendum to the parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485919#comment-14485919 ] Jonathan Lawlor commented on HBASE-13421: - Looks like the build was green but the comment couldn't be made because of a login error with hadoopQA: https://builds.apache.org/job/PreCommit-HBASE-Build/13621/consoleFull Build Result: {quote} {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723952/HBASE-13421-v3.patch against master branch at commit 8cd3001f817915df20a4d209c450ac9b69b915d7. ATTACHMENT ID: 12723952 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 148 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13621//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13621//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13621//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13621//console {quote} Error: {quote} == == Adding comment to Jira. == == Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa. Cause: com.atlassian.jira.rpc.exception.RemoteAuthenticationException: Attempt to log in user 'hadoopqa' failed. The maximum number of failed login attempts has been reached. Please log into the application through the web interface to reset the number of failed login attempts. {quote} Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths -- Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13421-branch-1.patch, HBASE-13421-v1.patch, HBASE-13421-v2.patch, HBASE-13421-v3.patch HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and InternalScanner#next to allow state information to be passed back from a scanner (it was formerly a boolean indicating whether or not more values existed). The change in this return type led to an increased amount of objects being created... In the case that a scan spanned millions of rows, there was the potential for millions of objects to be created. 
This issue looks to reduce the large amount of object creations from potentially many to at most one per RPC request. Please see the tail of the parent issue for relevant discussion on the design decisions related to this solution. This sub-task has been filed as it seems more appropriate to address the fix here rather than as an addendum to the parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-7026) Make metrics collection in StoreScanner.java more efficient
[ https://issues.apache.org/jira/browse/HBASE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor resolved HBASE-7026. Resolution: Fixed Marking this old one as fixed. I do not see these metrics being recorded inside StoreScanner anymore and thus potential performance regressions seem to have been addressed. Make metrics collection in StoreScanner.java more efficient --- Key: HBASE-7026 URL: https://issues.apache.org/jira/browse/HBASE-7026 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Per the benchmarks I ran, the following block of code in StoreScanner.java seems to be inefficient: {code}
public synchronized boolean next(List<KeyValue> outResult, int limit, String metric) throws IOException {
  // ...
  // update the counter
  if (addedResultsSize > 0 && metric != null) {
    HRegion.incrNumericMetric(this.metricNamePrefix + metric, addedResultsSize);
  }
  // ...
{code} Removing this block increased throughput by 10%. We should move this to the outer layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13262) ResultScanner doesn't return all rows in Scan
[ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486431#comment-14486431 ] Jonathan Lawlor commented on HBASE-13262: - [~davelatham] so with the workaround of setting hbase.client.scanner.max.result.size=1000, you receive a batch of 30 rows back from the server and then the scan terminates without scanning the rest of the region, is that right? Also, do you have HFileV3 turned on (i.e. what is the configured value for hfile.format.version)? When tests were run in 0.98 (details above), this issue wasn't readily reproducible (0.98 uses HFileV2 by default, and this issue was discovered to be a result of using HFileV3). If you do receive the full 30 rows back from the server, and you are *not* using HFileV3, I would be inclined to agree with you and say that this is in fact a different issue. Would you be able to provide any more details about the particular scan configuration? ResultScanner doesn't return all rows in Scan - Key: HBASE-13262 URL: https://issues.apache.org/jira/browse/HBASE-13262 Project: HBase Issue Type: Bug Components: Client Affects Versions: 2.0.0, 1.1.0 Environment: Single node, pseduo-distributed 1.1.0-SNAPSHOT Reporter: Josh Elser Assignee: Josh Elser Priority: Blocker Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12 Attachments: 13262-0.98-testpatch.txt, HBASE-13262-0.98-v7.patch, HBASE-13262-branch-1-v2.patch, HBASE-13262-branch-1-v3.patch, HBASE-13262-branch-1.0-v7.patch, HBASE-13262-branch-1.patch, HBASE-13262-v1.patch, HBASE-13262-v2.patch, HBASE-13262-v3.patch, HBASE-13262-v4.patch, HBASE-13262-v5.patch, HBASE-13262-v6.patch, HBASE-13262-v7.patch, HBASE-13262-v7.patch, HBASE-13262.patch, regionserver-logging.diff, testrun_0.98.txt, testrun_branch1.0.txt Tried to write a simple Java client against 1.1.0-SNAPSHOT. * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), for a total of 10M cells written * Read back the data from the table, ensure I saw 10M cells Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0, returns all 10M records as expected. [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10060) Unsynchronized scanning
[ https://issues.apache.org/jira/browse/HBASE-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485997#comment-14485997 ] Jonathan Lawlor commented on HBASE-10060: - Hey [~lhofhansl] is this one the same as HBASE-13082? Should we mark this one as a duplicate of HBASE-13082? Unsynchronized scanning --- Key: HBASE-10060 URL: https://issues.apache.org/jira/browse/HBASE-10060 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 10060-trunk-v2.txt, 10060-trunk.txt HBASE-10015 has some lengthy discussion. The solution there ended up replacing synchronized with ReentrantLock, which - somewhat surprisingly - yielded a non-trivial improvement for tall tables. The goal should be to avoid locking in StoreScanner at all. StoreScanner is only accessed by a single thread *except* when we have a concurrent flush or a compaction, which is rare (we'd acquire and release the lock millions of times per second, and compact/flush a few times an hour at the most). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-5607) Implement scanner caching throttling to prevent too big responses
[ https://issues.apache.org/jira/browse/HBASE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor resolved HBASE-5607. Resolution: Fixed Resolving this old one as fixed. With HBASE-2214 the throttling mechanism is available to the client to achieve the described behavior via Scan.setMaxResultSize. Furthermore, with HBASE-12976 in branch-1+, the default value of max result size is set to a reasonable value that will prevent a client from unintentionally causing issues on the region server. Feel free to reopen if you feel there are additional concerns that I have missed. Implement scanner caching throttling to prevent too big responses -- Key: HBASE-5607 URL: https://issues.apache.org/jira/browse/HBASE-5607 Project: HBase Issue Type: Improvement Reporter: Ferdy Galema When a misconfigured client retrieves fat rows with a scanner caching value set too high, there is a big chance the regionserver cannot handle the response buffers. (See log example below). Also see the mailing list thread: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24819 This issue is for tracking a solution that throttles the scanner caching value in the case the response buffers are too big. A few possible solutions: a) If a response is (repeatedly) over 100MB (configurable), reduce the scanner caching to half its size. (In either server or client.) b) Introduce a property that defines a fixed (target) response size, instead of defining the number of rows to cache. 2012-03-20 07:57:40,092 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020, responseTooLarge for: next(4438820558358059204, 1000) from 172.23.122.15:50218: Size: 105.0m 2012-03-20 07:57:53,226 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, responseTooLarge for: next(-7429189123174849941, 1000) from 172.23.122.15:50218: Size: 214.4m 2012-03-20 07:57:57,839 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020, responseTooLarge for: next(-7429189123174849941, 1000) from 172.23.122.15:50218: Size: 103.2m 2012-03-20 07:57:59,442 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020, responseTooLarge for: next(-7429189123174849941, 1000) from 172.23.122.15:50218: Size: 101.8m 2012-03-20 07:58:20,025 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020, responseTooLarge for: next(9033159548564260857, 1000) from 172.23.122.15:50218: Size: 107.2m 2012-03-20 07:58:27,273 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, responseTooLarge for: next(9033159548564260857, 1000) from 172.23.122.15:50218: Size: 100.1m 2012-03-20 07:58:52,783 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020, responseTooLarge for: next(-8611621895979000997, 1000) from 172.23.122.15:50218: Size: 101.7m 2012-03-20 07:59:02,541 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020, responseTooLarge for: next(-511305750191148153, 1000) from 172.23.122.15:50218: Size: 120.9m 2012-03-20 07:59:25,346 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020, responseTooLarge for: next(1570572538285935733, 1000) from 172.23.122.15:50218: Size: 107.8m 2012-03-20 07:59:46,805 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, responseTooLarge for: next(-727080724379055435, 1000) from 172.23.122.15:50218: Size: 102.7m 2012-03-20 08:00:00,138 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, responseTooLarge for: next(-3701270248575643714, 1000) from 
172.23.122.15:50218: Size: 122.1m 2012-03-20 08:00:21,232 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020, responseTooLarge for: next(5831907615409186602, 1000) from 172.23.122.15:50218: Size: 157.5m 2012-03-20 08:00:23,199 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020, responseTooLarge for: next(5831907615409186602, 1000) from 172.23.122.15:50218: Size: 160.7m 2012-03-20 08:00:28,174 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020, responseTooLarge for: next(5831907615409186602, 1000) from 172.23.122.15:50218: Size: 160.8m 2012-03-20 08:00:32,643 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020, responseTooLarge for: next(5831907615409186602, 1000) from 172.23.122.15:50218: Size: 182.4m 2012-03-20 08:00:36,826 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020, responseTooLarge for: next(5831907615409186602, 1000) from 172.23.122.15:50218: Size: 237.2m 2012-03-20 08:00:40,850 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, responseTooLarge for: next(5831907615409186602, 1000) from 172.23.122.15:50218:
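For illustration, a minimal sketch of the client-side throttling named in the resolution above, assuming a hypothetical table name and an example 2 MB cap; Scan.setMaxResultSize bounds the size of each scan RPC response so fat rows cannot accumulate into a responseTooLarge reply:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

public class ThrottledScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("my_table"))) { // hypothetical table
      Scan scan = new Scan();
      // Cap each scan RPC response at ~2 MB regardless of how many rows
      // that turns out to be.
      scan.setMaxResultSize(2L * 1024 * 1024);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          // process row
        }
      }
    }
  }
}
{code}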
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Status: Patch Available (was: Open) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths -- Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13421-v1.patch HBASE-11544 made NextState the new return type of RegionScanner#nextRaw and InternalScanner#next to allow state information to be passed back from a scanner (it was formerly a boolean indicating whether or not more values existed). The change in this return type led to an increased number of objects being created: in the case that a scan spanned millions of rows, there was the potential for millions of objects to be created. This issue looks to reduce the number of object creations from potentially many to at most one per RPC request. Please see the tail of the parent issue for relevant discussion on the design decisions related to this solution. This sub-task has been filed as it seems more appropriate to address the fix here rather than as an addendum to the parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
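For illustration of the allocation concern above, a simplified sketch of the per-row state object versus a per-RPC context; these are illustrative stand-ins, not the actual HBase interfaces or the attached patch:
{code}
// Hypothetical names for illustration only.
enum ScanState { MORE_VALUES, NO_MORE_VALUES, SIZE_LIMIT_REACHED }

// Per-row allocation: returning a new state holder from every next() call
// means a scan over millions of rows allocates millions of these.
final class NextState {
  final ScanState state;
  NextState(ScanState state) { this.state = state; }
}

// Per-RPC allocation: one mutable context threaded through the whole
// request is updated in place, so at most one object is created per RPC.
final class ScannerContext {
  private ScanState state = ScanState.MORE_VALUES;
  void setState(ScanState s) { state = s; }
  ScanState getState() { return state; }
}
{code}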
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Attachment: HBASE-13421-v1.patch Reattaching with appropriate name to avoid confusion
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Attachment: (was: HBASE-11544-addendum-v3.patch)
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484343#comment-14484343 ] Jonathan Lawlor commented on HBASE-11544: - Filed sub-task HBASE-13421 to address the fix to reduce the number of objects being created.
[jira] [Created] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
Jonathan Lawlor created HBASE-13421: --- Summary: Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths Key: HBASE-13421 URL: https://issues.apache.org/jira/browse/HBASE-13421 Project: HBase Issue Type: Sub-task Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Attachment: HBASE-11544-addendum-v3.patch Attaching latest patch that incorporates latest feedback from reviewboard. Let's see what QA has to say
[jira] [Updated] (HBASE-13421) Reduce the number of object creations introduced by HBASE-11544 in scan RPC hot code paths
[ https://issues.apache.org/jira/browse/HBASE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13421: Attachment: HBASE-13421-v2.patch Updated patch to address checkstyle and javadoc warnings. The failing test (TestFastFail) passed locally so may have just been flaky, retry.
[jira] [Commented] (HBASE-13362) set max result size from client only (like caching)?
[ https://issues.apache.org/jira/browse/HBASE-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482070#comment-14482070 ] Jonathan Lawlor commented on HBASE-13362: - +1, looks good to me. Question: should we also add an entry for this new configuration to hbase-default.xml? I'm just thinking, as a user, how would I know about this new configuration value and the semantics behind it? set max result size from client only (like caching)? Key: HBASE-13362 URL: https://issues.apache.org/jira/browse/HBASE-13362 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 13362-0.98.txt, 13362-master.txt With the recent problems we've been seeing with client/server result size mismatches, I was thinking: why was this not a problem with scanner caching? There are two reasons: # the number of rows is easy to calculate (and we did it correctly) # caching is only controlled from the client, never set on the server alone We did fix both #1 and #2 in HBASE-13262. Still, I'd like to discuss the following: * default the client-sent max result size to 2mb * remove any server-only result sizing * continue to use hbase.client.scanner.max.result.size but enforce it via the client only (as the name implies anyway). Comments? Concerns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
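For reference, a short sketch of how a client would bound scan response size today; the property key is the one named in this issue, and the 2 MB value just mirrors the default proposed in the thread:
{code}
// Cluster-wide client default via configuration:
Configuration conf = HBaseConfiguration.create();
conf.setLong("hbase.client.scanner.max.result.size", 2L * 1024 * 1024);
// Or per scan:
Scan scan = new Scan();
scan.setMaxResultSize(2L * 1024 * 1024);
{code}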
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Attachment: HBASE-13374-v1.patch Reattaching again now that apache infra is stable, let's get a QA run in Small scanners (with particular configurations) do not return all rows -- Key: HBASE-13374 URL: https://issues.apache.org/jira/browse/HBASE-13374 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13 Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Priority: Blocker Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13 Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, HBASE-13374-v1.patch, HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, small-scanner-data-loss-tests-branch-1.0+.patch I recently ran into a couple of data loss issues with small scans. Similar to HBASE-13262, these issues only appear when scans are configured in such a way that the max result size limit is reached before the caching limit is reached. As far as I can tell, this issue affects branches 0.98+. I should note that, after investigation, it looks like the root cause of these issues is not the same as HBASE-13262. Rather, these issues are caused by errors in the small scanner logic (explained in more depth below). Furthermore, I know that the solution from HBASE-13262 has not made its way into small scanners (it is being addressed in HBASE-13335). As a result I made sure to test these issues with the patch from HBASE-13335 applied and saw that they were still present. The following two issues have been observed (both lead to data loss): 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, and a maxResultSize limit that is reached before the region is exhausted, integer overflow will occur. This eventually leads to a preemptive skip of the regions. 2. When a small scan is configured with a maxResultSize that is smaller than the size of a single row, the small scanner will jump between regions preemptively. This issue seems to be because small scanners assume that, unless a region is exhausted, at least 2 rows will be returned from the server. This assumption isn't clearly stated in the small scanners but is implied through the use of {{skipRowOfFirstResult}}. Again, I would like to stress that the root cause of these issues is *NOT* related to the cause of HBASE-13262. These issues occur because of inappropriate assumptions made in the small scanner logic. The inappropriate assumptions are: 1. Integer overflow will not occur when incrementing caching 2. At least 2 rows will be returned from the server unless the region has been exhausted I am attaching a patch that contains tests to demonstrate these issues. If these issues should be split into separate JIRAs please let me know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
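For illustration, the two scan configurations described above that trigger the data loss, sketched with the standard Scan API (the values are the ones quoted in the report):
{code}
// Issue 1: caching at Integer.MAX_VALUE plus a size limit reached before
// the region is exhausted; the internal cacheNum++ later overflows.
Scan overflowCase = new Scan();
overflowCase.setSmall(true);
overflowCase.setCaching(Integer.MAX_VALUE);
overflowCase.setMaxResultSize(1);

// Issue 2: maxResultSize smaller than a single row; the scanner skips
// between regions preemptively.
Scan tinyLimitCase = new Scan();
tinyLimitCase.setSmall(true);
tinyLimitCase.setCaching(100);
tinyLimitCase.setMaxResultSize(1);
{code}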
[jira] [Commented] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392994#comment-14392994 ] Jonathan Lawlor commented on HBASE-13374: - Ohh I see, that makes sense then, thanks!
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Attachment: HBASE-13374-v1.patch
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392965#comment-14392965 ] Jonathan Lawlor commented on HBASE-13374: - Looks like there is an issue fetching from git. I see this in the console output of the precommit build: {quote} FATAL: Failed to fetch from https://git-wip-us.apache.org/repos/asf/hbase.git {quote}
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Attachment: HBASE-13374-v1.patch Re-attaching the patch as the precommit build didn't start. Let's see what QA thinks about this
[jira] [Commented] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391753#comment-14391753 ] Jonathan Lawlor commented on HBASE-13374: - bq. If lastResult is from previous region and I am in new region, how can I get a row from before? Prior to this change, this else-if block would set the start key to lastResult.getRow() and the first result returned from the server would be skipped. Rather than skipping the first Result returned from the server, it would be better if we could set the start key such that the first Result returned from the server is the Result that follows lastResult. In this case, since we are performing a reversed scan, the start key that should be used is the key of the closest row that could occur before lastResult.getRow(). bq. Where was the int overflow? In the int i count? Integer overflow was occurring on the following line {code} cacheNum++; {code} This increment to caching was performed as part of the skipRowOfFirstResult hack (since the first row is skipped, increment caching by one). bq. How did you find the issues? Especially int overflow one? For issue #1 (integer overflow), I tracked it down after noticing that not all rows were being retrieved with the following scan configuration (caching=Int.MAX, maxResultSize=1, small=true). What ends up happening is that {{cacheNum++}} overflows and we end up sending a negative caching value to the server (this is equivalent to telling the server that no rows should be retrieved). The result is that all RPCs after the overflow will return empty results and the client will think that all regions have been exhausted. For issue #2, I tracked it down after noticing that there were still rows missing when I used the following scan configuration (caching=100, maxResultSize=1, small=true). In this case, what happens is that a single row does not fit into the defined max result size (i.e. we reach the size limit after retrieving only a single row). Thus, we receive only one row back from the first RPC. Then, in the next RPC, the only row returned will be that same row. This is because the small scanner expects that by increasing the caching limit it will be able to skip this row; it doesn't account for the fact that the size limit may be reached before the caching limit. Thus, since only one row is returned, and that row is skipped, the scanner interprets this as meaning that the region is exhausted and skips all of the remaining rows in that region.
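A self-contained demonstration of the overflow mechanics described in the comment above (plain Java, no HBase needed):
{code}
public class CachingOverflowDemo {
  public static void main(String[] args) {
    // Starting from Integer.MAX_VALUE, one more increment wraps to a
    // negative value, which the server then treats as "retrieve no rows".
    int cacheNum = Integer.MAX_VALUE;
    cacheNum++; // wraps to Integer.MIN_VALUE (-2147483648)
    System.out.println(cacheNum < 0
        ? "negative caching value sent to server: " + cacheNum
        : "no overflow");
  }
}
{code}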
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Attachment: HBASE-13374-v1.patch Attaching a patch that fixes both issues. If the start key is set correctly on the small scanner callable then the caching does not need to be incremented (thus avoiding the integer overflow) and skipRowOfFirstResult can be removed.
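A hedged sketch of the start-key idea in the update above, not the code in the attached patch: for a forward scan, the smallest row strictly after a given row appends a 0x00 byte; for a reversed scan, the closest row before a given row trims a trailing 0x00, or decrements the last byte and pads with 0xFF bytes:
{code}
static byte[] closestRowAfter(byte[] row) {
  // row + 0x00 is the smallest possible row strictly after `row`.
  byte[] next = java.util.Arrays.copyOf(row, row.length + 1);
  next[row.length] = 0x00;
  return next;
}

static byte[] closestRowBefore(byte[] row) {
  if (row.length == 0) {
    return row; // the empty start key has no predecessor
  }
  if (row[row.length - 1] == 0) {
    // Dropping a trailing 0x00 yields the immediate predecessor.
    return java.util.Arrays.copyOf(row, row.length - 1);
  }
  // Otherwise decrement the last byte and pad with 0xFF bytes; a real
  // implementation pads "enough" bytes, the 8 here is illustrative.
  byte[] prev = java.util.Arrays.copyOf(row, row.length + 8);
  prev[row.length - 1]--;
  java.util.Arrays.fill(prev, row.length, prev.length, (byte) 0xFF);
  return prev;
}
{code}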
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Status: Patch Available (was: Open)
[jira] [Assigned] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor reassigned HBASE-13374: --- Assignee: Jonathan Lawlor
[jira] [Created] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
Jonathan Lawlor created HBASE-13374: --- Summary: Small scanners (with particular configurations) do not return all rows Key: HBASE-13374 URL: https://issues.apache.org/jira/browse/HBASE-13374 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13 Reporter: Jonathan Lawlor
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Attachment: small-scanner-data-loss-tests-branch-1.0+.patch Attaching a patch that can be applied to branch-1.0+. This patch does not contain a fix; it contains the test cases that allow us to see the failure modes.
[jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows
[ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-13374: Attachment: small-scanner-data-loss-tests-0.98.patch Corresponding patch for 0.98
[jira] [Commented] (HBASE-13335) Update ClientSmallScanner and ClientSmallReversedScanner
[ https://issues.apache.org/jira/browse/HBASE-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389395#comment-14389395 ] Jonathan Lawlor commented on HBASE-13335: - +1, these changes look good to me. Those tests look great. It may also be a good idea to extend TestSizeFailures so that it also tests to ensure that all data is seen when the scan is small (e.g. perform that same scan near the end but configure it with Scan.setSmall(true); see the sketch after this message). Even though that wouldn't be a typical small-scan use case, it would test to make sure the fix behaves as expected. Update ClientSmallScanner and ClientSmallReversedScanner Key: HBASE-13335 URL: https://issues.apache.org/jira/browse/HBASE-13335 Project: HBase Issue Type: Sub-task Components: Client Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13 Attachments: HBASE-13335-0.98-v1.patch, HBASE-13335-branch-1-v1.patch, HBASE-13335-v1.patch, HBASE-13335.patch Some follow-on work for HBASE-13262: it's unlikely that clients using the small scanners would get enough data to run into the initial bug, but the scanner implementations should still adhere to the moreResultsInRegion flag when the server sends it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
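A sketch of what that extension might look like, assuming a hypothetical table name and a surrounding fixture that supplies the Connection and the expected row count (this is not the actual TestSizeFailures code):
{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class SmallScanSeesAllRows {
  // connection and expectedRows are assumed to come from the test fixture.
  static void assertSmallScanSeesAllRows(Connection connection, long expectedRows)
      throws Exception {
    Scan scan = new Scan();
    scan.setSmall(true);                // exercise the small-scanner client path
    scan.setCaching(Integer.MAX_VALUE); // caching limit effectively unreachable
    scan.setMaxResultSize(1024);        // so the size limit trips first

    long rowsSeen = 0;
    try (Table table = connection.getTable(TableName.valueOf("testSizeFailures"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result ignored : scanner) {
        rowsSeen++;
      }
    }
    assertEquals(expectedRows, rowsSeen);
  }
}
{code}
Pairing setCaching(Integer.MAX_VALUE) with a tiny setMaxResultSize guarantees the size limit is reached before the caching limit, which is exactly the configuration that exposed the original bug.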
[jira] [Commented] (HBASE-13335) Update ClientSmallScanner and ClientSmallReversedScanner
[ https://issues.apache.org/jira/browse/HBASE-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389419#comment-14389419 ] Jonathan Lawlor commented on HBASE-13335: - Sounds good to me. Update ClientSmallScanner and ClientSmallReversedScanner Key: HBASE-13335 URL: https://issues.apache.org/jira/browse/HBASE-13335 Project: HBase Issue Type: Sub-task Components: Client Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13 Attachments: HBASE-13335-0.98-v1.patch, HBASE-13335-branch-1-v1.patch, HBASE-13335-v1.patch, HBASE-13335.patch Some follow-on work for HBASE-13262: it's unlikely that clients using the small scanners would get enough data to run into the initial bug, but the scanner implementations should still adhere to the moreResultsInRegion flag when the server sends it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13362) set max result size from client only (like caching)?
[ https://issues.apache.org/jira/browse/HBASE-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386983#comment-14386983 ] Jonathan Lawlor commented on HBASE-13362: - This sounds like a great idea. As [~lhofhansl] pointed out with the test in HBASE-13297, it is easy for the client and server to have different configurations for the default max result size. Prior to HBASE-13262 this would have meant data loss. With HBASE-13262 we no longer have data loss; it's just ugly. If instead it was dealt with in the same manner as caching (e.g. the caching value must be carried in the ScanRequest to the server) it would be much cleaner. The 2 MB default sounds good. Probably obvious, but just to be clear, we would still need to support instances where the client uses a negative maxResultSize to indicate that the response should not be limited by the result size (i.e. negative maxResultSize is equivalent to maxResultSize = Long.MAX_VALUE). If backported prior to branch-1, it would be nice to accompany this change with a change in the default caching value (from the current default of 100 to Integer.MAX_VALUE) so that the size limit is reached by default, rather than the caching/row limit (I say prior to branch-1 because the defaults of caching/maxResultSize in branch-1+ will already produce this behavior). Granted, this accompanying change would probably be dealt with best in a separate JIRA. set max result size from client only (like caching)? Key: HBASE-13362 URL: https://issues.apache.org/jira/browse/HBASE-13362 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl With the recent problems we've been seeing with client/server result size mismatch, I was thinking: Why was this not a problem with scanner caching? There are two reasons: # the number of rows is easy to calculate (and we did it correctly) # caching is only controlled from the client, never set on the server alone We did fix both #1 and #2 in HBASE-13262. Still, I'd like to discuss the following: * default the client-sent max result size to 2 MB * remove any server-only result sizing * continue to use hbase.client.scanner.max.result.size but enforce it via the client only (as the name implies anyway). Comments? Concerns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
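For concreteness, the convention described above (a non-positive maxResultSize meaning "unlimited") could be normalized in a single helper on the server side; a minimal sketch with assumed names, not the actual server code:
{code}
// Illustrative only; names here are assumptions, not the actual server code.
public class MaxResultSizeNormalizer {
  /**
   * Resolve the effective max result size for a scan request: prefer the
   * client-sent value, fall back to the server default, and treat any
   * non-positive value as "no size limit" (Long.MAX_VALUE).
   */
  public static long resolve(boolean clientSentSize, long clientSize, long serverDefault) {
    long max = clientSentSize ? clientSize : serverDefault;
    return (max <= 0) ? Long.MAX_VALUE : max;
  }
}
{code}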
[jira] [Updated] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-11544: Attachment: HBASE-11544-addendum-v2.patch Attaching a rebased version of the patch since recent changes on master prevented a clean apply. Anyone have any thoughts on how ScannerContext fits into the scanner RPC workflow? Questions, ideas for improvement, alternative approaches? [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME -- Key: HBASE-11544 URL: https://issues.apache.org/jira/browse/HBASE-11544 Project: HBase Issue Type: Bug Reporter: stack Assignee: Jonathan Lawlor Priority: Critical Fix For: 2.0.0, 1.1.0 Attachments: Allocation_Hot_Spots.html, HBASE-11544-addendum-v1.patch, HBASE-11544-addendum-v2.patch, HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, gc.j.png, h.png, hits.j.png, m.png, mean.png, net.j.png, q (2).png Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing. Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass out a certain size threshold rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Lawlor updated HBASE-11544: Attachment: HBASE-11544-addendum-v1.patch Work in progress update: I've been working on an addendum that includes the ScannerContext changes described above and would like to get some feedback. I am attaching the patch, but I would like to highlight the following points given the discussion above:
* The intention was to modify the RegionScanner/InternalScanner interfaces as [~stack] and I described above. Specifically, I wanted to have the following signatures in RegionScanner (and equivalent ones in InternalScanner):
{code}
ScannerContext nextRaw(List<Cell> result) throws IOException;
ScannerContext nextRaw(List<Cell> result, ScannerContext scannerContext) throws IOException;
{code}
** As far as I can tell, the proposed interface change has two problems: in the event that the first method is called, a ScannerContext object would need to be created (object creations are what we want to avoid); also, if the second method is called, we are simply returning the same object that the caller passed in, so the return value is redundant.
** Instead I made NextState an enum and I return that. A NextState enum was used instead of the previous boolean return type because it allows the caller to determine when a partial result has been formed. An argument could be made that the return type should be boolean and we should just put NextState inside the context, but I didn't do that because it would make the code messier (we would have to call scannerContext.setState() before every return statement, and that opens up the potential to miss setting the state when really we just want to return it).
* This way ScannerContext simply holds the limits and tracks the progress towards those limits.
So with this patch what we get is:
The good:
* One object creation per session/RPC instead of potentially millions in the case of large batch scans
* Much more explicit state information is returned from RegionScanner/InternalScanner
The bad:
* An object is being passed around between scanners whereas we had a primitive per limit before.
** However, note that the drawback of having a primitive per limit is that it does not tell us about the progress that has been made towards those limits, and thus any progress must be recalculated by the caller
* The RegionScanner interface is changed from Stable to Evolving due to the changes necessary in the interface (this change was noted over in HBASE-13306, but given that we are filing an addendum it makes more sense to address it here).
As this is a work in progress the docs could still use a little love, but at the very least this patch lets us see the way that Scan RPCs would look server side in the event that ScannerContext is introduced.
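To make the shape of this design concrete, here is a stripped-down sketch of a ScannerContext-style limit holder together with a NextState enum; field and method names are illustrative, not the committed API:
{code}
// Illustrative sketch of the ScannerContext idea discussed above: one
// object per RPC holds the limits and tracks progress towards them.
// Names are assumptions, not the committed API.
public class ScannerContextSketch {

  /** Returned from next() calls so the caller can see *why* scanning stopped. */
  public enum NextState {
    MORE_VALUES,         // keep calling next()
    NO_MORE_VALUES,      // scanner/region exhausted
    SIZE_LIMIT_REACHED,  // stopped mid-row: the result may be partial
    BATCH_LIMIT_REACHED
  }

  private final long sizeLimit; // bytes allowed in the response
  private final int batchLimit; // cells allowed per batch

  private long sizeProgress = 0;
  private int batchProgress = 0;

  public ScannerContextSketch(long sizeLimit, int batchLimit) {
    this.sizeLimit = sizeLimit;
    this.batchLimit = batchLimit;
  }

  /** Accumulate progress as cells are gathered. */
  public void incrementSizeProgress(long cellSize) { sizeProgress += cellSize; }
  public void incrementBatchProgress() { batchProgress++; }

  public boolean checkSizeLimit() { return sizeProgress >= sizeLimit; }
  public boolean checkBatchLimit() { return batchProgress >= batchLimit; }

  /** Reset progress so the same object can be reused across calls in one RPC. */
  public void clearProgress() { sizeProgress = 0; batchProgress = 0; }
}
{code}
A single context created once per RPC holds the limits, accumulates progress, and can be reset between batches, which is what removes the per-invocation allocations mentioned above.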
[Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME -- Key: HBASE-11544 URL: https://issues.apache.org/jira/browse/HBASE-11544 Project: HBase Issue Type: Bug Reporter: stack Assignee: Jonathan Lawlor Priority: Critical Fix For: 2.0.0, 1.1.0 Attachments: Allocation_Hot_Spots.html, HBASE-11544-addendum-v1.patch, HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch, HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, HBASE-11544-v4.patch, HBASE-11544-v5.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, HBASE-11544-v8-branch-1.patch, HBASE-11544-v8.patch, gc.j.png, h.png, hits.j.png, m.png, mean.png, net.j.png, q (2).png Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing. Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass out a certain size threshold rather than keep accumulating till we OOME. -- This message was sent by Atlassian JIRA (v6.3.4#6332)