[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252491#comment-13252491 ] Keith Turner commented on HBASE-5754: - The counts for the 1B run seem odd to me, but maybe that's just an artifact of how many map tasks you ran for the generator and how much data each task generated. If a map task does not generate a multiple of 25,000,000 nodes, it will leave some nodes unreferenced; it generates a circular linked list every 25M. {noformat} 12/04/12 03:54:11 INFO mapred.JobClient: REFERENCED=564459547 12/04/12 03:54:11 INFO mapred.JobClient: UNREFERENCED=104000 {noformat} If you were to run 10 map tasks each generating 100M, this should generate 1B with all nodes referenced. Minimizing the number of unreferenced nodes is ideal, because the test cannot detect the loss of unreferenced nodes. I should probably add this info to the readme. data lost with gora continuous ingest test (goraci) --- Key: HBASE-5754 URL: https://issues.apache.org/jira/browse/HBASE-5754 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Environment: 10 node test cluster Reporter: Eric Newton Assignee: stack Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully. Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not in the verify step.
Here are the versions of the various packages: ||package||version|| |hadoop|0.20.205.0| |hbase|0.92.1| |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277| |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08| The change I made to goraci was to configure it for hbase and to allow it to build properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
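The referenced/unreferenced accounting in the comment above can be illustrated with a toy model of the test (a hypothetical Python sketch, not the actual goraci code, which runs through Gora against HBase with a 25M circle size):

```python
# Toy model of the goraci check: each generator task writes nodes that reference
# the previously written node and closes the chain into a circle every
# circle_size nodes. Nodes written after the last closed circle stay
# unreferenced, so the test cannot detect their loss.
def generate(num_nodes, circle_size):
    table = {}            # node id -> id of the node it references (None for a head)
    first = prev = None
    for i in range(num_nodes):
        table[i] = prev
        if prev is None:
            first = i
        prev = i
        if (i + 1) % circle_size == 0:
            table[first] = prev   # close the circle: head now references the tail
            first = prev = None
    return table

def verify(table):
    referenced = {v for v in table.values() if v is not None}
    undefined = referenced - table.keys()      # referenced but missing -> data loss
    unreferenced = table.keys() - referenced   # written, but nothing points at them
    return len(referenced & table.keys()), len(unreferenced), len(undefined)
```

A task that writes a multiple of circle_size nodes leaves nothing unreferenced; a partial circle leaves a tail unreferenced, and a lost node shows up as undefined.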
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252502#comment-13252502 ] Keith Turner commented on HBASE-5754: - When Eric ran 10 generators each adding 100M against HBase and no data was lost, he saw 1B referenced, 0 unref, and 0 undef.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250769#comment-13250769 ] Keith Turner commented on HBASE-5487: - Accumulo 1.3 cannot survive running our random walk test w/ the agitator (a perl script that kills accumulo processes; it is not as devious as Todd's gremlins). Random walk + Agitation + Accumulo 1.3 == foobar. Attempting the above would leave Accumulo in an inconsistent state (like a corrupted metadata table) or test clients would die with unexpected exceptions. My point is that while developing FATE it was nice to have Random Walk + Agitation to really beat up the FATE framework and the FATE table operations. We also wrote some new random walk tests for 1.4 that were even meaner. Generic framework for Master-coordinated tasks -- Key: HBASE-5487 URL: https://issues.apache.org/jira/browse/HBASE-5487 Project: HBase Issue Type: New Feature Components: master, regionserver, zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Labels: noob Need a framework to execute master-coordinated tasks in a fault-tolerant manner. Master-coordinated tasks such as online schema change and delete-range (deleting region(s) based on start/end key) can make use of this framework. The advantages of the framework are 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks 2. Ability to abstract the common functions across Master - ZK and RS - ZK 3. Easy to plug in new master-coordinated tasks without adding code to core components
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250770#comment-13250770 ] Keith Turner commented on HBASE-5487: - To add context to my above comment, Accumulo 1.3 does not have FATE; it was introduced in 1.4.
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251017#comment-13251017 ] Keith Turner commented on HBASE-5754: - You may run into GORA-116, a bug in the gora-hbase store.
[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time
[ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246443#comment-13246443 ] Keith Turner commented on HBASE-5479: - There is an exception to what I said above. User-requested compactions are still done depth first, with an optimization. If a user requests that a tablet with 30 files compact, it will allocate a compaction thread to compact that tablet to one file. It still only does up to 10 files at a time, though. * compact 10 smallest, results in 21 files * compact 10 smallest, results in 12 files * compact 3 smallest, results in 10 files -- this is the optimization to avoid redundant work * compact 10 smallest, results in 1 file Postpone CompactionSelection to compaction execution time - Key: HBASE-5479 URL: https://issues.apache.org/jira/browse/HBASE-5479 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Reporter: Matt Corgan It can be commonplace for regionservers to develop long compaction queues, meaning a CompactionRequest may execute hours after it was created. The CompactionRequest holds a CompactionSelection that was selected at request time but may no longer be the optimal selection. The CompactionSelection should be created at compaction execution time rather than compaction request time. The current mechanism breaks down during high volume insertion. The inefficiency is clearest when the inserts are finished. Inserting for 5 hours may build up 50 storefiles and a 40 element compaction queue. When finished inserting, you would prefer that the next compaction merges all 50 files (or some large subset), but the current system will churn through each of the 40 compaction requests, the first of which may be hours old. This ends up re-compacting the same data many times. The current system is especially inefficient when dealing with time series data where the data in the storefiles has minimal overlap.
With time series data, there is even less benefit to intermediate merges because most storefiles can be eliminated based on their key range during a read, even without bloomfilters. The only goal should be to reduce file count, not to minimize the number of files merged for each read. There are other aspects of the current queuing mechanism that would need to be looked at. You would want to avoid having the same Store in the queue multiple times. And you would want the completion of one compaction to possibly queue another compaction request for the store. An alternative architecture to the current style of queues would be to have each Store (all open in memory) keep a compactionPriority score up to date after events like flushes, compactions, schema changes, etc. Then you create a CompactionPriorityComparator implements Comparator<Store> and stick all the Stores into a PriorityQueue (synchronized remove/add from the queue when the value changes). The async compaction threads would keep pulling off the head of that queue as long as the head has compactionPriority > X.
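The file counts Keith lists in his comment above (30 → 21 → 12 → 10 → 1) can be reproduced with a small simulation of the batching rule (a hypothetical sketch inferred from those numbers; the real Accumulo selection logic may differ in details):

```python
def compaction_steps(num_files, max_batch=10):
    """Simulate repeatedly merging the smallest files of one tablet down to a
    single file, at most max_batch files per merge."""
    counts = [num_files]
    while num_files > 1:
        if num_files <= max_batch:
            batch = num_files   # final pass merges everything that is left
        else:
            # the optimization: merge just enough files so that exactly
            # max_batch remain, instead of blindly taking a full batch
            batch = min(max_batch, num_files - max_batch + 1)
        num_files -= batch - 1  # 'batch' input files are replaced by one output file
        counts.append(num_files)
    return counts
```

With 30 files this yields the 21, 12, 10, 1 sequence from the comment; the 3-file merge at the 12-file step is what avoids a redundant extra pass.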
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245528#comment-13245528 ] Keith Turner commented on HBASE-4821: - I am an Accumulo developer; there is some cruft in our test dir. The two most successful cluster tests we have are continuous ingest and random walk. We have found lots of bugs with these tests. I wrote a Gora version of continuous ingest that should run against HBase. The readme on github has a nice description. https://github.com/keith-turner/goraci/ The accumulo version of continuous ingest can be found here. http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/continuous/ This dir contains an old set of open office slides that also give an overview of continuous ingest. At the end of the slides is the beginning of the idea of the random walk test. I am not sure if we have a nice description of random walk anywhere. It is a fairly simple test framework. You write test nodes in Java and link the nodes together in a graph using XML. You start a test client on each node in a cluster. The test client just does a random walk of the test graph. We have found a ton of bugs in 1.3 and 1.4 using random walk. Actually, the Accumulo features page may be the only place we give an overview of random walk. I noticed that our random walk readme only tells you how to run it, not what it is. Below is a link to the random walk test, but like I said it's not very informative. http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/randomwalk/ The actual Java code is at the link below. The framework and test node code is all here. http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/src/server/src/main/java/org/apache/accumulo/server/test/randomwalk/ The short description of random walk I mentioned is here. http://accumulo.apache.org/notable_features.html#testing If anyone is interested in generalizing random walk so that HBase could use it too, let me know. One last thing.
We tested Accumulo for over a month on a 10 node cluster using continuous ingest, random walk, and the agitator. Below are some of the bugs we found during that time period. [Bugs found in 1.4 testing|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+14_qa_bug] A fully automated comprehensive distributed integration test for HBase -- Key: HBASE-4821 URL: https://issues.apache.org/jira/browse/HBASE-4821 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical To properly verify that a particular version of HBase is good for production deployment we need a better way to do real cluster testing after incremental changes. Running unit tests is good, but we also need to deploy HBase to a cluster, run integration tests, load tests, Thrift server tests, kill some region servers, kill the master, and produce a report. All of this needs to happen in 20-30 minutes with minimal manual intervention. I think this way we can combine agile development with high stability of the codebase. I am envisioning a high-level framework written in a scripting language (e.g. Python) that would abstract external operations such as deploy to test cluster, kill a particular server, run load test A, run load test B (we already have a few kinds of load tests implemented in Java, and we could write a Thrift load test in Python). This tool should also produce intermediate output, allowing to catch problems early and restart the test. No implementation has yet been done. Any ideas or suggestions are welcome.
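The random walk framework described in the comment above can be sketched in a few lines (illustrative Python only; the real Accumulo framework defines its nodes in Java and wires the graph together with XML):

```python
import random

def random_walk(graph, start, steps, state, rng=random):
    """graph maps a node to the list of nodes reachable from it; a node here is
    any callable that mutates the shared test state (and raises if it finds a bug)."""
    node = start
    for _ in range(steps):
        node(state)                 # execute the test action for this node
        choices = graph.get(node)
        if not choices:
            break                   # dead end in the test graph
        node = rng.choice(choices)  # random transition to a neighboring node
    return state
```

Each test client on the cluster runs its own walk, which is why the same feature ends up being exercised while the system is in many different states.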
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245545#comment-13245545 ] Keith Turner commented on HBASE-5487: - I am an Accumulo developer. CompactRange is our operation to force a range of tablets (regions) to major compact all of their files into one file. The TableRangeOp will merge a range of tablets into one tablet. TableRangeOp can also delete a range of rows from a table efficiently. It inserts split points at the rows you want to delete, drops the tablets, and then merges what's left.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245557#comment-13245557 ] Keith Turner commented on HBASE-4821: - Eric Newton has been experimenting with running goraci against HBase. One issue he ran into was that gora-hbase uses auto flush on every HTable connection. This really slowed down ingest. He modified the gora code locally so it would not do this. Eric posted a question on the gora user list asking why it behaved this way. The Gora API has a flush call.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245596#comment-13245596 ] Keith Turner commented on HBASE-5487: - The description of FATE given in this ticket is pretty good. The following resources may provide a little more info. http://people.apache.org/~kturner/accumulo14_15.pdf http://mail-archives.apache.org/mod_mbox/incubator-accumulo-dev/201202.mbox/%3CCAGUtCHpcHTDue-C_2RyDkZm0diW=zojd7-bzcgszqdtidzn...@mail.gmail.com%3E
[jira] [Commented] (HBASE-643) Rename tables
[ https://issues.apache.org/jira/browse/HBASE-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245620#comment-13245620 ] Keith Turner commented on HBASE-643: Accumulo supports this feature by using table ids. Table ids are generated using zookeeper and are never reused (base 36 numbers are used to keep them short and readable). A mapping from table id to table name is stored in zookeeper. To rename a table, lock the table and change the mapping in zookeeper. Accumulo used to not use table ids; it stored the table name in meta and hdfs. Now it uses the table id in hdfs and meta. We were discussing renaming tables, and it seemed so complicated. Then someone thought of this table id solution; it was such an elegant solution and made the problem trivial. Although table ids were implemented to support table renaming, they had the nice side effect of making hdfs and meta entries much shorter. Rename tables - Key: HBASE-643 URL: https://issues.apache.org/jira/browse/HBASE-643 Project: HBase Issue Type: New Feature Reporter: Michael Bieniosek Attachments: copy_table.rb, rename_table.rb It would be nice to be able to rename tables, if this is possible. Some of our internal users are doing things like: upload table mytable - realize they screwed up - upload table mytable_2 - decide mytable_2 looks better - have to go on using mytable_2 instead of the originally desired table name.
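The id-based rename described above can be shown in a minimal sketch (hypothetical Python; Accumulo keeps this mapping in ZooKeeper and protects the rename with a table lock):

```python
import itertools

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def base36(n):
    """Short, readable ids: 0, 1, ..., 9, a, b, ..., z, 10, 11, ..."""
    out = ""
    while True:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
        if n == 0:
            return out

class TableCatalog:
    """Toy stand-in for the ZooKeeper-backed id -> name mapping."""
    def __init__(self):
        self._counter = itertools.count()   # ids are never reused
        self.id_to_name = {}

    def create(self, name):
        tid = base36(next(self._counter))
        self.id_to_name[tid] = name
        return tid

    def rename(self, tid, new_name):
        # only the mapping changes; files in hdfs and the meta entries,
        # being keyed by the table id, never move
        self.id_to_name[tid] = new_name
```

Because everything on disk is keyed by the immutable id, the rename is a single metadata update rather than a mass move of files and meta rows.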
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245705#comment-13245705 ] Keith Turner commented on HBASE-4821: - I noticed an earlier comment about Python code in the Accumulo test dir. This is the code in test/auto, and we call these functional tests. This code is probably similar to some HBase unit tests. It supports tests that run against a live instance of Accumulo. The test framework starts an instance of Accumulo, runs a Python or Java test against that instance, and then shuts the instance down. Running all of the functional tests takes 1 to 2 hours. This test framework was written before random walk, and it ensures basic functionality works. For example, there's a test to verify that adding split points to a table continues to work. Since we have implemented random walk, I have found myself writing a lot more random walk tests and fewer functional tests. The reason for this is that the functional tests usually test a feature when the system is in one state, whereas random walk tests the same feature with the system in many different states. For example, a random walk test that adds split points to a table will try to do that when the table and system are in many different states. It may try to add the split when a tablet/region is migrating, currently splitting, minor compacting, major compacting, offline, etc. So the likelihood of finding a bug with addSplits() in random walk is much greater than in the functional test. The functional test will detect if the feature is completely broken; random walk can detect if the feature is broken under certain circumstances.
[jira] [Commented] (HBASE-5090) Allow the user to specify inclusiveness of start/end rows on Scan.
[ https://issues.apache.org/jira/browse/HBASE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178930#comment-13178930 ] Keith Turner commented on HBASE-5090: - One way to achieve this with the current API is to append a binary zero to the row. This is the next possible row in sorted order. Doing this will make the start row exclusive or the end row inclusive. Allow the user to specify inclusiveness of start/end rows on Scan. -- Key: HBASE-5090 URL: https://issues.apache.org/jira/browse/HBASE-5090 Project: HBase Issue Type: Improvement Reporter: Ioannis Canellos Currently Scans handle start/end rows in the following manner: startRow - row to start scanner at or after (inclusive) stopRow - row to stop scanner before (exclusive) It would be great if the user could specify the inclusiveness/exclusiveness. For example, these two additional methods would be needed: public Scan setStartRow(byte[] startRow, boolean inclusive); public Scan setEndRow(byte[] endRow, boolean inclusive);
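The binary-zero trick from the comment above can be shown with plain byte strings (illustrative Python rather than the HBase Java client):

```python
def next_row(row: bytes) -> bytes:
    """Appending a zero byte gives the smallest row that sorts strictly after
    'row'. Passing the result as the scan's startRow makes the original start
    row exclusive; passing it as the stopRow makes the original stop row
    inclusive."""
    return row + b"\x00"
```

No row can sort between a row and that row plus a trailing zero byte, which is why the transformation is exact rather than approximate.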