[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406128#comment-13406128 ]

Mikhail Bautin commented on HBASE-5104:
---------------------------------------

Committed.

Provide a reliable intra-row pagination mechanism
-------------------------------------------------

Key: HBASE-5104
URL: https://issues.apache.org/jira/browse/HBASE-5104
Project: HBase
Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, D2799.4.patch, D2799.5.patch, D2799.6.patch, jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch, jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-06-19_20_12_21.patch, jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_12_43_28.patch, jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_15_15_30.patch, testFilterList.rb

Addendum:

Doing pagination (retrieving at most limit number of KVs at a particular offset) is currently supported via the ColumnPaginationFilter. However, it is not a very clean way of supporting pagination. Some of the problems with it are:

* Normally, one would expect a query with (Filter(A) AND Filter(B)) to have the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This is not the case for ColumnPaginationFilter, as its internal state gets updated depending on whether Filter(A) returns TRUE or FALSE for a particular cell.
* When this Filter is used in combination with other filters (e.g., doing AND with another filter using FilterList), the behavior of the query depends on the order of filters in the FilterList. This is not ideal.
* ColumnPaginationFilter is a stateful filter which ends up counting multiple versions of the cell as separate values even if another filter upstream or the ScanQueryMatcher is going to reject the value for other reasons.

Seems like we need a reliable way to do pagination.
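The order dependence described in the list above can be reproduced outside HBase with a toy model. The sketch below is not HBase code: `CountingPage`, `scan`, and the sample column names are made up for illustration, and filters are reduced to plain predicates combined with a short-circuit AND. It only aims to show how a counter that advances on every call makes the result depend on where the filter sits in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of filters ANDed in list order; none of these types are HBase classes. */
final class PaginationOrderSketch {

    /** Stateful filter: passes the cells numbered [offset, offset + limit) among the cells it is shown. */
    static final class CountingPage implements Predicate<String> {
        private final int limit;
        private final int offset;
        private int seen = 0; // advances on every call, whatever the other filters decide

        CountingPage(int limit, int offset) {
            this.limit = limit;
            this.offset = offset;
        }

        @Override
        public boolean test(String column) {
            int index = seen++;
            return index >= offset && index < offset + limit;
        }
    }

    /** Short-circuit AND: once one filter rejects a cell, later filters never see it. */
    static List<String> scan(List<String> columns, List<Predicate<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String column : columns) {
            boolean pass = true;
            for (Predicate<String> filter : filters) {
                if (!filter.test(column)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                out.add(column);
            }
        }
        return out;
    }

    /** Made-up row: three "tag0" columns followed by six "tag1" columns, in sorted order. */
    static List<String> sampleColumns() {
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= 3; i++) columns.add("tag0:00" + i);
        for (int i = 1; i <= 6; i++) columns.add("tag1:00" + i);
        return columns;
    }

    /** Prefix filter first, pagination (limit 2, offset 2) second. */
    static List<String> prefixThenPage() {
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(column -> column.startsWith("tag1"));
        filters.add(new CountingPage(2, 2));
        return scan(sampleColumns(), filters);
    }

    /** Same two filters, opposite order. */
    static List<String> pageThenPrefix() {
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(new CountingPage(2, 2));
        filters.add(column -> column.startsWith("tag1"));
        return scan(sampleColumns(), filters);
    }

    public static void main(String[] args) {
        System.out.println("prefix, page: " + prefixThenPage());
        System.out.println("page, prefix: " + pageThenPrefix());
    }
}
```

With this sample row, `prefixThenPage()` yields `[tag1:003, tag1:004]` while `pageThenPrefix()` yields `[tag1:001]`: the same two filters, two different answers. A limit/offset applied after all filtering, as the issue goes on to propose, would not have this property.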
The particular use case that prompted this JIRA is pagination within the same rowKey. For example, for a given row key R, get columns with prefix P, starting at offset X (among columns which have prefix P) and limit Y.

Some possible fixes might be:

1) Enhance ColumnPrefixFilter to support another constructor which supports limit/offset.
2) Support pagination (limit/offset) at the Scan/Get API level (rather than as a filter) [Like SQL].

Original Post:

Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below:

Assuming that we have an index column family with the following entries:

  tag0:001:thread1
  ...
  tag1:001:thread1
  tag1:002:thread2
  ...
  tag1:010:thread10
  ...
  tag2:001:thread1
  tag2:005:thread5
  ...

To get threads with tag1 in range [5, 10), I tried the following code:

  ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1));
  ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
  FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
  filters.addFilter(filter1);
  filters.addFilter(filter2);
  Get get = new Get(USER);
  get.addFamily(COLUMN_FAMILY);
  get.setMaxVersions(1);
  get.setFilter(filters);

Somehow it didn't work as expected. It returned the entries as if filter1 were not set.

It turns out that ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (it treats it as INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
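The failure reported in the original post can be modelled in a few lines. What follows is a simplified sketch, not the HBase implementation: the `Code` enum, the `ColumnFilter` interface, and the `get` method are stand-ins invented here, the rule for when the prefix filter returns a seek hint is an assumption, and the column counts are made up because the report elides the real ones. The `handleHint` flag switches between a combiner that lets SEEK_NEXT_USING_HINT fall through as if it were INCLUDE and one that stops at the first non-INCLUDE answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the reported failure; the types here are stand-ins, not HBase classes. */
final class SeekHintSketch {

    enum Code { INCLUDE, NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT }

    interface ColumnFilter {
        Code filter(String column);
    }

    /** Assumed behaviour: hint forward while the column sorts before the prefix, stop once past it. */
    static ColumnFilter prefix(String p) {
        return column -> {
            if (column.startsWith(p)) return Code.INCLUDE;
            return column.compareTo(p) < 0 ? Code.SEEK_NEXT_USING_HINT : Code.NEXT_ROW;
        };
    }

    /** Stateful limit/offset filter that counts every cell it is asked about. */
    static ColumnFilter page(int limit, int offset) {
        int[] count = {0};
        return column -> {
            if (count[0] >= offset + limit) return Code.NEXT_ROW;
            return count[0]++ < offset ? Code.NEXT_COL : Code.INCLUDE;
        };
    }

    /** AND-combination of filters; with handleHint == false a seek hint is treated like INCLUDE. */
    static List<String> get(List<String> columns, boolean handleHint, ColumnFilter... filters) {
        List<String> out = new ArrayList<>();
        for (String column : columns) {
            Code result = Code.INCLUDE;
            for (ColumnFilter f : filters) {
                Code c = f.filter(column);
                if (c == Code.INCLUDE) continue;
                if (c == Code.SEEK_NEXT_USING_HINT && !handleHint) continue; // the reported bug
                result = c; // first non-INCLUDE answer decides; later filters are not consulted
                break;
            }
            if (result == Code.NEXT_ROW) break;
            if (result == Code.INCLUDE) out.add(column);
            // NEXT_COL and a handled seek hint both just move on in this model
        }
        return out;
    }

    /** Made-up index row: 7 tag0 entries, 12 tag1 entries, 3 tag2 entries. */
    static List<String> sampleColumns() {
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= 7; i++) columns.add(String.format("tag0:%03d:thread%d", i, i));
        for (int i = 1; i <= 12; i++) columns.add(String.format("tag1:%03d:thread%d", i, i));
        for (int i = 1; i <= 3; i++) columns.add(String.format("tag2:%03d:thread%d", i, i));
        return columns;
    }

    /** Runs prefix("tag1") AND page(limit 5, offset 5) with or without hint handling. */
    static List<String> run(boolean handleHint) {
        return get(sampleColumns(), handleHint, prefix("tag1"), page(5, 5));
    }

    public static void main(String[] args) {
        System.out.println("hint treated as INCLUDE: " + run(false));
        System.out.println("hint handled:            " + run(true));
    }
}
```

In this model, `run(false)` returns two `tag0` entries followed by the first three `tag1` entries, which is what a bare limit 5 / offset 5 over the whole row would give, matching the "as if filter1 were not set" symptom. `run(true)` returns `tag1:006` through `tag1:010`.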
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405471#comment-13405471 ]

Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12534463/jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_15_15_30.patch
  against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.security.token.TestZKSecretWatcher
  org.apache.hadoop.hbase.master.TestHMasterRPCException

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2312//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2312//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2312//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2312//console

This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397242#comment-13397242 ]

Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12532637/jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-06-19_20_12_21.patch
  against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
  org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2194//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2194//console

This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277055#comment-13277055 ]

Zhihong Yu commented on HBASE-5104:
-----------------------------------

I stumbled on the following test failure twice (with D2799.6.patch on MacBook):

{code}
testExecDeserialization(org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint)  Time elapsed: 0.028 sec  ERROR!
java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:180)
  at java.io.DataInputStream.readUTF(DataInputStream.java:592)
  at java.io.DataInputStream.readUTF(DataInputStream.java:547)
  at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:120)
  at org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization(TestCoprocessorEndpoint.java:201)
{code}
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277168#comment-13277168 ]

Phabricator commented on HBASE-5104:
------------------------------------

Liyin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

LGTM ! Thanks Mikhail !

REVISION DETAIL
  https://reviews.facebook.net/D2799

To: madhuvaidya, lhofhansl, Kannan, tedyu, stack, todd, JIRA, jxcn01, mbautin
Cc: jxcn01, Liyin
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277182#comment-13277182 ]

Phabricator commented on HBASE-5104:
------------------------------------

stack has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

lgtm

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/Get.java:212 Will this be accurate if rows are inserted (or deleted) in the meantime?
  src/main/java/org/apache/hadoop/hbase/client/Get.java:201 This is great. One day we should do size-based too.
  src/main/java/org/apache/hadoop/hbase/client/Get.java:472 Why not just write out our version as 3? To save some bytes on the wire?
  src/main/java/org/apache/hadoop/hbase/client/Scan.java:102 Don't Scan and Get share a common ancestor?
  src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java:647 Thanks for doing this.
  src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java:452 You need to add the below to each classified test for classification to work:

    @org.junit.Rule
    public org.apache.hadoop.hbase.ResourceCheckerJUnitRule cu =
      new org.apache.hadoop.hbase.ResourceCheckerJUnitRule();

REVISION DETAIL
  https://reviews.facebook.net/D2799

To: madhuvaidya, lhofhansl, Kannan, tedyu, stack, todd, JIRA, jxcn01, mbautin
Cc: jxcn01, Liyin
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277188#comment-13277188 ]

Phabricator commented on HBASE-5104:
------------------------------------

madhuvaidya has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/Get.java:472 This was done to maintain inter-op if we are not using either the storeLimit or the storeOffset.

REVISION DETAIL
  https://reviews.facebook.net/D2799

To: madhuvaidya, lhofhansl, Kannan, tedyu, stack, todd, JIRA, jxcn01, mbautin
Cc: jxcn01, Liyin
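The inter-op point in the Get.java:472 exchange (write the newer version number only when the new fields are actually used) can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `VersionedGetSketch` is not the real `Get` class, the wire layout is invented, and the base version 2 and the defaults of -1 (no limit) and 0 (no offset) are assumptions. Only the version number 3 and the field names storeLimit and storeOffset come from the review comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical request object; layout, defaults, and the base version number are assumptions. */
final class VersionedGetSketch {
    static final byte BASE_VERSION = 2;   // assumed format understood by older readers
    static final byte PAGING_VERSION = 3; // format that also carries storeLimit/storeOffset

    final String row;
    final int storeLimit;  // -1 means "no limit" in this sketch
    final int storeOffset; // 0 means "no offset" in this sketch

    VersionedGetSketch(String row, int storeLimit, int storeOffset) {
        this.row = row;
        this.storeLimit = storeLimit;
        this.storeOffset = storeOffset;
    }

    /** Serializes with the lowest version that can represent this request. */
    byte[] write() {
        boolean paging = storeLimit != -1 || storeOffset != 0;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(paging ? PAGING_VERSION : BASE_VERSION);
            out.writeUTF(row);
            if (paging) {
                out.writeInt(storeLimit);
                out.writeInt(storeOffset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes as a reader that understands versions up to maxSupportedVersion. */
    static VersionedGetSketch read(byte[] data, byte maxSupportedVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte version = in.readByte();
            if (version > maxSupportedVersion) {
                throw new IllegalArgumentException("unsupported version " + version);
            }
            String row = in.readUTF();
            if (version < PAGING_VERSION) {
                return new VersionedGetSketch(row, -1, 0);
            }
            int limit = in.readInt();
            int offset = in.readInt();
            return new VersionedGetSketch(row, limit, offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = new VersionedGetSketch("r1", -1, 0).write();
        byte[] paged = new VersionedGetSketch("r1", 5, 5).write();
        System.out.println("plain request version: " + plain[0]);
        System.out.println("paged request version: " + paged[0]);
        // A reader limited to the base version can still decode the plain request.
        System.out.println("old reader row: " + read(plain, BASE_VERSION).row);
    }
}
```

Under these assumptions, a request that does not use paging is byte-for-byte readable by a peer that only knows the base version, and only requests that set a limit or offset require the newer reader.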
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273743#comment-13273743 ]

Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12526586/D2799.6.patch
  against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 27 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster
  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1853//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1853//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1853//console

This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273747#comment-13273747 ]

Phabricator commented on HBASE-5104:
------------------------------------

mbautin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

Could someone please accept this? This diff is almost a month old.

REVISION DETAIL
  https://reviews.facebook.net/D2799

To: madhuvaidya, lhofhansl, Kannan, tedyu, stack, todd, JIRA, jxcn01, mbautin
Cc: jxcn01
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270027#comment-13270027 ]

Phabricator commented on HBASE-5104:
------------------------------------

mbautin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

All unit tests pass, except TestCoprocessorEndpoint, which fails without this patch too. Could someone take another look and accept? Thanks!

REVISION DETAIL
  https://reviews.facebook.net/D2799
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13269115#comment-13269115 ]

Phabricator commented on HBASE-5104:
------------------------------------

mbautin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

Michael, Jimmy: thanks for reviewing! See my responses inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:386 Done.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:387 Done.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:931 Done.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:932 Done.
  src/main/java/org/apache/hadoop/hbase/client/Scan.java:638 Done.
  src/main/java/org/apache/hadoop/hbase/client/Get.java:471 Done.
  src/main/protobuf/Client.proto:49 Done.
  src/main/protobuf/Client.proto:50 Done.
  src/main/protobuf/Client.proto:199 Done.
  src/main/protobuf/Client.proto:200 Done.
  src/test/java/org/apache/hadoop/hbase/HTestConst.java:18 This is not a test, this is a collection of constants used in tests. I tried to save some typing, because the intended usage pattern is HTestConst.DEFAULT_{TABLE,CF,ROW,etc}... However, if you feel strongly about it, I can rename it to HTestConstants.
  src/test/java/org/apache/hadoop/hbase/client/TestIntraRowPagination.java:60 Added region.close(). I am assuming that takes care of closing the HLog (correct me if I'm wrong).
  src/main/java/org/apache/hadoop/hbase/client/Get.java:212 Yes, this offset is only within a particular (row, CF) combination. It gets reset back to zero when we move to the next row/CF. Added this to javadoc.
  src/main/java/org/apache/hadoop/hbase/client/Result.java:177 Got rid of this method.

REVISION DETAIL
  https://reviews.facebook.net/D2799
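The reply on Get.java:212 above says the offset applies within a single (row, CF) combination and resets to zero at the next row/CF. The following is only an illustrative sketch of that behavior, not the code from the patch: the class name StoreOffsetSketch, the method page(), and the in-memory map standing in for one row are all hypothetical, and a negative limit is assumed to mean "no limit".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not an HBase class.
public class StoreOffsetSketch {

    /**
     * Applies the same offset and limit to every column family of one row.
     * The row is modeled as: column family name -> sorted column qualifiers.
     * Assumption: limit < 0 means "no limit".
     */
    public static List<String> page(Map<String, List<String>> row, int offset, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> cf : row.entrySet()) {
            int skipped = 0;   // counters restart at zero for each column family
            int returned = 0;
            for (String qualifier : cf.getValue()) {
                if (skipped < offset) {
                    skipped++;             // still inside the offset window
                    continue;
                }
                if (limit >= 0 && returned >= limit) {
                    break;                 // this family's page is full
                }
                out.add(cf.getKey() + ":" + qualifier);
                returned++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("cf1", Arrays.asList("a", "b", "c", "d"));
        row.put("cf2", Arrays.asList("x", "y", "z"));
        // offset 1, limit 2 is applied to cf1 and then again, from zero, to cf2
        System.out.println(page(row, 1, 2)); // prints [cf1:b, cf1:c, cf2:y, cf2:z]
    }
}
```

Because the counters restart per family, a request with offset 1 and limit 2 can return up to 2 columns from each family rather than 2 columns for the whole row.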
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13269122#comment-13269122 ]

Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12525758/D2799.5.patch
  against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1782//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1782//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1782//console

This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266246#comment-13266246 ]

Phabricator commented on HBASE-5104:
------------------------------------

stack has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

Good stuff.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/Get.java:212 How does this work? Do we have to run through the row by column family? Do we set this offset back to zero when we move to a new column family on a row?
  src/main/java/org/apache/hadoop/hbase/client/Result.java:177 Why do we need this? Isn't the Result sorted already? If not, it's a bug.
  src/test/java/org/apache/hadoop/hbase/HTestConst.java:18 Why is this test not called HTestConstants.java?
  src/test/java/org/apache/hadoop/hbase/client/TestIntraRowPagination.java:60 Should close the region when done and close out its hlog.

REVISION DETAIL
  https://reviews.facebook.net/D2799
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266252#comment-13266252 ]

Zhihong Yu commented on HBASE-5104:
-----------------------------------

I got the same error reported by Hadoop QA when trying to apply the patch:
{code}
patch: malformed patch at line 285: Index: src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
{code}
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261939#comment-13261939 ]

Phabricator commented on HBASE-5104:
------------------------------------

jxcn01 has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism.

Looks good, just some minor things.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:386 Can we set it only if scan.getMaxResultsPerColumnFamily() >= 0?
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:387 Can we set it only if the offset is > 0?
  src/main/java/org/apache/hadoop/hbase/client/Scan.java:638 Can we check: this.storeOffset < 0 || this.storeLimit < -1? I assume the offset should be positive, and the store limit is non-negative. The other choice is to add some checking in the corresponding set methods.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:931 ditto
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:932 ditto
  src/main/protobuf/Client.proto:49 uint32 should be better, with no default. If it is not set, then it is -1.
  src/main/protobuf/Client.proto:50 uint32 is preferred, with no default. If it is not set, then it is 0.
  src/main/protobuf/Client.proto:199 ditto
  src/main/protobuf/Client.proto:200 ditto
  src/main/java/org/apache/hadoop/hbase/client/Get.java:471 ditto

REVISION DETAIL
  https://reviews.facebook.net/D2799
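The review above asks for range checks on the store offset and store limit, and for the protobuf fields to be populated only when a value was actually set. The sketch below shows one possible reading of those suggestions; it is not the code from the patch. The class StorePagingOptions and its methods are hypothetical, and it assumes, following the Client.proto comments, that -1 means "limit not set" and 0 means "offset not set".

```java
// Hypothetical illustration only; this is not an HBase class.
public class StorePagingOptions {
    private int storeLimit = -1;   // assumption: -1 means "not set" (no limit)
    private int storeOffset = 0;   // assumption: 0 means "not set" (start at first column)

    /** Rejects limits below -1; -1 itself is kept as the "no limit" marker. */
    public StorePagingOptions setStoreLimit(int limit) {
        if (limit < -1) {
            throw new IllegalArgumentException("store limit must be >= -1, got " + limit);
        }
        this.storeLimit = limit;
        return this;
    }

    /** Rejects negative offsets, doing the check in the setter as the review suggests. */
    public StorePagingOptions setStoreOffset(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("store offset must be >= 0, got " + offset);
        }
        this.storeOffset = offset;
        return this;
    }

    /** True when a request converter would need to write the optional limit field. */
    public boolean shouldSerializeLimit() {
        return storeLimit >= 0;
    }

    /** True when a request converter would need to write the optional offset field. */
    public boolean shouldSerializeOffset() {
        return storeOffset > 0;
    }

    public static void main(String[] args) {
        StorePagingOptions unset = new StorePagingOptions();
        System.out.println(unset.shouldSerializeLimit());   // prints false
        System.out.println(unset.shouldSerializeOffset());  // prints false

        StorePagingOptions page = new StorePagingOptions().setStoreLimit(5).setStoreOffset(5);
        System.out.println(page.shouldSerializeLimit());    // prints true
        System.out.println(page.shouldSerializeOffset());   // prints true
    }
}
```

Validating in the setters keeps invalid values out of the object entirely, so the converter only has to decide whether a field differs from its "not set" value.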
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261941#comment-13261941 ]

Jimmy Xiang commented on HBASE-5104:
------------------------------------

I commented on phabricator. Looks good to me, just some minor things.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261168#comment-13261168 ] Hadoop QA commented on HBASE-5104: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524087/D2799.4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1635//console This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261198#comment-13261198 ] Phabricator commented on HBASE-5104: mbautin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism. All unit tests have passed. Could someone familiar with the protobuf stuff in trunk please take a look and accept? REVISION DETAIL https://reviews.facebook.net/D2799
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259787#comment-13259787 ] Phabricator commented on HBASE-5104: madhuvaidya has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism. LGTM (at least all the non-protocol buffer related stuff). REVISION DETAIL https://reviews.facebook.net/D2799
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258732#comment-13258732 ] Phabricator commented on HBASE-5104: mbautin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism. Ping. REVISION DETAIL https://reviews.facebook.net/D2799
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258742#comment-13258742 ] Zhihong Yu commented on HBASE-5104: The previous Hadoop QA run stumbled over TestLoadIncrementalHFilesSplitRecovery. Please resubmit the patch for QA.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254820#comment-13254820 ] Zhihong Yu commented on HBASE-5104: Patch didn't apply cleanly: {code} /usr/bin/patch: malformed patch at line 285: Index: src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java {code}
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255008#comment-13255008 ] Hadoop QA commented on HBASE-5104: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522843/jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 11 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1541//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1541//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1541//console This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254498#comment-13254498 ] Phabricator commented on HBASE-5104: mbautin has commented on the revision [jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/client/Get.java:470 Removed. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:395 Done. src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide2.java:38 Done. REVISION DETAIL https://reviews.facebook.net/D2799
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254501#comment-13254501 ] Hadoop QA commented on HBASE-5104: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522735/D2799.3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1535//console This message is automatically generated.
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253971#comment-13253971 ]

Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12522655/D2799.1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1522//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1522//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1522//console

This message is automatically generated.

Attachments: D2799.1.patch, D2799.2.patch, testFilterList.rb
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253979#comment-13253979 ]

Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12522665/D2799.2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1523//console

This message is automatically generated.

Attachments: D2799.1.patch, D2799.2.patch, testFilterList.rb
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253981#comment-13253981 ]

Phabricator commented on HBASE-5104:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism".

Nice work.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/Get.java:470
    The assignment isn't needed here, right?
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:395
    rowOffset - storeOffset
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide2.java:38
    Add test category, please.

REVISION DETAIL
  https://reviews.facebook.net/D2799

Attachments: D2799.1.patch, D2799.2.patch, testFilterList.rb
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190892#comment-13190892 ]

Lars Hofhansl commented on HBASE-5104:
--------------------------------------

After spending some time thinking about HBASE-5229, I think this use case can be addressed by:

(1) Allowing ColumnPaginationFilter to wrap another filter (similar to WhileMatchFilter), and
(2) Allowing a filter to be optionally evaluated after versions have been handled.

For #2, either each filter could carry a flag, or we could have another filter wrapper that indicates evaluation after versioning.

I realize I sound like a broken record, but that would handle a more general set of use cases (including this one, but correct me if I am wrong, Kannan), and would also avoid adding a special-case API to the scanning API.

Attachments: testFilterList.rb
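The idea in the comment above (a pagination step that wraps another filter and only counts cells after version handling) can be sketched in plain Java. This is an illustrative model only, not HBase code: `WrappedPagination`, `Cell`, and `page` are hypothetical names, the input list is assumed to be sorted by column with the newest version first, and "version handling" is simplified to keeping the newest version of each column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of a pagination wrapper; not part of the HBase API. */
class WrappedPagination {

    /** A cell reduced to its column qualifier and version timestamp. */
    static final class Cell {
        final String column;
        final long timestamp;

        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return column + "@" + timestamp;
        }
    }

    /**
     * Returns at most {@code limit} cells, skipping the first {@code offset}
     * cells that are the newest version of their column AND pass {@code inner}.
     * Assumes cells are sorted by column, newest version first.
     */
    static List<Cell> page(List<Cell> sortedCells, Predicate<Cell> inner, int limit, int offset) {
        List<Cell> result = new ArrayList<>();
        String previousColumn = null;
        int accepted = 0;
        for (Cell cell : sortedCells) {
            // Simplified version handling: only the first cell of a column counts.
            boolean olderVersion = cell.column.equals(previousColumn);
            previousColumn = cell.column;
            if (olderVersion || !inner.test(cell)) {
                continue; // rejected cells never advance the pagination counter
            }
            if (accepted++ < offset) {
                continue; // still inside the skipped prefix
            }
            if (result.size() == limit) {
                break; // page is full
            }
            result.add(cell);
        }
        return result;
    }

    public static void main(String[] args) {
        // Index entries from the original report, with two versions per tag1 column.
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("tag0:001:thread1", 2));
        for (int i = 1; i <= 10; i++) {
            String column = String.format("tag1:%03d:thread%d", i, i);
            row.add(new Cell(column, 2)); // newest version
            row.add(new Cell(column, 1)); // older version
        }
        row.add(new Cell("tag2:001:thread1", 2));

        List<Cell> result = page(row, c -> c.column.startsWith("tag1"), 5, 5);
        System.out.println(result);
    }
}
```

Because the counter advances only for cells that survive both version handling and the wrapped predicate, the result no longer depends on where the pagination step sits relative to other filters, which is the ordering problem described in the issue.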
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181546#comment-13181546 ]

Kannan Muthukkaruppan commented on HBASE-5104:
----------------------------------------------

@Lars/Stack: In 89-fb, we had done this for adding a reliable limit mechanism (this works per-CF/per-row). Madhu had implemented this. The rev is here: http://svn.apache.org/viewvc?view=revision&revision=1181562. [I don't think this is ported to trunk yet.] We were thinking of extending/doing something similar for offset.

Lars: The startColumn type of approach doesn't work for cases for example when you are using a ColumnValueFilter instead of filter based on column names. [See my previous post.]

Already, when we specify attributes such as timerange() or add a CF or specific column names, it applies to each row. So one way to think of this is that limit/offset are also applicable within each row the Scan encounters. Most folks are going to use it for Get (single row scans), but there is no need to preclude the functionality from a multi-row Scan either.

This is the API that was added in 89-fb:

{code}
/**
 * Set the maximum number of values to return per row per Column Family
 * @param limit the maximum number of values returned / row / CF
 */
public void setMaxResultsPerColumnFamily(int limit)
{code}

The thought was we could add something like:

{code}
/**
 * Skip offset number of values to return per row per Column Family
 * @param offset number of values to be skipped per row / CF
 */
public void setOffsetPerColumnFamily(int offset)
{code}

Attachments: testFilterList.rb
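To make the per-row, per-column-family semantics described in this comment concrete, here is a minimal, self-contained Java sketch. It does not use the HBase client API: `PerFamilyPagination` and `paginate` are hypothetical names, and the nested map is an assumed stand-in for the cells a scan would produce after filters and version handling have already been applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: offset/limit applied independently to each (row, column family). */
class PerFamilyPagination {

    /**
     * @param rows   row key -> column family -> cells in sorted column order
     * @param offset number of cells to skip per row per column family (>= 0)
     * @param limit  maximum number of cells to return per row per column family (>= 0)
     */
    static Map<String, Map<String, List<String>>> paginate(
            Map<String, Map<String, List<String>>> rows, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        Map<String, Map<String, List<String>>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, List<String>>> row : rows.entrySet()) {
            Map<String, List<String>> families = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> cf : row.getValue().entrySet()) {
                List<String> cells = cf.getValue();
                // The window restarts for every row and every column family.
                int from = Math.min(offset, cells.size());
                int to = (int) Math.min((long) from + limit, cells.size());
                families.put(cf.getKey(), new ArrayList<>(cells.subList(from, to)));
            }
            out.put(row.getKey(), families);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> r1 = new LinkedHashMap<>();
        r1.put("index", Arrays.asList("a", "b", "c", "d"));
        Map<String, List<String>> r2 = new LinkedHashMap<>();
        r2.put("index", Arrays.asList("x"));

        Map<String, Map<String, List<String>>> rows = new LinkedHashMap<>();
        rows.put("r1", r1);
        rows.put("r2", r2);

        // Skip one cell and return at most two, per row and per column family.
        System.out.println(paginate(rows, 1, 2));
    }
}
```

In this model a single-row Get is simply the case where the outer map has one entry, which matches the observation that limit/offset can apply within each row a Scan encounters.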
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181020#comment-13181020 ]

Kannan Muthukkaruppan commented on HBASE-5104:
----------------------------------------------

> I just had a discussion today about how it would be nice if one could start a scanner at a certain column prefix within a certain row and also set a stop column prefix within a row. (i.e. not using a filter).

Why would you not want to use a filter for this case? ColumnRangeFilter() handles this case nicely, correct?

> Alternatively, there was some discussion about startRow/stopRow also allowing to specify a CF/column. Would that work here? It would allow precise placement of a scan and might be a relatively simple change with more general applicability.

This may not work for many cases. How do I, for instance, say "get me the next 5 KVs in a particular row whose value is X"? (Note: here the filter is on column value rather than column name; assume you are using the SingleColumnValueFilter().)

I think limit/offset is a simple, well-understood concept that we should support in a clean way. The Scan/Get API seems like a good place to do it. What is the concern with adding the capability there?

Attachments: testFilterList.rb
[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181053#comment-13181053 ]

Lars Hofhansl commented on HBASE-5104:
--------------------------------------

It seems you have a very specific use case. A column-based limit/offset API layered on an API that is inherently row based (scanner.next) will be hard for users to understand.

The problem here seems to be that scanner.startRow and scanner.next do not provide enough granularity.

I'm not opposed to limit/offset (but I will be interested to see how you will document that API, to make it understandable to users :) ).

What about a nextColumn method on scanner along with a startColumn method?

Anyway... I just want to make sure we do not add API for specific cases, and I'll shut up about it now.

Attachments: testFilterList.rb
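For comparison, the startColumn/nextColumn idea floated in this comment could look roughly like the cursor below. This is a sketch under stated assumptions, not an existing HBase scanner API: `ColumnCursor`, its constructor argument, and `nextColumn` are hypothetical, and a `TreeMap` of column name to value stands in for one sorted row.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical column-level cursor over a single sorted row. */
class ColumnCursor {
    private final Iterator<Map.Entry<String, String>> remaining;

    /** Positions the cursor at the first column >= startColumn. */
    ColumnCursor(NavigableMap<String, String> row, String startColumn) {
        this.remaining = row.tailMap(startColumn, true).entrySet().iterator();
    }

    /** Returns the next column and its value, or null when the row is exhausted. */
    Map.Entry<String, String> nextColumn() {
        return remaining.hasNext() ? remaining.next() : null;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("tag0:001:thread1", "v0");
        row.put("tag1:001:thread1", "v1");
        row.put("tag1:002:thread2", "v2");
        row.put("tag2:001:thread1", "v3");

        // Start at the first column with prefix "tag1" and read two columns.
        ColumnCursor cursor = new ColumnCursor(row, "tag1");
        System.out.println(cursor.nextColumn().getKey());
        System.out.println(cursor.nextColumn().getKey());
    }
}
```

A cursor like this positions by column name only. As noted elsewhere in the thread, that differs from limit/offset when the selection is driven by a value-based filter, because the caller cannot name the column at which the Nth matching cell begins.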