[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1370#comment-1370 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

Sorry for the delay. I did a local test with production data and it looks fine.

FilterList getNextKeyHint skips rows that should be included in the results
---------------------------------------------------------------------------

Key: HBASE-9079
URL: https://issues.apache.org/jira/browse/HBASE-9079
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
Assignee: Viral Bajaria
Fix For: 0.98.0, 0.95.2, 0.94.11
Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, HBASE-9079-0.94.patch, HBASE-9079-trunk.patch

I hit a weird issue/bug and am able to reproduce the error consistently. The problem arises when a FilterList has two filters that each implement the getNextKeyHint method.

The current implementation works as follows: StoreScanner calls matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in turn calls filter.getNextKeyHint(), where the filter at this stage is the FilterList. The implementation in FilterList iterates through all the filters and keeps the max KeyValue that it sees. All is fine if only one of the filters wrapped in the FilterList implements getNextKeyHint, but if several of them implement it, things get weird.

For example:
- Create two filters: one is a FuzzyRowFilter and the second is a ColumnRangeFilter. Both of them implement getNextKeyHint.
- Wrap them in a FilterList with MUST_PASS_ALL.
- FuzzyRowFilter will seek to the correct first row and then pass it to ColumnRangeFilter, which will return the SEEK_NEXT_USING_HINT code.
- Now, when getNextKeyHint is called on the FilterList, it calls the one on FuzzyRowFilter first, which basically says what the next row should be, while in reality we want the ColumnRangeFilter to give the seek hint.
- The above behavior skips data that should be returned, which I have verified by using a RowFilter with RegexStringComparator.

I updated the FilterList to maintain state on which filter returns the SEEK_NEXT_USING_HINT, and in getNextKeyHint I invoke the method on the saved filter and reset that state. I tested it with my current queries and it works fine, but I need to run the entire test suite to make sure I have not introduced any regression. In addition to that, I need to figure out what the behavior should be when the operation is MUST_PASS_ONE, but I doubt it should be any different.

Is my understanding of it being a bug correct? Or am I trivializing it and ignoring something very important? If it's tough to wrap your head around the explanation, then I can open a JIRA and upload a patch against 0.94 head.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
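The difference between the two hint strategies described in the report can be sketched in plain Java. This is only an illustrative model, not HBase code: the HintSketch class, the HintFilter interface, the fixed() helper, and the use of plain strings such as "row1/colA" in place of KeyValues are all assumptions made for the example, and the patch that was actually committed may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for the filter contract; NOT the real HBase API. */
final class HintSketch {
    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    /** Hypothetical filter; keys are plain strings instead of KeyValues. */
    interface HintFilter {
        ReturnCode filterKeyValue(String key);
        String getNextKeyHint(String key);
    }

    /** Hypothetical test double: always the same return code and the same hint. */
    static HintFilter fixed(final ReturnCode code, final String hint) {
        return new HintFilter() {
            public ReturnCode filterKeyValue(String key) { return code; }
            public String getNextKeyHint(String key) { return hint; }
        };
    }

    /** Behavior described in the report: ask every filter, keep the largest hint. */
    static String maxHint(List<HintFilter> filters, String key) {
        String max = null;
        for (HintFilter f : filters) {
            String hint = f.getNextKeyHint(key);
            if (hint != null && (max == null || hint.compareTo(max) > 0)) {
                max = hint;
            }
        }
        return max;
    }

    /** Approach from the report: remember which filter asked for the seek. */
    static final class StatefulList {
        private final List<HintFilter> filters;
        private HintFilter seekHintFilter; // filter that returned SEEK_NEXT_USING_HINT

        StatefulList(List<HintFilter> filters) { this.filters = filters; }

        ReturnCode filterKeyValue(String key) {
            for (HintFilter f : filters) {
                if (f.filterKeyValue(key) == ReturnCode.SEEK_NEXT_USING_HINT) {
                    seekHintFilter = f; // save state for getNextKeyHint
                    return ReturnCode.SEEK_NEXT_USING_HINT;
                }
            }
            return ReturnCode.INCLUDE;
        }

        String getNextKeyHint(String key) {
            HintFilter f = seekHintFilter;
            seekHintFilter = null; // reset the saved state after use
            return f == null ? null : f.getNextKeyHint(key);
        }
    }

    public static void main(String[] args) {
        // Row-level filter: accepts the current row, its hint points at the next row.
        HintFilter row = fixed(ReturnCode.INCLUDE, "row2/");
        // Column-level filter: wants to seek forward within the current row.
        HintFilter col = fixed(ReturnCode.SEEK_NEXT_USING_HINT, "row1/colM");
        List<HintFilter> filters = Arrays.asList(row, col);

        System.out.println(maxHint(filters, "row1/colA")); // prints row2/

        StatefulList list = new StatefulList(filters);
        list.filterKeyValue("row1/colA");
        System.out.println(list.getNextKeyHint("row1/colA")); // prints row1/colM
    }
}
```

In this model the max-based hint jumps straight to row2 and so passes over the remaining columns of row1, which mirrors the skipped data described above, while the stateful variant seeks only as far as the filter that requested the seek asked for.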
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733946#comment-13733946 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

Thanks for all the help [~ted_yu] and [~lhofhansl] to get this patch cleaned up and integrated! Look forward to contributing more. You guys rock!
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734075#comment-13734075 ]

Hudson commented on HBASE-9079:
-------------------------------

SUCCESS: Integrated in HBase-0.94-security #249 (See [https://builds.apache.org/job/HBase-0.94-security/249/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512021)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734086#comment-13734086 ]

Hudson commented on HBASE-9079:
-------------------------------

SUCCESS: Integrated in HBase-0.94 #1100 (See [https://builds.apache.org/job/HBase-0.94/1100/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512021)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734181#comment-13734181 ]

Hudson commented on HBASE-9079:
-------------------------------

FAILURE: Integrated in HBase-TRUNK #4357 (See [https://builds.apache.org/job/HBase-TRUNK/4357/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512018)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734195#comment-13734195 ]

Hudson commented on HBASE-9079:
-------------------------------

FAILURE: Integrated in hbase-0.95 #418 (See [https://builds.apache.org/job/hbase-0.95/418/])
HBASE-9079 Missed new test (larsh: rev 1512020)
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512019)
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734293#comment-13734293 ]

Hudson commented on HBASE-9079:
-------------------------------

FAILURE: Integrated in hbase-0.95-on-hadoop2 #225 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/225/])
HBASE-9079 Missed new test (larsh: rev 1512020)
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512019)
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734310#comment-13734310 ]

Hudson commented on HBASE-9079:
-------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #658 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/658/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512018)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732119#comment-13732119 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

Did you get a chance to test this with real data, [~viralbajaria]?
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732627#comment-13732627 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

Not yet, got caught in something else. Will get to it before EOD (PST) for sure.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733117#comment-13733117 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

ping :)
It's also fine to push into next month's 0.94.12.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731116#comment-13731116 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

So thinking about this again. Why can't we take the largest of the seek hints when we *and* all the filters together in a FilterList?
In your case, if you add the filters to the list in a different order, will your patch still work?

Is the actual problem here that a Filter returns a KV from getNextKeyHint even when it would not have returned SEEK_NEXT_USING_HINT?
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731130#comment-13731130 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

If we take the largest seekHint, the problem can be explained as follows:
- FilterList with FuzzyRow + ColumnRange with MUST_PASS_ALL.
- FuzzyRow includes the row and says move on to ColumnRange.
- ColumnRange says the first column is not a match and that it can give a better seekHint.
- FilterList calls getNextKeyHint on both FuzzyRow and ColumnRange. FuzzyRow returns the next row that we should use, while ColumnRange returns the next column from the originally selected row. If we keep the max here, then we move on to the next row and do not return the columns from the row that should have been returned.

The test case proves that this is what happened originally (though I removed the TestFail.patch due to some Hadoop QA issues).

Yes, the current changes work even if you change the ordering of the filters. The test in the patch verifies that behavior too.
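
The skip described in this comment can be reproduced in miniature. The following is a minimal sketch and not HBase code: plain strings of the form row/column stand in for KeyValue and are compared lexicographically, and the class name, the method name, and the two sample hints are all hypothetical, chosen only to mirror the FuzzyRow + ColumnRange example.

```java
import java.util.Arrays;
import java.util.List;

final class MaxHintDemo {
    // Models the pre-patch MUST_PASS_ALL behavior as described in this thread:
    // ask every filter for a hint and keep the largest one.
    static String maxHint(List<String> hints) {
        String best = null;
        for (String h : hints) {
            if (h != null && (best == null || best.compareTo(h) < 0)) {
                best = h;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical hints while the scanner sits on the first column of row1.
        String fuzzyRowHint = "row3/";     // next row the fuzzy pattern could match
        String columnRangeHint = "row1/c"; // first in-range column of the current row
        String chosen = maxHint(Arrays.asList(fuzzyRowHint, columnRangeHint));
        // The larger hint wins, so the in-range columns of row1 are never visited.
        System.out.println("seek target: " + chosen);
    }
}
```

Because the row-level hint sorts after every column of the current row, taking the maximum moves the scanner past the columns that ColumnRange was about to point at, regardless of the order in which the two hints are supplied.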
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731276#comment-13731276 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

I see. But what if you hit a scenario where the FuzzyRowFilter would also return SEEK_NEXT_USING_HINT and it happens to be first in the filter list? In that case you'd still seek past columns that should be included.

So the problem is that SEEK_NEXT_USING_HINT is not transitive to the following columns/rows.
It seems the only 100% safe way to do this in a FilterList is to treat any seek optimization as a simple SKIP.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731387#comment-13731387 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

The current patch only calls the filter that returned SEEK_NEXT_USING_HINT; we don't go through all the filters in the FilterList if the operator is MUST_PASS_ALL.

For MUST_PASS_ONE, the logic is to select the minimum of the hints. Since we keep the minimum, we will not skip rows/columns even if one of the filters suggests jumping over them.
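
The two rules in this comment can be sketched side by side. This is an illustration under stated assumptions rather than the patch itself: strings stand in for KeyValue, `selectHint` and `seekHintIndex` are hypothetical names, and the "no hint if any filter has none" rule for MUST_PASS_ONE is taken from the comment in the existing FilterList code quoted elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.List;

final class HintSelectionSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // hints.get(i) is the hint filter i would return (null means "no hint").
    // seekHintIndex is the position of the filter that returned
    // SEEK_NEXT_USING_HINT; it is only used for MUST_PASS_ALL.
    static String selectHint(Operator op, List<String> hints, int seekHintIndex) {
        if (op == Operator.MUST_PASS_ALL) {
            // Only the filter that asked for the seek is consulted.
            return hints.get(seekHintIndex);
        }
        String min = null;
        for (String h : hints) {
            if (h == null) {
                return null; // one filter has no hint, so no safe seek target exists
            }
            if (min == null || min.compareTo(h) > 0) {
                min = h; // keep the smallest hint so nothing any filter wants is skipped
            }
        }
        return min;
    }

    public static void main(String[] args) {
        List<String> hints = Arrays.asList("row3/", "row1/c");
        System.out.println(selectHint(Operator.MUST_PASS_ALL, hints, 1));
        System.out.println(selectHint(Operator.MUST_PASS_ONE, hints, -1));
    }
}
```

In this toy setup both operators end up seeking to the column-level hint, which is the outcome the comment argues for: neither path jumps to the next row while the current row still has candidate columns.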
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731447#comment-13731447 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

From your example the problem is (1) FuzzyRow includes the row and says move on to ColumnRange, paired with (2) FuzzyRow returns the next row that we should use.

Even though the FuzzyRowFilter said we should include the row, the call to getNextKeyHint() returns a non-null KV. So from that angle we should only consult the filters that we actually called, which your patch does correctly.

Now, what if FuzzyRowFilter had been the one that returned SEEK_NEXT_USING_HINT? Then that filter would be the one to provide the getNextKeyHint as well, and might skip right past the KV you want for the ColumnRangeFilter. You are saying that in that case FuzzyRowFilter did not INCLUDE it, and thus it would be correct to use its getNextKeyHint? And because we're short-circuiting the *and* when we encounter SEEK_NEXT_USING_HINT, we can safely jump to that KV.

OK. Seems that is correct. Just making sure...

In that case I only have one further comment: getNextKeyHint on FilterList is only called when SEEK_NEXT_USING_HINT is returned. If this FilterList is MUST_PASS_ALL, the seekHintFilter must not be null, correct?
So we could simplify like this:
{code}
@@ -332,9 +337,15 @@
   @Override
   public KeyValue getNextKeyHint(KeyValue currentKV) {
     KeyValue keyHint = null;
+    if (operator == Operator.MUST_PASS_ALL) {
+      keyHint = seekHintFilter.getNextKeyHint(currentKV);
+      return keyHint;
+    }
+
     for (Filter filter : filters) {
       KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
-      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
+      if (curKeyHint == null) {
         // If we ever don't have a hint and this is must-pass-one, then no hint
         return null;
       }
@@ -344,13 +355,7 @@
           keyHint = curKeyHint;
           continue;
         }
-        // There is an existing hint
-        if (operator == Operator.MUST_PASS_ALL &&
-            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) < 0) {
-          // If all conditions must pass, we can keep the max hint
-          keyHint = curKeyHint;
-        } else if (operator == Operator.MUST_PASS_ONE &&
-            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
+        if (KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
           // If any condition can pass, we need to keep the min hint
           keyHint = curKeyHint;
         }
{code}

And then also reset the seekHintFilter in the reset() method.
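
The state handling discussed in this comment (remember which filter asked for the seek, short-circuit the *and*, clear the state in reset()) might look roughly like the sketch below. It is a toy model, not the committed patch: `MiniFilter`, `MustPassAllList`, and `fixed` are hypothetical names, strings stand in for KeyValue, and the null guard in getNextKeyHint is an added assumption that the proposed diff does not include.

```java
import java.util.Arrays;
import java.util.List;

final class SeekHintStateSketch {
    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    // Hypothetical, cut-down stand-in for the two Filter methods involved.
    interface MiniFilter {
        ReturnCode filterKeyValue(String kv);
        String getNextKeyHint(String kv);
    }

    static final class MustPassAllList {
        private final List<MiniFilter> filters;
        private MiniFilter seekHintFilter; // filter that last asked for a seek, or null

        MustPassAllList(List<MiniFilter> filters) {
            this.filters = filters;
        }

        ReturnCode filterKeyValue(String kv) {
            for (MiniFilter f : filters) {
                if (f.filterKeyValue(kv) == ReturnCode.SEEK_NEXT_USING_HINT) {
                    seekHintFilter = f; // short-circuit: later filters are not consulted
                    return ReturnCode.SEEK_NEXT_USING_HINT;
                }
            }
            return ReturnCode.INCLUDE;
        }

        String getNextKeyHint(String kv) {
            // Null guard is an assumption of this sketch, not part of the diff above.
            return seekHintFilter == null ? null : seekHintFilter.getNextKeyHint(kv);
        }

        void reset() {
            seekHintFilter = null;
        }
    }

    // Test helper: a filter that always answers with the same code and hint.
    static MiniFilter fixed(ReturnCode code, String hint) {
        return new MiniFilter() {
            @Override public ReturnCode filterKeyValue(String kv) { return code; }
            @Override public String getNextKeyHint(String kv) { return hint; }
        };
    }

    public static void main(String[] args) {
        MustPassAllList list = new MustPassAllList(Arrays.asList(
            fixed(ReturnCode.INCLUDE, "row3/"),                // plays the FuzzyRow role
            fixed(ReturnCode.SEEK_NEXT_USING_HINT, "row1/c"))); // plays the ColumnRange role
        System.out.println(list.filterKeyValue("row1/a"));
        System.out.println(list.getNextKeyHint("row1/a"));
    }
}
```

The first filter's non-null hint is never read, which is the point made above: only a filter that actually returned SEEK_NEXT_USING_HINT gets to supply the seek target.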
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731490#comment-13731490 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

Then again this breaks TestFilterList.testHintPassThru. But the breaking part should be removed as it tests that MUST_PASS_ALL returns the larger of the key hints, which is no longer the case.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731571#comment-13731571 ]

Ted Yu commented on HBASE-9079:
-------------------------------

I ran Filter related tests and they passed.
{code}
     for (Filter filter : filters) {
       KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
-      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
+      if (curKeyHint == null) {
         // If we ever don't have a hint and this is must-pass-one, then no hint
{code}
nit: maybe lift the comment about must-pass-one before the for loop.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731575#comment-13731575 ]

Hadoop QA commented on HBASE-9079:
----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12596477/9079-trunk-v2.txt
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.

    {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//console

This message is automatically generated.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731612#comment-13731612 ] Lars Hofhansl commented on HBASE-9079: -- Will move the comment upon commit. So Ted and Viral, you both good with this?

FilterList getNextKeyHint skips rows that should be included in the results
---
Key: HBASE-9079
URL: https://issues.apache.org/jira/browse/HBASE-9079
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
Fix For: 0.98.0, 0.95.2, 0.94.11
Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, HBASE-9079-0.94.patch, HBASE-9079-trunk.patch

I hit a weird issue/bug and am able to reproduce the error consistently. The problem arises when FilterList has two filters where each implements the getNextKeyHint method.

The way the current implementation works is, StoreScanner will call matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in turn will call filter.getNextKeyHint(), which at this stage is of type FilterList. The implementation in FilterList iterates through all the filters and keeps the max KeyValue that it sees. All is fine if you wrap filters in a FilterList in which only one of them implements getNextKeyHint, but if several of them implement it, that's where things get weird.

For example:
- Create two filters: one is FuzzyRowFilter and the second is ColumnRangeFilter. Both of them implement getNextKeyHint.
- Wrap them in a FilterList with MUST_PASS_ALL.
- FuzzyRowFilter will seek to the correct first row and then pass it to ColumnRangeFilter, which will return the SEEK_NEXT_USING_HINT code.
- Now in FilterList, when getNextKeyHint is called, it calls the one on FuzzyRowFilter first, which basically says what the next row should be, while in reality we want the ColumnRangeFilter to give the seek hint.
- The above behavior skips data that should be returned, which I have verified by using a RowFilter with RegexStringComparator.

I updated the FilterList to maintain state on which filter returns the SEEK_NEXT_USING_HINT, and in getNextKeyHint I invoke the method on the saved filter and reset that state. I tested it with my current queries and it works fine, but I need to run the entire test suite to make sure I have not introduced any regression. In addition to that, I need to figure out what the behavior should be when the operation is MUST_PASS_ONE, but I doubt it should be any different.

Is my understanding of it being a bug correct? Or am I trivializing it and ignoring something very important? If it's tough to wrap your head around the explanation, then I can open a JIRA and upload a patch against 0.94 head.
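For readers who want to see the reported behavior in isolation, here is a minimal sketch in plain Java. It is a toy model, not the HBase code or the attached patch: keys are "row/column" strings compared lexicographically, and ToyFilter, ToyRowFilter, ToyMinColumnFilter, ToyFilterList, maxHint and demoFilters are all hypothetical names invented for illustration. It only contrasts the two strategies described in the issue: taking the max hint over all filters versus asking the filter that actually returned SEEK_NEXT_USING_HINT.

```java
import java.util.Arrays;
import java.util.List;

final class SeekHintSketch {
    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    /** Hypothetical stand-in for a filter; keys look like "rowA/c1". */
    interface ToyFilter {
        ReturnCode filterKeyValue(String key);
        String getNextKeyHint(String key);
    }

    static String rowOf(String key) { return key.substring(0, key.indexOf('/')); }
    static String columnOf(String key) { return key.substring(key.indexOf('/') + 1); }

    /** Row-oriented filter: wants rowA and rowC; its hint names the next row it wants. */
    static final class ToyRowFilter implements ToyFilter {
        public ReturnCode filterKeyValue(String key) {
            String row = rowOf(key);
            return (row.equals("rowA") || row.equals("rowC"))
                ? ReturnCode.INCLUDE : ReturnCode.SEEK_NEXT_USING_HINT;
        }
        public String getNextKeyHint(String key) {
            return rowOf(key).compareTo("rowA") < 0 ? "rowA/" : "rowC/";
        }
    }

    /** Column-oriented filter: wants columns >= minColumn; its hint stays in the same row. */
    static final class ToyMinColumnFilter implements ToyFilter {
        private final String minColumn;
        ToyMinColumnFilter(String minColumn) { this.minColumn = minColumn; }
        public ReturnCode filterKeyValue(String key) {
            return columnOf(key).compareTo(minColumn) >= 0
                ? ReturnCode.INCLUDE : ReturnCode.SEEK_NEXT_USING_HINT;
        }
        public String getNextKeyHint(String key) {
            return rowOf(key) + "/" + minColumn;
        }
    }

    /** Strategy described as problematic: ask every filter and keep the largest hint. */
    static String maxHint(List<ToyFilter> filters, String key) {
        String best = null;
        for (ToyFilter f : filters) {
            String hint = f.getNextKeyHint(key);
            if (hint != null && (best == null || hint.compareTo(best) > 0)) {
                best = hint;
            }
        }
        return best;
    }

    /** Strategy described as the fix: remember which filter asked for the seek. */
    static final class ToyFilterList {
        private final List<ToyFilter> filters;
        private ToyFilter seekHintFilter; // state saved between the two calls

        ToyFilterList(List<ToyFilter> filters) { this.filters = filters; }

        // MUST_PASS_ALL-style evaluation: stop at the first filter that wants a seek.
        ReturnCode filterKeyValue(String key) {
            for (ToyFilter f : filters) {
                if (f.filterKeyValue(key) == ReturnCode.SEEK_NEXT_USING_HINT) {
                    seekHintFilter = f;
                    return ReturnCode.SEEK_NEXT_USING_HINT;
                }
            }
            return ReturnCode.INCLUDE;
        }

        // Delegate to the saved filter only, then reset the saved state.
        String getNextKeyHint(String key) {
            ToyFilter f = seekHintFilter;
            seekHintFilter = null;
            return f == null ? null : f.getNextKeyHint(key);
        }
    }

    static List<ToyFilter> demoFilters() {
        return Arrays.<ToyFilter>asList(new ToyRowFilter(), new ToyMinColumnFilter("c3"));
    }

    public static void main(String[] args) {
        String key = "rowA/c1"; // row is wanted, column is before the wanted range
        System.out.println("max-of-all-hints: " + maxHint(demoFilters(), key));
        ToyFilterList list = new ToyFilterList(demoFilters());
        list.filterKeyValue(key);
        System.out.println("remembered-filter hint: " + list.getNextKeyHint(key));
    }
}
```

In this toy setup the max-of-all-hints strategy seeks to "rowC/" and so jumps over "rowA/c3" and later columns of rowA, while the remembered-filter strategy seeks to "rowA/c3". Whether MUST_PASS_ONE needs different handling is left open here, as it is in the description.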
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731621#comment-13731621 ] Viral Bajaria commented on HBASE-9079: -- Looks good to me. I have applied the patch to my local repo and will test with real data in a bit. Will provide an update after that (hopefully before tomorrow).
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729738#comment-13729738 ] Viral Bajaria commented on HBASE-9079: -- [~lhofhansl] Can you review the patch when you get a chance ? I have already deployed this to my production cluster and have not had any issues.
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726775#comment-13726775 ] Viral Bajaria commented on HBASE-9079: -- (pressed enter too soon when attaching file... no easy way to edit a comment) I have uploaded a new patch for trunk after refreshing my workspace. I think the switch between branches wasn't clean for me when I did it the first time. The current patch should work fine on trunk too. I also cleaned up the TODO comment since there is no Configuration object anymore in FilterList. Also cleaned up the typo in the javadocs for areSerializedFieldsEqual()
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, TestFail.patch, TestSuccess.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726777#comment-13726777 ] Ted Yu commented on HBASE-9079: ---
There're a few long lines in test:
{code}
+ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter(Bytes.toBytes(cqStart), true, Bytes.toBytes(4), true);
{code}
[~lhofhansl]: What do you think of latest patch ?
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, TestFail.patch, TestSuccess.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727020#comment-13727020 ] Hadoop QA commented on HBASE-9079: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595494/HBASE-9079-0.94.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6561//console
This message is automatically generated.
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, TestFail.patch, TestSuccess.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727039#comment-13727039 ] Viral Bajaria commented on HBASE-9079: -- Removed the TestFail and TestSuccess patches which were only here to demonstrate what was breaking.
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727245#comment-13727245 ] Hadoop QA commented on HBASE-9079: --
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595522/HBASE-9079-trunk.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//console
This message is automatically generated.
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725684#comment-13725684 ] Viral Bajaria commented on HBASE-9079: -- Uploaded patch for both 0.94 and trunk. Interestingly 0.94 FilterList and trunk FilterList are not in sync. Is that expected ? I added the test to trunk too and tested it too.
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, TestFail.patch, TestSuccess.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725776#comment-13725776 ] Hadoop QA commented on HBASE-9079: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595268/HBASE-9079-0.94.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6548//console
This message is automatically generated.
Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, TestFail.patch, TestSuccess.patch
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725805#comment-13725805 ]

Ted Yu commented on HBASE-9079:
-------------------------------

@Viral:
The difference you saw in FilterList was due to HBASE-8847, which went into 0.94.10:
{code}
-  private KeyValue transformedKV = null;
{code}
Please refresh your workspace and reapply your changes while keeping HBASE-8847.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725987#comment-13725987 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

Is it not a good idea to work from the github.com branches? I was working off the latest 0.94 branch and did a pull again but don't see the changes that HBASE-8847 made to it. What am I missing?
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725991#comment-13725991 ]

Ted Yu commented on HBASE-9079:
-------------------------------

That's strange. I saw the changes from https://issues.apache.org/jira/secure/attachment/12592491/HBASE-8847.base%3D0.94.diff in the 0.94 branch.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725995#comment-13725995 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

Oh wait! I do see all those changes on the 0.94 branch, but I don't see them on trunk right now. That is why I said that FilterList on trunk is not in sync with the one on 0.94.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726025#comment-13726025 ]

Ted Yu commented on HBASE-9079:
-------------------------------

Here was the checkin for trunk:
{code}
r1499851 | tedyu | 2013-07-04 12:59:24 -0700 (Thu, 04 Jul 2013) | 3 lines

HBASE-8847 Filter.transform() always applies unconditionally, even when combined in a FilterList (Christophe Taton)
{code}
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723981#comment-13723981 ]

Ted Yu commented on HBASE-9079:
-------------------------------

For trunk, can we introduce the following method to Filter:
{code}
  public KeyHintType getKeyHintType() {
{code}
where KeyHintType is an enum that can carry NONE, ROW or COL.
FilterList can poll the Filters and reorder them.
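A rough sketch of what the proposal in the comment above might look like. Only the method name `getKeyHintType()` and the enum values NONE, ROW and COL come from the comment; the `HintTypedFilter` interface, the `FilterReorder` helper and the chosen order (ROW before COL before NONE) are assumptions made for illustration, since the comment does not say how the filters would be reordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Enum named in the proposal: the kind of seek hint a filter can give. */
enum KeyHintType { NONE, ROW, COL }

/** Hypothetical stand-in for the part of Filter that the proposal would add. */
interface HintTypedFilter {
  KeyHintType getKeyHintType();
}

/** Hypothetical helper showing how a FilterList could poll and reorder its filters. */
final class FilterReorder {
  private FilterReorder() {}

  /** Returns a reordered copy; the input list is left untouched. */
  static List<HintTypedFilter> reorder(List<HintTypedFilter> filters) {
    List<HintTypedFilter> sorted = new ArrayList<>(filters);
    // List.sort is stable, so filters of the same hint type keep the user's order.
    sorted.sort(Comparator.comparingInt(f -> rank(f.getKeyHintType())));
    return sorted;
  }

  // Assumed ranking: row-level hints first, then column-level, then no hint.
  private static int rank(KeyHintType type) {
    switch (type) {
      case ROW: return 0;
      case COL: return 1;
      default: return 2;
    }
  }
}
```

Because the sort is stable, this keeps the user's relative ordering within each hint type, which only partly addresses the concern that a FilterList should preserve the order the user asked for.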
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724652#comment-13724652 ]

Viral Bajaria commented on HBASE-9079:
--------------------------------------

What kind of re-ordering will you do? Isn't the re-ordering more dependent on what kind of ordering the user wants in the FilterList? I.e., apply my PrefixFilter first, then FuzzyRow, then ColumnRange. If the user says apply PrefixFilter, then ColumnRange and then FuzzyRow, should we not preserve that ordering?

I also take back what I said about the issue existing in the current code for MUST_PASS_ONE. I think it does not, because for MUST_PASS_ONE it keeps the min rowkey as the hint while for MUST_PASS_ALL it keeps the max.

Maybe I could limit the scope of this ticket to MUST_PASS_ALL and keep MUST_PASS_ONE as-is?
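The min/max behavior described in the comment above can be sketched as follows. This is a simplified model, not the HBase source: keys are plain strings compared lexicographically, `HintCombiner` is a hypothetical name, and the handling of a missing (null) hint is an assumption of the sketch.

```java
import java.util.List;

/** Hypothetical model of how a filter list could combine its filters' seek hints. */
final class HintCombiner {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  private HintCombiner() {}

  /**
   * Combines per-filter hints; a null entry means "this filter has no hint".
   * MUST_PASS_ALL keeps the largest hint, MUST_PASS_ONE keeps the smallest.
   */
  static String combine(Operator op, List<String> hints) {
    String best = null;
    for (String hint : hints) {
      if (hint == null) {
        // Assumption: under MUST_PASS_ONE a filter without a hint might still
        // accept the very next key, so the sketch gives up on seeking.
        if (op == Operator.MUST_PASS_ONE) {
          return null;
        }
        continue;
      }
      if (best == null) {
        best = hint;
        continue;
      }
      int cmp = hint.compareTo(best);
      boolean better = (op == Operator.MUST_PASS_ALL) ? cmp > 0 : cmp < 0;
      if (better) {
        best = hint;
      }
    }
    return best;
  }
}
```

With MUST_PASS_ALL, every filter has to accept a key, so jumping to the largest hint is safe only if each hint is valid for the current position. That assumption breaks when the hint from a filter that did not ask for a seek is mixed in, which is the case this ticket reports.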
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724719#comment-13724719 ]

Ted Yu commented on HBASE-9079:
-------------------------------

bq. Maybe I could limit the scope of this ticket to MUST_PASS_ALL

Fine with me.
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723166#comment-13723166 ]

Ted Yu commented on HBASE-9079:
-------------------------------

For TestFuzzyAndColumnRangeFilter, please add license.
Can you provide trunk patch so that we can let Hadoop QA run through it ?
{code}
+FilterList filterList = new FilterList(Lists.<Filter>newArrayList(fuzzyRowFilter, columnRangeFilter));
{code}
Can you alter the order of the two filters above so that we know the correctness isn't dependent on ordering of the Filters ? Meaning both orders are tested.
Indentation is off - it should be two spaces for each level of indentation.
{code}
+LOG.info("Got rk: " + Bytes.toStringBinary(kv.getRow()) + " cq: " + Bytes.toStringBinary(kv.getQualifier()));
{code}
Length limit should be 100 per line.

In getNextKeyHint():
{code}
     for (Filter filter : filters) {
+      if (seekHintFilter != null && seekHintFilter != filter) {
+        //get hint from the filter that was responsible for the
+        //SEEK_NEXT_USING_HINT code
+        continue;
{code}
Does the above if block mean that only one Filter which provides seek hint would be respected ?
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723193#comment-13723193 ]

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

I also have a question about this:
{code}
+      if (seekHintFilter != null && seekHintFilter != filter) {
+        //get hint from the filter that was responsible for the
+        //SEEK_NEXT_USING_HINT code
+        continue;
+      }
{code}

As Ted asks... It seems only one filter should provide the hint. Can we turn this around and return {{filter.getNextKeyHint(...)}} if {{seekHintFilter == filter}}?
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723305#comment-13723305 ]

Viral Bajaria commented on HBASE-9079:
--

I will upload a new patch with the fixes that Ted pointed out.

[~te...@apache.org] When you say trunk patch, do you mean against the 0.95/0.96 tree?

Regarding Lars' comment on turning it around to ==, I could move the check to the following, prior to even running the for loop:

{code}
if (seekHintFilter != null) {
  return seekHintFilter.getNextKeyHint();
}
{code}

Regarding the ordering, I think the issue arises when the operator is MUST_PASS_ONE and both filters want to return SEEK_NEXT_USING_HINT, but one of them operates at the row level while the other operates at the column level. For example, if ColumnRange comes before FuzzyRow and the operator is MUST_PASS_ONE, we will iterate through the filterKeyValue method of both filters and keep the state returned from FuzzyRow and not from ColumnRange. I think this issue exists in the current code too, since we go through each filter and keep the max row.

Personally, I feel it's not a good use case to build a FilterList with one filter operating at the row level and another at the column level and set the operator to MUST_PASS_ONE. That's almost like saying: keep a column even if the row does not match.

Any suggestions on what should be done here?

FilterList getNextKeyHint skips rows that should be included in the results
---

Key: HBASE-9079
URL: https://issues.apache.org/jira/browse/HBASE-9079
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
Attachments: TestFail.patch, TestSuccess.patch
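The approach described in this thread (remember which filter returned SEEK_NEXT_USING_HINT, and ask only that filter for the hint before falling back to the loop that keeps the max) can be sketched roughly as below. This is a minimal model under stated assumptions, not the attached patch: `SimpleFilter`, `FixedFilter`, `HintTrackingFilterList`, and the string keys of the form "row/column" are hypothetical stand-ins for the real HBase `Filter`, `FilterList`, and `KeyValue` types, only MUST_PASS_ALL-like behavior is modeled, and resetting the saved filter at the start of each `filterKeyValue` call is an assumption about where the reset belongs.

```java
// Hypothetical, simplified stand-ins for the HBase filter types.
// Keys are plain "row/column" strings instead of KeyValue objects.
enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

interface SimpleFilter {
    ReturnCode filterKeyValue(String key);
    String getNextKeyHint(String key);
}

// Test helper: always answers with the same return code and the same hint.
final class FixedFilter implements SimpleFilter {
    private final ReturnCode code;
    private final String hint;

    FixedFilter(ReturnCode code, String hint) {
        this.code = code;
        this.hint = hint;
    }

    @Override public ReturnCode filterKeyValue(String key) { return code; }
    @Override public String getNextKeyHint(String key) { return hint; }
}

// Sketch of a MUST_PASS_ALL-style list that tracks the hint provider.
final class HintTrackingFilterList implements SimpleFilter {
    private final SimpleFilter[] filters;
    // The filter that most recently returned SEEK_NEXT_USING_HINT, if any.
    private SimpleFilter seekHintFilter;

    HintTrackingFilterList(SimpleFilter... filters) {
        this.filters = filters.clone();
    }

    @Override
    public ReturnCode filterKeyValue(String key) {
        seekHintFilter = null; // assumption: reset the saved state per cell
        for (SimpleFilter f : filters) {
            if (f.filterKeyValue(key) == ReturnCode.SEEK_NEXT_USING_HINT) {
                seekHintFilter = f; // remember who asked for the seek
                return ReturnCode.SEEK_NEXT_USING_HINT;
            }
        }
        return ReturnCode.INCLUDE;
    }

    @Override
    public String getNextKeyHint(String key) {
        // Short-circuit before the loop: only the requesting filter answers.
        if (seekHintFilter != null) {
            return seekHintFilter.getNextKeyHint(key);
        }
        // Fallback modeled on the behavior described in the report: keep the max hint.
        String max = null;
        for (SimpleFilter f : filters) {
            String hint = f.getNextKeyHint(key);
            if (hint != null && (max == null || hint.compareTo(max) > 0)) {
                max = hint;
            }
        }
        return max;
    }
}
```

With a row-level filter whose hint is "row2/" and a column-level filter that returns SEEK_NEXT_USING_HINT with hint "row1/colC", the max-based fallback would pick "row2/" and skip the rest of row1, whereas the tracked filter yields "row1/colC".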
[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results
[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723343#comment-13723343 ]

Ted Yu commented on HBASE-9079:
---

w.r.t. ordering, there is no method in FilterBase that can tell us whether the hint provider operates at the row or column level.

For 0.96 / trunk, we may add such a method so that FilterList can (re)order the Filters accordingly.
For 0.94, we can provide documentation on this aspect so that users can register Filters in the correct order.

bq. you mean against the 0.95/0.96 tree ?

Yes. I meant a patch against trunk.

Thanks
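The reordering idea for 0.96 / trunk might look something like the sketch below. Everything here is hypothetical: no such method exists in FilterBase (which is the point of the comment), so `HintLevel`, `LevelAwareFilter.hintLevel()`, `NamedFilter`, and `FilterOrdering.rowLevelFirst` are invented names for illustration. Putting row-level hint providers ahead of column-level ones is also an assumption, based on the FuzzyRowFilter-then-ColumnRangeFilter ordering used in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical: declaration order doubles as the sort order (row-level first).
enum HintLevel { ROW, COLUMN, NONE }

// Hypothetical interface; the real FilterBase has no hintLevel() method.
interface LevelAwareFilter {
    String name();
    HintLevel hintLevel();
}

// Minimal implementation used only to illustrate the ordering.
final class NamedFilter implements LevelAwareFilter {
    private final String name;
    private final HintLevel level;

    NamedFilter(String name, HintLevel level) {
        this.name = name;
        this.level = level;
    }

    @Override public String name() { return name; }
    @Override public HintLevel hintLevel() { return level; }
}

final class FilterOrdering {
    private FilterOrdering() {}

    // Returns a new list with row-level hint providers before column-level ones.
    // List.sort is stable, so filters at the same level keep their registration order.
    static List<LevelAwareFilter> rowLevelFirst(LevelAwareFilter... filters) {
        List<LevelAwareFilter> ordered = new ArrayList<>(Arrays.asList(filters));
        ordered.sort(Comparator.comparing(LevelAwareFilter::hintLevel));
        return ordered;
    }
}
```

For 0.94, where no such method would be added, the same ordering would have to be applied by hand when registering filters, which is what the proposed documentation would cover.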