[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Viral Bajaria (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1370#comment-1370 ]

Viral Bajaria commented on HBASE-9079:
--

Sorry for the delay. I did a local test with production data and it looks fine.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

              Key: HBASE-9079
              URL: https://issues.apache.org/jira/browse/HBASE-9079
          Project: HBase
       Issue Type: Bug
       Components: Filters
 Affects Versions: 0.94.10
         Reporter: Viral Bajaria
         Assignee: Viral Bajaria
          Fix For: 0.98.0, 0.95.2, 0.94.11

      Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt,
                   HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


 I hit a weird issue/bug and can reproduce the error consistently. The
 problem arises when a FilterList holds two filters that each implement the
 getNextKeyHint method.
 In the current implementation, StoreScanner calls matcher.getNextKeyHint()
 whenever it gets a SEEK_NEXT_USING_HINT. This in turn calls
 filter.getNextKeyHint(), which at this stage is of type FilterList. The
 FilterList implementation iterates through all the filters and keeps the max
 KeyValue that it sees. All is fine if only one of the filters wrapped in the
 FilterList implements getNextKeyHint, but if several of them implement it,
 that's where things get weird.
 For example:
 - Create two filters, a FuzzyRowFilter and a ColumnRangeFilter. Both of them
 implement getNextKeyHint.
 - Wrap them in a FilterList with MUST_PASS_ALL.
 - FuzzyRowFilter seeks to the correct first row and then passes it on to
 ColumnRangeFilter, which returns the SEEK_NEXT_USING_HINT code.
 - Now, when getNextKeyHint is called on the FilterList, it also asks
 FuzzyRowFilter for a hint, and that hint points at the next matching row.
 Because the next row sorts after ColumnRangeFilter's hint (a column within
 the current row), FilterList keeps FuzzyRowFilter's hint, while in reality
 we want ColumnRangeFilter to give the seek hint.
 - As a result, the scan skips data that should be returned, which I have
 verified by using a RowFilter with RegexStringComparator.
 I updated FilterList to remember which filter returned
 SEEK_NEXT_USING_HINT; in getNextKeyHint, I invoke the method on the saved
 filter and then reset that state. I tested it with my current queries and it
 works fine, but I need to run the entire test suite to make sure I have not
 introduced any regressions. In addition, I need to figure out what the
 behavior should be when the operation is MUST_PASS_ONE, but I doubt it
 should be any different.
 Is my understanding that this is a bug correct? Or am I trivializing it and
 ignoring something very important? If the explanation is hard to follow, I
 can open a JIRA and upload a patch against the 0.94 head.
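The hint-selection problem in the description can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not the HBase API and not the committed patch: the classes HintDemo, EvenRowFilter, MinColumnFilter and MustPassAllList are hypothetical, cell keys are encoded as row * 100 + column, EvenRowFilter stands in for FuzzyRowFilter, and MinColumnFilter stands in for ColumnRangeFilter. With trackHintFilter=false the list keeps the max hint of all filters (the behavior the report describes); with trackHintFilter=true it asks only the filter that returned SEEK_NEXT_USING_HINT and then resets that state, following the approach the reporter outlines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of FilterList seek hints; NOT the HBase API.
// A cell key is encoded as row * 100 + column (column < 100), so plain int
// ordering matches row-then-column ordering for this toy data.
public class HintDemo {
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

    interface SimpleFilter {
        ReturnCode filterKey(int key);
        int getNextKeyHint(int key);
    }

    // Stand-in for FuzzyRowFilter: only even rows match; the hint is the
    // first cell of the next matching row after the current one.
    static class EvenRowFilter implements SimpleFilter {
        public ReturnCode filterKey(int key) {
            return (key / 100) % 2 == 0 ? ReturnCode.INCLUDE : ReturnCode.SEEK_NEXT_USING_HINT;
        }
        public int getNextKeyHint(int key) {
            int row = key / 100 + 1;
            return (row % 2 == 0 ? row : row + 1) * 100;
        }
    }

    // Stand-in for ColumnRangeFilter: columns below MIN_COL are skipped by
    // hinting at the first in-range column of the current row.
    static class MinColumnFilter implements SimpleFilter {
        static final int MIN_COL = 5;
        public ReturnCode filterKey(int key) {
            return key % 100 < MIN_COL ? ReturnCode.SEEK_NEXT_USING_HINT : ReturnCode.INCLUDE;
        }
        public int getNextKeyHint(int key) {
            return (key / 100) * 100 + MIN_COL;
        }
    }

    // MUST_PASS_ALL list. trackHintFilter == false keeps the max hint of all
    // filters; true asks only the filter that returned SEEK_NEXT_USING_HINT.
    static class MustPassAllList implements SimpleFilter {
        private final List<SimpleFilter> filters;
        private final boolean trackHintFilter;
        private SimpleFilter seekHintFilter; // filter behind the last seek request

        MustPassAllList(boolean trackHintFilter, SimpleFilter... filters) {
            this.trackHintFilter = trackHintFilter;
            this.filters = Arrays.asList(filters);
        }

        public ReturnCode filterKey(int key) {
            seekHintFilter = null;
            for (SimpleFilter f : filters) {
                ReturnCode code = f.filterKey(key);
                if (code == ReturnCode.INCLUDE) continue;
                if (code == ReturnCode.SEEK_NEXT_USING_HINT) seekHintFilter = f;
                return code; // MUST_PASS_ALL: the first non-INCLUDE answer decides
            }
            return ReturnCode.INCLUDE;
        }

        public int getNextKeyHint(int key) {
            if (trackHintFilter && seekHintFilter != null) {
                SimpleFilter f = seekHintFilter;
                seekHintFilter = null; // reset the saved state after use
                return f.getNextKeyHint(key);
            }
            int max = Integer.MIN_VALUE; // max-hint behavior: keep the largest hint
            for (SimpleFilter f : filters) max = Math.max(max, f.getNextKeyHint(key));
            return max;
        }
    }

    // Scans rows 0..3 x columns 0..9 in key order, seeking forward on hints.
    static List<Integer> scan(SimpleFilter filter) {
        List<Integer> cells = new ArrayList<>();
        for (int row = 0; row < 4; row++)
            for (int col = 0; col < 10; col++) cells.add(row * 100 + col);
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            int key = cells.get(i);
            ReturnCode code = filter.filterKey(key);
            if (code == ReturnCode.SEEK_NEXT_USING_HINT) {
                int hint = filter.getNextKeyHint(key); // always > key in this model
                while (i < cells.size() && cells.get(i) < hint) i++;
            } else {
                if (code == ReturnCode.INCLUDE) result.add(key);
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("max hint:     " + scan(new MustPassAllList(false, new EvenRowFilter(), new MinColumnFilter())));
        System.out.println("tracked hint: " + scan(new MustPassAllList(true, new EvenRowFilter(), new MinColumnFilter())));
    }
}
```

In this toy model the max-hint list returns nothing for the 4-row by 10-column scan: every SEEK_NEXT_USING_HINT raised by the column filter is overridden by the row filter's next-row hint. The tracked version returns columns 5 through 9 of rows 0 and 2 (keys 5..9 and 205..209), which is the expected result.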

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Viral Bajaria (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733946#comment-13733946 ]

Viral Bajaria commented on HBASE-9079:
--

Thanks for all the help, [~ted_yu] and [~lhofhansl], in getting this patch cleaned
up and integrated! Looking forward to contributing more. You guys rock!


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734075#comment-13734075 ]

Hudson commented on HBASE-9079:
---

SUCCESS: Integrated in HBase-0.94-security #249 (See 
[https://builds.apache.org/job/HBase-0.94-security/249/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the 
results (Viral Bajaria and LarsH) (larsh: rev 1512021)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734086#comment-13734086 ]

Hudson commented on HBASE-9079:
---

SUCCESS: Integrated in HBase-0.94 #1100 (See 
[https://builds.apache.org/job/HBase-0.94/1100/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the 
results (Viral Bajaria and LarsH) (larsh: rev 1512021)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734181#comment-13734181 ]

Hudson commented on HBASE-9079:
---

FAILURE: Integrated in HBase-TRUNK #4357 (See 
[https://builds.apache.org/job/HBase-TRUNK/4357/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the 
results (Viral Bajaria and LarsH) (larsh: rev 1512018)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734195#comment-13734195 ]

Hudson commented on HBASE-9079:
---

FAILURE: Integrated in hbase-0.95 #418 (See 
[https://builds.apache.org/job/hbase-0.95/418/])
HBASE-9079 Missed new test (larsh: rev 1512020)
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the
results (Viral Bajaria and LarsH) (larsh: rev 1512019)
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734293#comment-13734293 ]

Hudson commented on HBASE-9079:
---

FAILURE: Integrated in hbase-0.95-on-hadoop2 #225 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/225/])
HBASE-9079 Missed new test (larsh: rev 1512020)
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the
results (Viral Bajaria and LarsH) (larsh: rev 1512019)
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-08 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734310#comment-13734310 ]

Hudson commented on HBASE-9079:
---

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #658 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/658/])
HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the 
results (Viral Bajaria and LarsH) (larsh: rev 1512018)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-07 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732119#comment-13732119 ]

Lars Hofhansl commented on HBASE-9079:
--

Did you get a chance to test this with real data, [~viralbajaria]?


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-07 Thread Viral Bajaria (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732627#comment-13732627 ]

Viral Bajaria commented on HBASE-9079:
--

Not yet, I got caught up in something else. I will get to it before EOD (PST) for sure.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
Assignee: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
 HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733117#comment-13733117
 ] 

Lars Hofhansl commented on HBASE-9079:
--

ping :)

It's also fine to push into next month's 0.94.12.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
Assignee: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
 HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731116#comment-13731116
 ] 

Lars Hofhansl commented on HBASE-9079:
--

So, thinking about this again: why can't we take the largest of any seekHint 
when we *and* all the filters together in a FilterList?
In your case, if you add the filters to the list in a different order, will your 
patch still work?

Is the actual problem here that a Filter returns a KV from getNextKeyHint even 
when it would not have returned SEEK_NEXT_USING_HINT?


 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731130#comment-13731130
 ] 

Viral Bajaria commented on HBASE-9079:
--

If we take the largest seekHint, the problem can be explained as follows:

- FilterList with FuzzyRow + ColumnRange with MUST_PASS_ALL.
- FuzzyRow includes the filter and says move on to ColumnRange.
- ColumnRange says first column is not a match and I can give you a better 
seekHint
- FilterList calls seekHint on both FuzzyRow and ColumnRange. FuzzyRow returns 
the next row that we should use while ColumnRange returns the next column from 
the originally selected row. If we keep the max here, then we move on to the next 
row and do not return the columns from the row that should have been returned. 
The test case proves that this is what happened originally (though I removed 
the TestFail.patch due to some Hadoop QA issues).

Yes, the current changes work even if you change the ordering of filters. The 
test in the patch verified that behavior too.
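The fix described here (ask only the filter that returned SEEK_NEXT_USING_HINT, stopping at the first such filter under MUST_PASS_ALL) can be sketched the same way. Again, `SeekingFilterModel` and its toy filters are hypothetical and only approximate the HBase classes; the sketch checks the ordering claim for this one scenario, not for arbitrary filter combinations.

```java
import java.util.List;

// Hypothetical model (NOT the HBase API) of the proposed MUST_PASS_ALL fix:
// stop at the first filter that returns SEEK_NEXT_USING_HINT and use only its hint.
// Cells are "row/col" strings.
final class SeekingFilterModel {
  enum Code { INCLUDE, SEEK_NEXT_USING_HINT }

  interface SimpleFilter {
    Code filter(String row, String col);

    String hint(String row, String col);
  }

  /** Toy FuzzyRowFilter stand-in: accepts matchRow, yet always offers the start of nextRow. */
  static SimpleFilter fuzzyRow(String matchRow, String nextRow) {
    return new SimpleFilter() {
      public Code filter(String row, String col) {
        return row.equals(matchRow) ? Code.INCLUDE : Code.SEEK_NEXT_USING_HINT;
      }

      public String hint(String row, String col) {
        return nextRow + "/";
      }
    };
  }

  /** Toy ColumnRangeFilter stand-in: accepts col >= minCol, otherwise hints at row/minCol. */
  static SimpleFilter columnRange(String minCol) {
    return new SimpleFilter() {
      public Code filter(String row, String col) {
        return col.compareTo(minCol) >= 0 ? Code.INCLUDE : Code.SEEK_NEXT_USING_HINT;
      }

      public String hint(String row, String col) {
        return row + "/" + minCol;
      }
    };
  }

  /** Returns the hint of the first filter that asks to seek, or null if all include. */
  static String hintFromSeekingFilter(List<SimpleFilter> filters, String row, String col) {
    for (SimpleFilter f : filters) {
      if (f.filter(row, col) == Code.SEEK_NEXT_USING_HINT) {
        return f.hint(row, col); // short-circuit: later filters are never consulted
      }
    }
    return null;
  }
}
```

With either ordering of the two toy filters the hint at `r1/c1` is `r1/c5`; at `r1/c7` both filters include the cell and no hint is requested.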


 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.12

 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731276#comment-13731276
 ] 

Lars Hofhansl commented on HBASE-9079:
--

I see.

But what if you hit a scenario where the FuzzyRowFilter would also return 
SEEK_NEXT_USING_HINT and it happens to be first in the filter list? In that 
case you'd still seek past columns that should be included. So the problem is 
that SEEK_NEXT_USING_HINT is not transitive to following columns/rows.

It seems the only 100% safe way to do this in a FilterList is to treat any seek 
optimization as a simple SKIP.
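The conservative alternative of downgrading every seek to a SKIP can be illustrated with a toy scanner. Assumptions: `SkipInsteadOfSeek` is hypothetical, cells are sorted `row/col` strings, a ColumnRange-like check keeps columns at or after `minCol`, and the inner `while` loop merely stands in for a real seek.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (NOT the HBase API) of treating every seek hint as a plain SKIP.
// Cells are sorted "row/col" strings; the toy filter keeps columns >= minCol.
final class SkipInsteadOfSeek {
  enum Code { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

  static final class Result {
    final List<String> included = new ArrayList<>();
    int cellsEvaluated;
  }

  /** Downgrade a seek request to SKIP: never jumps past a cell, but gives up the jump. */
  static Code conservative(Code c) {
    return c == Code.SEEK_NEXT_USING_HINT ? Code.SKIP : c;
  }

  static Result scan(List<String> sortedCells, String minCol, boolean honorHints) {
    Result r = new Result();
    int i = 0;
    while (i < sortedCells.size()) {
      String cell = sortedCells.get(i);
      r.cellsEvaluated++;
      int slash = cell.indexOf('/');
      String row = cell.substring(0, slash);
      String col = cell.substring(slash + 1);
      Code code = col.compareTo(minCol) >= 0 ? Code.INCLUDE : Code.SEEK_NEXT_USING_HINT;
      if (!honorHints) {
        code = conservative(code);
      }
      if (code == Code.INCLUDE) {
        r.included.add(cell);
        i++;
      } else if (code == Code.SKIP) {
        i++;
      } else {
        // Stand-in for a real seek: move to the first cell >= row/minCol without evaluating it.
        String target = row + "/" + minCol;
        while (i < sortedCells.size() && sortedCells.get(i).compareTo(target) < 0) {
          i++;
        }
      }
    }
    return r;
  }
}
```

On the cells `r1/c1, r1/c2, r1/c3, r1/c5, r1/c6` with `minCol = "c5"`, both modes return `r1/c5` and `r1/c6`, but honoring the hint evaluates 3 cells while the SKIP-only mode evaluates all 5. That difference is the optimization a SKIP-only FilterList would give up.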


 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.12

 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731387#comment-13731387
 ] 

Viral Bajaria commented on HBASE-9079:
--

The current patch only calls the filter that gave the SEEK_NEXT_USING_HINT; we 
don't go through all the filters in the FilterList if the operator is 
MUST_PASS_ALL.

For MUST_PASS_ONE, the logic is to select the minimum of the hints, so we will 
not skip rows/columns even if one of the filters suggests jumping over them, 
since we keep the minimum.
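The MUST_PASS_ONE rule in this comment, combined with the existing early `return null` when any filter has no hint, amounts to taking the minimum hint. A minimal sketch, assuming hints are `row/col` strings handled by a hypothetical `MustPassOneHint` helper rather than `KeyValue`s compared with `KeyValue.COMPARATOR`:

```java
import java.util.List;

// Hypothetical sketch (NOT the HBase API) of the MUST_PASS_ONE hint rule.
// Hints are "row/col" strings compared lexicographically; null means "no hint".
final class MustPassOneHint {
  static String combine(List<String> hintsPerFilter) {
    String min = null;
    for (String h : hintsPerFilter) {
      if (h == null) {
        return null; // some filter might accept the very next cell, so do not seek
      }
      if (min == null || min.compareTo(h) > 0) {
        min = h; // keep the smallest hint so no filter's candidate cell is skipped
      }
    }
    return min;
  }
}
```

For hints `r2/` and `r1/c5` the result is `r1/c5`; if any filter supplies `null`, the result is `null` and the scanner should not seek.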

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.12

 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731447#comment-13731447
 ] 

Lars Hofhansl commented on HBASE-9079:
--

From your example, the problem is (1) FuzzyRow includes the filter and says 
move on to ColumnRange, paired with (2) FuzzyRow returns the next row that we 
should use.
Even though FuzzyRowFilter said we should include the row, its 
getNextKeyHint() still returns a non-null KV.

So from that angle we should only consult the filters that we actually called, 
which your patch does correctly.

Now, what if FuzzyRowFilter had been the one that returned 
SEEK_NEXT_USING_HINT? Then that filter would be the one to provide the 
getNextKeyHint as well, and might skip right past the KV you want for the 
ColumnRangeFilter. You are saying that in that case FuzzyRowFilter did not 
INCLUDE it, and thus it would be correct to use its getNextKeyHint?

And because we're short-circuiting the AND when we encounter 
SEEK_NEXT_USING_HINT, we can safely jump to that KV. OK. Seems that is correct. 
Just making sure...

In that case I only have one further comment:
getNextKeyHint on FilterList is only called when SEEK_NEXT_USING_HINT is 
returned. If this FilterList is MUST_PASS_ALL, the seekHintFilter must not be 
null, correct? So we could simplify like this:
{code}
@@ -332,9 +337,15 @@
   @Override
   public KeyValue getNextKeyHint(KeyValue currentKV) {
     KeyValue keyHint = null;
+    if (operator == Operator.MUST_PASS_ALL) {
+      keyHint = seekHintFilter.getNextKeyHint(currentKV);
+      return keyHint;
+    }
+
     for (Filter filter : filters) {
       KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
-      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
+      if (curKeyHint == null) {
         // If we ever don't have a hint and this is must-pass-one, then no hint
         return null;
       }
@@ -344,13 +355,7 @@
           keyHint = curKeyHint;
           continue;
         }
-        // There is an existing hint
-        if (operator == Operator.MUST_PASS_ALL &&
-            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) < 0) {
-          // If all conditions must pass, we can keep the max hint
-          keyHint = curKeyHint;
-        } else if (operator == Operator.MUST_PASS_ONE &&
-            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
+        if (KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
           // If any condition can pass, we need to keep the min hint
           keyHint = curKeyHint;
         }
{code}
And then also reset the seekHintFilter in the reset() method.
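Putting the proposal together, a simplified FilterList might record the seeking filter while filtering, consult only that filter for MUST_PASS_ALL, keep the minimum hint for MUST_PASS_ONE, and clear the recorded filter in `reset()`. `FilterListSketch` and its members are hypothetical; the MUST_PASS_ONE filtering logic is deliberately simplified, resetting child filters is omitted, and the null check on `seekHintFilter` is a defensive addition rather than part of the diff above.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified FilterList (NOT the HBase class). Cells and hints are
// "row/col" strings; a null hint means "no hint".
final class FilterListSketch {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  enum Code { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

  interface SimpleFilter {
    Code filterCell(String cell);

    String nextKeyHint(String cell);
  }

  /** Convenience factory for building toy filters from two lambdas. */
  static SimpleFilter of(Function<String, Code> decide, Function<String, String> hint) {
    return new SimpleFilter() {
      public Code filterCell(String cell) {
        return decide.apply(cell);
      }

      public String nextKeyHint(String cell) {
        return hint.apply(cell);
      }
    };
  }

  private final Operator operator;
  private final List<SimpleFilter> filters;
  private SimpleFilter seekHintFilter; // filter that returned SEEK_NEXT_USING_HINT, if any

  FilterListSketch(Operator operator, List<SimpleFilter> filters) {
    this.operator = operator;
    this.filters = filters;
  }

  Code filterCell(String cell) {
    if (filters.isEmpty()) {
      return Code.INCLUDE;
    }
    if (operator == Operator.MUST_PASS_ALL) {
      seekHintFilter = null;
      for (SimpleFilter f : filters) {
        Code c = f.filterCell(cell);
        if (c == Code.SEEK_NEXT_USING_HINT) {
          seekHintFilter = f; // short-circuit: this filter owns the hint
          return c;
        }
        if (c != Code.INCLUDE) {
          return c;
        }
      }
      return Code.INCLUDE;
    }
    // MUST_PASS_ONE (simplified): include if any filter includes; seek only if all ask to seek.
    boolean allSeek = true;
    for (SimpleFilter f : filters) {
      Code c = f.filterCell(cell);
      if (c == Code.INCLUDE) {
        return Code.INCLUDE;
      }
      if (c != Code.SEEK_NEXT_USING_HINT) {
        allSeek = false;
      }
    }
    return allSeek ? Code.SEEK_NEXT_USING_HINT : Code.SKIP;
  }

  String getNextKeyHint(String cell) {
    if (operator == Operator.MUST_PASS_ALL) {
      // Only called after SEEK_NEXT_USING_HINT; the null check is purely defensive.
      return seekHintFilter == null ? null : seekHintFilter.nextKeyHint(cell);
    }
    // MUST_PASS_ONE: if any filter has no hint, there is no safe seek target.
    String keyHint = null;
    for (SimpleFilter f : filters) {
      String cur = f.nextKeyHint(cell);
      if (cur == null) {
        return null;
      }
      if (keyHint == null || keyHint.compareTo(cur) > 0) {
        keyHint = cur; // keep the minimum hint
      }
    }
    return keyHint;
  }

  void reset() {
    seekHintFilter = null;
  }
}
```

With a FuzzyRow-like filter that includes everything but always hints `r2/`, and a ColumnRange-like filter that seeks to column `c5`, MUST_PASS_ALL returns the ColumnRange hint `r1/c5` for cell `r1/c1`, and after `reset()` no stale seek filter remains.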

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.12

 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch



[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731490#comment-13731490
 ] 

Lars Hofhansl commented on HBASE-9079:
--

Then again this breaks TestFilterList.testHintPassThru. But the breaking part 
should be removed as it tests that MUST_PASS_ALL returns the larger of the key 
hints, which is no longer the case.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.12

 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731571#comment-13731571
 ] 

Ted Yu commented on HBASE-9079:
---

I ran Filter related tests and they passed.
{code}
     for (Filter filter : filters) {
       KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
-      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
+      if (curKeyHint == null) {
         // If we ever don't have a hint and this is must-pass-one, then no hint
{code}
nit: maybe lift the comment about must-pass-one before the for loop.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
 HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731575#comment-13731575
 ] 

Hadoop QA commented on HBASE-9079:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12596477/9079-trunk-v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6623//console

This message is automatically generated.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
 HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731612#comment-13731612
 ] 

Lars Hofhansl commented on HBASE-9079:
--

Will move the comment upon commit. Ted and Viral, are you both good with this?

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
 HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-06 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731621#comment-13731621
 ] 

Viral Bajaria commented on HBASE-9079:
--

Looks good to me. I have applied the patch to my local repo and will test with 
real data in a bit. Will provide an update after that (hopefully before 
tomorrow).

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Fix For: 0.98.0, 0.95.2, 0.94.11

 Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
 HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-05 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729738#comment-13729738
 ] 

Viral Bajaria commented on HBASE-9079:
--

[~lhofhansl] Can you review the patch when you get a chance? I have already 
deployed this to my production cluster and have not had any issues.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-01 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726775#comment-13726775
 ] 

Viral Bajaria commented on HBASE-9079:
--

(Pressed Enter too soon when attaching the file; there is no easy way to edit a comment.)

I have uploaded a new patch for trunk after refreshing my workspace. I think 
the switch between branches wasn't clean for me the first time.

The current patch should work fine on trunk as well. I also removed the TODO 
comment, since FilterList no longer has a Configuration object, and fixed 
the typo in the javadoc for areSerializedFieldsEqual().

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726777#comment-13726777
 ] 

Ted Yu commented on HBASE-9079:
---

There are a few long lines in the test:
{code}
+ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter(Bytes.toBytes(cqStart), true, Bytes.toBytes(4), true);
{code}
[~lhofhansl]:
What do you think of the latest patch?

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727020#comment-13727020
 ] 

Hadoop QA commented on HBASE-9079:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12595494/HBASE-9079-0.94.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6561//console

This message is automatically generated.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-01 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727039#comment-13727039
 ] 

Viral Bajaria commented on HBASE-9079:
--

Removed the TestFail and TestSuccess patches which were only here to 
demonstrate what was breaking.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727245#comment-13727245
 ] 

Hadoop QA commented on HBASE-9079:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12595522/HBASE-9079-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6562//console

This message is automatically generated.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725684#comment-13725684
 ] 

Viral Bajaria commented on HBASE-9079:
--

Uploaded patches for both 0.94 and trunk. Interestingly, the 0.94 and trunk 
versions of FilterList are not in sync. Is that expected?

I added the test to trunk as well and ran it there.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch


 I hit a weird issue/bug and am able to reproduce the error consistently. The 
 problem arises when FilterList has two filters where each implements the 
 getNextKeyHint method.
 In the current implementation, StoreScanner calls 
 matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in 
 turn calls filter.getNextKeyHint(), which at this stage is of type 
 FilterList. The implementation in FilterList iterates through all the filters 
 and keeps the max KeyValue that it sees. All is fine if only one of the filters 
 wrapped in the FilterList implements getNextKeyHint, but if 
 multiple of them implement it, that's where things get weird.
 For example:
 - create two filters: the first is FuzzyRowFilter and the second is ColumnRangeFilter. 
 Both of them implement getNextKeyHint.
 - wrap them in FilterList with MUST_PASS_ALL
 - FuzzyRowFilter will seek to the correct first row and then pass it to 
 ColumnRangeFilter, which will return the SEEK_NEXT_USING_HINT code.
 - Now in FilterList, when getNextKeyHint is called, it calls the one on 
 FuzzyRowFilter first, which basically says what the next row should be, while in 
 reality we want the ColumnRangeFilter to give the seek hint.
 - The above behavior skips data that should be returned, which I have 
 verified by using a RowFilter with RegexStringComparator.
 I updated the FilterList to maintain state on which filter returns the 
 SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved 
 filter and reset that state. I tested it with my current queries and it works 
 fine, but I need to run the entire test suite to make sure I have not 
 introduced any regressions. In addition, I need to figure out what 
 the behavior should be when the operation is MUST_PASS_ONE, but I doubt it 
 should be any different.
 Is my understanding that this is a bug correct? Or am I trivializing it and 
 ignoring something very important? If it's tough to wrap your head around 
 the explanation, then I can open a JIRA and upload a patch against 0.94 head.
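A minimal, self-contained model of the scenario above may make the difference easier to see. It is not HBase code and only covers the MUST_PASS_ALL path: keys are simplified to (row, column) pairs compared row first, and FUZZY_ROW / COLUMN_RANGE are hypothetical stand-ins for FuzzyRowFilter and ColumnRangeFilter with made-up hint values. With trackHintFilter set to false the model applies the pre-fix rule (ask every filter and keep the max hint); with true it follows the proposed fix (remember the filter that returned SEEK_NEXT_USING_HINT, ask only that filter, then reset the state).

```java
import java.util.Arrays;
import java.util.List;

public class SeekHintDemo {
  /** Simplified stand-in for a KeyValue position: ordered by row, then by column. */
  record Key(String row, String column) implements Comparable<Key> {
    @Override
    public int compareTo(Key other) {
      int byRow = row.compareTo(other.row);
      return byRow != 0 ? byRow : column.compareTo(other.column);
    }
  }

  enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

  /** Hypothetical two-method filter view, not the real HBase Filter API. */
  interface HintingFilter {
    ReturnCode filterKey(Key key);
    Key getNextKeyHint(Key key);
  }

  /** Stand-in for FuzzyRowFilter: the current row matches; its hint is the next matching row. */
  static final HintingFilter FUZZY_ROW = new HintingFilter() {
    public ReturnCode filterKey(Key key) { return ReturnCode.INCLUDE; }
    public Key getNextKeyHint(Key key) { return new Key("row5", ""); }
  };

  /** Stand-in for ColumnRangeFilter starting at column "m": asks to seek to column m. */
  static final HintingFilter COLUMN_RANGE = new HintingFilter() {
    public ReturnCode filterKey(Key key) {
      return key.column().compareTo("m") < 0
          ? ReturnCode.SEEK_NEXT_USING_HINT
          : ReturnCode.INCLUDE;
    }
    public Key getNextKeyHint(Key key) {
      return key.column().compareTo("m") < 0 ? new Key(key.row(), "m") : null;
    }
  };

  /** MUST_PASS_ALL list; trackHintFilter selects pre-fix (false) or proposed (true) behaviour. */
  static final class FilterListModel {
    private final List<HintingFilter> filters;
    private final boolean trackHintFilter;
    private HintingFilter seekHintFilter; // filter that returned SEEK_NEXT_USING_HINT, if any

    FilterListModel(boolean trackHintFilter, HintingFilter... filters) {
      this.trackHintFilter = trackHintFilter;
      this.filters = Arrays.asList(filters);
    }

    ReturnCode filterKey(Key key) {
      for (HintingFilter f : filters) {
        ReturnCode rc = f.filterKey(key);
        if (rc == ReturnCode.SEEK_NEXT_USING_HINT) {
          seekHintFilter = f; // remember who asked for the hinted seek
          return rc;
        }
      }
      return ReturnCode.INCLUDE;
    }

    Key getNextKeyHint(Key key) {
      if (trackHintFilter && seekHintFilter != null) {
        Key hint = seekHintFilter.getNextKeyHint(key);
        seekHintFilter = null; // reset the state once the hint is consumed
        return hint;
      }
      Key max = null; // pre-fix rule: ask every filter, keep the largest hint
      for (HintingFilter f : filters) {
        Key hint = f.getNextKeyHint(key);
        if (hint != null && (max == null || max.compareTo(hint) < 0)) {
          max = hint;
        }
      }
      return max;
    }
  }

  public static void main(String[] args) {
    Key current = new Key("row1", "c");
    for (boolean fixed : new boolean[] {false, true}) {
      FilterListModel list = new FilterListModel(fixed, FUZZY_ROW, COLUMN_RANGE);
      ReturnCode rc = list.filterKey(current);
      String label = fixed ? "proposed" : "pre-fix";
      System.out.println(label + ": " + rc + " -> " + list.getNextKeyHint(current));
    }
  }
}
```

Both variants return SEEK_NEXT_USING_HINT for (row1, c). The pre-fix variant then seeks to row5, skipping the rest of row1 (including column m onwards, which should be returned), while the proposed variant seeks to column m within row1. Swapping the order of the two filters gives the same proposed hint in this model.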

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725776#comment-13725776
 ] 

Hadoop QA commented on HBASE-9079:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12595268/HBASE-9079-0.94.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6548//console

This message is automatically generated.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725805#comment-13725805
 ] 

Ted Yu commented on HBASE-9079:
---

@Viral:
The difference you saw in FilterList was due to HBASE-8847, which went into 
0.94.10:
{code}
-  private KeyValue transformedKV = null;
{code}
Please refresh your workspace and apply your changes on top of HBASE-8847.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725987#comment-13725987
 ] 

Viral Bajaria commented on HBASE-9079:
--

Is it not a good idea to work from the github.com branches? I was working off 
the latest 0.94 branch and pulled again, but I don't see the changes that 
HBASE-8847 made to it.

What am I missing?

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725991#comment-13725991
 ] 

Ted Yu commented on HBASE-9079:
---

That's strange.
I saw the changes from 
https://issues.apache.org/jira/secure/attachment/12592491/HBASE-8847.base%3D0.94.diff
 in the 0.94 branch.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725995#comment-13725995
 ] 

Viral Bajaria commented on HBASE-9079:
--

Oh wait! I do see all those changes on the 0.94 branch; I just don't see them 
on trunk right now. That is why I said that FilterList on trunk is not in sync 
with the one on 0.94.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726025#comment-13726025
 ] 

Ted Yu commented on HBASE-9079:
---

Here is the trunk check-in:
{code}
r1499851 | tedyu | 2013-07-04 12:59:24 -0700 (Thu, 04 Jul 2013) | 3 lines

HBASE-8847 Filter.transform() always applies unconditionally, even when 
combined in a FilterList (Christophe Taton)
{code}

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch, 
 TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723981#comment-13723981
 ] 

Ted Yu commented on HBASE-9079:
---

For trunk, can we introduce the following method to Filter:
{code}
  public KeyHintType getKeyHintType() {
{code}
where KeyHintType is an enum with the values NONE, ROW, and COL.

FilterList could then poll its Filters and reorder them.
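To make the proposal concrete, here is a rough sketch of what such an enum and a reordering step could look like. Everything in it is hypothetical: HBase's Filter has no getKeyHintType() method, TypedFilter and SimpleFilter are illustrative types, the ROW/COL/NONE labels attached to the example filter names are assumptions, and the comment does not specify an ordering policy, so the COL, then ROW, then NONE order below is purely illustrative. A stable sort keeps filters of the same type in the user's order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KeyHintTypeSketch {
  /** Hypothetical: the kind of seek hint a filter can provide. */
  enum KeyHintType { NONE, ROW, COL }

  /** Hypothetical minimal filter view exposing only the proposed method. */
  interface TypedFilter {
    String name();
    KeyHintType getKeyHintType();
  }

  record SimpleFilter(String name, KeyHintType hintType) implements TypedFilter {
    @Override
    public KeyHintType getKeyHintType() { return hintType; }
  }

  /** Illustrative policy only: COL hints first, then ROW, then filters with no hint. */
  static int priority(KeyHintType type) {
    return switch (type) {
      case COL -> 0;
      case ROW -> 1;
      case NONE -> 2;
    };
  }

  /** Returns a reordered copy; List.sort is stable, so ties keep the user's order. */
  static List<TypedFilter> reorder(List<TypedFilter> filters) {
    List<TypedFilter> copy = new ArrayList<>(filters);
    copy.sort(Comparator.comparingInt((TypedFilter f) -> priority(f.getKeyHintType())));
    return copy;
  }

  public static void main(String[] args) {
    List<TypedFilter> userOrder = List.of(
        new SimpleFilter("PrefixFilter", KeyHintType.NONE),
        new SimpleFilter("FuzzyRowFilter", KeyHintType.ROW),
        new SimpleFilter("ColumnRangeFilter", KeyHintType.COL));
    for (TypedFilter f : reorder(userOrder)) {
      System.out.println(f.name() + " (" + f.getKeyHintType() + ")");
    }
  }
}
```

With this policy PrefixFilter moves from first to last, which is the kind of change to a user-specified ordering that the reply below questions.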

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-30 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724652#comment-13724652
 ] 

Viral Bajaria commented on HBASE-9079:
--

What kind of reordering would you do? Isn't the ordering more dependent on 
what the user wants in the FilterList, i.e. apply my 
PrefixFilter first, then FuzzyRow, then ColumnRange? If the user says to apply 
PrefixFilter, then ColumnRange, and then FuzzyRow, should we not preserve that 
ordering?

I also take back what I said about the issue existing for MUST_PASS_ONE in the 
current code. I don't think it does, because for MUST_PASS_ONE it keeps the min 
KeyValue as the hint, while for MUST_PASS_ALL it keeps the max. Maybe I could limit 
the scope of this ticket to MUST_PASS_ALL and keep MUST_PASS_ONE as-is?
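The min/max rule described above can be sketched as follows. This is a simplified illustration, not the FilterList source: hints are plain row-key strings instead of KeyValues, and a null entry stands for a filter that gives no hint. The MUST_PASS_ONE early return reflects the reasoning that a filter without a hint might accept the very next key, so nothing can safely be skipped; treat that detail as an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

public class HintMergeByOperator {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  /** Merges per-filter hints (null = no hint) into a single seek target, or null. */
  static String mergeHints(Operator op, List<String> hints) {
    String result = null;
    for (String hint : hints) {
      if (hint == null) {
        // Any filter may pass under MUST_PASS_ONE, so one hintless filter blocks skipping.
        if (op == Operator.MUST_PASS_ONE) {
          return null;
        }
        continue;
      }
      if (result == null) {
        result = hint;
      } else if (op == Operator.MUST_PASS_ALL && result.compareTo(hint) < 0) {
        result = hint; // all filters must pass: keep the furthest hint
      } else if (op == Operator.MUST_PASS_ONE && result.compareTo(hint) > 0) {
        result = hint; // any filter may pass: keep the nearest hint
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> hints = Arrays.asList("row3", "row7");
    System.out.println("MUST_PASS_ALL -> " + mergeHints(Operator.MUST_PASS_ALL, hints));
    System.out.println("MUST_PASS_ONE -> " + mergeHints(Operator.MUST_PASS_ONE, hints));
  }
}
```

With the hints row3 and row7, MUST_PASS_ALL seeks to row7 and MUST_PASS_ONE to row3. Keeping the max under MUST_PASS_ALL is only safe if every hint is a true lower bound for its filter; this issue arises because FuzzyRowFilter is asked for a hint even though it did not request the seek, and its answer (the next matching row) jumps past data in the current row.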

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724719#comment-13724719
 ] 

Ted Yu commented on HBASE-9079:
---

bq. Maybe I could limit the scope of this ticket to MUST_PASS_ALL
Fine with me.

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: TestFail.patch, TestSuccess.patch




[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723166#comment-13723166
 ] 

Ted Yu commented on HBASE-9079:
---

For TestFuzzyAndColumnRangeFilter, please add the license header.

Can you provide a trunk patch so that Hadoop QA can run it?
{code}
+FilterList filterList = new FilterList(Lists.<Filter>newArrayList(fuzzyRowFilter, columnRangeFilter));
{code}
Can you also test the two filters above in the reverse order, so that we know
correctness isn't dependent on the ordering of the Filters? That is, both
orders should be tested.

Indentation is off - it should be two spaces for each level of indentation.
{code}
+LOG.info("Got rk: " + Bytes.toStringBinary(kv.getRow()) + " cq: " + Bytes.toStringBinary(kv.getQualifier()));
{code}
The line length limit should be 100 characters.

In getNextKeyHint():
{code}
 for (Filter filter : filters) {
+  if (seekHintFilter != null && seekHintFilter != filter) {
+//get hint from the filter that was responsible for the
+//SEEK_NEXT_USING_HINT code
+continue;
{code}
Does the above if block mean that only one Filter which provides a seek hint
would be respected?

 FilterList getNextKeyHint skips rows that should be included in the results
 ---

 Key: HBASE-9079
 URL: https://issues.apache.org/jira/browse/HBASE-9079
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
 Attachments: TestFail.patch, TestSuccess.patch


 I hit a weird issue/bug and am able to reproduce the error consistently. The 
 problem arises when FilterList has two filters where each implements the 
 getNextKeyHint method.
 The way the current implementation works is, StoreScanner will call 
 matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in 
 turn will call filter.getNextKeyHint() which at this stage is of type 
 FilterList. The implementation in FilterList iterates through all the filters 
 and keeps the max KeyValue that it sees. All is fine if you wrap filters in 
 FilterList in which only one of them implements getNextKeyHint, but if
 multiple of them implement it, that's where things get weird.
 For example:
 - create two filters: one is FuzzyRowFilter and second is ColumnRangeFilter. 
 Both of them implement getNextKeyHint
 - wrap them in FilterList with MUST_PASS_ALL
 - FuzzyRowFilter will seek to the correct first row and then pass it to 
 ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
 - Now in FilterList when getNextKeyHint is called, it calls the one on
 FuzzyRow first, which basically says what the next row should be, while in
 reality we want the ColumnRangeFilter to give the seek hint.
 - The above behavior skips data that should be returned, which I have 
 verified by using a RowFilter with RegexStringComparator.
 I updated the FilterList to maintain state on which filter returns the 
 SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved 
 filter and reset that state. I tested it with my current queries and it works 
 fine but I need to run the entire test suite to make sure I have not 
 introduced any regression. In addition to that I need to figure out what 
 should be the behavior when the operation is MUST_PASS_ONE, but I doubt it
 should be any different.
 Is my understanding of it being a bug correct? Or am I trivializing it and
 ignoring something very important? If it's tough to wrap your head around
 the explanation, then I can open a JIRA and upload a patch against 0.94 head.


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723193#comment-13723193
 ] 

Lars Hofhansl commented on HBASE-9079:
--

I also have a question about this:
{code}
+  if (seekHintFilter != null && seekHintFilter != filter) {
+//get hint from the filter that was responsible for the
+//SEEK_NEXT_USING_HINT code
+continue;
+  }
{code}

As Ted asks, it seems only one filter should provide the hint. Can we turn
this around and return {{filter.getNextKeyHint(...)}} if {{seekFilter ==
filter}}?


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-29 Thread Viral Bajaria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723305#comment-13723305
 ] 

Viral Bajaria commented on HBASE-9079:
--

I will upload a new patch with the fixes that Ted pointed out.

[~te...@apache.org] When you say trunk patch, do you mean against the 0.95/0.96
tree?

Regarding Lars' comment on turning it around to ==, I could replace the check
with the following, placed before even running the for loop:
{code}
if (seekHintFilter != null) {
  return seekHintFilter.getNextKeyHint(currentKV);
}
{code}
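
A minimal sketch of the stateful approach: remember which filter asked for the seek, use it once when the hint is requested, reset it, and fall back to the old largest-hint loop only when nothing was remembered. The reset step comes from the original description; SeekingFilter, RememberingFilterList, wantsSeek and the use of long positions instead of KeyValues are hypothetical simplifications, not the actual patch.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase's FilterList: positions are plain longs instead of
// KeyValues, and wantsSeek(...) returning true stands in for SEEK_NEXT_USING_HINT.
interface SeekingFilter {
  boolean wantsSeek(long pos);
  Long nextHint(long pos);  // null when the filter has no hint
}

final class RememberingFilterList {
  private final List<SeekingFilter> filters;
  private SeekingFilter seekHintFilter;  // set by filter(...), cleared by nextHint(...)

  RememberingFilterList(List<SeekingFilter> filters) {
    this.filters = new ArrayList<>(filters);
  }

  // MUST_PASS_ALL-style evaluation: the first filter asking for a seek is remembered.
  boolean filter(long pos) {
    seekHintFilter = null;
    for (SeekingFilter f : filters) {
      if (f.wantsSeek(pos)) {
        seekHintFilter = f;
        return true;
      }
    }
    return false;
  }

  Long nextHint(long pos) {
    if (seekHintFilter != null) {
      SeekingFilter f = seekHintFilter;
      seekHintFilter = null;  // reset, so the saved filter is consulted only once
      return f.nextHint(pos);
    }
    // Nothing remembered: fall back to the old loop that keeps the largest hint.
    Long max = null;
    for (SeekingFilter f : filters) {
      Long h = f.nextHint(pos);
      if (h != null && (max == null || h > max)) max = h;
    }
    return max;
  }
}
{code}

Clearing seekHintFilter at the start of filter(...) as well is an extra precaution in this sketch, so a hint is never taken from a filter that asked for a seek on an earlier key.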

Regarding the ordering, I think the issue will be when the operator is
MUST_PASS_ONE and both filters want to give you a SEEK_HINT, but one of them is
operating at the row level while the other is operating at the column level.
For example, if ColumnRange comes before FuzzyRow and the operator is
MUST_PASS_ONE, we will iterate through both filters' filterKeyValue methods
and keep the state returned from FuzzyRow, not from ColumnRange. I think
this issue exists in the current code too, since we go through each filter and
keep the max row.

Personally I feel it's not a good use-case to make a FilterList with one filter
operating at the row level and another at the column level and to set the
operator to MUST_PASS_ONE. That's almost like saying "keep a column even if the
row does not match". Any suggestions on what should be done here?
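
The ordering concern can be shown with a small hypothetical model in which each filter either asks for a seek or not and carries a fixed hint (positions are assumed to be non-negative longs, with -1 meaning no seek). Remembering the last seeking filter makes the result depend on registration order. smallestHint is included only for contrast as an order-independent alternative; it is an assumption, not something proposed in this thread.

{code:java}
import java.util.List;

// Hypothetical model of the MUST_PASS_ONE concern, not HBase code. Every filter is
// evaluated, so remembering the last filter that asked for a seek is order-dependent.
final class PassOneHints {
  static final class Hinter {
    final boolean wantsSeek;
    final long hint;  // assumed non-negative
    Hinter(boolean wantsSeek, long hint) { this.wantsSeek = wantsSeek; this.hint = hint; }
  }

  // Last-writer-wins: later seeking filters overwrite earlier ones.
  static long rememberedHint(List<Hinter> filters) {
    long hint = -1;  // -1 means no filter asked for a seek
    for (Hinter f : filters) {
      if (f.wantsSeek) hint = f.hint;
    }
    return hint;
  }

  // Order-independent alternative (an assumption, not from the thread):
  // seek only as far as the smallest hint among the filters that asked for a seek.
  static long smallestHint(List<Hinter> filters) {
    long min = -1;
    for (Hinter f : filters) {
      if (f.wantsSeek && (min == -1 || f.hint < min)) min = f.hint;
    }
    return min;
  }
}
{code}

With a row-level hint of 200 and a column-level hint of 105, rememberedHint returns 200 when the column filter is registered first and 105 when it is registered last, while smallestHint returns 105 either way.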


[jira] [Commented] (HBASE-9079) FilterList getNextKeyHint skips rows that should be included in the results

2013-07-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723343#comment-13723343
 ] 

Ted Yu commented on HBASE-9079:
---

w.r.t. ordering, there is no method in FilterBase which can tell us whether the
hint provider operates at the row or the column level.

For 0.96 / trunk, we may add such a method so that FilterList can (re)order the 
Filters accordingly.
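
One way the proposed method might look, purely as a hypothetical sketch: HintLevel, LevelAwareFilter.hintLevel() and FilterOrdering.reorder are invented names, not part of FilterBase or FilterList. The comments here do not settle which level should come first, so the sketch takes the desired order as a parameter.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical: 0.94 filters expose no such method; this only illustrates the idea.
enum HintLevel { ROW, COLUMN, NONE }

interface LevelAwareFilter {
  HintLevel hintLevel();  // the level at which this filter's seek hints operate
}

final class FilterOrdering {
  // Returns a copy sorted by the position of each filter's level in 'order'.
  // List.sort is stable, so filters at the same level keep their registration order.
  // Levels missing from 'order' get index -1 and therefore sort first.
  static List<LevelAwareFilter> reorder(List<LevelAwareFilter> filters, List<HintLevel> order) {
    List<LevelAwareFilter> sorted = new ArrayList<>(filters);
    sorted.sort(Comparator.comparingInt(f -> order.indexOf(f.hintLevel())));
    return sorted;
  }
}
{code}

Because List.sort is stable, filters that report the same level keep the order in which they were registered.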

For 0.94, we can provide documentation on this aspect so that users can register
Filters in the correct order.

bq. you mean against the 0.95/0.96 tree ?

Yes, I meant a patch against trunk.

Thanks
